Research Methods in the Study of L2 Writing Processes (Research Methods in Applied Linguistics, 5) 9027214107, 9789027214102


English Pages 393 [395] Year 2023

Research Methods in the Study of L2 Writing Processes

Research Methods in Applied Linguistics (RMAL)
issn 2590-096X

The Research Methods in Applied Linguistics (RMAL) series publishes authoritative general guides and in-depth explorations of central research methodology concerns in the entire field of Applied Linguistics. The hallmark of the series is its contribution to stimulating and advancing professional methodological debates in the domain. Books published in the series (both authored and edited volumes) will be key resources for applied linguists (including established researchers and newcomers to the field) and an invaluable source for research methodology courses.

Main directions for the volumes in the series include (but are not limited to):
– Comprehensive introductions to research methods in Applied Linguistics (authoritative introductions to domain-nonspecific methodologies);
– In-depth explorations of central methodological considerations and developments in specific areas of Applied Linguistics (authoritative treatments of domain-specific methodologies);
– Critical analyses that develop, expand, or challenge existing and/or novel methodological frameworks;
– In-depth reflections on central considerations in employing specific methodologies and/or addressing specific questions and problems in Applied Linguistics research;
– Authoritative accounts that foster improved understandings of the behind-the-scenes, inside story of the research process in Applied Linguistics.

For an overview of all books published in this series, please see benjamins.com/catalog/rmal

Editor Rosa M. Manchón University of Murcia

Editorial Board
David Britain, University of Bern
Juan Manuel Hernández-Campoy, University of Murcia
Diane Pecorari, City University of Hong Kong
Gloria Corpas Pastor, University of Malaga
Ute Knoch, University of Melbourne
Luke Plonsky, Northern Arizona University
Marta González-Lloret, University of Hawai’i
Anthony J. Liddicoat, University of Warwick
Li Wei, University College London
Laura Gurzynski-Weiss, Indiana University Bloomington
Brian Paltridge, University of Sydney

Volume 5 Research Methods in the Study of L2 Writing Processes Edited by Rosa M. Manchón and Julio Roca de Larios

Research Methods in the Study of L2 Writing Processes Edited by

Rosa M. Manchón Julio Roca de Larios University of Murcia

John Benjamins Publishing Company Amsterdam / Philadelphia


The paper used in this publication meets the minimum requirements of the American National Standard for Information Sciences – Permanence of Paper for Printed Library Materials, ansi z39.48-1984.

doi 10.1075/rmal.5
Cataloging-in-Publication Data available from Library of Congress:
lccn 2023028961 (print) / 2023028962 (e-book)
isbn 978 90 272 1410 2 (Hb)
isbn 978 90 272 1409 6 (Pb)
isbn 978 90 272 4948 7 (e-book)

© 2023 – John Benjamins B.V. No part of this book may be reproduced in any form, by print, photoprint, microfilm, or any other means, without written permission from the publisher. John Benjamins Publishing Company · https://benjamins.com

Table of contents

Foreword (Alister Cumming) 1

Introduction

chapter 1. The study of L2 writing processes: Lines and methods of inquiry (Rosa M. Manchón & Julio Roca de Larios) 6

part i. Investigating writing processes

chapter 2. Writing process studies. Struggling with complexities: Looking back, moving forward (Gert Rijlaarsdam, Elke Van Steendam & Daphne van Weijen) 34

chapter 3. Overview of methodological procedures in research on written corrective feedback processing (Yvette Coyle, Florentina Nicolás-Conesa & Lourdes Cerezo) 60

part ii. Critical reflections on the affordances of data collection instruments and procedures

chapter 4. Survey data: Questionnaires, interviews, and process logs (Sofia Hort & Olena Vasylets) 84

chapter 5. Verbally mediated data: Concurrent/retrospective verbalizations via think-aloud protocols and stimulated recalls (Ronald P. Leow & Melissa A. Bowles) 104

chapter 6. Verbally mediated data: Written verbalizations (Wataru Suzuki, Masako Ishikawa & Neomy Storch) 123

chapter 7. Direct observation of writing activity: Screen capture technologies (Jeremy Séror & Guillaume Gentil) 141

chapter 8. Using keystroke logging for studying L2 writing processes (Victoria Johansson, Åsa Wengelin & Roger Johansson) 161

chapter 9. Using eye tracking to study digital writing processes (Victoria Johansson, Roger Johansson & Åsa Wengelin) 183

part iii. Critical reflections on the implementation of data collection instruments and procedures and on data analysis procedures

chapter 10. Exploring the generation, development, and integration of argumentative goals in L1 and L2 composition processes: Methodological considerations (Julio Roca de Larios) 202

chapter 11. Affordances and limitations when using Inputlog to study young learners’ pausing behavior in L2 writing (Aitor Garcés, Raquel Criado & Rosa M. Manchón) 224

chapter 12. Investigating cognitive processes during writing tests: Methodological considerations when triangulating data from eye tracking, keystroke logging, and stimulated recalls (Elisa Guggenbichler, Kathrin Eberharter & Benjamin Kremmel) 247

chapter 13. Methodology and multimodality: Implications for research on digital composition with emergent bilingual students (Mark B. Pacheco & Blaine E. Smith) 269

chapter 14. Setting up a coding scheme for the analysis of the dynamics of children’s engagement with WCF: Triangulating data sources (Yvette Coyle) 292

chapter 15. Methodological considerations in the analysis of synchronous and asynchronous written corrective feedback: The affordances of online technologies (Natsuko Shintani & Scott Aubrey) 315

chapter 16. Analyzing L2 writers’ processing of written corrective feedback via written languaging and think-aloud protocols: Methodological considerations (Sophie McBride & Rosa M. Manchón) 337

Afterword (Charlene Polio) 364

Index 383

Foreword

Alister Cumming
University of Toronto

This book will be an essential guide for novice and experienced researchers, senior students, and educators investigating the processes of writing in additional languages. The chapters provide authoritative advice, drawn from the authors’ own expert research experiences, on the design, logistics, principles, challenges, and problems of conducting empirical studies on the processes of writing across languages. The focus is on studies of composing and of responding to feedback on written drafts, with particular attention to methods of process tracing through data such as concurrent or stimulated verbal reports, interviews, diaries, digital recordings, visual screen capture, eye tracking, keystroke logging, questionnaires, and/or ethnographic observation. Conditions addressed include individual, group, and assessment tasks; pedagogical interventions; emergent or pre-ordained designs; natural or laboratory settings; adult or child populations; and digital, pen-and-paper, and multimodal media of composing. Theoretical perspectives span a range of foundations from cognitive psychology, second language acquisition, social semiotics, sociocultural theory, and complex dynamic systems. In short, the scope is impressively comprehensive and wide-ranging. Moreover, the viewpoints are admirably international, including authors from various countries in Europe, North America, and Asia, although the volume has been conceived and directed by members of the productive research group at the University of Murcia in Spain.

A unique feature of the book is the authors’ efforts to make their chapters useful for conducting research. There are informative syntheses of trends in recent research on writing and feedback processes, but the main emphasis of most chapters is to give insiders’ advice: recommended steps for research methods, problems to address or avoid, and relevant procedures and current technologies for certain research purposes and questions. References are provided to the authors’ publications about their studies so readers can follow up on details or background. The upshot is a kind of catalogue of key choices to be made in designing research studies on processes of writing in second languages, particularly for gathering, sampling, analyzing, interpreting, and reporting unique tasks and valid data.

Common research dilemmas are helpfully outlined, with tips for handling, reducing, coding, synthesizing, storing, and presenting results from the excessive quantities of complex data that arise from detailed methods of process tracing; selecting appropriate units and granularity of analyses; distinguishing micro- from macro-processes (e.g., spelling or punctuation vs. planning or revising); making choices about the language(s) in which data are elicited; documenting changes in behaviors or texts over short- versus long-term durations; and weighing the benefits of designing writing tasks to address particular audiences such as classmates or family members in addition to teachers, researchers, or oneself. All authors acknowledge the incompleteness of process-tracing data, whether as representations of people’s cognition or decision making, or as samples of language and literacy performance amid the countless additional situations of writing and interaction that cannot be accounted for in any one research study or in people’s lives. Many chapters are concerned with processes of learning, improvement, or development while writing in the short as well as the long term, and with processes of revision (e.g., after feedback), self-control (e.g., goal setting), or verbalization (e.g., languaging). Important ethical concerns are discussed too, highlighting issues like informed consent, privacy, and unforeseen consequences.

All of this advice would have been so helpful back in the 1980s for people like me who were conducting the initial process-tracing studies of writing in second languages. We had precedents, inspiration, and theoretical guidance, all right, from skilled researchers such as Carl Bereiter and Marlene Scardamalia, Dick Hayes and Linda Flower, or Ian Pringle and Aviva Freedman, who had investigated writing by students whose home language was English.
But several decades ago, nobody seemed to know how to appreciate that – or systematically try to document and explain how – people could write, think in, learn, or use multiple, diverse languages. There was great enthusiasm to try to discover insights for pedagogy by examining not just the final written texts that students produced in a second language (as had been the conventional practice for applied linguists) but also their very processes of planning, organizing, reasoning, and revising that led to the successful and unsuccessful qualities of those compositions. Alas, the enthusiastic burst of research on writing processes in the 1980s and 1990s gave rise to educational practices, as well as research, that bifurcated counter-productively between teaching writing in second languages with orientations either to (a) conventional aspects of written texts or genres or (b) the new insights of composing processes.

The present book demonstrates that the state of knowledge, research, and practice about writing processes has progressed markedly in recent decades. Simple, antagonistic dichotomies (such as product vs. process) do not appear in the present volume. Likewise, no authors suggest that there might be a singular,


uniform, or ideal process for writing, using multiple languages, or researching processes of writing or learning languages. Instead, authors in the present volume readily acknowledge the multifaceted, pluralistic, and variable nature of writing, languages, interaction, feedback, and research. Constructs are framed in multiple (rather than dualistic) dimensions such as, for example, student writers’ intentions, awareness, and realizations; descriptive, hypothesis-testing, or explanatory research purposes; or learners’ noticing, understanding, or efforts to process feedback on their writing. Suggestions are consistently for multiple, complementary methods of data collection suited to specific research questions and contexts, followed by confirmatory triangulation of their results, rather than advocating any one method of tracing writing processes as superior. Acknowledgements are made that languages differ from one another, not only in being first or second or third in the sequence of their acquisition during people’s lifespans, but also in their linguistic features, social contexts, societal status, and uses in education, workplaces, and communities around the world.

The present book is a valuable consolidation, as well as demonstration of the depth and range, of current ideas about empirical inquiry into writing processes and language learning, based firmly on the experiences, insights, and wisdom of people who are now actively producing this research.


Introduction

chapter 1

The study of L2 writing processes
Lines and methods of inquiry

Rosa M. Manchón & Julio Roca de Larios
University of Murcia

This introductory chapter serves two main purposes. One is to contextualize the book within the larger professional discussion, which we do through a look-back approach in order to provide a synthetic review of the main lines of research in the study of L2 writing processes and the main research instruments employed. The second aim of the chapter is to introduce readers to the aims, structure, and contents of the book.

Introduction

This volume brings together the perspectives of new and established scholars who have connected with the broad fields of first language (L1) and second language (L2) writing to discuss critically key methodological developments in the study of L2 writing processes. To this end, the chapters in the book illustrate how progress has been made in developing research methods and empirical understandings of writing processes, in introducing methodological innovations, and in pointing to future methodological directions helpful in providing valid answers to empirical questions in the domain. As will be more evident in the sections that follow, the contribution of this collective project results from the wide range of dimensions of L2 writing processes in focus, together with the equally broad range of research instruments and methodological approaches covered.

The chapter is organized as follows. We start by contextualizing the book within the larger professional discussion, which we do (i) by articulating more precisely the range of focal phenomena subsumed under the macro label of “writing processes”; and (ii) by providing an updated, synthetic overview of the main lines of inquiry in the study of this domain in L2 writing and of the main research methods (especially in terms of research instruments) employed in empirical research. Against this background, we elaborate on the aims of the book and introduce readers to its structure and contents.

https://doi.org/10.1075/rmal.5.01man © 2023 John Benjamins Publishing Company


Writing processes in L2 writing research

The study of writing processes has been at the heart of research interests in the fields of both L1 and L2 writing. As this book illustrates, research concerns in this domain have notably expanded over the years, and so have (i) the understandings of the phenomena globally subsumed under the macro category of “writing processes”; (ii) the theoretical frameworks informing research; and (iii) the research methods used to observe and analyse the processing dimension of composing.

When the phenomenon in focus is writing in an additional language (L2), the study of writing processes has evolved to encompass two global phenomena: the processing dimension of writing itself, on the one hand, and the processing of the feedback provided on one’s own writing, on the other (Figure 1). From the writing angle, writing processes correspond to those phenomena that Séror (2013) characterized as the “hidden sequences of events at the heart of L2 writers’ text production” (p. 1). As shown in Figure 1, these hidden sequences of events have gradually come to encompass both “writing processes” and “text production processes”. Following Manchón (2021), writing processes refer primarily to the set of cognitive operations and online behaviors that characterize composing (before and after receiving feedback) in pen-and-paper and/or digital environments (the phenomenon most people would associate with writing processes). Text production processes, in turn, refer to the hidden sequence of events underlying the dynamism of text production as it unfolds in naturally occurring, space- and time-distributed literacy practices. As noted below, the study of these different writing phenomena has been informed by a range of theoretical perspectives and, accordingly, the methodological approaches adopted have also varied.

The inquiry into writing processes also includes the study of feedback processing (Figure 1).
This more recent research direction in L2 writing studies is centrally concerned with the (primarily cognitive) actions performed by L2 writers when engaging with and processing the feedback provided on their own writing. Research agendas on feedback processing have moved along three main directions: Researchers have inspected the very nature of feedback processing (with special attention to the depth of such processing activity; see reviews in Manchón & Vasylets, 2019; Roca de Larios & Coyle, 2021), the potential effects of such processing activity on the quality of the texts written, and, more recently, the strategies used by learners to produce those texts after processing the feedback received on them. Additionally, two recent model-building attempts are worth mentioning, namely, Leow’s (2020) framework based on his model of SLA processes (Leow, 2015), and Bitchener’s (2019) model of WCF processing based on


Gass’s (1997) cognitive model of input processing. These enlarged understandings of and approaches to the study of writing processes as applied to writing itself and to feedback processes are present in the current book, which we consider one of the innovative features of our collective project.

Figure 1. Overview of phenomena corresponding to “writing processes”

The next two sections expand on the theoretical approaches and main directions of research in the domain, and synthesize key methodological considerations in this body of work, avoiding undue reiteration of the full methodological discussions provided in the rest of the chapters in the book.

Theoretical approaches and main research directions in the study of L2 writing processes

Table 1 outlines the main research directions in the study of the processing dimension of writing and feedback use, as well as the theoretical perspectives informing each of them.

Table 1. An overview of research on L2 writing processes

WRITING

Writing processes
– Theoretical frameworks: Models of L1 writing; cognitive theories of SLA and theoretical positions on L2 writing as a site for language learning.
– Main lines of research and representative studies:
– Nature and temporal distribution of composition processes and on-line behaviors, as moderated by learner-related and task-related variables: Barkaoui (2019); Chukharev-Hudilainen et al. (2019); Leijten et al. (2019); Révész et al. (2017, 2019); Xu & Qi (2017); see also the review in Roca de Larios et al. (2016). Temporal distribution: Gánem-Gutiérrez & Gilmore (2018); Roca de Larios et al. (2008). Task-related variables: Barkaoui (2016); Michel et al. (2020); Révész et al. (2016); Zalbidea (2020).
– Comparisons of writing processes and on-line behaviors in L1 and L2: Beare & Bourdages (2007); Breuer (2019); Chenoweth & Hayes (2001); Leijten et al. (2019); Manchón & Roca de Larios (2007); Roca de Larios et al. (2001); Stevenson et al. (2006); Tillema (2012); Tiryakioglu et al. (2019).
– Problem-solving nature of writing and the training and use of problem-solving strategies: Chan (2017); De Silva & Graham (2015); Ferez (2005); Knospe et al. (2019); Leijten et al. (2019); López-Serrano et al. (2019, 2020); Manchón et al. (2009); Murphy & Roca de Larios (2010); Plakans (2008); Roca de Larios et al. (2021); Van Weijen et al. (2009); Wang & Wen (2002).

Text production processes
– Theoretical frameworks: Models of L1 writing processes; Activity Theory and sociocultural theories of L2 learning and literacy development; social semiotics and translanguaging.
– Main line of research and representative studies:
– Socially situated and time-distributed nature of writing processes and strategies: Green (2013); Hamel, Séror, & Dion (2015); Lei (2008); Séror (2013); Smith et al. (2017); Yasuda (2005).

FEEDBACK PROCESSING
– Theoretical frameworks: Cognitive theories of SLA and corresponding theoretical positions on L2 writing as a site for language learning.
– Main lines of research: Feedback processing in individual and collaborative (pen-and-paper and digital) writing conditions; effects of/correlations between feedback processing and written products.
– Representative studies: Bowles & Gastañaga (2022); Caras (2019); Cerezo et al. (2019); Coyle et al. (2018); Coyle & Roca de Larios (2014, 2020); Criado et al. (2022); Cruz et al. (2022); García Hernández et al. (2017); Han & Hyland (2015); Hanaoka (2007); Hanaoka & Izumi (2012); Kim & Bowles (2019); Leow et al. (2022); Manchón et al. (2020); Park & Kim (2019); Sachs & Polio (2007); Suzuki (2017); Uscinski (2017).

Writing processes

The main areas of scholarly interest in the study of L2 writing processes and text production processes have been framed in different theoretical paradigms and have employed different methodologies in their empirical endeavours (as outlined in Table 1).

Cognitively-oriented studies of writing processes

Cognitively-oriented studies of writing processes have traditionally constituted the main strand in this domain. Interestingly, this area of scholarly work has recently made its way in a substantive manner into cognitive studies of second language acquisition (SLA), which is rather remarkable given the practical absence of L2 writing in key SLA disciplinary discussions until recently (see Manchón, 2020a, 2023; Manchón & Williams, 2016, for further elaboration). Three macro research domains can be established, all of them theoretically informed by well-established models of L1 writing (especially Bereiter & Scardamalia, 1987; Flower & Hayes, 1981; Galbraith, 2009; Hayes, 1996, 2012; Kellogg, 1996, 2001), as well as by cognitive accounts of SLA and the resulting theoretical positions on L2 writing as a site for language learning (see reviews in Manchón, 2020a, 2023; Manchón & Vasylets, 2019; Manchón & Williams, 2016; Williams, 2012). These research domains are the following:


1. Nature and temporal distribution of writing processes and online behaviors. An important line of research (with a solid tradition in L1 and L2 writing studies) is concerned with the nature and temporal distribution of composition processes and online behaviors (both while writing and in test-taking conditions, as in Révész et al., 2017), often as moderated by (some or a combination of) writer-internal variables (i.e., proficiency, writing ability, keyboard skills, ease of retrieval, or working memory) and/or external, task-related variables (primarily the role of task type or task complexity). In addition to the body of work reviewed by Roca de Larios et al. (2016), representative examples of more recent studies on the nature of writing processes and online writing behaviors in individual writing conditions include those of Barkaoui (2019), Chan (2017), Chukharev-Hudilainen et al. (2019), Leijten et al. (2019), Révész et al. (2019), and Xu and Qi (2017). The study of collaborative writing has gradually gained momentum in L2 writing studies and, as a result, an incipient line of research has begun to inspect writing processes in collaborative writing conditions and in digitally-mediated interactive writing (see Michel et al., 2021, for a comprehensive review). Other studies have focused on the temporality of composition processes, with Gánem-Gutiérrez and Gilmore (2018) and Roca de Larios et al. (2008) being representative examples.

Interestingly, although inspecting the temporal distribution of writing processes in different writing conditions (pen-and-paper writing without access to external sources [Roca de Larios et al., 2008] and digital writing with access to sources [Gánem-Gutiérrez & Gilmore, 2018]) and through the use of different methodological approaches (think-aloud protocols [Roca de Larios et al., 2008] versus screen capture techniques, eye tracking, and stimulated retrospective recalls [Gánem-Gutiérrez & Gilmore, 2018]), these studies reached very similar findings, especially concerning the preponderance of formulation processes in composing. A growing concern in this body of cognitively-oriented work is the study of the manner in which writing processes and online writing behaviors may be mediated by task-related variables, including task type (such as independent vs. integrated writing tasks; e.g., Barkaoui, 2016; Michel et al., 2020; Révész et al., 2017) or task complexity (Révész et al., 2016; Zalbidea, 2020). Another area of research that is gaining momentum is concerned with the effects of learner individual differences on writing processes and writing behaviors (see Révész et al., 2023; Torres, 2023, for representative examples).

2. Writing processes in L1 and L2 writing. Considerable empirical efforts have also been devoted to comparing writing processes and online behaviors (e.g., pausing, fluency) across the languages in the L2 writer’s linguistic repertoire. Older and more recent studies in this strand include Beare and Bourdages (2007), Breuer (2019), Chenoweth and Hayes (2001), Leijten et al. (2019), Manchón and Roca de Larios (2007), Roca de Larios et al. (2001), Stevenson et al. (2006), Tillema


(2012) or Tiryakioglu et al. (2019). Within this strand, it is relevant to note the shifts observed in the type of writing in focus (gradually moving from print-based to screen-based writing) and the methodology employed (adding screen capturing, eye tracking, and keystroke logging tools to traditional uses of stimulated recall and think aloud protocols) in earlier and more recent studies comparing writing processes crosslinguistically. 3. L2 writing problem-solving behavior and strategies. The analysis of the problem-solving nature of writing and the strategies used by writers in their problem-solving behavior has traditionally been a central concern in cognitivelyoriented studies of writing processes (see Manchón, 2018; Manchón et al., 2007 for reviews). Accordingly, empirical progress has been and continues to be especially relevant: A. Starting from the most recent developments, an emerging SLA-oriented line of research is concerned with underscoring the language learning affordances of the problem-solving activity that characterizes L2 writing in both individual (e.g., López-Serrano et al., 2019, 2020) and collaborative writing conditions (Roca de Larios et al., 2021; Stiefenhöfer & Michel, 2020. For a recent review, see Michel et al., 2021). 
We anticipate this strand will be made much more central in future SLA-oriented research agendas on the processing dimension of writing, given that it connects directly with central empirical questions on how and why L2 writing may be a site for language learning (a fast-expanding domain at the intersection between L2 writing and SLA studies): The theoretically propounded and empirically attested problem-solving nature of writing (and by now fully attested associated intense linguistic processing while composing) is thought to induce focused attention to languagerelated concerns during text composition, which, it is also suggested, may create favourable conditions for language learning through opportunities for monitoring, restructuring, elaboration, consolidation and/or refinement of the language used (see Leow & Suh, 2021; Leow & Manchón, 2021; Manchón, 2020b, 2023; Manchón & Leow, 2020; Michel et al., 2021, for further elaboration) B. Writing in an additional language is a bilingual phenomenon and, hence, the study of the manner in which L2 writers strategically resort to their whole linguistic repertoire while composing has attracted considerable empirical attention in process-oriented L2 writing research. (e.g. Ferez, 2005; Manchón et al., 2009; Murphy & Roca de Larios, 2010; Van Weijen et al., 2009; Wang & Wen, 2002) C. A relevant strand in the study of writing strategies has inspected the strategies employed while making use of external (online) resources in the writing


process (e.g., Chan, 2017; Knospe et al., 2019; Leijten et al., 2019), at times in comparison with writing without access to such resources (e.g., Plakans, 2008).

Socio-cultural, ethnographically-oriented studies of writing processes and text production processes

An important theoretical and methodological shift in the study of L2 writing processes resulted from the so-called social turn in Applied Linguistics, which, as Manchón (2021) notes in her extensive treatment of this shift, in its application to L2 writing studies led to an interest in the socially-situated nature of literacy practices, as well as in the time- and space-distributed nature of writing events. This has led to an expansion of research agendas on the processing dimension of writing (present in the current book; see especially Chapter 13) with key items that differ from those in cognitively-oriented studies. Theoretical approaches informing this body of work include models of L1 writing, Activity Theory, sociocultural theories of L2 learning and literacy development, social semiotics, and theoretical accounts of translanguaging.

Central to this expanded research agenda is the socially-situated nature of writing events and the space- and time-extended conditions of text production processes. Manchón (2021) notes in her analysis of research on text production processes that these theoretical and empirical movements brought with them two developments that are especially relevant for the present book: “a shift from individual to socially-situated cognition, and from writing in laboratory settings to the socially-situated and time-extended conditions of text production processes in diverse settings” (p. 86). These settings necessarily go beyond the academic domain and crucially include workplace writing. Park and Kinginger (2010) note key features of this writing environment, which have been instrumental in directing research. They note (p. 33):

the contemporary writing context now offers an extensive array of networked artefacts. In such an environment, the post-cognitive perspective enhances our understanding of the composing process by bringing to the fore the nature of writing as a multi-resourced and multi-party activity.

The acknowledgement of the multi-resourced and collaborative nature of many forms of writing is also emphasized by scholars such as Prior (2015), who adds a key element: in his view, adopting a more encompassing, supra-individual perspective means that text production processes must be viewed as “a blend of texts, persons, activities, mediational means, and social formations/practices” (p. 188). As a result, Prior argues, writing must be seen as “temporally and spatially stretched out trajectories rather than as punctual events in a narrow and isolated here-and-now” (p. 188; emphasis added). Therefore, research on “text production processes” (e.g. Green, 2013; Hamel et al., 2015; Lei, 2008; Séror, 2013; Smith et al., 2017; Yasuda, 2005) has zoomed in precisely on this socially-situated, often collaborative, extended, dynamic, time-distributed nature of the processing dimension of writing, which has entailed the use of methodologies alternative to those used in the cognitively-oriented, laboratory-type studies on writing processes referred to above.

Rosa M. Manchón & Julio Roca de Larios

Feedback processing

We noted earlier that the study of feedback processing has been added more recently to research agendas on L2 writing processes. As more fully discussed in Chapter 3 (this volume) and as also illustrated in various other chapters in the book (especially chapters 14, 15, and 16), we are concerned essentially with SLA-oriented studies on feedback processing, primarily (although not solely) from the perspective of the potential of such processing activity for L2 learning. This research interest derives from theoretical predictions (in turn closely associated with theories of attention in SLA studies) on the connection between feedback processing and language learning, which emphasize the key role played by how deeply L2 writers process the feedback they are provided with in bringing about potential language learning gains (see Bitchener, 2019; Leow, 2020; Leow & Suh, 2021; Manchón & Vasylets, 2019; Polio, 2012; Roca de Larios & Coyle, 2021). Three main lines of research are currently being pursued in this area of inquiry into writing processes. One gradually expanding research trend (for representative examples see Caras, 2019; Cerezo et al., 2019; Coyle et al., 2018; Kim & Bowles, 2019; Park & Kim, 2019; Suzuki, 2017) is concerned with the analysis of the manner in which different types of feedback (more/less comprehensive, more/less explicit) foster depth of processing, often operationalized in recent studies as “the relative amount of cognitive effort, level of analysis, elaboration of intake together with the usage of prior knowledge, hypothesis testing and rule formation employed in decoding and encoding some grammatical or lexical item in the input” (Leow, 2015, p. 204).
Another research trend corresponds to studies that look into the correlation between depth of processing and the nature of the revisions undertaken, which are taken as evidence of language learning (see Bowles & Gastañaga, 2022; Cerezo et al., 2019; Leow et al., 2022; Manchón et al., 2020 for representative examples). More recently, a third line of inquiry has begun to explore the problem-solving strategies engaged in by child learners to compose their texts after processing the feedback received on them (García Hernández et al., 2017).

Chapter 1. The study of L2 writing processes

Methodological considerations in the study of L2 writing processes and feedback processing

Given the extensive treatment of methodological considerations in research on writing processes included in the rest of the chapters in the book, in what follows we limit ourselves to providing broad strokes of the main research instruments used to record and analyze L2 writing behaviors, both while writing and while processing feedback. We shall not cover other, nevertheless important, methodological considerations (such as populations, designs, or data analyses), and instead refer readers to the relevant chapters in the book for further elaboration (see also the synthesis of the book contents in the next section of this chapter).

Writing and text production processes

In the case of writing processes, both traditional and more recent approaches coincide in developing methods for recording and analyzing writers’ behavior while composing, and in attempting to associate this behavior with underlying cognitive processes, hence implicitly assuming the interdependent nature of this double aim, i.e., “what behavior is recorded and how it is analyzed depends on the researcher’s theory of the cognitive processes involved, and, to a certain extent, theories of these cognitive processes depend on what it is possible to observe” (Galbraith & Vedder, 2019, p. 633). The research instruments used range from traditional techniques, such as analysis of written texts, questionnaires, interviews, and verbal reports, to more innovative ones, such as screen capture technologies, keystroke logging, or eye-tracking, as more fully discussed in several chapters in Part II of this book (chapters 4, 5, 6, 7, 8, and 9). This gradual expansion of research instruments is not solely due to technological advances and the attested problems with more traditional methodologies (such as the reactivity of think-aloud protocols or the incompleteness, due to memory decay, associated with stimulated recall protocols). Rather, we would argue, full recognition of several dimensions of L2 writing has not only contributed to innovations in research tools and data collection procedures but has also posed new methodological challenges in terms of designs and analytic procedures. For instance, as noted in earlier sections, acknowledging the space- and time-distributed, often collaborative, and source-based nature of many forms of writing brings with it needed innovations in the quest for principled answers to pending questions (see Chapter 7). In turn, finer-grained analyses of writing behaviors have allowed researchers to bring some composing processes to light. For example, the use of keystroke logging techniques allowed Leijten et al.
(2014) to examine the use of sources by a professional writer and led them to refine Hayes’s (2012) model of L1 writing by adding a “searcher component” to it. Understanding online literacies in full adds another layer of complexity to the study of L2 writing processes, as does the recognition of the gradually increasing digital and multimodal nature of writing in a variety of learning and academic contexts, as well as in workplace environments (see, for instance, Leijten et al., 2014; Smith et al., 2017). Being cognizant of these various dimensions and developments poses new methodological challenges that entail, at a minimum, expanding and/or refashioning existing research methodologies, refining our constructs and analytic lenses, or expanding research designs (in part to make longitudinal investigations, as well as qualitatively/ethnographically-oriented studies, more central in our research agendas). These crucial methodological considerations in research on L2 writing processes and text production processes are central in the scholarly discussions included in the book.

Feedback processing

Although work in this area is still in its infancy, relevant attempts have been made to move this strand forward, both in terms of needed theoretical developments capable of informing research and in terms of empirical efforts, as more fully discussed in Chapter 3 and as illustrated in chapters 14, 15, and 16. The available (albeit limited in number to date) descriptive and correlational studies that have investigated feedback processing in individual or collaborative writing conditions have made use of a range of methodological procedures that include think-aloud, oral languaging, and written languaging. As noted earlier, this research has attempted to establish levels of processing of feedback, as well as the impact of levels of processing on the characteristics of revised texts written after the processing of the feedback provided on the initial texts. Yet, some voices (e.g., Manchón & Leow, 2020; Manchón et al., 2020) have underscored a range of pending questions on the validity and affordances of the diverse research instruments used to capture the processing dimension of the engagement with feedback provided in traditional pen-and-paper as well as in digital environments (for the latter, see Chapter 15, this volume). Similarly, recent empirical studies (e.g. Coyle et al., 2018) and position papers (e.g. Leow & Manchón, 2021; Manchón & Leow, 2020) have drawn our attention to the relevance of looking at how L2 writers engage with the feedback provided on their writing through alternative analytic lenses, in an attempt to capture the dynamism of feedback processing, another crucial dimension of the temporality of writing that has made its way into global disciplinary discussions on writing processes (see methodological discussions in chapters 14, 15, and 16).


The present book

Against the background provided in the previous sections, we shall now (a) outline the main aims guiding our collective project; (b) provide a synthesis of key elements of the various contributions to the book and the way in which they are framed in a common research agenda; (c) describe the interconnection among contributions and the internal coherence of the volume; and (d) state the way in which the book attempts to move methodological discussions in research on L2 writing processes forward.

Aims and scope

The above synthesis of empirical research trends and methods attests to the breadth and growth of research into L2 writing processes, as well as to the dilemmas and challenges surrounding methodological decision-making. We therefore considered it timely and relevant for the field to engage in a critical reflection on past and current methodological practices in the domain, as well as in an equally critical, forward-looking analysis of what lies ahead in terms of innovative inquiry methods and resulting challenges. The book hence attempts to advance professional discussion of inquiry methods in the study of the writing process from two main perspectives. The first relates to the expanded dimensions of L2 writing processes in focus, as already noted in earlier sections. Chapters cover methodological considerations and approaches in the study of writing processes in controlled and more naturalistic research environments, shedding light on methodological issues in the study of writing processes and online behaviors, but also on the space- and time-distributed nature of writing and feedback processing. The book also covers central methodological concerns in understanding writing as performed individually and collaboratively, in pen-and-paper and digital environments, and with/without access to feedback and to external sources. To our knowledge, this is the first book to cover these various facets of writing in a collective methodological reflection. The second intended contribution to the field derives from the expanded range of research instruments and methodological approaches covered, as well as the analytic lenses adopted. In this sense, chapters in Part II contribute thorough and critical reflections on the affordances of a wide range of methodological approaches capable of informing research in the domain.
Part III chapters add an equally critical, insider perspective on key issues in data collection and data analysis in the inquiry process followed in a range of studies concerned with the study of writing processes and feedback processing that have used the methodologies discussed in Part II.

Structure and contents

The chapters in the book are structured in three main parts, which are preceded by a Foreword (by Alister Cumming) together with this introductory chapter by the Editors, and followed by an Afterword (by Charlene Polio). Part I is made up of two chapters that expand the analysis presented in this introductory chapter. They provide a critical analysis of research approaches in the study of writing processes (Chapter 2) and a detailed overview of methodological procedures in extant research on feedback processing (Chapter 3). These two chapters serve to more narrowly situate the analyses of methodological tools and procedures covered in the chapters included in Part II. Part II is made up of six chapters, each one covering a distinct (set of) data collection instrument(s) or procedure(s). These include survey data (i.e., questionnaires, interviews, and process logs; Chapter 4); verbally mediated data, including both concurrent/retrospective verbalizations via think-aloud protocols and stimulated recalls (Chapter 5) and written languaging (Chapter 6); direct observation of writing activity through video recording and digital screen capture software applications (Chapter 7); and key approaches in the study of digital writing, including keystroke and handwriting logging tools (Chapter 8) as well as eye-tracking technology (Chapter 9). All chapters in Part II follow an identical structure: a general description of the procedure in focus, coupled with a more detailed analysis of (i) the type of research questions that can be answered using the methodological procedure/technique/instrument concerned; (ii) the potential methodological challenges to be faced and potential solutions; and (iii) an account of best practices in using the technique/methodology in focus.
Part III includes seven chapters that contribute critical, reflective narrative accounts of key methodological decision-making guiding the inquiry process followed in specific studies on writing processes and feedback processing. Collectively, these contributions portray the inside story of research and exemplify and critically analyze the use of most of the methodological procedures covered in Part II chapters. They do so in the analysis of both pen-and-paper and digital writing, in controlled situations as well as in more expanded time conditions. The range of populations in these studies includes children, adolescents, and adults, a feature of the book that we consider particularly relevant. Contributions to Part III were also asked to apply an identical structure, in this case covering (i) an overview of the study/program of research in focus (in terms of rationale, aims, and methods); (ii) a detailed analysis of methodological decisions, challenges/problems experienced, and solutions adopted; and (iii) a final section including relevant methodological conclusions and implications for future studies.

In Chapter 2, Rijlaarsdam, Van Steendam, and van Weijen provide a much-needed theoretical contribution to the field with their in-depth analysis of the “writing process construct”. Drawing on Cook and Campbell’s (1979) validity framework, the authors suggest a number of structural parameters (strings of activities, forward or backward direction, thinking or performative domains) as well as functional parameters (the drive to understand and/or to be understood) to characterize the writing process construct as intended. These parameters are then used to conduct a critical evaluation of oft-cited L2 writing process studies (from the late 1990s up to now) in terms of their construct validity. The authors conclude that, although some progress has been made in technicalities and sample sizes over the years, some key components of the writing process construct must still be seen as a challenge. Some suggestions for further research in this direction, together with a selective list of guidelines for statistical and internal validity, are offered.

Moving on to written corrective feedback (WCF), Coyle, Nicolás-Conesa, and Cerezo (Chapter 3) draw on cognitive and sociocultural approaches to provide a critical overview of major methodological practices in extant research on WCF processing. The distinction between interventionist and naturalistic studies allows the authors to (i) differentiate feedback processing as a cognitive activity from engagement with feedback, which is considered a broader construct encompassing cognitive, behavioral, and affective dimensions; and (ii) describe and critically evaluate research designs and populations studied, as well as data collection and analytical procedures used.
Especially enlightening is their classification of concurrent and non-concurrent data elicitation methods, which, among other criteria, may vary as a function of their degree of reactivity and veridicality, their ecological validity, and the information they provide on learners’ levels of awareness and depth of processing. The authors suggest that, among other considerations, future studies should include a wider range of populations, make use of more longitudinal and classroom-based designs, and pay attention to learners’ engagement with digital and multimodal feedback as well as to the development of their feedback literacy.

Hort and Vasylets (Chapter 4) discuss the affordances of three self-report techniques for studying L2 writing processes, namely questionnaires, interviews, and process logs. The authors describe their main characteristics, provide examples of their independent and/or combined use in L1 and L2 writing studies, and underscore their versatility. After making different methodological recommendations to tackle the threats to reliability and validity involved in their application, the authors emphasize the importance of data triangulation. Among the different avenues for future research mentioned in the chapter, four are especially relevant: (i) greater terminological consistency in the processes of data collection and analysis; (ii) the use of group interviews as an opportunity for joint reflection and learners’ co-construction of perceptions on writing processes; (iii) the linguistic analysis of the data obtained from self-reports as a new procedure to access learners’ thoughts and feelings; and (iv) the use of mobile technologies to facilitate the exploration of the spatial and temporal distribution of composing events and thus widen the scope of writing process inquiry.

Leow and Bowles (Chapter 5) discuss the use of think-aloud protocols (TAs) and stimulated recalls (SRs) to study cognitive processing in general and L2 writing processes more specifically. The authors describe the main characteristics of both techniques in terms of timing, type of support needed, and modality, and claim that their use, irrespective of the theoretical position adopted (cognitive or sociocultural), has made it possible to address a wide range of questions in both L1 and L2 writing research. Among the various limitations of TAs and SRs, two are highlighted: reactivity, regarded as one of the main threats to validity; and the nature of the data collected, which are not sufficiently fine-grained to capture low levels of awareness or fleeting attention to L2 features. In consonance with these limitations, the chapter ends with several recommendations to minimize the threats to validity involved in the use of TAs and SRs, and with suggestions to triangulate data, as is the case with most chapters in Part II.

Sasaki, Ishikawa, and Storch (Chapter 6) focus on written verbalizations (WVs).
After discussing some veridicality and reactivity issues involved in the use of concurrent and retrospective WVs, the authors focus mostly on the latter when they (i) describe different data collection instruments, such as diaries, journals, written reports, or guided worksheets, used to look at learners’ general experiences or specific thinking processes after completing a task; (ii) compare the self-directed or other-directed nature of writing prompts and the different effects they may have; and (iii) look at the language of reporting as a topic that may add a new dimension to reactivity issues. After discussing several research questions that may be answered with the use of WVs, the authors analyze some methodological challenges that their use may involve. Finally, WVs are seen as appropriate instruments to gain insights into L2 writing (e.g., how goals change over time) or feedback processing (e.g., in terms of cognitive effort), although the authors contend that reactivity and veridicality issues need to be further explored.

Séror and Gentil (Chapter 7) discuss screen capture technologies (SCTs) as an unobtrusive, real-time observational method to record everything that occurs on a writer’s digital screen. They claim that SCTs are now often used in research under the umbrella of cognitive, sociocultural, and sociomaterial theoretical frameworks. Among other benefits, the authors see SCTs as an important tool to (i) provide insights into previously undocumented events associated with writing processes; (ii) examine writers’ handling of space and visual elements while composing; or (iii) document their use of online resources. The use of SCTs, however, involves different challenges, mostly related to the management of the amount and complexity of the data gathered, which, according to the authors, may be tackled by means of judicious decisions related to the research design and the methodology used in the study at hand. The chapter concludes with suggestions for further research, which include, for example, the use of SCTs for the analysis of out-of-class writing practices and emotions in writing, or the study of writers’ management of multilingual, multimodal, and AI resources.

In Chapter 8, Johansson, Wengelin, and Johansson contribute a comprehensive discussion of keystroke logging to examine digital writing in real time. The analysis focuses on how this technology works and why and when it is appropriate, as well as on methodological considerations related to its use. The discussion also provides a synthesis of previous L2 writing keystroke logging process-oriented studies to illustrate the type of questions that can be addressed with this data collection technique. As in the rest of the contributions to Part II, the chapter also discusses some methodological challenges in keystroke logging studies, such as (i) the choice of keystroke logging program; (ii) the procedures used for data recording and analysis; and (iii) the ways of interpreting findings. In the last part of the chapter, the authors provide different suggestions for best practices, among which it is worth mentioning the need for novice researchers to get acquainted with the variety of ways writing processes may be analyzed and measured before attempting a study with this technology.
On the premise that gaze direction and visual attention may provide important information on cognitive and linguistic processing, Johansson, Johansson, and Wengelin (Chapter 9) offer a general overview of eye tracking (ET) as an unobtrusive, observational method to explore the role played by reading during writing. After presenting the rationale for using ET, the authors describe how the technique works and discuss previous L1 and L2 writing studies to illustrate the variety of issues it may help uncover. Some methodological challenges are discussed, especially those associated with (i) the spatial and temporal resolution of eye trackers; (ii) the dynamic nature of the emerging text; (iii) the choice of appropriate methods (in combination) for registering the writing process; or (iv) the danger of using ET as a purely observational tool. The authors conclude the chapter by suggesting a set of best practices closely related to those challenges and by recommending further studies that may look, for example, at text production processes based on orthographies other than the Latin alphabet.


Assuming that writing is an inherently individual and social activity and, more specifically, that writing processes may be interpreted as a genre-oriented sequence of decisions, Roca de Larios (Chapter 10) discusses the methodological challenges involved in analyzing how writers generate, develop, and integrate their goals while composing L1 and L2 argumentative texts. The author revisited the data (handwritten texts and TA protocols) of a previous study with the double aim of identifying the argumentative moves in the written texts and uncovering the goals and processes responsible for their written formulation. This involved a set of methodological decisions specifically oriented to the analysis of whether and how the participants (i) conceptualized the task as an argumentative problem; (ii) managed to construct a network of argumentative goals throughout the composition; and (iii) integrated conflicting goals by means of one-sided or two-sided reasoning. The author concludes with suggestions for further writing process research within the theoretical framework of this genre-based, goal-oriented approach.

Garcés, Criado, and Manchón (Chapter 11) reflect on the methodological challenges they faced in their pioneering study on the use of keystroke logging (KSL) to explore young learners’ pausing behavior in L2 writing before and after receiving feedback in the form of model texts. These challenges ranged from the identification of children’s typing skills to the analysis of data in terms of pause location (e.g., inexactitudes in the automatic coding of children’s pausing by Inputlog), frequency (e.g., interpretation of the number of pauses in terms of writing processes), and duration (e.g., differences in the interpretation of adults’ and children’s long pauses before textual units).
The chapter concludes with recommendations for future KSL research with children, which include, for example, (i) the use of multiple pause thresholds to better uncover children’s low-level and higher-level processes; (ii) the manual reassessment of pauses and text boundaries provided automatically by Inputlog; (iii) the triangulation of KSL data with data elicited through other techniques; and (iv) the analysis of the potential connection between pausing behavior and text quality.

In Chapter 12, Guggenbichler, Eberharter, and Kremmel critically reflect on key methodological issues related to the investigation of L2 writing processes for assessment purposes. Relying on cognitive validity as a measure of how closely the writing processes elicited by a writing test represent those predicted by cognitive models of writing, the authors draw on their experiences from two research projects to discuss the main challenges involved in the combined use of eye-tracking, keystroke logging, and stimulated recall. These challenges included (i) the definition of stopping rules for stimulated recalls to be initiated; (ii) the setting of pause length thresholds as a function of the effects to be analyzed; (iii) the decision on which parts of the writing session should be included for analysis; (iv) the interpretation of stimulated recall data bearing in mind individual differences between writers; and (v) the identification of nuanced differences in writing processes across languages not usually covered by traditional writing models. The chapter concludes with a set of recommendations for future research closely related to those challenges.

Pacheco and Smith (Chapter 13) discuss the methodological challenges involved in the study of emergent bilinguals’ processes (e.g., learning about modal affordances, seeking information, iteratively creating digital texts), products (e.g., PowerPoint presentations), and perceptions (e.g., descriptions of design decisions) within a research project on multilingual and multimodal composing framed in social semiotics and translingual literacy. The authors focus on the problems addressed and the solutions adopted when capturing and analyzing data on (i) the repertoires of semiotic resources available to students both inside and outside the physical boundaries of the classroom; (ii) the intentions and decisions involved in their choices of multiple modalities and languages when composing; and (iii) the trajectories they followed across modalities, which were identified as multimodal timescapes and used for assessment and instructional purposes. In the conclusion, three main avenues for research are offered, namely, triangulating multiple data sources, accounting for the ways contextual elements interact with multimodal composition, and considering writing as an activity where social interactions may shape composing processes.

As a response to the difficulties involved in applying analytical categories developed with adults to younger and less proficient learners, Coyle (Chapter 14) reflects on the challenges she and her colleagues experienced in the development of a process-product coding scheme used in a previous study to analyze EFL children’s engagement with feedback in the form of model texts.
After outlining the socio-cognitive approach underlying the study, the author goes on to describe the analytical procedures followed to triangulate the data collected from the children’s written texts, collaborative dialogue protocols, and written notes. These procedures entailed a number of methodological decisions, such as (i) identifying the language-related episodes (LREs) and problem-solving strategies activated by the children while writing; (ii) selecting appropriate measures to analyze their written output; (iii) identifying the extent of their noticing processes; (iv) tracing each LRE across task stages; and (v) classifying the trajectories identified in terms of their language learning potential. In the conclusion, the author highlights some limitations of these analytical procedures and offers suggestions for future research.

Shintani and Aubrey (Chapter 15) focus on synchronous corrective feedback (SCF), which is mostly conducted in digital environments, and contrast it with traditional, asynchronous feedback (ACF). After a theoretical discussion of the different cognitive processes and sociocultural affordances potentially involved in each feedback modality, the authors provide a detailed reflection on the multiplicity of methodological challenges they had to face in their 2016 study. These challenges included decisions on (i) the operationalization of SCF and ACF in terms of timing, scope, and degree of explicitness; (ii) the target linguistic structure to be addressed; (iii) the design of tasks that might act as appropriate contexts for the structure to be used; (iv) the treatment procedures used; and (v) the coding of learners’ processes and products to analyze the effects of the feedback provided. Finally, the usefulness of SCF is addressed through a number of methodological implications that involve, for example, the exploration of the effectiveness of different SCF modalities or the analysis of teachers’ and students’ perceptions of SCF.

Drawing on Depth of Processing (DoP) as a key variable in the connection between WCF processing and language learning, McBride and Manchón (Chapter 16) provide a critical reflection on the methodological problems they addressed when inspecting the affordances of three feedback processing conditions with a group of undergraduates majoring in linguistics: metacognitive think-aloud protocols (TA), written languaging (WL), and a combination of both (TA + WL). After situating the chapter in a larger research program, the authors discuss the decision-making process involved in the development of a DoP coding scheme which, given the diversity of data collected, led them to distinguish between DoP levels, in the case of TA, and awareness levels, in the case of WL. The subsequent attempt to set up a global coding scheme equally valid for both types of data enabled the authors to realize that it was the combination of both instruments (TA + WL) that was most valid for providing information on learners’ time on task and cognitive effort (DoP).
The chapter closes with a number of methodological conclusions for future studies on WCF processing.

Final comments

The review of research trends and methods undertaken in this chapter allows us to conclude that scholarly interest in writing processes, i.e., the invisible dimension of L2 writing which began to be explored in the late 1980s, has re-emerged with force as a result of a double, interactive move: (i) the reconceptualization of the notion of "process" through the diversification of the theoretical assumptions informing L2 writing research; and (ii) the concomitant expansion of new lines of inquiry and methodological approaches to explore composition processes. The understanding of the writing process construct has evolved from being initially seen only as a set of individual cognitive operations

Chapter 1. The study of L2 writing processes

and online behaviors (writing processes) to being viewed as multidimensional in nature. "Process" in L2 writing is now thought to comprise not only its traditional cognitive dimensions but also (i) the socially situated, time-distributed, and source-based components of literacy practices (text production processes); and (ii) the notion of "feedback processing", broadly understood as the multiplicity of cognitive, affective, and behavioral actions engaged in by L2 writers when addressing the feedback provided on their writing.

As shown in the preceding discussion, this reconceptualization of the writing process construct has been informed by a range of theoretical assumptions that have inspired and framed the shift from individual to socially situated cognition. They include cognitive, sociocognitive, sociocultural, sociomaterial, social semiotic, and translingual literacy approaches.

Closely related to this enlarged understanding of the processing dimension of composing is the development of research instruments and procedures intended to observe and analyze its multiple features. The synthesis provided above has shown that innovations include both (i) the use of traditional techniques (e.g., analysis of written texts, think-alouds, stimulated recalls, interviews) with a new orientation and/or combined with more recent screen-based technologies (e.g., eye-tracking, keyboard logging, and screen capturing) to explore writing processes or feedback processing; and, more importantly, (ii) the recognition of the multi-resourced and collaborative nature of many forms of writing in a variety of learning and academic contexts.
This expansion, in turn, involves new methodological challenges not only in terms of the affordances offered by the different data collection instruments used (covered in contributions to Part II) but also in terms of the reliability and validity issues involved in the procedures employed to analyze the data collected and their triangulation (as discussed in chapters in Part III). Summing up, the book intends to contribute to professional discussions in L2 writing research by documenting how the field has gradually been constructing a multi-dimensional (and more ecologically valid) picture of the “L2 writing process” construct through the parallel development, by way of cross-fertilization, of the research instruments and analytic methodological procedures employed.

Funding

The research synthesis reported on in this chapter, as well as the editing of the book as a whole, is part of a wider research programme on L2 writing financed by the Spanish Ministry of Science and Innovation (Research Grant PID2019-104353GB-100) and the Séneca Foundation (Research Grant 20832/PI/18).


References

Barkaoui, K. (2016). What and when second-language learners revise when responding to timed writing tasks on the computer: The roles of task type, second language proficiency, and keyboarding skills. The Modern Language Journal, 100, 320–340.
Barkaoui, K. (2019). What can L2 writers' pausing behavior tell us about their L2 writing processes? Studies in Second Language Acquisition, 41, 529–554.
Beare, S., & Bourdages, J. (2007). Skilled writers' generating strategies in L1 and L2: An exploratory study. In M. Torrance, L. Van Waes, & D. Galbraith (Eds.), Writing and cognition: Research and applications (pp. 151–161). Elsevier.
Bereiter, C., & Scardamalia, M. (1987). The psychology of written composition. Lawrence Erlbaum Associates.
Bitchener, J. (2019). The interaction between SLA and feedback research. In K. Hyland & F. Hyland (Eds.), Feedback in second language writing: Contexts and issues (pp. 85–105). Cambridge University Press.
Bowles, M., & Gastañaga, K. (2022). Heritage, second and third language learner processing of written corrective feedback: Evidence from think-alouds. Studies in Second Language Learning and Teaching, 12(4), 677–698.
Breuer, E. O. (2019). Fluency in L1 and FL writing: An analysis of planning, essay writing and final revision. In E. Lindgren & K. Sullivan (Eds.), Observing writing: Insights from keystroke logging and handwriting (pp. 190–211). Brill.
Caras, A. (2019). Written corrective feedback in compositions and the role of depth of processing. In R. P. Leow (Ed.), The Routledge handbook of second language research in classroom learning (pp. 188–200). Routledge.
Cerezo, L., Manchón, R. M., & Nicolás-Conesa, F. (2019). What do learners notice while processing written corrective feedback? A look at depth of processing via written languaging. In R. P. Leow (Ed.), The Routledge handbook of second language research in classroom learning (pp. 173–187). Routledge.
Chan, S. (2017). Using keystroke logging to understand writers' processes on a reading-into-writing test. Language Testing in Asia, 7.
Chenoweth, N. A., & Hayes, J. R. (2001). Fluency in writing: Generating text in L1 and L2. Written Communication, 18(1), 80–98.
Chukharev-Hudilainen, E., Saricaoglu, A., Torrance, M., & Feng, H.-H. (2019). Combined deployable keystroke logging and eye tracking for investigating L2 writing fluency. Studies in Second Language Acquisition, 41, 583–604.
Cook, T. D., & Campbell, D. T. (1979). Quasi-experimentation: Design & analysis issues for field settings. Houghton Mifflin.
Coyle, Y., & Roca de Larios, J. (2014). Exploring the role played by error correction and models on children's reported noticing and output production in a L2 writing task. Studies in Second Language Acquisition, 36, 451–485.
Coyle, Y., Cánovas-Guirao, J., & Roca de Larios, J. (2018). Identifying the trajectories of young EFL learners across multi-stage writing and feedback processing tasks with model texts. Journal of Second Language Writing, 42, 25–43.


Coyle, Y., & Roca de Larios, J. (2020). Exploring young learners' engagement with models as a written corrective technique in EFL and CLIL settings. System, 95, 102374.
Criado, R., Garcés, A., & Plonsky, L. (2022). Models as written corrective feedback: Effects on young L2 learners' fluency in digital writing from product and process perspectives. Studies in Second Language Learning and Teaching, 12(4), 699–721.
Cruz, B., Cerezo, L., & Nicolás-Conesa, F. (2022). A classroom-based study on the effects of WCF on accuracy in pen-and-paper versus computer-mediated collaborative writing. Studies in Second Language Learning and Teaching, 12(4), 623–650.
De Silva, R., & Graham, S. (2015). The effects of strategy instruction on writing strategy use for students of different proficiency levels. System, 53, 47–59.
Ferenz, O. (2005). First and second language use during planning processes. In T. Kostouli (Ed.), Writing in context(s): Textual practices and learning processes in sociocultural settings (pp. 185–205). Springer.
Flower, L., & Hayes, J. R. (1981). A cognitive process theory of writing. College Composition and Communication, 32(4), 365–387.
Galbraith, D. (2009). Cognitive models of writing. German as a Foreign Language, 2–3, 7–22.
Galbraith, D., & Vedder, I. (2019). Methodological advances in investigating L2 writing processes: Challenges and perspectives. Studies in Second Language Acquisition, 41, 633–645.
Gánem-Gutiérrez, G. A., & Gilmore, A. (2018). Tracking the real-time evolution of a writing event: Second language writers at different proficiency levels. Language Learning, 68(2), 469–506.
García Hernández, J., Roca de Larios, J., & Coyle, Y. (2017). Reformulation as a problem-solving space for young EFL writers: A longitudinal study of language learning strategies. In M. P. García Mayo (Ed.), Learning foreign languages in primary school: Research insights (pp. 193–222). Multilingual Matters.
Gass, S. (1997). Input, interaction, and the second language learner. Lawrence Erlbaum Associates.
Green, S. (2013). Novice ESL writers: A longitudinal case-study of the situated academic writing processes of three undergraduates in a TESOL context. Journal of English for Academic Purposes, 12(3), 180–191.
Hamel, M.-J., Séror, J., & Dion, C. (2015). Writers in action: Modelling and scaffolding second-language learners' writing process. Higher Education Quality Council of Ontario.
Han, Y., & Hyland, F. (2015). Exploring learner engagement with written corrective feedback in a Chinese tertiary EFL classroom. Journal of Second Language Writing, 30, 31–44.
Hanaoka, O. (2007). Output, noticing, and learning: An investigation into the role of spontaneous attention to form in a four-stage writing task. Language Teaching Research, 11, 459–479.
Hanaoka, O., & Izumi, S. (2012). Noticing and uptake: Addressing pre-articulated covert problems in L2 writing. Journal of Second Language Writing, 21, 332–347.
Hayes, J. R. (1996). A new framework for understanding cognition and affect in writing. In C. M. Levy & S. Ransdell (Eds.), The science of writing (pp. 1–27). Lawrence Erlbaum Associates.


Hayes, J. R. (2012). Evidence from language bursts, revision, and transcription for translation and its relation to other writing processes. In M. Fayol, D. Alamargot, & V. Berninger (Eds.), Translation of thought to written text while composing (pp. 15–25). Psychology Press.
Kellogg, R. (1996). A model of working memory in writing. In M. Levy & S. Ransdell (Eds.), The science of writing: Theories, methods, individual differences, and applications (pp. 57–72). Lawrence Erlbaum Associates.
Kellogg, R. (2001). Competition for working memory among writing processes. American Journal of Psychology, 114, 175–191.
Kim, H. R., & Bowles, M. (2019). How deeply do second language learners process written corrective feedback? Insights gained from think-alouds. TESOL Quarterly, 53(4), 913–938.
Knospe, Y., Sullivan, K., Malmqvist, A., & Valfridsson, I. (2019). Observing writing and website browsing: Swedish students write L3 German. In E. Lindgren & K. Sullivan (Eds.), Observing writing: Insights from keystroke logging and handwriting (pp. 258–284). Brill.
Lei, X. (2008). Exploring a sociocultural approach to writing strategy research: Mediated actions in writing activities. Journal of Second Language Writing, 17(4), 217–236.
Leijten, M., Van Waes, L., Schriver, K., & Hayes, J. R. (2014). Writing in the workplace: Constructing documents using multiple digital sources. Journal of Writing Research, 5(3), 285–337.
Leijten, M., Van Waes, L., Schrijver, I., Bernolet, S., & Vangehuchten, L. (2019). Mapping master's students' use of external sources in source-based writing in L1 and L2. Studies in Second Language Acquisition, 41, 555–582.
Leow, R. P. (2015). Explicit learning in the L2 classroom: A student-centered approach. Routledge.
Leow, R. P. (2020). L2 writing-to-learn: Theory, research, and a curricular approach. In R. M. Manchón (Ed.), Writing and language learning: Advancing research agendas. John Benjamins.
Leow, R., & Suh, B.-R. (2021). Theoretical perspectives on writing, written corrective feedback, and language learning in individual writing conditions. In R. M. Manchón & C. Polio (Eds.), Handbook of SLA and writing (pp. 9–21). Routledge.
Leow, R., & Manchón, R. M. (2021). Expanding research agendas: Directions for future research agendas on writing, WCF, language learning, and ISLA. In R. M. Manchón & C. Polio (Eds.), Handbook of SLA and writing. Routledge.
Leow, R. P., Thinglum, A., & Leow, S. A. (2022). WCF processing in the L2 curriculum: A look at type of WCF, type of linguistic item, and L2 performance. Studies in Second Language Learning and Teaching, 12(4), 653–675.
López-Serrano, S., Roca de Larios, J., & Manchón, R. M. (2019). Language reflection fostered by individual L2 writing tasks: Developing a theoretically-motivated and empirically-based coding system. Studies in Second Language Acquisition, 41, 503–527.
López-Serrano, S., Roca de Larios, J., & Manchón, R. M. (2020). Processing output during L2 individual writing tasks: An exploration of depth of processing and the effects of proficiency. In R. M. Manchón (Ed.), Writing and language learning: Advancing research agendas (pp. 231–253). John Benjamins.


Manchón, R. M. (2018). Past and future research agendas on writing strategies: Conceptualizations, inquiry methods, and research findings. Studies in Second Language Learning and Teaching, 8(2), 247–267.
Manchón, R. M. (2020a). Writing and language learning: Looking back and moving forward. In R. M. Manchón (Ed.), Writing and language learning: Advancing research agendas (pp. 3–26). John Benjamins.
Manchón, R. M. (2020b). The language learning potential of L2 writing: Moving forward in theory and research. In R. M. Manchón (Ed.), Writing and language learning: Advancing research agendas (pp. 405–426). John Benjamins.
Manchón, R. M. (2021). The contribution of ethnographically-oriented approaches to the study of writing processes. In I. Guillén-Galve & A. Bocanegra-Valle (Eds.), Ethnographies of academic writing research: Theory, methods, and interpretation (pp. 83–103). John Benjamins.
Manchón, R. M. (2023). The psycholinguistics of second language writing. In A. Godfroid & H. Hopp (Eds.), The Routledge handbook of second language acquisition and psycholinguistics (pp. 400–412). Routledge.
Manchón, R. M., & Leow, R. P. (2020). An ISLA perspective on L2 learning through writing: Implications for future research agendas. In R. M. Manchón (Ed.), Writing and language learning: Advancing research agendas (pp. 335–355). John Benjamins.
Manchón, R. M., Nicolás-Conesa, F., Cerezo, L., & Criado, R. (2020). L2 writers' processing of written corrective feedback: Depth of processing via written languaging. In W. Suzuki & N. Storch (Eds.), Languaging in language learning and teaching (pp. 241–265). John Benjamins.
Manchón, R. M., & Roca de Larios, J. (2007). On the temporal nature of planning in L1 and L2 composing. Language Learning, 57, 549–593.
Manchón, R. M., Roca de Larios, J., & Murphy, L. (2007). A review of writing strategies: Focus on conceptualizations and impact of the first language. In A. Cohen & E. Macaro (Eds.), Language learner strategies: Thirty years of research and practice (pp. 229–250). Oxford University Press.
Manchón, R. M., Roca de Larios, J., & Murphy, L. (2009). The temporal dimension and problem-solving nature of foreign language composing processes: Implications for theory. In R. M. Manchón (Ed.), Writing in foreign language contexts: Learning, teaching and research (pp. 102–124). Multilingual Matters.
Manchón, R. M., & Vasylets, O. (2019). Language learning through writing: Theoretical perspectives and empirical evidence. In J. B. Schwieter & A. Benati (Eds.), The Cambridge handbook of language learning (pp. 341–362). Cambridge University Press.
Manchón, R. M., & Williams, J. (2016). L2 writing and SLA studies. In R. M. Manchón & P. K. Matsuda (Eds.), The handbook of second and foreign language writing (pp. 567–586). De Gruyter Mouton.
Michel, M., Révész, A., Lu, X., Kourtali, N. E., Lee, M., & Borges, L. (2020). Investigating L2 writing processes across independent and integrated tasks: A mixed-methods study. Second Language Research, 36(3), 277–304.
Michel, M., Stiefenhöfer, L., Verspoor, M., & Manchón, R. M. (2021). L2 writing processes and language learning in individual and collaborative conditions. In R. M. Manchón & C. Polio (Eds.), Handbook of SLA and writing (pp. 67–80). Routledge.


Murphy, L., & Roca de Larios, J. (2010). Searching for words: One strategic use of the mother tongue by advanced Spanish EFL writers. Journal of Second Language Writing, 19(2), 61–81.
Park, E. S., & Kim, O. Y. (2019). Learners' use of indirect written corrective feedback: Depth of processing and self-correction. In R. P. Leow (Ed.), The Routledge handbook of second language research in classroom learning (pp. 214–228). Routledge.
Park, K., & Kinginger, C. (2010). Writing/thinking in real time: Digital video and corpus query analysis. Language Learning and Technology, 14(3), 31–50.
Plakans, L. (2008). Comparing composing in writing-only and reading-to-write test tasks. Assessing Writing, 13, 111–129.
Polio, C. (2012). The relevance of second language acquisition theory to the written error correction debate. Journal of Second Language Writing, 21, 375–389.
Prior, P. (2015). Writing, literate activity, semiotic remediation: A sociocultural approach. In G. Cislaru (Ed.), Writing(s) at the crossroads: The process-product interface (pp. 185–201). John Benjamins.
Révész, A., Kourtali, N. E., & Mazgutova, D. (2016). Effects of task complexity on L2 writing behaviors and linguistic complexity. Language Learning, 67, 208–241.
Révész, A., Michel, M., & Lee, M. J. (2017). Investigating IELTS Academic Writing Task 2: Relationships between cognitive writing processes, text quality, and working memory. IELTS Research Report. The British Council.
Révész, A., Michel, M., & Lee, M. (2019). Exploring second language writers' pausing and revision behaviors: A mixed-methods study. Studies in Second Language Acquisition, 41, 605–631.
Révész, A., Michel, M., & Lee, M. (2023). An exploration of the relationship of working memory to pausing and revision behaviors at different stages of writing. Studies in Second Language Acquisition.
Roca de Larios, J., & Coyle, Y. (2021). Written corrective processing in individual and collaborative writing conditions. In R. M. Manchón & C. Polio (Eds.), Handbook of SLA and writing (pp. 81–93). Routledge.
Roca de Larios, J., García Hernández, J., & Coyle, Y. (2021). A theoretically-grounded classification of EFL children's formulation strategies in collaborative writing. Language Teaching for Young Learners, 3(2), 330–336.
Roca de Larios, J., Marín, J., & Murphy, L. (2001). A temporal analysis of formulation processes in L1 and L2 writing. Language Learning, 51, 497–538.
Roca de Larios, J., Manchón, R., Murphy, L., & Marín, J. (2008). The foreign language writer's strategic behavior in the allocation of time to writing processes. Journal of Second Language Writing, 17, 30–47.
Roca de Larios, J., Nicolás-Conesa, F., & Coyle, Y. (2016). Focus on writers: Processes and strategies. In R. M. Manchón & P. K. Matsuda (Eds.), Handbook of second and foreign language writing (pp. 267–286). De Gruyter Mouton.
Sachs, R., & Polio, C. (2007). Learners' uses of two types of written feedback on an L2 writing revision task. Studies in Second Language Acquisition, 29, 67–100.
Séror, J. (2013). Screen capture technology: A digital window into students' writing processes. Canadian Journal of Learning and Technology, 39(3), 1–16.


Smith, B. E., Pacheco, M. B., & De Almeida, C. R. (2017). Multimodal codemeshing: Bilingual adolescents' processes composing across modes and languages. Journal of Second Language Writing, 36, 6–22.
Stevenson, M., Schoonen, R., & de Glopper, K. (2006). Revising in two languages: A multi-dimensional comparison of online writing revisions in L1 and FL. Journal of Second Language Writing, 15, 201–233.
Stiefenhöfer, L., & Michel, M. (2020). Investigating the relationship between peer interaction and writing processes in computer-supported collaborative L2 writing: A mixed-methods study. In R. M. Manchón (Ed.), Writing and language learning: Advancing research agendas (pp. 255–279). John Benjamins.
Suzuki, W. (2017). The effect of quality of written languaging on second language learning. Writing & Pedagogy, 8(3), 461–482.
Tillema, M. (2012). Writing in first and second language: Empirical studies on text quality and writing processes (Doctoral dissertation). Utrecht University. LOT Publications.
Tiryakioglu, G., Peters, E., & Verschaffel, L. (2019). The effect of L2 proficiency level on composing processes of EFL learners: Data from keystroke loggings, think alouds and questionnaires. In E. Lindgren & K. Sullivan (Eds.), Observing writing: Insights from keystroke logging and handwriting (pp. 212–235). Brill.
Torres, J. (2023). Exploring working memory and language dominance in heritage bilinguals' writing behaviors. Studies in Second Language Acquisition.
Uscinski, I. (2017). L2 learners' engagement with direct written corrective feedback in first-year composition courses. Journal of Response to Writing, 3(2), 36–62. https://scholarsarchive.byu.edu/journalrw/vol3/iss2/3
Van Weijen, D., van den Bergh, H., Rijlaarsdam, G., & Sanders, T. (2009). L1 use during L2 writing: An empirical study of a complex phenomenon. Journal of Second Language Writing, 18, 235–250.
Wang, W., & Wen, Q. (2002). L1 use in the L2 composing process: An exploratory study of 16 Chinese EFL writers. Journal of Second Language Writing, 11, 225–246.
Williams, J. (2012). The potential role(s) of writing in second language development. Journal of Second Language Writing, 21, 321–331.
Xu, C., & Qi, Y. (2017). Analyzing pauses in computer-assisted EFL writing: A computer keystroke-log perspective. Educational Technology & Society, 20(4), 24–34.
Yasuda, S. (2005). Different activities in the same task: An Activity Theory approach to ESL students' writing process. JALT Journal, 27(2), 139–168.
Zalbidea, J. (2020). A mixed-methods approach to exploring the L2 learning potential of writing versus speaking. In R. M. Manchón (Ed.), Writing and language learning: Advancing research agendas (pp. 207–230). John Benjamins.


Part I

Investigating writing processes: The overall picture

Chapter 2

Writing process studies. Struggling with complexities
Looking back, moving forward

Gert Rijlaarsdam,¹,² Elke Van Steendam³ & Daphne van Weijen¹
¹ University of Amsterdam | ² Norwegian University of Science and Technology | ³ KU Leuven

This chapter discusses validity parameters for studies on writing processes in a second or foreign language (L2). To that end, Cook and Campbell's validity framework, which discerns four types of validity, i.e., statistical, internal, construct and external validity, has been used. The chapter especially homes in on construct validity by combining a case-based approach, based on a selection of frequently cited L2-writing process studies, with a comprehensive causal model often used for analyzing writing process studies (in terms of four components: Process – Task – Learner – Output). We suggest seven functional and directional parameters to discuss the construct of the writing process as intended against the construct as studied, and propose a selective and non-exhaustive list of guidelines for statistical and internal validity. Both parameters and guidelines are provided to inform (the design of) future L2 writing process studies.

https://doi.org/10.1075/rmal.5.02rij © 2023 John Benjamins Publishing Company

Introduction

In this chapter we discuss conceptual and methodological issues in L2-writing process research from a non-comprehensive perspective. Instead of analysing the almost 800 studies that resulted from our initial search of studies on writing processes from 1980 until July 2021, we opted for a case study approach based on a selection of the most frequently cited L2-writing process studies (exemplars). We analysed each of the selected studies as a 'case' and annotated them for conceptual and methodological issues. To critically examine the selected studies, we applied Cook and Campbell's (1979) validity framework, as it is one of the seminal and most complete methodological frameworks for reviewing research in terms of different types of validity. The


authors listed four key questions to evaluate studies that investigate causal relations between two variables (X causes Y; see Figure 1):

1. Statistical validity: Is there indeed a relation between the two variables under study?
2. Internal validity: Is the relation causal?
3. Construct validity: What constructs are represented as cause and effect?
4. External validity: How generalizable is this relation across persons, settings, and times?

For the purpose of this chapter, construct validity is key. To evaluate the progress of L2 writing research in this respect, we first establish the descriptors of the construct 'Writing Process' in the section "L2 writing processes: The construct as intended". We then discuss the construct as it was studied in the selected studies in the section entitled "Writing processes: The construct as studied". Given the emphasis of the volume and the restrictions on length, the other types of validity (i.e., statistical, internal, and external validity) remain largely outside the scope of this chapter. However, as construct validity depends to a large extent on statistical and internal validity, we extracted fallacies from the eight studies reviewed (cf. Table 1) and compiled a list of guidelines for these two types of validity that need to be considered in process studies, in addition to the guidelines presented for construct validity (see Appendix A). External validity has not been explicitly addressed either, since none of the selected studies paid attention to this issue; we have assumed, however, that if the guidelines for internal, statistical, and construct validity are taken into account in future studies, the generalizability of results across participants and contexts may be increased. In the last part of the chapter, we link our discussion to the critical review conducted by Roca de Larios et al. (2002) as a baseline: looking back and forward.

Research in L2 writing processes: Selection of exemplar studies and overall framework

To select the most frequently cited studies on L2 writing processes, we ran the following search in Scopus: TITLE-ABS-KEY("Writing Process") AND (L2 OR Second Language), with no restrictions on year or type of publication. This step resulted in 788 hits, after which we filtered the output in Excel and selected empirical studies, qualitative or quantitative, in four periods: pre-1990, 1990–2000, 2000–2010, and 2010 to the present. Next, we sorted them by citations received (most to least), removed book chapters, checked their process focus, and then selected one to three studies per decade. Table 1 presents the selected studies and their main characteristics.
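The filtering and sorting steps described above can be expressed as a small program. The following is an illustrative sketch only: the record fields and function name are hypothetical, not the authors' actual Scopus/Excel workflow.

```python
# Hypothetical sketch of the study-selection pipeline described above:
# bin empirical, non-chapter studies into four periods, sort each
# period's pool by citation count (most to least), and keep at most
# a few exemplars per period. All field names are invented.

PERIODS = [(0, 1990), (1990, 2000), (2000, 2010), (2010, 9999)]

def select_exemplars(records, max_per_period=3):
    """records: iterable of dicts with 'year', 'citations',
    'empirical', and 'book_chapter' keys."""
    exemplars = []
    for start, end in PERIODS:
        pool = [r for r in records
                if start <= r["year"] < end
                and r["empirical"]
                and not r["book_chapter"]]
        # most-cited first, as in the selection procedure
        pool.sort(key=lambda r: r["citations"], reverse=True)
        exemplars.extend(pool[:max_per_period])
    return exemplars
```

The point of the sketch is that the selection is reproducible: given the same bibliographic export, the same exemplars fall out of the filter.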


Table 1. Features of eight selected studies for case analysis

For each of the eight selected studies, Table 1 lists the reference, year of the paper selected, citation counts (Scopus and Google Scholar), the process methodology, the task, the learner variables, and the output measures. Abbreviations: TA: think aloud; SR: stimulated recall; SRV: stimulated recall based on process video; KL: keystroke logging; TA//KL: TA and KL simultaneous. Task: NAR: narrative; ARG: argumentative; DES: descriptive; DES MEM: descriptive (memory dump); SUM: summary; CS: content support.


After having selected the case studies, our next step was to identify their main characteristics by applying a variant of a comprehensive model intended to represent the four components considered to be inherent to process studies: Process – Task – Learner – Output (Figure 1). In essence, this is a causal model in which the task component serves as an independent variable that can be manipulated to study its effects on processes (see Table 1, column "Task"). The table shows that the language of composition is a key task variable in four of the eight studies selected (studies 2, 5, 6, 7), while other task variables, such as genre or communicative text function (studies 1, 4) or the addition of planning support (Study 8), have also been analyzed. Differences between task conditions may result in differences in output, directly and/or indirectly, via differences in processes.

Learner variables may also directly or indirectly affect writing processes through the writing task. In the selected studies, such variables are level of writing proficiency (L1 or L2; studies 1, 3), language proficiency (L2; studies 1, 3, 4, 5), and degree of bilingualism (Study 6) (see Table 1). While there seems to be a direct relationship between learner variables and processes in some cases (studies 3, 5), in other cases these variables seem to play a moderating role in the relationship between task conditions and processes (studies 1, 4). In other studies, not included in Table 1, differences between processes have been accounted for through other learner variables such as the writer's attitude towards the L2, or their cognitive resources (working memory) and motivation (see, for example, Kormos, 2012).

Output variables in some of the selected studies (studies 1, 2, 3, 6, 8) correspond to text quality or a specific aspect of the resulting text, e.g., linguistic features.
In other studies, results of processes are not included, but constructs of the process itself are in focus as dependent variables, e.g., fluency (speed, Study 7), L1-use during L2-writing (Study 4), and p-burst length or productivity (Study 5). A final characteristic of the studies included in Table 1 is that they mainly fall into the domain of 'learning-to-write' in that they are intended to investigate writers' acquisition of writing skills and assume that differences in processes correlate with differences in text quality.

This chapter focuses on the model's Process component (see Figure 1). More specifically, the aim is to gain insight into what happens during writing processes and to investigate if variations in processes occur because of specific task or learner variables. To establish such a causal relation between two components in the model, i.e., the cause (a task, an intervention, a learner characteristic) on the one hand, and an effect (a process characteristic) on the other, the researcher must operationally define and measure both components validly and reliably. This is a matter of construct validity. In the case of writing process studies, this means that researchers need to ensure that the construct-as-measured accurately represents the construct-as-intended, a key distinction that we elaborate on more fully in what follows.

Figure 1. Writing processes, and their relations to output (correlational) and input (causal) variables¹

L2 writing processes: The construct as intended

To discuss the constructs used in the selected studies, we must first define the main features of the "construct-as-intended". To that end, we will introduce two sets of parameters of the construct 'Writing Process': Parameter 1, which deals with the basic structural architecture – the building blocks of any writing process – and Parameter 2, which addresses the functional architecture of these blocks from the perspective of goals.

Basic parameter 1. Structure: Activity and time

A process is defined by two features: mental and performative actions, and progression over time (Rijlaarsdam & Van den Bergh, 1996). Seen in this light, a writing process is taken to consist of more than one action, since various cognitive, affective, physical, and metacognitive activities are at play in the act of composing, as major writing process models have shown. Each activity can take place at a certain moment in the course of an entire process and be followed by the same or another activity. Although these activities can, in theory, be combined into any (type of) sequence, writers, in practice, build 'blocks', i.e., basic or default sequences of actions, which may develop over time through practice, maturation, and instruction. These default blocks may be interrupted by another activity or a string of activities (Van den Bergh et al., 2016).

Figure 2 illustrates the initial situation at the start of the writing process, when various activities are available to be called on. When one activity is activated (e.g., Activity A), it may 'awake' other activities, such as Activity B or C, which usually follow Activity A. This is what we know from think-aloud protocols, when we hear a writer produce a string of words (generating: Activity A), after which she transcribes that string (self-dictation: Activity B), ending it with 'but' as a springboard for generating new ideas. In order to do that, she feels she has to reread the words of the produced string (Activity C). This rereading feeds the monitor that regulates the process. The options then are to either generate (new) ideas or to revise part of the already written text to create a better connection with what follows.

In this sense, a writing process is a string of probabilities, not very different from smart algorithms. What the brain does is fixate frequent connections between activities. If this were not the case, a writing process would be far too costly, cognitively speaking. Therefore, we must rely on routines and patterns whenever possible. If a routine fails at a certain moment, strategic knowledge will be applied, via the monitor, as shown in Figure 2.

1. Note that the same model may apply to studies that test the effects of interventions or instructional sequences. In that case, the component Task Variables becomes the independent variable 'Intervention'. The intervention – instead of Task – causes effects on processes, possibly moderated by learner variables, and results in variations in text quality via the writing process.
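The idea that a writing process is a string of probabilities can be made concrete with a toy first-order Markov chain over activities. This is purely our illustrative sketch, not a model proposed in the chapter: the activity names echo the generating/transcribing/rereading example above, but the transition probabilities are invented for illustration.

```python
import random

# Hypothetical transition probabilities between activities; the values are
# invented purely to illustrate the "string of probabilities" idea.
TRANSITIONS = {
    "Generate":   [("Transcribe", 0.7), ("Reread", 0.2), ("Generate", 0.1)],
    "Transcribe": [("Reread", 0.5), ("Generate", 0.4), ("Transcribe", 0.1)],
    "Reread":     [("Generate", 0.6), ("Revise", 0.4)],
    "Revise":     [("Generate", 0.8), ("Reread", 0.2)],
}

def sample_process(start="Generate", steps=8, rng=random):
    """Sample one writing-process string as a first-order Markov chain."""
    seq = [start]
    for _ in range(steps - 1):
        acts, weights = zip(*TRANSITIONS[seq[-1]])
        seq.append(rng.choices(acts, weights=weights, k=1)[0])
    return seq
```

Frequent transitions (high weights) correspond to the 'default blocks' described above; lowering a weight simulates a routine that is less entrenched and more often interrupted.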
Figure 2. Dynamics of composing: A probabilistic model of writing processes. Adapted from Rijlaarsdam & Van den Bergh, 1996, p. 108

Activities may form strings of interdependent activities, in the same way as subsequent paragraphs form a coherent unit, a text; sentences form a coherent unit, a paragraph; and words form a coherent message, a sentence. An interesting example in this respect is the study conducted by López-Serrano et al. (2019), in which the authors cluster process segments from think-aloud protocols into strategy steps, that is, goal-oriented problem-solving actions.

In principle, three forms of goal-directed strings or building blocks can occur in the writing process: serial, nested, and a-linear strings. Serial strings of activities are coordinated or juxtaposed sequences of activities: Activity N triggers N + 1, which triggers N + 2, etc. This process of string-building stops when the intermediate goal of the process is reached, after which a new series starts. Process activities can also form nested, hierarchical patterns, which are subordinately related in a recursive fashion: Activity N triggers a serial string (N + 1, N + 2), and then the string formation is interrupted. It is not N + 3 that follows, but N + 2.1, because N + 2 triggered an embedded string of activities, such as N + 2.1, N + 2.2, etc., until the goal of the string N + 1–N + 2 is reached and the writer can move forward. The third option is an a-linear or non-progressive pattern, in which the writer moves backwards from the point of inscription to N − x to repair – delete, add, insert, move – something because of new insights leading to the improvement of the text's coherence.

Temporal development has only two directions, forwards and backwards, and two modes, continuous and discontinuous, or start and stop. A stop signals that the writer has options and can adjust how to proceed. After a stop, the options are a serial, nested or backward (i.e., a-linear) pattern of activities, as explained above. A stop can signal both a move back and a move forward in the text. The fewer stops in a process, the fewer options to switch and change direction, and the more continuous and 'monotonous' the process. Backward actions can be seen as temporary interruptions of composing at the service of the goal-directed forward process (see De Beaugrande's [1984] distinction between the look-back and the look-ahead principles of linearization). For example, when one rereads and reviews a sentence that has just been written (backward movement), this may trigger either the generation of a new idea and string of words (forward movement) or the editing of the sentence just written (backward movement).

Both actions may take place inside as well as outside the writer, that is, internally and externally. A writer can repair a formulation mentally, which is thinking: internal and backward. Or she can reread aloud a written fragment, which is performing, based on external input, and backward, and then generate a new idea, which is thinking, forward; or, if she writes the idea down, it is performing, that is, generating externally. If we define just two basic activities of the writing process, such as Generating and Reviewing, there are already twelve possible combinations of adjacent pairs, depending on their domain, i.e., whether the activities occur internally (thinking) or externally (performing), and their direction, i.e., whether they involve moving backwards or forwards. Model 3 shows the two driving forces in a writing process, Generating and Reviewing, and all possible cross-domain and cross-direction interactions of forward and backward movements and internal and external activities operating in two 'worlds', internally and externally (Kim, 2014). Processes generate objects (internally or externally; ideas or text) and reprocess or review these objects, mentally or physically. The probability of occurrence of these twelve adjacent pairs may differ in general and may vary with writers' L1 writing proficiency, L2 proficiency, topic knowledge, etc., and with tasks.
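Where the figure of twelve comes from can be checked in a few lines of code, under the assumption (consistent with the parameters above) that Generating is always forward and Reviewing always backward, so that an activity token is fully specified by activity and domain, and that an adjacent pair links two distinct tokens:

```python
from itertools import permutations

activities = ("Generating", "Reviewing")   # forward vs. backward driving forces
domains = ("internal", "external")         # thinking vs. performing

# Each token in the process occupies one of four activity-domain states.
states = [(a, d) for a in activities for d in domains]

# Ordered pairs of distinct states: the possible adjacent pairs in a process.
adjacent_pairs = list(permutations(states, 2))
print(len(adjacent_pairs))  # 12
```

Four states yield 4 × 3 = 12 ordered pairs of distinct states, matching the twelve interactions in Kim's (2014) model.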

Figure 3. Model 3. Dynamic interactions between generating & reviewing; back & forth movements (adapted with permission from Kim, 2014)

As for the studies selected, these basic strings of actions can only be found in Study 5 (see Table 1), where a distinction is made between P- and R-bursts, representing forward and backward movements respectively (see the next section for a thorough description). Most of the other studies in the table provide lists of cognitive activities without classifying them in terms of direction (forward, backward) or domain (thinking, performing).

Basic parameter 2. Goal: Understanding and being understood

The interaction between Generating and Reviewing is driven by two innate, deep-seated driving forces of the writing process. One is the drive to create meaning, the creative impulse to understand the issue at stake. The other is the rhetorical drive to be understood by one's readers. The essence of a verbal act of writing is that it should ideally involve two types of output that lead to a change in topic understanding and disposition in both the writer and the reader. This is what the dual writing process model (Galbraith & Baaijen, 2018) describes so well. It accounts for two routes of content generation, which occur in combination in most cases. First, there is what Galbraith and Baaijen label "synthetic content planning", an interaction between the act of verbal production and the activation of neural networks, mostly at the point of inscription. Second, there is "rhetorical planning", in which there is a goal-directed search in the writer's memory to adapt the content to the reader. In some early Flower and Hayes papers, this creative, uncertain meaning-making path, combined with the need to communicate, was already outlined in their discussion of think-aloud protocols of teenage writers (see, for example, Flower & Hayes, 1980).

These two drives produce three kinds of 'products' or 'outcomes' in writing (see Model 1, Figure 1):

1. A mental product, which may involve various states of transformation and sustainability. It is a generally accepted distinction that inexperienced writers often have a reproductive orientation toward content (knowledge telling, memory dump) and that more experienced writers tend to transform it (knowledge transforming; Scardamalia & Bereiter, 1987). Galbraith and Baaijen (2018) nuance this distinction, suggesting that associative content retrieval via synthetic planning may also prompt a mental transformation – understanding something new, or connecting knowledge elements in 'a creative experience of insight' – which may range from instantaneous to durable.
2. A physical product, a text in all kinds of forms, i.e., on paper, digital, or multimodal.
3. A meta-product, i.e., the experience of having been engaged in a process of thinking and text production guided by strategic considerations. There will be a trace at least, a residue so to speak, of the writing experience; for example, about how it went, how (in)effective and (in)efficient it was, how enjoyable, etc. This product may be stored in long-term memory to be used by the monitor in other writing processes experienced as similar, and lead to new or differentiated 'building blocks'.

Which of the three outcomes overrules the others will depend on the writer's task representation: the goals they set themselves when representing the given external task and how they adjust these goals over the course of their writing processes. Even when a well-formulated task description is provided to students, such as an academic writing task, it is the writer's individually constructed representation of that description that governs the process. This representation functions as what Van den Broek and Helder call a 'standard of coherence' in text comprehension processes, i.e., "the (often implicit) criteria that a reader has for what constitutes adequate comprehension and coherence in a particular reading situation" (Van den Broek & Helder, 2017, p. 364). This may also hold for writing. The writer decides whether the text as written aligns with her standard of comprehension of the issue at stake – 'now I really understand how volcanoes work' – and/or with the rhetorical standard she has set herself – 'now, through this text, they will understand what I mean'.

The way the writer meets the standard set is a product of two interacting forces. One force is generating ideas and text – moving forward – which is basically an associative process made up of a serial string of actions (see above). These associative processes can vary along the remoteness dimension of connected elements, that is, from a memory dump of elements easily and quickly retrieved to the retrieval of elements that are remote. The other force is reprocessing – working on the objects already generated, both mental ideas and text. Reprocessing transforms generated elements and may vary according to the level of abstraction used, i.e., from almost no abstraction (pure association) to a high degree of abstraction (Van de Kamp et al., 2016, p. 545).
Figure 4 illustrates two possible routes to reach a writing goal. In both routes, generating and reprocessing are the forces at play, but they interact differently. One route leans heavily on generating in the beginning, via free and then more remote associations, through a relatively fast, forward-moving production process mostly consisting of serially organized strings of actions. The other route leans more on reprocessing, combining, and abstracting. This might be a slower process initially, with less continuous text production and more nested strings of actions, which gradually accelerates as the process unfolds.


Figure 4. The two forces that create the route to the process goal

Conclusion: The construct of writing Process-as-Intended in seven parameters

In this section, we have discussed those parameters that determine the construct 'Writing Process' as parsimoniously as possible. We started with the basic features of a writing process, activities and time, and then moved to the aims. When we now summarize the parameters of the construct, moving from the aim to smaller units, five main functional parameters and two directional ones may be identified.

1. A writing process's primary aim is to understand and/or be understood (functional).
2. Therefore, it is a forward-driven process, monitored by an internal representation of a standard of coherence (directional).
3. Two forces, generating (forward) and reprocessing (backward), interact to create the resultant outcome (directional).
4. Generating and reprocessing work on both internal (thoughts) and external (text) objects (functional).
5. A parsimonious, limited number of basic activities constitute generating and reprocessing (functional).
6. These activities form goal-directed strings, in serial, nested or a-linear patterns (functional).
7. These strings form a writer's individual repertoire of building blocks, which are subject to change via practice and instruction (functional).

When all these elements are brought together, the following definition of the “writing process” is suggested:


A writing process is characterized by the interaction between thinking and performing activities, both of which can be generative (forward) or involve reprocessing (backward), and which together form functional strings, organized over time in serial and/or nested patterns, as part of a goal-directed, synthetic meaning-making and problem-solving rhetorical process.
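The definition can be restated as a minimal data model. The type and field names below are ours, not the chapter's; the sketch simply encodes the parameters: activities with a domain (thinking/performing) and a direction (forward/backward), grouped into goal-directed strings with a serial, nested, or a-linear pattern.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List

class Domain(Enum):
    INTERNAL = "thinking"      # mental actions
    EXTERNAL = "performing"    # performative actions

class Direction(Enum):
    FORWARD = "generative"     # generating ideas or text
    BACKWARD = "reprocessing"  # reviewing, revising, rereading

@dataclass
class Activity:
    name: str
    domain: Domain
    direction: Direction

@dataclass
class ActivityString:
    """A goal-directed string of activities (serial, nested, or a-linear)."""
    pattern: str
    goal: str
    activities: List[Activity] = field(default_factory=list)
```

A writer's individual repertoire of building blocks could then be represented as a collection of such ActivityString values, subject to change via practice and instruction.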

Writing processes: The construct as studied

Once the parameters of the writing process-as-a-construct have been defined, we can evaluate the selected studies in terms of their construct validity. First, we will home in on the unit of observation (cf. Activity) in each selected study. We will then discuss the studies in terms of the two basic parameters for writing process construct definition, that is, their direction and domain, and their function in terms of understanding and being understood.

Unit of observation

Process analysis requires a valid definition of the units of observation (strings of activities) and of the functional linking of these units. Merely summing the number of occurrences of a certain type of activity disregards the fact that these activities may have distinct functions during a writing process: moving forward in the process implies that the context of actions changes.

Most of the selected studies (see Table 1) focus on the smallest units of observation, as depicted in Figure 2 above. The unit may be defined as an activity (studies 3 and 4), a mental representation, that is, an object of attention (Study 2), a combination of activities and objects (Study 1), a fluency indicator (Study 7), or a specific type of behaviour, such as formulation behaviour (Study 5), revision behaviour (Study 6), and writing behaviour (Study 8). The selected studies, however, do not combine the units of observation into functional strings of actions (means-goal relations). Yet, two studies do place the activities in a certain context. For example, Study 7 uses (an indication of) the context (or moment in the writing process) as a proxy indicator of function. More precisely, the writing process is split up into ten intervals of equal length. The study shows (p. 88) that fluency for this specific task, a memory dump task, decreases steadily from interval 1 to 10, in L1 as well as in L2. From other L1 writing studies that used time during the process as a proximate indicator of context, and therefore function, we know that process activities indeed behave differently in conjunction with other activities (Van den Bergh et al., 1999, for generation) and that the function of activities may change over the course of the process (Breetvelt et al., 1996).

A productive way forward is presented in Study 5, which investigated a phenomenon that was new to writing process research at the time: the 'burst'. A burst is a unit of formulation activity which occurs after a pause; its size is usually indicated by the number of words produced, or by the duration or number of words produced per unit of time (seconds, minutes). In Study 5, a further functional distinction was made according to whether such a unit was followed by a pause (P-burst) or a revision (R-burst). A P-burst is taken to indicate a minimal string of two formulation activities, two forward-driven actions, in the internal (proposed formulation) or external (written formulation) mode. An R-burst represents a minimal string of a tentative forward action – the formulation itself – and a reprocessing action that totally or partly repairs the formulation, regardless of whether it was only formulated mentally or also physically transcribed.

The burst as a starting point to define 'units' could be a productive way forward if the unit is defined as (a) part of a larger unit of observation or (b) placed on a time axis, since two writing processes with similar numbers of P- and R-bursts may well differ in the way these activities are interspersed. On the other hand, the definition of both types of bursts, which underlines that process actions depend on each other, has inevitable consequences for statistical analyses. If an activity is defined as an R-burst, the number of revisions is directly related to the number of R-bursts: this dependency implies that these categories cannot be tested statistically as independent observations, which decreases statistical power.
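The P-/R-burst distinction can be sketched as a small segmentation routine. The event format and field names below are our own assumptions for illustration (the two-second pause cutoff follows the burst definition discussed here); real keylogging or protocol data are, of course, much richer.

```python
from dataclasses import dataclass

PAUSE_THRESHOLD = 2.0  # seconds; the conventional cutoff in burst research

@dataclass
class Event:
    kind: str            # "text" (words produced), "pause", or "revision"
    words: int = 0
    duration: float = 0.0

def segment_bursts(events):
    """Split a stream of production events into P- and R-bursts.

    A burst accumulates produced words until it is terminated either by a
    pause of at least PAUSE_THRESHOLD seconds (-> P-burst) or by a revision
    (-> R-burst).
    """
    bursts, words = [], 0
    for ev in events:
        if ev.kind == "text":
            words += ev.words
        elif ev.kind == "pause" and ev.duration >= PAUSE_THRESHOLD and words:
            bursts.append(("P", words)); words = 0
        elif ev.kind == "revision" and words:
            bursts.append(("R", words)); words = 0
    if words:
        bursts.append(("P", words))  # trailing material; a coding choice
    return bursts
```

For example, `[Event("text", 5), Event("pause", duration=2.5), Event("text", 3), Event("revision"), Event("text", 4)]` yields `[("P", 5), ("R", 3), ("P", 4)]`. Note how the R-burst count and the revision count are one and the same quantity here, which is exactly the statistical dependency discussed above.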

Direction and domain

The selected studies do not explicitly label activities or group them as forward or backward moves in writing processes. Study 6, however, presents a noteworthy classification of revisions. Following Lindgren's dissertation study (Lindgren, 2004, 2005), the authors identified revisions at the 'point of inscription', which may be regarded as a specific type of R-burst: after the production burst, the writer reviews the last element (a mental or written word, in most instances). These revisions proved to be quite frequent (17 times per 100 words in L1, and more frequent still in L2), served the progression of the text, balanced "on the cusp of internal and external revisions" (p. 206), and were "frequently only partially visible in the text" (p. 206). They could thus only be validly coded by adding think-aloud data to the keylogging data. Based only on keylogging data, these actions would be seen merely as backward actions – a stop and then a backward movement – while, from a functional perspective, they were stepping-stones for moving forward in text production. The search for another word with which to continue is then a progressive act.


Not making a distinction between backward and forward moves in writing processes can have consequences for the conclusions drawn. For example, Study 4 focused on the use of L1 during L2 writing in five types of activities, which ranged from task examining to process-controlling (see Table 1). In three of these categories (text-generating, idea-generating, and idea-organizing; Study 4, page 232), forward and backward activities were combined into one code. In doing so, however, the findings did not point to the function of the activity at hand. Did the finding that L1 was used in content generation, for example, reflect word-finding during forward generating? Or was it the outcome of a reflective stance when evaluating already generated content? As a result, in that study a string of purely forward activities such as Generating-Generating-Generating fell into the same category as a string such as Generating-Evaluating-Generating.

The direction of the activity also has consequences for the interpretation of writing processes. Fluency of text production, for instance, may have different meanings depending on the direction. Generating text (performative level: adding text) is a forward move which may differ considerably in terms of fluency – number of elements per time unit – when compared to reprocessing the text-produced-so-far (a backward move). Writers who score similarly on text production fluency (P-bursts) may differ in reviewing fluency (R-bursts). Without distinguishing between moves in both directions, the fluency score reported in a study may be clouded and, as a result, difficult to interpret (see Study 7).

In turn, the domain of the activity (i.e., internal, external) may also affect fluency. Pre-text fluency, that is, verbalizing thoughts before transcribing them into written text, as a forward movement, might be faster than writing thoughts down directly as text, because of the interference of the physical actions involved.
In think-aloud studies, pretext and text fluency can be separated, as shown in Study 5. In that study, burst length was measured through a think-aloud protocol in which two different 'P-bursts' could be distinguished, as they acted in different domains (see Chenoweth & Hayes, 2001, p. 89). The first type was the formulation burst, i.e., the mental operation of formulating an idea in a (fragment of a) sentence. After that, the author could revise this formulation mentally or could transcribe it. The second type was direct transcription: the writer dictated the text to herself, so that the mental formulation and the external production of text ran in parallel. The authors, however, did not seem to distinguish between the two types in their analyses, as shown in an illustrative table with 12 lines of a think-aloud fragment from the study (see Figure 5).

Table 2. A segmented protocol

Segment  Burst type  Segment content                                Number of new words
 1       R           Many people think, or                          3
 2       R           many people find uh                            1
 3       P           many people see music and sports as opposite   6
 4       P           opposite ends                                  1
 5       P           as opposite ends of the spectrum uh            3
 6       P           which has been good                            4
 7       P           which has been good for my sister and I        5
 8                   which has been good for my sister and I uhm
 9       P           because we've been exposed                     4
10       P           been exposed to both                           2
11       R           to both worlds                                 1
12       P           to both disciplines yah                        1

NOTE: A P-burst is a segment that is terminated by a pause of at least two seconds. An R-burst is a segment that is terminated by revision. Underlined: new words proposed for inclusion; bold: language included in the written text; italics: repetition of text already uttered or written (repeating pretext or rereading aloud text already written).

Figure 5. Table 2 from Chenoweth & Hayes, 2001, p. 89. Reproduced with permission from Sage

The fragment shows a writer formulating some words, then partly repeating them and adding some new words mentally, but in most cases transcribing them immediately, that is, without internal pre-formulation. Of the 55 words in this fragment, we counted 28 words 'reread', 25 transcribed, and not more than 3 internally proposed (segment 1: think; segment 2: find; segment 11: worlds). All three words are replaced: they are located at the point of inscription. Nine of the twelve segments consist of performative transcription following rereading.

The authors defined their primary data for this study as the number of words of new language proposed (p. 89). If so, we must conclude that words internally proposed (underlined, not bold) while thinking aloud and new words transcribed in text (but not formulated earlier; underlined and bold) were combined into a single score. This seems questionable, given that pre-formulation speed and writing speed are taken to be different constructs. In addition, monitoring mental production or text production may indicate different formulation issues, as the former usually involves a proactive search for the best term to be used (see examples in Study 1, page 117, and the "Refining" category in Study 3, page 290).

Studies that only collected keylogging data (e.g., studies 7 & 8) did not have to deal with the distinction between pretext and performed text: all their data were performative. One of the advantages of keylogging technology is that one can efficiently collect data from many participants. Moreover, researchers can collect more tasks per individual, which creates the opportunity to estimate more stable patterns per individual, intra-individual variance, and individual profiles (see the section "Stable Profiles" below, and chapters 8, 11 and 12, this volume). Larger datasets provide the opportunity to investigate relations between the distinguished activities since, as discussed earlier, they are not independent phenomena. Neither are keylogging scores: number of pauses, pause duration, burst length, speed of production (number of strokes per minute), and all related variation in these scores stem from one and the same composition process and must be interrelated. In Study 8, however, in which the effect of task manipulations on writing processes is studied, the effect per separate keylogging score (e.g., for strokes per minute or number of pauses) is tested as if the scores were not correlated or interdependent. The same holds for linguistic data, that is, features of the written texts. The challenge for automatically collected data, both processes and text features, is to create constructs of process features and text features.

Basically, keystroke logging scores fit the parameters of a writing process quite well. They bring processes back to the basics: forward actions, backward actions, duration of actions, continuity, and discontinuity. However, all scores are performative, as the mental world of writing is not considered.
This world will be mostly inferred (see Figure 3) unless keylogging is combined with think-aloud protocols (Study 6) or stimulated recall (Study 8).

One of the selected studies provides an example of what the future could look like when large datasets of processes are collected. Study 7 aimed to create a construct of fluency, for L1 as well as for L2 writing, based on keylogging data. It is one of the first studies to investigate whether the construct of writing fluency is the same for L1 and L2. It shows that writing behaviours correlate and form factors. It also shows that the context of the scores (beginning, middle, or end of the process) is a relevant factor to consider. Further studies should test whether the construct of fluency is indeed invariant to language (i.e., language-independent, L1, L2) and to tasks (i.e., task-independent), as the task executed in the study was quite specific: a short memory dump task (see Table 1).
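The interval-based contextualization of fluency described for Study 7 (splitting the process into ten intervals of equal length) can be sketched as follows. The function and its input format are our own hypothetical simplification, counting keystrokes per interval rather than reproducing any of the study's actual measures:

```python
def interval_fluency(keystroke_times, n_intervals=10):
    """Per-interval fluency: number of keystrokes in each of n_intervals
    intervals of equal duration.

    keystroke_times: sorted timestamps (seconds from task start) of keypresses.
    Returns a list of n_intervals counts.
    """
    if not keystroke_times:
        return [0] * n_intervals
    total = keystroke_times[-1]  # process length = time of the last keystroke
    counts = [0] * n_intervals
    for t in keystroke_times:
        idx = min(int(n_intervals * t / total), n_intervals - 1) if total > 0 else 0
        counts[idx] += 1
    return counts
```

Comparing the resulting profile across L1 and L2 tasks for the same writer is one way to operationalize the "beginning, middle, or end of the process" context discussed above.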


Understanding and being understood

The goals a writer sets, for both the level of understanding (meaning-making) and/or the communicative effect to be achieved, depend on the context of the task and the writer's internal standards. All eight selected studies focus on the quality of the writing process in terms of text production rather than knowledge production. The goal set for participants is 'write a text', while other products of the writing activity – new knowledge about the content of an academic discipline, for instance, or new knowledge about current issues in society – are not considered. In Study 1, however, one of the three tasks, based on a 48-page booklet, could have been considered a real academic task and interpreted as evidence of understanding. Yet the task's instructions required participants to read the booklet and then write a short summary of it. As a result, the text was rated as text, not as evidence of understanding.

The selected studies, therefore, shed no light on the issue of whether writing for understanding shows different forward-backward patterns than writing to be understood. Neither do they clarify participants' expectations regarding the quality of the text(s) – their standards for what they had to write. Were the texts the participants wrote – in their own view – their best possible performance?

This "neglect" of the "understanding" dimension in most L2 writing process-oriented studies probably results from their drawing on writing models in which this dimension is absent (see Galbraith, 1999 and elsewhere). The first Hayes and Flower model (Hayes & Flower, 1980) had text, not knowledge, as an output variable, although in other early papers the authors focused on writing as an act in which meaning is discovered (for instance, Flower & Hayes, 1980).
Reconsidering the aims of writing research in terms of understanding and being understood might move writing research from a merely instrumental act into one of meaning-making, which we see as the intrinsic value of writing: texts are a window to the world.

To summarize, when studying the writing process, units of observation should be contextualized and studied in relation to other units of observation. If possible, the internal-external and backward-forward dimensions of these units should not be collapsed. In this respect, P- and R-bursts seem to provide a promising way forward, provided they are part and parcel of a larger unit of observation on a time axis; keystroke logging not only allows these bursts to be automatically retrieved and easily coded, but also allows for a distinction in terms of continuous or discontinuous processes. However, for all units of observation, including those which can only capture performative actions, such as keylogging, it is crucial to take into account that writing process units are interdependent. Finally, it must be noted that interdependent observation units may differ as a function of their outcome, be it knowledge (in writing to learn) or text (in learning to write).

L2 writing studies: Continuity and change

We are very aware that in this chapter we have discussed past research through a contemporary lens. Yet, our findings do not deviate much from the shortcomings and gaps pointed out by Roca de Larios and colleagues in a review of L2 writing process research published two decades ago. When we now reread what these authors pointed out in 2002 and compare their list with the issues we coded in our selected studies, it seems that progress has been made in terms of technicalities: generally, there is now a better grip on experimentation, and sample sizes provide better options for the valid inclusion of more components from Model 1 and for generalizations. What we still see as a challenge is the key component of Model 1: the writing process construct. Among other comments, Roca de Larios et al. (2002) presented two conclusions that we would like to discuss in the light of progression:

1. They found that the Formulation component required more attention, as did the temporal nature of composing and, hence, the changing function of process activities during the composition process.
2. They concluded that little was known about writers' 'stable profiles', the potential variables affecting such profiles, and the extent to which writers showed the same processes under different circumstances.

We thus decided to concentrate on these two observations by reorganizing the considerations discussed in earlier sections of the chapter. Finally, we will discuss the characteristics of an ideal future study.

Observation 1: Formulation is key

We shared Roca de Larios et al.’s (2002) first observation in our analysis of the “Construct as intended” in earlier sections of the chapter, although we did not focus exclusively on Formulation. We concur with these authors in that L2 writing must share many aspects with writing in L1, as one and the same cognitive, affective, and psychomotor architecture must accomplish the same task in composing, with seemingly ‘only’ one difference: language. The language of composition plays an important role in text generation – forward production – and in reading the already written text – reprocessing – as well as in reading external information (task documents, sources of topic information).

Chapter 2. Writing process studies. Struggling with complexities

The specific similarities and contrasts between the L1 and L2 constructs are not yet well-defined. The lack of shared basic parameters of the writing process construct leads to idiosyncrasies and fragmentation: activities and behaviours are seen as separate entities rather than as interdependent and forming part of larger strings. This often leads to considering observed elements as immutable in their functions despite their occurrence at different moments of the writing process and in different strings of activities. Therefore, it seems we are still in a largely explorative phase of this relatively young research domain.

There is, however, some light on the horizon. As mentioned above, López-Serrano et al. (2019), with the help of think-aloud protocols, have offered new insights into coding strings of formulation activities covering all options of forward and backward progression (see Figure 3). The system they propose, which may well fit in with our definition of the writing process (see above), has been built bottom-up, and its comprehensibility leads us to think that with further abstraction it might gain in power for generalization and facilitate the progression of writing studies. The study also shows that think-aloud protocols are a relevant method for studying writing processes, either separately or in combination with keylogging techniques (see Study 6). IT technology, such as speech recognition during think-aloud sessions, may facilitate collecting data from larger samples and more tasks per individual, while other options, such as screen capturing techniques, may also allow think-aloud studies to add timestamps to the protocols and provide indices for typical process features, such as duration, speed, and fluency (see Chapter 7, this volume). Another viable way forward is shown in Study 7, where the researchers have analyzed a construct instead of individual variables.
As fluency – speed of production – is a feature of the writing process, gaining a stronger grip on that phenomenon would also be a step forward. As mentioned earlier, all the elements in the definition of the writing process construct we proposed above can be studied via keylogging techniques, even if restricted to the performative level. Basically, this technique facilitates studying multiple aspects, including forward-backward, continuous-discontinuous, duration, and speed.

Observation 2: ‘Stable profiles’

The issue raised by Roca de Larios et al. (2002) focused on how little we know about whether different writing tasks in L1 and/or L2, for example, are approached in a similar way by the same individual writer, and about the variables affecting such approaches. Although very little progress has been made on these issues over the years, it must be noted that the availability of behavioral writing process technology, as shown in Studies 6, 7 and 8, provides the field with tools
not only to collect data relatively easily but also to collect them from larger samples, and to collect more writing samples per participant. More writing samples per individual are necessary for two reasons.

First, when studying individuals’ profiles, that is, person-specific writing process configurations, one needs to gather multiple processes per participant in order to establish reliable indices of writing behavior. When a writer performs just one task, task and individual effects are confounded. For a specific task, a certain number of profiles can be distilled, but we cannot speak of individual profiles, since participants may show another profile in another, even similar, task, due, for instance, to topic disposition. When more tasks per individual are collected, we may generalize the profiles found across this set of tasks, and then check whether individuals showed the same process profile in all tasks or showed some variation. In a study by Van Steendam et al. (2022), 16- to 18-year-old writers wrote four similar L1 tasks under keylogging conditions. The researchers found that participants varied in terms of the number of process profiles – combinations of source reading and writing activities in three intervals of the process. Some showed the same process profile in all four tasks, while others showed a maximum of four profiles across four tasks. The researchers also observed that writers’ task experiences, such as topic knowledge and experienced effort, explained the occurrence of certain process profiles within writers. That is why we need to study more than one task per participant: to detect abstracted patterns of strings of activities, regardless of participants and specific tasks.

Second, multiple tasks per participant are necessary to filter out intra-writer variability. From text-related studies, it is known that text quality varies within writers from task to task.
Therefore, one needs about three tasks – or even more when the tasks are short – to be able to report a reliable index of writing competence (Bouwer et al., 2015; Schoonen, 2012). The same holds for writing processes, as topic knowledge, topic interest, and topic vocabulary may affect how writers engage in them. If one were to study the effect of a Task variable (Figure 1; Topic, Genre, Language) or the effect of Task conditions (Study 8) on writing processes, it should be assumed that about 50% of the variance would be due to intra-writer variability, as demonstrated by Van Weijen (2009) and Tillema (2012). Neglecting such a large degree of variability within writers leads to a considerable overestimation of statistical power when we calculate the influence of task effects on writing processes.

The need to collect more processes to study the construct may lead to a reliance on easier data-collection techniques such as keylogging. However, there are some potential risks in the use of this technique. From a technical perspective, one of these risks lurks in the data-cleaning process, which requires technical mastery. Keylogging software also requires researchers who have particularly
good insight into the scores obtained, as the machinery may generate about 1,000 variables. As with all data, these scores are just data; they do not tell a story. It is the researcher who has to construct the story, guided by theory or hypotheses.

From a theoretical point of view, keylogging data do not represent writing process constructs. Construct validity is still a key issue in this context, as basic parameters of writing processes are not given in the data, which are only a collection of performative actions on a time scale. These activities present what writers did: they do not form ‘strings’ or building blocks, and cannot be identified as serial, nested or a-linear strings without the researcher’s intervention (see Figure 3). In essence, then, researchers must put significant effort into such data when they intend to use the output of strings of minimal actions to mark clusters of segments, as Galbraith and Baaijen (2019) have demonstrated.
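The statistical cost of the intra-writer variability discussed above can be made concrete with a design-effect calculation. The sketch below uses invented numbers: 40 writers and 3 tasks each are arbitrary, and the intraclass correlation of .50 is simply an illustrative stand-in for the roughly 50% within-writer variance estimated by Van Weijen (2009) and Tillema (2012).

```python
# Design effect for clustered observations: tasks nested within writers.
# Treating all writer-by-task observations as independent overstates
# precision; naive variance estimates are too small by the design effect
# deff = 1 + (m - 1) * icc, where m is the number of tasks per writer and
# icc the intraclass correlation (the between-writer share of variance).

def design_effect(m: int, icc: float) -> float:
    """Variance inflation factor for m correlated observations per writer."""
    return 1 + (m - 1) * icc

def effective_n(n_writers: int, m: int, icc: float) -> float:
    """Number of effectively independent observations out of n_writers * m."""
    return (n_writers * m) / design_effect(m, icc)

# Illustrative scenario (invented numbers): 40 writers, 3 tasks each,
# icc = .50, i.e., roughly half of the process variance within writers.
print(design_effect(3, 0.5))    # 2.0: naive standard errors are too small
print(effective_n(40, 3, 0.5))  # 60.0: 120 nominal observations act like 60
```

Under these assumed numbers, an analysis that pools the 120 task observations as if they were independent behaves as if it had only 60, which is one way of seeing why ignoring intra-writer variability overestimates power when task effects on writing processes are tested.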

The ideal study

It would be tempting to close this chapter with a wish list of elements to be included in an ideal future study, such as using meaningful tasks instead of clinical research tasks, including output aims other than only text quality, or studying younger learners instead of HE students (which was the case in all the studies reviewed in this chapter apart from Study 6). Yet we have resisted the temptation to do so, as the choices one makes in setting up a new study are always restricted to some extent by the researchers’ specific context and the limitations imposed by time and financial constraints. However, we hope that writing researchers will distill from this chapter those elements that they may take into account in their own context. In that way, we hope to have made a contribution to theory-building on the construct of the (L2) writing process and thus to help move the field forward.

Acknowledgements

We would like to thank the editors of this volume, Rosa M. Manchón and Julio Roca de Larios, for their continued encouragement in writing this chapter and their constructive and rich feedback on previous versions. We are also grateful for the insights Dr. Hyeyoun Kim shared from her dissertation study.

References

Bouwer, R., Béguin, A., Sanders, T., & van den Bergh, H. (2015). Effect of genre on the generalizability of writing scores. Language Testing, 32(1), 83–100.
Breetvelt, I., van den Bergh, H., & Rijlaarsdam, G. (1996). Rereading and generating and their relation to text quality: An application of multilevel analysis on writing process data. In G. Rijlaarsdam, H. van den Bergh, & M. Couzijn (Eds.), Theories, models & methodology in writing research (pp. 10–21). Amsterdam University Press.
Chenoweth, N. A., & Hayes, J. R. (2001). Fluency in writing: Generating text in L1 and L2. Written Communication, 18(1), 80–98.
Cook, T. D., & Campbell, D. T. (1979). Quasi-experimentation: Design & analysis issues for field settings. Houghton Mifflin.
Cumming, A. (1989). Writing expertise and second-language proficiency. Language Learning, 39(1), 81–135.
De Beaugrande, R. (1984). Text production: Toward a science of composition. Ablex.
Flower, L., & Hayes, J. R. (1980). The cognition of discovery: Defining a rhetorical problem. College Composition and Communication, 31(1), 21–32.
Galbraith, D. (1999). Writing as a knowledge-constituting process. In M. Torrance & D. Galbraith (Eds.), Knowing what to write: Conceptual processes in text production (pp. 139–160). Amsterdam University Press.
Galbraith, D., & Baaijen, V. M. (2018). The work of writing: Raiding the inarticulate. Educational Psychologist, 53(4), 238–257.
Galbraith, D., & Baaijen, V. M. (2019). Aligning keystrokes with cognitive processes in writing. In E. Lindgren & K. Sullivan (Eds.), Observing writing: Insights from keystroke logging and handwriting (pp. 306–325). Brill.
Hayes, J. R., & Flower, L. S. (1980). Identifying the organization of writing processes. In L. W. Gregg & E. R. Steinberg (Eds.), Cognitive processes in writing (pp. 3–30). Lawrence Erlbaum Associates.
Kim, H. (2014). Dynamic interaction between generating and reviewing in the writing process [Paper presentation]. In Program & abstract book, research school on writing & conference on writing research SIG Writing (p. 358). Amsterdam.
Kormos, J. (2012). The role of individual differences in L2 writing. Journal of Second Language Writing, 21(4), 390–403.
Leijten, M., & Van Waes, L. (2013). Keystroke logging in writing research: Using Inputlog to analyze and visualize writing processes. Written Communication, 30(3), 358–392.
Lindgren, E. (2004). The uptake of peer-based intervention in the writing classroom. In G. Rijlaarsdam, H. van den Bergh, & M. Couzijn (Eds.), Effective learning and teaching of writing (pp. 259–274). Kluwer.
Lindgren, E. (2005). Writing and revising: Didactic and methodological implications of keystroke logging (Doctoral dissertation). Umeå University.
López-Serrano, S., Roca de Larios, J., & Manchón, R. M. (2019). Language reflection fostered by individual L2 writing tasks: Developing a theoretically motivated and empirically based coding system. Studies in Second Language Acquisition, 41(3), 503–527.
Révész, A., Kourtali, N. E., & Mazgutova, D. (2017). Effects of task complexity on L2 writing behaviors and linguistic complexity. Language Learning, 67(1), 208–241.
Rijlaarsdam, G., & van den Bergh, H. (1996). The dynamics of composing – An agenda for research into an interactive compensatory model of writing: Many questions, some answers. In C. M. Levy & S. Ransdell (Eds.), The science of writing: Theories, methods, individual differences & applications (pp. 107–125). Lawrence Erlbaum Associates.
Roca de Larios, J., Murphy, L., & Marín, J. (2002). A critical examination of L2 writing process research. In S. Ransdell & M. L. Barbier (Eds.), New directions for research in L2 writing (pp. 11–47). Kluwer.
Sasaki, M. (2000). Toward an empirical model of EFL writing processes: An exploratory study. Journal of Second Language Writing, 9(3), 259–291.
Scardamalia, M., & Bereiter, C. (1987). Knowledge telling and knowledge transforming in written composition. Advances in Applied Psycholinguistics, 2, 142–175.
Schoonen, R. (2012). The validity and generalizability of writing scores: The effect of rater, task and language. In E. Van Steendam, M. Tillema, G. Rijlaarsdam, & H. van den Bergh (Eds.), Measuring writing: Recent insights into theory, methodology and practice (pp. 1–22). Brill.
Stevenson, M., Schoonen, R., & de Glopper, K. (2006). Revising in two languages: A multidimensional comparison of online writing revisions in L1 and FL. Journal of Second Language Writing, 15(3), 201–233.
Tillema, M. (2012). Writing in a first and second language: Empirical studies on text quality and writing processes (Doctoral dissertation). Utrecht University. LOT Publications.
Uzawa, K. (1996). Second language learners’ processes of L1 writing, L2 writing, and translation from L1 into L2. Journal of Second Language Writing, 5(3), 271–294.
Van de Kamp, M. T., Admiraal, W., & Rijlaarsdam, G. (2016). Becoming original: Effects of strategy instruction. Instructional Science, 44, 543–566.
Van den Bergh, H., & Rijlaarsdam, G. (1999). Generating in documented writing. In M. Torrance & D. Galbraith (Eds.), Knowing what to write: Cognitive perspectives on conceptual processes in text production (pp. 99–120). Amsterdam University Press.
Van den Bergh, H., Rijlaarsdam, G., & Van Steendam, E. (2016). Writing process theory: A functional dynamic approach. In C. A. MacArthur, S. Graham, & J. Fitzgerald (Eds.), Handbook of writing research (pp. 57–71). The Guilford Press.
Van den Broek, P., & Helder, A. (2017). Cognitive processes in discourse comprehension: Passive processes, reader-initiated processes, and evolving mental representations. Discourse Processes, 54(5–6), 360–372.
Van Steendam, E., Vandermeulen, N., De Maeyer, S., Lesterhuis, M., van den Bergh, H., & Rijlaarsdam, G. (2022). How students perform synthesis tasks: An empirical study into dynamic process configurations. Journal of Educational Psychology.
Van Waes, L., & Leijten, M. (2015). Fluency in writing: A multidimensional perspective on writing fluency applied to L1 and L2. Computers and Composition, 38, 79–95.
Van Weijen, D. (2009). Writing processes, text quality, and task effects: Empirical studies in first and second language writing (Doctoral dissertation). Utrecht University. LOT Publications.
Wang, W., & Wen, Q. (2002). L1 use in the L2 composing process: An exploratory study of 16 Chinese EFL writers. Journal of Second Language Writing, 11(3), 225–246.


Appendix A. Issues with regard to internal and statistical validity in a sample of eight frequently cited studies (see Table 1)

1. Internal validity: Ruling out task, rater and experimenter effects

1.1 Operationalize the construct with multiple tasks (per genre) to rule out other explanations for covariation of task (genre) and process.

1.2 Avoid nesting the topic in the independent variable (task complexity, genre, etc.). Example of what needs to be avoided: L1 writing on topic A; L2 writing on topic B.

1.3 Avoid nesting raters in the independent variable. Example of what needs to be avoided: panels of raters for L1 different from the panel of raters for L2.

1.4 When instructing participants in thinking aloud: (a) employ double-blind trainers; (b) train with a non-writing task.

2. Statistical validity

2.1 Run a power analysis (sufficient participants and observations per participant). Example of what needs to be avoided: absence of a power analysis.

2.2 Make sure to have independent observations, or take dependency into account in the power analysis. Example of what needs to be avoided: assessing effects on three process components of Model 1 without increasing the number of participants.

2.3 Avoid Type 1 errors when having multiple dependent variables. Example of what needs to be avoided: testing effects on five process variables and three text quality variables without correcting for it.

2.4 Build all-encompassing statistical models with as many relevant variables as the sample allows; with a small sample size, include few variables; if more variables are needed, increase the sample size. Example of what needs to be avoided: conclusions drawn on the basis of separate statistical tests, each time for a different variable.

2.5 Run and report an outlier analysis; report variation on top of group means. Example of what needs to be avoided: absence of an outlier analysis and no variation reported.

2.6 When coding: (a) report inter- and intra-coder reliability per code; (b) have a large enough set of double-coded observations to report a reliable index of intercoder reliability; (c) report the estimated reliability index and 95% confidence intervals around the index; (d) when coding fragments, either randomly select them or make sure to have fragments from all (stages of the) writing process protocols. Example of what needs to be avoided: no intra-coder reliability; a small set for double-coding with an unclear selection of (interrelated) fragments from a few process protocols.
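Point 2.6 recommends reporting an estimated reliability index together with a 95% confidence interval. As a minimal sketch of what that might look like for double-coded process fragments (the coding labels and the two coders’ data below are invented for illustration, and a percentile bootstrap stands in for the many possible interval methods):

```python
# Cohen's kappa with a percentile-bootstrap 95% CI for double-coded
# fragments. Labels and data are hypothetical, for illustration only.
import random

def cohens_kappa(a, b):
    """Chance-corrected agreement between two label sequences."""
    labels = set(a) | set(b)
    n = len(a)
    p_obs = sum(x == y for x, y in zip(a, b)) / n
    p_exp = sum((a.count(l) / n) * (b.count(l) / n) for l in labels)
    return (p_obs - p_exp) / (1 - p_exp)

def bootstrap_ci(a, b, reps=2000, seed=1):
    """Percentile 95% CI for kappa, resampling fragments with replacement."""
    random.seed(seed)
    n, stats = len(a), []
    for _ in range(reps):
        idx = [random.randrange(n) for _ in range(n)]
        ra, rb = [a[i] for i in idx], [b[i] for i in idx]
        if len(set(ra) | set(rb)) < 2:  # skip degenerate resamples
            continue
        stats.append(cohens_kappa(ra, rb))
    stats.sort()
    return stats[int(0.025 * len(stats))], stats[int(0.975 * len(stats))]

coder1 = ["plan", "write", "write", "read", "plan", "write", "read", "write"]
coder2 = ["plan", "write", "read", "read", "plan", "write", "read", "plan"]
kappa = cohens_kappa(coder1, coder2)
lo, hi = bootstrap_ci(coder1, coder2)
print(round(kappa, 2))  # 0.64
print(round(lo, 2), round(hi, 2))
```

With only eight double-coded fragments the interval comes out very wide, which is precisely the situation point 2.6(b) warns against: a larger double-coded set is what narrows the interval around the reported index.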


Chapter 3

Overview of methodological procedures in research on written corrective feedback processing

Yvette Coyle, Florentina Nicolás-Conesa & Lourdes Cerezo
University of Murcia

This chapter offers a critical overview of the methods used in research on written corrective feedback processing. Framing this research broadly within interventionist and non-interventionist strands, on the grounds of whether or not feedback and other task- or participant-related variables are controlled by the researcher, we describe the research designs, participants, data collection tools, and analytical units used in studies on feedback processing. Our purpose in doing so is twofold. Firstly, we aim to take stock of the ways in which process research has evolved in line with changing theoretical and empirical developments in the field of L2 writing studies. Secondly, we intend to offer an appraisal of the methodological procedures used in existing research. Finally, we suggest future directions for a more inclusive research agenda that can respond to the challenges of new digital and curricular L2 writing scenarios and establish greater uniformity in its analytical approaches.

https://doi.org/10.1075/rmal.5.03coy © 2023 John Benjamins Publishing Company

Introduction

Written corrective feedback (WCF) is globally understood as “any explicit attempt to draw a learner’s attention to a morpho-syntactic or lexical error” (Polio, 2012, p. 375). A growing number of process-oriented studies have attempted to provide insights into how learners respond to different types of feedback, to elucidate how they might benefit from it. Drawing on cognitive and sociocultural approaches to L2 acquisition, research into WCF processing has attempted to show how cognitive processes such as noticing, hypothesis testing, and metalinguistic awareness, or socially constructed knowledge during joint reflection on feedback, as well as additional factors such as learners’ attitudes, beliefs, and goals, can be critical in determining the impact of feedback on language learning outcomes. The goal of the present chapter is to provide an overview of the
methodological practices employed in existing research on WCF processing, viewed through a critical lens. We begin by identifying the major strands within process research. We then review the designs adopted and the populations studied before going on to highlight the strengths and shortcomings of the different instruments and constructs by which researchers have accessed and analysed process data. We point out some of the challenges and pending issues in research to date and conclude the chapter by making recommendations for the future.

Overview and assessment of methodological procedures

Methodological approaches in WCF processing research

Research on WCF processing can be broadly characterized as either interventionist or non-interventionist according to the specific goals pursued and the ways in which they have been addressed (see also Roca de Larios & Coyle, 2021). Following DeKeyser and Prieto Botana (2019), interventionist research encompasses controlled, laboratory-type experiments conducted with learners outside regular classrooms, as well as classroom-based interventions in which variables including the writing task, the type and amount of feedback provision, and/or participant organization are predetermined by the researchers. The aim of the interventionist research strand has been to examine how different types of WCF promote learners’ awareness of the L2 and/or their cognitive processing in order to explore their potential for contributing to L2 writing and development. Following Schmidt (1990, 2001), awareness in these studies is used to refer to the sensory noticing of surface features of the input (low level of noticing), or to noticing at the level of metalinguistic awareness and understanding (high level of noticing). Depth of processing, as defined by Leow (2015), refers to “the relative amount of cognitive effort, level of analysis and elaboration of intake, together with usage of prior knowledge, hypothesis-testing and rule formation employed in decoding and encoding some grammatical or lexical item in the input” (p. 204).

Interventionist studies have adopted a range of methodological approaches, including experimental (e.g., Adams, 2003), quasi-experimental (e.g., Hanaoka & Izumi, 2012), and, predominantly, mixed-methods designs (Sachs & Polio, 2007), as well as analyses of individual cases (e.g., Storch & Wigglesworth, 2010).
These studies have made use of a range of data elicitation procedures to investigate the cognitive, social, and individual factors involved in WCF processing and their impact on written output. In contrast, the aim of research within the non-interventionist strand is to describe learners’ responses to WCF in contexts (including classrooms) in which
the researchers did not manipulate intervening variables (e.g., writing task, WCF type, timing of WCF, or distribution of participants) and where learners did not receive a specific treatment. In these studies, the emphasis has been on providing in-depth descriptions of how individuals engage with WCF. Engagement with feedback is a broad, multidimensional construct that encompasses not only cognitive (skills and strategies) and behavioural (revision operations) but also affective (emotional reactions) dimensions (Han & Hyland, 2015). Consequently, case study methodology has been frequently used to uncover learners’ perceptions, motivation, goals and attitudes towards feedback provision (Ferris et al., 2013), their cognitive, behavioural and affective engagement with feedback (Han & Hyland, 2019), or their engagement with computer-generated feedback (Zhang & Hyland, 2018).

Research designs

Research designs in interventionist studies

Most studies have investigated the short-term impact of WCF processing in time periods of between one and four weeks. Some of these studies have used a pre-test/intervention/post-test design involving a multi-stage task (e.g., Qi & Lapkin, 2001), comprising an initial writing stage, a feedback provision and analysis stage, and a final stage involving the rewriting or revision of the original text. Other researchers have used correlational designs to examine the relationship between the depth of learners’ processing of feedback and its potential effects on the accuracy of their subsequent written output (Cerezo et al., 2019). Experimental studies have included control groups to account for the effects of noticing in comparison with a non-treatment group (Manchón et al., 2020), thereby enhancing the validity of their findings. Alternatively, a repeated measures design with several writing cycles has also been employed (Fukuta et al., 2019).

Interventionist studies have shown that increased linguistic awareness through WCF processing can lead to gains in accuracy in learners’ L2 writing. However, securing firm evidence of the language learning potential of WCF is still a major challenge in L2 writing process research. Some preliminary confirmation of enduring accuracy gains by adolescents in new writing tasks (Moradian et al., 2020) and of children’s increasingly target-like use of the L2 (Coyle et al., 2018) has been reported after repeated exposure to several cycles of WCF. Even so, these studies tracked learners for less than five months, which is too short a period to reflect fundamental changes in L2 development. Without renewed research designs that document learners’ processing of feedback with diverse writing tasks over longer time periods, it is unlikely that we will be able to provide answers to essential
questions concerning the contribution of WCF to L2 learning. Manchón and Leow (2020) have suggested that research on WCF processing should be situated within the temporal dynamics and academic curricula of real L2 classrooms, in what they refer to as an Instructed Second Language Acquisition (ISLA)-applied perspective. This proposal, which involves gathering authentic, longitudinal classroom data with learners of various ages and backgrounds in different educational contexts, would enable researchers to engage in both exploratory and hypothesis-testing research in order to shed light on crucial aspects of the relationship between WCF processing and L2 development.

Currently, differences in participant-, task-, and feedback-related variables make comparisons across interventionist research difficult. Some small-scale laboratory studies have examined data from only a few participants (Qi & Lapkin, 2001; Swain & Lapkin, 2002). While providing an in-depth lens into learners’ WCF processing, these studies lack the generalizability of quantitative research. In contrast, quantitative studies that have collected data from larger samples of language learners (Park & Kim, 2019), or from intact classes (Cerezo et al., 2019), have provided more robust findings in support of the effects of feedback on learners’ writing. However, by reporting group statistics, this research fails to account for the individual variation which exists within every group of learners. Mixed-methods studies have used qualitative data to supplement statistical data (Storch & Wigglesworth, 2010).

Writing tasks have also varied in terms of genre, length, the complexity of the writing prompts, and the timing of task completion. Picture description tasks, which allow researchers to control for propositional content, have been widely used (Swain & Lapkin, 2002).
L2 writers have also been invited to complete graphic commentaries (Storch & Wigglesworth, 2010), argumentative essays (Kang, 2020), personal opinion essays (Cerezo et al., 2019), and decision-making tasks (Manchón et al., 2020). Most of these tasks have been timed essays on topics proposed by the researchers, which, in principle, would make learners’ writing less authentic than assignments carried out in classrooms as part of a curricular programme. The same variability applies to WCF provision, which has differed with regard to its degree of explicitness and comprehensiveness. Techniques have included direct (Manchón et al., 2020), indirect (Park & Kim, 2019), and metalinguistic feedback (Shintani & Ellis, 2013), as well as combinations of different feedback types (Cerezo et al., 2019). Feedback provision has also ranged from the comprehensive correction of all errors to the selective targeting (Suh, 2020) or post-hoc analysis (Caras, 2019) of specific L2 forms. Discursive feedback techniques such as reformulations (Qi & Lapkin, 2001) and models (Hanaoka, 2007), which go beyond the correction of surface errors in learners’ texts to address
issues of rhetorical inadequacy, writing style, etc., have also been employed. Conflicting findings make it impossible to draw any firm conclusions about the language learning potential of specific WCF techniques and, by extension, to make any definite recommendations for L2 teaching.

Most early feedback studies were implemented in pen-and-paper rather than digital environments, a situation that has changed with the widespread use of computers for academic and personal writing. Research has also reflected the appearance of new digital genres (blogs, digital stories, tweets) and the increasing interest in multimodal composing (see Yi et al., 2020 and Chapter 13, this volume), which require different forms of feedback that can offer information on learners’ semiotic meaning-making resources in their entirety (Elola & Oskoz, 2022; Oskoz & Elola, 2020). Expanding WCF processing research to include these new modalities is essential to move the field forward.

Research designs in naturalistic studies

In non-interventionist research, mixed-methods and case studies are conducted in second and foreign language classes without manipulation of the learning environment. This body of research involves the triangulation and analysis of multiple data sources, which are used to shed light on learners’ engagement with the feedback provided on multi-draft essays set by teachers as part of their regular teaching schedule over several weeks (Han & Hyland, 2015), or with feedback provided by automated writing evaluation (AWE) programmes subscribed to by their institutions, including Grammarly (Ranalli, 2021) or Pigai (Zhang & Hyland, 2018). In this strand, greater attention has been paid to the effects of individual variables, including proficiency (Zhang & Hyland, 2018), learners’ beliefs (Han, 2017; Han & Hyland, 2019), or trust in the WCF (Ranalli, 2021), which are important in determining learners’ engagement. Findings, however, are still inconclusive and conditioned by the narrow focus of most studies on university students of similar proficiency levels and the same L1 background (but see Koltovskaia, 2020). Broadening the scope of inquiry to include a more diverse range of participants and contexts might shed fresh light on the construct of engagement and on the potential language learning benefits of digital feedback tools.

Populations studied

The populations investigated in studies of WCF processing include learners of different nationalities (but see above), L2 proficiency levels, and educational experiences. Most research has been carried out with adults learning English as an L2 in the USA (e.g., Kim & Bowles, 2019), Canada (e.g., Qi & Lapkin, 2001),
Australia (e.g., Storch & Wigglesworth, 2010), and with undergraduate students in Spain (e.g., Cerezo et al., 2019), Turkey (e.g., Buckingham & Aktug-Ekinci, 2017), China and Japan (e.g., Han & Hyland, 2015) or Saudi Arabia (e.g., Storch & Alsuraidah, 2020). Other languages studied include L2 Korean (e.g., Park & Kim, 2019) and Spanish (e.g., Bowles & Gastañaga, 2022; Caras, 2019; Leow et al., 2022). Adolescents are also represented in process research. Swain and her colleagues reported on French immersion students in Canada (Swain & Lapkin, 2002), while high school ESL learners in Quebec (Simard et al., 2015) and EFL students in Korea (Kang, 2020), Spain (González Cruz et al., 2022; García-Mayo & Loidi Labandibar, 2017) and Iran (Moradian et al., 2020) have also been studied. Young learners under the age of thirteen are by far the most underrepresented population (but see Coyle & Roca de Larios, 2014). In fact, most of the participants in research on feedback processing are university students enrolled in undergraduate or postgraduate degrees, intensive language courses or academic writing programmes. Consequently, the proficiency of most learners in WCF processing research falls within a range of intermediate to advanced levels. Even when beginners (Caras, 2019; Leow et al., 2022) or low-proficiency participants have been considered (Park & Kim, 2019), their cognitive development and metalinguistic knowledge are likely superior to those of younger populations. As noted by Andringa and Godfroid (2020), this type of sampling bias threatens the generalizability of results to other populations, including learners with no prior knowledge of or interest in languages, large numbers of school-aged learners, or those learning languages in the workplace, through private tuition or simply for pleasure. We suggest that research into WCF processing should replicate existing studies in different contexts and extend the range of participants to include a broader spectrum of society.

Data collection procedures

Interventionist studies have used concurrent procedures such as think-alouds (TA) (Kim & Bowles, 2019), oral languaging (Storch & Wigglesworth, 2010), note-taking (Hanaoka, 2007), or eye-tracking (Shintani & Ellis, 2013) to tap into learners' cognitive processes during the WCF processing stage. Non-concurrent measures, including stimulated recall (Adams, 2003) or written languaging (Manchón et al., 2020), have also been employed. In naturalistic contexts, qualitative data collection techniques, including retrospective verbal reports, classroom observation, student-teacher conferences, questionnaires and semi-structured interviews, have been used to elicit data on learners' engagement with WCF (e.g., Zhang & Hyland, 2018).
Yvette Coyle, Florentina Nicolás-Conesa & Lourdes Cerezo

On examining the affordances and limitations of some of the main tools employed to collect data on WCF processing, we make a distinction between concurrent and non-concurrent data elicitation procedures (see Table 1). By doing so, we draw attention to their potential to provide researchers with information on learners’ cognitive processes during the act of WCF processing. To this end, and following Schmidt (2001), we identify the extent to which different procedures are likely to offer less (A-) or more (A+) evidence of awareness. Consequently, while all procedures (see “Levels of Awareness”) minimally address awareness at a low level (A-), not all procedures are able to provide rich data on the processes and strategies learners might engage in while interacting with the feedback. Such processes and strategies might potentially reveal higher levels of awareness and metalinguistic understanding (A+). We also highlight the extent to which each procedure can shed light on levels of DoP, from low (noticing with minimum or no cognitive effort) through to medium (some degree of cognitive effort without hypothesis testing) and high (high cognitive effort, hypothesis testing and rule formation) (Leow, 2015). We suggest that non-concurrent data collection tools essentially address the product of WCF processing and not the process itself, so that the insights they offer into levels of awareness or DoP should be considered with this proviso in mind. Concurrent and non-concurrent procedures are described in greater detail below.
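By way of illustration, the three DoP levels just outlined can be expressed as a simple coding rubric. The sketch below is a hypothetical Python encoding of those decision rules; the segment fields and their names are illustrative assumptions, not part of Leow's (2015) published criteria:

```python
# Hypothetical helper for assigning a depth-of-processing (DoP) level to a
# coded protocol segment, following the three levels described in the text
# (Leow, 2015): low = noticing with minimal or no cognitive effort; medium =
# some cognitive effort without hypothesis testing; high = high cognitive
# effort, hypothesis testing and rule formation. Field names are illustrative.
from dataclasses import dataclass

@dataclass
class Segment:
    noticed: bool             # feedback feature cognitively registered
    cognitive_effort: str     # "minimal", "some", or "high"
    hypothesis_testing: bool  # learner tests a hypothesis about the form
    rule_formation: bool      # learner formulates a metalinguistic rule

def dop_level(seg: Segment) -> str:
    if seg.hypothesis_testing or seg.rule_formation:
        return "high"
    if seg.noticed and seg.cognitive_effort == "some":
        return "medium"
    if seg.noticed:
        return "low"
    return "none"
```

In practice the qualitative judgements (whether a feature was noticed, how much effort was expended) remain interpretive; a rubric of this kind only fixes the decision rules once those judgements have been made.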

Chapter 3. Methodological procedures in research on written corrective feedback processing

Table 1. Concurrent and non-concurrent data collection procedures in WCF processing research

Concurrent procedures

Think aloud
– Measure: transcribed protocols
– Evidence of learners' processing behaviours: observable; interpretable
– Impact on the learner: non-intrusive (non-metacognitive TA); learners express whatever thoughts come to mind
– Attention: yes
– Awareness: A+
– Levels of awareness: low, high
– Depth of processing: low, medium, high
– Limitations: minimally subject to reactivity depending on proficiency levels, language use and type of task

Oral languaging
– Measure: transcribed protocols
– Evidence of learners' processing behaviours: observable; interpretable
– Impact on the learner: non-intrusive; learners are left to their own devices to discuss the WCF
– Attention: yes
– Awareness: A+
– Levels of awareness: low, high
– Depth of processing: low, medium, high
– Limitations: noticing influenced by pair dynamics and levels of engagement

Note-taking
– Measure: learners' handwritten notes
– Evidence of learners' processing behaviours: observable; interpretable
– Impact on the learner: potentially intrusive when learners are provided with noticing tables which promote reasoning
– Attention: yes
– Awareness: A−
– Levels of awareness: low
– Depth of processing: low
– Limitations: insufficiently sensitive to capture all linguistic features noticed; unable to provide evidence of levels of awareness, DoP or cognitive strategies

Eye tracking
– Measure: duration, order and focus of eye movements
– Evidence of learners' processing behaviours: observable; interpretation based on assumptions
– Impact on the learner: non-intrusive
– Attention: yes, including peripheral attention
– Awareness: A−
– Levels of awareness: low
– Depth of processing: low, medium
– Limitations: unable to provide evidence of levels of awareness, DoP or cognitive strategies

Digital screen capture
– Measure: facial expression (confusion, distraction)
– Evidence of learners' processing behaviours: observable; interpretation based on assumptions
– Impact on the learner: non-intrusive
– Attention: yes, including peripheral attention
– Awareness: A−
– Levels of awareness: low
– Depth of processing: low
– Limitations: unable to provide evidence of levels of awareness, DoP or cognitive strategies

Keystroke logging (writing)
– Measure: duration and frequency of keyboard and mouse movements
– Evidence of learners' processing behaviours: observable; interpretation based on assumptions
– Impact on the learner: non-intrusive
– Attention: N/A
– Awareness: N/A
– Levels of awareness: N/A
– Depth of processing: N/A
– Limitations: unable to provide evidence of levels of awareness, DoP or cognitive strategies

Non-concurrent procedures

Stimulated recall
– Measure: transcribed protocols
– Evidence of learners' processing behaviours: observable; interpretable
– Impact on the learner: potentially intrusive if learners are prompted to engage in reasoning about an earlier activity or task
– Attention: yes
– Limitations: may constitute an additional learning experience through double input exposure; may be subject to non-veridicality due to memory decay, the time lapse between the original task and the SR, or the desire to satisfy the researcher's expectations

Written languaging
– Measure: WL tables
– Evidence of learners' processing behaviours: observable; interpretable
– Impact on the learner: potentially intrusive if learners are prompted to provide reasons (and solutions) for noticed errors
– Attention: yes
– Limitations: measures the outcome rather than depth of processing; may be subject to reactivity as learners are prompted to engage in metacognitive reasoning

Note: N/A = not applicable


Concurrent data collection procedures

Think-aloud

Concurrent think-aloud data (see Chapters 5 and 16, this volume) have provided researchers with insights into learners' processing of different types of WCF (e.g., Caras, 2019; Kim & Bowles, 2019). By having participants verbalize their thoughts non-metacognitively and reflect aloud while analysing WCF, researchers have gathered valuable data on learners' attentional processes, levels of awareness, depth of processing, and use of cognitive strategies. This procedure has been criticized, however, in relation to its potential reactivity as a data collection tool (see Chapter 6, this volume). Sachs and Polio (2007) and Buckingham and Aktug-Ekinci (2017) reported reactivity issues, whereas other studies that explicitly addressed the issue of reactivity in relation to WCF processing found the procedure to be non-reactive (e.g., Adrada-Rafael & Filgueras-Gómez, 2019; Suh, 2020). Importantly, these studies employed non-metacognitive TA, whereby participants expressed their thoughts naturally as they emerged during task performance. This coincides with Ericsson and Simon's (1993) Level 1 verbalization, which the authors identified as the least likely to cause reactivity. By contrast, the more intrusive procedure employed by Buckingham and Aktug-Ekinci (2017) prompted their participants to engage in metacognitive thinking, which, as a Level 3 verbalization, would be potentially reactive. In using TA, individual and task-related variables including learners' proficiency levels, task complexity and the procedure itself (metacognitive or non-metacognitive) are likely to influence the reliability and validity of the data (Leow et al., 2014).

Oral languaging

Languaging is the process of "shaping knowledge and experience through language" (Swain, 2006, p. 98). From this perspective, learners' reflections on and about language are theorized as creating opportunities for language learning when they engage in metalinguistic reasoning on the target of their attention. Studies of oral languaging have shed light on the extent of learners' engagement with WCF, understood as the length and quality of their interactive discussions. Levels of engagement can be extensive (multiple turns in which the speakers discuss and explain aspects of the feedback); limited (reading or repeating the feedback, expressing agreement, remaining silent); or nonexistent (Storch & Wigglesworth, 2010). In oral languaging studies, pair dynamics, learners' predisposition to talk, and their attitudes towards the feedback (in terms of acceptance or rejection) influence what is noticed and whether they are likely to learn from it. As with non-metacognitive TAs, this procedure allows learners to freely discuss whatever they find striking or problematic in the feedback and so is less likely to
suffer from reactivity than more guided forms of written languaging. It also has the potential to provide insights into the DoP and awareness underlying learners’ reactions to WCF, although studies to date have tended to code conversational patterns rather than cognitive behaviours (but see García Hernández et al., 2017).

Note-taking

The nature of TA and oral languaging generally requires researchers to meet with learners outside classroom sessions to record their deliberations without interruptions (but see Caras, 2019). Therefore, researchers interested in gathering data simultaneously with larger groups of learners inside the classroom have implemented more ecological and less time-consuming measures. Note-taking has been widely used in studies with WCF in the form of models. While useful for identifying the focus of learners' attention, the demands involved in the writing and reporting of noticing have been found to influence the veridicality of the procedure, that is, whether or not the information provided by learners is an accurate reflection of their thought processes. Despite receiving similar instructions to make a note of whatever differences they noticed in the model, Hanaoka (2007), Coyle and Roca de Larios (2020), and Kang (2020) all found that learners incorporated more features in their revised texts than reported in their written notes, which were frequently incomplete. This suggests that note-taking may be insufficiently sensitive to fully capture learners' awareness of feedback, even low-level noticing (i.e., the cognitive registration of features in the WCF), and is unlikely to reveal more than low-level DoP. Additionally, note-taking can be unreliable with young, low-proficiency learners, which explains the use of audio recordings of children's collaborative dialogue in order to triangulate data sources and enhance the reliability and validity of the findings.

Eye tracking, digital screen capture, and keystroke logging

Alternative concurrent data collection procedures such as eye-tracking and digital screen capture technology (see Chapters 7, 8, and 9, this volume) have also been employed to capture learners' cognitive processes during WCF processing (Shintani, 2015; Shintani & Ellis, 2013). Keystroke logging, to date, has been used to shed light on writing (rather than WCF) processes by measuring pausing and fluency. More recently, it has been employed non-concurrently as feedback to prompt learners' reflection on their own writing processes (Vandermeulen et al., 2020). All three data collection procedures are unobtrusive since they do not alter the characteristics of the writing or feedback processing task itself and, in the case of eye-tracking and screen capture, are useful in revealing the target of learners' attention and low-level awareness. Digital screen capture (see Chapter 7, this
volume) provides information on the focus and timing of learners’ attention to WCF, as well as instances of confusion, which suggests they might have a problem with the WCF, or distraction, which could indicate the absence of processing. Eye tracking can offer more subtle evidence of the location, duration and linearity of learners’ conscious attention to features in the WCF by measuring changes in pupil dilation, a technique which has been employed in cognitive psychology as an index of cognitive effort and processing load (Ryan et al., 2017). As such, eye-tracking may indicate low or possibly medium DoP when there is sustained eye fixation. However, the information eye tracking data can offer on the quality of learners’ feedback processing is limited. Used in isolation, it is unable to provide insights into the content of learners’ thoughts, their cognitive processing strategies or goals (see Galbraith & Vedder, 2019 for a fuller discussion). Without additional information obtained through retrospective verbal reports, which could increase the risk of reactivity due to double input exposure, measures of eye movements and of facial expression only allow researchers to make assumptions as to the nature of the cognitive processes underlying WCF processing based on these observable indicators.
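As an illustration of the pausing and fluency indices that keystroke logging yields, the sketch below derives simple measures from a timestamped log. The flat list-of-timestamps format and the two-second pause threshold are assumptions for illustration, not features of any particular logging tool:

```python
# Illustrative computation of pause and fluency measures from a keystroke log.
# Input: keystroke times in seconds, in order of occurrence (assumed format).
PAUSE_THRESHOLD = 2.0  # seconds; a common convention, not a fixed standard

def pause_and_fluency(timestamps):
    # Inter-keystroke intervals; gaps at or above the threshold count as pauses.
    intervals = [b - a for a, b in zip(timestamps, timestamps[1:])]
    pauses = [iv for iv in intervals if iv >= PAUSE_THRESHOLD]
    total = timestamps[-1] - timestamps[0] if len(timestamps) > 1 else 0.0
    return {
        "n_pauses": len(pauses),  # pause frequency
        "mean_pause": sum(pauses) / len(pauses) if pauses else 0.0,
        "keystrokes_per_min": 60.0 * (len(timestamps) - 1) / total if total else 0.0,
    }
```

For a log such as [0.0, 0.5, 1.0, 4.0, 4.5], the function reports a single three-second pause alongside an overall fluency rate, which is the kind of observable indicator, rather than direct evidence of cognition, that the table above attributes to keystroke logging.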

Non-concurrent data collection procedures

Stimulated recall

Stimulated recall is a retrospective data elicitation technique in which participants are prompted to explain their thought processes aloud in reaction to a video or audio recording of themselves processing WCF at an earlier stage (see Chapter 5, this volume). Learners are often encouraged to justify their responses to the feedback when the researcher stops the video at key moments, usually selected in advance. Having participants explain their thought processes about a previous task performance can provide post-hoc insights into their levels of awareness and DoP. However, in doing so, learners are effectively required to process the same input for a second time, a procedure which is likely to affect potential learning outcomes of WCF by increasing their attention to language. Stimulated recall may also suffer from problems related to the veridicality of the data. It has been suggested that learners' performance might be subject to memory loss when the time lapse between the original task and the recall task is greater than 48 hours (Gass & Mackey, 2016). Further concerns relate to the truth value of the data, since there is also a risk that participants might verbalize what they are thinking during the recall activity itself rather than reflecting on what they had been thinking at the time of the earlier feedback task (Polio, 2017).


Written languaging

Written languaging requires learners to provide a retrospective written explanation or justification of the corrections provided on their writing after receiving WCF (see Chapters 6 and 16, this volume). The data elicited through this procedure represent the visible outcome of learners' internal processing of WCF as opposed to their thought processes as they unfold in real time (Manchón et al., 2020). Written languaging research has placed differing constraints on learners by employing procedures which vary in their degree of explicitness. These range from open-ended questions (Suzuki, 2017) to highly structured languaging tables which require learners to copy out their errors and the teacher's corrections, identify the error category and provide a corresponding metalinguistic explanation (Manchón et al., 2020). The explicit nature of some written languaging protocols is instrumental in encouraging learners to engage more deeply in language processing. While this is intentional, researchers need to be aware that by promoting metacognitive reflection, written languaging is more intrusive on learners' thought processes and thus more prone to reactivity than non-metacognitive procedures. Written languaging also fails to provide detailed information on learners' use of cognitive strategies during feedback processing, since students simply copy out errors and attempt to explain them. This limitation might be mitigated by simultaneously examining learners' verbalizations while completing languaging tables. Limitations may also arise due to the time lapse between the languaging activity and the subsequent writing task. In some studies participants revised/rewrote their texts immediately after engaging with WCF (Manchón et al., 2020), while in others there was an interval of several days between each session (Cerezo et al., 2019).
The immediacy of the revision task is likely to place differing demands on memory and to affect the extent to which learners can recall the previously languaged forms. Recall is likely to be enhanced when there is a shorter interval between languaging and the revision/rewriting of texts.
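A highly structured languaging table of the kind described above is, in effect, tabular data and can be stored as such for subsequent coding. The column names and the example row below are hypothetical illustrations, not a published instrument:

```python
# Hypothetical representation of one row of a structured written-languaging
# table: the learner copies the error and the correction, identifies the error
# category, and supplies a metalinguistic explanation (as described in the text).
import csv
import io

FIELDS = ["error", "correction", "error_category", "metalinguistic_explanation"]

rows = [{
    "error": "He go to school",  # invented example sentence
    "correction": "He goes to school",
    "error_category": "subject-verb agreement",
    "metalinguistic_explanation": "Third-person singular present verbs take -s.",
}]

# Export to CSV so that coded tables can be collated across participants.
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=FIELDS)
writer.writeheader()
writer.writerows(rows)
table_csv = buf.getvalue()
```

Keeping languaging data in a uniform tabular format of this sort also eases the kind of cross-study comparison of coding schemes called for later in the chapter.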

Questionnaires and interview data

In naturalistic studies, multiple data sources including observation, questionnaires, conferencing, and interviews (see Chapter 4, this volume) have been used to explore why and how learners engage with feedback. While informative and ecologically valid, the quality of the information obtained from these data collection instruments depends on the reliability of participants' self-reports. There is a chance that veridicality could be compromised if learners were to suffer from memory decay when recollecting their engagement with WCF, or if they developed new perspectives as a result of questionnaire prompting and/or the discussion generated with the researcher. In studies using classroom observation (Han,
2019), the information gathered about learners’ engagement with feedback largely depends on the researchers’ personal interpretation of events.

Data analysis procedures

Operationalization and coding of WCF processing

WCF processing has been operationalized in different ways as a function of the theoretical assumptions underpinning the research. Drawing on SLA theories of output and noticing, learners' processing of WCF has been coded either as Language-Related Episodes, defined as "any part of the dialogue where learners talk about the language they produced, and reflect on their language use" (Swain & Lapkin, 2002, p. 292), or as the Problematic Features Noticed by learners while writing and the Features Noticed during feedback analysis (Hanaoka, 2007). Other studies have employed Leow's (2015) conception of DoP to examine learners' levels of awareness, use of previous knowledge, cognitive effort, hypothesis testing and rule formation using more finely-grained multiple-level coding schemes (Suh, 2020). Overcoming the limitations in the constructs and analytical tools employed in coding learners' processing of WCF constitutes another methodological challenge for a future research agenda. Language-Related Episodes and Features Noticed, for example, have been used (i) to quantify the linguistic focus and outcome of learners' attention while processing feedback (Hanaoka, 2007); (ii) to conceptualize substantive or perfunctory noticing (Qi & Lapkin, 2001); or (iii) to describe engagement with WCF as the length and degree of elaboration of interactional patterns (Storch & Wigglesworth, 2010). However, as units of analysis, they have not fully captured the cognitive processes that underlie learners' attempts to analyze mismatches between their own writing and noticed feedback (García Hernández et al., 2017). At the same time, the operationalization of WCF processing in available coding schemes reflects increasingly complex yet diverse conceptualizations of the phenomenon.
While some researchers have analysed learners' awareness as a dichotomy involving levels of noticing and understanding (e.g., Suzuki, 2017), others have analysed the cognitive effort involved in processing WCF (Caras, 2019) or a combination of DoP, cognitive effort and awareness (Park & Kim, 2019). As a result, different but complementary phenomena (e.g., linguistic reasoning, time spent on the task, or actions taken like reading) have been coded under the same encompassing umbrella of DoP. Given this variation, agreement as to the precise nature of DoP would facilitate more rigorous methodological decision-making, including the selection or combination of appropriate data collection instruments, and the development of more fitting analytical schemes which refine Leow's (2015) original DoP criteria.

Future studies might draw on recent developments in L2 cognitive writing research. For instance, López-Serrano et al. (2019) provide an intricate description of L2 writers' cognitive processes while composing. Their methodological proposal includes the reconceptualization of Language-Related Episodes as Language-Related Problem Spaces that comprise the clusters of strategies learners activate when attempting to solve language problems while writing (e.g., lexical, spelling and morphological searches, generation of cross-linguistic alternatives, appeals to episodic memory). The combined analysis of learners' strategy clusters, together with the degree of metalinguistic analysis involved, and the upgrading or compensatory orientation of the problem resolution, could prove to be a useful framework for researchers concerned with obtaining a more in-depth understanding of the nature and depth of the processing learners engage in while analyzing WCF (see also Roca de Larios et al., 2021).

Two recent studies (Coyle & Roca de Larios, 2020; García Hernández et al., 2017) have highlighted the theoretical and empirical importance of exploring the connection between the strategic nature of feedback processing and L2 learning. Methodologically, these studies suggest that adopting a microanalytic lens to focus on the cognitive processes taking place during writing and WCF analysis might enable researchers to provide insightful explanations of why some learners appear to benefit more than others from WCF. Empirically-based knowledge derived from this line of enquiry could then be used in classrooms to promote more strategic and self-regulated L2 learning.
This move towards the refinement and complexification of the nature of WCF processing is also apparent in naturalistic research where, as noted above, the construct of engagement with WCF has been used in a broader sense that goes beyond purely cognitive or dialogic interpretations of WCF processing (Zhang & Hyland, 2018). Learners' personal engagement with feedback is also inherent to the concept of feedback literacy, defined as the "understanding of what feedback is and how it can be managed" (Carless & Boud, 2018, p. 2). It has been suggested that learners who are trained to value and appreciate feedback and to understand their active role in a reciprocal partnership with their teachers become empowered to engage more meaningfully with the feedback they receive (Nash & Winstone, 2017). Chong (2020) situates feedback literacy at the interface between engagement (cognitive, behavioural, affective), contextual (e.g., textual or interactional) and individual (e.g., past experiences) factors. Introducing this expanded conceptualization of feedback literacy into L2 WCF processing research by exploring the combined interaction of these components in learners' responses to feedback provision could advance our knowledge of how they impact L2 learning, and how effectively designed learning environments might enhance learning opportunities (Carless & Winstone, 2020).

Conclusions and future directions

The aim of this chapter has been to provide a comprehensive and critical overview of the methodological procedures employed in research on WCF processing. We now offer some suggestions of the ways in which future research might continue to develop our understanding of the potential of WCF processing in facilitating L2 learning.

Firstly, research agendas should be expanded to include a wider range of populations beyond the standard learner profiles investigated to date. WCF processing research has focused for too long on university students with language or linguistics backgrounds and has neglected other learners at different educational stages, or those with specific interests, abilities and learning goals. Researchers need to venture beyond the comfort zone of their own degree programmes, language courses, and writing centres and actively seek out opportunities to investigate participants of different ages from diverse educational, professional, or non-academic contexts in order to help make WCF processing research more comprehensive and inclusive.

At the same time, the field is currently saturated with one-shot, cross-sectional studies. More longitudinal research designs that investigate the processing of different types and combinations of feedback, specifically attuned to learners' changing needs at different points in time, might offer deeper insights into L2 learning as a dynamic occurrence (Storch, 2018). This means focusing not only on learners' processing of single WCF episodes, but across successive drafts, revisions and new texts.

Added to this, there is a need for more classroom-based research which could shed light on how WCF processing contributes to L2 learning as part of a regular curriculum embedded in the context in which instruction takes place (Manchón & Leow, 2020).
This might require redesigned data collection instruments and analytical procedures undertaken from an ISLA applied (Leow, 2019) angle, whereby the research emphasis would be less on accuracy and more on the processing of actual errors and on tracking how WCF contributes to learners’ gradual and on-going L2 development throughout the course of instruction. This poses a complex challenge for researchers, since writing is only one of four components in a language curriculum and any attempt to relate WCF processing with subsequent writing performance will inevitably be conflated with events in the classroom setting. The triangulation of WCF processing data and learners’
written texts with corpus data obtained from classroom interactions and observations might facilitate such an inquiry, especially in FL settings where input is more likely to be restricted to classroom talk and materials (Polio, 2017). Increased attention to L2 writing as it occurs in authentic classrooms calls, in turn, for more research into the role of individual differences in mediating WCF processing (Leow & Manchón, 2021). In addition to research on cognitive variables (e.g., working memory), greater attention should be paid to the ways in which affective and attitudinal factors influence how learners engage with WCF. Building on research from first language contexts (e.g., Carless & Boud, 2018), future ISLA-oriented studies might adopt a more propaedeutic view of learners’ engagement with WCF, whereby the development of feedback literacy becomes a key area of inquiry. If we agree that L2 writers should be trained to develop the “cognitive capacity, socio-affective capacity, and socio-affective disposition that prepares them for engaging with WCF” (Han & Xu, 2021, p. 3), then both the process of feedback literacy training and its potential impact on L2 writing and development should be a central concern in future research. Another crucial challenge facing the field is the need to address learners’ engagement with digital and multimodal feedback. Existing research has focused mainly on the effects on performance of digital feedback provision (e.g., Elola & Oskoz, 2016) and comparative studies of pen and paper versus digital writing are rare. Additionally, despite the recent interest in digital L2 literacies and multimodal composition, both from a creative design (e.g., Smith et al., 2017, and Chapter 13, this volume) and a language-learning perspective (Kim & Belcher, 2020), the role of feedback processing in contributing to both these dimensions is still an empirical question. 
Recent assessment proposals for digital multimodal composing (Hafner & Ho, 2020) emphasize the complex processes involved in planning, designing, sharing and reflecting on learners' multimodal ensembles. Feedback processing at any of these stages is likely to enhance students' awareness of how to orchestrate linguistic and non-linguistic resources effectively to achieve their rhetorical goals. Research which could elucidate this domain is imperative to advance a still underexplored yet growing area of interest.

Methodologically, eliciting data on learners' engagement with feedback presents important challenges for researchers, who need to be fully aware of the affordances and limitations associated with different data collection instruments. As we have suggested above, concurrent and non-concurrent measures provide different insights into learners' awareness and DoP as a function of their temporality, offering either on-line or post-hoc reflections on language. Similarly, techniques based on observable behaviour (e.g., pauses, eye gazes) can only ever allow researchers to make assumptions about learners' internal processing of WCF, in comparison to other techniques which involve rich cognitive introspection (e.g., think-aloud). Clearly, data triangulation is vital in improving the quality of the information obtained as well as the reliability and validity of research findings. WCF processing research might also benefit from the insights that could be offered by controlled studies that actively compare the affordances of different combinations of data collection tools to successfully uncover learners' processing behaviour (see Chapter 16, this volume).

Future research agendas should attempt to enhance the comparability of findings across studies. This means achieving greater uniformity and refinement of the constructs used and the coding schemes employed. Existing analytical units, including language-related episodes and DoP, could be more finely tuned by drawing on research on learners' strategic problem-solving (López-Serrano et al., 2019; Roca de Larios et al., 2021) to provide more nuanced descriptions of cognitive processing. Likewise, a more subtle approach to coding awareness and its effects on written outcomes requires the incorporation of additional analytical categories that might adequately account for partial or unreported noticing and consider even small signs of progress in learners' language use (Coyle et al., 2018). This is important in drawing attention to the dynamic and on-going nature of the language learning that derives from WCF processing. Finally, to gain deeper insights into their potential effects on feedback processing and on L2 development, learner, contextual and task-related variables should be investigated using research designs that involve both descriptive and hypothesis-testing goals in controlled and classroom-based studies. Likewise, naturalistic research needs to examine learners' engagement from the wider angle of their feedback literacy in a greater variety of settings with a diverse range of L2 learners.

Acknowledgements

The research synthesis reported on in this chapter is part of a wider research programme financed by the Spanish Ministry of Science and Innovation (Research Grant PID2019-104353GB-100) and the Séneca Foundation (Research Grant 20832/PI/18).

References

Adams, R. (2003). L2 output, reformulation and noticing: Implications for IL development. Language Teaching Research, 7(3), 347–376.
Adrada-Rafael, S., & Filgueras-Gómez, M. (2019). Reactivity, language of think-aloud protocol, and depth of processing in the processing of reformulated feedback. In R. P. Leow (Ed.), The Routledge handbook of second language research in classroom learning (pp. 199–211). Routledge.


Yvette Coyle, Florentina Nicolás-Conesa & Lourdes Cerezo

Andringa, S., & Godfroid, A. (2020). Sampling bias and the problem of generalizability in Applied Linguistics. Annual Review of Applied Linguistics, 40, 134–142.
Bowles, M., & Gastañaga, K. (2022). Heritage, second and third language learner processing of written corrective feedback: Evidence from think-alouds. Studies in Second Language Learning and Teaching, 12(4), 677–698.
Buckingham, L., & Aktug-Ekinci, D. (2017). Interpreting coded feedback on writing: Turkish EFL students’ approaches to revision. Journal of English for Academic Purposes, 26, 1–16.
Caras, A. (2019). Written corrective feedback in compositions and the role of depth of processing. In R. P. Leow (Ed.), The Routledge handbook of second language research in classroom learning (pp. 186–198). Routledge.
Carless, D., & Boud, D. (2018). The development of student feedback literacy: Enabling uptake of feedback. Assessment & Evaluation in Higher Education, 43(8), 1315–1325.
Carless, D., & Winstone, N. (2020). Teacher feedback literacy and its interplay with student feedback literacy. Teaching in Higher Education, 28(1), 150–163.
Cerezo, L., Manchón, R. M., & Nicolás-Conesa, F. (2019). What do learners notice while processing written corrective feedback? A look at depth of processing via written languaging. In R. P. Leow (Ed.), The Routledge handbook of second language research in classroom learning (pp. 173–187). Routledge.
Chong, S. W. (2020). Reconsidering student feedback literacy from an ecological perspective. Assessment & Evaluation in Higher Education, 46(1), 1–14.
Coyle, Y., & Roca de Larios, J. (2014). Exploring the role played by error correction and models on children’s reported noticing and output production in a L2 writing task. Studies in Second Language Acquisition, 36(3), 451–485.
Coyle, Y., & Roca de Larios, J. (2020). Exploring young learners’ engagement with models as a written corrective feedback technique in EFL and CLIL settings. System, 95, 1–14.
Coyle, Y., Cánovas Guirao, J., & Roca de Larios, J. (2018). Identifying the trajectories of young EFL learners across multi-stage writing and feedback processing tasks with model texts. Journal of Second Language Writing, 42, 25–43.
DeKeyser, R., & Prieto Botana, G. (Eds.). (2019). Doing SLA research with implications for the classroom: Reconciling methodological demands and pedagogical applicability. John Benjamins.
Elola, I., & Oskoz, A. (2016). Supporting second language writing using multimodal feedback. Foreign Language Annals, 49(1), 58–74.
Elola, I., & Oskoz, A. (2022). Reexamining feedback on L2 digital writing. Studies in Second Language Learning and Teaching, 12(4), 575–595.
Ericsson, K. A., & Simon, H. A. (1993). Protocol analysis: Verbal reports as data. The MIT Press.
Ferris, D. R., Liu, H., Sinha, A., & Senna, M. (2013). Written corrective feedback for individual L2 writers. Journal of Second Language Writing, 22(3), 307–329.
Fukuta, J., Tamura, Y., & Kawaguchi, Y. (2019). Written languaging with indirect feedback in writing revision: Is feedback always effective? Language Awareness, 28(1), 1–14.
Galbraith, D., & Vedder, I. (2019). Methodological advances in investigating L2 writing processes: Challenges and perspectives. Studies in Second Language Acquisition, 41(3), 633–645.
García Hernández, F. J., Roca de Larios, J., & Coyle, Y. (2017). Exploring the effect of reformulation on the problem-solving strategies of young EFL writers. In M. P. García Mayo (Ed.), Learning foreign languages in primary school: Research insights (pp. 193–222). Multilingual Matters.
García Mayo, M. P., & Loidi Labandibar, U. (2017). The use of models as written corrective feedback in EFL writing. Annual Review of Applied Linguistics, 37, 110–127.
Gass, S. M., & Mackey, A. (2016). Stimulated recall methodology in applied linguistics and L2 research. Routledge.
Hafner, C. A., & Ho, W. Y. J. (2020). Assessing digital multimodal composing in second language writing: Towards a process-based model. Journal of Second Language Writing, 47.
Han, Y. (2017). Mediating and being mediated: Learner beliefs and learner engagement with written corrective feedback. System, 69, 133–142.
Han, Y. (2019). Written corrective feedback from an ecological perspective: The interaction between the context and individual learners. System, 80, 288–303.
Han, Y., & Hyland, F. (2015). Exploring learner engagement with written corrective feedback in a Chinese tertiary EFL classroom. Journal of Second Language Writing, 30, 31–44.
Han, Y., & Hyland, F. (2019). Learner engagement with written feedback: A sociocognitive perspective. In K. Hyland & F. Hyland (Eds.), Feedback in second language writing: Contexts and issues (2nd ed., pp. 247–264). Cambridge University Press.
Han, Y., & Xu, Y. (2021). Student feedback literacy and engagement with feedback: A case study of Chinese undergraduate students. Teaching in Higher Education, 26(2), 181–196.
Hanaoka, O. (2007). Output, noticing, and learning: An investigation into the role of spontaneous attention to form in a four-stage writing task. Language Teaching Research, 11(4), 459–479.
Hanaoka, O., & Izumi, S. (2012). Noticing and uptake: Addressing pre-articulated covert problems in L2 writing. Journal of Second Language Writing, 21(4), 332–347.
Kang, E. Y. (2020). Using model texts as a form of feedback in L2 writing. System, 89, 1–10.
Kim, Y., & Belcher, D. (2020). Multimodal composing and traditional essays: Linguistic performance and learner perceptions. RELC Journal, 51(1), 86–100.
Kim, H. R., & Bowles, M. (2019). How deeply do second language learners process written corrective feedback? Insights gained from think-alouds. TESOL Quarterly, 53(4), 913–938.
Koltovskaia, S. (2020). Student engagement with automated written corrective feedback (AWCF) provided by Grammarly: A multiple case study. Assessing Writing, 44, 100450.
Leow, R. P. (2015). Explicit learning in the L2 classroom: A student-centered approach. Routledge.
Leow, R. P. (2019). From SLA > ISLA > ILL: A curricular/pedagogical perspective. In R. P. Leow (Ed.), The Routledge handbook of second language research in classroom learning (pp. 483–491). Routledge.
Leow, R. P., & Manchón, R. M. (2021). Expanding research agendas: Directions for future research agendas on writing, WCF, language learning and ISLA. In R. M. Manchón & C. Polio (Eds.), The Routledge handbook of second language acquisition and writing (pp. 299–311). Routledge.
Leow, R. P., Grey, S., Marijuan, S., & Moorman, C. (2014). Concurrent data elicitation procedures, processes, and the early stages of L2 learning: A critical overview. Second Language Research, 30(2), 111–127.
Leow, R. P., Thinglum, A., & Leow, S. A. (2022). WCF processing in the L2 curriculum: A look at type of WCF, type of linguistic item, and L2 performance. Studies in Second Language Learning and Teaching, 14(2), 653–675.
López-Serrano, S., Roca de Larios, J., & Manchón, R. M. (2019). Language reflection fostered by individual L2 writing tasks: Developing a theoretically motivated and empirically based coding system. Studies in Second Language Acquisition, 41(3), 503–527.
Manchón, R. M., & Leow, R. P. (2020). An ISLA perspective on L2 learning through writing. In R. M. Manchón (Ed.), Writing and language learning: Advancing research agendas (pp. 335–356). John Benjamins.
Manchón, R. M., Nicolás-Conesa, F., Cerezo, L., & Criado, R. (2020). L2 writers’ processing of written corrective feedback: Depth of processing via written languaging. In W. Suzuki & N. Storch (Eds.), Languaging in language learning and teaching: A collection of empirical studies (pp. 241–265). John Benjamins.
Moradian, M. R., Hossein-Nasab, M., & Miri, M. (2020). Effects of written languaging in response to direct and indirect corrective feedback on developing writing accuracy. In W. Suzuki & N. Storch (Eds.), Languaging in language learning and teaching: A collection of empirical studies (pp. 267–286). John Benjamins.
Nash, R. A., & Winstone, N. E. (2017). Responsibility-sharing in the giving and receiving of assessment feedback. Frontiers in Psychology, 8, 1519.
Oskoz, A., & Elola, I. (2020). Digital L2 writing literacies: Directions for classroom practice. Equinox.
Park, E. S., & Kim, O. Y. (2019). Learners’ engagement with indirect written corrective feedback: Depth of processing and self-correction. In R. P. Leow (Ed.), The Routledge handbook of second language research in classroom learning (pp. 212–226). Routledge.
Polio, C. (2012). The relevance of second language acquisition theory to the written error correction debate. Journal of Second Language Writing, 21(4), 375–389.
Polio, C. (2017). Second language writing development: A research agenda. Language Teaching, 50(2), 261–275.
Qi, D. S., & Lapkin, S. (2001). Exploring the role of noticing in a three-stage second language writing task. Journal of Second Language Writing, 10, 277–303.
Ranalli, J. (2021). L2 student engagement with automated feedback on writing: Potential for learning and issues of trust. Journal of Second Language Writing, 52, 100816.
Roca de Larios, J., & Coyle, Y. (2021). Learners’ engagement with written corrective feedback in individual and collaborative L2 writing conditions. In R. M. Manchón & C. Polio (Eds.), The Routledge handbook of second language acquisition and writing (pp. 81–93). Routledge.
Roca de Larios, J., García Hernández, F. J., & Coyle, Y. (2021). A theoretically-grounded classification of EFL children’s formulation strategies in collaborative writing. Language Teaching for Young Learners, 3(2), 300–336.
Ryan, K., Hamrick, P., Miller, R. T., & Was, C. A. (2017). Salience, cognitive effort, and word learning: Insights from pupillometry. In S. Gass, P. Spinner, & J. Behney (Eds.), Salience in second language acquisition (pp. 187–200). Routledge.
Sachs, R., & Polio, C. (2007). Learners’ uses of two types of written feedback on a L2 writing revision task. Studies in Second Language Acquisition, 29(1), 67–100.
Schmidt, R. W. (1990). The role of consciousness in second language learning. Applied Linguistics, 11(2), 129–158.
Schmidt, R. W. (2001). Attention. In P. Robinson (Ed.), Cognition and second language instruction (pp. 3–32). Cambridge University Press.
Shintani, N. (2015). The effects of computer-mediated synchronous and asynchronous direct corrective feedback on writing: A case study. Computer Assisted Language Learning, 29(3), 1–22.
Shintani, N., & Ellis, R. (2013). The comparative effect of direct written corrective feedback and metalinguistic explanation on learners’ explicit and implicit knowledge of the English indefinite article. Journal of Second Language Writing, 22(3), 286–306.
Simard, D., Guénette, D., & Bergeron, A. (2015). L2 learners’ interpretation and understanding of written corrective feedback: Insights from their metalinguistic reflections. Language Awareness, 24(3), 233–254.
Smith, B., Pacheco, M., & de Almeida, C. R. (2017). Multimodal codemeshing: Bilingual adolescents’ processes composing across modes and languages. Journal of Second Language Writing, 36, 6–22.
Storch, N. (2018). Written corrective feedback from sociocultural theoretical perspectives: A research agenda. Language Teaching, 51(2), 262–277.
Storch, N., & Alsuraidah, A. (2020). Language when providing and processing peer feedback. In W. Suzuki & N. Storch (Eds.), Languaging in language learning and teaching: A collection of empirical studies (pp. 111–128). John Benjamins.
Storch, N., & Wigglesworth, G. (2010). Learners’ processing, uptake, and retention of corrective feedback on writing: Case studies. Studies in Second Language Acquisition, 32, 303–334.
Suh, B. R. (2020). Are think-alouds reactive? Evidence from an L2 written corrective feedback study. Language Teaching Research, 1–21.
Suzuki, W. (2017). The effect of quality of written languaging on second language learning. Writing & Pedagogy, 8(3), 1–34.
Swain, M. (2006). Languaging, agency and collaboration in advanced second language proficiency. In H. Byrnes (Ed.), Advanced language learning: The contribution of Halliday and Vygotsky (pp. 95–108). Continuum.
Swain, M., & Lapkin, S. (2002). Talking it through: Two French immersion learners’ responses to reformulation. International Journal of Educational Research, 37, 285–304.
Vandermeulen, N., Leijten, M., & Van Waes, L. (2020). Reporting writing process feedback in the classroom using keystroke logging data to reflect on writing processes. Journal of Writing Research, 12(1), 109–139.
Yi, Y., Shin, D. S., & Cimasko, T. (2020). Special issue: Multimodal composing in multilingual learning and teaching contexts. Journal of Second Language Writing, 47, 1–6.
Zhang, Z. V., & Hyland, K. (2018). Student engagement with teacher and automated feedback on L2 writing. Assessing Writing, 36, 90–102.


Part II

Critical reflections on the affordances of data collection instruments and procedures

Chapter 4

Survey data
Questionnaires, interviews, and process logs

Sofia Hort & Olena Vasylets

Mälardalen University | University of Barcelona

This chapter focuses on survey methods for studying writing processes. We specifically focus on questionnaires, interviews, and process logs, all of which are subjective self-report instruments. We start by describing the affordances of these data collection procedures and review some relevant L1 and L2 studies which have employed them to study writing processes. We then address some methodological concerns in the use of the self-report instruments discussed in the chapter and finish by suggesting avenues for future research.

Characteristics of survey data elicitation procedures

Self-report instruments, which elicit writers’ thoughts on behaviors and/or mental activities involved in composing, constitute the most traditional means to explore writing processes. Indeed, the seminal model of the writing process by Flower and Hayes (1981) was elaborated out of a verbal protocol of one adult participant, who was asked to self-report on his mental processes during expository text writing. Flower and Hayes’s (1981) study, which is still influential, provides an example of the potential of self-report instruments in the study of writing processes. The family of self-report techniques is extensive (see Chapters 5 and 6, this volume). The specific focus of this chapter is questionnaires, interviews, and process logs, which have gained prominence in both L1 and L2 writing research. In what follows, we define the characteristics of these instruments, exemplify their use in empirical writing process research, and provide a summary of the research questions explored.

https://doi.org/10.1075/rmal.5.04hor © 2023 John Benjamins Publishing Company

Writing processes through the lens of questionnaires

A questionnaire can be defined as an instrument consisting of a number of questions and/or statements. It is designed to elicit responses which can be turned into measures of the construct/variable under investigation (Dörnyei & Taguchi, 2010). Broadly speaking, questionnaires can provide data on the respondent’s background (e.g., age, race, gender), behaviour (e.g., frequency of use of a particular writing strategy), and knowledge, as well as beliefs, interests, values, opinions and attitudes (e.g., beliefs about the importance of revision in writing). Questionnaire items can be open-ended (allowing the respondent to answer freely, within their own frame of reference) and/or close-ended (respondents have to choose between a set of pre-defined response options). A common way to design a questionnaire is to use a Likert scale, which asks respondents to indicate their level of agreement on a metric scale, from strongly agree to strongly disagree (Clark & Watson, 2019). There is a long tradition of studying L1 and L2 writing by means of questionnaires (e.g., Berdanier & Zerbe, 2018; Biggs, 1988; Chitez et al., 2015; Lavelle, 1993; Lonka, 1996; Melo et al., 2019; Torrance et al., 1994; Torrance et al., 2000). These studies are often grounded in educational psychology and are frequently inspired by questionnaires focusing on students’ learning strategies (e.g., Biggs, 1988; Lavelle, 1993). The use of questionnaires has been especially popular in the context of L1 writing, with the Writing Process Questionnaire (Lonka, 1996) being one of the most widely used instruments. The questionnaire contains items measuring writing blocks (e.g., “I sometimes get completely stuck if I have to produce texts”), knowledge transformation (e.g., “Writing develops thinking”), productivity (e.g., “I am a regular and productive writer”), procrastination (e.g., “I often postpone writing tasks until the last moment”), perfectionism (e.g., “I could revise my texts endlessly”), and innate writing ability (e.g., “Writing is a skill that cannot be taught”). The large-scale studies by Lonka et al.
(2014, 2019), which involved 669 and 664 participants respectively, used this questionnaire to explore the relationship between PhD students’ conceptions of academic writing and their well-being. The structure of the Writing Process Questionnaire has also recently been validated for the L1 Spanish-speaking context by Cerrato-Lara et al. (2017). Another example is the European Writing Survey (EUWRIT project), which focused on the L1 writing of students in higher education contexts (Chitez & Kruse, 2012; Chitez et al., 2015; Delcambre & Donahue, 2012). This survey, which was administered to students in various countries, includes general questions on writing in the study program, questions on text genres and writing practices, and items on self-evaluation of writing skills and conceptions of “good writing”. Under the heading of writing process, the questionnaire asks participants to assess on a five-point scale the importance of writing an outline, brainstorming, planning, reading, drafting and revision. One of the relevant findings of the EUWRIT project was that, in the German context, for
example, the students assigned the highest score to reading activity, although they were also aware of the importance of other writing processes (Chitez et al., 2015). Further, Lavelle and colleagues (Lavelle, 1993, 1997; Lavelle et al., 2013; Lavelle & Bushrow, 2007; Lavelle & Guarino, 2003; Lavelle & Zuercher, 2001) developed and validated the Inventory of Processes of College Composition (IPIC) questionnaire, grounding it in the deep and surface learning model (Biggs, 1987; Schmeck et al., 1991) as well as in Biggs’s extension of that model to writing (Biggs, 1988). In this model, the surface approach is associated with extrinsic motivation, rote learning, and strategies aiming to reduce the task to its essentials. In contrast, the deep approach entails deep personal involvement with the task and the use of strategies seeking to achieve integration between different task components. Working from a psychometric perspective, Lavelle’s (1993) study further adapted the deep-surface approaches to writing. The use of the IPIC questionnaire made it possible to distinguish between five scales, each corresponding to a different L1 writing style: Elaboration (a deep writing style marked by the need for self-expression and a dynamic conception of the writing process), Low Self-Efficacy (characterized by thinking about writing as a painful task), Reflective-Revision (a deep style based on a sophisticated understanding of revision and willingness to engage in that process), Spontaneous-Impulsive (characterized by impulsive and unplanned strategies) and Procedural style (a method-driven style based on strict adherence to the rules). Validity studies further supported the relationship between the IPIC scale scores and freshman L1 writing grades (Lavelle, 1993), writing self-concept (Lavelle & Zuercher, 2001) and the development of L1 writing skills at the graduate level (Lavelle & Bushrow, 2007). The study by Biggs et al.
(1999) was the first to employ the IPIC questionnaire in the context of L2 writing. In this case, the scores of the questionnaire were interpreted as a dependent variable. The aim of the study was to explore whether intensive writing instruction could get L2 writers with maladaptive approaches (such as spontaneous-impulsive or procedural) to adopt more adaptive ones (such as reflective-revisionist or elaborative). The participants (25 Hong Kong university students) took part in a two-day writing workshop and completed the IPIC questionnaire at the beginning and at the end of the workshop. The pre- and post-test comparison of the IPIC scores showed positive changes in the writing processes, as the students reported that they became more elaborationist, less spontaneous-impulsive, and less procedural in the way they went about writing after the intervention. There were no differences in reflective revision or self-efficacy, which was attributed to the short-term nature of the training. An example of a questionnaire created specifically for the L2 context is the Writing Strategy Questionnaire by Petrić and Czárl (2003), who employed both qualitative and quantitative methods to validate this instrument. In the questionnaire, writing strategies, defined as actions or behaviors consciously carried out in order to improve the efficiency of writing (Cohen, 1998), were connected to the processes of planning, formulation, and revision. In order to provide a clear frame of reference to respondents, the items in the questionnaire were sequenced following the structure of the writing process. Thus, the original questionnaire included 8 items tapping into planning strategies, 14 items related to while-writing (i.e., formulation) strategies and 16 items for revising strategies. Some additional items were also designed to reflect the recursive nature of the writing process (e.g., “I go back to my outline and make changes in it”) or to indicate that a respondent did not employ any planning or revising strategies (e.g., “When I have written my paper, I hand it in without reading it”). The recent study by Jang and Lee (2019) employed this questionnaire to explore the effects of the L2 ideal self (the self individuals would like to become) and the ought-to self (the attributes one believes one ought to possess) on the use of writing strategies and writing quality. Sixty-eight Korean learners with a low level of L2 English proficiency completed a motivation questionnaire, a descriptive essay and a writing strategies questionnaire. The results showed that the L2 ideal self had a significant positive effect on both planning strategy use and writing outcomes. Another recent example of the use of this questionnaire is the study by Zhi and Huang (2021), which explored the perceived authenticity (i.e., real-life correspondence) of L2 writing processes in paper-based versus computer-based ESL writing tests. As a first step, all the participants completed the Writing Strategy Questionnaire. Then, the participants completed two timed writing tasks in both computer-based and paper-based modes.
Immediately after the performance of the writing tasks, a cognitive processing questionnaire, designed to capture writers’ cognitive processes during writing (Chan et al., 2017), was administered. At the end of the study, semi-structured interviews were conducted with each participant. The interview included five main questions tapping into learners’ perceptions of the authenticity of their writing processes. Data elicited from the Writing Strategy Questionnaire were correlated with the data from the questionnaire on cognitive processing. The purpose of this correlation was to examine the correspondence between the writing processes and strategies used in the computer- and paper-based conditions and the writing processes in real-life academic tasks. The data from the interviews were analyzed qualitatively. One of the main findings of this study was that computer-based writing was perceived by learners as more authentic, particularly in the area of revision. In addition, more participants planned ideas before composing and revised more in the computer-based mode. Another relevant example of the use of a self-report questionnaire to explore L2 writers’ cognition is the study by Wei et al. (2020). This study focused on
rhetorical transfer, operationalized as the reshaping of L1 rhetorical knowledge in L2 writing. According to Wei et al. (2020), as an inner operation, “rhetorical transfer might become manifest merely in writers’ composing processes” (p. 2). Data were collected from 89 Chinese L2 English learners. Immediately after the completion of an argumentative essay in L2 English, the participants completed a questionnaire that solicited their recall of Chinese-to-English rhetorical transfer processes. The items in the questionnaire were presented in the form of a five-point Likert scale. The questionnaire tapped into the influence of L1 Chinese rhetorical features and the structure of Chinese argumentative writing on L2 English writing. To complement the data from the questionnaire, the researchers administered a follow-up interview which probed into the participants’ opinions about the role of Chinese L1 rhetoric in their EFL composing processes. The study found that L1 rhetorical transfer was positively associated with the learners’ perceptions of English writing difficulty, providing evidence that L2 writers resorted to L1 knowledge when they found it difficult to manage their L2 composing processes. Another relevant study is the one by Manchón and Roca de Larios (2007), who combined verbal protocols and questionnaires to explore L2 writers’ planning behaviours. The study addressed various research questions, including the comparison of time allocated to planning in L1 and L2 writing, the role of proficiency in time management during planning, and the time dedicated to planning at different stages of composing in L1 and L2 writing. One of the data sources was the think-aloud protocols completed by 21 Spanish EFL learners while performing two argumentative tasks in their L1 and L2.
Additionally, the participants completed a post-task questionnaire, which included, among other items, questions tapping into the planning processes (e.g., What type of planning have you done in this composition? Generally, do you plan the way you have done today?). The study contributed new findings on the important role of L2 proficiency in planning behaviors.

In sum, questionnaires have been employed to study writing processes in both L1 and L2 writing research. In L2 writing in particular, questionnaires have been used to explore a variety of issues, including the effect of writing instruction on L2 writing processes (Biggs et al., 1999), the connection between individual differences (e.g., the L2 ideal self) and writing processes (Jang & Lee, 2019), the influence of L1 rhetorical knowledge on L2 writing (Wei et al., 2020), the role of L2 proficiency in planning behaviors (Manchón & Roca de Larios, 2007), and the perceived authenticity of L2 writing processes in computer- versus paper-based writing (Zhi & Huang, 2021). This variety of research questions shows that questionnaires represent a versatile and efficient research instrument for studying writing processes, one which can be adapted for use in different contexts and with different types of populations. Another advantage of questionnaires is their efficiency in
terms of financial resources and researcher time, as questionnaires allow researchers to collect data from large samples (e.g., Lonka et al., 2014, 2019). There is also a tendency to complement quantitative data from questionnaires with qualitative data obtained by means of interviews (e.g., Zhi & Huang, 2021) or think-aloud protocols (Manchón & Roca de Larios, 2007), which represent other popular instruments employed in the exploration of writing processes.
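To make the scoring step behind such instruments concrete, the conversion of Likert-scale responses into scale scores typically involves averaging the items belonging to each scale, after reverse-coding any negatively worded items. The sketch below is purely illustrative: the item labels, scale assignments and responses are invented for the example and do not reproduce any published questionnaire.

```python
# Illustrative sketch: turning five-point Likert responses into scale scores.
# Item ids, scale membership and responses are hypothetical.

def score_scales(responses, scales, reverse_coded=(), scale_max=5):
    """Average the Likert responses (1..scale_max) belonging to each scale."""
    scores = {}
    for scale, items in scales.items():
        values = []
        for item in items:
            value = responses[item]
            if item in reverse_coded:
                # Flip negatively worded items (1 -> 5, 5 -> 1 on a 5-point scale)
                value = scale_max + 1 - value
            values.append(value)
        scores[scale] = sum(values) / len(values)
    return scores

# One hypothetical respondent and two hypothetical scales.
scales = {"planning": ["q1", "q2", "q3"], "revision": ["q4", "q5"]}
responses = {"q1": 4, "q2": 5, "q3": 3, "q4": 2, "q5": 1}

# Suppose q5 is negatively worded (e.g., an item like "I hand my paper in
# without reading it"), so it is reverse-coded before averaging.
print(score_scales(responses, scales, reverse_coded={"q5"}))
# {'planning': 4.0, 'revision': 3.5}
```

Reverse-coding negatively worded items before averaging is standard practice in questionnaire scoring; without it, such items would pull the scale score in the wrong direction.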

Writing processes through the lens of interviews

Interviews, defined as a dialogue between a researcher and a respondent, constitute a popular data collection instrument in the social sciences (Rubin & Rubin, 2004). Interviews can be structured, semi-structured, or unstructured. Structured interviews, in which a researcher employs a pre-defined script and ready-made questions, can be suitable for data collection even with large samples, as the data collected are relatively easy to analyze. Most qualitative research interviews are, however, semi-structured. In this type of interview, the researcher prepares a list of questions which guide the conversation, but there is also room for the respondent’s more spontaneous answers. Finally, an unstructured interview is explorative and may operate with just a single opening question, inviting the interviewee to engage in free conversation on the predetermined topic. In addition to individual interviews, focus group interviews can also be conducted. This format typically includes about 8 to 10 respondents and allows for the exploration of shared meanings on the topic under investigation (Gubrium & Holstein, 2002). Interviews have been a common method to investigate both L1 (e.g., Kolb et al., 2013; Hort, 2020; Roderick, 2019; Roozen, 2010; Wingate & Harper, 2021) and L2 writing processes (e.g., Chae, 2011; Green, 2013; Wei et al., 2020; Yang & Shi, 2003; Zhi & Huang, 2021). For example, in the study by Wingate and Harper (2021), data from screen recordings and interviews revealed considerable differences in recursiveness and time allocation in the writing processes of successful versus less successful L1 writers. Another example is the study by Kolb et al. (2013), which investigated whether a writing seminar could change the L1 writing processes of students in higher education. The aim of the in situ interviews was to explore “students’ actual practices” (Kolb et al., 2013, p. 22).
The results showed that participation in a writing-intensive seminar series did have a positive effect on writing, inducing students to adopt more advanced strategies. Taking a longitudinal perspective, Roozen (2010) conducted a case study investigating the development of a single writer’s L1 disciplinary writing processes. The interviews were conducted over several years, showing that different periods in the writer’s life can have a crucial effect on his/her writing process. Hort’s (2020) study also focused on specific cases, following the L1 and L2 development of nine higher education students. In
this study, interviews were combined with process logs to address aspects of the essay writing process over a period of time. Among other findings, the study showed that students used writing as a way to make meaning in different ways, as support to elaborate on knowledge, or to plan their writing ahead. Although Hort’s and Roozen’s studies provide thick descriptions of students’ writing processes in higher education, the generalizability of their results is limited due to the small number of participants involved. Also, both studies provide a holistic description of the development of writing expertise, without addressing the details of the developmental changes in the discrete processes of planning, formulation and revision. Interviews have also been widely employed in the context of L2 writing. For example, Yang and Shi (2003) employed data from interviews, think-aloud protocols, and written drafts to compare the writing processes and strategies of ESL and native English university students (n = 6). During the individual interviews, which were conducted after the think-aloud session, the participants were asked to recall the writing strategies they had used and to comment on whether these strategies were related to their previous experiences. One of the prominent findings was that planning was the predominant process for all the participants. Also, previous writing experience appeared as an important mediator in the deployment of writing strategies. Another example is the longitudinal study by Chae (2011), which combined questionnaires and interviews to explore the interrelationships between students’ L2 English writing performance, prior knowledge, self-efficacy, interest and L2 writing strategies over the course of a semester. Data on writing strategies (defined as planning, monitoring, revising, retrieving and compensating) were collected from 127 Korean L2 English learners by means of a questionnaire in the form of a checklist.
Subsequently, a subset of the participants (n = 15) took part in one-to-one semi-structured phone interviews that prompted them to recall and verbally report their writing strategies. A relevant finding was the existence of a significant positive correlation between all five groups of writing strategies and self-efficacy at two of the three points of data collection. This indicates the dynamic nature of the relationship between writing strategies/processes and other individual differences.

Green (2013) employed interviews together with spoken journals (audio-logs) to explore the development of ESL academic writing processes over the course of one academic year. The participants were three ESL learners at B1-B2 English level (Common European Framework of Reference). For triangulation, textual materials (drafts, feedback sheets, notes on individual tutorials) were collected. Semi-structured interviews were used to gain a deeper understanding of the data obtained by means of the logs. The findings showed that novice ESL writers developed approaches similar to those adopted by L1 English writers (see
Torrance et al., 2000). The results also supported the view that successful academic L2 writing can be attained in multiple ways, as some writers can achieve it through detailed and structured pre-draft planning (as evidenced by the “planner” profile of one of the participants), while others find benefits in extensive and recursive drafting (the “outline-and-develop” approach).

In sum, interviews constitute another common instrument to study writing processes in both L1 and L2 research. The review of previous studies reveals that interviews are typically used to complement/triangulate data from other sources, such as questionnaires or process logs. Because interview data are time-consuming to analyze, interviews are often employed with a subset of participants in studies in which other instruments, such as questionnaires, are the main data source (see, for example, Chae, 2011). In this type of study, interviews provide valuable qualitative data which complement the quantitative data obtained with questionnaires. Studies which employ interviews as one of the main instruments (e.g., Green, 2013; Yang & Shi, 2003) are often case studies with few participants. Despite their limited generalizability, such studies can provide valuable insights due to their rich description of the nature of L1 and L2 writing processes.

Writing processes through the lens of process logs

Process logs, similar to “time use diaries” (Hart-Davidson, 2007) or “multimodal longitudinal journaling” (Gourlay & Oliver, 2016), represent a qualitative method of data collection that requires research participants to keep detailed records of their thoughts and behaviors related to a specific domain or activity. As such, this technique shows similarities with methods such as “learning diaries” (Roth et al., 2016) or “self-reflective journals” (Nicolás-Conesa et al., 2014). In the context of writing research, process logs refer to the procedure which requires participants to keep a record of their writing activity in an online journal (e.g., Hort, 2017; Youngyan, 2012; Nelson, 1993; Prior, 2004; Roderick, 2019; Segev-Miller, 2005). Similar to diaries, logs provide access to the cognitive processes undertaken by writers as they make decisions about composing. Compared to the think-aloud technique (see Chapters 5 and 10, this volume), which requires writers to verbalize their ongoing mental activities while writing, process logs are considered to put less pressure on writers’ cognitive/attentional capacity. This makes the instrument less intrusive, without losing its informativeness. An additional important advantage of process logs is that they allow researchers to track composing processes over a period of time, on multiple occasions, and in multiple settings, which can provide a rich and complex picture of a writer’s composing processes. Importantly, the development of mobile technologies has expanded the possibilities of
process logs. These technologies now make it possible to add photographs, pictures, and video and audio recordings to the logs. The use of the affordances of mobile technologies to explore writing processes is exemplified by the study of Hort (2017). She employed process logs to explore the dynamic evolution of the writing processes of 17 undergraduate students in Sweden. The participants were instructed to keep a digital diary of their everyday writing assignments. In particular, the participants had to report, inter alia, on what they did or thought while they were writing, their feelings, or any other issue relevant to the performance of their writing task. The participants were also instructed to supplement their text reports with photos of their workplaces and the resources they were using while writing. The majority of the participants used the Evernote application suggested by the researchers, but some used other applications that were more familiar to them, such as Mental Note, Instagram or WhatsApp. The resulting data came in multiple formats, including written annotations, photos, audio recordings, maps, and drawings, which made it possible to track the implementation of writing activities in situ and from a multimodal perspective. However, despite this multiplicity of sources, the variability of the data collected also posed challenges for analysis and reliability.

In the context of L2 writing, Lei (2008) employed interviews, stimulated recall, and process logs to investigate the way in which two Chinese learners of L2 English strategically mediated their writing processes. The participants were requested to keep process logs before and after writing their first draft. Semi-structured interviews were designed to elicit data on the participants’ writing strategy use and their writing experiences. Stimulated recall included questions about pauses, revisions, and actions during the writing process.
The data from the multiple sources were transcribed and coded for writing strategies. The analysis showed that the learners’ L2 writing processes were mediated by multiple factors, including the level of L2 proficiency, use of the L1 during L2 composing, knowledge of rhetoric, knowledge of evaluation criteria, and time restrictions on performing the written assignments. Also, interaction with peers and their writing teacher had a relevant influence on the participants’ writing activities. In a later study, Lei (2016) employed the same data sources to compare the writing strategies of skilled and less skilled students. In this study, the participants completed the process log (in English) before and after submitting the draft, and upon receiving the teacher’s feedback. The analysis revealed that skilled writers were more sensitive to the teachers’ feedback and were better at noticing how more experienced writers used language in their writing.

In addition, Stapleton (2010) conducted an exploratory case study in which a process log was the main data-gathering instrument to describe how one highly
proficient L2 learner allotted time to composing processes as she wrote a research paper, largely in an electronic environment. Over three weeks, the participant spent about 50 hours writing her paper of over 4,000 words at three different venues (library, home, and in transit). The resulting process log, which comprised 216 entries, was coded for writing processes following the categories from Roca de Larios et al. (2008). Information from the process log was also complemented with data from an open-ended retrospective questionnaire and interviews, which were coded following the same categories. A significant finding was a much reduced allocation of time to formulation (33%) as compared to previous studies, e.g., 62%–81% (Roca de Larios et al., 2008). This was attributed to the high level of L2 proficiency of the participant and to the fact that the writing assignment was performed in a digital environment.

A combination of written texts, process logs, screen recordings of students’ real-time writing, semi-structured interviews, and stimulated recall was employed in the study by Kessler (2020), which explored writing strategies and the use of technology by two Chinese L2 English doctoral students. Prior to performing their writing assignment, the participants completed an online process log, which was designed to capture knowledge of the topic, goals, and writing audience, as well as whether the participants had sought help in preparation for the writing task. The participants were also interviewed to gain deeper insights into their process log responses. Findings showed that the students’ writing processes were influenced by numerous factors, including the recommendations provided by the writing instructor, the participants’ use of multimodal strategies, and their access to external digital sources or use of digital notes.
To summarize, process logs constitute another valuable instrument that can provide deep insights into the nature of writing processes. However, to date, process logs have been employed less frequently than other self-report measures. In L2 (and L1) writing, process logs have been predominantly employed in case studies, in which they complemented data from other sources such as interviews or stimulated recall. Despite the limited generalizability of the findings due to small numbers of participants, studies with process logs have provided deep insights into the nature of L2 (and L1) writing. A potential lack of systematicity in data collection and the subjectivity of process log data represent the main challenges in the use of this instrument. However, these limitations are often counteracted with the help of data triangulation, an issue that we will discuss in more detail below in the section on methodological challenges.

Research questions addressed with questionnaires, interviews, and process logs

As evidenced by the above literature review, self-report measures can be successfully employed to explore varied research questions. Thus, questionnaires were used in a number of exploratory large-scale studies which tapped into writing profiles (Lavelle, 1993, 1997) or explored general/habitual writing behaviors (Chitez et al., 2015). Questionnaires appear as the main instrument in a number of studies with correlational designs which investigated, inter alia, the links between writing processes and self-perceived well-being (Lonka et al., 2014), writing grades (Lavelle, 1993), motivation (Jang & Lee, 2019), writing strategies (Zhi & Huang, 2021), or self-efficacy and prior knowledge (Chae, 2011). Questionnaires were also used to gauge writing development (Chae, 2011; Lavelle & Bushrow, 2007) or as a pre-/posttest measure in an intervention study exploring the effects of intensive instruction on writing processes (Biggs et al., 1999). The combination of questionnaires with interviews expanded the range of research questions, as exemplified by studies of planning behaviors (Manchón & Roca de Larios, 2007), rhetorical transfer (Wei et al., 2020), or the self-perceived authenticity of writing processes in paper- versus computer-based writing (Zhi & Huang, 2021). Interviews, in combination with other instruments and data such as screen recordings, spoken journals, or writing drafts, were successfully employed in studies on writing development (Green, 2013). They have also been employed in comparative studies exploring differences and similarities in the writing processes of L1 and L2 writers (Yang & Shi, 2003) and in studies comparing the writing cognition of successful and unsuccessful students (Lei, 2016; Wingate & Harper, 2021).
Process logs have typically been used in qualitative exploratory studies investigating, for instance, writing processes during the completion of a research paper or an essay (Stapleton, 2010; Hort, 2020) or factors (e.g., use of technology, level of L2 proficiency) that can influence writing cognition (Kessler, 2020; Lei, 2008).

In sum, self-report instruments represent versatile means of data collection, which can be used to answer varied research questions in studies with different types of design. Thus, self-report measures are apt for both cross-sectional and longitudinal investigations, and can be employed in exploratory, correlational, or intervention studies, among others. While process logs appear as a useful instrument in qualitative studies, investigations which combine questionnaires and interviews are typically characterized by a mixed-methods approach, which is considered to be specifically suitable for exploring complex phenomena such as writing cognition (Riazi, 2016).

Chapter 4. Survey data

Methodological considerations when using self-report instruments

A fundamental requirement of any research instrument is that it has to be reliable (i.e., it has to produce consistent data from one assessment to another) and valid (i.e., it has to measure the construct it was designed to measure). Thus, in order to use self-report instruments to their full potential, it is crucial to be aware of possible sources of threats to validity, especially bearing in mind that, when measuring writing processes, the validity of questionnaires can be influenced by multiple factors. One of these factors is the participants’ ability to relate questions about writing processes to their own writing experiences, as well as their ability to articulate, in a comprehensible manner, a description of these processes. Thus, in order to obtain reliable and valid data, the researcher has to pay special attention to the formulation of questions, which have to be comprehensible, concrete, and specific. It is also important to avoid ambiguity, and difficult terminology should be presented in a clear and simplified form accessible to the potential participants. Piloting questionnaires with a representative sample of the potential participants is of vital importance. This is to ensure the clarity of the questions and the ability of the targeted population to adequately understand, and provide meaningful answers to, the questions tapping into their writing cognition.

Another important pitfall to avoid is inconsistent use of terminology. Thus, some previous studies (e.g., Jang & Lee, 2019) employed the term “writing strategies” when referring to the writing process in its classical definition. For the sake of data validity, it is vital to avoid this undesirable overlap at the level of terminology, as it can lead to confusion between such important constructs as “strategy” and “process”.
This not only risks confusing the participants, but can also obscure data analysis and the comparability of findings across studies.

The validity of questionnaire data on writing processes can also depend on the proximity of questionnaire completion to the writing task. This issue might be less relevant if a questionnaire intends to tap into the participants’ general or habitual ways of planning, formulating, or revising their texts. However, if the questionnaire intends to capture the fine-grained nuances of writing processes, the proximity of questionnaire completion to writing task performance becomes of vital importance. In this case, the participants’ responses would depend on the recency in their short-term memory of the mental episodes related to their cognitive activity while completing the writing task. A greater time lapse between writing task and questionnaire completion entails greater memory decay for the evoked writing processes, rendering the questionnaire answers on writing cognition less accurate and less detailed (Ericsson & Simon, 1993). With a long time lapse, it is also likely that the participants would
base their accounts, at least partially, on inferences and reconstructions derived from their implicit and subjective theories of writing cognition, rather than from their real-life experiences.

It is also important to avoid questionnaires which are too long, as they can pose undue physical and cognitive strain on the participants. This strain can be aggravated by a high proportion of overly long and complicated questions. An inadequate questionnaire length and/or poor-quality items can result in participants quitting or providing incomplete and unreliable answers. In this regard, we would like to highlight once again the importance of piloting. This procedure can increase the reliability, validity, and practicality of the questionnaire (Oppenheim, 1992). Piloting is also necessary if the questionnaire is translated into the L1 of the participants. Given that the bulk of writing process questionnaires were originally designed in English, translation and validation in the new cultural/educational context will often be necessary (see, e.g., Cerrato-Lara et al., 2017). Questionnaire piloting should be conducted with a sizeable and representative number of respondents. It is also advisable to interview the participants after they complete the questionnaire, asking them about the comprehensibility of the questionnaire and the appropriateness of its length and format. The data obtained from piloting should be subjected to statistical analysis to assess internal consistency and validity. On the basis of this analysis, redundant items and items with low reliability should be removed from the final version of the questionnaire. If the questionnaire intends to tap into general/habitual self-perceived writing behaviors, a good practice would be to assess test-retest reliability by delivering the same questionnaire twice. A suggested time gap between the two test points could be two or three weeks.
It is also important to avoid writing courses or any other type of writing-related training between the test-retest points, in order to prevent changes in the writers’ original writing behaviors, which could obscure the results of the reliability test.

Our review of the empirical studies has provided evidence that interviews constitute an effective (and popular) means of gathering data on writing processes. However, the analysis of current methodological practices reveals frequent under-reporting of some important details related to data collection and analysis. Thus, none of the reviewed studies provided examples of transcripts in the appendices, which limits the understanding of interactional developments. Also, the studies often do not provide the questions posed to the participants. As a result, it becomes difficult to judge the validity of an interview-based investigation, or to assess the links between theory and research questions. Little attention is also given to the context of interviews or to details about the interviewer and/or interviewees. This research thus tends to neglect the importance of the temporal, physical, social, and institutional context in the interactional context
of an interview (Mann, 2016). Likewise, it is important to report the details (e.g., age, L1, gender) of both interviewer and interviewee, and also the power relationships between them. This information is important, as knowledge/data in interviews is produced in negotiation with the interviewer, who guides and inevitably influences (even by mere presence) the construction of meaning (Rapley, 2001). Other details which are commonly under-reported are how the participants were selected for the interviews or how confidentiality and informed consent were guaranteed in these settings.

Concerning process logs, a particular methodological strength is their high ecological validity, since they can capture rich data on the writing process close to the moment in time when this process occurs. In this respect, process logs have an advantage over other, more retrospective approaches, which can result in lower recall rates. Also, data obtained from process logs allow for the measurement of the dynamic development of writing processes over time. In contrast, the acknowledged disadvantages of process logs are potential under-reporting, recording errors, or content selection bias (e.g., the participants may prioritize events/details which have little relevance for the research focus) (Ohly et al., 2010).

To minimize the weaknesses and maximize the strengths of process logs, several issues have to be taken into account. First, sampling of the participants is important. As keeping a process log requires a considerable degree of discipline and commitment, it is vital to make the instructions for keeping a process log clear to the participants and also to acknowledge the burden that this task can pose. It is also important that the participants understand the definition of the writing process and have the necessary meta-language to describe their writing cognition.
Thus, at least some basic training of the participants is necessary to make sure that they understand the general mechanisms of keeping a process log and that they gain some clarity about the phenomena on which they have to report. The time lapse between when an event occurs and when a participant records a process log entry is also an important consideration. Usually, it is preferable for the participants to record entries immediately after an event happens. Thus, a recommendation could be to make a recording right after working on the writing task. The researcher must also maintain periodic and continual communication with the participants, constantly motivating them to keep up with the task (Hort, 2017). It is also recommended to offer the participants some choice of recording technologies (Ohly et al., 2010). However, the optimal scenario for the researcher is that all participants use the same program/software, which facilitates data analysis. These are the aspects that researchers have to take into consideration before initiating data collection. Depending on the research focus, and on the technology or data collection tool used, the researcher might also encourage the participants to
capture images or other objects in addition to text entries, which can enrich the data collected. The variety of data produced could, however, pose some additional challenges for data analysis and affect reliability.

Finally, it is important to highlight that the majority of the reviewed studies combined various methods of data collection within the same design (e.g., Chae, 2011; Hort, 2020; Kessler, 2020). From a methodological standpoint, this can be defined as methodological triangulation. The literature identifies two main purposes of triangulation. The first is confirmation of data, which entails exploring the extent to which findings obtained from different instruments converge or differ. The second is completeness of data, which is concerned primarily with gathering multiple perspectives to provide a more complete picture of the phenomena studied (Cohen et al., 2002). One of the main benefits of triangulation is that the combination of methods can help overcome the limitations of each method and eventually provide an enriched explanation of the research problem (Flick, 2018). When applied in studies of writing processes, the specific benefit of methodological triangulation is that the combination of data from different sources can provide a multidimensional view of the phenomenon. This is specifically important in the exploration of such a complex issue as the writing process. As such, the use of triangulation in writing studies can be praised as a good practice. However, it is also important to highlight that the use of triangulation is under-justified and under-reported, in the sense that it is often not clear whether a study employs triangulation for the purpose of confirmation or completeness. Also, studies are not always explicit about the procedure followed to confirm findings or increase validity by means of triangulation; often, the details of how rigor was achieved with each individual method are also missing.
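Where triangulation serves confirmation, the degree of convergence between two data sources can also be quantified. One common option, offered here only as an illustration and not reported in the studies reviewed above, is chance-corrected agreement (Cohen’s kappa) between two codings of the same writing episodes; the code categories and data below are invented.

```python
from collections import Counter

def cohen_kappa(codes_a, codes_b):
    """Chance-corrected agreement between two codings of the same segments."""
    n = len(codes_a)
    observed = sum(a == b for a, b in zip(codes_a, codes_b)) / n
    ca, cb = Counter(codes_a), Counter(codes_b)
    expected = sum(ca[c] * cb[c] for c in ca) / (n * n)
    return (observed - expected) / (1 - expected)

# Invented example: six writing episodes coded once from interview data
# and once from process-log data, using the same category labels.
from_interviews = ["plan", "plan", "rev", "form", "rev", "plan"]
from_logs = ["plan", "rev", "rev", "form", "rev", "plan"]
print(round(cohen_kappa(from_interviews, from_logs), 2))  # 0.74
```

Reporting such an agreement figure would make explicit whether triangulation was used for confirmation, and how well the instruments actually converged.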

Avenues for further research

Studies of writing processes using self-report instruments would benefit from a number of methodological improvements, such as greater terminological consistency, more detailed description of the interview context and the interviewer/interviewee characteristics, provision of the interview questions and samples of transcripts in the appendices, and greater rigor in the procedure of process log data collection (e.g., clear instructions, sampling and training of the participants, and clarity about the analysis procedure). Piloting and validating questionnaires in new cultural/educational contexts and new L1s is also necessary. Additionally, the details and quality of reporting on methodological triangulation in writing process research should be improved.

Future research should also take greater advantage of the affordances of focus group interviews, which can yield a rather specific type of data, different from that obtained via one-to-one interviews. Focus group interviews have their own dynamics, as they provide opportunities for interviewees both to articulate their own thoughts and experiences and to react to the comments and ideas of other interviewees. This can lead to interactive joint reflection and offer the participants opportunities to challenge each other’s ideas and expand their own responses. Group interviews work specifically well with members of so-called naturally occurring units (e.g., classmates) and when some visual support is available (e.g., writing task instructions or the completed writing task) (Cohen et al., 2002). As such, group interviews can provide a specific type of data which represents the result of group co-construction of meaning and experiences of the writing process.

A further suggestion for future research would be to explore the possibilities of linguistic analysis of the data from open-ended questionnaires, interviews, and process logs. To date, writing studies have employed primarily content analysis of this type of data, identifying topics and themes in the data. Linguistic analysis, instead, is performed by means of specialized software, such as AntConc (Anthony, 2004), and consists in identifying various linguistic markers (e.g., positive and negative words, pronouns, etc.) of cognitive states and processes. Linguistic analysis enables researchers to identify implicitly conveyed thoughts and feelings (Tausczik & Pennebaker, 2010), thereby making it possible to explore writing processes more deeply and from a different perspective.

Following L1 writing research (e.g., Hort, 2017), L2 investigations should also open up to the use of mobile technologies in the investigation of writing processes.
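As a toy illustration of the marker-based linguistic analysis described above, counting candidate markers of cognitive states in a log or interview excerpt amounts to simple token matching. The marker lists below are invented for the example and are not drawn from AntConc or any validated inventory.

```python
import re

# Illustrative marker lists -- invented for this sketch, not validated.
FIRST_PERSON = {"i", "me", "my", "we", "our"}
HEDGES = {"maybe", "perhaps", "might", "possibly"}

def marker_counts(text):
    """Count candidate linguistic markers of cognitive states in an excerpt."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return {
        "tokens": len(tokens),
        "first_person": sum(t in FIRST_PERSON for t in tokens),
        "hedges": sum(t in HEDGES for t in tokens),
    }

entry = "I think I planned my essay before writing. Maybe I revised it later."
print(marker_counts(entry))  # {'tokens': 13, 'first_person': 4, 'hedges': 1}
```

Normalizing such counts per 100 tokens would allow comparison across participants or across points in time.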
The advantage of mobile technologies (e.g., laptops, tablets, mobile phones) is their portability and the multimodal type of data they can produce (written texts, audio and video recordings, photos, drawings, or geo-location). Until now, mobile technology has rarely been employed as a research tool in the writing research field. However, the ubiquitous nature of technological devices has led researchers to explore their affordances in survey studies (Lachmann et al., 2013) or in educational contexts (e.g., Beddall-Hill et al., 2011). The use of mobile technologies in writing research would widen the scope of writing process inquiry, allowing researchers to study writing in situ and in environments other than the classroom (e.g., the workplace, the home). The immediacy and ease of use of technological devices might induce writers to employ them frequently and naturally whenever writing-relevant activity happens. This would allow researchers to gain more original and ecologically valid insights into the very moment at which texts come into being (Prior, 2004). Importantly, these new insights would represent a move away from a purely cognitivist perspective
on writing, allowing researchers to view, record, and eventually analyze writing cognition as a complex, situated activity, mediated by multiple agents (e.g., instructors, peers), artifacts (e.g., the workplace), and other writer-internal and writer-external factors. The findings from this type of study would undoubtedly enrich writing process research.

References

Anthony, L. (2004, December 10). AntConc: A learner and classroom friendly, multi-platform corpus analysis toolkit. In Proceedings of Interactive Workshop on Language E-learning (pp. 7–13). Waseda University Press.
Beddall-Hill, N., Jabbar, A., & Al Shehri, S. (2011). Social mobile devices as tools for qualitative research in education: iPhones and iPads in ethnography, interviewing, and design-based research. Journal of the Research Center for Educational Technology, 7(1), 67–90.
Berdanier, C., & Zerbe, E. (2018, July 22–25). Quantitative investigation of engineering graduate student conceptions and processes of academic writing. 2018 IEEE International Professional Communication Conference (ProComm), Toronto, Canada.
Biggs, J. B. (1987). Student approaches to learning and studying. Australian Council for Educational Research.
Biggs, J. B. (1988, April). Students’ approaches to essay-writing and the quality of the written product. The Annual Meeting of the American Educational Research Association, New Orleans, LA.
Biggs, J., Lai, P., Tang, C., & Lavelle, E. (1999). Teaching writing to ESL graduate students: A model and an illustration. British Journal of Educational Psychology, 69(3), 293–306.
Cerrato-Lara, M., Castelló, M., Garcia Velazquez, R., & Lonka, K. (2017). Validation of the writing process questionnaire in two Hispanic populations: Spain and Mexico. Journal of Writing Research, 9(7), 151–172.
Chae, S. E. (2011). Contributions of prior knowledge, motivation, and strategies to Korean college students’ L2 writing development (Doctoral dissertation). University of Maryland.
Chan, S., Bax, S., & Weir, C. (2017). Researching participants taking IELTS Academic Writing Task 2 (AWT2) in paper mode and in computer mode in terms of score equivalence, cognitive validity and other factors (Issue 2017/4). IELTS Research Reports Online Series. https://www.ielts.org/teaching-and-research/research-reports/ielts_online_rr_2017-4
Chitez, M., & Kruse, O. (2012).
Writing cultures and genres in European higher education. In M. Castelló & C. Donahue (Eds.), University writing: Selves and texts in academic societies (pp. 151–175). Brill.
Chitez, M., Kruse, O., & Castelló, M. (2015). The European writing survey (EUWRIT): Background, structure, implementation, and some results (Working Papers in Applied Linguistics No. 9).
Clark, L. A., & Watson, D. (2019). Constructing validity: New developments in creating objective measuring instruments. Psychological Assessment, 31(12), 1412–1427.
Cohen, A. D. (1998). Strategies in learning and using a second language. Longman.
Cohen, L., Manion, L., & Morrison, K. (2002). Research methods in education. Routledge.

Delcambre, I., & Donahue, C. (2012). Academic writing activity: Student writing in transition. In M. Castelló & C. Donahue (Eds.), University writing: Selves and texts in academic societies (pp. 129–149). Brill.
Dörnyei, Z., & Taguchi, T. (2010). Questionnaires in second language research: Construction, administration, and processing. Routledge.
Ericsson, K., & Simon, H. (1993). Protocol analysis: Verbal reports as data (2nd ed.). The MIT Press.
Flick, U. (2018). Triangulation in data collection. In U. Flick (Ed.), The SAGE handbook of qualitative data collection (pp. 527–544). Sage.
Flower, L., & Hayes, J. (1981). A cognitive process theory of writing. College Composition and Communication, 32(4), 365–387.
Gourlay, L., & Oliver, M. (2016). Multimodal longitudinal journaling. In C. Haythornthwaite, R. Andrews, J. Fransman, & E. Meyers (Eds.), SAGE handbook of e-learning research (2nd ed., pp. 291–310). Sage.
Green, S. (2013). Novice ESL writers: A longitudinal case-study of the situated academic writing processes of three undergraduates in a TESOL context. Journal of English for Academic Purposes, 12(3), 180–191.
Gubrium, J. F., & Holstein, J. A. (2002). From the individual interview to the interview society. In J. F. Gubrium & J. A. Holstein (Eds.), Handbook of interview research: Context and method (pp. 3–32). Sage.
Hart-Davidson, W. (2007). Studying the mediated action of composing with time-use diaries. In H. A. McKee & D. DeVoss (Eds.), Digital writing research: Technologies, methodologies, and ethical issues (pp. 153–170). Hampton Press.
Hort, S. (2017). Exploring the use of mobile technologies and process logs in writing research. International Journal of Qualitative Methods, 16(1).
Hort, S. (2020). Skrivprocesser på högskolan: Text, plats och materialitet i uppsatsskrivandet [Writing processes in higher education: Text, place and materiality in essay writing] (Doctoral dissertation). Örebro University.
Jang, Y., & Lee, J. (2019). The effects of ideal and ought-to L2 selves on Korean EFL learners’ writing strategy use and writing quality.
Reading and Writing, 32(5), 1129–1148. Kessler, M. (2020). Technology-mediated writing: Exploring incoming graduate students’ L2 writing strategies with Activity Theory. Computers and Composition, 55, 1–18. Kolb, K. H., Longest, K. C., & Jensen, M. J. (2013). Assessing the writing process: Do writingintensive first-year seminars change how students write? Teaching Sociology, 41(1), 20–31. Lachmann, H., Ponzer, S., Johansson, U. -B., Benson, L., & Karlgren, K. (2013). Capturing students’ learning experiences and academic emotions at an interprofessional training ward. Journal of Interprofessional Care, 27(2), 137–145. Lavelle, E. (1993). Development and validation of an inventory to assess processes in college composition. British Journal of Educational Psychology, 63(3), 489–499. Lavelle, E. (1997). Writing style and the narrative essay. British Journal of Educational Psychology, 67(4), 475–482. Lavelle, E., Ball, S. C., & Maliszewski, G. (2013). Writing approaches of nursing students. Nurse Education Today, 33(1), 60–63. Lavelle, E., & Bushrow, K. (2007). Writing approaches of graduates students. Educational Psychology, 27(6), 807–822.



Sofia Hort & Olena Vasylets

Lavelle, E., & Guarino, A. J. (2003). A multidimensional approach to understanding college writing processes. Educational Psychology, 23(3), 295–305.
Lavelle, E., & Zuercher, N. (2001). The writing approaches of university students. Higher Education, 42(3), 373–391.
Lei, X. (2008). Exploring a sociocultural approach to writing strategy research: Mediated actions in writing activities. Journal of Second Language Writing, 17(4), 217–236.
Lei, X. (2016). Understanding writing strategy use from a sociocultural perspective: The case of skilled and less skilled writers. System, 60, 105–116.
Lonka, K. (1996). The writing process questionnaire. Department of Psychology, University of Helsinki.
Lonka, K., Chow, A., Keskinen, J., Hakkarainen, K., Sandström, N., & Pyhältö, K. (2014). How to measure PhD students’ conceptions of academic writing – and are they related to wellbeing? Journal of Writing Research, 5(3), 245–269.
Lonka, K., Ketonen, E., Vekkaila, J., Cerrato Lara, M., & Pyhältö, K. (2019). Doctoral students’ writing profiles and their relations to well-being and perceptions of the academic environment. Higher Education, 77(4), 587–602.
Manchón, R. M., & Roca de Larios, J. (2007). On the temporal nature of planning in L1 and L2 composing. Language Learning, 57(4), 549–593.
Mann, S. (2016). The research interview: Reflective practice and reflexivity in research processes. Routledge.
Melo, J., Zerbe, E., & Berdanier, C. G. (2019, June 6). Validating a short form writing attitudes survey for engineering writers. 126th ASEE Annual Conference and Exposition, Florida, United States.
Nelson, J. (1993). The library revisited: Exploring students’ research processes. In A. M. Penrose & B. M. Sitko (Eds.), Hearing ourselves think: Cognitive research in the college writing classroom (pp. 102–122). Oxford University Press.
Nicolás-Conesa, F., Roca de Larios, J., & Coyle, Y. (2014). Development of EFL students’ mental models of writing and their effects on performance. Journal of Second Language Writing, 24, 1–19.
Ohly, S., Sonnentag, S., Zapf, D., & Niessen, C. (2010). Diary studies in organizational research. Journal of Personnel Psychology, 9(2), 79–93.
Oppenheim, A. N. (1992). Questionnaire design, interviewing and attitude measurement (2nd ed.). Bloomsbury.
Petrić, B., & Czárl, B. (2003). Validating a writing strategy questionnaire. System, 31(2), 187–215.
Prior, P. (2004). Tracing process: How texts come into being. In P. Prior & C. Bazerman (Eds.), What writing does and how it does it (pp. 173–206). Routledge.
Rapley, T. J. (2001). The art(fulness) of open-ended interviewing: Some considerations on analysing interviews. Qualitative Research, 1(3), 303–323.
Riazi, A. M. (2016). The Routledge encyclopedia of research methods in applied linguistics. Routledge.
Roca de Larios, J., Manchón, R., Murphy, L., & Marín, J. (2008). The foreign language writer’s strategic behaviour in the allocation of time to writing processes. Journal of Second Language Writing, 17(1), 30–47.


Roderick, R. (2019). Self-regulation and rhetorical problem solving: How graduate students adapt to an unfamiliar writing project. Written Communication, 36(3), 410–436.
Roozen, K. (2010). Tracing trajectories of practice: Repurposing in one student’s developing disciplinary writing processes. Written Communication, 27(3), 318–354.
Roth, A., Ogrin, S., & Schmitz, B. (2016). Assessing self-regulated learning in higher education: A systematic literature review of self-report instruments. Educational Assessment, Evaluation and Accountability, 28(3), 225–250.
Rubin, H. J., & Rubin, I. S. (2004). Qualitative interviewing: The art of hearing data (2nd ed.). Sage.
Schmeck, R. R., Geisler-Brenstein, E., & Cercy, S. P. (1991). Self-concept and learning: The revised inventory of learning processes. Educational Psychology, 11(3–4), 343–362.
Segev-Miller, R. (2005). Writing-to-learn: Conducting a process log. In G. Rijlaarsdam, H. van den Bergh, & M. Couzijn (Eds.), Effective learning and teaching of writing (pp. 533–546). Springer.
Stapleton, P. (2010). Writing in an electronic age: A case study of L2 composing processes. Journal of English for Academic Purposes, 9(4), 295–307.
Tausczik, Y. R., & Pennebaker, J. W. (2010). The psychological meaning of words: LIWC and computerized text analysis methods. Journal of Language and Social Psychology, 29(1), 24–54.
Torrance, M., Thomas, G. V., & Robinson, E. J. (1994). The writing strategies of graduate research students in the social sciences. Higher Education, 27(3), 379–392.
Torrance, M., Thomas, G. V., & Robinson, E. J. (2000). Individual differences in undergraduate essay-writing strategies: A longitudinal study. Higher Education, 39(2), 181–200.
Wei, X., Zhang, L. J., & Zhang, W. (2020). Associations of L1-to-L2 rhetorical transfer with L2 writers’ perception of L2 writing difficulty and L2 writing proficiency. Journal of English for Academic Purposes, 47, 1–13.
Wingate, U., & Harper, R. (2021). Completing the first assignment: A case study of the writing processes of a successful and an unsuccessful student. Journal of English for Academic Purposes, 49, 100948.
Yang, L., & Shi, L. (2003). Exploring six MBA students’ summary writing by introspection. Journal of English for Academic Purposes, 2(3), 165–192.
Youngyan, L. (2012). Undergraduate students searching and reading Web sources for writing. Educational Media International, 49(3), 201–215.
Zhi, M., & Huang, B. (2021). Investigating the authenticity of computer- and paper-based ESL writing tests. Assessing Writing, 50, 100548.


Chapter 5

Verbally mediated data: Concurrent/retrospective verbalizations via think-aloud protocols and stimulated recalls

Ronald P. Leow & Melissa A. Bowles

Georgetown University | University of Illinois at Urbana-Champaign

This chapter explores how think-alouds (TAs) and stimulated recalls (SRs) have been used to study cognitive processing, with a particular focus on writing processes. We describe the two types of verbal report and discuss the kinds of research questions that each is well-suited to answer, as well as considerations to be addressed when deciding which one to use in a given research design. We then discuss the validity of TAs and SRs and assess the robustness of findings from empirical studies on the writing process that have used TAs and SRs. We conclude by providing some future directions for research to move the field of L2 writing processes forward.

https://doi.org/10.1075/rmal.5.05leo © 2023 John Benjamins Publishing Company

Introduction

This chapter explores how two types of verbal reports, think-aloud protocols (TAs) and stimulated recalls (SRs), have been used to study cognitive processing, with a particular focus on writing processes, in (instructed) second language acquisition ((I)SLA). Verbal reports were first used extensively in cognitive psychology in the early twentieth century to study problem solving but, because of their versatility, they were soon adopted in a whole host of fields as a primary means of collecting introspective data about cognition (Ericsson & Simon, 1984, 1993).

Specifically, TAs are a kind of concurrent verbal report that requires participants to say their thoughts aloud while completing a task. Most commonly, participants are instructed not to justify or explain their thoughts (which would make the report a metacognitive TA) but rather to speak freely as their ideas come to them (a non-metacognitive TA) while they complete the task. SR, on the other hand, is a type of retrospective verbal report pioneered in the field of education by Bloom (1953) and further refined by Siegel et al. (1963). In SR, participants complete a task and have some sort of recording or artifact of task completion, such as a screen recording, audio or video recording (the stimulus), which they then watch or listen to a short time later while verbalizing what they were thinking at the time.

Therefore, although both TAs and SR rely on verbal reporting to provide insight into processing, they differ along several dimensions, all of which have implications for their use in research. In terms of timing, TAs happen during task completion whereas SR happens after task completion. In terms of the amount and type of support that is provided, because TAs are performed while the task is being completed, the task itself provides the sole support for the verbal report. Given that SRs are conducted after the task has been completed, participants need an artifact of the task to refer back to as they verbalize. Finally, although both TAs and SRs are predominantly carried out orally, with participants speaking their thoughts aloud, in some studies participants have provided their recall comments in writing (see Chapter 6, this volume, for written languaging).

Research questions addressed with verbal reports

Although verbal reports are most often used in research based in cognitivist theories, other theoretical perspectives, such as sociocultural theory, have also used them as data collection tools. In that sense, verbal reports are atheoretical and can be used to gain insight into participants’ thought processes regardless of the theoretical approach being adopted. In writing research, verbal report data offer a window into the processing behind learners’ written texts and can serve to triangulate or complement product data. Given their versatility, verbal reports have enjoyed a long history in both first (L1) and second/foreign (L2) language writing research and can be traced back to the 1980s (e.g., Cohen & Cavalcanti, 1987, 1990; Durst, 1987). Indeed, one of the best-known writing models, the cognitive process model of writing (Flower & Hayes, 1981), was based largely on TA data that the authors gathered from writers (Flower & Hayes, 1980).

In L1 writing research, TAs have been used extensively to examine the stages of the writing process, including comparing the cognitive processes involved in composing different kinds or genres of texts (Durst, 1987) and investigating writers’ thoughts as they revise and edit texts, providing explanations of behaviors and decisions that a comparison of the original and revised finished products alone would not reveal (e.g., Breetvelt, 1994; Zellermayer & Cohen, 1996). Teachers have even used TAs as part of their classroom instruction, as a technique to get students to view writing as an expression of their inner dialogue (e.g., Box, 2002; Cushman, 2002; Fresch et al., 1998; Scardamalia, 1984).

In L2 writing, TAs have been used to compare writers’ strategies in their L1 and L2 (e.g., Armengol & Cots, 2009; Beare, 2001; Chenoweth & Hayes, 2001; El Mortaji,
2001; Jannausch, 2002; Qi & Lapkin, 2001; Roca de Larios et al., 2001), to examine L2 writing strategies (e.g., Hatasa & Soeda, 2000; Johnson, 1992; Kang & Pyun, 2013) and processes assumed to promote the potential for learning (e.g., López-Serrano et al., 2020), to investigate the role that the L1 plays in L2 process writing (e.g., C. D. Castro, 2004; D. Castro, 2005; Murphy & Roca de Larios, 2010; Uzawa, 1996; Wang & Wen, 2002), to gather information about the relationship between the L2 writing process and product (e.g., van Weijen et al., 2008), and even as a process-based indicator for measuring L2 writing fluency (e.g., Latif, 2009). The role of L2 proficiency in composing has also been profitably studied, with TAs alone (Roca de Larios et al., 2008) and with TAs in combination with keystroke logging and questionnaires (Tiryakioglu et al., 2019). TAs have also been used to gain insight into L2 writers’ awareness (e.g., Sachs & Polio, 2007), depth of processing (e.g., Caras, 2019), and use of feedback on their compositions, to uncover teachers’ and students’ beliefs about feedback on L2 writing (e.g., Diab, 2005), to shed light on L2 writers’ editing behavior (e.g., Willey & Tanimoto, 2015), and to compare self and peer revisions of L2 writing (e.g., Suzuki, 2008). Additionally, they have been used to understand L2 academic writers’ source use (e.g., McCulloch, 2013) and the role of reading strategies in integrated L2 writing tasks (e.g., Plakans, 2009).

TAs have been especially fruitful in the assessment of L2 writing, where they have been used to understand what attributes essay raters attend to when scoring L2 writing (e.g., Barkaoui, 2010, 2011; Gebril & Plakans, 2014; Li & He, 2015) and to develop and validate writing rubrics (e.g., Zhao, 2013) and writing tasks (e.g., Yu et al., 2011).
In recent years, research studying the strategies that writers use when composing, editing, and revising texts in their heritage language has also relied on TAs (e.g., Schwartz, 2003, 2005; Yanguas & Lado, 2012).

Despite their versatility, TAs and SRs cannot be used interchangeably. Rather, their differences in timing, amount and type of support, and modality impact which technique is best suited to a particular research design. Over the years, writing research has incorporated TAs with greater frequency than SR, likely because it is possible for writers to think aloud throughout all phases of the writing process, while they are planning, composing, editing, and/or revising their writing, and because with TAs, unlike with SR, there is no risk that participants will forget what they were thinking, since verbalization is concurrent with writing. For instance, TAs have been profitably used to gain insight into individual writers’ composing and editing processes as well as to discover how deeply writers process different types of feedback on their writing and what revisions they make as a result (e.g., Caras, 2019; Kim & Bowles, 2019; Park & Kim, 2019; see also Chapter 14 and Chapter 16, this volume). However, some study designs preclude the use of TAs, such as those involving collaborative writing. In this type of study, TAs are typically
not appropriate because participants must work together to jointly produce a collaborative text and are therefore speaking to each other while they draft it. Having participants additionally think aloud while speaking to their peer(s) and writing would be too taxing, because the TA would constitute an additional, simultaneous task. Furthermore, TAs are collected to capture an individual’s thought processes, whereas collaborative talk, by its very nature, is co-constructed and social and reflects the knowledge that is generated by a pair or group (see Chapter 6 for further information). If a researcher wanted to understand individuals’ cognitive processing in collaborative writing, it would be more appropriate to video or audio record the interaction while writers work together on a collaborative text and to ask them to engage in SR after the fact (e.g., Hoang, 2019; Khuder & Harwood, 2015; Révész et al., 2017; Révész et al., 2019; Sasaki, 2004; Stiefenhöfer & Michel, 2020; see Chapter 6 and Chapter 7, this volume). Similarly, if a researcher wanted to use verbal reports to examine instructors’ or learners’ perceptions of one-on-one writing conferences, or to explore learners’ perceptions while giving each other peer review feedback on essays, SR would be more appropriate.

In studies of writing processes framed in cognitivist theories, the research questions tend to focus on how learner-internal factors, such as proficiency, or learner-external factors, such as text type, impact writers’ thought processes. Crucially, in this framework, verbal reports are taken as evidence of cognitive processes and a window into L2 writers’ minds. In studies framed in sociocultural theories, on the other hand, the basic assumption is not that verbalizations are a reflection of processing but rather that they are a means by which knowledge is socially co-constructed.

Validity issues with verbal reports

Despite the widespread use of verbal reports, there has been controversy surrounding their validity, which hinges on (1) whether verbalizing alters the very thought processes under investigation (reactivity) and (2) whether verbalizations are an accurate (or truthful) reflection of thoughts (veridicality). Given that TAs are concurrent with task completion, they have long been thought to be veridical, or to accurately reflect thoughts, since there is no time lapse between task completion and reporting for memory decay to occur (Ericsson & Simon, 1993). This, however, does not imply that TAs exhaustively capture every fleeting thought or minute detail that goes through a participant’s mind (Bowles, 2018), which may be viewed as more of an issue of completeness than of accuracy or truthfulness. For TAs, the main challenge is determining whether verbalizing actually alters the
thought processes that a participant would normally have when completing the task silently. Ericsson and Simon’s (1993) classic model of verbalization predicts that as long as participants are asked merely to think their thoughts aloud, without being required to explain or justify their reasoning, there should be no significant impact on processing, although verbalizing often causes participants to take longer to complete the task. Conversely, if participants are asked to provide additional justifications for their thoughts, the model predicts that processing will be substantially impacted (i.e., such TAs will be reactive). Recent quantitative meta-analyses in psychology (Fox et al., 2011) and in SLA (Bowles, 2010) confirm those predictions, showing that with both verbal and non-verbal tasks, thinking aloud does not generally impact task performance or, by extension, alter thought processes, although it does increase time on task. Leow et al. (2014) caution, however, that type of task may have an impact on reactivity; according to these researchers, the degree of naturalness of thinking aloud during certain tasks should be considered. They reported that problem-solving tasks such as a crossword or a maze (e.g., Bowles, 2008; Leow, 1997) appear to yield very high percentages of concurrent data (almost 100%), compared with an average of 60% for reading tasks (e.g., Leow, 2001; Leow et al., 2008).

Turning to SR, whether verbalizing alters thought processes (reactivity) is not a concern because reporting happens after task completion. Instead, the validity concern is whether the verbalization is an accurate or truthful representation of the thought processes that were going on in the participant’s mind while they were completing the task, given the potential for participants to manufacture thoughts during recall (veridicality).
There are minimally three reasons why SR could be non-veridical: (1) because there is a delay between task completion and reporting, participants might not accurately remember their thoughts due to memory decay; (2) because SR gives participants double exposure to the task (once during task completion and again when the stimulus is presented), new thoughts may occur to them during recall that they did not have when first completing the task; and (3) having an audience for their reports may be an additional source of non-veridicality.

Validity issues with verbal reports in ISLA

The first ISLA study to empirically address the potential reactivity of TAs, Leow and Morgan-Short (2004), compared the performance of 77 first-year adult college-level participants who read an L2 text while thinking aloud to those who did so
silently. They found no statistically significant differences in performance on comprehension, intake, or written production. It is noteworthy that they cautioned readers that “given the many variables that potentially impact the issue of reactivity in SLA research methodology, it is suggested that studies employing concurrent data-elicitation procedures include a control group that does not perform verbal reports as one way of addressing this issue” (p. 50).

Leow and Morgan-Short’s (2004) study engendered further research investigating reactivity, either in studies employing grammar learning tasks, including L2 instructional and problem-solving tasks (e.g., Bowles, 2008; Sanz et al., 2009; Stafford et al., 2012), or in studies employing L2 reading tasks (e.g., Bowles & Leow, 2005; Goo, 2010; Leow & Morgan-Short, 2004; Morgan-Short et al., 2012; Rossomondo, 2007; Yoshida, 2008). It is important to note that although TAs have been used widely in writing research, only a handful of studies to date have examined the reactivity of TAs during writing tasks (e.g., Adrada-Rafael & Filgueras-Gómez, 2019; Sachs & Polio, 2007; Suh, 2020; Yang et al., 2014, 2020; Yanguas & Lado, 2012). Studies investigating the validity of TAs and SRs in writing research specifically are discussed in the next section.

Validity issues with verbal reports in writing research

To date, only seven studies have empirically addressed the issue of reactivity in L2 writing tasks, six of which addressed the role of TAs while the remaining study addressed that of SR. Of the six that addressed the role of TAs, half examined the impact of verbalization on complexity, accuracy, and fluency (e.g., Yanguas & Lado, 2012; Yang et al., 2014, 2020) while the other half focused on L2 writers’ use of written corrective feedback (WCF) (e.g., Adrada-Rafael & Filgueras-Gómez, 2019; Sachs & Polio, 2007; Suh, 2020).

Reactivity in writing tasks: Effects on complexity, accuracy, and fluency

Yanguas and Lado (2012) examined the role of reactivity using a semi-guided writing task. Participants were 37 heritage Spanish speakers who were instructed to write captions to accompany three different comic strips. Two intact classes were assigned to either a TA (n = 20) or a silent condition (n = 17), and the fluency, accuracy, and lexical complexity of the two groups’ writing were compared. There were positive reactive effects for accuracy and lexical complexity but no significant differences for fluency. However, the heritage language learners’ proficiency was not specified, and their prior knowledge was not controlled, making it impossible to assess how participants’ prior knowledge might have affected their processing. In addition, participants were not randomly assigned to experimental conditions.


Yang et al. (2014) and its extension Yang et al. (2020) also investigated the role of reactivity in an L2 writing task. In the former, 95 Chinese non-English majors first completed a baseline writing task silently in a classroom, were then randomly assigned to three conditions: silent (n = 31), non-metacognitive TA (NMTA; n = 32; participants only thought aloud), or metacognitive TA (MTA; n = 32; participants also provided a reason for each sentence), and were asked to write an argumentative essay in English with no time limit. The dependent variables were 11 measures of fluency, accuracy, complexity, and overall quality. The results revealed that those in the TA conditions experienced significantly more disfluencies (i.e., the total number of words that a participant crosses out or reformulates divided by the total number of words produced) and a significant reduction (with a negligible effect size) in syntactic variety in their compositions, while no reactivity was reported for accuracy or overall quality of writing. The authors concluded that TA in both conditions was “somewhat detrimental” for fluency and syntactic variety and that, with respect to the other measures examined, “their overall effects may not be strong” (p. 64). Interestingly, the authors noted that their disfluency measures were relatively weak given the “peripheral status of dysfluencies as a measure of fluency” (p. 62); disfluencies are very rarely included in writing rubrics. For syntactic variety, they suggested that the results were partly due to its limited use as a facet of writing performance. Other limitations may include the treatment for the TA conditions being administered by three different experimenters and the exam-like conditions (the experimenters sitting close by) under which the TA groups performed their tasks.
The authors concluded that “the overall effects of TA on L2 writing processes appear unsubstantial when such effects were operationalized in terms of L2 writing performance” (p. 64).

Yang et al. (2020) employed a narrative writing task instead of an argumentative one and also addressed the potential role of participants’ working memory capacity (WMC) and their perception of TA. The dependent variables were increased to 20 measures of fluency, accuracy, complexity, content, and organization of writing. A total of 85 Chinese native speakers at an intermediate level of English proficiency first completed a baseline writing task silently in a classroom and were then randomly assigned to non-metacognitive TA or silent conditions. WMC was assessed via two tests (an operation span task and a reading span task). The results showed that, of the 20 measures examined, TA appeared to have an effect on lexical diversity and disfluencies but not on time, speed, structural complexity, accuracy, length, content, or organization. With respect to WMC, the findings revealed that the detriment was greatest for organization in the low-WMC group, while the high-WMC group experienced significant impairment in lexical diversity. Lastly, participants appeared to view TA as detrimental. Like Yang et al. (2014), they also
reported some reactivity for a few aspects of writing (lexical diversity and disfluencies) but not for the majority of the measures. Although Yang et al. (2020) reported TAs to be negatively reactive for fluency, replicating the previous finding, and for lexical diversity, a potential limitation, as the authors noted, is that a baseline level of participants’ lexical diversity (unlike in Yang et al., 2014) was not established. Another potential limitation may be the unequal conditions under which the baseline and main tasks were administered. The baseline writing task was completed during the second half of the week (Thursday or Friday) during an “in-class hands-on” session, while the main task was completed over the weekend. If participants completed the baseline task as part of an in-class activity but the main task as an extracurricular activity, it is possible the different environments led participants to approach the two tasks with different frames of mind. The 15-minute limit for completion of the composition also paralleled the exam-like conditions observed in Yang et al. (2014), which could have impacted performance.
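For readers who wish to compute such measures themselves, the disfluency index described above (the number of words a writer crosses out or reformulates divided by the total number of words produced) reduces to a simple ratio. The sketch below is our own illustration, not code from the studies; the function name and the sample counts are invented:

```python
def disfluency_ratio(words_crossed_out: int, total_words_produced: int) -> float:
    """Disfluency index: crossed-out/reformulated words over total words produced."""
    if total_words_produced <= 0:
        raise ValueError("total_words_produced must be positive")
    return words_crossed_out / total_words_produced

# Hypothetical example: a writer produced 250 words in total and crossed
# out or reformulated 30 of them while composing.
print(disfluency_ratio(30, 250))  # 0.12
```

A higher ratio indicates a less fluent composing process under this operationalization; note the authors themselves treat it as a peripheral measure of fluency.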

Reactivity in writing tasks: WCF

In two experiments, Sachs and Polio (2007) investigated the role of type of written corrective feedback (error correction vs. reformulation vs. reformulation plus think-aloud) on subsequent grammatical accuracy in L2 learners’ writing. In the first experiment, which used a repeated-measures design, 15 adult high-intermediate learners of English wrote compositions in each of the three WCF conditions over the course of three weeks, following a composing-comparison-revision process. The results showed that participants’ performance was significantly better in the error correction condition than in the reformulation condition. In addition, negative reactivity for revision accuracy was reported in the reformulation plus think-aloud condition.

In the second experiment, 54 ESL students at different levels of proficiency participated in a non-repeated-measures design. Unlike the first experiment, the second took place over the course of one week, following the same composing-comparison-revision process. A control group was also added, and participants were randomly assigned to one of the conditions (error correction, n = 12; reformulation, n = 11; reformulation + think-aloud, n = 16; control, n = 15). The results revealed that all three WCF groups outperformed the control group. However, the negative reactivity effect reported in the first experiment was not replicated, suggesting that there was no reactive effect for revision accuracy.

One major limitation of the first experiment was the repeated-measures, within-subjects design. Participants admitted having memorized the WCF they received in order to produce it the next day. In addition, alternate forms were not
used in the repeated-measures design, meaning that participants completed the same task each week, potentially influencing the veridicality of the participants’ second and third performances. Two other limitations are the small number of participants and the small effect size reported when compared to the second experiment. Finally, participants were required to think aloud in their L2, which at this level of proficiency could have placed a greater cognitive load on them than doing so in their L1 would have, a requirement that sets this study apart from other reactivity studies.

Adrada-Rafael and Filgueras-Gómez (2019) was a conceptual replication of Sachs and Polio (2007) designed to address the potential reactive role of the language of reporting (L1 vs. L2), in addition to the role of depth of processing, operationalized based on Leow’s (2015, pp. 227–228) criteria, during feedback processing in an L2 writing task. Participants were 47 advanced students of Spanish who were assigned to one of two experimental conditions: TA (n = 29) or silent (n = 15); in the TA condition, 13 thought aloud in their L1 and the remaining 16 in their L2. Like the original study, all participants received reformulations on their compositions. The results revealed that reporting language did not have a reactive effect on task performance. In addition, regardless of language use, participants processed grammatical and lexical items at an intermediate or deeper level; however, at the deepest level of processing, there was an increase in L1 comments as compared to L2. According to the authors, “[R]eaching deeper levels of processing can be more challenging with grammatical items than with lexical items, and the language used to process could be a factor, arguably because it can be easier for learners to verbalize more sophisticated thoughts in their L1” (p. 207). The discrepancy between the findings of the original study and this one may lie in the proficiency level of the participants.
Whereas learners in Sachs and Polio (2007) had intermediate proficiency, those in Adrada-Rafael and Filgueras-Gómez (2019) had advanced proficiency and likely had to expend less cognitive effort to think aloud in their L2. In addition, while Adrada-Rafael and Filgueras-Gómez addressed the role of reporting language in reactivity, they did not report comparisons between the TA groups and the silent group, which prevented them from fully addressing the potential effect of TAs on a writing task.

Recently, Suh (2020) investigated the potential reactivity of TAs in the context of focused WCF in L2 writing. A group of 59 Korean learners of English at an intermediate level of proficiency were randomly assigned to one of two conditions: TA or silent. The target structure was the English past counterfactual conditional (e.g., If I had not overslept, I would not have missed the bus yesterday), for which participants received either direct or indirect WCF. Participants followed a two-stage composing-feedback process and performed a story-telling task. The study was conducted over five days within a four-week period for each participant, during which they completed a pretest, three (new) writings during the

Chapter 5. Verbally mediated data

experimental phase (story-retelling tasks with sequenced picture and vocabulary prompts), an immediate posttest, a delayed posttest and an exit questionnaire. The results revealed no reactivity in the TA condition. Moreover, TA had no significant effect on the participants’ ability to produce the target structure in subsequent new writings or in their development of receptive knowledge when compared to the silent group. This study stands out from the other studies in this writing strand in that it addressed learning as opposed to revision accuracy or CAF measures of the writing process.

Validity issues with stimulated recalls in writing research

Just three studies (Adams, 2003; Egi, 2007, 2008) have investigated the validity of SR, seeking to determine whether double exposure to L2 input in the course of SR following task-based interaction caused significant increases in post-test scores. Two of these studies (Egi, 2007, 2008) addressed the issue of reactivity in relation to communicative activities, revealing mixed findings: Increased learning gains (positive reactivity) were reported in Egi (2007), whereas Egi (2008) revealed no such effect. To our knowledge, only Adams (2003) has investigated the validity of SR in writing research. Fifty-six L2 learners of Spanish first collaboratively wrote a story in pairs (pre-test) and were then randomly assigned to one of three groups: (1) noticing, (2) noticing + stimulated recall, or (3) control. Each pair in the two experimental groups discussed the differences between the original and reformulated versions of their essay (noticing session). Immediately after the noticing session, the noticing + stimulated recall group listened to an audio-recording of their discussion and recalled their thoughts at the time they made the comparisons. Both experimental groups then took a post-test in which they were asked to individually write out the story again. The noticing + stimulated recall group incorporated significantly more target-like forms in the post-treatment output than participants from the other groups, revealing positive reactivity due to SR.

Summary of reactivity in writing research

As can be seen, previous writing studies have reported mixed results related to the reactivity of TAs and SRs. Although there are not yet enough writing reactivity studies for a meta-analysis, and although all three studies that had participants complete TAs while writing reported reactivity (positive: Yanguas & Lado, 2012; negative: Yang et al., 2014; Yang et al., 2020), the fact that effects emerged on only a minority of measures leads to the conclusion that TAs do not have a substantial impact on the writing process. It is also noteworthy that while Yanguas and Lado’s participants were heritage speakers, those in Yang et al. (2014) and Yang et al. (2020) were intermediate L2 learners, which may provide one plausible explanation for the discrepancy in results. Since heritage speakers often have more exposure to the target language than foreign language learners, thinking aloud might not be as detrimental to fluency for heritage learners as it was for L2 learners. In addition, while thinking aloud was reported to be positively reactive for heritage speakers’ accuracy in Yanguas and Lado (2012), no reactivity was found for accuracy in Yang et al. (2014) and Yang et al. (2020). Participants’ proficiency level in the target language might thus have affected the impact of TAs on accuracy as well.

Among WCF studies, only the first experiment in Sachs and Polio (2007) reported negative reactivity but, as observed above, this result should be interpreted with caution. Together with the findings of Sachs and Polio’s second experiment and those of Adrada-Rafael and Filgueras-Gómez (2019) and Suh (2020), a similar conclusion of minimal reactivity for revision accuracy and learning can be drawn.

Regarding reactivity and SR conducted immediately after the experimental phase, the only writing study (Adams, 2003) reported positive reactivity on subsequent posttest performances. This result may not be surprising given that participants were afforded an additional exposure to the target information in the writing task.

Robustness of writing studies employing verbal reports

From a methodological perspective, all studies that employed TAs conducted TA training before the experimental phase and included a silent group. In addition, while some studies specifically reported the time lapse of silence before nudging participants to continue thinking aloud (Yang et al., 2014; Yang et al., 2020), the other studies did not. Yang et al. (2014) also reported interrupting participants in TA conditions to ensure adherence to the type of TA, while Yang et al. (2020) set a time limit for their task. Yang et al. (2014) and Yang et al. (2020) also created a kind of exam setting by having the experimenters sit close to the participants.

With regard to the robustness of the findings, the TAs in all the studies addressing the writing process provided satisfactory information on the multiple measures employed. On the other hand, while Sachs and Polio (2007) and Suh (2020) employed TAs to address the issue of reactivity primarily on subsequent revision accuracy and learning, Adrada-Rafael and Filgueras-Gómez (2019) were also able to provide relatively robust data on how deeply participants were processing both lexical and grammatical items in each TA condition.

The sole SR study (Adams, 2003) adhered closely to the recommendation provided by Gass and Mackey (2016) to minimize potential threats to validity by conducting the SR phase via an audio recording immediately after task completion to reduce the potential for memory decay (see discussion below).

Future directions

As Leow and Morgan-Short (2004) acknowledged in their initial reactivity study, a null finding for reactivity does not mean that verbalization had no influence on participants’ performance in any way; it shows only that verbalizing did not alter the overall behavioral product. However, given the paucity of studies addressing the role of TAs and SRs in writing research and the acknowledgment that verbal reports are arguably the most robust procedure capable of eliciting the richest data on cognitive processes employed during a writing task (Leow & Manchón, 2021; Manchón & Leow, 2020), validity issues need to be further addressed in future studies.

For writing process studies, variables that could potentially impact validity include task complexity and proficiency level, while for WCF studies, type of linguistic item, explicitness of feedback (e.g., +/- metalinguistic information), type and complexity of target structure, and L2 proficiency (Bowles, 2010) are key variables. Mode of writing (paper-and-pen vs. digital), context (formal vs. informal), and different populations (children vs. adults) are also of much interest (Manchón & Leow, 2020). Additionally, it would be interesting to compare the effects of TAs on an L2 writing task with and without a planning or revision opportunity to see whether the planning or revision session might play a role in inducing reactivity. Finally, another fruitful direction for further research might be to examine the possible effect of individual differences on reactivity (Bowles & Leow, 2005) by including measures of, for example, working memory, motivation, learner strategies, or cognitive styles.

Writing research conducted from a process-oriented perspective clearly warrants the use of verbal reports to elicit concurrent data on how L2 writers process during both the writing and revision process (Leow & Manchón, 2021; Manchón & Leow, 2020).
Verbal reports shed light on a vast array of issues related to processing that would otherwise be undiscoverable. Indeed, TAs are indispensable for studying issues such as depth of processing or levels of awareness, which are not easily captured by methods such as eye-tracking (Leow et al., 2014), keystroke logging, or screen capture. Depth of processing is defined as “the relative amount of cognitive effort, level of analysis, and elaboration of intake, together with the usage of prior knowledge, hypothesis testing, and rule formation employed in decoding and encoding some grammatical or lexical item in the input” (Leow, 2015, p. 204). Several levels of awareness have been postulated in the (I)SLA literature, ranging from awareness at the level of noticing to reporting to understanding (Leow, 1997; Schmidt, 1990), where awareness at the level of noticing is correlated with a low depth of processing (Leow, 2012, 2015). However, TAs and SR are not a panacea and are not well suited to capturing processing at low levels of awareness or fleeting attention paid to L2 data.

We follow the recommendation by Leow et al. (2014) in their critical review of three concurrent data elicitation procedures (TAs, eye-tracking – ET, and reaction times – RT) that “one methodological suggestion may be to employ a procedural combination of ET, RT, and TA that aims to increase the level of internal validity of the study by maximizing the strengths of a particular procedure while minimizing its weaknesses” (p. 121). This call for triangulation of data (see also Révész & Michel, 2019), then, may be the logical direction to follow to capture the many subtle layers of the writing process involved in the language learning process and the provision of WCF (Leow & Manchón, 2021). For example, eye-tracking methods like TRAKTEXT (Hacker, Keener, & Kircher, 2017; see also Chapter 10, this volume) may be needed to complement verbal reports. The study by Yu, He, and Isaacs (2017), who sought to validate an IELTS writing task and understand how examinees allocated attention, is an excellent example of how verbal reports can be combined with eye-tracking: eye-tracking provided total fixation times on particular aspects of the prompt and task, whereas verbal reports provided more in-depth information about how examinees processed test components.
Finally, we recommend that, for greater robustness, the stimulus for SR in writing research involve direct observation methods such as screen capture (see Chapter 7, this volume) and keystroke logging (Barkaoui, 2019; Breuer, 2019; see also Chapter 9, this volume), because these provide the most complete data source and allow verbalizations to be timestamped to, for instance, changes in the written product. Even for TAs, it is advisable to collect such measures as well, so that the verbalization is better contextualized; otherwise it may not always be clear what part of the text a participant’s verbalization refers to. To minimize potential threats to the validity of TAs and SR, it is highly recommended that best practices, as described in Bowles (2010) for TAs and Gass and Mackey (2016) for SR, be followed. Researchers who use TAs as a data elicitation tool should also have a group of participants complete the tasks silently so that any differences in performance potentially due to thinking aloud can be identified and measured. In addition, participants should be allowed to complete the task at their own pace. For SR, it is crucial that (1) the time lapse between task completion and reporting be as short as possible to reduce the potential for memory decay, and (2) the stimulus be as robust as possible, relying on measures such as screen captures, videos, or keystroke logging tools so that participants have multimodal input to remind them of what was going on while they were completing the task.

References

Adams, R. (2003). L2 output, reformulation and noticing: Implications for IL development. Language Teaching Research, 2, 347–376.
Adrada-Rafael, S., & Filgueras-Gómez, M. (2019). Reactivity, language of think-aloud protocol, and depth of processing in the processing of reformulated feedback. In R. P. Leow (Ed.), The Routledge handbook of second language research in classroom learning (pp. 199–211). Routledge.
Armengol, L., & Cots, J. L. (2009). Attention processes observed in think-aloud protocols: Two multilingual informants writing in two languages. Language Awareness, 18(3–4), 259–276.
Barkaoui, K. (2010). Variability in ESL essay rating processes: The role of the rating scale and rater experience. Language Assessment Quarterly, 7(1), 54–74.
Barkaoui, K. (2011). Think-aloud protocols in research on essay rating: An empirical study of their veridicality and reactivity. Language Testing, 28(1), 51–75.
Barkaoui, K. (2019). What can L2 writers’ pausing behavior tell us about their L2 writing processes? Studies in Second Language Acquisition, 41, 529–554.
Beare, S. (2001). Differences in content generating and planning processes of adult L1 and L2 proficient writers. Dissertation Abstracts International, A: The Humanities and Social Sciences, 62(2), 547-A.
Bloom, B. (1953). Thought-processes in lectures and discussions. Journal of General Education, 7(3), 160–169. https://www.jstor.org/stable/27795429
Bowles, M. (2008). Task type and reactivity of verbal reports in SLA: A first look at a task other than reading. Studies in Second Language Acquisition, 30(3), 359–387.
Bowles, M. A. (2010). The think-aloud controversy in second language research. Routledge.
Bowles, M. A. (2018). Introspective verbal reports: Think-alouds and stimulated recall. In A. Phakiti, P. De Costa, L. Plonsky, & S. Starfield (Eds.), The Palgrave handbook of applied linguistics research methodology (pp. 339–357). Palgrave.
Bowles, M., & Leow, R. P. (2005). Reactivity and type of verbal report in SLA research methodology: Expanding the scope of investigation. Studies in Second Language Acquisition, 27(3), 415–440.
Box, J. A. (2002). Guided writing in the early childhood classroom. Reading Improvement, 39(3), 111–113.
Breetvelt, I. (1994). Relations between writing processes and text quality: When and how? Cognition and Instruction, 12(2), 103–123.
Breuer, E. O. (2019). Fluency in L1 and FL writing: An analysis of planning, essay writing and final revision. In E. Lindgren & K. Sullivan (Eds.), Observing writing: Insights from keystroke logging and handwriting (pp. 190–211). Brill.


Caras, A. (2019). Written corrective feedback in compositions and the role of depth of processing. In R. P. Leow (Ed.), The Routledge handbook of second language research in classroom learning (pp. 186–198). Routledge.
Castro, C. D. (2004). The role of Tagalog in ESL writing: Clues from students’ think-aloud protocols. Philippine Journal of Linguistics, 35(2), 23–39.
Castro, D. (2005). Investigating the use of the native language in the process of writing in a second language: A qualitative approach. Revista Virtual de Estudos da Linguagem, ReVEL, 3(5), n.p.
Chenoweth, N. A., & Hayes, J. R. (2001). Fluency in writing: Generating text in L1 and L2. Written Communication, 18(1), 80–98.
Cohen, A. D., & Cavalcanti, M. C. (1987). Viewing feedback on compositions from the teacher’s and the student’s perspective. ESPecialist, 16, 13–28.
Cohen, A. D., & Cavalcanti, M. C. (1990). Feedback on compositions: Teacher and student verbal reports. In B. Kroll (Ed.), Second language writing: Research insights for the classroom (pp. 155–177). Cambridge University Press.
Cushman, D. (2002). From scribbles to stories. Instructor, 111(5), 32–33.
Diab, R. L. (2005). Teachers’ and students’ beliefs about responding to ESL writing: A case study. TESL Canada Journal/Revue TESL du Canada, 23(1), 28–43.
Durst, R. K. (1987). Cognitive and linguistic demands of analytic writing. Research in the Teaching of English, 21(4), 347–376.
Egi, T. (2007). Recasts, learners’ interpretations, and L2 development. In A. Mackey (Ed.), Conversational interaction in second language acquisition: A series of empirical studies (pp. 249–267). Oxford University Press.
Egi, T. (2008). Investigating stimulated recall as a cognitive measure: Reactivity and verbal reports in SLA research methodology. Language Awareness, 17(3), 212–228.
El Mortaji, L. (2001). Writing ability and strategies in two discourse types: A cognitive study of multilingual Moroccan university students writing in Arabic (L1) and English (L3). Dissertation Abstracts International, C: Worldwide, 62(4).
Ericsson, K. A., & Simon, H. A. (1984). Protocol analysis: Verbal reports as data. The MIT Press.
Ericsson, K. A., & Simon, H. A. (1993). Protocol analysis: Verbal reports as data (rev. ed.). The MIT Press.
Flower, L., & Hayes, J. (1980). The dynamics of composing: Making plans and juggling constraints. In L. Gregg & E. Steinberg (Eds.), Cognitive processes in writing (pp. 31–50). Lawrence Erlbaum Associates.
Flower, L., & Hayes, J. (1981). A cognitive process theory of writing. College Composition and Communication, 32(4), 365–387.
Fox, M. C., Ericsson, K. A., & Best, R. (2011). Do procedures for verbal reporting of thinking have to be reactive? A meta-analysis and recommendations for best reporting methods. Psychological Bulletin, 137(2), 316–344.
Fresch, M. J., Wheaton, A., & Zutell, J. B. (1998). Thinking aloud during spelling word sorts. National Reading Conference Yearbook, 47, 285–294.
Gass, S. M., & Mackey, A. (2016). Stimulated recall methodology in applied linguistics and L2 research (2nd ed.). Routledge.


Gebril, A., & Plakans, L. (2014). Assembling validity evidence for assessing academic writing: Rater reactions to integrated tasks. Assessing Writing, 21, 56–73.
Goo, J. (2010). Working memory and reactivity. Language Learning, 60(4), 712–752.
Hacker, D. J., Keener, M. C., & Kircher, J. C. (2017). TRAKTEXT: Investigating writing processes using eye-tracking technology. Methodological Innovations, 10(2), 1–18.
Hatasa, Y. A., & Soeda, E. (2000). Writing strategies revisited: A case of non-cognate L2 writers. In B. Swierzbin, F. Morris, M. Anderson, C. A. Klee, & E. Tarone (Eds.), Social and cognitive factors in second language acquisition: Selected proceedings of the 1999 Second Language Research Forum (pp. 375–396). Cascadilla.
Hoang, H. (2019). In E. Lindgren & K. Sullivan (Eds.), Observing writing: Insights from keystroke logging and handwriting (pp. 236–257). Brill.
Jannausch, U. H. (2002). A case study of native speakers of English composing in German as a foreign language. Dissertation Abstracts International, A: The Humanities and Social Sciences, 62(12), 4144-A.
Johnson, K. E. (1992). Cognitive strategies and second language writers: A re-evaluation of sentence combining. Journal of Second Language Writing, 1(1), 61–75.
Kang, Y., & Pyun, D. O. (2013). Mediation strategies in L2 writing processes: A case study of two Korean language learners. Language, Culture and Curriculum, 26(1), 52–67.
Khuder, B., & Harwood, N. (2015). L2 writing in test and non-test situations: Process and product. Journal of Writing Research, 6, 233–278.
Kim, H. R., & Bowles, M. A. (2019). How deeply do L2 learners process written corrective feedback? Insights gained from think-alouds. TESOL Quarterly, 53(4), 913–938.
Latif, M. M. A. (2009). Toward a new process-based indicator for measuring writing fluency: Evidence from L2 writers’ think-aloud protocols. The Canadian Modern Language Review/La Revue Canadienne des Langues Vivantes, 65(4), 531–558.
Leow, R. P. (2001). Do learners notice enhanced forms while interacting with the L2? An online and offline study of the role of written input enhancement in L2 reading. Hispania, 84, 496–509.
Leow, R. P. (2012). Explicit and implicit learning in the L2 classroom: What does the research suggest? The European Journal of Applied Linguistics and TEFL, 2, 117–129.
Leow, R. P. (2015). Explicit learning in the L2 classroom: A student-centered approach. Routledge.
Leow, R. P., & Morgan-Short, K. (2004). To think aloud or not to think aloud: The issue of reactivity in SLA research methodology. Studies in Second Language Acquisition, 26(1), 35–57.
Leow, R. P., Hsieh, H., & Moreno, N. (2008). Attention to form and meaning revisited. Language Learning, 58, 665–695.
Leow, R. P., & Manchón, R. M. (2021). Expanding research agendas: Directions for future research agendas on writing and feedback as language learning from an ISLA perspective. In R. M. Manchón & C. Polio (Eds.), Routledge handbook of second language acquisition and writing (pp. 299–311). Routledge.
Leow, R. P., Grey, S., Marijuan, S., & Moorman, C. (2014). Concurrent data elicitation procedures, processes, and the early stages of L2 learning: A critical overview. Second Language Research, 30(2), 111–127.


Li, H., & He, L. (2015). A comparison of EFL raters’ essay-rating processes across two types of rating scales. Language Assessment Quarterly, 12(2), 178–212.
López Serrano, S., Roca de Larios, J., & Manchón, R. M. (2020). Reprocessing output during L2 individual writing tasks: An exploration of depth of processing and the effects of proficiency. In R. M. Manchón (Ed.), Writing and language learning: Advancing research agendas. John Benjamins.
Manchón, R. M., & Leow, R. P. (2020). An ISLA perspective on L2 learning through writing: Implications for future research agendas. In R. M. Manchón (Ed.), Writing and language learning: Advancing research agendas (pp. 336–355). John Benjamins.
McCulloch, S. (2013). Investigating the reading-to-write processes and source use of L2 postgraduate students in real-life academic tasks: An exploratory study. Journal of English for Academic Purposes, 12(2), 136–147.
Morgan-Short, K., Heil, J., Botero-Moriarty, A., & Ebert, E. (2012). Allocation of attention to second language form and meaning: Issues of think alouds and depth of processing. Studies in Second Language Acquisition, 34, 659–685.
Murphy, L., & Roca de Larios, J. (2010). Searching for words: One strategic use of the mother tongue by advanced Spanish EFL writers. Journal of Second Language Writing, 19(2), 61–81.
Park, E. S., & Kim, O. Y. (2019). Learners’ engagement with indirect written corrective feedback: Depth of processing and self-correction. In R. P. Leow (Ed.), The Routledge handbook of second language research in classroom learning (pp. 212–226). Routledge.
Plakans, L. (2009). The role of reading strategies in integrated L2 writing tasks. Journal of English for Academic Purposes, 8(4), 252–266.
Qi, D., & Lapkin, S. (2001). Exploring the role of noticing in a three-stage second language writing task. Journal of Second Language Writing, 10, 277–303.
Révész, A., Kourtali, N.-E., & Mazgutova, D. (2017). Effects of task complexity on L2 writing behaviors and linguistic complexity. Language Learning, 67(1), 208–241.
Révész, A., & Michel, M. (Eds.). (2019). Methodological advances in investigating L2 writing processes. Special issue of Studies in Second Language Acquisition, 41(3).
Révész, A., Michel, M., & Lee, M. (2019). Exploring second language writers’ pausing and revision behaviors. Studies in Second Language Acquisition, 41, 605–631.
Roca de Larios, J., Marin, J., & Murphy, L. (2001). A temporal analysis of formulation processes in L1 and L2 writing. Language Learning, 51(3), 497–538.
Roca de Larios, J., Manchón, R., Murphy, L., & Marín, J. (2008). The foreign language writer’s strategic behaviour in the allocation of time to writing processes. Journal of Second Language Writing, 17(1), 30–47.
Rossomondo, A. E. (2007). The role of lexical temporal indicators and text interaction format in the incidental acquisition of the Spanish future tense. Studies in Second Language Acquisition, 29(1), 39–66.
Sachs, R., & Polio, C. (2007). Learners’ uses of two types of written feedback on an L2 writing revision task. Studies in Second Language Acquisition, 29(1), 67–100.
Sanz, C., Lin, H.-J., Lado, B., Bowden, H. W., & Stafford, C. A. (2009). Concurrent verbalizations, pedagogical conditions, and reactivity: Two CALL studies. Language Learning, 59(1), 33–71.


Sasaki, M. (2004). A multiple-data analysis of the 3.5-year development of EFL student writers. Language Learning, 54, 525–582.
Scardamalia, M. (1984). Teachability of reflective processes in written composition. Cognitive Science: A Multidisciplinary Journal of Artificial Intelligence, 8(2), 173–190.
Schmidt, R. W. (1990). The role of consciousness in second language learning. Applied Linguistics, 11, 129–158.
Schwartz, A. M. (2003). Heritage Spanish speakers’ writing strategies. In A. Roca & C. Colombi (Eds.), Mi lengua: Spanish as a heritage language in the United States, research and practice (pp. 235–257). Georgetown University Press.
Schwartz, A. M. (2005). Exploring differences and similarities in the writing strategies used by students in SNS courses. In L. A. Ortiz López & M. Lacorte (Eds.), Contactos y contextos: El español en los Estados Unidos y en contacto con otras lenguas (pp. 323–334). Iberoamericana/Vervuert.
Siegel, L., Capretta, P., Jones, R., & Berkowitz, H. (1963). Students’ thoughts during class: A criterion for educational research. Journal of Educational Psychology, 54(1), 45–51.
Stafford, C. A., Bowden, H., & Sanz, C. (2012). Optimizing language instruction: Matters of explicitness, practice, and cue learning. Language Learning, 62(3), 741–768.
Stiefenhöfer, L., & Michel, M. (2020). Investigating the relationship between peer interaction and writing process in computer supported collaborative L2 writing: An eye-tracking and stimulated recall study. In R. M. Manchón (Ed.), Writing and language learning: Advancing research agendas. John Benjamins.
Suh, B.-R. (2020). Are think alouds reactive? Evidence from an L2 written corrective feedback study. Language Teaching Research.
Suzuki, M. (2008). Japanese learners’ self revisions and peer revisions of their written compositions in English. TESOL Quarterly, 42(2), 209–232.
Tiryakioglu, G., Peters, E., & Verschaffel, L. (2019). The effect of L2 proficiency level on composing processes of EFL learners: Data from keystroke loggings, think alouds and questionnaires. In E. Lindgren & K. Sullivan (Eds.), Observing writing: Insights from keystroke logging and handwriting (pp. 212–235). Brill.
Uzawa, K. (1996). Second language learners’ processes of L1 writing, L2 writing, and translation from L1 into L2. Journal of Second Language Writing, 5(3), 271–294.
Van Weijen, D., van den Bergh, H., Rijlaarsdam, G., & Sanders, T. (2008). Differences in process and process-product relations in L2 writing. ITL International Journal of Applied Linguistics, 156, 203–226.
Wang, W., & Wen, Q. (2002). L1 use in the L2 composing process: An exploratory study of 16 Chinese EFL writers. Journal of Second Language Writing, 11(3), 225–246.
Willey, I., & Tanimoto, K. (2015). ‘We’re drifting into strange territory here’: What think-aloud protocols reveal about convenience editing. Journal of Second Language Writing, 27, 63–83.
Yang, C., Hu, G., & Zhang, L. J. (2014). Reactivity of concurrent verbal reporting in second language writing. Journal of Second Language Writing, 24, 51–70.
Yang, C., Zhang, L. J., & Parr, J. (2020). The reactivity of think-alouds in writing research: Quantitative and qualitative evidence from writing in English as a foreign language. Reading and Writing: An Interdisciplinary Journal, 33(2), 451–483.


Yanguas, I., & Lado, B. (2012). Is thinking aloud reactive when writing in the heritage language? Foreign Language Annals, 45(3), 380–399.
Yoshida, M. (2008). Think-aloud protocols and type of reading task: The issue of reactivity in L2 reading research. In M. Bowles, R. Foote, S. Perpiñán, & R. Bhatt (Eds.), Selected proceedings of the 2007 Second Language Research Forum (pp. 199–209). Cascadilla Proceedings Project.
Yu, G., Rea-Dickins, P., & Kiely, R. (2011). The cognitive processes of taking IELTS academic writing task one. IELTS Reports Volume 11, 373–449.
Yu, G., He, L., & Isaacs, T. (2017). The cognitive processes of taking IELTS academic writing task one: An eye-tracking study. IELTS Research Report. Retrieved on 26 April 2023 from https://www.ielts.org/for-researchers/research-reports/ielts_online_rr_2017-2
Zellermayer, M., & Cohen, J. (1996). Varying paths for learning to revise. Instructional Science, 24(3), 177–195.
Zhao, C. G. (2013). Measuring authorial voice strength in L2 argumentative writing: The development and validation of an analytic rubric. Language Testing, 30(2), 201–230.

Chapter 6

Verbally mediated data
Written verbalizations

Wataru Suzuki, Masako Ishikawa & Neomy Storch
Miyagi University of Education | Josai University | University of Melbourne

This chapter focuses on written verbalizations (e.g., written languaging, valid written explanations, diaries, written reflections) and discusses their possible roles as data collection instruments for the study of writing processes. We first describe general procedures for collecting written verbalizations and then critically analyze (a) the type of research questions researchers can ask and answer by using written verbalizations; (b) methodological challenges researchers face and the possible solutions to circumvent those challenges; and (c) how L2 researchers can best elicit written verbalizations in L2 research. We conclude by suggesting four practical tips to researchers who wish to use written verbalizations as a data collection instrument in the study of writing processes.

Introduction

In Chapter 5, Ronald Leow and Melissa Bowles argue that oral verbalizations (e.g., think-aloud protocols, stimulated recalls) can represent a valid research methodology in second language (L2) research. L2 researchers can use oral verbalizations to collect data about various cognitive processes underlying L2 performance (e.g., L2 writing strategies or noticing of and awareness of feedback on L2 writing). This chapter focuses on written verbalizations (e.g., written languaging, valid written explanations, diaries, written reflections) and discusses their possible roles as data collection instruments. We argue that written verbalizations may also be valid data collection procedures, but that, under some circumstances, written verbalizations (and oral verbalizations, see Swain, 2006a, 2006b) are part of the L2 learning processes, not just a medium of data collection. We begin by briefly describing two types of written verbalizations (i.e., concurrent and retrospective verbalizations) and considering their validity (i.e., veridicality and reactivity).

https://doi.org/10.1075/rmal.5.06suz © 2023 John Benjamins Publishing Company


In concurrent written verbalizations, participants verbalize their thought processes while performing a task. For example, individuals explain, in writing, what is going on in their heads while they process written corrective feedback on errors in compositions (e.g., Boggs, 2019; Manchón et al., 2020; Nicolás-Conesa et al., 2019; Suzuki, 2012, 2017). In contrast, retrospective written verbalizations require participants to verbalize their thought processes after they have performed a task. For example, study participants are asked to write about what was going on in their minds after they have written some sentences and paragraphs (e.g., Noro, 2004; Suzuki & Itagaki, 2007, 2009) or after they have compared their writing with model texts (e.g., Hanaoka, 2007; Hanaoka & Izumi, 2012; Ishikawa & Révész, 2020).

Veridicality refers to both the possibility that the “processes underlying behavior may be unconscious and thus not accessible for verbal reporting” (Ericsson & Simon, 1993, p. 109) and the possibility that “verbalizations, when present, may not be closely related to underlying thought processes” (p. 109). Put simply, the veridicality of verbalizations concerns the completeness and accuracy of verbalizations. Concurrent written verbalizations are more likely to be veridical (that is, more likely to reflect thought processes accurately) than retrospective written reports because concurrent reports are gathered while information is still active in working memory.

Reactivity refers to the possibility that verbalizations may change what is going on in learners’ minds during task performance. Concurrent written verbalizations are more likely to be reactive, that is, more likely to alter the sequences of thought processes, than retrospective written verbalizations.
This is because concurrent written verbalizations require participants to perform a task (i.e., primary processing) and simultaneously verbalize their thinking (i.e., secondary processing). This dual processing caused by concurrent verbalizations most likely changes the primary processing. In contrast, retrospective written verbalizations are not reactive as they are conducted after the primary processing has been completed. Furthermore, Ericsson and Simon (1993) categorized verbalizations into three levels (i.e., Level 1, Level 2, Level 3). Level 1 verbalizations involve information currently in verbal form. For example, participants report what they are thinking while doing a language-related task. This verbalization level does not require participants to make additional effort to verbalize their thoughts, as they are already in verbal form. Hanaoka (2007), for example, asked his participants to write what they noticed immediately after comparing their essays with model texts (see also Hanaoka & Izumi, 2012). Level 2 verbalizations involve reporting thoughts that are not verbal, for example, when participants think aloud while playing chess or when trying to

Chapter 6. Verbally mediated data

understand diagrams. According to Ericsson and Simon (1993), this verbalization level requires additional time as participants recode nonverbal information into verbal form. Given the verbal nature of L2 writing, Level 2 verbalizations have rarely been used in L2 writing research. Level 3 verbalizations result from participants explaining their thoughts or specific information they would not ordinarily attend to while or after performing a task. Thus, Level 3 verbalizations are considered reactive. For example, Suzuki (2012) asked his participants to write out their explanations for the corrections on their essays (see also Manchón et al., 2020; Nicolás-Conesa et al., 2019). Level 3 verbalizations are similar to what Swain (2006a, 2006b, 2010) termed ‘languaging’, which corresponds to those instances when learners use language as a tool to work through a complex problem or gain a better understanding of a complex or abstract concept. A relatively small but growing number of studies have shown that engaging in written languaging contributes to students’ learning (see Suzuki & Storch, 2020 for a review). In this sense, Level 3 verbalizations are part of the learning process, not just a vehicle for data collection (Swain, 2006a, 2006b). In the following sections, we first describe general procedures for collecting written verbalizations and then critically analyze (a) the type of research questions researchers can ask and answer; (b) methodological challenges researchers face and possible solutions to circumvent those challenges; and (c) how L2 researchers best elicit written verbalizations in L2 research.

Methodological procedures of written verbalizations

General description

In this section we describe the general procedures for collecting written verbalizations and provide examples from published studies of the kind of instructions or worksheets given to participants to guide their written verbalizations. Written verbalization data are collected retrospectively or immediately after language production and language comprehension tasks. One way to collect written verbalization data is to ask participants to write journals, diaries, or written protocols retrospectively (e.g., Kasper, 1997; Nicolás-Conesa et al., 2014; Sengupta & Falvey, 1998). For example, Kasper (1997) asked learners to describe and evaluate both positive and negative aspects of their English language writing experience at the beginning (Week 1) and end (Week 6) of an integrated reading/writing course. In Nicolás-Conesa et al.'s (2014) study, learners were requested to write what goals and strategies they had in mind while writing their texts at the beginning and the end of the study. Sengupta and Falvey (1998) asked teachers to
write their beliefs about what good L2 writing is, views about student problems with writing, and techniques for giving feedback. Another way to collect written verbalization data is to ask students to write their thinking process as soon as possible after producing or comprehending language (e.g., Boggs, 2019; Ishikawa & Révész, 2020; Manchón et al., 2020; Noro, 2004; Suzuki, 2012; Suzuki & Itagaki, 2007, 2009). A typical data collection procedure is as follows. First, learners are asked to engage in language production or comprehension. Language production includes writing a sentence, a paragraph, or a composition. Language comprehension may consist of reading to-be-learned materials, such as written corrective feedback on written essays. Second, learners are requested to write everything that comes to mind while composing sentences or understanding written corrective feedback. Generally, students write or type their explanations in response to questions on a worksheet or a computer. For example, Noro (2004) asked his participants to write a paragraph and, immediately after that, to retrospect about their writing process and report on it as fully as possible in written form. Suzuki and Itagaki (2007) distributed a handout (see Table 1) in which participants were first instructed to translate an L1 sentence into the L2 and then report what they had been thinking about while translating, what they had struggled with, and how they had arrived at their eventual solutions.

Table 1. A sample handout for written reflections adapted from Suzuki and Itagaki (2007)

Name: _______________

(1) Translate the following Japanese sentence into English.
今晩十分に雪が降れば明日スキーに行ける。
(Konban jubun-ni yuki ga fure-ba asu sukii ni ikeru / If it snows a lot tonight, I will be able to go skiing.)
_____________________________________________________________________________

(2) Report as fully and precisely as you can whatever you were thinking about while writing the above sentence (e.g., what you were struggling with, how you arrived at your eventual solution).
_____________________________________________________________________________
_____________________________________________________________________________
_____________________________________________________________________________
_____________________________________________________________________________

These examples illustrate how learners are asked to write about the thinking processes involved in their L2 production (i.e., writing). However, the reporting is completed immediately after production. The following three examples show how learners can be asked to reflect, in writing, on their comprehension process. For
instance, Boggs (2019) asked her participants to complete reflective worksheets (see Table 2) after receiving direct written corrective feedback on their essays. The participants filled in all the rows in the self-explanation table after listening to the researcher's oral instructions (see Excerpt 1 below) and reading the information provided in the table below.

[1] Ok, on my essay, I have an error that is labeled number 1. First, I'll copy the mistake here. In the next column, I need to write the correction. The correction is on my essay to check the teacher's correction and write it here. Next, I need to explain why this is a problem. (J. A. Boggs, personal communication, February 1, 2021)

Table 2. A sample worksheet for written languaging adapted from Boggs (2019)

      Error             Correction    Explanation of the correction
Ex.   Yesterday I go …  go ➔ went     Use past tense verbs to talk about past events
1     Counselor         a counselor   Use the article ‘a’ because this is the first time ‘counselor’ appears.
2
3

The first column shows the number attached to each grammatical error in the essays. In the second and third columns, learners report the errors identified in their writings and the researcher’s direct corrections. Finally, in the fourth column, the participants provide reasons for their errors and corresponding explanations. Manchón et al. (2020) also asked their participants to engage in written verbalizations following the provision of written corrective feedback. After reading the information provided on their errors, the participants were invited to fill in all the boxes in the table (see Table 3). The participants first copied each error they made in the “incorrect form” column. Then, they copied the direct correction supplied by their teacher in the “correct form” column. Finally, they explained, either in their L2 or L1, why they thought it was an error in the “explain why you think it’s an error” column (i.e., written languaging). Another example comes from a study conducted by Ishikawa and Révész (2020). The researchers first asked participants to reconstruct a passage after listening to it (i.e., dictogloss). Then, the students were given an instruction sheet (see [2] below), which asked them to carefully compare their reconstructed text and the original text and write down their thoughts upon examining the original text on the sheet (see [3] below).


Table 3. A sample written verbalization form adapted from Manchón et al. (2020)

Incorrect form               Correct form              Explain why you think it’s an error
To call to the firefighter   To call the firefighter   The preposition “to” is unnecessary with the verb “to call,” which means here “to phone somebody.”
1
2
3

[2] Below is what you heard earlier. Please compare it with your reconstruction, paying special attention to forms such as tense and verb forms. Are there any difference? If so, how is it different? (Why do you think whatever you noticed is used in the original text? Can you think of the grammar rule?) Think hard and write whatever comes to your mind between the lines or the blank space below. (Write on this sheet! Please do not write on the sheet you used earlier.) (Instruction)

[3] …I need to think about my options one more time. If I had better grades, I could go to graduate school. … (written verbalization)

Written verbalization prompts

Prompts used to elicit written verbalizations vary across studies. Broadly, we can identify two prompt types in L2 writing research on written verbalizations: self-directed and other-directed. In some studies employing self-directed prompts, learners write explanations to themselves (e.g., Boggs, 2019; Manchón et al., 2020; Moradian et al., 2017; Moradian et al., 2020; Simard & Zuniga, 2020; Suzuki, 2012, 2017). For example, Suzuki (2012) encouraged his participants to write self-explanations to themselves, using the following prompt: “Why is this linguistic form incorrect/wrong? Why did the instructor give feedback on this form? Please write your explanation in Japanese” (pp. 1118–1119). Similarly, Simard and Zuniga (2020) asked students to write their reactions to written corrective feedback on their essays, using the following prompt: “What were your first reactions when you received your corrected work? What grabbed your attention first? Why?” (p. 292). Although not explicitly encouraged to write explanations to others in these studies, participants might have produced written verbalizations to/for their teachers/researchers.

In studies using other-directed prompts, learners are explicitly requested to generate written explanations for others such as the instructor or fictitious students (e.g., Ishikawa & Suzuki, 2016, 2023; Nicolás-Conesa et al., 2014). For example, Nicolás-Conesa et al. (2014) asked participants to write journals addressing other students (or addressing fictitious students), using the following prompts (p. 17, emphasis added):

1. In the light of what you have learned at university, please write a journal entry to try to explain to a prospective student in our department [what you think good academic writing is and what it involves]. (Before the experiment)
2. In the light of what you know now, please write a journal entry trying to explain to a 3rd year student in our department what good academic writing is and what it involves. Try to focus on anything that you have discovered during this year that you did not know before. (After the experiment)

Ishikawa and Suzuki (2016) asked their participants to write explanations to their instructor in Japanese, using the following prompt (p. 101, emphasis added):

I would like to determine the extent to which you understand the role of the present counterfactual conditional. Please write the rule based on what you read on this form so that I, your instructor, will be able to assess your level of understanding.

Ishikawa and Suzuki (2023), on the other hand, asked their university students to generate written explanations to fictitious and less knowledgeable others (i.e., junior high school students), using the following prompt:

A junior high school student [emphasis added] is asking you questions about the following English sentences. Explain in detail so that the student will understand the form of the sentences.

To the best of our knowledge, no L2 study has examined the effects of self-directed and other-directed prompts on the process and product of written verbalizations. However, cognitive psychologists have recently started investigating this issue. For example, Lachner et al. (2021) examined the effects of explaining to a fictitious student versus self-explaining on students’ conceptual knowledge of the term “endocarditis.” The researchers found that students in the self-directed prompt condition outperformed students in the other-directed prompt condition. We hope to see similar studies in our field soon to help determine the impact of prompt types (self-directed vs. other-directed) on the process and quality of the written verbalizations produced. The other-directed prompt condition (writing for others) may make written verbalizations a more cognitively demanding activity than the self-directed prompt condition, but the verbalizations produced may be more detailed. This is because students may perceive the need to make their explanations comprehensible for others who may not know about the to-be-explained materials. However, it may well be that the self-directed prompt condition (writing for oneself) makes written verbalizations a more demanding activity, but for very different reasons. Writing for oneself may make students perceive a weaker social presence in the verbalization activity and thus become less engaged in reporting their thought processes than when writing for others, even if these others are fictitious.

Language of reporting

It is essential to consider the language (e.g., L1, L2, L3) used for written verbalizations. If the verbalization is in the learner's L2, even if he or she is highly proficient, the task may become more complex and increase cognitive demands, leading to potential veridicality and reactivity problems. Moreover, even if the verbalization is carried out in the learner's L1, the shift between the language of the task (the learner's L2) and their L1 may also increase cognitive demands. Thus, the issue of the language chosen for verbalization adds a new dimension to Ericsson and Simon's (1993) model and related findings on reactivity. Although cognitive psychologists do not systematically report the language of choice, they have generally conducted think-alouds in the participants' L1 because of their interest in their students' L1 reading and writing processes (e.g., Chi et al., 1989). In contrast, L2 writing researchers have instructed their participants to engage in verbalizing in their L1 (e.g., Boggs, 2019; Simard & Zuniga, 2020), their L2 (e.g., Sachs & Polio, 2007), or a language of their choice (e.g., Manchón et al., 2020; Moradian et al., 2020). For example, Manchón et al. (2020) instructed their Spanish-speaking learners of English to explain "ideally in English" (p. 264). Although the participants in Manchón et al.'s study were high-intermediate (CEFR B1) and hence could produce written verbalizations in their L2, the researchers added that participants could also use Spanish, their L1, if necessary. Thus, the decisive factor in determining which language is used in verbalizations seems to be learners' L2 proficiency level. In this regard, Ericsson and Simon (1993) explain:

Persons fluent in a second language can usually think aloud in that language even while thinking internally in the oral code of their native language or in non-oral code.
In this case, there is nearly a one-to-one mapping between structures in the oral code of the first language and the code of the second language that is used for vocalization. How much the thinking is slowed down will then be a function of the subject’s skill in the second language. (pp. 249–250)


In Ishikawa and Suzuki (2016), the participants were lower-intermediate learners, and verbalizing in the L2 was not feasible. Therefore, the researchers asked the participants to explain their thinking processes in their L1. In addition to L2 proficiency, other factors such as motivation, ethnic identity, personal preferences, and task type may explain individual language choices in written verbalizations. For example, Cumming (1989) asked L2 learners to think aloud while composing three different tasks (a letter, an argument, and a summary). The researcher found that the extent to which the students chose to use their L1 during their think-alouds was related neither to their level of L2 proficiency nor to their writing experience. Rather, Cumming found that his participants frequently switched between L1 and L2, according to their thinking processes whilst writing, which were affected by the task type. Kim's (2002) study, conducted in the US with ESL students from Asian and European backgrounds, shows that learners' cultural background may be an additional factor that needs to be considered. The study found that task performance in the case of the students from Asian backgrounds was impaired by thinking aloud in their L1 (i.e., negative reactivity), but this was not the case with the students from European backgrounds. These findings suggest that learners from certain cultural and linguistic backgrounds may feel more comfortable with thinking aloud in their L1. Thus, the language of verbalization and learners' cultural background are factors that need to be considered when implementing think-alouds. In our view, it is essential that future L2 research systematically examine language choice during verbalizations and its potential impact on veridicality and reactivity.

Research questions that can be answered using written verbalizations

As stated in the introduction to this chapter, verbalizations, including written verbalizations, when collected soon after or during an activity (i.e., Level 1 verbalizations), are a window to learners' cognitive processes and can thus be deployed to inform us more about L2 writing and feedback processing. Thus, the first research question that can be answered using written verbalizations concerns how learners translate or compose a text (e.g., Kasper, 1997; Nicolás-Conesa et al., 2014; Noro, 2004; Sengupta & Falvey, 1998; Suzuki & Itagaki, 2007, 2009), how they process written corrective feedback (e.g., Nicolás-Conesa et al., 2019; Suzuki, 2012, 2017), and what they notice when comparing their own writing with a model text (e.g., Hanaoka, 2007; Hanaoka & Izumi, 2012; Ishikawa & Révész, 2020). For example, Suzuki and Itagaki (2007) examined what kinds of written metalinguistic reflections Japanese EFL learners engaged in while translating L1 sentences into L2 in writing. To do so, the researchers asked the participants to report
retrospectively, in writing, what they had been thinking about while they were translating the L1 sentences into L2 (see Table 1). It was found that the learners were thinking about grammar more than vocabulary. Manchón et al. (2020) investigated how deeply L2 writers processed written corrective feedback on errors in compositions (see also Nicolás-Conesa et al., 2019; Suzuki, 2012, 2017). First, the researchers asked the participants to write explanations for each error and corresponding correction. Then, they analyzed the written explanations in terms of the amount of cognitive effort the learners put in while processing written corrective feedback or reflecting on their errors. Finally, the participants' written verbalizations were categorized broadly into three depth-of-processing levels: (a) reporting (i.e., error detection), (b) reporting with limited elaboration (i.e., error detection, error correction), and (c) reporting with extended elaboration (i.e., error detection, error correction, metalinguistic explanation). As mentioned previously, Nicolás-Conesa et al. (2014) asked learners to write in their journals what goals and strategies they had in mind while writing their texts at the beginning and the end of an EAP course. The researchers analyzed the journals (i.e., written verbalizations) in terms of how the learners' goals and written performance changed over time. According to these researchers, like oral verbalizations (e.g., think-aloud protocols, stimulated recalls), written verbalizations are a valid research instrument used to collect data about various cognitive (e.g., depth of processing) and affective (e.g., goals, beliefs) processes involved in L2 writing and in processing written corrective feedback. However, the issues of reactivity and veridicality of written verbalizations have not been fully explored in L2 writing studies.
To explore the reactivity issue, we need to address whether written verbalizations alter the cognitive processes we try to shed light on, resulting in improved or reduced task performance (i.e., positive reactivity or negative reactivity, respectively). A relatively small but growing number of studies have examined this question and have produced mixed findings (e.g., Boggs, 2019; Fukuta et al., 2019; Ishikawa & Suzuki, 2016; Suzuki, 2012, 2017). Although the results in these studies are inconsistent, some have shown that Level 3 written verbalizations or written languaging (Suzuki, 2012) can be positively reactive (i.e., the task performance is improved by the requirement of written verbalizations). For example, Suzuki (2012, 2017) explored the impact of concurrent Level 3 written verbalizations about feedback (direct corrections) provided on draft essays written by Japanese university students of English. The researcher found that concurrent written verbalizations seemed to help learners revise their papers more successfully. However, other researchers have not confirmed this effect (e.g., Boggs, 2019; Fukuta et al., 2019). For example, in Boggs’s (2019) study, the participants were
asked to record their errors and corresponding corrections in their worksheets and explain the reasons for those corrections. This written languaging group (i.e., the self-scaffolding group in Boggs's terms) was compared, in terms of grammatical accuracy across three consecutive writing tasks, to (a) a teacher-student conference group (i.e., a teacher-scaffolding group) and (b) a comparison group that received direct written corrective feedback only. Results showed accuracy development across the three groups, suggesting a lack of positive or negative effect of written verbalizations on L2 learning. In cognitive psychology (Chi et al., 1989), the positive impact of written verbalizations on learning is explained by various hypothesized mechanisms (e.g., generating inferences, repairing mental models, or task engagement). For example, when learners explain materials to themselves, they are likely to generate inferences beyond the information contained in the materials and/or to infer new information missing from the materials. The new information is encoded into memory and becomes available to facilitate subsequent performance. In contrast, from socioculturally oriented L2 research, the positive effects of written verbalizations seem to be attributed to the scaffolding provided by verbalizations (Suzuki & Storch, 2020). Written verbalizations enable learners to plan, coordinate, and review their actions through the product of the written verbalizations themselves. Nevertheless, limited L2 written verbalization research has been conducted to investigate the cognitive processes evident in written verbalizations. Like the reactivity issue, the veridicality issue (i.e., whether written verbalizations accurately reflect thought processes) has not been fully addressed in L2 writing research (cf. Barkaoui, 2011, for think-aloud studies).
Instead, L2 researchers have attempted to identify possible mediating factors that promote or hinder the process of written verbalizations and consequently the end product of those processes (i.e., L2 learning). To date, some potential mediators have been explored, such as methodological, learner, and feedback factors. First, the veridicality of written verbalizations may differ depending on how they are conducted: individually or collaboratively (e.g., Manchón et al., 2020). Second, written verbalizations are likely to be affected by learner factors such as emotions (e.g., Simard et al., 2017; Simard & Zuniga, 2020), aptitude (e.g., Ishikawa & Suzuki, 2023; Ishikawa & Révész, in preparation), and proficiency levels (e.g., Ishikawa, 2018). Finally, the nature of the input (e.g., direct vs. indirect feedback) may be a pivotal factor influencing the process of written verbalizations (e.g., Moradian et al., 2017, 2020). Therefore, these factors should be considered when L2 researchers wish to employ written verbalizations to collect data on writing processes.


Methodological challenges

In this section we will consider four significant methodological challenges researchers may face in using written verbalizations. First, participants may not be able to explain and elaborate on their thought processes in writing (i.e., a veridicality issue). This is not only because they rarely do so while engaged in L2 writing, but also because explaining one's own thoughts in writing may be cognitively demanding. Although written explanations allow for externalization and careful reflection on ideas, this activity may place high cognitive demands on students' working memory capacities. This may explain differences in the findings reported in a small number of studies in cognitive psychology on the quality and effects of generating written explanations compared to oral explanations (e.g., Hoogerheide et al., 2016; Lachner et al., 2018). These studies have analyzed whether students' written explanations after reading learning materials promote their comprehension as much as their oral explanations. Results to date are inconclusive. For example, Hoogerheide et al. (2016) compared the impact of oral and written explanations on the learning of syllogistic reasoning. They found that generating written explanations was not as effective as generating oral explanations. Somewhat more complex results were reported by Lachner et al. (2018), who compared self-explaining orally and in writing in terms of students' quality of explanations and learning outcomes. The participants explained their understanding orally or in writing after reading a text on combustion engines. Regarding the quality of explanations, the oral explanation condition produced more words and more elaborate explanations than the written explanation condition. However, the written explanations were more coherently organized than the oral explanations.
In terms of learning outcomes, the written explanation condition produced higher scores on conceptual knowledge than the oral explanation condition. However, the oral explanation condition produced higher scores on the transfer of knowledge than the written explanation condition, indicating that the two conditions/modalities facilitate students' learning differentially. Unlike oral verbalizations, written verbalizations allow participants to respond to themselves or to a (potential) audience while having time to carefully organize and reflect on their explanations. Because participants may have difficulty explaining their thought processes in writing, researchers need to provide learners with training and practice in producing written verbalizations before implementing them in a study. For example, Ishikawa and Révész (2020) provided their participants with two written verbalization training sessions in which the students wrote down their thoughts after comparing their own written texts with the original ones.


Second, generating written explanations may be less personally engaging than generating oral explanations. Unlike oral verbalizations, which are often directed at a potential listener, writing is a more private activity, and thus when producing written verbalizations, learners may find it more challenging to perceive a social presence or potential audience (e.g., Hoogerheide et al., 2016; Jacob et al., 2020; Lachner et al., 2021; Rittle-Johnson et al., 2008). When explaining, feelings of social presence or of a potential audience are thought to prompt learners to engage in elaborate cognitive processes, resulting in better learning outcomes. For example, Rittle-Johnson et al. (2008) compared children (a) explaining correct solutions to classification problems to themselves and (b) explaining the solutions to their mothers. The researchers found that explaining to others overall generated more positive learning effects than self-explaining. In addition, those children who explained to their mothers scored higher on tests of transfer of knowledge than children who engaged in self-explaining. Hoogerheide et al. (2016) also compared participants writing explanations to themselves and writing explanations to fictitious others on video and found that explaining to others enhanced the learning of syllogistic reasoning more than explaining to oneself. These studies have documented that those who explain orally produce more personal references such as "me" and "you" in their explanations than those who provide explanations in writing. This might indicate that students may not be as cognitively and affectively engaged in generating explanations and elaborations when explaining their thoughts in writing. Such evidence reinforces the need for prior training and practice in studies using written verbalizations for data collection purposes. Also, we may need to facilitate students' written verbalizations through the use of prompts, as discussed in the following section.
Third, researchers may have a concern about the use of students' L1 for their written verbalizations. As discussed earlier, in many L2 studies, researchers ask students to engage in written verbalizations in their L1. This practice may be counter-intuitive and conflict with many teachers' pedagogical choices to maximize the use of the L2 in L2 classrooms. However, verbalizing in the L2 may not be feasible or productive for learners with limited L2 proficiency. L2 written verbalizations place great cognitive demands on working memory and may negatively affect cognitive processes during task performance. This may lead to reactivity and non-veridicality of written verbalizations, as mentioned in an earlier section. However, when learners and a teacher/researcher do not share the same L1, the use of the L2 for students' written verbalizations may be a viable option. Fourth, researchers may be concerned that students may learn and consolidate less accurate grammatical and lexical knowledge through unsystematic, incomplete, and erroneous written verbalizations. Thus, researchers might ethically

Wataru Suzuki, Masako Ishikawa & Neomy Storch

want to provide feedback on the content of such non-targetlike verbalizations after the experiment. Indeed, cognitive psychologists have begun to examine how feedback on the accuracy of students’ explanations affects their learning. For example, Lachner and Neuburg (2019) asked their participants to write explanations after reading a physics text. The participants then received feedback on the quality of their explanations and finally revised their explanations. The researchers compared the feedback group’s learning outcomes with those of a no-feedback group (who revised their explanations without feedback). The study found that the feedback group significantly outperformed the no-feedback group. These findings suggest that the positive impact of written verbalizations can be enhanced by feedback that provides specific information about their quality. Although written verbalizations, whether accurate or inaccurate, can constitute learning processes, researchers may want to further enhance the outcomes of these learning processes by providing feedback on the accuracy of the content of the written verbalizations produced.

Best practices in using written verbalizations

We have already mentioned that L2 writers may find it challenging to engage in written verbalizations, not least because it is an unfamiliar activity, and that one strategy to address this potential challenge is to provide prior training and practice to research participants so that they become familiar with the activity and procedure of written verbalizations. It is worth mentioning that Ericsson and Simon (1993) and their followers also recommend prior training and practice (see Chapter 5, this volume). Such training could, for example, include asking participants to read a transcript in which written verbalizations are modeled and then to practice providing written verbalizations on similar texts. Another point to consider is the kind of prompts given at the outset of and during the written verbalization activity. For example, researchers need to encourage participants to report only what is going on in their minds (i.e., information in short-term memory) in order to collect Level 1 written verbalizations. In contrast, those interested in examining Level 3 verbalizations (e.g., written languaging, written explanations, written reflections) need to ask participants to describe, elaborate, and explain thoughts beyond the information in short-term memory (i.e., information in long-term memory). Researchers interested in eliciting Level 3 written verbalizations may also remind students that it is acceptable to move beyond Level 1 written verbalizations toward reporting Level 3

Chapter 6. Verbally mediated data

verbalizations (e.g., explaining, inferencing, and elaborating on the targeted grammar). The use of well-designed written verbalization prompts is likely to lighten learners’ cognitive load. For example, researchers can facilitate this activity, especially for lower proficiency students, by providing a set of grammatical terms that the students were previously taught in regular language classes. In Ishikawa and Suzuki’s (2016) study, the participants were encouraged to write their explanations with reference to terms such as direct subjunctive, indirect subjunctive, present fact, if-clause, main clause, verb, auxiliary verb, past tense, and subject. Alternatively, weaker students can be provided with a more limited set of terms and explanations so that they can engage in written verbalizations without being overwhelmed. Another strategy is to ask students to imagine that they have to explain their thoughts, in writing, to their instructors, fellow students, or more junior students, rather than to themselves. This strategy parallels the one used in studies of oral verbalizations (e.g., Hoogerheide et al., 2016) in which participants were asked to imagine addressing fictitious listeners (see above). Finally, regarding research conducted in classrooms (e.g., Ishikawa & Suzuki, 2016, 2023; Suzuki & Itagaki, 2007, 2009; Noro, 2004), although it is relatively easy to implement written verbalization activities in classrooms because they require no special devices, the slower pace of writing can be an issue when class time is limited. One possible strategy is to implement written verbalization activities as homework assignments (e.g., Negueruela, 2008). For example, learners can be asked to explain, in writing, how they produced a written text (whether a few sentences or several paragraphs) or how they compared their compositions with the corrective feedback received, record these explanations in their notebooks, and submit them to the researchers.
However, this strategy is likely to threaten the veridicality of written verbalizations, as such verbalizations may not accurately reflect cognitive processes involved in L2 writing and feedback processing per se.

Conclusion

This chapter has aimed to contribute to L2 writing inquiry by showing how written verbalizations can be deployed to study L2 writing processes, including responses to and uptake of feedback on writing. This line of investigation continues to attract much research attention. L2 researchers who use written verbalizations need to consider the issues of reactivity and veridicality. Future research needs to deploy systematic comparisons of oral and written verbalizations during/after various L2 writing tasks, as such comparative research will deepen our understanding of the veridicality and reactivity of written verbalizations. Finally,
and most importantly, researchers should contemplate the possibility that this research tool is part of the learning process, not just a vehicle for data collection. Although much progress is needed in this domain, we would like to offer the following four practical tips to researchers who wish to use written verbalizations as a data collection tool. We hope these tips will help researchers uncover learners’ cognitive and affective processes involved in L2 writing, including written corrective feedback processing.

1. Retrospective written verbalizations should be used immediately after L2 writing and feedback processing.
2. L2 writers should be encouraged to produce Level 1 verbalizations (i.e., reporting) and avoid Level 3 verbalizations (e.g., explaining, elaborating).
3. L2 writers (particularly those of low L2 proficiency) could be asked to report their thought processes in their L1, not their L2.
4. L2 writers ought to be provided with prior training and practice to familiarize themselves with written verbalization procedures.

References

Barkaoui, K. (2011). Think-aloud protocols in research on essay rating: An empirical study of their veridicality and reactivity. Language Testing, 28, 51–75.
Boggs, J. A. (2019). Effects of teacher-scaffolded and self-scaffolded corrective feedback compared to direct corrective feedback on grammatical accuracy in English L2 writing. Journal of Second Language Writing, 46.
Chi, M. T. H., Bassok, M., Lewis, M. W., Reimann, P., & Glaser, R. (1989). Self-explanations: How students study and use examples in learning to solve problems. Cognitive Science, 13, 145–182.
Cumming, A. (1989). Writing expertise and second language proficiency. Language Learning, 39, 81–141.
Ericsson, K. A., & Simon, H. A. (1993). Protocol analysis (2nd ed.). The MIT Press.
Fukuta, J., Tamura, Y., & Kawaguchi, Y. (2019). Written languaging with indirect feedback in writing revision: Is feedback always effective? Language Awareness, 28, 1–14.
Hanaoka, O. (2007). Output, noticing, and learning: An investigation into the role of spontaneous attention to form in a four-stage writing task. Language Teaching Research, 11, 459–479.
Hanaoka, O., & Izumi, S. (2012). Noticing and uptake: Addressing pre-articulated covert problems in L2 writing. Journal of Second Language Writing, 21, 332–347.
Hoogerheide, V., Deijkers, L., Loyens, S. M. M., Heijltjes, A., & van Gog, T. (2016). Gaining from explaining: Learning improves from explaining to fictitious others on video, not from writing to them. Contemporary Educational Psychology, 44–45, 95–106.
Ishikawa, M. (2018). Written languaging, learners’ proficiency levels and L2 grammar learning. System, 74, 50–61.
Ishikawa, M., & Révész, A. (2020). L2 learning and the frequency and quality of written languaging. In W. Suzuki & N. Storch (Eds.), Languaging in language learning and teaching: A collection of empirical studies (pp. 220–240). John Benjamins.
Ishikawa, M., & Révész, A. (in preparation). Written languaging, learners’ aptitude, and L2 learning through dictogloss tasks. In S. Li (Ed.), Individual differences in task-based language learning and teaching. John Benjamins.
Ishikawa, M., & Suzuki, W. (2016). The effect of written languaging on learning the hypothetical conditional in English. System, 58, 97–111.
Ishikawa, M., & Suzuki, W. (2023). Effects of written languaging on second language learning: Mediating roles of aptitude. The Modern Language Journal, 107(S1), 95–112.
Jacob, L., Lachner, A., & Scheiter, K. (2020). Learning by explaining orally or in written form? Text difficulty matters. Learning and Instruction, 68, 101344.
Kasper, F. (1997). Assessing the metacognitive growth of ESL student writers. TESL-EJ, 3, 1–20.
Kim, H. S. (2002). We talk, therefore we think? A cultural analysis of the effect of talking on thinking. Journal of Personality and Social Psychology, 83, 828–842.
Lachner, A., Jacob, L., & Hoogerheide, V. (2021). Learning by writing explanations: Is explaining to a fictitious student more effective than self-explaining? Learning and Instruction, 74, 101438.
Lachner, A., Ly, K.-T., & Nückles, M. (2018). Providing written or oral explanations? Differential effects of the modality of explaining on students’ conceptual learning and transfer. Journal of Experimental Education, 86, 344–361.
Lachner, A., & Neuburg, C. (2019). Learning by writing explanations: Computer-based feedback about the explanatory cohesion enhances students’ transfer. Instructional Science, 47, 19–37.
Manchón, R. M., Nicolás-Conesa, F., Cerezo, L., & Criado, R.
(2020). L2 writers’ processing of written corrective feedback: Depth of processing via written languaging. In W. Suzuki & N. Storch (Eds.), Languaging in language learning and teaching: A collection of empirical studies (pp. 241–263). John Benjamins.
Moradian, M. R., Hossein-Nasab, M. H., & Miri, M. (2020). Effects of written languaging in response to direct and indirect corrective feedback on developing writing accuracy. In W. Suzuki & N. Storch (Eds.), Languaging in language learning and teaching: A collection of empirical studies (pp. 267–286). John Benjamins.
Moradian, M. R., Miri, M., & Nasab, M. H. (2017). Contribution of written languaging to enhancing the efficiency of written corrective feedback. International Journal of Applied Linguistics, 27, 406–421.
Negueruela, E. (2008). Revolutionary pedagogies: Learning that leads development in the second language classroom. In J. P. Lantolf & M. Poehner (Eds.), Sociocultural theory and the teaching of second languages (pp. 189–227). Equinox.
Nicolás-Conesa, F., Manchón, R. M., & Cerezo, L. (2019). The effect of unfocused direct and indirect written corrective feedback on rewritten texts and new texts: Looking into feedback for accuracy and feedback for acquisition. The Modern Language Journal, 103, 848–873.
Nicolás-Conesa, F., Roca de Larios, J., & Coyle, Y. (2014). Development of EFL students’ mental models of writing and their effects on performance. Journal of Second Language Writing, 24, 1–19.
Noro, T. (2004). A study of the metacognitive development of Japanese EFL writers: The validity of written feedback and correction in response to learners’ self-analysis of their writing. ARELE: Annual Review of English Education in Japan, 15, 179–188.
Rittle-Johnson, B., Saylor, M., & Swygert, K. E. (2008). Learning from explaining: Does it matter if mom is listening? Journal of Experimental Child Psychology, 100, 215–224.
Sachs, R., & Polio, C. (2007). Learners’ uses of two types of written corrective feedback on an L2 writing revision task. Studies in Second Language Acquisition, 29, 67–100.
Sengupta, S., & Falvey, P. (1998). The role of teaching context in Hong Kong English teachers’ perceptions of L2 writing pedagogy. Evaluation & Research in Education, 12, 72–95.
Simard, D., French, L., & Zuniga, M. (2017). Evolution of L2 self-repair behavior over time among adult learners of French. Canadian Journal of Applied Linguistics, 20, 71–89.
Simard, D., & Zuniga, M. (2020). Exploring the mediating role of emotions expressed in L2 written language in ESL learners’ text revision. In W. Suzuki & N. Storch (Eds.), Languaging in language learning and teaching: A collection of empirical studies (pp. 287–307). John Benjamins.
Suzuki, W. (2012). Written languaging, direct correction, and second language writing revision. Language Learning, 62, 1110–1133.
Suzuki, W. (2017). The effects of quality of written languaging on second language learning. Writing & Pedagogy, 8, 461–482.
Suzuki, W., & Itagaki, N. (2007). Learner metalinguistic reflections following output-oriented and reflective activities. Language Awareness, 16, 131–146.
Suzuki, W., & Itagaki, N. (2009). Languaging in grammar exercises by Japanese EFL learners of differing proficiency. System, 37, 217–225.
Suzuki, W., & Storch, N. (2020). Introduction. In W. Suzuki & N. Storch (Eds.), Languaging in language learning and teaching: A collection of empirical studies (pp. 1–15). John Benjamins.
Swain, M. (2006a). Verbal protocols: What does it mean for research to use speaking as a data collection tool? In M. Chaloub-Deville, M. Chapelle, & P. Duff (Eds.), Inference and generalizability in applied linguistics: Multiple research perspectives (pp. 97–113). John Benjamins.
Swain, M. (2006b). Languaging, agency, and collaboration in advanced second language proficiency. In H. Byrnes (Ed.), Advanced language learning: The contribution of Halliday and Vygotsky (pp. 95–108). Continuum.
Swain, M. (2010). Talking it through: Languaging as a source of learning. In R. Batstone (Ed.), Sociocognitive perspectives on language use and language learning (pp. 112–130). Oxford University Press.

Chapter 7

Direct observation of writing activity
Screen capture technologies
Jérémie Séror & Guillaume Gentil

University of Ottawa | Carleton University

Emerging technologies and the rise of internet-mediated writing spaces have contributed to the appearance of new forms of digital practices that have transformed writing and its development. These technologies also present important methodological opportunities for researchers interested in the study of writing processes and writing development. This chapter offers a critical overview of one such opportunity: the use of screen capture technologies (SCT) as a means of documenting and engaging in direct real-time observation of language learners’ digitally mediated writing activities. After a brief description of SCT, this chapter reviews the research questions explored and the insights gleaned about writing processes and writing development with SCT. It then addresses the methodological challenges and potential solutions associated with the integration of SCT data within research projects in terms of research design, data collection, data analysis and reporting, and ethical considerations. The chapter concludes by suggesting some potential future avenues for writing process research enabled by the use of SCT.

Introducing screen capture technologies

Along with keystroke logging (Miller et al., 2008; see Chapter 8, this volume) and eye-tracking software (Abdel Latif, 2019; see Chapter 9, this volume), SCT belongs to a suite of digital tools developed to track and record writing processes in digital spaces (Gánem-Gutiérrez & Gilmore, 2018a, 2018b; Leijten et al., 2014; Lindgren & Sullivan, 2019; Révész & Michel, 2019; Van Waes et al., 2012). The use of SCT to study L2 writing builds on a body of work focused on capturing and studying video recordings of students’ computer-mediated interactions and learning processes (Caws & Hamel, 2016). Originally developed to facilitate the production of instructional video tutorials, SCT was adopted by researchers as an unobtrusive observational method to collect data that could increase the validity

https://doi.org/10.1075/rmal.5.07ser © 2023 John Benjamins Publishing Company
of the inferences made about participants’ activities with digital devices (Mroz, 2014). SCT software can be activated to record everything that occurs on a writer’s screen, with the option to also record the writer’s face, verbalizations, and other background events if the writer chooses to turn on the webcam, the microphone, and the audio feed. Thanks to SCT, the processes that underlie the production of a final draft (e.g., planning, composing, editing) can be viewed, rewound, and retraced, allowing researchers to turn back the clock and examine the specific decisions and the real-time, computer-based sequence of events associated with the text being produced by learners (Hamel & Séror, 2016). SCT video records allow one to witness aspects that are not visible through simple textual analysis of writers’ composition products, revealing, for example, not only what was typed but also deletions, edits, and the resources employed. SCT videos further offer a chronological perspective on composition processes, allowing researchers to observe, inter alia, the frequency and duration of writing pauses. As such, SCT provides a window into processes that have frequently remained largely unobserved and invisible, opening the proverbial “black box” of writing processes (Séror, 2013), especially when studying writing in natural, non-controlled, non-laboratory conditions. As noted by Mroz (2014), the rich visual record produced by SCT “constitutes an evolution in process-oriented research methods,” offering a more exhaustive appreciation of the “computer-mediated nature of the language-learning process and the development of digital literacy” (p. 1).

Research questions SCT has helped investigate

SCT has been adopted by a small but growing body of writing researchers, most often as a methodological complement to more traditional data sources (e.g., questionnaires, semi-structured interviews, or writing logs) as well as to other innovative tools (e.g., eye-tracking and keystroke logging software). For example, Gánem-Gutiérrez and Gilmore (2018b) combined screen capture data, eye-tracking data, and stimulated recall protocols to study Japanese L2 English writers. Khuder and Harwood (2015, 2019) similarly combined SCT data with keystroke logging data and stimulated recall interviews to investigate students’ motivations and understanding of testing tasks (composing with limited time and no access to resources vs. unlimited time and access to resources). The use of SCT both to produce videos for analysis and to engage students in stimulated recall protocols has proven to be a powerful triangulation technique that allows researchers to explore the factors and beliefs at play during students’ composition processes.


Exploring pedagogic applications of SCT

One body of writing research making use of SCT has explored the pedagogic affordances of SCT and how SCT videos can be integrated as “student learning resources and teacher development materials” (Bailey & Withers, 2018, p. 177). This includes examinations of instructors’ use of SCT to provide multimodal feedback to students or to support writing assessment practices that acknowledge process-related aspects of students’ performance (e.g., the use of plurilingual resources) not visible in their final drafts (Ranalli et al., 2018; Séror, 2013; Silva, 2012). Building on work on the pedagogical impact of observational learning for writing instruction (Braaksma et al., 2004), these studies highlight the value of SCT videos and tutorials as catalysts for activities that ask students to reflect on specific composition strategies (Hamel et al., 2015; Sabbaghan & Maftoon, 2015).

Exploring writing in action

A second category of process-oriented research, of central relevance to this volume, has focused on how SCT can contribute to the study of writing processes and writing development. Studies of writing processes with SCT have explored informal (Takayoshi, 2015), professional (Macgilchrist & Van Hout, 2011), and academic contexts (Séror & Gentil, 2020), and have drawn on cognitive, sociocultural, and sociomaterial theoretical frameworks. For example, Gánem-Gutiérrez and Gilmore (2018a, 2018b, 2021) used the video traces produced by SCT to map on-screen events onto cognitive activities and strategies linked to writing. Smith et al. (2017) drew on SCT to explore how translanguaging and the use of diverse languages and modalities (e.g., image- vs. text-based meaning-making) could be linked to sociocultural factors that shape students’ composition processes. More recently, Hort (2020) examined the sociomaterial dimensions of five students’ essay writing processes, underscoring the need to pay greater attention to how “materiality, physical place, people, and resources, not only the writer’s cognition,” help shape the writing process (p. 44). An important contribution of this body of work has been identifying and categorizing the wide range of macro-processes (e.g., outlining an assignment) and micro-processes (e.g., scrolling, spell-checking a word) writers engage in. Using SCT data, studies have confirmed existing taxonomies of L2 composition and metacognitive processes (e.g., Manchón et al., 2009; Sasaki, 2004), such as planning, transcribing, and revising, while also offering greater insight into previously unseen learning events associated with computer-based L2 writing processes, such as scrolling and the use of the cursor to guide
the eye when rereading and revising a draft. This adds data that are not as well represented in scripted logs and thus enhances the validity of inferences that can be made about the links between specific on-screen events and writing and learning processes. For example, Hamel and Séror (2016) noted surprising variations in the ways language learners produce accents and other diacritic symbols when they learn to type in a new language. Learners of French unaware of how to set their keyboards to type an accent in a single stroke were frequently observed to rely on complicated steps involving copy-pasting accents from websites or using the word processor’s spellcheck function to produce them. Such discoveries help “move toward a comprehensive understanding of the writing process” in digital spaces (Takayoshi, 2016, p. 6). SCT has also helped examine the temporal dimension of writing processes (e.g., Gánem-Gutiérrez & Gilmore, 2018b). One can, for instance, see in SCT recordings how long L2 writers engage in various writing processes (planning vs. composing vs. rereading vs. revising), the distribution of pauses, moments of fluency, and other observable indices of associated cognitive processes (Khuder & Harwood, 2015). Researchers have investigated the temporal duration and distribution of writing processes using think-aloud protocols (e.g., Manchón et al., 2009), sometimes in combination with keystroke logging (e.g., Tillema et al., 2011). Gánem-Gutiérrez and Gilmore (2018b) further underscore several advantages of SCT compared to other process-tracing methodologies, namely, that SCT captures writing events unobtrusively as they unfold and affords more nuanced qualitative insight into what students do and achieve within a given period.
For example, Gánem-Gutiérrez and Gilmore (2018b) found no statistical relationships between L2 proficiency and writing processes at specific periods of the composing process and yet, with SCT, they could observe how a student with higher L2 proficiency used external resources more effectively than a student with lower L2 proficiency. While SCT data does not always offer the same degree of quantitative precision as the automated breakdown and timing of composition events associated with keystroke logging, it does offer an opportunity to record students’ writing processes in naturalistic environments since students can download and use screen capture software on their devices. This allows authentic and situated insights which are often closer to the everyday reality of students as they record themselves completing assignments on their computers, with access to their resources at a time and in a location of their choosing. Indeed, SCT offers valuable support for ethnographic and case study approaches to writing when combined with other forms of data (Bhatt et al., 2015). For instance, when combined with writer interviews and questionnaires, SCT can help explore how specific beliefs, ideologies, and educational and social backgrounds relate to specific aspects of
an individual’s writing processes. Additionally, cross-case designs can be used to compare writing processes across disciplines, languages, modalities, and tasks (Khuder & Harwood, 2015, 2019; Séror & Gentil, 2020; Smith et al., 2017).
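To make the temporal analyses described above concrete, the sketch below shows how time spent per writing process and basic pause statistics could be computed once an SCT recording has been manually annotated into timed segments. The (start, end, process) annotation format and all values are hypothetical illustrations, not data from the studies cited:

```python
from collections import defaultdict

# Hypothetical annotations of one SCT recording: (start_sec, end_sec, process)
annotations = [
    (0, 95, "planning"),
    (95, 400, "composing"),
    (400, 430, "pause"),
    (430, 610, "rereading"),
    (610, 900, "revising"),
    (900, 930, "pause"),
]

def time_per_process(events):
    """Total seconds spent in each annotated writing process."""
    totals = defaultdict(float)
    for start, end, process in events:
        totals[process] += end - start
    return dict(totals)

def pause_stats(events, label="pause"):
    """Number of pauses and their mean duration in seconds."""
    durations = [end - start for start, end, p in events if p == label]
    if not durations:
        return 0, 0.0
    return len(durations), sum(durations) / len(durations)

print(time_per_process(annotations))
print(pause_stats(annotations))  # (2, 30.0)
```

Totals of this kind can then be compared across participants, tasks, or proficiency levels, complementing the qualitative viewing of the recordings themselves.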

Examining the visuospatial dimensions of writing

SCT data has also allowed researchers to examine more closely the multimodal nature of writing processes (Takayoshi, 2015) and the use of space and visual elements while composing. For example, in our ongoing study (Séror & Gentil, 2021), we observed students strategically arranging the windows on their screens and the tabs in their internet browsers to scaffold their writing by keeping all relevant textual resources (assignment description, draft, and sources of ideas) within their visual range (see Figure 1 below).

Figure 1. Spatial organization workspace (Séror & Gentil, 2021)

Exploring online resources

SCT data has also provided insights into the strategies and the multiple online resources language learners use to resolve problems encountered as they compose their L2 texts. SCT videos facilitate the monitoring of the online sites, chat sessions, forums, tools, and other resources that students deploy to resolve problems. They also capture the idiosyncratic manner in which these resources
are used, indexing emerging literacy and strategic development on the part of writers. Séror (2013, 2021), for instance, documented students’ purposeful use of bilingual online dictionaries, learners’ forums, and keyboard emulators to convert concepts and ideas originally expressed in students’ dominant languages into the target language of their texts, as well as students’ skillful use of machine translation technology to back-translate sentences in order to revise tense use in sentences produced in their L2. Bailey and Withers (2018) drew on screen capture software to record how university students used software tools (e.g., the synonym finder function of a word processor) to engage in a paraphrasing task. More recently, Séror and Gentil (2021) have reported on the strategic and complex role of the online translation tool DeepL as a means for language learners to activate and draw on their linguistic repertoires as they write. Meanwhile, Gilquin and Laporte (2021) have observed that most student writers used a limited range of writing tools repeatedly, uncritically, and with mixed results. Gánem-Gutiérrez and Gilmore (2021) report similar findings, noting individual differences related to L2 proficiency, with higher proficiency students drawing on more monolingual lexicographical resources in combination with bilingual resources than lower proficiency students. Yoon (2016) further underscored individual differences in the use of online writing resources related to writers’ attitudes, goals, needs, and contexts for writing. This growing body of research draws on the affordances of SCT to unobtrusively capture elusive details of composing processes in digital environments that might escape writers’ awareness, or be distorted, if writers had to describe them in concurrent think-aloud protocols or retrospective interviews.
Many composing processes mediated by digital resources, such as accepting the correction of a word suggested by software, can, for example, often occur quickly, reactively, and mechanically. By allowing researchers to “observe literacy events as they happen in people’s digital lives” (Takayoshi, 2016, p. 1), SCT adds to our understanding of the mediating impact of digital resources on multimodal literacy processes and highlights the social and networked nature of writing in digital environments, including the use of forums, shared writing platforms, and chat sessions that allow writers to compose collaboratively surrounded and scaffolded by the texts, resources, and voices of others (Cho, 2017; Séror, 2013). Used in combination with other research methods (e.g., interviews, process logs, stimulated recalls), SCT data shed light on students’ development of the “digital awareness” needed to intertwine texts, meaning-making resources, and technologies (Hort, 2020).


Methodological challenges and potential solutions in using SCT for writing process research

While SCT offers affordances of interest for writing process research as an unobtrusive observation method, challenges are associated with the wealth, volume, and complexity of the data it can generate (Mroz, 2014). These challenges and potential solutions are addressed below, drawing on examples from our research and other studies.

Research design: Pre-established vs. emergent focus

One major challenge with using SCT for writing process research stems from the sheer volume of data these videos can yield. Two main research design strategies can help address this challenge. The first is to limit the focus of research to one well-defined task of limited duration. This strategy is used in controlled studies with a narrow focus (e.g., Elola & Mikulski, 2013; Gánem-Gutiérrez & Gilmore, 2018b; Khuder & Harwood, 2019). In naturalistic studies, however, the initial scope of inquiry can be broad, given researchers’ potential interest in a variety of writing events as they unfold in context. A two-step strategy is thus advisable, as outlined by Bhatt and de Roock (2013). In the first stage, to avoid premature data saturation from an overwhelming amount of data, the SCT data is “gisted” to draw a global picture of the writing process and identify “noteworthy and discernible moments” (Bhatt & de Roock, 2013, p. 10) or features of relevance to the research questions, which can then become the focus of a more fine-grained, systematic analysis. We use a similar strategy in our ongoing four-year longitudinal study of academic bilingual writing development (Séror & Gentil, 2020), first naming files by participant, semester, and context to associate video with interview and text data, creating synoptic tables, and adding descriptive codes. The noticeable role of online translation tools in the student writers’ bilingual and cross-lingual writing processes then prompted us to conduct more fine-grained analyses of their use and their potential impact on language and writing development and on the final writing product.
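The first-stage “gisting” workflow described above can be sketched in a few lines of code. The file-naming scheme and the descriptive codes below are invented purely for illustration; any real project would define its own conventions:

```python
# Hypothetical file names encoding participant, semester, and writing context
video_files = [
    "P01_sem1_home_essay.mp4",
    "P01_sem2_lab_reflection.mp4",
    "P02_sem1_home_essay.mp4",
]

# First-pass descriptive codes for "noteworthy and discernible moments"
gist_codes = {
    "P01_sem1_home_essay.mp4": ["online-translation", "window-arranging"],
    "P01_sem2_lab_reflection.mp4": ["spellcheck-reliance"],
    "P02_sem1_home_essay.mp4": ["online-translation", "forum-lookup"],
}

def synoptic_rows(files, codes):
    """Parse metadata out of file names and attach descriptive codes."""
    rows = []
    for name in files:
        participant, semester, context, task = name.rsplit(".", 1)[0].split("_")
        rows.append({
            "participant": participant,
            "semester": semester,
            "context": context,
            "task": task,
            "codes": "; ".join(codes.get(name, [])),
        })
    return rows

for row in synoptic_rows(video_files, gist_codes):
    print(row)
```

A synoptic table built this way lets the researcher filter recordings by code (e.g., all moments involving online translation) before committing to a fine-grained, time-consuming pass through the videos themselves.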

Research design: Controlled vs. naturally occurring writing tasks

Another practical consideration for researchers who are primarily interested in naturalistic observations involves recording tasks and genres that are similar enough to allow comparisons across participants and over time. When considering SCT as part of a study design, a tradeoff must be found between optimizing ecological validity by recording literacy events as they happen and facilitating

147

148

Jérémie Séror & Guillaume Gentil

comparisons by controlling for writing tasks and writing conditions. Our own research gives student writers full control over the writing processes they want to record and share with us. This results in variation in the number and length of recordings, and the types of writing and processes documented. To establish some baseline data, we have also included two shared writing tasks in Years 1 and 4, asking participants to reflect on their language and writing development. These also serve two additional purposes: (1) familiarizing the participants with the SCT software (we helped them install and use it during the first task), and (2) gaining preliminary insights that could be explored further during a follow-up interview.

Data collection: Controlled vs. naturally occurring digital writing environments In balancing out ecological validity and the necessary research control, another question is whether participants should record themselves on their own devices. Loaning a computer for use at home or in a research lab allows researchers to control the computing environment and helps prevent recording issues arising from hardware limitations in storage and processing. However, it can also result in a distorted image of a writer’s processes and resource use in situ. We considered combining the affordances of keylogging and SCT but ultimately decided against it for fear that asking participants to install and turn in two recording applications might prove too much of a hurdle for them. Researchers have reported the use of various applications (e.g., Blueberry Flash Recorder, Camtasia, Morae, Open Broadcaster Software, Snagit), but as Asselin and Moyaeri (2010) point out, many participants’ home computer situation may not be compatible with running software best suited for research purposes. In our case, we selected Screencast-OMatic, a cross-platform, web-based application, for its reliability, affordability, and ease of use. However, some participants reported difficulties in some conditions (e.g., old computer), which resulted in data loss. Some participants also preferred to use software that came preinstalled on their devices. Training and providing regular support to participants as they explore the best way to record, save, and transfer screen capture data is key to ensuring the success of SCT studies. Additionally, the ever-evolving nature of technological advances means that researchers must constantly stay abreast of available SCT programs on different platforms (e.g., an Apple computer vs. a cloud-based Chromebook) while staying up to date on questions of compatibility, accessibility, and file format conversions. 
For instance, whereas the secure transfer and storage of very large video data files were ensured by providing participants with memory sticks and storing the data on external drives, the advent of COVID-related

Chapter 7. Direct observation of writing activity

restrictions resulted in the need to adopt new forms of secure online file transfer and storage, raising questions of data security which needed to be addressed.1 Even when writers’ native digital environments are optimal for screen capture recording, SCT will not capture events that are happening outside the screen and which could be controlled in laboratory settings, such as the use of printed resources. Capturing such events would require complementary interviews or observational data. Additionally, the ecological validity of the writing processes captured may be affected by what writers choose to show and record. For example, they may turn off emailing and chatting functions when the camera is on, even though such activities would normally co-occur with writing processes. Ecological validity must also be balanced out with ethical issues of consent and privacy, a topic which we explore further below.

Developing a framework for data analysis
Analyzing SCT data is labour-intensive and time-consuming. Unlike data produced with keystroke logging systems, one does not have access to data sets that can be quantified automatically (e.g., the average length of a typing burst). Events and processes of interest must be coded manually. In this respect, the challenges of using SCT data are like those arising in other research producing large sets of unstructured data. To help address them, one strategy is to follow the common procedure of using the research questions and theoretical framing of a study to guide the analysis. These serve as a compass that helps researchers limit the analytical gaze and strategically identify elements of interest to focus on in SCT data (Mroz, 2014). This focus is essential to separate pertinent data from secondary information, which, although fascinating, can hinder the analysis of the studied phenomenon. In the absence of an overarching theory, writing researchers have developed analytical frameworks by drawing on various theories of relevance to their focus and orientations, such as models of L2 writing processes (Gánem-Gutiérrez & Gilmore, 2018b), collaborative writing (Cho, 2017), or writing task representation (Khuder & Harwood, 2019), as well as activity theory (Cho, 2017; Geisler & Slattery, 2007), sociomateriality, new literacies, and digital literacies (Bhatt & de Roock, 2013). Additionally, researchers can draw on the literature on multimodal analysis to develop a transcription system suited to their multimodal SCT data (see, e.g., Jewitt, 2014; Meredith, 2016), or on ad hoc descriptive frameworks developed for specific purposes (e.g., Gilquin & Laporte, 2021, developed a specific framework for analyzing screen capture recordings of the use of online writing tools by learners of English). While it is necessary to develop analytical frameworks specific to targeted dimensions of interest (e.g., the use of online reference tools), broader theories of writing and language development (Mills, 2015) can also help in interrelating the aspects of writing that are best captured with SCT data with other dimensions at other timescales and other levels. For example, Gilmore and Gánem-Gutiérrez (2020) illustrate how a complex dynamic systems theory perspective on L2 writing helps to interrelate the moment-by-moment microgenetic changes (captured in part by SCT) at the micro-level with learners’ development of writing competence over longer periods (e.g., with a diachronic corpus) at the meso-level and with evolving social, institutional, and historical contexts of writing at a more macro-level (e.g., employing documentary and interview data). We discuss the integration of SCT data with other sources of data further below.

1. An issue rarely raised in SCT-mediated research is the carbon footprint of large video data sets stored on hard drives and cloud servers and the other environmental costs associated with the extraction of minerals and the production of energy required for research relying heavily on computing resources. The negative environmental externalities of digital research should be weighed against the potential benefits for pedagogy.

Choosing a unit of analysis and criteria for data segmentation
The unit and grain of analysis depend on the focus of interest but typically consist of an observable event, episode, or process (e.g., translating a word, correcting a typo) or a step within such a process. Each video frame often involves more than one dimension (e.g., the side-by-side presence of a text found in a word processor and a browser interface), and processes can be protracted and intertwined with other processes (e.g., a writer may return to a formulation problem several minutes after first trying to solve it). It is thus useful to develop a multitiered annotation and transcription system that helps to capture on a timeline several processes unfolding along multiple dimensions. Moreover, researchers interested in the quantification of SCT data will require clear segmentation criteria and rigorous intra- and intercoder reliability estimates. A good example is provided by Gánem-Gutiérrez and Gilmore (2018b), who conducted two cycles of intercoder reliability checks on 10% of their data (before and after coding all the data) as well as an intracoder reliability check on 5% of the data, based on kappa estimates.
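For readers unfamiliar with how such reliability estimates are obtained, Cohen's kappa corrects the observed proportion of agreement between two coders for the agreement expected by chance given each coder's label frequencies. A minimal sketch (the segment labels are invented for illustration; published studies typically rely on a statistical package rather than hand-rolled code):

```python
from collections import Counter

def cohens_kappa(coder_a: list[str], coder_b: list[str]) -> float:
    """Chance-corrected agreement between two coders who labelled
    the same sequence of video segments."""
    assert len(coder_a) == len(coder_b), "coders must label the same segments"
    n = len(coder_a)
    # Observed proportion of segments on which the two coders agree.
    p_observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Agreement expected by chance, from each coder's marginal label frequencies.
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    p_expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / n ** 2
    # Kappa is undefined when p_expected == 1 (both coders use one identical label).
    return (p_observed - p_expected) / (1 - p_expected)
```

A value of 1 indicates perfect agreement and 0 indicates agreement no better than chance; what counts as acceptable should be argued for in relation to the coding scheme at hand.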

Using software for the transcription and analysis of SCT data
Researchers commonly use computer-assisted qualitative data analysis software (CAQDAS) to assist them with the segmentation, coding, and annotation of the sequence of video frames that constitute SCT data as they generate a multimodal transcription of the events on screen (Meredith, 2016). This transcription can assist in both reducing the data and interpreting patterns of interest in SCT recordings based on a meaningful representation of the data. Each transcript is a selective, theory-laden representation of the observed phenomenon, “a record of the approach taken to the data by the transcriber” (Meredith, 2016, p. 664). Various CAQDAS tools are available to produce multimodal transcriptions, but researchers need to understand how each tool is also influenced by the theoretical and epistemological orientations of its designers. Morae, for example, was designed for usability testing (Asselin & Moayeri, 2010), Transana for conversation analysis (Kasper & Wagner, 2014), and NVivo for unstructured textual data such as open-ended interviews (Richards, 2002). ELAN, on the other hand, was designed specifically as an annotation tool for audio and video recordings (The Language Archive, 2021), which helps explain its popularity among scholars who have worked with SCT data (e.g., Bhatt & de Roock, 2013; Gánem-Gutiérrez & Gilmore, 2018a; Gilquin & Laporte, 2021). ELAN facilitates the segmentation, coding, and retrieval of events observed on the video by locating them on a timeline and enabling their multimodal transcription through the creation of multiple tiers of analysis. ELAN also generates summary tables that can be used to produce descriptive statistics and graphs showing the duration and distribution of events within a defined writing period. In defining these periods, a common practice involves breaking longer videos up into equal-length sections and computing the percentage of time spent on specific processes and events within each section. This enables the study of the temporal distribution of writing processes (e.g., text construction vs. revision) over a writing task (Gánem-Gutiérrez & Gilmore, 2018b).
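The computation behind such temporal distributions is straightforward once annotations are available as (start, end, code) intervals, whatever tool exported them. A minimal sketch (times in seconds; the process codes are invented for illustration, and intervals are assumed not to overlap within a tier):

```python
def time_per_code(events, section_start, section_end):
    """Seconds each process code occupies within one section, clipping
    events that straddle the section boundaries."""
    totals = {}
    for start, end, code in events:
        overlap = min(end, section_end) - max(start, section_start)
        if overlap > 0:
            totals[code] = totals.get(code, 0.0) + overlap
    return totals

def temporal_distribution(events, duration, n_sections):
    """Percentage of each equal-length section spent on each process code."""
    width = duration / n_sections
    distribution = []
    for i in range(n_sections):
        section = time_per_code(events, i * width, (i + 1) * width)
        distribution.append({code: 100 * t / width for code, t in section.items()})
    return distribution
```

Plotting one such dictionary per section as a stacked bar chart reproduces the kind of timeline graph described above, showing, for instance, revision gradually displacing text construction over the course of a task.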

Triangulating and integrating data
As noted earlier, screen capture data analysis benefits greatly from triangulation with other sources of data such as interviews, writing samples, videos of off-screen interactions, and contextual documents (e.g., assignment descriptions, course outlines, university policies). Screen capture videos are often used as prompts for stimulated recall to probe writers’ cognitive processes and writing strategies (e.g., Cho, 2017; Lai & Chen, 2015; Park & Kinginger, 2010), sometimes in combination with eye-tracking (Gánem-Gutiérrez & Gilmore, 2018b) or keylogging (e.g., Khuder & Harwood, 2015, 2019). It can be difficult to identify writing processes based on SCT data alone; a pause, for example, could be related to planning or to reading. Stimulated retrospective recalls help to disambiguate such cases and can also shed light on writers’ motivations (e.g., reasons for using a tool).


While potentially fruitful, triangulating SCT data with other sources of data can pose additional challenges for data integration. In the outward direction, the challenge is to triangulate screen recordings of writing events with other observational data, interview data, and document analyses to develop a thick description of language and writing development in a social context from both an emic and an etic perspective (Bhatt & de Roock, 2013; Séror & Gentil, 2020). In the inward direction, the challenge is to infer underlying cognitive processes from the combination of observational and retrospective accounts while remaining aware of the limitations of both as indirect records which may not fully reflect actual cognitive processes. The evidence must thus be presented in a way that warrants the inferences being made regarding the intentions and cognitive processes that may underlie the behaviours observed, described, and ultimately interpreted by both participants and researchers. Data integration can be achieved through a combination of theoretical linking and careful mixed methods research design. As mentioned earlier, a broad theoretical framework such as a complex dynamic systems perspective (Gilmore & Gánem-Gutiérrez, 2020) can help to integrate insights from different data sources by interrelating dimensions of writing and development occurring at various levels and timescales. Similarly, sociocultural theory and activity theory help to interrelate the nested levels of writing (subconscious operations, observable actions, inferable goal-oriented activities), the timescales of writing development (historical, individual, moment-by-moment), the internal and external dimensions of writing (the interaction of processes in the mind and the world), as well as the mediated and mediating aspects of writing (exploring the impact of tools and individuals on learning activities) (see, e.g., Gánem-Gutiérrez & Gilmore, 2018a; Geisler & Slattery, 2007).
CAQDAS tools can also assist with data triangulation and data integration. We mentioned earlier the affordances and common use of ELAN as a video annotation tool to analyze SCT data. However, a software application like NVivo, initially designed for the analysis of unstructured non-numerical data such as interview transcripts and documents but now able to handle image and video data, can also facilitate data triangulation and enrich researchers’ abilities to understand the time- and space-distributed nature of writing. At present, we have found no single application that accomplishes everything, and the integration of data across platforms remains challenging. We are presently working with both ELAN and NVivo. Experimenting with different tools and their integration is necessary to find ways to best capture and re-present a literacy event or writing process (Bhatt, de Roock, & Adams, 2015). When assessing platforms, researchers should consider affordances not only for video screen data analysis but also for data integration and triangulation.


Gilquin (2022) offers a noteworthy example of original data triangulation combining SCT, keylogging, and corpus linguistic methods. She describes how the annotation of SCT data in ELAN enables SCT writing process data to be converted into textual data that can be queried with the text retrieval software and techniques used in corpus linguistics. Additionally, the keylogging software Inputlog (Leijten & Van Waes, 2013) also generates textual data that can be time-aligned with the video data annotated in ELAN. The combined use of SCT, keylogging, and corpus linguistic tools thus enables the design of a “process learner corpus” (Gilquin, 2022) for triangulated investigations of language learners’ processes of text production. Such corpora can be enriched with metadata about the writers and their texts and scaled up to include larger samples. Such data triangulation allowed Gilquin and Laporte (2021) to gain insight into patterns of online writing tool use and the nature and quality of the texts produced. In our research, the micro-analysis of screen recordings together with interview data provides insight into the depth of language processing (Leow & Mercer, 2015) that appears to be in evidence when bilingual writers use online translation tools to compose academic texts in a first or an additional language (Séror & Gentil, 2021). In short, by helping to reveal the cognitive processes underlying the writing activities captured on screen, triangulating SCT data with other data sources can shed light on how L2 writing can promote the noticing and acquisition of second language resources for meaning-making.
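Time-alignment itself reduces to an interval lookup: each time-stamped keystroke event is matched to the screen annotation active at that moment. A minimal sketch (both streams are hypothetical simplifications of what Inputlog and ELAN actually export, and the two clocks are assumed to be already synchronized):

```python
import bisect

def align(annotations, keylog):
    """Attach to each time-stamped keylog event the label of the
    annotation interval that contains it (None if no interval does).
    annotations: list of (start, end, label), sorted and non-overlapping;
    keylog: list of (time, key)."""
    starts = [start for start, _, _ in annotations]
    aligned = []
    for t, key in keylog:
        # Index of the last interval starting at or before t.
        i = bisect.bisect_right(starts, t) - 1
        label = None
        if i >= 0:
            start, end, lab = annotations[i]
            if start <= t < end:
                label = lab
        aligned.append((t, key, label))
    return aligned
```

Once every keystroke carries an annotation label, the merged stream can be written out as text and queried with ordinary corpus tools, which is the gist of the process-corpus approach described above.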

Data reporting and presentation
SCT data can also pose representational challenges when reporting findings. Video-based research (Mroz, 2014; Plowman & Stephen, 2008) suggests three main strategies for representing SCT data. The first employs written reconstructions to transform SCT data into narrative texts. These texts have the advantage of being easily integrated into research publications, but the difficulty of describing the details of video data verbally can easily lead to lengthy and yet reductive descriptions. A second strategy involves drawing on diagrammatic reconstructions to provide time-based graphical representations of SCT data. These diagrams help visualize the simultaneous unfolding of several processes on a timeline and are typically produced by extracting frames from a video recording and arranging them chronologically to represent a sequence of interest to “create a tellable story of the data, with [their] descriptions alongside” (Bhatt & de Roock, 2013, p. 11). The third strategy, which can be used in combination with the other two, is to include illustrative screenshots in research reports (e.g., Khuder & Harwood, 2019).


The development of enhanced publications opens new possibilities for the reporting of SCT data beyond narrative reconstructions illustrated with still shots, enabling the addition of videos to research reports. We have included video excerpts in multimodal conference presentations, and have seen others do so, but we are not aware of writing research publications that have incorporated screencast videos. Guidelines and best practices for enhanced publications are still developing. While it may be possible to share long video excerpts or integral video recordings as online supplements, it may be preferable to integrate shorter, selected excerpts in the body of a research article for reasons including economy of presentation, protection of participants, and the energy and environmental costs of storing large video files online. Researchers may also be reluctant to share data sets that have demanded significant investments of time and resources.

Ethical issues
SCT-mediated writing research further raises several ethical issues related to informed consent, privacy, and the unintended consequences of collecting videos produced by participants in what would typically be unobserved settings. In naturalistic and even some of the more controlled studies, researchers may elect to give participants control over what they record, where, and when. As mentioned earlier, participants may choose not to record practices that could be perceived as transgressive or may change their behaviours when they know they are being recorded. In their study of paraphrasing, for example, Bailey and Withers (2018) noted that they observed none of the questionable behaviours that had motivated their research. Even when participants willingly start a recording, they may not be fully aware of the incidental and unintentional observations that can be made. For instance, Takayoshi (2016) reported that the Google search autocomplete feature could reveal potentially embarrassing search histories with little or no relevance to the research questions. With online socializing, friends appearing in chat pop-ups can also become unintended participants (Asselin & Moayeri, 2010). Because on-screen data can easily be replete with confidential and sensitive information, they are shared on the basis of trust. Participants with good video editing skills may delete portions of the screen capture data that they do not want to share, but it is much easier not to share a recording at all. Researchers can also review screen capture data with participants and ask them what they want to keep, hide, or delete before the data are analyzed or published. Ultimately, when participants have consented to the confidential use of the data they have helped to provide, video excerpts and other details chosen for inclusion in research presentations and publications must be carefully examined to make sure that any compromising or identifying information is cut or blurred throughout the whole video segment. This important additional step requires access to video-editing software and can be time-consuming.

Avenues for future writing process research
Despite the challenges noted above, SCT remains of great interest for future research on writing processes and development. A key avenue of research will likely continue to be found in investigations of the pedagogical implications of SCT for writing instruction. This work illustrates a key advantage of this methodological tool: the data it produces are easy to recontextualize in a format (a small movie that shows a writer at work on their computer) that is readily accessible to writing instructors and developing writers. Further research could also continue to explore the use of SCT videos as feedback instruments (Özkul & Ortactepe, 2017) or the creation of a corpus of process videos to model best practices and strategies that can then be employed in classrooms to scaffold students’ writing development (Hamel & Séror, 2016). Undoubtedly, future research will also continue to benefit from the possibilities offered by SCT to advance under-researched areas associated with out-of-class writing practices and important dimensions of writing which have to date frequently remained beyond the gaze of instructors and researchers. This includes using SCT to document the unacknowledged role of emotions in writing (Sala-Bubaré & Castelló, 2018). One can analyze, for example, moments of sighing or cursing in the audio recorded by students as they compose, or signs of frustration displayed in the way they suddenly close a window or delete a passage. Moreover, SCT will continue to contribute to our understanding of the impact of technological advances on writing practices and development. For instance, the emergence of increasingly accurate speech-to-text and predictive typing functions such as “Smart Compose,” a feature of Google Docs, is already transforming the processes and competencies required to transcribe ideas into written form.
Undoubtedly, SCT videos documenting how students compose with assistive writing resources powered by neural networks and artificial intelligence technologies will be of great interest to educators and researchers (Leander & Burriss, 2020). While longitudinal studies of writing development using SCT remain rare, this, too, is an area of future work that should be of great interest to writing researchers. Indeed, whereas, as noted, a single video of a student’s writing processes can help investigate the fluid and complex nature of writing processes at a specific moment in a writer’s development, multiple SCT videos of the same individual over a set period can provide valuable data for examining how these processes develop over time and the possible external factors (e.g., classroom activities, the introduction of new resources, increases in familiarity with a task) that might shape this evolution. Finally, SCT will be of great use to scholars interested in investigating the composing processes of multilingual writers who draw synergistically on multiple linguistic repertoires to plan, compose, and revise their texts (Gentil, 2018). Indeed, at a time of growing interest in plurilingual approaches (Moore, 2020) and translanguaging practices (Kleyn & García, 2019; Wei, 2017), SCT affords new opportunities to explore plurilingual and cross-linguistic writing processes. While there is already a body of research on the use of the L1 and multilingual resources in composing in the L2, this work has typically drawn on introspective and retrospective data (e.g., think-alouds) and has focused on traces of writers’ multilingual cognitive processes (e.g., Gunnarsson, 2019). By contributing real-time observational data, SCT could shed additional light on how multilingual writers draw on texts in one language to compose in another and what resources and strategies they use for shuttling back and forth between languages and modalities, managing plurilingual terminology, and handling cross-lingual processes. Ultimately, SCT is but one of many compelling methodological techniques which, in combination with other tools, allow us to enrich existing empirical research on writing processes and the concepts and principles that can guide writing development. Interest in this tool is rooted in its ability to show rather than simply tell how various processes are orchestrated. This key advantage remains relevant today with the fast-paced development of new technologies for writing and multimodal and multilingual meaning-making.
SCT offers an opportunity to observe how writers engage with these new composing technologies while raising awareness of their new affordances for meaning-making.

References

Abdel Latif, M. M. (2019). Eye-tracking in recent L2 learner process research: A review of areas, issues, and methodological approaches. System, 83, 25–35.
Asselin, M., & Moayeri, M. (2010). New tools for new literacies research: An exploration of usability testing software. International Journal of Research & Method in Education, 33(1), 41–53.
Bailey, C., & Withers, J. (2018). What can screen capture reveal about students’ use of software tools when undertaking a paraphrasing task? Journal of Academic Writing, 8(2), 176–190.
Bhatt, I., & de Roock, R. (2013). Capturing the sociomateriality of digital literacy events. Research in Learning Technology, 21(4).


Bhatt, I., de Roock, R., & Adams, J. (2015). Diving deep into digital literacy: Emerging methods for research. Language and Education, 29(6), 477–492.
Braaksma, M. A. H., Rijlaarsdam, G., van den Bergh, H., & van Hout-Wolters, B. H. A. M. (2004). Observational learning and its effects on the orchestration of writing processes. Cognition and Instruction, 22(1), 1–36.
Caws, C., & Hamel, M.-J. (2016). Language-learner computer interactions: Theory, methodology and CALL applications. John Benjamins.
Cho, H. (2017). Synchronous web-based collaborative writing: Factors mediating interaction among second-language writers. Journal of Second Language Writing, 36, 37–51.
Elola, I., & Mikulski, A. (2013). Revisions in real time: Spanish heritage language learners’ writing processes in English and Spanish. Foreign Language Annals, 46(4), 646–660.
Gánem-Gutiérrez, G. A., & Gilmore, A. (2018a). Expert-novice interaction as the basis for L2 developmental activity: A SCT perspective. Language and Sociocultural Theory, 5(1), 21–45.
Gánem-Gutiérrez, G. A., & Gilmore, A. (2018b). Tracking the real-time evolution of a writing event: Second language writers at different proficiency levels. Language Learning, 68(2), 469–506.
Gánem-Gutiérrez, G. A., & Gilmore, A. (2021). A mixed methods case study on the use and impact of web-based lexicographic tools on L2 writing. Computer Assisted Language Learning, 1–27.
Geisler, C., & Slattery, S. (2007). Capturing the activity of digital writing: Using, analyzing, and supplementing video screen capture. In H. A. McKee & D. N. DeVoss (Eds.), Digital writing research: Technologies, methodologies, and ethical issues (pp. 185–200). Hampton Press.
Gentil, G. (2018). Modern languages, bilingual education, and translation studies: The next frontiers in WAC/WID research and instruction? Across the Disciplines, 15(3), 114–129.
Gilmore, A., & Gánem-Gutiérrez, G. (2020). Investigating complexity in L2 writing with mixed methods approaches. In G. G. Fogal & M. H. Verspoor (Eds.), Complex dynamic systems theory and L2 writing development (pp. 183–206). John Benjamins.
Gilquin, G. (2022). The Process Corpus of English in Education: Going beyond the written text. Research in Corpus Linguistics, 10(1), 31–44.
Gilquin, G., & Laporte, S. (2021). The use of online writing tools by learners of English: Evidence from a process corpus. International Journal of Lexicography, 34(4), 472–492.
Gunnarsson, T. (2019). Multilingual students’ use of their linguistic repertoires while writing in L2 English. Lingua, 224, 34–50.
Hamel, M.-J., & Séror, J. (2016). Video screen capture to document and scaffold the L2 writing process. In C. Caws & M.-J. Hamel (Eds.), Language-learner computer interactions: Theory, methodology, and applications (pp. 137–162). John Benjamins.
Hamel, M.-J., Séror, J., & Dion, C. (2015). Writers in action: Modelling and scaffolding second-language learners’ writing process. Higher Education Quality Council of Ontario. Retrieved on 27 April 2023 from https://heqco.ca/wp-content/uploads/2020/03/Writers_in_Action_ENG.pdf
Hort, S. (2020). Digital writing, word processors and operations in texts: How student writers use digital resources in academic writing processes. Journal of Academic Writing, 10(1), 43–58.


Jewitt, C. (2014). The Routledge handbook of multimodal analysis (2nd ed.). Routledge.
Kasper, G., & Wagner, J. (2014). Conversation analysis in applied linguistics. Annual Review of Applied Linguistics, 34(2), 171–212.
Khuder, B., & Harwood, N. (2015). L2 writing in test and non-test situations: Process and product. Journal of Writing Research, 6(3), 233–278.
Khuder, B., & Harwood, N. (2019). L2 writing task representation in test-like and non-test-like situations. Written Communication, 36(4), 578–632.
Kleyn, T., & García, O. (2019). Translanguaging as an act of transformation: Restructuring teaching and learning for emergent bilingual students. In L. de Oliveira (Ed.), The handbook of TESOL in K-12 (pp. 69–82). Wiley.
Lai, S.-L., & Chen, H.-J. H. (2015). Dictionaries vs concordancers: Actual practice of the two different tools in EFL writing. Computer Assisted Language Learning, 28(4), 341–363.
Leander, K. M., & Burriss, S. K. (2020). Critical literacy for a posthuman world: When people read, and become, with machines. British Journal of Educational Technology, 51(4), 1262–1276.
Leijten, M., & Van Waes, L. (2013). Keystroke logging in writing research: Using Inputlog to analyze and visualize writing processes. Written Communication, 30(3), 358–392.
Leijten, M., Van Waes, L., Schriver, K., & Hayes, J. R. (2014). Writing in the workplace: Constructing documents using multiple digital sources. Journal of Writing Research, 5(3), 285–337.
Leow, R. P., & Mercer, J. D. (2015). Depth of processing in L2 learning: Theory, research, and pedagogy. Journal of Spanish Language Teaching, 2(1), 69–82.
Lindgren, E., & Sullivan, K. (Eds.). (2019). Observing writing: Insights from keystroke logging and handwriting. Brill.
Macgilchrist, F., & Van Hout, T. (2011). Ethnographic discourse analysis and social science. Forum Qualitative Sozialforschung/Forum: Qualitative Social Research, 12(1).
Manchón, R. M., Roca de Larios, J., & Murphy, L. (2009). The temporal dimension and problem-solving nature of foreign language composing processes: Implications for theory. In R. M. Manchón (Ed.), Writing in foreign language contexts: Learning, teaching, and research (pp. 102–129). Multilingual Matters.
Meredith, J. (2016). Transcribing screen-capture data: The process of developing a transcription system for multi-modal text-based data. International Journal of Social Research Methodology, 19(6), 663–676.
Miller, K. S., Lindgren, E., & Sullivan, K. P. H. (2008). The psycholinguistic dimension in second language writing: Opportunities for research and pedagogy using computer keystroke logging. TESOL Quarterly, 42(3), 433–454.
Mills, K. (2015). Literacy theories for the digital age: Social, critical, multimodal, spatial, material and sensory lenses. Multilingual Matters.
Moore, D. (2020). Conversations autour du plurilinguisme. Théorisation du pluriel et pouvoir des langues. OLBI Working Papers, 10, 43–64.
Mroz, A. P. (2014). Process research screen capture. In C. A. Chapelle (Ed.), The encyclopedia of applied linguistics (pp. 1–7). Wiley-Blackwell.
Özkul, S., & Ortactepe, D. (2017). The use of video feedback in teaching process-approach EFL writing. TESOL Journal, 8(4), 862–877.

Chapter 7. Direct observation of writing activity

Park, K., & Kinginger, C. (2010). Writing/thinking in real time: Digital video and corpus query analysis. Language Learning & Technology, 14(3), 31–50. https://hdl.handle.net/10125/44225
Plowman, L., & Stephen, C. (2008). The big picture? Video and the representation of interaction. British Educational Research Journal, 34(4), 541–565.
Ranalli, J., Feng, H.-H., & Chukharev-Hudilainen, E. (2018). Exploring the potential of process-tracing technologies to support assessment for learning of L2 writing. Assessing Writing, 36, 77–89.
Révész, A., & Michel, M. (2019). Methodological advances in investigating L2 writing processes: Introduction. Studies in Second Language Acquisition, 41(3), 491–501.
Richards, T. (2002). An intellectual history of NUD*IST and NVivo. International Journal of Social Research Methodology, 5(3), 199–214.
Sabbaghan, S., & Maftoon, P. (2015). The affordances of screen capture technology for retrospective analysis of the writing process. International Journal of Research Studies in Educational Technology, 4(1), 35–50.
Sala-Bubaré, A., & Castelló, M. (2018). Writing regulation processes in higher education: A review of two decades of empirical research. Reading and Writing, 31(4), 757–777.
Sasaki, M. (2004). A multiple-data analysis of the 3.5-year development of EFL student writers. Language Learning, 54(3), 525–582.
Séror, J. (2013). Show me! Enhanced feedback through screencasting technology. TESL Canada Journal, 30(1), 104–116.
Séror, J. (2013). Screen capture technology: A digital window into students' writing processes / Technologie de capture d'écran: Une fenêtre numérique sur le processus d'écriture des étudiants. Canadian Journal of Learning and Technology / La Revue Canadienne de l'Apprentissage et de la Technologie, 39(3), 1–16.
Séror, J. (2021). Plurilingualism in digital spaces. In E. Piccardo, A. Germain-Rutherford, & G. Lawrence (Eds.), The Routledge handbook of plurilingual language education (pp. 449–464). Routledge.
Séror, J., & Gentil, G. (2020). Cross-linguistic pedagogy and biliteracy in a bilingual university: Students' stances, practices, and ideologies. Canadian Modern Language Review / La Revue Canadienne des Langues Vivantes, 76(4), 356–374.
Séror, J., & Gentil, G. (2021). Plurilingual writing processes at the heart of biliteracy development. Paper presented in V. Johannsson & A. Wengelin (Organizers), S197 Writing processes: Strategies from idea to text. Symposium conducted at AILA 2021, Groningen, Netherlands.
Silva, M. L. (2012). Camtasia in the classroom: Student attitudes and preferences for video commentary or Microsoft Word comments during the revision process. Computers and Composition, 29(1), 1–22.
Smith, B. E., Pacheco, M. B., & de Almeida, C. R. (2017). Multimodal codemeshing: Bilingual adolescents' processes composing across modes and languages. Journal of Second Language Writing, 36, 6–22.
Takayoshi, P. (2015). Short-form writing: Studying process in the context of contemporary composing technologies. Computers and Composition, 37, 1–13.
Takayoshi, P. (2016). Methodological challenges to researching composing processes in a new literacy context. Literacy in Composition Studies, 4(1), 1–23.


Jérémie Séror & Guillaume Gentil

The Language Archive. (2021, October 18). ELAN. https://archive.mpi.nl/tla/elan
Tillema, M., van den Bergh, H., Rijlaarsdam, G., & Sanders, T. (2011). Relating self reports of writing behaviour and online task execution using a temporal model. Metacognition and Learning, 6(3), 229–253.
Van Waes, L., Leijten, M., Wengelin, Å., & Lindgren, E. (2012). Logging tools to study digital writing processes. In V. W. Berninger (Ed.), Past, present, and future contributions of cognitive writing research to cognitive psychology (pp. 507–533). Taylor & Francis.
Wei, L. (2017). Translanguaging as a practical theory of language. Applied Linguistics, 39(1), 9–30.
Yoon, C. (2016). Individual differences in online reference resource consultation: Case studies of Korean ESL graduate writers. Journal of Second Language Writing, 32, 67–80.

chapter 8

Using keystroke logging for studying L2 writing processes

Victoria Johansson,¹,³ Åsa Wengelin² & Roger Johansson¹

¹ Lund University | ² University of Gothenburg | ³ Kristianstad University

This chapter presents an overview of keystroke logging. It includes a general rationale for why and when the method is appropriate, an account of how the technique works, and the pros and cons of different methodological combinations involving keystroke logging. Further, the chapter briefly outlines previous keystroke logging studies of L2 writing to illustrate the types of questions this data collection technique can address. Finally, we discuss some methodological challenges and suggest best practices for using the method.

https://doi.org/10.1075/rmal.5.08joh | © 2023 John Benjamins Publishing Company

Introduction

Keystroke logging programs have been available for the past four decades or so. With the increased accessibility and use of computers and digital writing practices in a variety of contexts (in schools, universities, and workplaces), the method has become a relevant data collection tool for writing researchers from a multitude of disciplinary backgrounds, the main areas being linguistics, language pedagogy and education, and psychology.

Simply put, the general idea of keystroke logging is to capture all keyboard (and often mouse) activity during writing on a computer or, sometimes, on other digital tools (Spelman Miller & Sullivan, 2006). The method is particularly, although not exclusively, associated with investigations of writers' production processes, rather than the text itself, and of how writers allocate cognitive resources during composition. Typically, research questions guiding keystroke logging studies involve an interest in the main processes (planning, revision, and translation/transcription) proposed in cognitive writing models, such as those by Hayes and Flower (1980) or Kellogg (1996). This cognitive focus is a consequence of the fact that keystroke logging programs have been developed partly to answer the kinds of research questions arising from such theoretical frameworks, which has also influenced the array of analyses offered in existing software.

Keystroke logging has proved useful for addressing research questions that explore writers' engagement in the writing process, often with experimental designs (see Galbraith & Vedder, 2019, for an overview from an L2 perspective), including how writing processes unfold in real time. Several good overviews of the methodology exist, including examples of applied uses. The book Computer keystroke logging and writing: Methods and applications (Sullivan & Lindgren, 2006) introduces the method, positions it theoretically (see especially Spelman Miller & Sullivan, 2006), and provides ample examples of analysis. Its sequel, Observing writing: Insights from keystroke logging and handwriting (Lindgren & Sullivan, 2019), offers more recent applied examples. In addition, we would like to direct the reader to Abdel Latif's (2008) state-of-the-art overview and to the overview by Spelman Miller, Lindgren, and Sullivan (2008). A recent introduction is also given in Wengelin and Johansson (2023).

Introducing keystroke logging

Keystroke logging is typically described as an unobtrusive, observational data collection method that offers researchers the possibility of gathering huge amounts of data on writers' real-time writing (see Wengelin et al., 2019). Importantly, the existing software registers what writers do, but not why. The unobtrusiveness of keystroke logging is frequently described as an advantage, but a disadvantage is that the intentions of the writer are not revealed (Spelman Miller & Sullivan, 2006). A possible solution is to combine keystroke logging with self-report methods, such as think aloud protocols, the dual/triple task paradigm, or retrospective interviews and stimulated recall, where the replay function incorporated in many keystroke logging programs can be used to prompt such self-reflection (see Chapters 5 and 12, this volume).

An important motivation for using keystroke logging is the possibility of examining the relationship between what writers are engaged in (the writing processes) and what these activities result in (the written product/finished texts). This has been used, for example, to gain insights into writing strategies, the relationship between writing processes and text properties, writing difficulties, and differences in writing approaches between L1 and L2 writers.

Several keystroke logging programs were developed independently at different universities during the 1990s. Examples include TraceIt (Severinson Eklundh & Kollberg, 1992) and FAU-word (Levy & Ransdell, 1994), as well as two programs still being developed today: ScriptLog (Strömqvist & Malmsten, 1997; Strömqvist & Karlsson, 2001; Wengelin et al., 2019) and TransLog, developed especially for translation studies (Lykke-Jakobsen, 1999; Carl, 2012). The most widely used program today is Inputlog (Leijten & Van Waes, 2013), which offers a wide range of additional resources, especially regarding analyses. Recent additions include the web-based CyWrite (Chukharev-Hudilainen, 2019) and GenoGraphiX-Log, which specializes in graphic illustrations of complex writing processes (Usoof et al., 2020). There is also in-house software, for example EyeWrite (Simpson & Torrance, 2007). Programs may be stand-alone or integrated into other software; the latter type (for example, Inputlog integrated into Microsoft Word) presents excellent opportunities for ecological approaches, e.g., within an ethnographic framework (see Leijten et al., 2014). Readers are referred to Wengelin and Johansson (2023) for a fuller elaboration of these programs.

How does keystroke logging work?

The key principle of keystroke logging is to record key presses, and often also mouse movements, to provide each event with a time stamp, and to create from this a temporal log of the activities that take place during writing (see Van Waes et al., 2016). This enables the researcher to follow the production of a text step by step. The basic output of most programs is a so-called log file, which presents the sequence of actions together with their time stamps. The log file is the starting point for additional output files that provide overall statistics on writing time, the average time between key presses, and the number of deleted characters (i.e., letters, numbers, punctuation, and often also spaces and uses of Enter). Often, the programs also enable the researcher to set a pause criterion, that is, the minimum duration for an interruption to be counted as a pause. To help the researcher grasp the amount and richness of the data collected, many programs offer additional analyses, such as replays of the writing session and graphical output.

To illustrate the basic principles, we use ScriptLog, one of the established keystroke logging programs, which we have been involved in developing. The example is a text written by a 15-year-old Swedish girl in L2 English on the topic "Why vote in the elections?". At the time of the recording, she had studied English in school for six years. The left column in Table 1 shows the final text, while the right column displays a linear representation of the actions occurring during text production. Here, text within angle brackets indicates pause durations (in seconds) and actions, such as pressing backspace, Enter, or the arrow keys (up, down, left, right), or clicking the mouse, e.g., to highlight and delete or to move parts of the text. The linear representation includes only the first two sentences. The pause criterion is here set to 0.5 seconds.


Table 1. Example of a complete final text (left) and of a part of the corresponding linear text (right)

Final text/written product [of full text]:

Why is it important to vote in elections

In sweden we have an election every 4 year. Everyone above 18 get to vote about partys they want to have in the riksdag. I think that it is important that every person with the right to vote uses it. The next four years depend much on the election, and I think you should do what you can to affect it in the way you want to. Even if you don’t know what to vote on, or if you don’t care about the result of the election you can always vote blank, maybe to show that none of the partys are good enough for you and you need more variated choices, or to tell the system that you value the democracy even if you don’t want to vote on a paricular party. In conclusion, all votes are important to the future, and because of that everybody should vote on what they think is the best party.

Linear text/writing process [of first two sentences only]:

Why is it important to vote in elections b Why it is it important to vitote in elections
In sweden we hav e an election every 4 year .

In total, she wrote 164 words, which is equivalent to 833 characters (letters, numbers, punctuation, spaces, and uses of Enter) in her final text (see Table 2). Through the statistics from the keystroke logging software we get access to her writing process: her linear text consisted of 1428 characters, which means that she deleted 595 characters, or 42 % of what she wrote. The basic statistics also show an extensive use of the arrow keys and backspace, as well as of the mouse. Her total writing time was 1262 s and, counting only pauses longer than 2 s, her total pause time was 916 s, that is, 72.5 % of the writing time. ScriptLog further offers a replay of the writing session, which allows the researcher to study the unfolding of the writing in real time (or at a preferred speed).
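To make the bookkeeping behind these counts concrete, the sketch below derives a linear representation and basic statistics from a toy event log. The log format, the "←" symbol for backspace, and the function names are our own illustrative simplifications, not the actual file formats or algorithms of ScriptLog or Inputlog.

```python
# Toy keystroke log: (timestamp in seconds, key), for a writer who only
# appends text and deletes with backspace (no cursor movement or mouse).
LOG = [
    (0.0, "H"), (0.4, "i"), (0.7, "j"),
    (1.5, "BACKSPACE"),   # 0.8 s pause, then one deletion
    (4.5, "!"),           # 3.0 s pause before the last character
]

def final_text(log):
    """Replay the log to reconstruct the final text."""
    chars = []
    for _, key in log:
        if key == "BACKSPACE":
            if chars:
                chars.pop()
        else:
            chars.append(key)
    return "".join(chars)

def linear_representation(log, pause_criterion=0.5):
    """Linear text: every keystroke in production order, with pauses at or
    above the criterion shown in angle brackets (seconds) and backspace
    shown as '←' (our own notation)."""
    parts, prev_t = [], None
    for t, key in log:
        if prev_t is not None and t - prev_t >= pause_criterion:
            parts.append(f"<{t - prev_t:.1f}>")
        parts.append("←" if key == "BACKSPACE" else key)
        prev_t = t
    return "".join(parts)

def basic_stats(log, pause_criterion=2.0):
    """Counts in the spirit of Table 2 (simplified: no arrow keys/mouse)."""
    chars_linear = sum(1 for _, k in log if k != "BACKSPACE")
    chars_final = len(final_text(log))
    gaps = [t1 - t0 for (t0, _), (t1, _) in zip(log, log[1:])]
    pause_time = sum(g for g in gaps if g >= pause_criterion)
    total_time = log[-1][0] - log[0][0]
    return {
        "chars_final": chars_final,
        "chars_linear": chars_linear,
        "chars_deleted": chars_linear - chars_final,
        "writing_time_s": total_time,
        "pause_time_s": pause_time,
        "proportion_pause": pause_time / total_time,
    }
```

For this toy log, the final text is "Hi!", the linear representation is "Hij<0.8>←<3.0>!", and the deleted-character count is the difference between the linear and final character counts, exactly as in the worked example above.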


Table 2. A selection of the statistical output from ScriptLog (proportion pause time calculated in Excel)

Number of characters in final text: 833
Number of characters in linear text (characters in final text plus deleted characters): 1428
Keystrokes in linear text (characters plus use of arrow keys and backspace): 1648
Events in linear text (keystrokes plus mouse clicks): 1733
Writing time: 1262.55 s
Pause time (pause criterion 2 s): 916.2 s
Proportion pause time: 72.5 %

The keystroke log from ScriptLog can be exported to Inputlog for further analyses. See, for example, the General Analysis in Table 3 (the extract is simplified for demonstration purposes), which demonstrates the basic principles of keystroke logging. The first column shows the time stamp (in seconds, elapsed from the start of the writing session); the second column, the event (input of a character or an action); and the third column, the location, automatically identified by Inputlog.

Table 3. Example of a General Analysis (GA) from Inputlog

Time stamp      Event    Location
00:00:07.104    W        BEFORE SENTENCES
00:00:07.530    h        WITHIN WORDS
00:00:07.719    y        WITHIN WORDS
00:00:07.855    SPACE    AFTER WORDS
00:00:08.136    i        BEFORE WORDS
00:00:08.251    s        WITHIN WORDS
00:00:08.311    SPACE    AFTER WORDS
00:00:08.679    i        BEFORE WORDS
00:00:08.991    t        WITHIN WORDS
00:00:09.154    SPACE    AFTER WORDS
00:00:09.441    i        BEFORE WORDS
00:00:09.759    m        WITHIN WORDS
00:00:10.035    p        WITHIN WORDS
00:00:10.256    o        WITHIN WORDS
00:00:10.848    r        WITHIN WORDS
00:00:11.071    t        WITHIN WORDS
00:00:11.204    a        WITHIN WORDS
00:00:11.324    n        WITHIN WORDS
00:00:11.501    t        WITHIN WORDS
00:00:11.653    SPACE    AFTER WORDS
00:00:11.895    t        BEFORE WORDS
00:00:12.065    o        WITHIN WORDS
00:00:12.188    SPACE    AFTER WORDS
00:00:12.853    v        BEFORE WORDS
00:00:12.989    o        WITHIN WORDS
00:00:13.185    t        WITHIN WORDS
00:00:13.325    e        WITHIN WORDS
00:00:13.442    SPACE    AFTER WORDS
00:00:13.610    i        BEFORE WORDS
00:00:13.795    n        WITHIN WORDS
00:00:13.951    SPACE    AFTER WORDS
00:00:17.482    e        BEFORE WORDS
00:00:17.660    l        WITHIN WORDS
00:00:17.864    e        WITHIN WORDS
00:00:18.815    c        WITHIN WORDS
00:00:19.276    t        WITHIN WORDS
00:00:19.470    i        WITHIN WORDS
00:00:19.583    o        WITHIN WORDS
00:00:19.734    n        WITHIN WORDS
00:00:20.252    s        WITHIN WORDS
00:00:21.702    RETURN   AFTER PARAGRAPHS
Inputlog offers several additional analyses, e.g., a pause analysis that provides an overall picture of the contexts in which the writers pause (e.g., within words, between words, between sentences, between paragraphs). These automatic analyses of contexts are highly useful for a quick overview.

Another quick and automatic analysis is the process graph of the writing session, extracted from Inputlog (see Figure 1).

Figure 1. Example of a process graph from Inputlog

The graph shows the full writing session of 21 minutes. The upper, blue line depicts the cumulative total number of characters written (i.e., characters in the linear text), and the lower, green line indicates the characters left on the screen (i.e., characters in the final text). The gap between the blue and green lines thus depicts the deleted characters. A "dip" in the green line indicates that the writer has gone back in the text to make changes to something previously written. In this case, we can see that she goes back in her text between minutes 4 and 5, and again, e.g., between minutes 9 and 14. The yellow dots in the graph indicate pauses longer than 2 s: the higher the dot is placed, the longer the pause (pause duration is read on the vertical axis).
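The two curves of such a process graph can be computed directly from the log. A minimal sketch, assuming a toy log of (timestamp, key) events for a writer who only appends text and deletes with backspace (a simplification; real writers also move the cursor):

```python
def process_graph_points(log):
    """For each logged event, return (time, chars_in_linear_text,
    chars_on_screen): the data behind the upper and lower curves of a
    process graph. 'BACKSPACE' deletes one character; all other keys
    add one character at the leading edge of the text."""
    linear = onscreen = 0
    points = []
    for t, key in log:
        if key == "BACKSPACE":
            onscreen = max(0, onscreen - 1)  # deletion widens the gap
        else:
            linear += 1                      # upper curve: everything produced
            onscreen += 1                    # lower curve: what remains
        points.append((t, linear, onscreen))
    return points
```

At any point in time, the vertical distance between the two values is the number of characters deleted so far, which is what the gap between the blue and green lines visualizes.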

Research questions addressed in keystroke logging L2 writing studies

As mentioned above, a theoretical motivation for developing the method was to enable "automatic" investigations of how writers allocate cognitive resources to different parts of the text during the whole or part of a writing session, by means of measuring pause duration and pause location. Initially, this was mostly restricted to controlled experiments in lab settings (Spelman Miller & Sullivan, 2006). However, with the increased use of digital writing in all areas of society, and with keystroke logging programs that can easily be installed on any computer, the method (by itself or in combination with other approaches, as noted above) allows for the investigation of naturalistic settings of digital writing per se.

The most common analyses based on keystroke logging data focus on temporal distributions (including pauses, bursts, and fluency; see Spelman Miller, 2006b) and on revision processes (see Lindgren & Sullivan, 2006). Research has also included comparisons between writing and speaking processes (see Strömqvist et al., 2006). Additionally, attempts have been made to create an overarching model of language processing on the assumption that temporal patterns during writing reflect general inherent patterns of language production (Spelman Miller, 2006a, b). L2 writing studies have often used keystroke logging as a means to examine processing activity via pause duration, based on the assumption that shorter pause times indicate more automatization (cf. Révész & Michel, 2019).

In what follows we review L2 writing keystroke logging studies, excluding (most of) the studies combining keystroke logging and eye tracking, which are discussed in Chapter 9 of this volume. We start with studies interested in planning, revision, and fluency, and then add a section on studies with an educational approach. The list is not exhaustive but aims to represent different strands in the field. We refer the reader to the reference lists of the individual studies for continued exploration and more information.

Research questions focusing on factors moderating pausing behavior

Several studies have inspected the moderating effect of certain variables on pausing behavior, be it learner-related variables, such as writing skill (Xu & Ding, 2014; Xu & Qi, 2017) or other individual differences (Kim et al., 2021); task-related variables, such as task complexity (Révész et al., 2017); or a combination of learner- and task-related variables (Barkaoui, 2019).

Barkaoui (2019), which also offers an excellent overview of previous studies on pauses during writing, examined English L2 writers' argumentative essays and summaries. The study focused on the effects of task type, L2 proficiency, and keyboarding skills on the participants' writing processes. The pauses were investigated using a taxonomy adapted from Wengelin (2006) and Alves et al. (2007), which divides pauses (longer than 2 s) according to whether they occur in connection with (i) sentence-initial or sentence-final locations; (ii) minor delimiters (such as a comma); or (iii) deletions. In this study, keystroke logging was thus used as a means to discover which contexts are most associated with increased cognitive load.

Xu and Qi (2017; see also Xu & Ding, 2014, for a slightly different design) looked at planning during writing by examining the pausing behavior of English L2 writers, asking how more and less skilled writers differ with respect to how cognitive resources are used during the writing of argumentative texts. To do this, they used an option in Inputlog that divides the writing time into five temporal intervals of equal length, which renders an overview of when most pausing occurred during text production.

Kim et al. (2021) asked how individual differences in working memory and vocabulary knowledge among L2 English writers affected their writing processes, especially so-called P-bursts (see below), what was revised, and when the writers engaged in revising. In addition, the study compared features of the final texts (text length and text quality) to the writers' pausing behavior. In short, keystroke logging helped shed light on the relation between the final text, the writers' background, and the writing activities that took place during text production.

Keystroke logging has further been used for analyzing planning on a micro-level by examining pauses occurring at morphological boundaries in texts produced by L2 writers of French (Gunnarsson, 2006). The suggestion is that fewer and/or shorter pauses in these contexts indicate a more L1-like proficiency.

Finally, the examination of pauses has been used to investigate the impact of task complexity. This includes Révész et al.'s (2017) study of English L2 writers producing more and less complex argumentative essays. Here, keystroke logging combined with stimulated recall helped to answer questions about what lessens the cognitive burden of planning during writing.
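Two of the pause analyses reviewed in this section can be sketched in a few lines: a location taxonomy in the spirit of the one Barkaoui (2019) adapted from Wengelin (2006) and Alves et al. (2007), and an equal-interval split of the writing time like the Inputlog option used by Xu and Qi (2017). The event format, category labels, and boundary rule are simplified illustrations of ours, not the instruments used in those studies.

```python
def pause_context(prev_key, next_key):
    """Rough pause-location taxonomy: deletion-related pauses, sentence
    boundaries, minor delimiters (e.g., a comma), or other."""
    if "BACKSPACE" in (prev_key, next_key):
        return "deletion"
    if prev_key in {".", "!", "?"} or next_key in {".", "!", "?"}:
        return "sentence boundary"
    if prev_key == "," or next_key == ",":
        return "minor delimiter"
    return "other"

def classify_pauses(log, pause_criterion=2.0):
    """All pauses at or above the criterion, with their context."""
    return [
        (t1 - t0, pause_context(k0, k1))
        for (t0, k0), (t1, k1) in zip(log, log[1:])
        if t1 - t0 >= pause_criterion
    ]

def pause_time_per_interval(log, n=5, pause_criterion=2.0):
    """Total pause time in each of n equal slices of the writing time;
    a pause is credited to the slice in which it starts (our
    simplifying assumption)."""
    start, end = log[0][0], log[-1][0]
    width = (end - start) / n
    buckets = [0.0] * n
    for (t0, _), (t1, _) in zip(log, log[1:]):
        gap = t1 - t0
        if gap >= pause_criterion:
            buckets[min(int((t0 - start) / width), n - 1)] += gap
    return buckets

# Toy log: sentence-final pause, then two deletion-related pauses.
PAUSE_DEMO = [(0.0, "."), (3.0, "T"), (4.0, "x"),
              (7.0, "BACKSPACE"), (10.0, "y")]
```

Such tallies, aggregated over participants and conditions, are what allow a study to ask which contexts attract the longest pauses and whether pausing is front-loaded or spread across the session.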

Research questions focusing on revision

Revision processes have also been studied via keystroke logging, often in combination with other methods, such as think aloud. In some cases, the focus has been on comparing revision behavior across languages (L1/L2) and/or tasks; in other cases, the comparison has focused on more and less skilled writers.

An early study by Thorson (2000) compared English L1 writers' writing in their L1 to their L2 writing in German and investigated their revision patterns in two tasks: a letter and an article. The method helped to show how much effort (as measured by written characters that were produced but deleted) the writers put into each task, something that is not visible in the final texts.

Another early contribution is Stevenson et al. (2006), who analyzed the types of revisions in relation to text quality in a group of Dutch high school students, comparing their L1 writing in Dutch and their L2 writing in English. The study, which combined keystroke logging with think aloud, introduced a multidimensional revision taxonomy to investigate whether the L2 inhibited higher-level revision processes (such as generating and organizing content). Keystroke logging allowed the researchers to inspect what types of revisions the writers engaged in, and when this happened, while think aloud helped in understanding the writers' motivations for making the revisions.

A similar design is found in Lu and Révész's (2021) study, which looked at L2 Chinese writing (as compared to Chinese L1 writing) using a combination of keystroke logging and stimulated recall targeting revisions. The study is of particular interest since the participants wrote in pinyin, one of the phonetic-based input methods for Chinese. This study used yet another revision taxonomy, in which the revisions, revealed through inspection of the keystroke logs, were coded according to linguistic domain, context, and level of transcription (here related to transcribing in pinyin, which requires the writer to consider orthography on both the phonological and the logographic level). The complementary stimulated recall revealed the writers' own orientations to the changes they made.

A more recent keystroke logging study (Xu, 2018) addressed revision behavior by more and less skilled Chinese L2 writers of English. The method enabled an exploration of processes that are generally invisible in the final text by using a revision taxonomy that divided online revisions into immediate, distant, and end revisions. Revision choices were further explored by Bowen and Thomas (2020), who used keystroke logging to examine the revision choices of English L1 writers and Chinese L2 writers of English, using a taxonomy of different 'functions' from Systemic Functional Linguistics. Since the method allows detailed analyses of what is revised and when during text production, the results could relate revisions to these 'functions'.

Research questions focusing on writing fluency

The concept of fluency was brought to the forefront of L2 writing research in a study (without keystroke logging) by Chenoweth and Hayes (2001). Fluency during writing is typically calculated by dividing the number of linguistic units (words or written characters) by a time unit (seconds, minutes, or the whole time on task/total writing time). The assumption behind all fluency measures is the connection between the ability to produce text without unnecessary interruptions and linguistic proficiency/writing competence. From a processing perspective, fluency has been connected to "bursts", that is, the number of words or typed characters produced between pauses (P-bursts) or between revisions (R-bursts). Increased fluency thus means that a writer makes few and/or very short pauses and few revisions. Programs such as Inputlog have integrated several automatic analyses of fluency.

Early L2 fluency studies using keystroke logging include research on Swedish high school students writing in L2 English (Lindgren et al., 2008; Spelman Miller et al., 2008). The research questions in these studies addressed how linguistic experience and years of studying the L2 affected online writing processes, with a focus on the longitudinal development of revising, pausing, and fluency in L2 English, compared to L1 Swedish. The keystroke logging data were analyzed to identify the number and length of pauses, and to determine fluency by calculating the number of characters produced between interruptions (whether pauses or revisions). The results demonstrated that over the three years during which the writers were followed, they increased their fluency and decreased their number of pauses, as a function of improved L2 skills. Finally, the writers spent longer time on task in their L2, which the authors interpreted as a means to compensate for their lack of linguistic resources in the target language.

Fluency in relation to linguistic proficiency was also examined in narrative writing in English and Swedish as L2 by first-year university students (Palviainen et al., 2012). Here, L2 proficiency was correlated with both offline fluency (e.g., the number of written characters in the final text) and online fluency (e.g., the number of written characters in the linear text). Keystroke logging helped to answer questions about whether less proficient writers spend more time and produce more text to reach the same end results in the final text as their more proficient counterparts. Similar issues were addressed in a study of argumentative writing by Turkish L2 writers of English (Tiryakioglu et al., 2019).
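The fluency measures described in this section are straightforward to compute from a log. The sketch below derives characters per minute and bursts from a toy list of (timestamp, key) events; the format, function names, and the shortcut of treating any backspace as a revision boundary are our own illustrative simplifications, not the operationalizations of any particular study.

```python
def bursts(log, pause_criterion=2.0):
    """Split production into bursts: a new burst starts after a pause at
    or above the criterion (a P-burst boundary) or after a revision,
    here simplified to any backspace (an R-burst boundary)."""
    out, current, prev_t = [], [], None
    for t, key in log:
        paused = prev_t is not None and t - prev_t >= pause_criterion
        if (paused or key == "BACKSPACE") and current:
            out.append("".join(current))
            current = []
        if key != "BACKSPACE":
            current.append(key)
        prev_t = t
    if current:
        out.append("".join(current))
    return out

def fluency_cpm(log):
    """Online fluency: produced characters per minute of writing time."""
    chars = sum(1 for _, k in log if k != "BACKSPACE")
    minutes = (log[-1][0] - log[0][0]) / 60
    return chars / minutes

# Toy log: a 3 s pause before "c", one deletion, then a long final pause.
FLUENCY_DEMO = [(0, "a"), (1, "b"), (4, "c"),
                (5, "BACKSPACE"), (6, "d"), (60, "e")]
```

Mean burst length (characters per burst) and characters per minute can then be compared across L1 and L2 sessions, which is the kind of contrast the studies above draw.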

Research questions with an educational focus

A group of studies has made use of keystroke logging programs to study the effects of pedagogical interventions. These interventions include giving students feedback on their own writing processes, presented through keystroke logging, and explorations of which writing processes may be more beneficial in terms of final text quality.

Since student feedback is a key feature of writing instruction, some recent studies have used the output from keystroke-logged writing sessions as input to students. The idea is that encouraging writers to inspect their own writing processes will increase their capacity for self-regulated writing (see Spelman Miller et al., 2008). Inputlog has an integrated module that provides such feedback to students. Examples of articles that ask how students' writing strategies can be changed through the inspection of their writing processes include Vandermeulen et al. (2020) and Bowen et al. (2022).

Other research questions concern how different writing processes relate to the quality of students' texts. The suggestion is that keystroke logging can help in understanding whether particular behaviors are more beneficial than others for the final text; by identifying such processes, future writing instruction can direct writers to engage in certain practices (see, for example, Conijn et al., 2022).


Methodological challenges

This section highlights some of the challenges that the researcher may encounter when using keystroke logging.

Challenge 1: Choosing hardware Keystroke logging methodology is depending on using artifacts: computer (stationary or laptop, or possibly a writing tablet); keyboard (external or integrated on a laptop, or a virtual keyboard on a writing tablet); screen (external or integrated with laptop); and mouse (external mouse or mouse pad on laptop). The choices in relation to these devices will influence the data and, as a result, also the possibility to interpret writers’ activities correctly. The first methodological challenge is thus to make the choices of hardware. An example of how artifacts will impact the writer’s actions is the screen size. This will determine the amount of text that can be written before participants need to scroll. This can, in turn, influence the overview of the text and, as a consequence, how much of it that is read and revised. Further, choosing a keyboard layout is important. There may be advantages to letting writers use a keyboard with a layout designed for a specific language (cf. a QWERTY keyboard, used in many countries, for instance for writing in English, and an AZERTY keyboard, commonly found in France or Belgium and suitable for typing in French) but, as a consequence, one will compare writing with two different keyboard layouts. While some writers may be used to altering layouts according to language, it may also be the case that writers are used to switching between languages using one and the same keyboard layout. As a result, their transcription skills will be impeded if they change to the “language-adapted” keyboard. Writing on a laptop may involve using a mouse tablet instead of an external mouse. The choice can result in different behavioral patterns as a mouse pad is more accessible for the keyboard writer while using the external mouse requires that the writer’s hand is removed from the keyboard. 
Anecdotally, we have also found that a mouse pad can invite writers to "tap" on it while pausing (or thinking), which results in unexpected pause patterns in the writing logs. Variation in hardware, such as how keyboard and computer are connected, may also affect the accuracy of the registration: different connections (e.g., with or without USB) may result in different latencies between pressing a key and the registration of the keypress. The differences in latency are normally at the millisecond level, but they matter when studying very short pauses. In addition, different keystroke logging programs register key presses differently. For instance, Inputlog registers the press of a key, while ScriptLog registers both the pressing and the releasing of a key. Differences between laptop keyboards and external keyboards will additionally influence the haptic input for the writer and may result in different revision patterns. Another important pitfall relates to the limitations of the hardware, i.e., what the computer can log, which also affects latencies and is particularly important to consider when comparing recordings collected online. If recordings are made in a lab, with all participants using the same computer, keyboard, screen, mouse, operating system and other settings, comparability between writers will be high. On the other hand, there may be many reasons for choosing to record data in more ecological settings, such as a classroom, where students use similar but individual computers, or at home or in the workplace, where participants use computers with personal settings. To sum up, decisions about hardware have an immense impact on data collection, and variation in technical solutions across individual participants can influence comparability within a single study and between studies.
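The press-only versus press-and-release distinction mentioned above can be made concrete with a small sketch. The event format and numbers below are invented for illustration only; real log formats differ between programs:

```python
# Hypothetical keystroke events: (key, press_time_ms, release_time_ms).
# All field names and timestamps are illustrative, not from any real program.
events = [
    ("t", 1000, 1080),
    ("h", 1250, 1330),
    ("e", 1420, 1490),
]

# Convention A (press-to-press): latency between consecutive key presses.
press_to_press = [b[1] - a[1] for a, b in zip(events, events[1:])]

# Convention B (release-to-press): latency from releasing one key to
# pressing the next, which subtracts each key's own down-time.
release_to_press = [b[1] - a[2] for a, b in zip(events, events[1:])]

print(press_to_press)    # [250, 170]
print(release_to_press)  # [170, 90]
```

The same three keypresses thus yield systematically shorter latencies under the release-to-press convention, which is one reason exact pause times should not be compared across programs.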

Challenge 2: Choosing the keystroke logging program

The operating system (Mac, PC, Linux) may determine the choice of keystroke logging program, since not all programs are platform-independent. The options for recording writing and analyzing the processes that are incorporated in existing keystroke logging programs will also be crucial for choosing a specific program. Issues that may be essential for the researcher include whether mouse movements should be recorded, whether it is important to be able to replay the writing session, whether one wants to control input possibilities in a more experimental setting (including the presentation of stimuli) or whether a more naturalistic setting suits the purpose better, whether one is interested in a design which allows writers to continue working on a text over consecutive writing sessions, and whether the participants' use of spelling and grammar checks is in focus. The final choice of program may be made with regard to comparability with previous studies, or in relation to whether the researcher can be assisted by a user community (e.g., the many researchers using Inputlog helping each other). The choice of keystroke logging program will probably also depend on whether it will be used in combination with other methods for capturing real-time processes (e.g., eye tracking, screen capturing, think-aloud protocols, stimulated recall, or speech recognition). In that case, the challenge is not only to find ways to synchronize the different recordings at the onset of recording and in the post-analysis, but also to avoid getting lost in data by relying on economical ways to compare the parallel processes. Combining several methods generally requires extensive manual coding and annotation (see, for instance, Révész et al., 2017; Stevenson et al., 2006; see also Chapter 12, this volume).

Challenge 3: Choosing settings and procedure

A third challenge is to choose the settings for data recording. This includes decisions about font and font size, the size of the writing window, and whether to allow spelling and grammar checks. While a bigger font size facilitates reading and error detection, it also results in less text being visible on the screen, which in turn leads to more scrolling. Including spelling and grammar checks creates a more naturalistic writing context, but also limits the possibility of examining proficiency in, for instance, orthography and grammar. The choice of recording participants individually or in groups, for instance in a classroom, is also important. While it is less time-consuming, and often more ecological, data collection in a classroom involves a higher risk of participants disturbing each other and/or being interrupted by other external factors. In contrast, a lab setting makes it easier to interpret the cause of, for instance, pauses (and to avoid disturbing factors that may cause pauses), but individual writing sessions mean more time for data collection. Another methodological consideration is whether to include a copy task to establish a baseline for transcription skills (Van Waes et al., 2021). For L2 writing, controlling for typing skills in different languages may be particularly important. Other choices concern gathering information on learners' backgrounds (e.g., language proficiency, reading comprehension, working memory tests), which may be quite useful for interpreting findings. Finally, procedural choices involve decisions as to whether the participants should receive instructions orally or in writing (or both), and whether the writers should be allowed to start and finish the writing session themselves.
The last issue may be of special importance for studies interested in initial planning, since in such cases the design should make it possible to relate the pause between pressing start and the first keypress to planning, and not to other factors such as receiving instructions or asking clarifying questions. The same consideration applies to the last pause.

Challenge 4: Choosing analyses

As outlined above, theoretical underpinnings and previous research findings have led to shortcuts for selecting pause criteria, as well as tailored analyses for, for instance, fluency, revision matrices, and pause location (see especially the options in Inputlog). These analyses are useful for quick explorations of the data and also allow for good comparisons among studies. However, the researcher will be helped in interpreting the data by understanding the calculations behind the different measures incorporated in the keystroke logging programs. One example is that the linguistic algorithms (in Inputlog) related to the identification of phenomena such as 'pauses between words' or 'revisions at word level' depend on word definitions: a word is defined as a string of characters between spaces. If writers erroneously produce words without spaces, or with spaces between letters, the automatically generated measures will not be accurate. Such errors may be more common with developing writers (e.g., children and L2 writers). Another possible pitfall is to use all available measures without reflection, which can easily lead to a fishing expedition for findings. For the analyses, different taxonomies may be relevant, e.g., categorizations of pause location or types of revision (spelling, grammar, etc.). Again, one may use the built-in options offered by the programs or export data to, for instance, Excel for manual coding. Since it is easy to "get lost in data" and create a wide range of categories as a result, it may be useful, both for saving time and for comparative reasons, to consult previous research and choose from established taxonomies. Lastly, although some established procedures exist in research using keystroke logging, they have often been developed through an interplay between empirical examination and theoretically driven approaches, combined with a pragmatic approach leading to the adoption of ad hoc criteria suitable for particular studies.
For instance, there is a convention that a pause criterion of 2 seconds is suitable for a study with adult L1 writers who are able typists with no reading and writing difficulties, where the researcher is interested in examining macro-level processes (and not aspects connected to, for instance, transcription) (see Wengelin, 2006). Yet, a very different pause criterion may be relevant for a study with L2 writers who are adapting to a different orthography, and maybe even a different alphabet (cf. Gunnarsson, 2006; Lu & Révész, 2021). In such a study, low-level processes related to transcription, spelling, morphology, and syntax may be more relevant to focus on, and a lower pause criterion will suit the purpose better, although it will decrease the comparability with studies using a 2-second criterion.
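The effect of the pause criterion can be illustrated with a minimal sketch. The inter-key intervals below are invented; the two thresholds echo the 2-second convention and a lower criterion of the kind discussed above:

```python
# Illustrative inter-key intervals (ms) for a sequence of transitions;
# the values are invented for demonstration purposes.
intervals = [120, 95, 2400, 180, 650, 3100, 140, 210]

def pauses(intervals, criterion_ms):
    """Return the intervals that count as pauses under a given criterion."""
    return [t for t in intervals if t >= criterion_ms]

# A 2-second criterion captures only long, macro-level pauses; a lower
# criterion also catches transitions that may reflect low-level processes
# such as spelling or transcription.
print(pauses(intervals, 2000))  # [2400, 3100]
print(pauses(intervals, 500))   # [2400, 650, 3100]
```

The same recording thus yields a different number of "pauses" depending on the criterion, which is why the criterion must always be reported and motivated.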

Challenge 5: Interpreting the findings

The findings from keystroke logging are consequences of the actions that writers engage in during writing, and a persistent question is how they should be interpreted. For instance, the study of pauses rests on the assumption that they reflect increased cognitive activity, which may occur in contexts where writers are engaged in deeper processing. Interpretations of pause duration and pause location must be made at a general level (cf. Spelman Miller, 2006a), since overall interpretations may not apply to specific individual pauses. The same caution applies to analyses of revision patterns. L1 writing research has demonstrated that more mature, older, and experienced writers revise more, whereas existing L2 studies suggest that revision behavior depends on L2 proficiency, among other factors. The methodological challenge lies in evaluating the writing processes together with the writing product and making informed interpretations of the behavior in the data. Adding self-report methods may help in interpreting what caused a pause or the considerations behind revisions (as done, for instance, in Lu & Révész, 2021), but there is also a concern that concurrent methods such as dual/triple tasks and think-aloud will cause reactivity during the writing process (see the discussion in Wengelin et al., 2019). When interpreting L2 writing data, it is also important to remember that much of what is reported in the literature on how to use and interpret keystroke logging comes from studies on L1 writing. These findings have demonstrated variation due to motoric skills, typing skills, linguistic and cognitive development, and writing in different genres, with different time limits and with instructions on pre-planning or post-revision. Age, education, writing practice, working memory capacity, the task at hand, and the anticipated reader are examples of additional factors that may play major roles. The same variables can be expected to impact L2 writing.

Best practices

In what follows we provide some advice on how to make choices when using keystroke logging to explore writing processes.

Explore writing processes

Our first recommendation is to take the time to explore writing processes as they are shown through the keystroke logging programs before, or in parallel with, starting to use the tailored analyses built into the programs. This is especially important for researchers new to the methodology. A typical and simple exercise is to record oneself, try out different actions during writing, and inspect how they are replayed and manifested in various analyses, with different pause criteria and settings. This builds an understanding of the types of behavior that quantitative, and sometimes opaque, analyses reflect. The same replaying and inspection can also be done with data collected from participants to discover the relation between the process and the measures, and to allow for empirically grounded interpretations. We also recommend exploring data through linear files and general analyses, where it is possible to follow the writing key by key. Piloting different kinds of setups, including procedures for data analysis, prior to data collection may be time-consuming, but it is advisable in order to avoid difficulties in interpreting the data afterwards.
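As a hedged illustration of what a linear file captures, the toy sketch below reconstructs a final text from a stream of keypresses, while the linear view keeps deleted material visible. The event format is invented and far simpler than any real program's log format:

```python
# Toy event stream: a typed character or a backspace ("<BS>").
# The format is invented purely for illustration.
events = ["t", "h", "e", "e", "<BS>", " ", "c", "a", "t"]

def final_text(events):
    """Reconstruct the product text: backspaces remove the last character."""
    out = []
    for e in events:
        if e == "<BS>":
            if out:
                out.pop()
        else:
            out.append(e)
    return "".join(out)

def linear_view(events):
    """Keep every keypress visible, marking deletions, as a linear file does."""
    return "".join("\u2190" if e == "<BS>" else e for e in events)

print(final_text(events))   # "the cat"
print(linear_view(events))  # "thee← cat"
```

The contrast between the two outputs is the core value of the linear file: the product text hides the deleted "e", while the linear view preserves the trace of the revision.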

Choose a program that suits your needs

The various keystroke logging programs have different advantages, and future researchers are advised to choose a program that will best assist them in answering their research questions, and perhaps also in dealing with other practical issues. While there are reasons to "always" use a certain program for comparative purposes (e.g., between data sets or between consecutive studies), most basic analyses are comparable between programs, at least at the level of tendencies. That is, although specific pause lengths in milliseconds may not be comparable between data sets collected with different programs, it is still relevant to show that findings in certain contexts, such as pauses within words or pauses at clause boundaries, are more or less common in a particular context. To give one concrete example: as mentioned above, ScriptLog and Inputlog measure pauses in slightly different ways due to how key presses are recorded. This means that comparing the number of pauses or the exact pause times between programs is not recommended, because they will vary slightly for most recordings. The need to connect eye tracking, use spelling and grammar checks, write in a more naturalistic setting, or collect data online in a web-based environment will all influence the choice of keystroke logging program, as will the types of analysis that will be used later. Experimental settings allow for more control, but a more naturalistic setting – like using Microsoft Word – will allow for an increased understanding of the actions writers normally engage in. At this point, we would like to refer to some positive experiences from the pandemic, when data was collected at a distance, with participants installing ScriptLog on their home computers and receiving instructions on how to use it over Zoom (see Ramírez Maraver, 2021, for a methodological description). This proved to be a rewarding solution for collecting L2 writing from participants all over the world, who could easily email or share their log files with the researcher after the writing session. While we recommend exploring the data through a variety of available analyses, it is equally important to have a rationale for selecting the tailored analyses included in a given study. Researchers must be aware of the affordances and limitations of specific analyses. Thus, criteria must be selected with care and prior reflection. In connection with this, we recommend reading up on how measures and analyses have been used in previous keystroke logging studies, and consulting keystroke logging manuals, if possible, to understand the theoretical basis for the choices.

Consider orthography

The existing keystroke logging programs have been developed foremost for writing in the Latin alphabet. While several programs offer support – direct or indirect – for other orthographies and alphabets, there is currently limited research using such data. This means that there may be behaviors occurring while writing in other alphabets that the programs do not capture accurately. One such error may concern the logging of diacritics (sometimes they are not registered in the writing log until an accompanying letter has been pressed), which can affect measures such as transition times within words, general pause and revision patterns, as well as measures of fluency and the number of characters and key presses. A good practice is therefore to pilot before data collection, and to find out whether any systematic errors occur in the capturing (one way of doing this is to use screen capturing in parallel with keystroke logging during the pilot).
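The diacritics issue described above can be sketched as follows. The example assumes, purely for illustration, a logger that emits a single event for a dead-key-plus-letter combination, timestamped at the second physical press; timestamps are invented:

```python
# A writer types an acute accent (dead key) and then "e" to produce "é".
# "physical" is what the fingers did; "logged" is what such a log records.
# Timestamps (ms) are invented for illustration.
physical = [("\u00b4", 1000), ("e", 1350), ("n", 1500)]
logged = [("\u00e9", 1350), ("n", 1500)]

def transitions(events):
    """Inter-event transition times for (key, time_ms) events."""
    return [t2 - t1 for (_, t1), (_, t2) in zip(events, events[1:])]

print(len(physical), transitions(physical))  # 3 [350, 150]
print(len(logged), transitions(logged))      # 2 [150]
# The 350 ms accent-to-letter transition disappears and the keypress count
# drops, which would bias fluency, pause, and character-count measures.
```

Piloting with screen capture alongside the log, as suggested above, is one way to detect whether such merging actually occurs with a given program and keyboard layout.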

Keep variables constant and include baseline tasks

The section describing methodological challenges underlines the importance of keeping variables as constant as possible across participants: screen size, keyboard layout, language settings, the recording setting (lab vs. classroom, workplace, or home), whether participants press start and stop themselves, whether instructions are given orally and/or in writing, and so on. Since many studies are based on variation during the writing process – e.g., fluency, pausing, and revision depending on text type, on L1/L2, on pre-writing activities, or on whether participants have benefitted from specific writing instruction – it is further advisable to include a baseline or a copy task (to control for transcription skills). Individual variation can be substantial, and controlling for it is especially relevant in L2 writing, which involves, on the one hand, attention to less well-known spelling and orthography, and, on the other, keyboard proficiency on an unfamiliar keyboard – both aspects where one would mostly expect a high degree of automaticity and fluency in L1 writing.
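One possible, hedged way to operationalize a copy-task baseline is to scale each participant's pause criterion to their own typing tempo. The normalization factor and the interval data below are our own invention for illustration, not an established convention from the literature:

```python
import statistics

# Invented inter-key intervals (ms) from a copy task for two participants.
copy_task_intervals = {
    "P1": [140, 150, 160, 145, 155],  # fast, automatized typist
    "P2": [380, 420, 400, 390, 410],  # slower typist
}

def pause_criterion(intervals, factor=5):
    """Criterion = factor x the participant's median inter-key interval."""
    return factor * statistics.median(intervals)

for pid, iv in copy_task_intervals.items():
    print(pid, pause_criterion(iv))  # P1 750, P2 2000
```

Under this sketch, a 750 ms transition would count as a pause for the fast typist but as ordinary transcription for the slower one, which is one way to keep transcription skill from contaminating pause measures.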


Future avenues

This chapter has described the fundamentals of using keystroke logging for the study of writing processes. Examples from previous studies on L2 writing show that the methodology can be a rewarding tool for inspecting how writing unfolds during text production and for analyzing the overall division of cognitive labor during writing, as well as a resource for targeting specific contexts and evaluating learning outcomes. Previous studies can serve as inspiration regarding questions and specific methodological choices. However, since many of these studies have addressed writing in L2 English, it can be expected that research using keystroke logging to explore L2 writing in other languages (and in non-Latin alphabets) will uncover hitherto undescribed writing activities, especially in areas related to orthography and morphology. The main affordance of keystroke logging is the possibility of capturing writing activities without interrupting or disturbing the writer. The method offers great validity for studying in real time what happens during writing. By using quantitative methods to statistically explore the vast amounts of data on key presses, pause locations, pause durations, and revision activities, an overall picture of writers' engagement in different tasks can be outlined. However, researchers interested in why writers choose to delete a word, change a sentence, or pause at length after a paragraph will need to use additional methods. These may be concurrent self-report methods, such as think-aloud, or – to avoid reactivity – retrospective interviews, where writers are presented with a replay of their writing session while reporting on the reasons for various actions. With the increasing use of computers and, with that, growing computer literacy, along with the development of new online methods for capturing keystrokes, the method is becoming ever more accessible to researchers.
The output of the various analyses built into several keystroke logging programs further enables quick overviews. The method allows both for affordable large-scale data collection, e.g., in classrooms, and for case studies with deep analyses of individual writers.

References

Abdel Latif, M. M. (2008). A state-of-the-art review of the real-time computer-aided study of the writing process. International Journal of English Studies, 8(1), 29–50. Retrieved on 27 April 2023 from http://revistas.um.es/ijes/article/view/49081
Alves, R. A., Castro, S. L., de Sousa, L., & Strömqvist, S. (2007). Influence of keyboarding skill on pause-execution cycles in written composition. In M. Torrance, L. Van Waes, & D. Galbraith (Eds.), Writing and cognition: Research and applications (pp. 55–65). Elsevier.


Barkaoui, K. (2019). What can L2 writers' pausing behavior tell us about their L2 writing processes? Studies in Second Language Acquisition, 41, 529–554.
Bowen, N. E. J. A., & Thomas, N. (2020). Manipulating texture and cohesion in academic writing: A keystroke logging study. Journal of Second Language Writing, 50.
Bowen, N. E. J. A., Thomas, N., & Vandermeulen, N. (2022). Exploring feedback and regulation in online writing classes with keystroke logging. Computers and Composition, 63.
Carl, M. (2012). Translog-II: A program for recording user activity data for empirical reading and writing research. In Proceedings of the Eighth International Conference on Language Resources and Evaluation. European Language Resources Association (ELRA). Retrieved on 27 April 2023 from http://www.lrec-conf.org/proceedings/lrec2012/pdf/614_Paper.pdf
Chenoweth, N. A., & Hayes, J. R. (2001). Fluency in writing: Generating text in L1 and L2. Written Communication, 18, 80–98.
Chukharev-Hudilainen, E. (2019). Empowering automated writing evaluation with keystroke logging. In E. Lindgren, Y. Knospe, & K. P. Sullivan (Eds.), Observing writing: Insights from keystroke logging and handwriting (pp. 125–142). Brill.
Conijn, R., Cook, C., van Zaanen, M., & Van Waes, L. (2022). Early prediction of writing quality using keystroke logging. International Journal of Artificial Intelligence in Education, 32, 835–866.
Galbraith, D., & Vedder, I. (2019). Methodological advances in investigating L2 writing processes. Studies in Second Language Acquisition, 41, 633–645.
Gunnarsson, C. (2006). Fluidité, complexité et morphosyntaxe dans la production écrite en FLE (Études Romanes de Lund 78, doctoral dissertation). Lund University.
Hayes, J. R., & Flower, L. (1980). Identifying the organisation of the writing process. In L. W. Gregg & E. R. Steinberg (Eds.), Cognitive processes in writing (pp. 3–30). Lawrence Erlbaum Associates.
Kellogg, R. T. (1996). A model of working memory in writing. In C. M. Levy & S. E. Ransdell (Eds.), The science of writing (pp. 57–71). Lawrence Erlbaum Associates.
Kim, M., Tian, Y., & Crossley, S. A. (2021). Exploring the relationships among cognitive and linguistic resources, writing processes, and written products in second language writing. Journal of Second Language Writing, 53.
Leijten, M., & Van Waes, L. (2013). Keystroke logging in writing research: Using Inputlog to analyze and visualize writing processes. Written Communication, 30, 358–392.
Leijten, M., Van Waes, L., Schriver, K., & Hayes, J. R. (2014). Writing in the workplace: Constructing documents using multiple digital sources. Journal of Writing Research, 5(3), 285–337.
Levy, C. M., & Ransdell, S. (1994). Computer-aided protocol analysis of writing processes. Behavior Research Methods, Instruments, & Computers, 26(2), 219–223.
Lindgren, E., Spelman Miller, K., & Sullivan, K. P. H. (2008). Development of fluency and revision in L1 and L2 writing in Swedish high school years eight and nine. International Journal of Applied Linguistics, 156, 133–151.
Lindgren, E., & Sullivan, K. P. H. (2006). Analysing on-line revision. In K. P. H. Sullivan & E. Lindgren (Eds.), Computer keystroke-logging and writing: Methods and applications (pp. 157–188). Elsevier.
Lindgren, E., & Sullivan, K. (Eds.). (2019). Observing writing: Insights from keystroke logging and handwriting. Brill.
Lu, X., & Révész, A. (2021). Revising in a non-alphabetic language: The multi-dimensional and dynamic nature of online revisions in Chinese as a second language. System, 100.
Lykke Jakobsen, A. (1999). Logging target text production with Translog. In G. Hansen (Ed.), Probing the process in translation: Methods and results (pp. 9–20). Samfundslitteratur.
Palviainen, Å., Kalaja, P., & Mäntylä, K. (2012). Development of L2 writing: Fluency and proficiency. AFinLA-E: Soveltavan Kielitieteen Tutkimuksia, 4, 47–59.
Ramírez Maraver, R. (2021). L1 and L2 language processing in written production and perception: Null objects in English, Portuguese and Spanish (Unpublished MA thesis). Lund University.
Révész, A., & Michel, M. (2019). State of scholarship. Introduction. Studies in Second Language Acquisition, 41, 491–501.
Révész, A., Kourtali, N. E., & Mazgutova, D. (2017). Effects of task complexity on L2 writing behaviors and linguistic complexity. Language Learning, 67, 208–241.
Severinson Eklundh, K., & Kollberg, P. (1992). Translating keystroke records into a general notation for the writing process (IPLab-59). Department of Numerical Analysis and Computing Science, Royal Institute of Technology, Stockholm.
Simpson, S., & Torrance, M. (2007). EyeWrite (Version 5.1). Unpublished report, Nottingham Trent University, SR Research.
Spelman Miller, K. (2006a). Pausing, productivity and the processing of topic in online writing. In K. P. H. Sullivan & E. Lindgren (Eds.), Computer keystroke logging and writing: Methods and applications (pp. 131–156). Elsevier.
Spelman Miller, K. (2006b). The pausological study of written language production. In K. P. H. Sullivan & E. Lindgren (Eds.), Computer keystroke logging and writing: Methods and applications (pp. 11–30). Elsevier.
Spelman Miller, K., Lindgren, E., & Sullivan, K. P. H. (2008). The psycholinguistic dimension in second language writing: Opportunities for research and pedagogy using computer keystroke logging. TESOL Quarterly, 42(3), 433–454. https://www.jstor.org/stable/40264477
Spelman Miller, K., & Sullivan, K. P. H. (2006). Keystroke logging: An introduction. In K. P. H. Sullivan & E. Lindgren (Eds.), Computer keystroke logging and writing: Methods and applications (pp. 1–9). Elsevier.
Stevenson, M., Schoonen, R., & de Glopper, K. (2006). Revising in two languages: A multidimensional comparison of online writing revisions in L1 and FL. Journal of Second Language Writing, 15(3), 201–233.
Strömqvist, S., Holmqvist, K., Johansson, V., Karlsson, H., & Wengelin, Å. (2006). What keystroke logging can reveal about writing. In K. P. H. Sullivan & E. Lindgren (Eds.), Computer keystroke logging and writing: Methods and applications (pp. 45–71). Elsevier.
Strömqvist, S., & Karlsson, H. (2001). ScriptLog for Windows – User's manual. Technical report, Department of Linguistics, Lund University, and Centre for Reading Research, University College of Stavanger.
Strömqvist, S., & Malmsten, L. (1997). ScriptLog Pro 1.04 – User's manual. Department of Linguistics, University of Gothenburg.
Sullivan, K. P. H., & Lindgren, E. (Eds.). (2006). Computer keystroke logging and writing: Methods and applications. Elsevier.


Thorson, H. (2000). Using the computer to compare foreign- and native-language writing processes: A statistical and case study approach. Modern Language Journal, 84, 55–70.
Tiryakioglu, G., Peters, E., Verschaffel, L., Sullivan, K., & Lindgren, E. (2019). The effect of L2 proficiency level on composing processes of EFL learners: Data from keystroke loggings, think alouds and questionnaires. In E. Lindgren & K. Sullivan (Eds.), Observing writing: Insights from keystroke logging and handwriting (pp. 212–235). Brill.
Usoof, H. A., Leblay, C., & Caporossi, G. (2020). GenoGraphiX-Log version 2.0 user guide. Les Cahiers du GERAD.
Vandermeulen, N., Leijten, M., & Van Waes, L. (2020). Reporting writing process feedback in the classroom: Using keystroke logging data to reflect on writing processes. Journal of Writing Research, 12(1), 109–140.
Van Waes, L., Leijten, M., Lindgren, E., & Wengelin, Å. (2016). Keystroke logging in writing research: Analyzing online writing processes. In C. A. MacArthur, S. Graham, & J. Fitzgerald (Eds.), Handbook of writing research (pp. 410–426). Guilford Publications.
Van Waes, L., Leijten, M., Roeser, J., Olive, T., & Grabowski, J. (2021). Measuring and assessing typing skills in writing research. Journal of Writing Research, 13(1), 107–153.
Wengelin, Å. (2006). Examining pauses in writing: Theory, methods and empirical data. In K. P. H. Sullivan & E. Lindgren (Eds.), Computer keystroke logging and writing: Methods and applications (pp. 107–130). Elsevier.
Wengelin, Å., Frid, J., Johansson, R., & Johansson, V. (2019). Combining keystroke logging with other methods: Towards an experimental environment for writing process research. In E. Lindgren & K. Sullivan (Eds.), Observing writing: Insights from keystroke logging and handwriting (pp. 30–49). Brill.
Wengelin, Å., & Johansson, V. (2023). Investigating writing processes with keystroke logging. In O. Kruse, C. Rapp, C. M. Anson, K. Benetos, E. Cotos, A. Devitt, & A. Shibani (Eds.), Digital writing technologies: Impact on theory, research, and practice in higher education. Springer.
Xu, C. (2018). Understanding online revisions in L2 writing: A computer keystroke-log perspective. System, 78, 104–114.
Xu, C., & Ding, Y. (2014). An exploratory study of pauses in computer-assisted EFL writing. Language Learning & Technology, 18(3), 80–96. https://hdl.handle.net/10125/44385
Xu, C., & Qi, Y. (2017). Analyzing pauses in computer-assisted EFL writing: A computer keystroke-log perspective. Journal of Educational Technology and Society, 20, 24–34. https://www.jstor.org/stable/10.2307/26229202

Chapter 9

Using eye tracking to study digital writing processes

Victoria Johansson (Lund University & Kristianstad University), Roger Johansson (Lund University) & Åsa Wengelin (University of Gothenburg)

This chapter presents an overview of eye tracking combined with tools for capturing digital writing (foremost keystroke logging). This includes a general rationale for why eye tracking is relevant for research on writing processes, how the technique works, and the pros and cons of different eye trackers and methodological designs. The chapter describes previous L1 and L2 writing studies that have used eye tracking, to illustrate the types of questions that can be addressed with this technique. Finally, some methodological challenges are highlighted and best practices are suggested. It is emphasized that information about writers' visual attention can enrich real-time writing studies, but that the researcher must pose research questions and opt for designs bearing in mind the advantages and limitations of the technique.

https://doi.org/10.1075/rmal.5.09joh © 2023 John Benjamins Publishing Company

Introduction

The last four decades of research on writing processes involving keystroke logging have shed light on the multi-faceted and complex process of text production (see Lindgren & Sullivan, 2019, for a recent overview of the field, and Chapter 8 in this volume). Several of these studies include eye tracking as a way of capturing gaze behavior during writing. Studies in this tradition have mainly been carried out in L1 contexts, and have often used exploratory and experimental approaches, the former including case studies. Initially, we want to point the reader to some overviews that are relevant to the content of this chapter. These include summaries of L2 research on writing processes in Spelman Miller et al. (2008) and Godfroid et al. (2020), the recent book Eye tracking: A guide for applied linguistic research by Conklin et al. (2018), which includes a chapter on writing, and Godfroid's (2020) book Eye tracking in second language acquisition and bilingualism, which specifically targets issues of relevance for the L2 researcher. Further, two recent special issues on L2 and writing provide examples of studies in the area:


Révész and Michel's (2019) edited special issue of Studies in Second Language Acquisition, and Godfroid et al.'s (2020) guest-edited issue of Second Language Research. Since this chapter builds on research findings from studies using keystroke logging and handwriting, we recommend that Chapter 8 in this volume be read before this chapter. The main purpose of this chapter is to address the methodological issues of combining keystroke logging with eye tracking, a recording method that can be categorized as unobtrusive and observational (see Wengelin et al.'s [2019] discussion of the categorization of different methods for capturing writing processes), and which offers researchers the possibility of gathering huge amounts of data on writers' visual attention. The addition of eye tracking to the study of writing processes was theoretically motivated, typically by the claim that it would shed light on reading during writing and increase knowledge of the role reading plays in other writing processes, such as planning and revision (Andersson et al., 2006). One advantage over traditional online processing measures, such as reaction times, is that eye tracking allows for studying processing in a more natural way, since eye movements are difficult to consciously control (Conklin & Pellicer-Sánchez, 2016). Eye tracking was introduced as a method in cognitive writing research in the early 2000s, when different research groups combined eye tracking with keystroke logging (ScriptLog; Andersson et al., 2006; see also Wengelin et al., 2009, for an overview of the methodological challenges) and handwriting (Eye & Pen; Alamargot et al., 2006). Later solutions include EyeWrite (Simpson & Torrance, 2007) and combinations of Inputlog and eye tracking (Leijten & Van Waes, 2013). More recent examples are TRAKTEXT, without keystroke logging (Hacker et al., 2017), and Chukharev-Hudilainen et al.'s (2019) combination of eye tracking and web text editors with underlying keystroke logging.

Chapter 9. Using eye tracking to study digital writing processes

Introducing eye tracking technology

The motivation for using eye tracking rests on the information it provides about what a person is focusing on, together with the underlying assumption that the amount of time a person spends focusing on a certain element signals the extent of cognitive effort needed for processing that item (Conklin et al., 2018). Often, eye tracking is described as a window to the language user’s mind – the so-called eye–mind hypothesis (Just & Carpenter, 1980). The overall idea is that the direction of writers’ gazes provides an approximation of what information their visual attention is focused on. Although the locus of visual attention and eye location can be decoupled in certain activities, such as when a person is spying on someone but directs the gaze elsewhere to avoid detection (for findings from experimental situations, see Posner, 1980), the link between gaze direction and visual attention is assumed to be strong in tasks which require a high degree of information processing, such as reading (Rayner, 1998). Consequently, when a person looks at visual information, the gaze behavior should typically be a signature of the associated cognitive and/or linguistic processing.

How does eye tracking work?

Eye tracking captures and measures the direction and movement of the eyes. These movements unfold in sequences of fixations and saccades, where fixations are the moments when the eyes remain relatively still over a brief period of time, and saccades are the rapid movements that occur from one fixation point to another (Rayner, 1998). Whereas we are virtually blind during a saccade, we have the capacity to discriminate visual information in full acuity during a fixation, but only in a small part of our visual field, less than 2°. From studies on reading static, unfamiliar texts, it is known that fixations generally last for about 200–250 ms and have intervening saccades that move the eyes about 7–9 letters forward in the text. However, these measures are highly sensitive to the physical features of the text and to language-based characteristics, such as word frequency and predictability. Whereas longer fixation durations usually reflect more cognitive processing, shorter saccades can indicate reading difficulties (Engbert et al., 2002; Rayner, 1998; Rayner et al., 2012). As cognitive processes typically unfold on time scales that are too fast for us to be consciously aware of (ranging from a few to hundreds of milliseconds), eye tracking is a valuable tool when attempting to capture the underlying dynamics of this intricate interplay. Currently, the most widely used commercial eye-tracking systems in writing research are remote systems from SR Research (EyeLink), Tobii and SensoMotoric Instruments (SMI; no longer manufactured). They all build on the same basic principle for data collection: a person’s gaze direction is mapped onto x- and y-coordinates on a two-dimensional plane, often a computer screen.
To achieve spatial mapping between gaze direction and the reference frame, the eye tracker needs to be calibrated for individual factors – such as physical eye properties, glasses, or contact lenses – and for experimental settings – such as the writer’s distance to the computer screen. Commonly, the calibration is achieved by having the person look at a number of points presented within the reference frame. Spatial accuracy can then be calculated as the offset between the measured gaze locations and the actual locations (see Holmqvist et al., 2011, for examples of spatial offsets caused by different degrees of calibration success). When the eye tracker has been calibrated, it will produce gaze data corresponding to locations in space and time. In order to analyze such data properly, it eventually needs to be converted into fixations and saccades through an event detection algorithm. Some algorithms come as default with commercial eye trackers, but researchers may sometimes want to adjust the settings.
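To make the notion of an event detection algorithm concrete, the following sketch implements a simple dispersion-threshold (I-DT) approach, one common family of algorithms for classifying raw gaze samples into fixations. The sample format, the pixel dispersion threshold, and the minimum duration are illustrative assumptions, not the defaults of any commercial system.

```python
def detect_fixations(samples, max_dispersion=30.0, min_duration=100.0):
    """Classify raw gaze samples into fixations with a basic
    dispersion-threshold (I-DT) approach.

    samples: chronologically ordered (t_ms, x, y) gaze samples.
    max_dispersion: maximum (x-range + y-range), in pixels, that a
        group of samples may span and still count as one fixation.
    min_duration: minimum fixation duration in milliseconds.
    Samples not absorbed into a fixation belong to saccades (or noise).
    """
    fixations = []
    start = 0
    while start < len(samples):
        end = start + 1
        # Grow the window sample by sample while dispersion stays low.
        while end <= len(samples):
            xs = [s[1] for s in samples[start:end]]
            ys = [s[2] for s in samples[start:end]]
            if (max(xs) - min(xs)) + (max(ys) - min(ys)) > max_dispersion:
                break
            end += 1
        window = samples[start:end - 1]
        if len(window) >= 2 and window[-1][0] - window[0][0] >= min_duration:
            fixations.append({
                "onset": window[0][0],
                "offset": window[-1][0],
                "x": sum(s[1] for s in window) / len(window),
                "y": sum(s[2] for s in window) / len(window),
            })
            start = end - 1   # continue from the first non-fitting sample
        else:
            start += 1        # too short to be a fixation: slide one sample
    return fixations

# Synthetic demo: a stable gaze, a rapid shift, then a second stable gaze.
demo = ([(i * 10, 100.0, 100.0) for i in range(20)]                  # ~0-190 ms
        + [(200 + i * 10, 150.0 + 60 * i, 100.0) for i in range(5)]  # saccade
        + [(250 + i * 10, 400.0, 100.0) for i in range(20)])         # ~250-440 ms
fixs = detect_fixations(demo)   # two fixations detected
```

Adjusting the two thresholds is exactly the kind of settings change referred to above: a larger dispersion threshold or shorter minimum duration will merge or admit more events, which directly affects all downstream fixation measures.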

Eye tracking and writing research

Writing research can benefit from eye-tracking studies of the mechanisms underlying reading, which show that this process varies in accordance with different text characteristics and reader purposes (Engbert et al., 2002; Rayner, 1998, 2012). Some assumptions can further be made regarding gaze behavior and linguistic processing during writing: we can assume that when a familiar word is fixated, the reader has access to the word’s morphology, syntax, and associated semantic information. If the reader fixates another word, the information associated with that word becomes available instead (cf. Wengelin et al., 2009). However, one important difference between reading in general and reading concurrently with one’s own writing is that writers know their own texts, and that such reading therefore rarely occurs for the purpose of information acquisition.

When eye tracking was first introduced in writing research, one challenge was to accurately align and synchronize the time feed of the gaze behavior with the time feed of text production. Some of the first attempts to combine writing and eye tracking solved this issue by developing integrated systems which allowed for automatic display and analyses: Eye & Pen, for handwriting (Alamargot et al., 2006); ScriptLog and EyeWrite, for keystroke logging (Wengelin et al., 2009; Simpson & Torrance, 2007); and, more recently, a web editor solution, CyWrite (Chukharev-Hudilainen et al., 2019). These attempts have proven useful in several studies (see below). Such seamlessly integrated systems further make it possible to use (semi-)automatic analyses and in that way explore the vast amount of data produced during an eye-tracking session. An illustration of one such integrated environment, from ScriptLog, is found in Figure 1, which shows a screenshot from a session in which an adult is writing an expository text in Swedish on the topic of cheating.
The example comes from a revision phase, where the gaze (the small, green circle) is located on the first line, far from the inscription point on the last line. The right part of the window displays different analysis options; of special interest is the information in the “source field”, where the software has automatically identified which word the writer is inspecting (in this case a misspelled version of the word vanligare – ‘more common’). This integration offers output in which the time feed from writing is synchronized with gaze behavior, which allows for further analyses of fixations on different parts of the text. Another example of an integrated system is presented in Chukharev-Hudilainen et al.’s (2019) description of a novel combination of a web text editor with underlying keystroke logging, which can be paired with any eye tracker. This solution, called CyWrite, builds on EyeWrite’s idea of time-aligning logs of keystrokes, eye fixations and changes to the text moment by moment. Just like the integrated ScriptLog solution, it offers automated analyses of eye and key data and, just like Inputlog, it renders output of the writing session, such as process graphs.

Figure 1. Screenshot from ScriptLog
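The “source field” lookup described above boils down to a spatial query: given the bounding boxes of the words as currently rendered on screen, find the box that contains a fixation. A minimal sketch follows; the layout format, the coordinates, and the slack margin are hypothetical, and a real system would also have to update the boxes continuously as the text changes.

```python
def word_at_fixation(fix_x, fix_y, layout, slack=5.0):
    """Return the word whose on-screen bounding box contains the
    fixation, or None if the fixation falls outside every box.

    layout: (word, left, top, right, bottom) boxes for the text as
        currently rendered on screen.
    slack: margin in pixels to absorb small calibration offsets.
    """
    for word, left, top, right, bottom in layout:
        if (left - slack <= fix_x <= right + slack
                and top - slack <= fix_y <= bottom + slack):
            return word
    return None

# Hypothetical layout of one rendered line containing three words.
layout = [
    ("Fusk", 10, 20, 50, 40),
    ("blir", 60, 20, 95, 40),
    ("vanligare", 105, 20, 190, 40),
]
inspected = word_at_fixation(130.0, 30.0, layout)   # gaze on the third word
```

The slack parameter illustrates why font size and line spacing matter: the smaller and denser the boxes, the sooner a calibration offset of a few pixels makes the lookup return the wrong word, or none at all.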

Research questions addressed in eye-tracking writing studies

In general, the combination of eye tracking and digital writing processes invites research questions about visual attention during writing and, more specifically, questions regarding reading during writing.1 The research questions in this field have been motivated both by technological advancements and by theoretical underpinnings. For instance, new and more affordable types of eye trackers have expanded the possibilities for using this technology (see Chukharev-Hudilainen et al., 2019), and new ways to temporally align eye-tracking data with data from writing processes have opened up new avenues for analysis. Exploratory studies with eye tracking have provided an increased understanding of how the process of reading interacts with other writing processes, while experimental studies have asked more specific questions about where and when specific linguistic elements are fixated by writers.

1. Note that in existing studies, ‘reading’ during writing is mostly defined using a definition from reading research (Rayner, 1998). Also, note that the description of gaze behavior during writing may even encompass periods when writers do not look at the text at all.


Theoretical frameworks as a source for research questions

Research questions in studies of writing processes that use eye tracking characteristically relate to the role of reading and visual input described in over-arching models of writing, especially those built on Hayes and Flower’s (1980) L1 writing model, which identifies three main writing processes: planning, translating/transcription and reviewing/revising. The model states that reading constitutes an essential part of the reviewing process, especially important for evaluating “what has been written so far”, and that reading is a prerequisite for revising. Reading activities may further serve various purposes during different parts of text production. Important sub-processes of the model include writers’ goal setting (see Hayes, 1996) and writers’ use of external sources (see Leijten et al., 2014). An additional significant framework concerns the role of working memory and how limitations in working memory capacity influence the writing processes (see Kellogg, 1996). Importantly, these models build on empirical data from studying writing in real time, but without the use of eye-tracking techniques. Studies of writing processes in L2 have to a great extent made use of the same theoretical models, as well as of others that build on and modify them (see Galbraith & Vedder, 2019). As pointed out by Révész and Michel (2019), the study of L2 processes, including reading, is relevant for building theories of the cognitive activities underlying the behaviors of L2 writers.

Research questions addressing reading during writing

Given the prominent role of reading processes in the models of writing, many writing studies involving eye tracking have focused on exploring how reading activities are distributed throughout the writing session, and how reading is related to other processes, such as revision and planning. Summarized below are some questions that have been addressed in previous research, with references to key studies. Some L1 writing studies have compared the reading of unfamiliar texts written by someone else with the reading of one’s own emerging text by analyzing whether the purpose of each reading type (that is, reading for gaining new insights from an unfamiliar text versus reading for error correction in one’s own text) is reflected in different reading patterns (e.g., Torrance et al., 2016). Similarly, studies in L2 writing have explored how cognitive activities vary during text composition, showing that transcription/translating and revision dominated at the beginning of the writing session, while (re-)reading was more typical toward the end of it (Gánem-Gutiérrez & Gilmore, 2018).


L2 writing studies have also addressed questions on how writers divide their attention during writing. In this respect, it must be noted that previous research on writing processes without eye tracking (see, for example, keystroke logging in Barkaoui, 2019; or think-aloud in Manchón et al., 2000) has served the purpose of being hypothesis-generating for studies using this technique. As a result, more recent studies have been able to employ eye tracking for similar, process-oriented purposes, for instance, asking questions about what writers look at prior to revisions (Révész et al., 2017, 2019) or about how more independent versus more controlled writing tasks may affect writers’ division of attention (Michel et al., 2020).

The relationship between the process of reading the existing text – the text-written-so-far – and the way writers revise their texts has also been an area of interest in L1 writing research using eye tracking. In this regard, a study conducted by Johansson et al. (2010) is worth noting. The study looked at how visual access to the monitor affected reading and revision during the writing process and identified differences between writers who mostly looked at the screen during text production (monitor-gazers) and those who mostly looked at the keyboard (keyboard-gazers). The results indicated that the former group, as compared to the latter, engaged in more reading and revision, and wrote texts with different linguistic properties. In addition, the combination of eye tracking and keystroke logging has shown that writers’ visual inspection of their emerging texts depends on the types of errors that may be found in them, which in turn leads to different acts of revision (see Van Waes et al., 2010). Revision and revision proficiency have further been associated with spelling: eye-tracking studies have shown that spelling proficiency correlates with the amount of reading during text writing (Wengelin et al., 2014).
Fluency during writing, that is, being able to produce text without unnecessary interruptions, is usually identified as an important feature of writing proficiency and has also been studied with eye tracking. In L2 writing, Chukharev-Hudilainen et al. (2019) compared the writing fluency (as measured by pausing and revision) of L1 and L2 writers in Turkish and English, respectively. The results showed a general increase in pausing and more looking back during L2 writing, but also that longer pauses occurred at clause boundaries or in connection with revisions in L1 writing, while they were more frequent mid-sentence in the L2 condition. Studies such as this one show that the examination of processing differences between L1 and L2 writing can help researchers not only to understand what L2 learners are struggling with but also to contribute more knowledge on the general processes of writing. The importance of fluency during L1 writing has also been examined with eye tracking by means of a strictly controlled experiment carried out using a text box that only allowed space for writing one sentence at a time in isolation. The study, conducted by de Smet et al. (2018), used this setup to ask questions about the cognitive demands experienced by writers with different levels of writing fluency.

Picture-elicited tasks are a data collection method generally used to create semi-controlled environments in which the comparability of free-writing processes is enhanced. These tasks have also been used in the context of eye tracking to explore how visual attention shifts between picture sources and the text during writing; they allowed Holmqvist et al. (2004), for instance, to ask questions about how the writer used the content of a picture to create connections between the protagonist and his friends in a written narrative. In another study, Drijbooms et al. (2019) also used pictures during a pre-writing task and asked questions about how the linguistic outcome in the final texts was influenced by the variation in the amount of attention paid by writers to picture consultation prior to writing the text.

Other areas that have attracted research using eye tracking include text quality and writers’ ability to adapt their texts to the needs of future readers. The first issue was addressed, for example, in a study intended to analyze how text quality was influenced by the type of reading processes (close to the inscription point or further away) that writers engaged in during writing (Beers et al., 2010); and the second, by looking at how writers’ working memory capacity influenced their ability to adapt their texts to a future reader (Alamargot et al., 2011).

Tasks involving writing from sources are yet another area where eye tracking can be a useful tool. L2 researchers’ special interest in writing from sources is based on the fact that this practice is common in many school- and work-related activities, such as, for instance, accessing and processing a source text when translating from one language to another (Alves et al., 2010), or using external sources, such as dictionaries, grammars or source texts, in many authentic writing tasks (see Leijten et al., 2019, for an example of source use in L1 and L2 writing). Eye tracking can be used to detect the contexts and frequency of such operations, as is the case with one study in which L2 writers with higher proficiency were found to use more external sources and to revise more often than their less proficient counterparts (Gánem-Gutiérrez & Gilmore, 2018).

Although many eye-tracking studies in writing have targeted overall writing processes in relatively free writing tasks, some experiments have also used gaze behavior to explore the reasons for increased processing time during the production of grammatical constructions. Nottbusch (2010) and Torrance and Nottbusch (2012) investigated how visual attention changed during L1 writers’ construction of sentences with different grammatical complexity and found that increased visual attention was a clear indication that longer processing time was needed.
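Pause-based fluency measures of the kind used in the studies discussed above (pause duration and pause location) can be derived directly from a keystroke log. The sketch below is a simplified illustration: the log format is an assumption, the 2-second cut-off is one commonly used threshold rather than a fixed standard, and classifying pause location from the preceding character ignores complications such as revisions and cursor movements.

```python
SENTENCE_FINAL = {".", "!", "?"}

def find_pauses(keylog, threshold_ms=2000):
    """Identify pauses in a keystroke log.

    keylog: chronologically ordered (t_ms, char) keypress events.
    A pause is an inter-keystroke interval of at least threshold_ms;
    its location is classified from the character typed just before
    the pause (a crude proxy for linguistic boundaries).
    """
    pauses = []
    for (t0, prev_ch), (t1, _) in zip(keylog, keylog[1:]):
        interval = t1 - t0
        if interval >= threshold_ms:
            if prev_ch in SENTENCE_FINAL:
                location = "between-sentences"
            elif prev_ch == " ":
                location = "between-words"
            else:
                location = "within-word"
            pauses.append({"onset": t0, "duration": interval,
                           "location": location})
    return pauses

# Toy log: a pause after a sentence, then a pause inside a word.
keylog = [(0, "H"), (150, "i"), (300, "."), (2500, " "),
          (2600, "W"), (5000, "e")]
pauses = find_pauses(keylog)
```

Counting pauses per location category is then enough to reproduce contrasts such as clause-boundary versus mid-sentence pausing discussed above, at least in this simplified form.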


The study of visual attention has many pedagogical applications, since eye tracking can provide useful information on a variety of instructional issues to both teachers and students. One relevant area is the reception of corrective feedback, a process that typically requires reading the corrections (which may come from teachers and peers, as well as from spell-checkers and grammar software) and often results in subsequent writing. Based on these assumptions, one study investigated how students read their original written texts after having received written feedback on them (Shintani & Ellis, 2013), and another study explored how the immediate reception of corrective feedback in a chat led L2 students to attend to the errors in their texts (Smith, 2012). One general conclusion from these studies is that eye tracking helped researchers determine if and how learners attended to the corrections in their texts and used them when revising. It follows that eye tracking can be a useful technique for the development of feedback practices and for understanding how students attend to teachers’ corrective feedback and revise (or not) their texts accordingly.

To date, L2 writing studies that include eye tracking as a data collection procedure have often used a mixed-method approach, combining either screen capturing, digital video recording and eye tracking (Gánem-Gutiérrez & Gilmore, 2018) or stimulated recall, keystroke logging and eye tracking (Michel et al., 2020; Révész et al., 2017, 2019). This mixed approach has proven effective in identifying general patterns of how visual attention is used by L2 writers concurrently with other activities. As such, these studies shed light on the complexity of the writer’s task, serve as input for theory building in the field, and open up avenues for further research by formulating hypotheses to be tested in more controlled contexts.

Methodological challenges

Challenge 1: Choosing an eye tracker for your purpose

The challenges of using eye tracking in L2 research (beyond writing studies) have recently been highlighted by Godfroid and Hui (2020). One of the pitfalls described is the tendency to overlook the limitations of the technical properties of the eye tracker, especially its spatial accuracy and precision. Sufficiently high spatial resolution is necessary to accurately map what a writer is looking at within a spatial reference frame, such as, for instance, a particular word or even a morpheme in a specific location. The spatial accuracy of gaze data depends on the size of the areas of interest on the screen and, because of this, font size and line spacing are critical aspects to consider when mapping gaze data onto the emerging text. If too small a font size and/or too narrow a line spacing is used, then the spatial mapping between what the writer is in fact looking at and what the eye tracker indicates that the writer is looking at might not be reliable.

Appropriate temporal resolution is further important for determining with precision when something was looked at and for how long. State-of-the-art eye-tracking techniques offer sampling frequencies of around 100–1000 Hz; that is, they can capture temporal processes that unfold within tens of milliseconds or faster.

Different types of eye trackers have different pros and cons for the study of text production. The field of reading research (of static, unfamiliar texts) often uses high-speed eye trackers in a setup where a chin and/or forehead rest is used. This type of setup offers high spatial accuracy and precision as well as high temporal resolution in the gaze data (see Holmqvist et al., 2012). However, as it is not possible to move the head and body during data collection with these systems, they are not well suited for writing research. More suitable alternatives are remote systems, which film the eyes from a distance – typically from a camera below the computer display – allowing writers to move their head and body to some extent without sacrificing too much spatial precision and accuracy in the gaze data (see Holmqvist et al., 2012). Since not all remote systems on the market offer sufficient spatial and temporal resolution for writing research, it is important to carefully evaluate the affordances and limitations of specific eye-tracking systems. Today, web cameras can also be used as eye trackers, but a challenge in these studies is that the spatial precision, spatial accuracy, and temporal resolution of the gaze data are not currently on par with those of a dedicated eye-tracking system. Webcam-based systems would not typically allow the researcher, for instance, to analyze information with word-level or even sentence-level accuracy (Semmelmann & Weigelt, 2018).
Additionally, mobile head-mounted systems, such as eye-tracking glasses, allow for full mobility of the head and body (see, for example, Hacker et al., 2017). These systems record a scene video, filming in the line of the writer’s sight, onto which the gaze data is later “superimposed” (Holmqvist et al., 2012). The reference frame is therefore in constant movement, and gaze data cannot automatically be associated with information within a frame, such as a particular word in a specific location. In effect, data analyses require a lot of manual coding and, as a result, such systems are not suitable for studies requiring high spatial and temporal resolution. However, there are some head-mounted systems combined with magnetic head-tracking that allow for better-defined reference frames. Their data analyses are comparable to those of static/remote eye trackers, but they typically have poorer spatial accuracy (Johansson et al., 2010; Torrance et al., 2016, experiment 1).


Challenge 2: Compensating for the dynamic writer and the dynamic text

An important challenge with writing concerns the fact that the emerging text is dynamic, which among other things means that text properties cannot be decided on before data collection. Typically, traditional reading experiments carefully control for dimensions such as the number of words, word frequency, syntactic complexity, or text layout, but the “unexpected” nature of text writing, especially in more open writing tasks, does not allow for such control prior to task completion. Further challenges result from the dynamically evolving text, which will often attract writers’ visual attention in the vicinity of the inscription point. When looking at such moving information, writers’ eye movements can no longer be characterized as fixations and saccades, but rather as smooth pursuit, that is, a particular type of eye movement that only occurs when the eyes follow a moving object (in this case, the inscription point). Smooth pursuit, however, cannot be detected by current event detection algorithms (Larsson et al., 2015). The algorithms will assume that the writer is looking at static information, and these dynamic gaze data will be erroneously identified as saccades or fixations (typically as very short saccades and very long fixations). As a result, the functional meaning of smooth pursuit will be interpreted according to the assumptions traditionally assigned to fixations, that is, that longer durations represent more cognitive processing. However, if the researcher is only interested in where writers look at a particular point in time, and not in fixations or fixation durations, the data can still be informative.

As writing experiments commonly involve situations where the text exceeds the space of a computer screen, extensive scrolling up and down in the text will occur.
This can be difficult to accommodate in the eye-tracking setup, and such behavior will often affect spatial precision in the data, with consequences for reliability, particularly for accuracy in the vertical dimension. A challenging situation also occurs when writers move their visual attention extensively between the keyboard and the monitor during typing (see Johansson et al., 2010). If a remote eye tracker is used, only gazes on the computer screen will be tracked, and it is therefore critical that the eye tracker quickly “re-captures” the gaze data when the writer has been looking outside the screen. If not, this will have a major effect on spatial precision, resulting in spurious data. Different systems may have different “latencies” in properly re-capturing the gaze signal, and the data may therefore not always be reliable around those points in time.
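One way to use such data despite these limitations, in line with the point above that where writers look can be informative even when fixation measures are not, is to classify each synchronized gaze sample relative to the current inscription point, treating lost samples as gaze off screen. The sketch below is illustrative only: the data format, the 50-pixel "near" radius, and the assumption that the two streams share timestamps are all simplifications.

```python
import math

def gaze_location_profile(gaze, inscription, near_px=50.0):
    """Classify synchronized gaze samples relative to the inscription
    point and return the proportion of samples in each category.

    gaze: (t_ms, x, y) samples; x/y of None mark lost samples, here
        treated as gaze off screen (e.g., on the keyboard).
    inscription: (t_ms, x, y) positions of the inscription point,
        sampled at the same timestamps as the gaze data.
    """
    counts = {"near-inscription": 0, "elsewhere-on-screen": 0,
              "off-screen": 0}
    for (_, gx, gy), (_, ix, iy) in zip(gaze, inscription):
        if gx is None or gy is None:
            counts["off-screen"] += 1
        elif math.hypot(gx - ix, gy - iy) <= near_px:
            counts["near-inscription"] += 1
        else:
            counts["elsewhere-on-screen"] += 1
    total = len(gaze)
    return {label: n / total for label, n in counts.items()}

# Toy data: two samples at the inscription point, one far away,
# one lost sample while the writer looks at the keyboard.
gaze = [(0, 100.0, 100.0), (10, 105.0, 102.0),
        (20, 400.0, 100.0), (30, None, None)]
caret = [(0, 100.0, 100.0), (10, 100.0, 100.0),
         (20, 100.0, 100.0), (30, 100.0, 100.0)]
profile = gaze_location_profile(gaze, caret)
```

A high off-screen proportion would, under these assumptions, correspond to the keyboard-gazing behavior described by Johansson et al. (2010).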

Challenge 3: Choosing a method for capturing writing processes

Another issue is the researcher’s choice of tool for capturing writing processes, and how easy it is to time-align the output of this tool with the timestamps of the eye tracker. In particular, this concerns the possibilities for automatic analyses. Studies interested in knowing why a writer makes certain choices during writing will often use self-report methods such as think-aloud, retrospective protocols, or stimulated recall, while studies mainly interested in what process the writer engages in (and perhaps when and for how long) may choose observational or comparably unobtrusive methods like screen capturing or external video recording (see examples in Matsuhashi, 1981, and Chapter 7 in this volume). Other options include keystroke logging, which may either offer experimentally controlled environments for recording writing that differ substantially from the writers’ normal writing software (ScriptLog, Eye & Pen) or operate behind Microsoft Word and thus allow for enhanced ecological validity (Inputlog). The choice of method for registering the writing process will require different amounts of manual work for aligning the temporal aspects of writing, processing the gazes, and perhaps also synchronizing the self-reported verbal data.
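At its core, time-aligning the two recordings means bringing both event streams onto a common clock and merging them chronologically. A minimal sketch, assuming both logs are already sorted by time and that the clock difference can be reduced to a constant offset (estimated, for example, from a shared synchronization event):

```python
import heapq

def align_streams(key_events, eye_events, eye_clock_offset_ms=0.0):
    """Merge keystroke and gaze events into one chronological stream
    expressed on the keystroke logger's clock.

    key_events: sorted (t_ms, payload) events on the logger's clock.
    eye_events: sorted (t_ms, payload) events on the tracker's clock.
    eye_clock_offset_ms: tracker clock minus logger clock.
    """
    keys = ((t, "key", p) for t, p in key_events)
    eyes = ((t - eye_clock_offset_ms, "eye", p) for t, p in eye_events)
    # heapq.merge interleaves the two pre-sorted streams by timestamp.
    return list(heapq.merge(keys, eyes))

# Toy data: the tracker's clock runs 1000 ms ahead of the logger's.
keys = [(0, "H"), (120, "e")]
eyes = [(1010, "fixation at word 1"), (1150, "fixation at word 2")]
merged = align_streams(keys, eyes, eye_clock_offset_ms=1000.0)
```

Integrated systems such as ScriptLog or CyWrite perform this alignment internally; with separate tools, drift between the two clocks over a long session may additionally require more than a constant offset.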

Challenge 4: Asking questions and drawing conclusions from eye-tracking studies

Another concern mentioned by Godfroid and Hui (2020) is the danger of using eye tracking as a purely observational tool, only aiming to answer the question “what does the writer look at?”. While such descriptive studies may serve the purpose of being hypothesis-generating, they will not answer questions regarding what causes the writer to look at certain elements. If that is the purpose of the research, it must specifically be considered in the design. From an L2 perspective, this may mean controlling for both exogenous factors (such as morphology and syntax) and endogenous factors (such as task and expertise), since it is well known from the general eye-movement literature that these factors influence gaze behavior (Rayner et al., 2012). Godfroid and Hui (2020) further describe the problem of not defining and justifying the use of eye tracking, that is, not motivating what eye-tracking data contributes to the study or how the collected data should be interpreted. Related to this is the danger of overinterpreting specific measures, such as, for instance, interpreting an absence of gazes on certain items as a lack of attention (see also Godfroid & Spino, 2015). An example is “skipped words” in text reading. These are most likely processed covertly during fixations on the previous or following word – otherwise, the text would not be fully comprehended. In writing, we can be even more certain that writers have given attention to every word they write, even if they do not explicitly look at it; this would apply to keyboard-gazers, as described by Johansson et al. (2010).


Best practices in using eye tracking in writing studies

What, then, is best practice for studying digital writing processes with eye tracking? We will relate the advice on best practices below to the previously outlined challenges. The choice of eye tracker is an important starting point, especially choosing an eye tracker that allows for sufficient spatial and temporal accuracy. As commercial eye trackers rarely output validation measures of spatial accuracy, researchers are recommended to carefully check how well the registered gaze data maps onto actual locations on the computer screen (for techniques to do this, see Holmqvist et al., 2012). As described above, researchers must also consider that dynamic gaze data is difficult to interpret, and that event detection algorithms can be very useful for this interpretative purpose. While most commercial eye trackers offer such algorithms, default settings may rely on assumptions and thresholds that are not always appropriate for a particular experimental situation (for techniques on how to adjust your settings for different situations, see Godfroid & Hui, 2020; Holmqvist et al., 2011). Related to this issue is the recommendation from Godfroid and Hui (2020) to evaluate raw data, rather than blindly rely on the processed gaze data values provided by the software. In studies interested in the visual processing of particular words, or even parts of words (such as morphemes), high resolution and/or the use of a larger font during writing may be imperative for reliability purposes (see, for instance, the choice of an increased font size in Révész et al., 2019). A solution is to allow for enough space around words and lines to provide accurate word-level measurement, while being aware that this decision may in turn reduce ecological validity.
As a consequence, researchers should use a research design that is adjusted to the limitations of their eye tracker and also be aware of the margins of error that play a role in interpreting the data. The advice on best practices also involves recommendations for comparing data across studies. Considering that eye trackers differ in their spatial accuracy and precision (and in their sampling frequency), and that the software systems used for recording and analyzing gaze differ in how they identify fixations and saccades (and validate accuracy and precision), it will always be very difficult to directly compare data across different studies and eye-tracking setups. As in most experimental research, the advice here is to evaluate identified differences across experimental conditions and/or participant groups within a study and compare those relative differences with findings in other studies, instead of attempting to compare absolute data values, such as values of fixation duration, across studies. See Holmqvist et al. (2011) for an in-depth discussion of these issues.
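The validation check recommended above can be approximated in a few lines: accuracy as the mean offset between gaze samples and a known validation target, and precision as the sample-to-sample stability of the signal. This sketch works in pixels; converting to degrees of visual angle would additionally require the screen geometry and viewing distance, which are not modeled here.

```python
import math

def spatial_accuracy(target, samples):
    """Mean Euclidean offset (in pixels) between a known validation
    target and the gaze samples recorded while it was looked at."""
    dists = [math.hypot(x - target[0], y - target[1]) for x, y in samples]
    return sum(dists) / len(dists)

def spatial_precision(samples):
    """Root-mean-square of successive sample-to-sample distances (in
    pixels): how stable the signal is, independent of its offset from
    the target."""
    sq = [(x1 - x0) ** 2 + (y1 - y0) ** 2
          for (x0, y0), (x1, y1) in zip(samples, samples[1:])]
    return math.sqrt(sum(sq) / len(sq))

# A perfectly stable signal that is systematically 5 px off target:
# high precision, but limited accuracy.
samples = [(103.0, 104.0)] * 3
offset_px = spatial_accuracy((100.0, 100.0), samples)   # 5.0
```

Reporting both values per participant makes it possible to judge whether word-level analyses are defensible for a given recording, in the spirit of the raw-data checks advocated by Godfroid and Hui (2020).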


Victoria Johansson, Roger Johansson & Åsa Wengelin

Traditional reading studies with eye tracking typically control for textual features, such as word frequency or syntactic complexity, prior to the experiment. In writing studies based on free writing tasks, such text preparation is impossible. Instead, comparisons of the written texts must be carried out later, in the analyses, and some variation between participants is therefore expected. Ways of addressing this issue are exemplified in the semi-controlled experiments on sentence tasks conducted by Nottbusch (2010) and De Smet et al. (2018). Variation in reading and writing proficiency, including the automatization of transcription skills, can also influence the outcome. One proposal for controlling the latter is to include a copy task that establishes participants' typing skills and measures their motor skills independently of factors such as lexical and linguistic knowledge (see Van Waes et al., 2021). Further, researchers should carefully choose their tool for registering the writing processes. Logging software specially designed for capturing and analyzing real-time writing processes has the advantage of offering shortcuts for the otherwise time-consuming scrutiny of writing. This includes analyses of pause location and pause duration (as a measure of cognitive load) and of revision (deletions, additions, local or global approaches), the possibility of replaying the writing sessions, and various types of statistical and graphic output. Since such software has already been tailored for capturing the dynamic temporal activity of writing, adding eye tracking merely adds one layer to the analysis and, accordingly, enriches the other types of data.
Godfroid and Hui (2020) advocate avoiding self-reported measures and propose other options instead, such as controlled experimental designs that can test a priori ideas about what may influence visual behavior (examples above include Torrance & Nottbusch, 2012). When it is impossible to manipulate the variable of interest during data collection to experimentally test causal connections, Godfroid and Hui (2020) suggest an ex post facto design, that is, combining experimental research with description. According to them, this approach may consist of relating participants' test scores (of working memory, linguistic proficiency, or other background factors and experiences) to their gaze behavior during reading (or writing) (see Alamargot et al. [2011] and Gánem-Gutiérrez and Gilmore [2018] for examples).

Future avenues

This chapter has described how research on digital writing processes has used eye tracking to better understand how visual attention, and especially reading, interacts with overall writing processes such as translation/transcription, planning, and

Chapter 9. Using eye tracking to study digital writing processes

reviewing/revision. The study of gaze behavior as an addition to the analysis of writing processes has yielded insights into how transcription skills influence the amount of time spent inspecting the text, and into how writers with different backgrounds (age, writing proficiency, language proficiency) look at their emerging texts differently and allocate different amounts of pause time to specific linguistic contexts. The chapter has further outlined methodological challenges concerning the choice of hardware and software, and the formulation of research questions that the methodological setup will be able to answer. It has also shown that the data output is often rich and time-consuming to interpret, and consequently that studies limiting their focus to well-defined (experimental) tasks will probably be easier to conduct. Equally, theoretical starting points that lead to a careful choice of research questions and corresponding methods have the potential to be rewarding. Future studies could benefit from triangulating methods differently to ask new questions, from conducting more studies comparing L1 and L2 writing in a variety of contexts, and, not least, from exploring (L2) writing processes and text production in orthographies other than the Latin alphabet. An example could be the study of revising in a non-alphabetic orthography such as, for instance, Chinese as a second language. Finally, more studies in pedagogical settings could inform better educational practices in a variety of ways.

References

Alamargot, D., Caporossi, G., Chesnet, D., & Ros, C. (2011). What makes a skilled writer? Working memory and audience awareness during text composition. Learning and Individual Differences, 21, 505–516.
Alamargot, D., Chesnet, D., Dansac, C., & Ros, C. (2006). Eye and pen: A new device for studying reading during writing. Behavior Research Methods, Instruments, & Computers, 38(2), 287–299.
Alves, F., Pagano, A., & Da Silva, I. (2010). A new window on translators' cognitive activity: Methodological issues in the combined use of eye tracking, key logging and retrospective protocols. In I. Mees, F. Alves, & S. Göpferich (Eds.), Methodology, technology and innovation in translation process research (pp. 267–292). Samfundslitteratur.
Andersson, B., Dahl, J., Holmqvist, K., Holsanova, J., Johansson, V., Karlsson, H., Strömqvist, S., Tufvesson, S., & Wengelin, Å. (2006). Combining keystroke logging with eye tracking. In L. Van Waes, M. Leijten, & C. Neuwirth (Eds.), Writing and digital media (pp. 166–172). Elsevier.
Barkaoui, K. (2019). What can L2 writers' pausing behavior tell us about their L2 writing processes? Studies in Second Language Acquisition, 41, 529–554.
Beers, S. F., Quinlan, T., & Harbaugh, A. G. (2010). Adolescent students' reading during writing behaviors and relationships with text quality: An eyetracking study. Reading and Writing, 23, 743–775.
Chukharev-Hudilainen, E., Saricaoglu, A., Torrance, M., & Feng, H. (2019). Combined deployable keystroke logging and eyetracking for investigating L2 writing fluency. Studies in Second Language Acquisition, 41(3), 583–604.
Conklin, K., & Pellicer-Sánchez, A. (2016). Using eye-tracking in applied linguistics and second language research. Second Language Research, 32(3), 453–467.
Conklin, K., Pellicer-Sánchez, A., & Carrol, G. (2018). Eye tracking: A guide for applied linguistics research. Cambridge University Press.
De Smet, M. J. R., Leijten, M., & Van Waes, L. (2018). Exploring the process of reading during writing using eye tracking and keystroke logging. Written Communication, 35(4), 411–447.
Drijbooms, E., Groen, M. A., Alamargot, D., & Verhoeven, L. (2019). Online management of text production from pictures: A comparison between fifth graders and undergraduate students. Psychological Research, 84, 2311–2324.
Engbert, R., Longtin, A., & Kliegl, R. (2002). A dynamical model of saccade generation in reading based on spatially distributed lexical processing. Vision Research, 42(5), 621–636.
Galbraith, D., & Vedder, I. (2019). Methodological advances in investigating L2 writing processes. Studies in Second Language Acquisition, 41, 633–645.
Gánem-Gutiérrez, G. A., & Gilmore, A. (2018). Tracking the real-time evolution of a writing event: Second language writers at different proficiency levels. Language Learning, 68, 469–506.
Godfroid, A. (2020). Eye tracking in second language acquisition and bilingualism. Routledge.
Godfroid, A., & Hui, B. (2020). Five common pitfalls in eye tracking research. Second Language Research, 36(3), 277–305.
Godfroid, A., & Spino, L. A. (2015). Reconceptualizing reactivity of think-alouds and eye tracking: Absence of evidence is not evidence of absence. Language Learning, 65(4), 896–928.
Godfroid, A., Winke, P., & Conklin, K. (2020). Exploring the depths of second language processing with eye tracking: An introduction. Second Language Research, 36(3), 243–255.
Hacker, D. J., Keener, M. C., & Kircher, J. C. (2017). TRAKTEXT: Investigating writing processes using eye-tracking technology. Methodological Innovations, 10(2), 1–18.
Hayes, J. R. (1996). A new framework for understanding cognition and affect in writing. In C. M. Levy & S. E. Ransdell (Eds.), The science of writing: Theories, methods, individual differences, and applications (pp. 1–27). Lawrence Erlbaum Associates.
Hayes, J. R., & Flower, L. (1980). Identifying the organisation of the writing process. In L. W. Gregg & E. R. Steinberg (Eds.), Cognitive processes in writing (pp. 3–30). Lawrence Erlbaum Associates.
Holmqvist, K., Holsanova, J., Johansson, V., & Strömqvist, S. (2004). Perceiving and producing the Frog Story. In D. Ravid & H. Bat-Zeev Shyldkrot (Eds.), Perspectives on language and language development: Essays in honor of Ruth A. Berman (pp. 293–306). Kluwer.
Holmqvist, K., Nyström, M., Andersson, R., Dewhurst, R., Jarodzka, H., & Van de Weijer, J. (2011). Eye tracking: A comprehensive guide to methods and measures. Oxford University Press.
Holmqvist, K., Nyström, M., & Mulvey, F. (2012). Eye tracker data quality: What it is and how to measure it. Proceedings of the Symposium on Eye Tracking Research and Applications (pp. 45–52).
Johansson, R., Wengelin, Å., Johansson, V., & Holmqvist, K. (2010). Looking at the keyboard or the monitor: Relationship with text production processes. Reading and Writing, 23(7), 835–851.
Just, M. A., & Carpenter, P. A. (1980). A theory of reading: From eye fixations to comprehension. Psychological Review, 87, 329–354.
Kellogg, R. T. (1996). A model of working memory in writing. In C. M. Levy & S. E. Ransdell (Eds.), The science of writing (pp. 57–71). Lawrence Erlbaum Associates.
Larsson, L., Nyström, M., Andersson, R., & Stridh, M. (2015). Detection of fixations and smooth pursuit movements in high-speed eye-tracking data. Biomedical Signal Processing and Control, 18, 145–152.
Leijten, M., & Van Waes, L. (2013). Keystroke logging in writing research: Using Inputlog to analyze and visualize writing processes. Written Communication, 30, 358–392.
Leijten, M., Van Waes, L., Schrijver, I., Bernolet, S., & Vangehuchten, L. (2019). Mapping master's students' use of external sources in source-based writing in L1 and L2. Studies in Second Language Acquisition, 41, 555–582.
Leijten, M., Van Waes, L., Schriver, K., & Hayes, J. R. (2014). Writing in the workplace: Constructing documents using multiple digital sources. Journal of Writing Research, 5(3), 285–337.
Lindgren, E., & Sullivan, K. (Eds.). (2019). Observing writing: Insights from keystroke logging and handwriting. Brill.
Manchón, R. M., Roca de Larios, J., & Murphy, L. (2000). An approximation to the study of backtracking in L2 writing. Learning and Instruction, 10(1), 13–35.
Matsuhashi, A. (1981). Pausing and planning: The tempo of written discourse production. Research in the Teaching of English, 15, 113–134. https://www.jstor.org/stable/40170920
Michel, M., Révész, A., Lu, X., Kourtali, N.-E., Lee, M., & Borges, L. (2020). Investigating L2 writing processes across independent and integrated tasks: A mixed-methods study. Second Language Research, 36(3), 307–334.
Nottbusch, G. (2010). Grammatical planning, execution, and control in written sentence production. Reading and Writing, 23(7), 777–801.
Posner, M. I. (1980). Orienting of attention. Quarterly Journal of Experimental Psychology, 32, 3–25.
Rayner, K. (1998). Eye movements in reading and information processing: 20 years of research. Psychological Bulletin, 124(3), 372–422.
Rayner, K., Pollatsek, A., Ashby, J., & Clifton, C., Jr. (2012). The psychology of reading. Psychology Press.
Révész, A., & Michel, M. (2019). Introduction to the special issue. Studies in Second Language Acquisition, 41, 491–501.
Révész, A., Michel, M., & Lee, M. (2017). Investigating IELTS Academic Writing Task 2: Relationships between cognitive writing processes, text quality, and working memory. British Council, Cambridge English Language Assessment and IDP.
Révész, A., Michel, M., & Lee, M. (2019). Exploring second language writers' pausing and revision behaviors: A mixed-methods study. Studies in Second Language Acquisition, 41(3), 605–631.
Semmelmann, K., & Weigelt, S. (2018). Online webcam-based eye tracking in cognitive science: A first look. Behavior Research Methods, 50(2), 451–465.
Shintani, N., & Ellis, R. (2013). The comparative effect of direct written corrective feedback and metalinguistic explanation on learners' explicit and implicit knowledge of the English indefinite article. Journal of Second Language Writing, 22(3), 286–306.
Simpson, S., & Torrance, M. (2007). EyeWrite (Version 5.1). Unpublished report, Nottingham Trent University, SR Research.
Smith, B. (2012). Eye tracking as a measure of noticing: A study of explicit recasts in SCMC. Language Learning & Technology, 16(3), 53–81. https://hdl.handle.net/10125/44300
Spelman Miller, K., Lindgren, E., & Sullivan, K. P. H. (2008). The psycholinguistic dimension in second language writing: Opportunities for research and pedagogy using computer keystroke logging. TESOL Quarterly, 42(3), 433–454.
Torrance, M., Johansson, R., Johansson, V., & Wengelin, Å. (2016). Reading during the composition of multi-sentence texts: An eye-movement study. Psychological Research, 80, 729–743.
Torrance, M., & Nottbusch, G. (2012). Written production of single words and simple sentences. In V. W. Berninger (Ed.), Past, present, and future contributions of cognitive writing research to cognitive psychology. Psychology Press.
Van Waes, L., Leijten, M., & Quinlan, T. (2010). Reading during sentence composing and error correction: A multilevel analysis of the influences of task complexity. Reading and Writing, 23(7), 803–834.
Van Waes, L., Leijten, M., Roeser, J., Olive, T., & Grabowski, J. (2021). Measuring and assessing typing skills in writing research. Journal of Writing Research, 13(1), 107–153.
Wengelin, Å., Frid, J., Johansson, R., & Johansson, V. (2019). Combining keystroke logging with other methods: Towards an experimental environment for writing process research. In E. Lindgren & K. Sullivan (Eds.), Observing writing: Insights from keystroke logging and handwriting (pp. 30–49). Brill.
Wengelin, Å., Johansson, R., & Johansson, V. (2014). Expressive writing in Swedish 15-year-olds with reading and writing difficulties. In B. Arfé, J. Dockrell, & V. W. Berninger (Eds.), Writing development and instruction in children with hearing, speech and oral language difficulties (pp. 242–269). Oxford University Press.
Wengelin, Å., Torrance, M., Holmqvist, K., Simpson, S., Galbraith, D., Johansson, V., & Johansson, R. (2009). Combined eye tracking and keystroke-logging methods for studying cognitive processes in text production. Behavior Research Methods, 41(2), 337–351.

Part III

Critical reflections on the implementation of data collection instruments and procedures and on data analysis procedures

Chapter 10

Exploring the generation, development, and integration of argumentative goals in L1 and L2 composition processes
Methodological considerations

Julio Roca de Larios
University of Murcia

The main purpose of this chapter is to critically examine a number of methodological issues related to the analysis of L1 and L2 writing processes from a genre-based perspective – a domain of inquiry that has remained largely underdeveloped in the L2 writing research agenda. I discuss the range of theoretical assumptions and empirical considerations that led me to become involved in the research, describe the most relevant methodological challenges faced when attempting to trace the generation, elaboration, and integration of writers’ argumentative goals across languages, and illustrate the decisions I made at these three stages. I finally evaluate the significance of the analytical categories and procedures previously discussed and suggest avenues for further research.

https://doi.org/10.1075/rmal.5.10roc
© 2023 John Benjamins Publishing Company

Introduction

The present chapter is intended as a critical reflection on key issues in the analysis of data on writing processes with adult English as a Foreign Language (EFL) learners in instructional settings. It adds a new, genre-oriented dimension to the research program our team has been conducting over the last twenty years or so on different aspects of L2 composing (e.g., López-Serrano et al., 2019; Manchón et al., 2000; Manchón & Roca de Larios, 2007, 2011; Murphy & Roca de Larios, 2010; Roca de Larios et al., 1999; Roca de Larios et al., 2008). Specifically, I discuss, from a perspective focused on the internal dimension of writing tasks (Manchón, 2014), the methodological challenges involved in the analysis of how writers generate, develop, and integrate their goals when composing L1 and L2 argumentative texts. Goals in writing research are usually understood as internal representations of desired outcomes at pragmatic, rhetorical, ideational, or linguistic levels, which are generated by writers with different degrees of specificity and used as guides or focal points in the cognitive processes involved in their production of texts (Cumming, 2012, 2020; Galbraith & Vedder, 2019; Graham, 2018; Manchón & Roca de Larios, 2011; Nicolás-Conesa et al., 2014). Given the evolving nature of the writing process, an essential characteristic of goals is that they may be accepted, modified, or rejected as composing proceeds (Hayes & Nash, 1996) and the writer's representation of the task develops (Wolfersberger, 2007; Khuder & Harwood, 2019).

Overview of the research programme: Rationale, aims, and methods

The research aim outlined above is a response to persistent calls in the field to explore the role of genre knowledge in composing processes across languages as a way to understand how L2 writing development is achieved (e.g., Gentil, 2011; Kobayashi & Rinnert, 2013; Parks, 2016; Rinnert et al., 2015; Tardy et al., 2020). Such calls are based on the conceptualization of writing as an inherently individual and social activity in which the act of composing is understood as comprising not only the deliberate choices writers make but also the socially constructed goals they draw upon and develop in their multiple-language writing (e.g., Byrnes, 2020; Cumming, 2020; Hyland, 2011; Tardy, 2009). These assumptions, however, are in clear contrast with most cross-linguistic studies on L2 writing processes which, as a result of being directly or indirectly inspired by cognitive models of composition (e.g., Bereiter & Scardamalia, 1987; Hayes & Flower, 1986; Kellogg, 1996), tend to see writing as an internal, individualistic process only loosely connected with the social demands of text construction that crystallize as genres (Hyland, 2011; Parks, 2016). As a result, very little research has explicitly looked at process knowledge as a component of genre knowledge. Most authors interested in L1-L2 writing processes typically ask writers to undertake different types of tasks at various levels of difficulty on the assumption that successful performance on each type will involve some "familiarity with the corresponding genre both at the process and the formal levels" (Gentil, 2011, p. 9). Moreover, researchers usually limit themselves to identifying the frequency of writing processes (e.g., Breuer, 2019; Chenoweth & Hayes, 2001; Tiryakioglu et al., 2019) or the specific moment in which they occur throughout the composition (e.g., Tillema, 2012).
However, although these approaches have made it possible to look at writers’ choices and decisions in terms of occurrence or non-occurrence, their significance in the completion of specific texts has generally been overlooked (Byrnes, 2011). There is therefore a need to analyze these processes as forming part of a genre-oriented,


hierarchical sequence of decisions that extends from the beginning to the end of the composition.

To address these concerns, I revisited the data collected in a study we had previously conducted to analyze the planning processes engaged in by a group of 21 Spanish writers as they completed two one-hour argumentative tasks, one in English and one in Spanish, while thinking aloud (TA) in open laboratory booths (Manchón & Roca de Larios, 2007). The writers included seven high school students (HS, henceforth) with a pre-intermediate level of English proficiency, seven university students of Education (SE) at an intermediate proficiency level, and seven recent graduates in English Philology (PH) with an advanced command of English. The three groups had received some guidance in writing as part of their English courses, but no specific instruction aimed at developing their argumentation skills. The advanced group, however, had had greater contact with English and more extensive L2 writing practice, particularly in academic writing. The writing prompts used in the study read as follows:

L2 task: Success in education is influenced more by the student's home life and training as a child than by the quality and effectiveness of the educational program. Do you agree or disagree? (adapted from Raimes, 1987)

L1 task (translated from Spanish): Failure at school is a result of teachers' lack of responsibility in the fulfillment of their duties rather than a result of students' attitudes, effort, aptitudes, and motivation. Do you agree or disagree?

The data collected (written texts and TA transcriptions) were originally analyzed to identify the length and temporal distribution of pre-writing and online planning episodes, without taking into account the genre of the tasks (for details, see Manchón & Roca de Larios, 2007).
Bearing this previous purpose in mind, my decision to reanalyze these data from a genre-oriented perspective was triggered by the observation that many of the decisions and choices made by the participants were based, implicitly at least, on the requirements and conventions of argumentation. Further reasons included the suitability of think-aloud protocols to provide rich data on writing processes (Manchón & Leow, 2020), and especially to capture writers’ goals (e.g., Galbraith & Vedder, 2019), as well as the fact that the application of diverse analytical approaches (in this case, a genre-oriented approach integrated into a previous cognitive one) to the same type of material (written texts and TA transcriptions) might reveal multiple aspects of foreign language writing (see Cumming, 2009).


Methodological decisions, challenges, and solutions

Against this background, three major assumptions guided the reanalysis of the data. A first assumption involved the consideration of the argumentative genre as an "intellectually challenging problem" that, viewed from a functional perspective, requires the ability to recognize the existence of a real or imagined difference of opinion about a controversial issue (e.g., Ferretti & Graham, 2019). Such conflict is expected to be addressed linguistically through different moves, which typically include claiming a position, supporting that claim with reasons, acknowledging an alternative claim, and restricting or modulating the opposing claims through counterargumentation (e.g., Ferretti et al., 2009). Within this framework, the handling of arguments has traditionally been viewed from an adversarial perspective in which writers are expected to develop a position and support it with reasons and evidence. However, recent trends advocate a dialogic, cooperative problem-solving approach in which argumentation is viewed as the exploration and integration of the different sides of an issue to reach a reasoned conclusion (Coffin & O'Halloran, 2009; Nussbaum, 2008, 2021).

A second assumption was based on the realization that the complex sequence of conceptual, rhetorical, and linguistic decisions involved in the development of argumentative moves could be interpreted in terms of goals. From a sociocognitive perspective, argumentative goals are taken to be differentially constructed by writers as a function of their knowledge of argumentative discourse, topic, or interlocutor, as well as the cognitive processes they may engage in to activate that knowledge (e.g., Ferretti & Graham, 2019; Ferretti & Lewis, 2018; Nussbaum & Kardash, 2005).
Finally, I assumed that the management of those goal-oriented processes by the participants in the study could adequately be captured by adapting analytical categories previously used in L1 writing contexts to examine how ideational and rhetorical goals are generated, developed, and integrated by writers throughout the composition process (e.g., Flower et al., 1992; Nussbaum, 2008, 2021). The decision to apply these categories to the data would involve a systematic comparison of written texts and TA protocols that could also be informed, when necessary, by the insights gained in our previous studies of backtracking (Manchón et al., 2000), restructuring (Roca de Larios et al., 1999), lexical searches (Murphy & Roca de Larios, 2010), and linguistic reflection (López-Serrano et al., 2019). I presumed that the analyses of these formulation processes might provide information not only on the local orientation (compensatory or upgrading) of the different linguistic choices made by the writers (as previously reported in those studies) but also on their textual meaning-making significance as potential or actual expressions of the different


argumentative moves being constructed throughout their compositions (see Byrnes, 2020).

With those assumptions in mind, I made two major methodological decisions intended to shed light on the composing processes involved in the construction of argumentative texts. On the premise that cognitive processing leaves many traces in the written product that can be inferred from text analysis (Hayes, 2012; Sanders & Schilperoord, 2006; Torrance, 2016; van Wijk, 1999), the first decision involved the selection of a taxonomy of argumentative moves that would allow me to analyze the texts produced by the participants as an initial procedure for accessing their decision-making processes and goals. From this broad point of departure, I decided to conduct a subsequent, more fine-grained data analysis with the specific intention of capturing writers' moment-by-moment engagement in the generation, development, and integration of argumentative goals throughout the composition. Inspired by theoretical perspectives that argue for a close connection between writers' own perceptions of task demands, goals, and processing activity during task performance (e.g., Flower et al., 1992; Graham, 2018; Manchón, 2014; Nicolás-Conesa et al., 2014), as well as by those asserting the dialectical nature of argumentative reasoning (Kuhn, 2005; Nussbaum, 2021), the data provided by protocols and texts were then jointly analyzed to explore whether and how writers:

i. strove to initially conceptualize the task as an argumentative problem,
ii. managed to construct a network of argumentative goals and subgoals throughout the composition, and
iii. attempted to create a coherent text through the integration of conflicting goals.

A taxonomy of argumentative moves

As for the first decision, I examined several taxonomies used in L1 and L2 writing research to analyze argumentative texts (Crammond, 1998; Nussbaum & Kardash, 2005; Qin & Karabacak, 2010; Rinnert et al., 2015; van Weijen et al., 2018). After comparing the categories in each taxonomy, I concluded that the range of moves to be explored in the study should be sufficiently broad to account for the differences in ideational, rhetorical, and linguistic complexity observed in the texts produced by the three groups. With this aim in mind, I decided that an adaptation of the taxonomies proposed by Crammond (1998) and Qin and Karabacak (2010), supplemented with the Introduction and Conclusion categories suggested by Rinnert et al. (2015), would be the best option to identify argument structures in the written texts with a certain level of depth and accuracy (Bracewell & Breuleux,


1994). Further reasons for this choice of taxonomies included, on the one hand, their extensive use in research, which made them especially valuable for comparison purposes, and, on the other hand, their suitability to accommodate both adversarial and dialogic forms of argumentation (see the section on Integration of goals below). Table 1 includes the definitions of the moves selected, illustrated with examples from the texts produced by some participants. (Note: Each participant is identified with the abbreviation of their group, a number, and the task condition. The texts originally written in Spanish in the L1 condition have been translated into English.)

Table 1. A taxonomy of argumentative moves

Introduction. Background information to contextualize the topic and/or situate the two sides of the issue in question.
  Example: "School failure is an important issue today. Never before had it been given such importance. If the pupil got negative results, it was always his/her own fault, but now it seems that this conception is changing." (SE4, L1)

Claim. An assertion formulated as a response to a contentious issue or problem. Claims may be presented as problem statements, evaluations, or opinions.
  Example: "For me, school failure is caused neither by the lack of responsibility of the teacher nor by the attitude of the pupil, but rather by a combination of both variables." (Claim as opinion; SE1, L1)

Qualification. Qualifies or places limits on the universal applicability or strength of the claim (may involve the use of conditionals or terms like "perhaps", "possibly", etc.).
  Example: "For me, it is really difficult to fully agree or disagree with the question raised. I'd rather place myself in a position of balance, though if I had to choose between one or the other argument, I'd go for the influence of the student's home life." (PH7, L2)

Data. Evidence or reasons that are adduced to support a claim. Data may take the form of facts or accepted truths, definitions, logical reasoning, personal experience, or value judgments.
  Example: "When you are a child, you do the things that you see people do, so the other people are models to you because if they do something you do the same in that moment or in another moment." (Facts intended to support a previous claim on the importance of home life; SE5, L2)

Data backing. Statements that reinforce or support the data. They may include examples, principles, authorities, or explanations.
  Example: "This fact has been learnt by me in Psychology this year and is called 'learning by observation'." (A scientific principle backing the previous data; SE5, L2)

Counterargument claim. The possible opposing views that challenge the validity of a claim. They are also known as restrictions or reservations.
  Example: "Although we must not forget that in some cases the physical, psychological, social, etc., conditions of the pupil are responsible for this failure." (A claim that challenges a previous claim asserting the relevance of teachers in school failure; SE2, L1)

Counterargument data. Evidence or reasons that are adduced to support a counterargument claim.
  Example: "There may be several reasons for this, for example, that the child's learning ability is limited or that the child is still not very interested, etc." (Facts adduced as reasons to support the previous counterargument claim; SE2, L1)

Rebuttal claim. A statement intended to counter the force of a counterargument by pointing out the weaknesses of the claim or data.
  Example: "However, it is also possible to learn without a teacher, which further emphasizes the responsibility of the learner." (Rebuttal intended to weaken the force of a previous counterargument claim that underscored the importance of teachers; PH3, L1)

Rebuttal data. Evidence that is used to support a rebuttal claim.
  Example: "Most of us have had a horrible teacher or suffered a disastrous teaching method." (Personal experience supporting the rebuttal claim above; PH3, L2)

Conclusion. It usually consists of a restatement of the position taken.
  Example: "It can be concluded that the educational system is based on two fundamental pillars which are the teacher and the student. If one fails, the result will always be school failure." (HS3, L1)

On the premise that a piece of discourse is considered an argument if it minimally consists of a claim-data combination (Wolfe et al., 2009), the coding of students’ texts with the categories presented above was quite straightforward in some cases. In other cases, the recursiveness inherent in the argumentation process (e.g., the claim of one argument could serve as data for another; Voss, 2005) led me to take great care in disentangling the different lines of reasoning in arguments, counterarguments, and rebuttals. The recognition of recurrent linguistic elements that signalled their presence in the texts was very useful for that purpose (see also Qin & Karabacak, 2010; Stapleton & Wu, 2015). I found that claims were usually associated with phrases like “for me”, “I think”, “in my opinion”, or “without a doubt”; data, with conjunctions such as “because” or “since”, or phrases like “a reason is that …” or “there are different reasons …”; and counterarguments and rebuttals, with phrases such as “it does not mean that …” or “it is also possible that …”, or conjunctions like “however” or “although”. When the alignment of linguistic units and argumentation was not so transparent, either because the discourse markers used were erroneous or because there was no explicit signalling of coherence relations, I had no choice but to conduct a careful reading and analysis of the co-text in which the problematic segment had arisen (Basturkmen & von Randow, 2014).
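The marker-based screening described above can be sketched programmatically. The inventories and function below are a hypothetical illustration only (the actual coding in the study was done by hand, with ambiguous cases resolved by reading the co-text), not the instrument the author used:

```python
import re

# Hypothetical marker inventories, drawn from the phrases reported in the chapter.
# A script like this can only flag *candidate* moves for an analyst to confirm;
# markers such as "since" are ambiguous (causal vs. temporal) and learner texts
# may use erroneous or implicit signalling, which is why co-text analysis was needed.
MARKERS = {
    "claim": ["for me", "i think", "in my opinion", "without a doubt"],
    "data": ["because", "since", "a reason is that", "there are different reasons"],
    "counter/rebuttal": ["it does not mean that", "it is also possible that",
                         "however", "although"],
}

def flag_candidate_moves(sentence):
    """Return the move categories whose markers appear in the sentence."""
    s = sentence.lower()
    found = []
    for move, phrases in MARKERS.items():
        if any(re.search(r"\b" + re.escape(p) + r"\b", s) for p in phrases):
            found.append(move)
    return found

print(flag_candidate_moves("In my opinion, the teacher is responsible."))   # → ['claim']
print(flag_candidate_moves("However, it is also possible to learn alone."))  # → ['counter/rebuttal']
```

A screening pass of this kind would at most pre-sort protocol segments; the claim-data combinations themselves still have to be established by a human coder.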

Coding the initial representation of the task

The analysis of the initial representation of the task was intended to ascertain whether and how the participants explored its topic, genre and/or potential audience, established their initial position towards the issue, and set top-level rhetorical and content goals in the form of ideas that would subsequently be specified and formulated or discarded (Flower et al., 1992). Equipped with these criteria, I examined the initial episodes in the TA protocols and the notes (if any) written by the participants, along with the opening sentences of their written texts. This procedure allowed me to make decisions on whether each participant conceptualized the task from an argumentative perspective. I took this to be the case when the data showed that the writer addressed the characteristic goals of the argumentation genre, i.e., from the minimum requirement of a claim plus supporting data (see above) to more complex goals involving the consideration of both sides of the problem (home and school, L2 task; the teacher and the students, L1 task). This interpretative process was greatly facilitated when the protocols included explicit comments on these issues, usually in the form of “pre-writing planning” episodes, which allowed participants to understand task demands, discuss their stance on the topic, set goals, and establish connections between them at a pre-linear level (Manchón & Roca de Larios, 2007). However, when these episodes did not occur, I could only rely on the initial moves previously identified in the written texts to infer participants’ implicit goals.

Through the application of these criteria, three main types of approaches to the task were identified at this initial stage of the composition process. In the first type, writers generally read the prompt several times, paused for a few seconds, and immediately moved on to writing. This approach was often linked to differences across languages.
While in the L1 condition writers usually adopted at least a minimal argumentative perspective (claim plus data combinations), their approach in the L2 often involved a mixture of narration, personal experiences, and fragmentary statements about what they initially thought education was about. A second type of behaviour was represented by those writers who read the prompt and verbalized something else before starting to write. While some of these writers limited that “something else” to stating their initial position, other writers tried to understand task demands, wondered how they would introduce the issue (rhetorical goals), and even came up with ideas (content goals) that would be developed while writing. Finally, a more explicit and complex approach was shown by the writers who, in addition to reading the prompt, verbalized their initial interpretation of the task through outlines which included sets of rhetorical and content-related notes to be subsequently developed.

Identification of goal networks throughout the composition: Temporal and hierarchical dimensions

The identification of goals and their transformation into subgoals, understood as “sets of instructions for generating content and language and a body of criteria for testing them” (Flower et al., 1992, p. 203), also presented specific problems. The combined reading of written texts and protocols showed that goal networks were not always constructed in an orderly, top-down fashion since, in a relevant number of cases, traces of goal setting in the form of claims, data, or qualifiers, for example, could be found anywhere in the protocols. The problem, then, was to decide how goals could be analyzed both temporally and hierarchically.

The temporal order of goals was relatively easy to identify by looking at the sequence of steps followed by participants in the construction of their texts. For that purpose, the segmentation procedures we had used in previous studies to code the flow of information in the TA protocols (see, for example, Manchón & Roca de Larios, 2007; Roca de Larios et al., 2008) were of great help. These procedures allowed me to trace chronologically how writers set their goals.

In contrast, the analysis of the hierarchical order of goals turned out to be more demanding as it was intended to bring to light the richness and complexity of the cognitive processes behind the argumentative moves already identified in the written texts. Drawing on the coding criteria suggested by Bracewell & Breuleux (1994), I found that a very useful procedure for this purpose was to try and identify the logical relations between ideas, i.e., cause and effect, part and whole, condition, adversative, concessive, alternative, or-exclusive, identity, equivalence, modality, degree, definite, location, and time.
When this set of logical categories was applied to the data, I found that the relations they represented were sometimes explicitly verbalized by the participants in the form of “online planning”, “lexical and syntactic searches”, “evaluation” or “revision” episodes (see, for example, Murphy & Roca de Larios, 2010; Roca de Larios et al., 1999). In these cases, goal networks were relatively easy to identify. On other occasions, however, very little information was verbalized above and beyond the specific text segment being analyzed, and the identification of goals had to be conducted through the indirect reconstruction of the thought processes engaged in by participants. In addition to the procedures used in the identification of argumentative moves in the written texts (see above), I relied for that purpose on the verbalizations that surrounded the actual writing of the segment in question (mostly rereadings and occasional comments). Following the cooperative principle (Crammond, 1998, p. 264), the main aim here was to make inferences on the specific argumentative functions played by contiguous pieces of text (see van Weijen et al., 2018).

These coding procedures allowed me to identify different types of goal construction which, in turn, included several variations in the L1 and L2 conditions. Globally considered, they consisted of those behaviours in which writers (i) approached the task either as a narrative account of personal experiences or a presentation of ideas in a fragmentary fashion; (ii) produced a kind of expository prose which, despite being interspersed with occasional claim-data combinations, seemed to be mostly aimed at describing the main characteristics of each opposing side rather than arguing a position; or (iii) followed the pathways opened up through prewriting and/or online planning, and managed to develop articulated sets of goals and subgoals involving lines of interrelated moves for and against one or both sides of the argument.

As an illustration of the generation and development of genuine argumentative goals, Table 2 below shows and interprets the steps (chronologically ordered in numbers) followed by SE6 when composing the third sentence of her L1 text.
Before reaching this point, she had completed the previous two sentences with the purpose of developing two goals previously planned in her outline: (i) to claim that school failure is a highly controversial issue; and (ii) to support this controversial condition with a set of alleged causes that served as data for that claim: family environment, social context, personal characteristics, etc. The writer is now addressing a new goal, i.e., how to develop the teacher-student relationship, which is privileged as the main cause of failure and elaborated through a claim-data combination. As can be seen, the interpretation of the episode integrates most of the categories presented above, i.e., argumentative moves, composing processes, and logical relations.

Table 2. An example of the elaboration of a network of argumentative goals

(Steps are numbered chronologically. “TA” marks think-aloud verbalizations above and beyond the written text; “Written text” marks portions of sentence 3 of the composition.)

1. Written text: “All these factors may contribute to school failure …”
Interpretation: With the theme (“factors”) of this first clause in sentence 3, the writer returns to the causes presented in the previous sentence (family environment, social context, etc.). By doing so, she ensures cohesion and prepares a subsequent qualification with a specific modality (may) and cause (contribute) to the claim to come.

2. TA: “I want to introduce now the teacher and the student relationship (main goal, written as a note in the outline), but I don’t know how to move from all those factors to the concrete topic of the teacher and the student interaction, which is the key issue” (RR pswsf)
Interpretation: The writer verbalizes (online planning) the main goal in this episode, which (i) is part of a more global rhetorical goal previously verbalized, i.e., writing an introduction (written as a note in the outline); and (ii) involves, in turn, a more specific, subordinate subgoal: how to instantiate the transition from the general (“those factors”) to the specific (“the teacher and the student”).

3. Written text: “… but the problem should certainly be centered on …”
Interpretation: The transition problem is initially solved through an adversative relation (but) with a specific modality (certainly should), indicating that the claim is signalled through a problem statement (problem).

4. TA: (RR pswsf) “on what? how can I put this?” (RR notes in the outline and pswsf)
Interpretation: The writer addresses (lexical search) another subordinate goal, i.e., finding the theme of the claim.

5. Written text: “the teacher – student relationship”
Interpretation: She finds the theme by reinstating the note mentioned in step 2 above.

6. TA: “and now I have to say why” (written as a note in the outline)
Interpretation: The writer moves (online planning) to a still more subordinate goal within the “instantiate the transition” subgoal, i.e., formulating the data to justify the claim.

7. Written text: “since it is in the classroom where”
Interpretation: After writing the first sentence of the text, the writer (online planning) had verbalized: “I don’t know if I should focus only on school or also on high schools or university … I think it is better to focus only on school”. Based on this decision, the data to justify the claim are initially formulated with a clause involving the topicalization of location (“it is in the classroom”) with a specific modality (present, large applicability).

8. TA: “How could I put this? where it ‘ocurre’ (occurs), ‘tiene lugar’ (takes place)?” (RR pswsf)
Interpretation: The writer now sets the still more subordinate goal of deciding on the specific term to name the action in that location through a lexical search (steps 8–9).

9. Written text: “takes place”
Interpretation: The goal is met by choosing one of the terms, an or-exclusive logical relation.

10. TA: “I’d put ‘process’ but I don’t think that is the right word” (RR pswsf)
Interpretation: The initial potential agent of that action (“process”), retrieved from memory, is negatively evaluated. A new lexical search is initiated with the subordinate goal of finding an alternative to “process” (steps 10–14).

11. Written text: “most of the”
Interpretation: A qualifier of degree is introduced to limit the scope of application of the claim.

12. TA: “how could I put this? I don’t want to write much more on this, I have already written a very long sentence (RR pswsf) I don’t like ‘process’ but OK I’ll put it”
Interpretation: In view of the impossibility of retrieving alternatives and her awareness that the sentence is too long, the writer finally stops the search and writes down “learning process” with a negative evaluation.

13. Written text: “learning process *”

14. TA: “I don’t like it, so I’ll put an asterisk next to it … well, since I have already written about the teacher and the student, I can start talking about each of them I’ll start first with the teacher (main goal)” (RR sentence 3)
Interpretation: After reiterating her negative evaluation of the term used and giving herself a second chance when revising the text (*), the writer moves back to the outline and reinstates the next goal there to be developed, “the teacher” (online planning).

Code: RR = rereading; pswsf = part of the sentence written so far.

Integration of conflicting goals

In order to shed light on the way writers approached one of the crucial dimensions in recent conceptualizations of the argumentative genre, i.e., the integration of positions or goals in conflict (Nussbaum, 2008, 2021), I had to decide whether there was evidence of this confrontation in the data and, if so, how it could be accounted for. In consonance with the trends previously identified on goal generation and networking (see above), a first analysis showed that a few students, mostly from the HS group, did not refer to any conflict between the home and the school (L2 task) because they interpreted and developed the task as a personal narration or a combination of fragmentary ideas. Most participants, however, addressed such confrontation through a variety of procedures which ranged from the production of arguments in an additive and, at times, inconsistent fashion, to the engagement in coherent reasoning processes supporting one position or asserting the complementarity of both sides. The challenge then was to decide how this variety of confrontational patterns could be analyzed in more precise ways by means of theoretically grounded categories.

Given the lack of attention to this issue in the L2 writing field, I turned to L1 writing research specifically focused on the integration of conflicting arguments (e.g., Mateos et al., 2018; Nussbaum, 2008, 2021). Drawing on theoretical assumptions claiming that argumentative reasoning is dialectical in nature (e.g., Kuhn, 2005), this line of inquiry has looked at argument-counterargument integration by empirically identifying different strategies used by writers for that purpose. Globally, these strategies range from those focused on one-sided, adversarial reasoning (e.g., refuting strategies) to others more in consonance with two-sided, dialogic reasoning (e.g., weighing and synthesizing strategies).


When I tried to use this framework for the analysis of texts and protocols in which the confrontation of positions was acknowledged by the students, different challenges arose. A major challenge was that an important number of participants seemed to address such confrontation in ways that could not directly be interpreted in terms of refutation, weighing, or synthesizing. These students tended to address the development of their goals through the production of more or less elaborated lists of ideas in which they described certain characteristics of each position, occasionally argued for them in an unconnected way, and even ended up with contradictory claims in favour of the same side (see also previous section). Drawing on Nussbaum (2008), I concluded that this type of behaviour, which I termed “listing”, could only be regarded as a form of pseudo-integration. A second challenge was to find which students, if any, relied on refutation strategies, defined in this field of inquiry as a type of one-sided reasoning intended to prove that the arguments opposing the writer’s own views are erroneous, irrelevant, or insufficiently supported (Nussbaum, 2008, 2021). With this definition in mind, I selected the texts and protocols that, from my point of view, involved some form of one-sided reasoning and tried to verify if they could also be regarded as examples of refutation. The analysis showed, however, that the writers composing these texts did not explicitly argue against a position. Rather, they seemed to rely on what Mateos et al. (2018) have reported as “arguing in support”, i.e., opting for one side almost from the outset and paying little attention to the other side throughout the entire composition. 
Examples of writers using this strategy included those that (i) focused only on how the educational shortcomings of the family are compensated for by the school, without paying any attention to the limitations involved; (ii) made the teacher entirely responsible for school failure on account of pupils’ young age; or (iii) argued for and supported the absolute primacy of home life in the L2 task and of personal factors in the L1 task.

Weighing and synthesizing strategies were also considered for analysis as genuine representatives of two-sided, dialogic reasoning. Weighing strategies were taken to be used by participants when they judged the relative merits of the two sides of the argument and decided that both were complementary to some extent (Mateos et al., 2018). Since the abovementioned listing approach might also involve the consideration of both positions, a preliminary step for coding purposes was to establish a criterion that would help me make a clear distinction between listing and weighing strategies. After comparing different protocols related to both strategic domains, I concluded that the key issue in the use of weighing strategies, unlike listing ones, was that arguments and counterarguments were extensively interrelated by means of restrictions, specifications, examples, and illustrations, usually with the aim of showing that the issue under consideration was complex and open-ended.


As for synthesizing, defined as the strategy used by writers to formulate a claim in the form of a creative solution that integrates the advantages of each side (Mateos et al., 2018), the main challenge was to try and identify those protocols that contained a unifying idea or principle of that kind. Although in a couple of cases such an idea was easily identifiable from the initial stages of the composition, there were other cases in which the identification was more laborious because the writers concerned appeared to hit upon the new, binding principle after having written a good portion of their texts. It looked as if the very act of writing enabled them to analyze the conflicting issues more deeply and engage in a process of discovery (Galbraith, 2009).

Table 3 below shows an illustration of how PH2 discovered and developed a new integration goal in the final stages of his L2 composition process. The writer had initially verbalized his favourable attitude towards the integration of both sides (home and school) together with the difficulty and importance of weighing their value (“both factors can have some influence on success, but the question is which one is more influential … this is the interesting point”). With this top goal in mind, he elaborated a network of goals (materialized in paragraphs 2, 3, 4, 5, and 6) which were intended to gauge the force of each side by claiming the importance of certain factors while at the same time restricting their relevance (for example, “the usefulness of acquiring certain study habits at home as a foundation for facing more serious matters of study in the future” versus “the real possibility of not being taught to plan and organize your time by your parents when you are a child”). After writing paragraph 6, the writer is now synthesizing the idea of “individualism” as a “solution to everything”, i.e., to the confrontations previously discussed.
The steps followed for that purpose include, first, the generation of data in support of that solution (paragraphs 7 and 8) and, second, its proper formulation as a major claim in the conclusion (paragraph 9). (Note: due to space limitations, only the segments in quotation marks in the first two columns represent literal verbalizations or authentic written text produced by the writer; the remaining segments are summaries.)

Table 3. An example of integration of conflicting goals through synthesizing

Row 1
TA verbalizations: “I’ll put individualism as the solution to everything”. On-line planning, lexical searches, translations, and rereadings are used to write both paragraphs.
Written text: Two paragraphs (7–8) are written to explain that this synthesizing goal is based on the writer’s study habits acquired early in life (para 7) and the importance he attaches to the personal character of the individual (para 8).
Interpretation: Individualism is posited as a synthesizing goal (online planning). The explanations offered in these two paragraphs are the data provided by the writer to justify his subsequent, major claim in the conclusion (see below).

Row 2
TA verbalizations: “Now I have to write the conclusion …”. Immediate translations of words and phrases are the main procedure used by the writer to compose this paragraph.
Written text (para 9): “It is difficult, in my opinion, to decide which of the said factors has a bigger influence, but I have always thought that external, disfavourable conditions can be overcome by personal initiative and an active predisposition to create your own ways to be successful.”
Interpretation: The next rhetorical goal to be addressed (the conclusion) is advanced (online planning). After reinstating the difficulty mentioned at the beginning of the composition, the major claim is now formulated: individualism, represented by “personal initiative” and “active predisposition to create your own ways”.

Viewed as a whole, the analyses of students’ integration of goals indicated that (i) almost half of the participants showed similar levels of goal integration in both languages (either listing, one-sided, or two-sided reasoning); (ii) two students upgraded in the L2 the integration levels achieved in the L1 (moving, respectively, from listing to one-sided reasoning and from one-sided to two-sided reasoning); and (iii) the remaining students displayed a lower level of integration when moving to the foreign language (e.g., from one-sided reasoning to listing or from two-sided to one-sided reasoning). Developmentally, a gradual progression could be observed across groups, ranging from the predominance of non-argumentative strategies (narratives and listing) in the HS group, and the use of mixed strategies (both non-argumentative and argumentative ones) by the SE students, to the overall use of genuine argumentative strategies (one-sided, weighing and synthesizing) by the PH group.


Methodological conclusions and implications for future studies

This chapter has discussed different analytical procedures and categories intended to look at composing in connection with the generation, elaboration, and integration of genre-related goals which, acting as “forces outside the individual”, help the writer “define problems, frame solutions and shape the text” (Hyland, 2011, p. 20). Important methodological challenges and concerns involved in such analyses have also been considered and illustrated through the exploration of how participants differentially constructed argumentation as a rhetorical space across languages. In this sense, the chapter may be regarded as a process-oriented contribution to the conceptualization of argumentation in L2 writing, an endeavour currently regarded as much needed in some quarters (see Hirvela, 2017).

Although the range of concerns discussed is by no means comprehensive, it nonetheless exemplifies the importance of making systematic and well-grounded methodological decisions when the robustness and significance of findings are at stake. This is the case, for example, with the cross-linguistic and developmental trends on goal integration identified in the data and reported in the previous section. Although necessarily brief due to space limitations, this report provides concrete evidence that engaging in data analysis with rigour and consistency has been instrumental not only in making visible the different stages learners had to traverse in their efforts to gradually elaborate and coordinate arguments across languages but also in making the interpretation of these findings possible from transfer-oriented (e.g., Cumming, 2020; Rinnert et al., 2015) and developmental (e.g., van Wijk, 1999) perspectives. Despite these benefits, the analytical procedures discussed above also present some limitations.
Because they focus mostly on structural and logical relations, their application to the data led me to overlook the quality of arguments. Future studies might address this limitation by (i) drawing on frameworks especially designed for the integrated assessment of argumentative structural elements and reasoning quality (Chuang & Yan, 2022; Stapleton & Wu, 2015); (ii) exploring how writers differentially use factual and conceptual knowledge in their argumentation processes of construction and critique (Osborne et al., 2016); or (iii) applying the Aristotelian concepts of ethos, logos and pathos (Uysal, 2012) to the analysis of thought processes identified in protocols and texts. Alternatively, the quality of argumentative processes might also be explored through Appraisal, a systemic functional construct intended to capture how writers formulate specific attitudinal values and express their stance on an issue in relation to other alternative positions by means of dialogic expanding (e.g., aligning themselves with the opinion of others, considering different interpretations) and dialogic contracting (e.g., proclaiming their own views, contrasting their views with those of others) strategies (Martin & White, 2005; Ryshina-Pankova, 2014).

Limitations can also be observed in coding the development of goals, especially when the protocols did not offer sufficient clues about the alignment of certain argumentative moves previously identified in the texts with their corresponding cognitive processes. This lack of clues was also an important drawback when deciding whether some learners used listing strategies for goal integration. Although the key point in the use of these strategies, as discussed above, was the production of occasional arguments for each position without clear connections between them, it was difficult at times to determine whether such connections existed or not. Further refinements, such as the analysis of thematic progression (McCabe, 2021), are thus needed to increase the precision of coding. It would also be highly beneficial for that purpose if the data provided by texts and protocols were handled and interpreted in combination with the information provided by other data collection procedures such as keyboard logging and eye-tracking (see Chapters 8 and 9, this volume) or screen capturing (see Chapter 7, this volume). The coding skills gained with the analyses of such a variety of data might be applied or extended to other tasks, genres, and populations within the broader panorama of how L1 and L2 writing processes differ, relate, or complement each other across contexts (Cumming, 2020).

Funding

The study reported in this chapter is part of a wider research programme financed by the Spanish Ministry of Science and Innovation (Research Grant PID2019-104353GB-100) and the Séneca Foundation (Research Grant 20832/PI/18).

References

Basturkmen, H., & von Randow, J. (2014). Guiding the reader (or not) to re-create coherence: Observations on postgraduate student writing in an academic argumentative writing task. Journal of English for Academic Purposes, 16, 14–22.

Bereiter, C., & Scardamalia, M. (1987). The psychology of written composition. Lawrence Erlbaum Associates.

Bracewell, R. J., & Breuleux, A. (1994). Substance and romance in analyzing think-aloud protocols. In P. Smagorinsky (Ed.), Speaking about writing: Reflections on research methodology (pp. 55–88). Sage.


Breuer, E. O. (2019). Fluency in L1 and FL writing: An analysis of planning, essay writing and final revision. In E. Lindgren & K. Sullivan (Eds.), Observing writing: Insights from keystroke logging and handwriting (pp. 190–211). Brill.

Byrnes, H. (2011). Beyond writing as language learning or content learning: Constructing foreign language writing as meaning making. In R. M. Manchón (Ed.), Learning-to-write and writing-to-learn in an additional language (pp. 133–153). John Benjamins.

Byrnes, H. (2020). Towards an agenda for researching L2 writing and language learning: The educational context of development. In R. M. Manchón (Ed.), Writing and language learning: Advancing research agendas (pp. 73–94). John Benjamins.

Chenoweth, N. A., & Hayes, J. R. (2001). Fluency in writing: Generating text in L1 and L2. Written Communication, 18(1), 80–98.

Chuang, P.-L., & Yan, X. (2022). An investigation of the relationship between argument structure and essay quality in assessed writing. Journal of Second Language Writing, 56.

Coffin, C., & O’Halloran, K. A. (2009). Argumentation reconceived? Educational Review, 61(3), 301–313.

Crammond, J. (1998). The uses and complexity of argument structures in expert and student persuasive writing. Written Communication, 15(2), 230–268.

Cumming, A. (2009). The contribution of studies of foreign language writing to research, theories, and policies. In R. M. Manchón (Ed.), Writing in foreign language contexts: Learning, teaching and research (pp. 209–231). Multilingual Matters.

Cumming, A. (2012). Goal theory and second language writing development, two ways. In R. M. Manchón (Ed.), L2 writing development: Multiple perspectives (pp. 135–164). De Gruyter Mouton.

Cumming, A. (2020). L2 writing and L2 learning: Transfer, self-regulation and identities. In R. M. Manchón (Ed.), Writing and language learning: Advancing research agendas (pp. 29–48). John Benjamins.

Ferretti, R. P., & Graham, S. (2019). Argumentative writing: Theory, assessment, and instruction. Reading and Writing: An Interdisciplinary Journal, 32, 1345–1357.

Ferretti, R. P., & Lewis, W. E. (2018). Knowledge of persuasion and writing goals predict the quality of children’s persuasive writing. Reading and Writing: An Interdisciplinary Journal, 32(6), 1411–1430.

Ferretti, R. P., Lewis, W. E., & Andrews-Weckerly, S. (2009). Do goals affect the structure of students’ argumentative writing strategies? Journal of Educational Psychology, 101(3), 577–589.

Flower, L., Schriver, K. A., Carey, L., Haas, C., & Hayes, J. R. (1992). Planning in writing: The cognition of a constructive process. In S. P. Witte, N. Nakadate, & D. Cherry (Eds.), A rhetoric of doing: Essays on written discourse in honor of James L. Kinneavy (pp. 181–243). Southern Illinois University Press.

Galbraith, D. (2009). Writing as a discovery. British Journal of Educational Psychology, 1(1), 5–26.

Galbraith, D., & Vedder, I. (2019). Methodological advances in investigating L2 writing processes: Challenges and perspectives. Studies in Second Language Acquisition, 41(3), 633–645.

Gentil, G. (2011). A biliteracy agenda for genre research. Journal of Second Language Writing, 20(1), 6–23.

Chapter 10. Exploring the generation, development, and integration of argumentative goals

Graham, S. (2018). A revised Writer(s)-Within-Community model of writing. Educational Psychologist, 53(4), 258–279.
Hayes, J. R. (2012). Modelling and remodelling writing. Written Communication, 29(3), 369–388.
Hayes, J. R., & Flower, L. S. (1986). Writing research and the writer. American Psychologist, 41(10), 1106–1113.
Hayes, J. R., & Nash, J. G. (1996). On the nature of planning in writing. In C. M. Levy & S. Ransdell (Eds.), The science of writing (pp. 29–55). Lawrence Erlbaum Associates.
Hirvela, A. (2017). Argumentation & second language writing: Are we missing the boat? Journal of Second Language Writing, 36, 69–74.
Hyland, K. (2011). Learning to write: Issues in theory, research and pedagogy. In R. M. Manchón (Ed.), Learning-to-write and writing-to-learn in an additional language (pp. 17–35). John Benjamins.
Kellogg, R. (1996). A model of working memory in writing. In C. M. Levy & S. Ransdell (Eds.), The science of writing (pp. 57–72). Lawrence Erlbaum Associates.
Khuder, B., & Harwood, N. (2019). L2 writing task representation in test-like and non-test-like situations. Written Communication, 36(4), 578–632.
Kobayashi, H., & Rinnert, C. (2013). L1/L2/L3 writing development: Longitudinal case study of a Japanese multicompetent writer. Journal of Second Language Writing, 22(1), 4–33.
Kuhn, D. (2005). Education for thinking. Harvard University Press.
López-Serrano, S., Roca de Larios, J., & Manchón, R. M. (2019). Language reflection fostered by individual L2 writing tasks: Developing a theoretically motivated and empirically based coding system. Studies in Second Language Acquisition, 41(3), 503–527.
Manchón, R. M. (2014). The internal dimension of tasks: The interaction between task factors and learner factors in bringing about learning through writing. In H. Byrnes & R. M. Manchón (Eds.), Task-based language learning: Insights from and for L2 writing (pp. 27–52). John Benjamins.
Manchón, R. M., & Leow, R. (2020). An ISLA perspective on L2 learning through writing. In R. M. Manchón (Ed.), Writing and language learning: Advancing research agendas (pp. 335–356). John Benjamins.
Manchón, R. M., & Roca de Larios, J. (2007). On the temporal nature of planning in L1 and L2 composing. Language Learning, 57, 549–593.
Manchón, R. M., & Roca de Larios, J. (2011). Writing to learn in FL contexts: Exploring learners’ perceptions of the language learning potential of L2 writing. In R. M. Manchón (Ed.), Learning-to-write and writing-to-learn in an additional language (pp. 181–207). John Benjamins.
Manchón, R. M., Roca de Larios, J., & Murphy, L. (2000). An approximation to the study of backtracking in L2 writing. Learning and Instruction, 10(1), 13–35.
Martin, J. R., & White, P. R. R. (2005). The language of evaluation: Appraisal in English. Palgrave.
Mateos, M., Martín, E., Cuevas, I., Villalón, R., Martínez, I., & González-Lamas, J. (2018). Improving written argumentative synthesis by teaching the integration of conflicting information from multiple sources. Cognition and Instruction, 36(2), 119–138.
McCabe, A. (2021). A functional linguistic perspective on developing language. Routledge.


Murphy, L., & Roca de Larios, J. (2010). Searching for words: One strategic use of the mother tongue by advanced Spanish EFL writers. Journal of Second Language Writing, 19(2), 61–81.
Nicolás-Conesa, F., Roca de Larios, J., & Coyle, Y. (2014). Development of EFL students’ mental models of writing and their effects on performance. Journal of Second Language Writing, 24, 1–19.
Nussbaum, E. M. (2008). Using argumentation vee diagrams (AVDs) for promoting argument–counterargument integration in reflective writing. Journal of Educational Psychology, 100, 549–565.
Nussbaum, E. M. (2021). Critical integrative argumentation: Toward complexity in students’ thinking. Educational Psychologist, 56(1), 1–17.
Nussbaum, E. M., & Kardash, C. M. (2005). The effects of goal instructions and text on the generation of counterarguments during writing. Journal of Educational Psychology, 97(2), 157–169.
Osborne, J. F., Bryan Henderson, J., MacPherson, A., Evan, S., Wild, A., & Shi-Ying, Y. (2016). The development and validation of a learning progression for argumentation in science. Journal of Research in Science Teaching, 53(6), 821–846.
Parks, S. (2016). Workplace writing: From text to context. In R. M. Manchón & P. K. Matsuda (Eds.), Handbook of second and foreign language writing (pp. 223–241). De Gruyter Mouton.
Qin, J., & Karabacak, E. (2010). The analysis of Toulmin elements in Chinese EFL university argumentative writing. System, 38(3), 444–456.
Raimes, A. (1987). Language proficiency, writing ability, and composing strategies: A study of ESL college student writers. Language Learning, 37, 439–468.
Rinnert, C., Kobayashi, H., & Katayama, A. (2015). Argumentation text construction by Japanese as foreign language writers: A dynamic view of transfer. Modern Language Journal, 99, 213–245.
Roca de Larios, J., Murphy, L., & Manchón, R. M. (1999). The use of restructuring strategies in EFL writing: A study of Spanish learners of English as a foreign language. Journal of Second Language Writing, 8, 13–44.
Roca de Larios, J., Manchón, R. M., Murphy, L., & Marín, J. (2008). The foreign language writer’s strategic behavior in the allocation of time to writing processes. Journal of Second Language Writing, 17, 30–47.
Ryshina-Pankova, M. (2014). Exploring academic argumentation in course-related blogs through ENGAGEMENT. In G. Thompson & L. Alba-Juez (Eds.), Evaluation in context (pp. 281–302). John Benjamins.
Sanders, T., & Schilperoord, J. (2006). Text structure as a window on the cognition of writing: How text analysis provides insights in writing products and writing processes. In C. A. MacArthur, S. Graham, & J. Fitzgerald (Eds.), The handbook of writing research (pp. 386–402). Guilford Publications.
Stapleton, P., & Wu, A. (2015). Assessing the quality of arguments in students’ persuasive writing: A case study analyzing the relationship between surface structure and substance. Journal of English for Academic Purposes, 17, 12–23.
Tardy, C. M. (2009). Building genre knowledge. Parlor Press.


Tardy, C., Sommer-Farias, B., & Gevers, J. (2020). Teaching and rehearsing genre knowledge: Towards an enhanced theoretical framework. Written Communication, 37(3), 287–321.
Tillema, M. (2012). Writing in first and second language: Empirical studies on text quality and writing processes (Doctoral dissertation). Utrecht University, LOT Publications.
Tiryakioglu, G., Peters, E., & Verschaffel, L. (2019). The effect of L2 proficiency level on composing processes of EFL learners: Data from keystroke loggings, think alouds and questionnaires. In E. Lindgren & K. Sullivan (Eds.), Observing writing: Insights from keystroke logging and handwriting (pp. 212–235). Brill.
Torrance, M. (2016). Understanding planning in text production. In C. MacArthur, S. Graham, & J. Fitzgerald (Eds.), Handbook of writing research (2nd ed., pp. 72–87). Guilford Press.
Uysal, H. H. (2012). Argumentation across L1 and L2 writing: Exploring cultural influences and transfer issues. Vigo International Journal of Applied Linguistics, 9, 133–159.
Van Weijen, D., Rijlaarsdam, G., & van den Bergh, H. (2018). Source use and argumentation behavior in L1 and L2 writing: A within-writer comparison. Reading and Writing: An Interdisciplinary Journal, 32(6), 1635–1655.
Van Wijk, C. (1999). Conceptual processes in argumentation: A developmental perspective. In M. Torrance & D. Galbraith (Eds.), Knowing what to write: Conceptual processes in text production (pp. 31–49). Amsterdam University Press.
Voss, J. F. (2005). Toulmin’s model and the solving of ill-structured problems. Argumentation, 19, 321–329.
Wolfe, C. R., Britt, A. M., & Butler, J. A. (2009). Argumentation schema and the myside bias in written argumentation. Written Communication, 26(2), 183–209.
Wolfersberger, M. A. (2007). Second language writing from sources: An ethnographic study of an argument essay task (Doctoral dissertation). University of Auckland.


chapter 11

Affordances and limitations when using Inputlog to study young learners’ pausing behavior in L2 writing Aitor Garcés, Raquel Criado & Rosa M. Manchón University of Murcia

This chapter focuses on methodological considerations in a study in which keystroke-logging data were used to analyze young English as a foreign language (EFL) learners’ pausing behavior while writing in their L2. We first present the rationale behind the study and subsequently discuss methodological considerations in the operationalization of the construct of pausing behavior, challenges and problems related to data analysis, and the solutions adopted. In the final part, we suggest directions for further research.

Overview of the study

Rationale

A strand of research in the study of L2 writing processes has looked at the phenomenon from the angle of pausing behavior, which is an important element of writing (Barkaoui, 2019; Révész et al., 2021). Pausing behavior can offer tangible, albeit indirect, evidence of the underlying cognitive processes contemplated in cognitive models of writing, such as planning, transcription, or revision (Chenoweth & Hayes, 2001; Flower & Hayes, 1980; Hayes, 2012; Kellogg, 1996). This is so because pauses, understood as the absence of physical activity during the writing event (Medimorec & Risko, 2017), constitute an important element of the writing process: they occupy three-quarters of the time spent on writing (Alamargot et al., 2007). Additionally, pauses are regarded as problem indicators, since problems in the writing process are usually accompanied by, and reflected in, interruptions. As a result, they may be indicative of cognitive activity – for instance, when searching for the most appropriate word in the L2 (Alves et al., 2007; Wengelin, 2006).
https://doi.org/10.1075/rmal.5.11gar © 2023 John Benjamins Publishing Company


The keystroke-logging software used in our study, Inputlog 8.0, is a useful tool for the study of pausing behavior. As discussed in Chapter 8 (this volume), the main strength of Inputlog lies in its potential to record pausing and revision behaviors unobtrusively, as they happen in real time, for later observation and study (Leijten & Van Waes, 2013; Van Waes et al., 2011). Inputlog is integrated within an ordinary word processor (e.g., MS Word) and records all online writing behaviors (keystrokes, mouse clicks, pauses, deleted and inserted characters, etc.), with no restrictions as to where the writing action takes place (Leijten & Van Waes, 2013). The fact that L2 writers write in a familiar environment such as MS Word, and that no visible traces of Inputlog appear on the computer while they are writing, makes it an ecologically valid procedure for obtaining data about writing behaviors, from which the cognitive processes underlying writing can be inferred. Importantly, a useful feature of Inputlog is that it captures quantitative data about pause frequency, location, and duration. More specifically, the pause analysis includes relevant information about pausing behavior at different text boundaries (within words, before words, before sentences, and before paragraphs) as well as intervals between pauses. This information includes the number of pauses and the mean pause duration in each location, among other data. Keystroke-logging software also allows researchers to set a specific pause threshold, or minimum pause length. For instance, Inputlog allows setting the pause threshold at 2000 milliseconds, thus restricting the analysis to pauses above this value. Other pause thresholds are equally possible depending on the purpose of the study (see Van Waes & Leijten, 2015).
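To make these metrics concrete, the following minimal sketch derives pause frequency and mean pause duration above a 2000-ms threshold. The event format (millisecond timestamp, key) is our own illustration for expository purposes, not Inputlog’s .idx or XML format:

```python
# Minimal sketch: derive pause counts and mean pause duration from
# timestamped keystrokes, as a pause analysis does. The event format
# (millisecond timestamp, key) is hypothetical, not Inputlog's own.

def pauses(events, threshold_ms=2000):
    """Return the inter-keystroke intervals at or above threshold_ms."""
    gaps = [t2 - t1 for (t1, _), (t2, _) in zip(events, events[1:])]
    return [g for g in gaps if g >= threshold_ms]

log = [(0, "T"), (150, "h"), (300, "e"), (2800, " "),      # 2500 ms pause
       (2950, "d"), (3100, "o"), (3250, "g"), (7000, " ")]  # 3750 ms pause

long_pauses = pauses(log, threshold_ms=2000)
n_pauses = len(long_pauses)            # pause frequency
mean_ms = sum(long_pauses) / n_pauses  # mean pause duration
```

Raising or lowering `threshold_ms` reproduces the effect, discussed below, of restricting the analysis to more or less cognitively demanding pauses.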
All the data captured by Inputlog are collected in an .idx file (Inputlog’s file format), which can be further processed into an XML file. To this end, Inputlog offers a wide variety of analyses, of which the following were used in the study reported in this chapter: general analysis, summary analysis, pause analysis, and revision analysis. These analyses provide fine-grained data about the writing process itself from several perspectives (Leijten & Van Waes, 2013; Van Waes & Leijten, 2015). Thus, “general analysis” includes detailed information about all the keystrokes, mouse movements, and timestamps in a linear manner. “Summary analysis” offers information about the total process time; the number of characters, words, sentences, and paragraphs produced (in the final and linear text, and per minute); the length of pauses; and revision bursts (Hayes & Chenoweth, 2006). “Pause analysis” includes an account of pausing processes and behavior, with detailed data about the number of pauses, the mean pause length, and the location of pauses, such as within words or between sentences (Van Waes & Leijten, 2015). Finally, “revision analysis” presents a summary of the types of revisions incorporated,


including information such as the number of words deleted and inserted, or the duration of each revision action. Revision analysis also includes S-Notation (Kollberg & Eklundh, 2002), which offers a linear display of the text indicating the breaks in the text, their number, and the order in which they occurred. Breaks are pauses or interruptions of normal text production to insert or delete a fragment from the text produced so far (Leijten & Van Waes, 2019, pp. 43 and 102). An overview of S-Notation can be seen in Figure 1 below; breaks are represented with the symbol “|”, and the number corresponds to the order in which they occurred.

[Figure 1 reproduces a raw S-Notation extract from Inputlog; the notation does not survive plain-text reproduction and is omitted here.]

Figure 1. Extract from S-Notation (Linear Analysis) from Inputlog

The Inputlog analyses alluded to in the previous paragraphs are central to processing the data obtained on the writing process. Thus, our study looked into a key dimension of writing processes (pausing behavior) aided by the use of a methodological procedure (Inputlog) with attested affordances for the analysis of the phenomenon in focus.

Aims and design overview

Previous research has primarily explored pausing behavior in adults’ L2 writing in digital environments in one-time writing events (e.g., Barkaoui, 2019; Révész et al., 2019; Xu & Ding, 2014; Xu & Qi, 2017), that is, while writers composed their texts. Yet, as noted in Chapter 1 (this volume), writing processes include cognitive activity while writing, while processing the feedback provided on one’s own writing, and while revising a previously produced text on the basis of the feedback received. To the best of our knowledge, no studies have examined the dynamics of pausing behavior when learners are provided with written corrective feedback (WCF). We attempted to fill these research gaps by analyzing child L2 writers’ (aged 10–11) pausing behavior before and after receiving WCF on their texts. Our study hence intended to generate new empirical evidence on pausing behavior with a new population (young L2 writers) and in relation to a new dimension


of the writing process (comparing pausing behavior while writing and while revising a text after receiving feedback). To do so, we designed and carried out a study in which child L2 English writers were asked to write a story on the computer based on a six-picture-sequence prompt; they were then provided with WCF and invited to rewrite their original texts (with no access to any resources) after engaging with the WCF they had received. The WCF strategy selected was that of model texts, whose use and effects on children’s writing had previously been studied in pen-and-paper writing (e.g., Coyle et al., 2018; Coyle & Roca de Larios, 2014; García-Mayo & Labandibar, 2017; Lázaro-Ibarrola, 2021; Martínez-Esteban & Roca de Larios, 2010) but not in computer-mediated writing. To achieve our aims, we designed a classroom-based study in which data were collected from 18 participants who belonged to an intact primary classroom selected through convenience sampling. These participants were randomly assigned to either a WCF group or a non-WCF group. Having these two groups allowed us to inspect whether or not the provision of WCF had an effect on pausing behavior in the revision stage. The research design involved a three-stage data collection procedure, as shown in Figure 2.

Figure 2. Research design



Stage 1. Participants wrote a story based on a six-picture-sequence prompt on the computer using Inputlog (Leijten et al., 2012; Van Waes et al., 2011) in an MS Word-based environment. The participants were provided with the task






prompt (printed on DIN A4 paper) together with the task instructions (in Spanish, to avoid any potential misunderstandings at the time of writing), and were given 30 minutes to complete their writing. Stage 2. This stage took place twenty-four hours after Stage 1 and corresponded to the comparison stage (cf. Cánovas, 2018; Coyle et al., 2018; García Mayo & Labandibar, 2017). In this stage, children in the WCF group were taken to the computer room, where they found their initial texts open on the computer screen. They were then provided with a printed DIN A4 sheet that contained the model text as the corrective feedback, and a prompt to guide them in the process of identifying inaccuracies and inadequacies in their original texts, noticing differences between their texts and the model text, and providing explanations for such differences. The participants in the non-WCF group also had access to their initial texts on the computer but were not provided with any type of feedback. However, they were asked to reflect on their original texts with a prompt for self-editing which, in line with the instructions provided to the WCF group, encouraged them to identify inaccuracies and inadequacies in their original texts and to note down potential explanations and possible corrections. The participants in both groups were allowed a maximum of one hour to complete this stage. Stage 3. This stage corresponded to the rewriting stage and involved a procedure identical to that in Stage 1. Hence, all participants in both groups wrote a new text based on the same picture-based story using the same tools, i.e., Inputlog in an MS Word environment.

In the next sections, we explain how pausing behavior was operationalized in our study, together with a series of challenges and problems we faced and the solutions we adopted.

Operationalization of pausing behavior

We decided to follow Alamargot et al.’s (2007) proposal to examine pausing behavior in terms of pause duration, pause location, and pause frequency. These three dimensions are assumed to provide indirect evidence of which underlying processes may be occurring (Révész et al., 2019). Thus, there is substantial empirical evidence that pausing at larger textual units (i.e., between sentences and paragraphs) corresponds to higher-order writing processes, such as planning or text organization. In contrast, pauses located at lower textual units (i.e., between and within words) are thought to reflect writers’ engagement in lower-order writing processes, such as revising spelling (e.g., Barkaoui, 2019; Révész et al., 2019).


The way we operationalized each of these dimensions of pausing behavior is detailed in the following sections. For now, let us simply note that pause duration can be seen as an indicator of the writer’s cognitive effort as a function of the complexity of the process in question. The location of pauses in writing can point to the underlying cognitive processes involved, such as planning, formulation, or revision (Barkaoui, 2019). For instance, pausing at larger textual units, such as clauses, sentences, or paragraphs, might suggest a planning process (Révész et al., 2019). Another important decision we had to take for the operationalization of pauses was the specific pause threshold to be set, that is, the minimum length of a pause to be considered for analysis. This is a crucial parameter that has attracted attention in previous work and about which there is no full agreement, which renders the comparability of findings from different studies difficult (see Medimorec & Risko, 2017). Given the absence of previous research on child L2 writers in digital environments, we had to rely on the procedure followed in studies with adult L1 and L2 writers, where pause thresholds have been established between 1000 and 2000 ms (Barkaoui, 2019; Strömqvist et al., 2006). According to Van Waes and Leijten (2015), lower cognitive processes are generally located between 1000 and 2000 ms and reflect motor activities, while pauses above 2000 ms are indicative of more cognitively demanding processes. We opted for a pause threshold of 2000 ms since this is the benchmark used in previous research to capture underlying cognitive processes such as planning or formulation rather than motoric issues (Alamargot et al., 2007; Barkaoui, 2019). Previous research (e.g., Michel et al., 2020) has also relied on 200 ms, a sufficiently low pause threshold to capture low-level processes (Van Waes & Leijten, 2015). However, as pointed out by Michel et al. (2020), a 200-ms pause threshold does not allow for comparisons with much of the previous research on L2 pausing behavior, especially that concerning between-word pauses. In our view, a dual perspective should be adopted: applying larger pause thresholds to measures of cognitively driven pausing behavior, and lower pause thresholds to capture motoric processes. Inputlog allowed us to comply with the two approaches advocated by Chenu et al. (2014) for the study of pausing behavior: temporal and linguistic. The former is based on the definition of a threshold and the observation of pause locations, while the latter examines pause duration within pause locations, as defined by text boundaries/structural units (e.g., within and between words, between sentences, and between paragraphs).
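The dual-threshold perspective can be sketched as follows. The 200-ms and 2000-ms cut-offs are the values discussed above; the category labels are our own illustrative choices, not an established coding scheme:

```python
# Sketch of a dual-threshold classification of inter-keystroke intervals:
# intervals of 200-2000 ms are treated as lower-level/motoric activity,
# and intervals above 2000 ms as candidates for higher-order cognitive
# processes. The labels are illustrative, not a standard taxonomy.

MOTOR_MS, COGNITIVE_MS = 200, 2000

def classify_interval(ms):
    if ms >= COGNITIVE_MS:
        return "cognitive"
    if ms >= MOTOR_MS:
        return "motoric"
    return "transition"  # below any pause threshold

intervals = [120, 450, 1800, 2600, 95, 5100]
labels = [classify_interval(ms) for ms in intervals]
```

A study adopting this dual perspective would then report the higher-threshold pauses for planning and formulation measures, and the lower-threshold intervals for typing-fluency measures.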


Challenges when analyzing Inputlog data

Handling and analyzing Inputlog data involve several important concerns and corresponding methodological decisions. For instance, as noted by Leijten and Van Waes (2013, p. 7), while Inputlog “allows for very detailed analyses,” the side effect is that “the huge amount of data [collected] is sometimes hard to interpret.” This is one of the reasons why being selective about the data to be exploited in research terms is one of the main concerns for researchers using Inputlog (see also Chapter 2, this volume). Further challenges when collecting data about children’s pausing behavior with Inputlog include the need to control for children’s keyboarding skills, as previous research (Hayes, 2006; Kellogg et al., 2013; Torrance & Galbraith, 2006) has shown that the degree of automatization of these skills exerts a strong influence on writing processes (cf. Barkaoui, 2016, for revision processes). Alves et al. (2007), in their study of the influence of keyboarding skill on pause–execution cycles in adult L2 writing, found that participants with low keyboarding skills focused their attention on the motor aspects of the typing event at the expense of higher-order processes such as planning or revision. In the case of children, to the best of our knowledge, the influence of transcription skills on writing processes has been studied solely from the perspective of handwriting (Alves & Limpo, 2015; Olive et al., 2009), and the results are similar to those of adult participants: higher automaticity of handwriting skill resulted in longer bursts, shorter pauses, and augmented fluency. However, there seems to be general agreement that child writers – be it in their L1 or L2 – are very unlikely to have automatized transcription skills, either handwriting or typing (Kellogg et al., 2013).
Thus, children’s lack of automaticity in motor skills may result in more frequent pausing, as their typing rate is considerably lower than that of adults in L2 writing. To address this important concern, we decided not to include a separate typing-skill test prior to the implementation of our study but instead to rely on the interkeystroke interval, or IKI (Conijn et al., 2019; Vandermeulen et al., 2020). The IKI measures the transition times between keystrokes, i.e., interkey-transition times or pauses (Van Waes & Leijten, 2015), so that motoric issues can be identified more clearly. However, this measure is not devoid of methodological difficulties when applied to children. For example, if the transitions are captured during the completion of a writing task, the presence of extraneous variables – such as cognitive effort or L2 proficiency-related problems – may interfere with the correct interpretation of the results. In other words, it is hard to distinguish between pauses due to cognitive effort and pauses resulting from limited motoric abilities. An additional difficulty in our study was that the


children had the picture-based story on paper rather than on the screen, which meant that transition times between keystrokes could also be affected by the temporary distraction of alternating between looking at the prompt on paper and turning to the screen to engage in writing processes such as planning or formulation. Certainly, the absence of studies on digital writing with children makes it challenging to identify the most suitable way of measuring their keyboarding skills. Although we opted for the IKI, a possible alternative would be to include a typing test adapted to the characteristics of this specific population. Bearing in mind that young learners might not be used to writing long texts on the computer, such an adaptation could involve copying several words and observing typing speed, as allowed by the “copy task” function of Inputlog. All in all, whatever test is used, researchers ought to measure child writers’ typing skills not just for descriptive purposes (as we did), but also to ensure that the potential effect of this variable on pauses and on text quality is controlled for. The next three sections describe the decisions we took regarding the three dimensions of pausing behavior mentioned above, namely, pause location, pause frequency, and pause duration.
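Before turning to those dimensions, the IKI-based screening of keyboarding fluency discussed above can be illustrated with a minimal sketch. The timestamped keystrokes are hypothetical, and the 2000-ms ceiling, the restriction to letter keys, and the use of the median are our own simplifications, not a procedure taken from the studies cited:

```python
# Sketch: estimate keyboarding fluency from interkeystroke intervals (IKIs).
# To reduce contamination from cognitively motivated pauses, only short
# intervals between letter keys are retained before taking the median.
from statistics import median

def typing_fluency_iki(events, ceiling_ms=2000):
    ikis = []
    for (t1, k1), (t2, k2) in zip(events, events[1:]):
        # keep only letter-to-letter transitions below the ceiling
        if k1.isalpha() and k2.isalpha() and (t2 - t1) < ceiling_ms:
            ikis.append(t2 - t1)
    return median(ikis)

log = [(0, "c"), (180, "a"), (340, "t"), (3400, " "),   # long pause excluded
       (3600, "r"), (3790, "u"), (3980, "n"), (4150, "s")]

iki_ms = typing_fluency_iki(log)
```

Even so, as argued above, such a measure cannot fully separate motoric slowness from cognitive effort when the intervals are captured during a real writing task.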

Analyzing pause location

Pause location refers to the exact location of a pause within a specific text boundary – for instance, before words or after paragraphs. Pause location was computed in our study using the output generated by Inputlog. Our coding categories included: (1) pauses within, before, and after words; (2) pauses before and after sentences; (3) pauses before and after paragraphs; and (4) other pauses (e.g., between mouse movements). Research with adult L2 writers has shown that before-word pauses usually correspond to reflective operations, such as word choice or lexical retrieval, as well as planning processes (Conijn et al., 2019; Van Waes & Leijten, 2015). In contrast, a long pause after a word may serve a dual purpose: reading/revising the text written up to that moment as well as pondering the next lexical item to be included in the text. We obtained these data directly from the Pause Logging File module in Inputlog. As can be observed in Figure 3, Inputlog gives us the number of pauses (see the section on pause frequency below) and the mean pause duration in seconds.


Figure 3. Screenshot of the data for within-word pauses in Pause Logging File

In what follows, we exemplify the coding categories for the location of pauses: within words, before and after words, before and after sentences, and before and after paragraphs. All the examples include the following two elements: (1) {numbers}, which indicates pause duration; (2) [cursor movement/key press or release], which indicates the specific movement or key used; for instance, BACK refers to deletion, and CAPS LOCK to pressing on the capital letter key (Leijten & Van Waes, 2019). Example 1. Within-word pauses (in bold) "{7945}.·[CAPS LOCK]T[CAPS LOCK]he·dog·s{2008}aw"

Example 2. Before- and after-word pauses (in bold) "{3968}the·pocion·{4168}and·{10385}he ·ch[BACK][BACK]{2152}explosin{6857}"

Example 3. Before- and after-sentence pauses (in bold) "bea[BACK]cause·the·sciente[BACK]{4120}ist·transform·to·{5008}a·cat{39797}|[Movement][LEFT Click][RETURN][Movement]{136183}[CAPS LOCK]T[CAPS LOCK]he·scient{3192}ist·{6184}start·to·have·{2184}"

Example 4. Before- and after-paragraph pauses (in bold) "{2216}tact·{12080}[BACK][BACK][BACK][BACK][BACK][BACK][BACK][BACK][BACK][BACK][BACK][BACK][BACK][BACK][BACK][BACK][BACK][BACK][BACK].[RETURN]{43272}[CAPS LOCK]T[CAPS LOCK]he·dog·[Movement]{2153}atact·to·{2487}s·c[BACK][BACK]cientist·"
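The boundary categories exemplified above can be approximated programmatically. The following sketch is our own simplification, classifying a pause from the characters immediately around it; it is not Inputlog’s actual classification algorithm, which operates on the full event log:

```python
# Sketch: classify a pause's location from the character typed just before
# it (and, for after-word pauses, the character that follows). This is an
# illustrative approximation of the boundary categories, not Inputlog's logic.

def pause_location(prev_char, next_char):
    if prev_char == "\n":
        return "before paragraphs"  # pause follows a paragraph break
    if prev_char in ".!?":
        return "before sentences"   # pause follows sentence-final punctuation
    if prev_char == " ":
        return "before words"       # pause follows a word boundary
    if next_char == " ":
        return "after words"        # pause between a word's last letter and a space
    return "within words"

loc1 = pause_location(" ", "d")
loc2 = pause_location("g", " ")
loc3 = pause_location("o", "g")
loc4 = pause_location(".", " ")
```

As the next example shows, however, even a rule-based classification of this kind cannot by itself disambiguate the cognitive process behind a pause.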


Despite the apparent clarity of these examples regarding pause location, numerous inexactitudes in the Inputlog data may lead to wrong or inaccurate coding. Let us illustrate this issue with an example from the “Revision-based Linear Analysis”, which allows us to observe the process recorded in real time along with the keypresses, the location, and the duration of the different pauses:

Example 5. Revision-based Linear Analysis
,·{4993}a·crazy·{3960}sc{2779}ientist·{10089}is·in·{6497}her·laboratary·whi[Movement]t[BACK][BACK][BACK]ith·{2993}her·dog·{3816}doing·experiments·[Movement][BACK].{2232}[RETURN]{10481}one·dia·{3249}

Looking at the highlighted part of the output of the Linear Analysis, pause {2232} is classified by Inputlog as BEFORE PARAGRAPHS since it precedes the [RETURN] key press (equated with [ENTER] on some keyboards). The following pause, {10481}, is classified as BEFORE WORDS since it already falls within the next paragraph. Nevertheless, from the perspective of the underlying cognitive processes they may represent, both could fall into the category of BEFORE PARAGRAPHS, inasmuch as the objective of the BEFORE-WORD pause could be threefold: (1) reflecting processes prior to transcription, such as global planning or formulation; (2) looking back at the task prompt, which was printed; or (3) looking backwards at the already written text (revising). This is one of the reasons why the task of interpreting a potential writing process behind each pause proves challenging, as noted above.

Indeed, our decision to provide the writing prompt on paper had some ecological validity, since the children in our study were used to completing writing tasks presented to them on printed paper. However, this procedure may also have acted as a confounding variable, as the children may have extended their pausing time before words precisely by looking down at the printed prompt. To solve this problem, the writing prompt might be provided on the screen in future studies and controlled through the so-called source analysis that Inputlog offers. It would then be possible to interpret some pauses as being directly linked to changing the window on the screen, and the researcher would gain deeper insights into how much time children devote to looking at the prompt and, potentially, to planning processes.

Following on from Example 5, it becomes evident that the unobtrusive nature of keystroke logging software and its corresponding ecological validity, while advantageous, are insufficient for a full interpretation of the writing event.
Thus, in line with the arguments used in Chapter 8 (this volume), we advocate


Aitor Garcés, Raquel Criado & Rosa M. Manchón

data triangulation with verbal reports to allow researchers to identify the writing processes related to cognitive operations and their association with a specific writing action (e.g., a long or short pause at a specific text boundary). Past research into L2 adult writing has mainly relied on two methodological approaches to verbal reports: (1) retrospective interviews on Inputlog data, in which participants are asked about specific events during writing (Révész et al., 2017); and (2) stimulated recalls, which include a full replay of the writing process (Révész et al., 2019).

In our study, verbal reports in the form of stimulated recall were used in the piloting stages. However, the data obtained were not sufficient to align the different pausing data with cognitive operations throughout the writing process, perhaps because children's (meta)cognitive abilities are not as developed as adults' and, for that reason, children tend to pay less focused and conscious attention to the writing process itself. As a result, the children in our study were not able to provide sufficiently accurate explanations of what they were thinking (Gass & Mackey, 2017), even when asked to do so in their own L1. Given the questionable usefulness of the stimulated recall data gathered in the piloting stage, we opted not to use this technique after the rewriting stage. Hence, it remains challenging to understand what these pauses may represent and what reasons might have led the children in our study to pause at certain points throughout the composition. It is also unfortunate that we could not replicate the procedure used in previous L2 adult writing studies, where data from keystroke logging software have been combined with verbal reports and eye tracking (e.g., Gánem-Gutiérrez & Gilmore, 2018; Michel et al., 2020; Révész et al., 2017; Révész et al., 2019).
Previous attempts in our own research group to use eye tracking with children had not been particularly successful due to the children's lack of comfort with the technique, and this line of research was eventually discarded. In our study, we therefore relied solely on the automatically obtained data from Inputlog, a decision which, in turn, gave rise to a number of crucial concerns to be addressed in future research with L2 children writers. In this respect, Galbraith et al. (2021) have suggested that, in L2 adult writing, Inputlog data should be processed manually and, when necessary, recoded, given the number of issues that may arise in the classification of pauses. This may be a sound solution, especially with children's writing, where pause location may not be successfully categorized, as will be exemplified below. Indeed, our recommendation for future studies would be to include manual data checking as part of the process. Thus, discarding pauses at certain paragraph levels, or recoding pauses at word boundaries as potential sentence pauses, should be a key concern in analyzing children's Inputlog data. Inputlog classifies this information on the basis of an algorithm that recognizes the location of the pause and the immediate production thereafter (Leijten


& Van Waes, 2013). As suggested by a reviewer, another possibility to obtain more accurate data on children's behaviors – and thus, indirectly, on their writing processes – would be to screen-record what they are actually doing, which can be achieved through the Replay function of Inputlog.

Besides their lack of automatization of keyboarding skills (Kellogg et al., 2013), which may influence the attention paid to certain elements of writing, L2 children writers also tend to omit some punctuation marks or err in their use. Technically, this means that Inputlog may fail to determine whether a certain interruption is due to a before-word or a before-sentence pause. To exemplify this issue, let us focus on the textual context surrounding the pauses in Example 6 below.

Example 6. Extract from a linear view of the writing process
·and·{52040}an[BACK][BACK]{3240}the·{14928}[BACK][BACK][BACK][BACK]{2272}[BACK][BACK][BACK][BACK][BACK],·after·{10592}the·{7704}sc{48087}ienti{3648}st·{8936}is·{13880}very·very·b

As can be observed, pause {2272} appears before a series of key presses labelled [BACK], which are associated with revision processes. Nevertheless, despite this obvious option, this pause could also have been considered a BEFORE SENTENCE pause. One of the main reasons why Inputlog categorized this pause as a [BACK] behavior alone lies in the erroneous use of the comma before "after." Had the follow-up text started with a new sentence, Inputlog would have classified the pause in the before-sentence category. Thus, future research that examines children's L2 texts using Inputlog – but also L2 adult writing – should reassess the value of such BEFORE WORD pauses and potentially consider them as BEFORE SENTENCE pauses. Once again, drawing on stimulated recall or using screen recordings (see, for example, Gánem-Gutiérrez & Gilmore, 2018) could contribute to a more accurate interpretation.

Location ratio was another measure included in our study, following previous research with adult L2 digital writing (see Barkaoui, 2019; Chenu et al., 2014), as stated in the "Operationalization of pausing behavior" section. We calculated this measure manually: the total number of pauses at a specific location (within and between words, between sentences, and between paragraphs) was divided by the total number of pauses, yielding a ratio (expressed as a percentage). The usefulness of this measure lies in standardizing the number of pauses at a specific location rather than simply relying on raw counts.
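The location-ratio computation just described can be sketched in a few lines of Python. The counts below are hypothetical and serve only to illustrate the arithmetic (pauses at each location divided by the total pause count, expressed as a percentage).

```python
# Location ratio as described above: the number of pauses at each textual
# location divided by the total number of pauses, as a percentage.
# The counts are hypothetical, for illustration only.
counts = {
    "within_word": 42,
    "between_words": 31,
    "between_sentences": 9,
    "between_paragraphs": 3,
}

total = sum(counts.values())  # 85 pauses in this invented example
location_ratio = {loc: round(100 * n / total, 1) for loc, n in counts.items()}
print(location_ratio)
# {'within_word': 49.4, 'between_words': 36.5,
#  'between_sentences': 10.6, 'between_paragraphs': 3.5}
```

Expressing the counts as percentages in this way makes pausing profiles comparable across children who produce texts of very different lengths.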


Analyzing pause frequency

The operationalization of pause frequency included (a) the total number of pauses in the process; (b) the number of pauses per interval; (c) the inter-keystroke interval (IKI), which was previously discussed; and (d) pauses per minute. Analyzing pause frequency did not pose a challenge, since the data were obtained automatically from Inputlog. The total number of pauses in the process, for instance, is provided directly, as shown in Figure 4. This measure offers a global overview of the children's pausing behavior regardless of pause location. The inclusion of pause frequency provides valuable information about how recurrently the writer pauses: pause counts reveal the level of cognitive or motoric effort engaged in by the writer, since pauses constitute interruptions of the writing process (Barkaoui, 2019).

We also opted for computing pause frequency across different intervals of the global writing process – initial, middle, and final stages – in the specific textual locations that we had defined (within and between words, between sentences, and between paragraphs). Importantly, previous research with adult L2 writers (Barkaoui, 2019; Xu, 2014; Xu & Qi, 2017) has found that proficiency level affects pause frequency and duration at different stages of writing depending on the writing process being engaged in. More skilled writers seem to pause more frequently but for less time at the beginning of the writing process, while the reverse pattern has been found in the middle stage, suggesting that planning is optimally triggered in the initial stage to guide later translation and transcription. There is no consensus in the literature as to the number of writing intervals to be set: Tillema et al. (2011) used five with secondary school learners, and Michel et al. (2020) and Xu and Qi (2014) did the same with adult learners.
In contrast, Van Waes and Leijten (2015) employed ten intervals with adult writers, while Barkaoui (2019) opted for three with the same age group. We also decided to select three pause intervals for ease of data management. To obtain pause frequency per interval, the output data provided in the Pause Logging File (Pause Analysis) from Inputlog were examined. As can be observed in Figure 4, Inputlog divides the data into intervals, reporting the pause count, mean pause time, and standard deviation, among many other descriptive statistics. Nevertheless, the only measures we took into consideration were the number of pauses and the mean pause duration (in seconds), because both allowed us to observe potential variations across the continuum of the writing process throughout three standardized intervals.
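The frequency measures (a), (b), and (d) above can be sketched as follows. The pause onset times and session length are hypothetical, and the three-interval split mirrors the design described in the text.

```python
# Pause frequency measures (a), (b) and (d) above: total pause count,
# pauses per one of three equal time intervals, and pauses per minute.
# The onset times (ms from the start of the session) are hypothetical.
pause_onsets_ms = [1200, 15000, 31000, 120000, 250000, 410000, 555000]
session_ms = 600000  # a hypothetical 10-minute session

n_intervals = 3
interval_len = session_ms / n_intervals
per_interval = [0] * n_intervals
for onset in pause_onsets_ms:
    # Assign each pause to its interval; clamp the last onset into interval 3.
    idx = min(int(onset // interval_len), n_intervals - 1)
    per_interval[idx] += 1

total_pauses = len(pause_onsets_ms)
pauses_per_minute = total_pauses / (session_ms / 60000)

print(total_pauses, per_interval, pauses_per_minute)  # 7 [4, 1, 2] 0.7
```

In this invented session, pausing is concentrated in the initial interval, which is the kind of pattern the interval-based comparison described above is designed to reveal.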


Figure 4. Summary per interval from Pause Logging File

Finally, pauses per minute was another measure included in the operationalization of pause frequency, which we calculated by dividing the total number of


pauses by the length of the writing session in minutes. This measure has been used in pausing research with adult L2 writers (e.g., Barkaoui, 2019), and its inclusion offered us an additional perspective on pause frequency in relation to fluency (Van Waes & Leijten, 2015). Pauses represent interruptions of the writing process for a wide variety of reasons; thus, a large number of pauses per minute indicates reduced fluency, possibly as a result of a high degree of cognitive effort triggered by different writing processes.

Although the measurement of the four dimensions of pause frequency did not pose any methodological problems, the interpretation of pause frequency data in terms of writing processes was an important issue, since inferences had to be made from the mere existence of pause frequency at different writing intervals and text boundaries. As stated in the section "Analyzing pause location," triangulating data from the keystroke logging software and retrospective reports may help researchers identify the writing processes related to cognitive operations and their association with a specific writing event (e.g., a long or short pause at a specific text boundary). However, the use of retrospective verbal reports such as stimulated recall, especially in studies in which feedback correction is central, may lead to more noticing and language awareness and, as a result, more revision (Lindgren & Sullivan, 2003). This may have a counter-effect insofar as the stimulated recall may interfere with the effect of the feedback stage if the researcher's intention is solely to gauge the isolated effects of the feedback intervention on the learners' uptake (without any additional instruments).
A methodological solution to this issue, which can help researchers identify the precise counter-effect of stimulated recall, consists in including four groups in the research design, crossing feedback with stimulated recall: feedback and non-feedback groups, each with and without stimulated recall.

Figures 5 and 6 include Inputlog data that can provide useful information for preparing the prompts used in stimulated recalls in order to tap into the different reasons behind these pauses and their potential connection with writing processes. In Figure 5, a revision-based linear analysis provides a structured view of how pauses are distributed throughout the process. As can be observed, the preparation of the questions to be asked in the stimulated recall session is facilitated by the visual presentation of such pauses and of the keypresses used, such as [CAPS LOCK], which indicates the use of capital letters, and the [Movement] action, which points to mouse movement. Researchers are expected to review all these data before Stage 2, which is methodologically an arduous task on their part. Nevertheless, children's texts and pausing behavior are much more limited than teenagers' or adults', which would make it feasible to review these data despite the short time between Stage 1 and Stage 2.


Figure 5. Revision-based analysis output: Information to prepare the prompts for stimulated recall

Similarly, the position of revisions, as in Figure 6 below, may help understand the interpretation behind these pauses. If the child is asked about a specific pausing behavior, the visual prompts could aid in deciphering the writing process engaged in. Let us imagine that, during a stimulated recall session, a child is asked about why s/he specifically paused for nearly 10 seconds before writing the word “a possion” (sic) in line 5. Although, as noted above, children are not fully cognizant of the type of writing process they engage in (e.g., planning, lexical retrieval, etc.), they may nevertheless provide an explanatory answer that serves the researcher as a starting point to classify this specific pause. [[hdj]1|1A·boy[·]2|2·]3|3The·boy·is·do·a·possion[·]4|4.·T[HE·]5|5he·boy·is·very·happy·[but·the n·]6|6 and·his·dog·is·slepping[[·but·then·]7|7·]8|8.[ks]9|9·T[HEW]10|10hen·the·boy·is·bad·for·th e·possion[[·]11|11,]18|19{·because·h[i]29|30{e·drink·the·po{ssion[·]33|34}32|33}30}28|29{·}20|2 1{and}19|20·[uhuw]12|12then·the{·b[iy]14|15{oy[·]16|17{·and·[y]22|23{the·dog·}23|24}21|22}15|16}1 3|14·[is]24|25 {are·}25|26·in·alucinatio[ns·|13]17ns[·|18]26·for·the·po[[ssio[·]27|27n·[|28ys8w]3 1|31|32]34t[on]34|35ion]36|36ssion{·an[d·]39|40{d{·}42}40[|41}38|39·]41|42before[·]37|37·|38th e·[boy·]43|43s·body·is·for·a·cat[v]44|44·abd{·a[bd]46|47|{nd[·]48|49}47|48}45|46·the[·is·|45]49·d og·look·the·boy·an[·]50|50d·the·dog·is·fight·w[n]51|51ith·the·boy··

Figure 6. S-Notation output: Information to prepare the prompts for stimulated recall


Analyzing pause duration

Pause duration, obtained directly from the Inputlog output data, was measured as the mean duration of each pause in seconds. In both Figures 5 and 6 in the previous section, mean pause duration (in seconds) was part of the measures included in the Pause Logging File. Thus, mean duration was considered (1) for each pause location (e.g., mean duration of pauses within words, between words, between sentences, and between paragraphs); and (2) at a certain interval (e.g., mean duration of pauses at interval 2). In this respect, the importance of the previously set pause threshold should be underscored, as mean duration largely depends on this decision. As stated above in "Operationalization of pausing behavior," the pause threshold in our study was set at 2000 ms.

The mean duration of a pause may reflect the amount of cognitive effort deployed by the writer. As with adult L2 writers, pause duration in children's writing may point to different operations during composition. Nevertheless, directly associating the duration of a pause with a specific process is problematic when the interpretation is based solely on inferences from this measure. In contrast, the duration of a pause together with its location at a certain text boundary may be a more reliable indicator of cognitive processes. As reported in research with adults (e.g., Medimorec & Risko, 2017; Révész et al., 2019), long pauses at higher-level units, such as sentences and paragraphs, may indicate global planning processes, organization, or reading back (revision), thus pointing to co-occurring processes, while short pauses at lower textual units (for example, within and between words) tend to be associated with lower-level processes, such as lexical retrieval (Schilperoord, 1996). However, children's behavior while writing may differ from that of adults (Chenu et al., 2014).
For instance, in the case of children, long pauses before certain textual units (e.g., words) may not be due to high-level cognitive operations but rather to spelling issues or to problems derived from their purported lack of automatization of keyboarding skills.

Another dimension related to the mean duration of pauses concerns the pause threshold, a debate already referred to in previous sections. To distinguish between pauses potentially reflecting high-level processes and those underlying low-level processes, different pause thresholds could be applied (Medimorec & Risko, 2017). Setting the pause threshold at 250 ms, and then at 900 ms, may provide rich data about the nature of pauses potentially reflecting low-level writing processes (see Van Waes & Leijten, 2015). Obviously, as a result of using such thresholds instead of 2000 ms, the number of pauses increases at all textual locations, which may point to lexical retrieval/morphological encoding and motoric issues (the latter being the briefest pauses; Schilperoord, 2002). The association of pauses within the aforementioned thresholds with other


writing processes, such as planning or formulation, is disregarded, since these processes are thought to occur from 2000 ms onwards (Barkaoui, 2019; Medimorec & Risko, 2017; Révész et al., 2019; Schilperoord, 2002, etc.).

Apart from the operationalization of pause duration as mean duration, other global process time measures were also included in our study, namely, total process time (the total length of the writing event), total active writing time (the total time spent actively writing), and total pausing time (the amount of pausing that the children resorted to over the full writing event). All of these measures were automatically obtained from the Inputlog Pause Logging File, so no challenges were encountered in their coding or interpretation.
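The idea of applying multiple thresholds can be illustrated with a small sketch. The cut-offs (250 ms and 2000 ms) follow the values discussed above, while the IKI values and the tentative labels are our own illustrative assumptions; interpretation in practice also depends on pause location.

```python
# Illustrative classification of inter-keystroke intervals (IKIs, in ms)
# under multiple thresholds: below 250 ms treated as ordinary typing
# transitions, 250-1999 ms as candidate low-level pauses (lexical
# retrieval, motoric issues), and 2000 ms and above as candidate
# high-level pauses (planning, formulation). The labels are tentative;
# real interpretation also requires the pause's textual location.
def classify(iki_ms, low=250, high=2000):
    if iki_ms < low:
        return "transition"
    if iki_ms < high:
        return "low-level?"
    return "high-level?"

ikis = [120, 480, 950, 2150, 7945]  # hypothetical IKI values
labels = [classify(x) for x in ikis]
print(labels)
# ['transition', 'low-level?', 'low-level?', 'high-level?', 'high-level?']
```

The threshold parameters can be varied (e.g., 900 ms as an intermediate cut-off, as mentioned above) to examine how sensitive the resulting pause counts are to the chosen boundaries.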

Figure 7. General Information section in Pause Logging File

As can be seen in Figure 7, these global process time measures are provided both in minutes and in seconds, and the proportion of pause time is offered too. In essence, these measures were included owing to the rich overall information they can provide about the whole writing process, especially concerning fluency. For instance, the traditional measure of fluency, words per minute, is obtained by dividing the number of words produced in the final text by the total number of minutes spent on the writing task (active writing/transcription time [in minutes] + pausing time [in minutes]). The reader is referred to Criado et al. (2022) for more information about the different ways of operationalizing the construct of fluency, from both product-oriented (traditional) and process-oriented angles. In both perspectives, pauses play an important role in the measurement of fluency.
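The traditional words-per-minute computation described above can be sketched as follows, with hypothetical figures.

```python
# Traditional product-based fluency measure described above: words in the
# final text divided by total task time in minutes, where total task time
# is active writing (transcription) time plus pausing time. All figures
# are hypothetical, for illustration only.
words_in_final_text = 96
active_writing_min = 7.5
pausing_min = 4.5

total_task_min = active_writing_min + pausing_min
words_per_minute = words_in_final_text / total_task_min
print(words_per_minute)  # 8.0
```

Note that because pausing time enters the denominator, two children producing texts of equal length can differ sharply on this measure if one pauses much more than the other.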


Conclusion

In this chapter, we have reported some methodological considerations on the use of Inputlog to study children's pausing behavior while engaged in digital L2 writing. Based on our experience, we would like to put forward a number of recommendations for future research.

Firstly, we recommend that other researchers measure children writers' typing speed not just for descriptive purposes, as we did, but also to control for its potential effects on their pausing behavior.

Secondly, besides analytical measures, another crucial methodological decision for the interpretation of pausing behavior concerns the pause threshold, which, combined with pause location, might help researchers infer the underlying physical or cognitive process behind each pause. We set our pause threshold at 2000 ms, as done in research with adult L2 writers (e.g., Barkaoui, 2019; Gánem-Gutiérrez & Gilmore, 2018). However, interpreting children's higher-level and lower-level processes is not an easy task. For future research, we recommend establishing multiple pause thresholds, a procedure previously implemented by Medimorec and Risko (2017) with adult writers. Due to L2 children writers' lack of automatization of keyboarding skills and limited L2 proficiency, we suggest setting one pause threshold ranging from 250 ms to 1500 ms, and a second one for pauses above 1600 ms. Together, these would be enough to uncover L2 children writers' low-level processes and motoric issues, on the one hand, and higher-level processes, on the other.

Thirdly, the data on pause location, frequency, and duration, together with the pause threshold and the unobtrusive nature of keystroke logging software, allowed us to obtain useful information about where, when, and why children paused. Nevertheless, as noted above, some of these data require manual coding to avoid technical inexactitudes that, in some cases, may lead to confusion in their interpretation.
As has been mentioned throughout this chapter, children's writing tends to be more erratic than that of adults. Especially in the case of punctuation (be it in the L1 or in the L2), such mistakes bear crucial importance for the correct analysis of pauses, as they may lead Inputlog to confuse, for instance, before-sentence with before-word pauses. Hence, we recommend manually reassessing the value of these pauses and their text boundaries. As can be seen from our second and third recommendations, we concur with Chenu et al.'s (2014) advice to study pausing behavior both from linguistic perspectives (by examining pause duration at certain text boundaries, such as within and between words, between sentences, and between paragraphs) and temporal ones (by setting pause thresholds and observing where pauses at or above such thresholds occur).


Fourthly, and equally relevant, the robustness of findings may be enhanced by combining Inputlog data with data elicited via other techniques used with adult L2 writers, such as stimulated recall. Given the absence of previous studies on L2 children's digital writing, research would benefit from carefully piloting and fine-tuning the stimulated recall procedure to optimize its research potential in view of the characteristics of this age group. Triangulation of methodological procedures may assist researchers in determining the likely writing processes of which pauses may be indicative.

Finally, future research should explore the connection between pausing behavior and overall writing quality, as measured, for example, by scores on writing tests. Previous research has attempted to shed light on whether pausing behavior may be linked to text quality (e.g., Révész, Michel et al., 2019; Xu & Ding, 2014; Xu & Qi, 2017), but most of these studies have been conducted with adult populations. Examining text quality and pausing behavior in children's L2 writing is thus an important gap that future research should fill. Despite the notable challenges posed by keystroke logging software in research with children, we hope that the analysis of our research process detailed in this chapter can be useful to other researchers interested in the study of young learners' writing processes.

Funding

This chapter is based on the first author's PhD dissertation, which was supervised by the second and third authors. The PhD was conducted within the framework of two competitive research programs (PID2019-104353GB-I00 and 20832/PI/18), financed by the Spanish National Research Agency and Fundación Séneca respectively, whose PI is the third author.

References

Alamargot, D., Dansac, C., Chesnet, D., & Fayol, M. (2007). Parallel processing before and after pauses: A combined analysis of graphomotor and eye movements during procedural text production. In M. Torrance, L. Van Waes, & D. Galbraith (Eds.), Writing and cognition: Research and applications (pp. 13–29). Elsevier.
Alves, R. A., Castro, S. L., de Sousa, L., & Strömqvist, S. (2007). Influence of typing skill on pause-execution cycles in written composition. In M. Torrance, D. Galbraith, & L. Van Waes (Eds.), Recent developments in writing-process research (pp. 55–65). Kluwer.
Alves, R. A., & Limpo, T. (2015). Progress in written language bursts, pauses, transcription, and written composition across schooling. Scientific Studies of Reading, 19(5), 374–391.


Barkaoui, K. (2016). What and when second-language learners revise when responding to timed writing tasks on the computer: The roles of task type, second language proficiency, and keyboarding skills. The Modern Language Journal, 100(1), 320–340.
Barkaoui, K. (2019). What can L2 writers' pausing behavior tell us about their L2 writing processes? Studies in Second Language Acquisition, 41(3), 529–554.
Cánovas, J. (2018). The use of written models in the teaching of English in Primary (Doctoral dissertation). University of Murcia. Retrieved on 27 April 2023 from http://hdl.handle.net/10201/55971
Chenoweth, N. A., & Hayes, J. R. (2001). Fluency in writing: Generating text in L1 and L2. Written Communication, 18, 80–98.
Chenu, F., Pellegrino, F., Jisa, H., & Fayol, M. (2014). Interword and intraword pause threshold in writing. Frontiers in Psychology, 5, 182.
Conijn, R., Roeser, J., & van Zaanen, M. (2019). Understanding the keystroke log: The effect of writing task on keystroke features. Reading and Writing, 32(9), 2353–2374.
Coyle, Y., & Roca de Larios, J. (2014). Exploring the role played by error correction and models on children's reported noticing and output production in an L2 writing task. Studies in Second Language Acquisition, 36(3), 451–485.
Coyle, Y., Cánovas Guirao, J., & Roca de Larios, J. (2018). Identifying the trajectories of young EFL learners across multi-stage writing and feedback processing tasks with model texts. Journal of Second Language Writing, 42, 25–43.
Criado, R., Garcés-Manzanera, A., & Plonsky, L. (2022). Models as written corrective feedback: Effects on young L2 learners' fluency in digital writing from product and process perspectives. Studies in Second Language Learning and Teaching, 12(4), 697–719.
Flower, L., & Hayes, J. (1980). The dynamics of composing: Making plans and juggling constraints. In L. Gregg & E. Steinberg (Eds.), Cognitive processes in writing (pp. 31–50). Lawrence Erlbaum Associates.
Galbraith, D., Baaijen, V., & Hall, S. (2021, May). Aligning keystroke logging data with writing processes: Methodological reflections for future research on L2 writing processes [Conference presentation]. L2 Writing Research Seminar: Advancing Research in L2 Writing and WCF Appropriation in Pen and Paper and Digital Environments: Controlled and Classroom-based Studies, Murcia, Spain. https://eventos.um.es/event_detail/56004/detail/l2wr-seminar.html
Gánem-Gutiérrez, G. A., & Gilmore, A. (2018). Tracking the real-time evolution of a writing event: Second language writers at different proficiency levels. Language Learning, 68(2), 469–506.
García-Mayo, M. P., & Labandibar, U. L. (2017). The use of models as written corrective feedback in English as a foreign language (EFL) writing. Annual Review of Applied Linguistics, 37, 110–127.
Gass, S. M., & Mackey, A. (2017). Stimulated-recall methodology in second language research (2nd ed.). Routledge.
Hayes, J. R. (2006). New directions in writing theory. In C. A. MacArthur, S. Graham, & J. Fitzgerald (Eds.), Handbook of writing research (pp. 28–40). The Guilford Press.
Hayes, J. R. (2012). Modeling and remodeling writing. Written Communication, 29(3), 369–388.
Hayes, J. R., & Chenoweth, N. A. (2006). Is working memory involved in the transcribing and editing of texts? Written Communication, 23(2), 135–149.


Kellogg, R. T. (1996). A model of working memory in writing. In C. M. Levy & S. Ransdell (Eds.), The science of writing: Theories, methods, individual differences and applications (pp. 57–71). Lawrence Erlbaum Associates.
Kellogg, R. T., Whiteford, A. P., Turner, C. E., Cahill, M., & Mertens, A. (2013). Working memory in written composition: A progress report. Journal of Writing Research, 5(2), 159–190.
Kollberg, P., & Eklundh, K. S. (2002). Studying writers' revising patterns with S-Notation analysis. In T. Olive & C. M. Levy (Eds.), Contemporary tools and techniques for studying writing (pp. 89–104). Springer.
Lázaro-Ibarrola, A. (2021). Model texts in collaborative and individual writing among EFL children: Noticing, incorporations, and draft quality. International Review of Applied Linguistics in Language Teaching.
Leijten, M., Macken, L., Hoste, V., Van Horenbeeck, E., & Van Waes, L. (2012, April). From character to word level: Enabling the linguistic analyses of Inputlog process data. In M. Piotrowski, C. Mahlow, & R. Dale (Eds.), Proceedings of the Second Workshop on Computational Linguistics and Writing (CL & W 2012): Linguistic and Cognitive Aspects of Document Creation and Document Engineering (pp. 1–8). Association for Computational Linguistics.
Leijten, M., & Van Waes, L. (2013). Keystroke logging in writing research: Using Inputlog to analyze and visualize writing processes. Written Communication, 30(3), 358–392.
Leijten, M., & Van Waes, L. (2019). Inputlog: Help documentation (in progress). Self-publishing, University of Antwerp. Retrieved on 27 April 2023 from https://www.inputlog.net/wp-content/uploads/Inputlog_manual.pdf
Lindgren, E., & Sullivan, K. P. H. (2003). Stimulated recall as a trigger for increasing noticing and language awareness in the L2 writing classroom: A case study of two young female writers. Language Awareness, 12(3–4), 172–186.
Martínez-Esteban, N., & Roca de Larios, J. (2010).
The use of models as a form of written feedback to secondary school pupils of English. International Journal of English Studies, 10(2), 143–170.
Medimorec, S., & Risko, E. F. (2017). Pauses in written composition: On the importance of where writers pause. Reading and Writing, 30(6), 1267–1285.
Michel, M., Révész, A., Lu, X., Kourtali, N. E., Lee, M., & Borges, L. (2020). Investigating L2 writing processes across independent and integrated tasks: A mixed-methods study. Second Language Research, 36(3), 307–334.
Olive, T., Favart, M., Beauvais, C., & Beauvais, L. (2009). Children's cognitive effort and fluency in writing: Effects of genre and of handwriting automatisation. Learning and Instruction, 19(4), 299–308.
Révész, A., Kourtali, N. E., & Mazgutova, D. (2017). Effects of task complexity on L2 writing behaviors and linguistic complexity. Language Learning, 67(1), 208–241.
Révész, A., Lu, X., & Pellicer-Sánchez, A. (2021). Directions for future methodologies to capture the processing dimension of L2 writing and written corrective feedback. In R. M. Manchón & C. Polio (Eds.), The Routledge handbook of second language acquisition and writing (pp. 339–355). Routledge.
Révész, A., Michel, M., & Lee, M. (2019). Exploring second language writers' pausing and revision behaviors. Studies in Second Language Acquisition, 41(3), 605–631.



Aitor Garcés, Raquel Criado & Rosa M. Manchón

Schilperoord, J. (1996). The distribution of pause time in written text production. In G. Rijlaarsdam, H. van den Bergh, & M. Couzijn (Eds.), Current research in writing: Theories, models and methodology (pp. 21–35). Amsterdam University Press.
Schilperoord, J. (2002). On the cognitive status of pauses in discourse production. In T. Olive & M. Levy (Eds.), Contemporary tools and techniques for studying writing (pp. 61–90). Kluwer.
Strömqvist, S., Holmqvist, K., Johansson, V., Karlsson, H., & Wengelin, Å. (2006). What keystroke logging can reveal about writing. In K. Sullivan & E. Lindgren (Eds.), Computer keystroke logging and writing: Methods and applications (pp. 45–71). Brill.
Tillema, M., van den Bergh, H., Rijlaarsdam, G., & Sanders, T. (2011). Relating self reports of writing behavior and online task execution using a temporal model. Metacognition and Learning, 6(3), 229–253.
Torrance, M., & Galbraith, D. (2006). The processing demands of writing. In C. A. MacArthur, S. Graham, & J. Fitzgerald (Eds.), Handbook of writing research (pp. 67–80). Guilford Press.
Van Waes, L., & Leijten, M. (2015). Fluency in writing: A multidimensional perspective on writing fluency applied to L1 and L2. Computers and Composition: An International Journal, 38, 79–95.
Van Waes, L., Leijten, M., Wengelin, Å., & Lindgren, E. (2011). Logging tools to study digital writing processes. In V. W. Berninger (Ed.), Past, present, and future contributions of cognitive writing research to cognitive psychology (pp. 507–533). Psychology Press.
Vandermeulen, N., Leijten, M., & Van Waes, L. (2020). Reporting writing process feedback in the classroom using keystroke logging data to reflect on writing processes. Journal of Writing Research, 12(1), 109–139.
Wengelin, Å. (2006). Examining pauses in writing: Theory, methods and empirical data. In K. P. Sullivan & E. Lindgren (Eds.), Computer keystroke logging and writing: Methods and applications (pp. 107–130). Elsevier.
Xu, C., & Ding, Y. (2014). An exploratory study of pauses in computer-assisted EFL writing. Language Learning & Technology, 18(3), 80–96. https://hdl.handle.net/10125/44385
Xu, C., & Qi, Y. (2017). Analyzing pauses in computer-assisted EFL writing: A computer keystroke-log perspective. Journal of Educational Technology and Society, 20(4), 24–34. https://www.jstor.org/stable/26229202

chapter 12

Investigating cognitive processes during writing tests
Methodological considerations when triangulating data from eye tracking, keystroke logging, and stimulated recalls

Elisa Guggenbichler, Kathrin Eberharter & Benjamin Kremmel
Universität Innsbruck

The purpose of this chapter is to critically discuss some of the key issues when investigating writing processes for the purposes of foreign language assessment research – a branch of research that currently tends to triangulate synchronous observational data (e.g., eye tracking, keystroke logging) with asynchronous data from stimulated verbal recalls or text analysis. We will discuss a range of methodological considerations that should be taken into account when researching foreign language writing processes in the context of language tests and beyond. We will exemplify and critically discuss key issues related to three of the methods predominantly used in this strand of research: eye tracking, keystroke logging, and stimulated recalls. We will illustrate these issues and decision-making processes at the various stages of research by critically reflecting on the lessons learned from two research projects of this kind conducted by the authors.

Introduction

The relatively recent interest in writing processes in the field of language testing (LT) is a long overdue response to a paradigm shift in writing research focusing on the processes of writing rather than just the outcomes. Gaining ground from the late 1970s onwards, the process-oriented outlook on writing is now well-established in second and foreign language (L2) writing research and pedagogy, and hence complements socio-cultural and product-oriented approaches to the skill (Roca de Larios et al., 2016). Building on this cognitive strand of writing

https://doi.org/10.1075/rmal.5.12gug
© 2023 John Benjamins Publishing Company



research, LT studies have recently adapted the models and methods from L2 writing research to investigate the cognitive processes activated during L2 writing tests (Barkaoui, 2015, 2016, 2019; Eberharter et al., 2020; Michel et al., 2020; Révész, Kourtali, & Mazgutova, 2017; Révész, Michel, & Lee, 2017, 2019; Yu et al., 2017). These LT studies pursue two major goals: First, they seek to gain insights into test takers’ writing processes and behaviours that are activated by a particular writing test for validation purposes. Second, they provide us with valuable information on how test takers interact with a particular writing test task to inform future task design and assessment practices. Knowledge gained through these studies helps test providers and LT practitioners improve the writing components of language tests to create better – i.e., more valid, fair, and reliable – methods and tasks to measure L2 writing skills.

In this chapter, we will outline methodological tools from writing research and how we used these tools in two test validation studies. We will first present the unique insights that keystroke logging, eye tracking, and stimulated recalls can provide in test validation studies. We will then discuss the methodological decisions, challenges and solutions surrounding the following issues:

i. systematising elicitation procedures for stimulated recalls,
ii. defining pause length thresholds,
iii. interpreting keystroke-logging data,
iv. interpreting stimulated recall data, and
v. comparing writing processes across languages.

Prior to the analysis of our own research, we synthesise key methodological considerations in language testing research concerned with writing processes.

Rationales and aims of process-tracing methods in language testing research

The growing interest in research on writing processes in the context of LT is linked to both the increased use of digitised assessments of writing and methodological innovations. Studies to date have looked at (a) pausing and revision behaviour (Barkaoui, 2019; Révész et al., 2019); (b) academic writing prompted by graphs (Yu et al., 2017); (c) writing across proficiency levels (Barkaoui, 2015, 2016; Eberharter et al., 2020; Michel et al., 2020); (d) writing processes across foreign languages (Guggenbichler, 2020); and (e) across different writing stages (Barkaoui, 2015; Guggenbichler, 2020; Michel et al., 2020); as well as the effects of (f) delivery mode (Barkaoui, 2016); (g) keyboard writing skills (Barkaoui, 2015); (h) working memory (Révész, Michel, & Lee, 2017); and (i) task complexity, in terms of content support (Révész, Kourtali, & Mazgutova, 2017),


integrated vs. independent tasks (Barkaoui, 2015; Michel et al., 2020), and other task type effects (Eberharter et al., 2020) for test validation purposes.

The focus of many of these studies is to establish the cognitive validity of writing tests, a concept first introduced by Weir (2005). Within the socio-cognitive framework, Shaw and Weir (2007) define cognitive validity as “a measure of how closely [a writing task in a writing test] represents the cognitive processing involved in writing contexts beyond the test itself, i.e. in performing the task in real life” (p. 34). Put differently, to probe the cognitive validity of a writing test, language testing researchers compare the processes elicited by the writing task to those activated in non-test writing conditions, as predicted by cognitive models of writing, mainly those of Kellogg (1996) and Field (2004). To this end, cognitive writing research and cognitive writing models serve as a reference point in describing the cognitive processes test takers purportedly activate. If a writing test yields cognitive processes that are not predicted by writing models, i.e., are not part of the writing construct, questions can be raised as to whether the test score based on the writer’s response, and the interpretation thereof, can be considered meaningful and generalizable, i.e., valid, beyond the test situation.

Researchers have employed a range of methods to investigate cognitive processes during writing tests. Most LT studies have in common that they use a mixed-methods approach to track writing processes, combining at least two of the following methods: verbal recalls, keystroke logging, and/or eye tracking. The individual methods provide us with distinct insights into writing tests and test takers’ cognitive processes.
Verbal reports are amongst the most common elicitation techniques in process-tracing research (see Chapter 5 in this volume for a critical discussion) and are the most prevalent technique in LT research on writing tests. Verbal reports provide additional and more profound insights into the writing processes and goals behind an action than textual products or direct observation of writing behaviour would on their own (Galbraith & Vedder, 2019). Thus, they provide a “valid window into processes and processing” (López-Serrano et al., 2019; see also Roca de Larios et al., 2001, 2006) and help LT researchers compare the processes that are triggered by a writing task under test conditions with those predicted by theoretical models of writing.

LT research has also used keystroke logging (see also Chapter 8, this volume) via software such as Inputlog 8.0 (Leijten & Van Waes, 2013) to investigate the effects of various task-related factors (e.g., genre, requested register) on test-taker performance and cognitive validity. In LT research, keystroke logging has been used to shed light on pausing and revision behaviours as indicators of L2 writing processes (Barkaoui, 2019), in relation to text quality (Eberharter et al., 2020; Révész, Michel, & Lee, 2017), proficiency levels (Eberharter et al., 2020; Révész




et al., 2019), and task type (Eberharter et al., 2020; Michel et al., 2020). As tracing writing fluency and its breakdowns allows researchers to make inferences about the cognitive load imposed on writers (Leijten & Van Waes, 2013; Révész, Michel, & Lee, 2017), keystroke-logging data can also be used to identify events of interest and challenging aspects of language production in writing tests (Révész et al., 2019; Spelman Miller et al., 2008; Xu & Xia, 2019), which may help to prompt verbal reports (Guggenbichler, 2020). This particular use of keystroke logging will be explored later in this chapter.

Eye-tracking data is distinct in that it reveals how individuals interact with visual inputs and their own writing (see Chapter 9, this volume). Frequently triangulated with keystroke-logging data, eye tracking provides LT researchers with a fuller picture of which parts of the writing task or the previously produced text a participant visited, and, if so, when, for how long, and how often, as an indicator of processing demands (see Eberharter et al., 2020). As the point of pausing might not always be identical with the focus of attention (Wengelin et al., 2009), eye tracking detects writing behaviours that would remain undetected, or might be misinterpreted, when looking at keystroke-logging data alone. In addition to serving as direct data, eye-tracking recordings are frequently used to prompt stimulated recalls (Révész, Michel, & Lee, 2017; see also Conklin et al., 2018) to improve their accuracy (Brunfaut & McCray, 2015).

Each of these methods can be used individually or in combination to trace writing processes in studies related to language tests. Researchers frequently opt to combine them so as to offset some of the shortcomings of each individual method. Although mixed-methods approaches may be considered to be “still very much in [their] infancy” (Galbraith & Vedder, 2019, p. 642), they have become a standard paradigm in the field of language testing.

Overview of research programme

We will now briefly outline two studies we recently completed (Eberharter et al., 2020; Guggenbichler, 2020) to illustrate the benefits and challenges of triangulating data from stimulated recalls, keystroke logging, and eye tracking for LT research purposes.

Eberharter et al. (2020) describes a funded test validation study which investigated the cognitive validity of the writing component of a digital multi-level English exam. The Linguaskill test suite was developed by Cambridge Assessment to cater for a broad range of proficiency levels and areas of language use. We deliberately recruited a heterogeneous sample of learners (N = 30) from different age groups (M = 29 years old), educational and professional backgrounds, and language proficiency levels (A2 to C1 according to the Common European Framework of Reference for Languages (CEFR; Council of Europe, 2001, 2020)). Participants were recorded with a Tobii TX-300 eye tracker (Tobii Technology AB, 2012) and Inputlog 8.0 (Leijten & Van Waes, 2013) while they completed a full Linguaskill writing test consisting of two tasks. Sixteen participants provided stimulated recall interviews prompted by eye-tracking recordings, which were then transcribed and coded for writing processes following a coding scheme based on a combination of Kellogg’s (1996) and Field’s (2004) models. The writing processes reported by participants could then be compared to expectations based on writing theory. In addition, we analysed the text quality of the written products, which shed further light on the complex relationship between cognitive processes, writing behaviour, task effects, and proficiency levels.

The focus of the second study (Guggenbichler, 2020) was on investigating writing processes in language tests across two foreign languages. For this purpose, a standardised CEFR B2-level task taken from the Austrian national school-leaving exam was translated and administered to learners of English and learners of French respectively at CEFR B2 (Council of Europe, 2001, 2020). The sample (N = 12) mainly included recent high-school graduates and university students in language classes aged 19 to 26 (M = 22.5 years; English n = 6; French n = 6), who were at the relevant language proficiency level and had multilingual profiles. All writing sessions were recorded with eye-tracking and keystroke-logging tools, and participants provided stimulated recalls. Following Révész, Michel, & Lee (2017), the stimulated recalls were coded and analysed for pause type (revision pause vs. formulation pause) as well as writing stages (Barkaoui, 2019; Michel et al., 2020) and writing processes (based on Kellogg, 1996; see also Révész, Michel, & Lee, 2017).
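For readers less familiar with this kind of analysis, the tabulation that such coding feeds into can be sketched as follows. The stage numbers and process labels below are invented for illustration and do not reproduce the actual coding scheme of either study:

```python
from collections import Counter

# Coded stimulated-recall segments as (writing_stage, process) pairs.
# Stages and process labels are invented for this example only.
segments = [
    (1, "planning"), (1, "translation"), (2, "translation"),
    (2, "revision"), (2, "revision"), (3, "monitoring"),
]

def process_proportions(segs, stage):
    """Relative proportion of each coded process within one writing stage."""
    codes = [process for s, process in segs if s == stage]
    counts = Counter(codes)
    return {p: n / len(codes) for p, n in counts.items()}

print(process_proportions(segments, 2))
```

Proportions computed per stage in this way allow the kind of cross-language comparison of writing stages described above.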
The writing processes reported by each language group were compared (a) to theoretical models of writing, and (b) across languages to detect language-specific differences and commonalities in different stages of the writing session. Thus, the study tapped into writing processes across foreign languages and task equivalency, constituting one of the first studies to examine writing processes triggered by test tasks from a multilingual perspective.

Based on the experiences from these two projects, we will now turn to a critical reflection on the opportunities, challenges, and limitations of these process-tracing methodologies in the context of L2 writing tests by discussing illustrative examples of decisions, challenges and potential solutions in triangulating writing process data.




Methodological decisions taken, challenges experienced, and solutions adopted

Systematising elicitation procedures for stimulated recalls

One of the essential considerations we faced in both studies was developing the procedural guidelines for the stimulated recalls. While there is ample general advice on how to conduct stimulated recalls (see e.g., Bowles, 2018), there appears to be no consensus, for instance, on when to stop the eye-tracking or screen-capturing playback and ask participants to report their thoughts in LT studies. In theory, it is possible to (1) let participants self-initiate their reports of the thought processes; (2) stop the recording when pre-defined events occur; or (3) implement a mixture of (1) and (2). Unlike Barkaoui (2015), who mainly relied on self-initiated verbal recalls, many other studies have used a combined approach of self-initiated and researcher-prompted stimulated recalls (e.g., Michel et al., 2020; Révész, Kourtali, & Mazgutova, 2017). There is only a limited number of studies in writing research that employed approach (2) above through systematic prompting by a researcher (e.g., Guggenbichler, 2020; Lu & Révész, 2021; Sasaki, 2000).

In our two projects, we had the opportunity to test two approaches (namely 2 and 3 above) and learn about their benefits and pitfalls. In a first step, we decided against entirely self-initiated stimulated recalls. While they might minimise researcher interference (Gass & Mackey, 2016), we thought that the data gathered with this approach would be less rich, unsystematic and possibly less useful (Gass & Mackey, 2016). Taking a more structured approach, Révész, Michel and colleagues have reported in several studies that they let participants self-initiate their recalls, but “paused the recording whenever participants paused, made a revision (e.g., substitution or deletion), or went back to parts of the text they had earlier produced” (e.g., Révész, Michel, & Lee, 2017, p.
14), or when conspicuous fixations and regressions occurred (Michel et al., 2020). Our first study (Eberharter et al., 2020) adopted this approach, and we defined our stopping rules for the stimulated recalls on responding to L2 writing test tasks as illustrated in Figure 1.

Stop replay of stimulus when participants …
a. reread the task as a whole or parts of the task,
b. change sections of their writing (e.g., by adding, deleting or rearranging words/phrases),
c. pause for a noticeable amount of time,
d. dwell for a longer period on a particular feature of the task,
e. return to an earlier section of their writing,
f. look at the time remaining or word count,
g. indicate through cues (e.g., movements of the head, laughing or chuckling, gasping) that they remember something about the writing process.

Figure 1. Stopping rules in verbal recalls as defined by Eberharter et al. (2020)

We defined the stopping rules to systematise our approach. Nevertheless, they remain vague in their definitions of the kind of pauses or revisions that merit prompting, which makes coding more challenging, as coding systems and analyses distinguish between formulation and revision pauses depending on the pause environment (e.g., Eberharter et al., 2020; Révész, Michel, & Lee, 2017). Mixing self-initiated and prompted stimulated recalls thus adds an element of arbitrariness to the procedure. This might not be an issue when the focus of the project is to analyse whether a certain process is activated, e.g., when establishing the cognitive validity of a test. If, however, the focus is on precisely estimating the relative proportions of certain processes, as was the case in our second study across foreign languages (Guggenbichler, 2020), a more systematic definition of pauses and stopping rules is called for.

To avoid under-representation of certain processes due to unrefined stopping rules, and to increase the replicability of our research, our second study (Guggenbichler, 2020) applied an innovative approach and used keystroke-logging data to inform the stimulated recalls. Building on Sasaki’s (2000) initial approach of using pauses as triggers for stimulated recall interviews, we attempted to cover as much of the higher-level processing as possible by systematically targeting all pauses above 2,000 milliseconds identified in the Inputlog output to elicit stimulated recalls. To implement this procedure, we analysed the keystroke-logging data for >2,000 ms pauses upon completion of the writing session while participants took a short break. We first ran a general analysis in Inputlog to produce an XML file that lists all input actions and links them to a time stamp (Leijten et al., 2019). We then examined this output manually to define the beginning of the writing session and remove ‘noise’ by using the pre-processing ID & time filter function. Next, we conducted a linear analysis to filter for all pauses >2,000 ms. The output (see Figure 2) helped us find and target the >2,000 ms pauses as indicators of events of interest.

This approach proved to be easily replicable and a useful guideline in our pursuit to fully cover pauses above the defined threshold. Nonetheless, the rather mechanical nature of this procedure brought along new challenges: First, the threshold of >2,000 ms is hardly discernible with the naked eye, and participants themselves sometimes struggled to notice a short pause (and hence to report on it).
Output
[RETURN]{57727}[RSHIFT]Dans·cet·essai·je·[BACK][BACK][BACK][BACK],·je·voudrais·{20336}vous·raconter·les·avantages·{5096}d[RSHIFT]'un·[BACK]e·ann[OEM_6]ee·{3968}sabbatique{11864}et·{11109}[Movement][Movement]{2982}discuter{15815}[BACK][BACK][BACK][BACK][BACK][BACK][BACK][BACK][BACK]parler·d{7896}e{19248}·{6184}[BACK]{3464}·{4864}[BACK]s[BACK]{2064}compet[OEM_6]ences{3448}[BACK],·[OEM_6]etant·util{4608}es.[RETURN][RETURN][BACK][RSHIFT]Premi[RSHIFT][OEM_6]erement,·{6936}je·veux·vous·donenr·[BACK][BACK][BACK][BACK]ner·mon·opinion{13232}sur·{7000}une·ann[OEM_6]ee·{15248}[RSHIFT][RSHIFT][RSHIFT][RSHIFT][RSHIFT][RSHIFT][RSHIFT][RSHIFT][RSHIFT][RSHIFT][RSHIFT][RSHIFT][RSHIFT][RSHIFT][RSHIFT][RSHIFT][RSHIFT][RSHIFT][RSHIFT][RSHIFT][RSHIFT][RSHIFT][OEM_6]a·l[RSHIFT]'[OEM_6]etranger.·{2272}[RSHIFT][RSHIFT][RSHIFT][RSHIFT][OEM_6]a[BACK][OEM_6][RSHIFT]A·[BACK][BACK][OEM_6][RSHIFT]A·[BACK][BACK][RSHIFT][OEM_6][RSHIFT]A·mon·avis,·il·est·n[OEM_6]ecessaire·de·[BACK][BACK][BACK][BACK]{2496}faire·une·experience{15845}[Movement][LEFT Click][Movement]{2145}·de[Movement][LEFT Click][Movement][BACK]·{2592}de{4984}travau·{6664}[BACK][BACK][BACK][BACK][BACK][BACK][BACK][BACK][BACK][BACK]{9248}pour·{21697}[Movement][LEFT Click][Movement]{4056}[Movement][LEFT Click][RETURN][RETURN][BACK][RSHIFT]Deuxi[RSHIFT][OEM_6]emement,{3783}je·vous·pr[OEM_6]esente·les·qualit[OEM_6]es{5928}qu[RSHIFT][RSHIFT]'ils·[BACK][BACK][BACK][BACK][BACK][BACK][BACK]{2216}d[RSHIFT]'a{2320}[BACK][BACK][BACK]qu[RSHIFT]'il·faut·apporter·pour·avoir·du·succ[OEM_6]es·[RSHIFT][OEM_6]a·l[RSHIFT]'[OEM_6]etranger.·{5312}[RSHIFT]Je·pense·qu[RSHIFT]'il·{2528}faut·[OEM_5]etre·tr[RSHIFT][OEM_6]es·ouvert·parce·qu[RSHIFT]'·on·fait·la·conaissance·{2504}avec·{2632}beaucoup·de·gens{18944}[BACK],·ayaqnt·[BACK][BACK][BACK][BACK]nt·une·autre·langue{3968}[Movement][LEFT Click][LEFT Click][Movement]parlant·[Movement][LEFT Click][Movement][BACK].·[RSHIFT]{12279}[RSHIFT]'[BACK][RSHIFT]Ensuite,·{16880}il·est·n[OEM_6]ecessaire·de·fair·[BACK]e·une·recherche·au·[RSHIFT]Internet·pour·[OEM_5]etre·bien·inform[OEM_6]ee{3928}de·[BACK][BACK]u·projet.·{15176}[BACK]·{10024}[RETURN]{5440}[RSHIFT]Finalement,{6328}je·vous·[BACK][BACK][BACK][BACK][BACK]aprl[BACK][BACK][BACK][BACK]parle{6342}[Movement][Movement][LEFT Click][Movement][BACK]·{5848}[OEM_6]elargir·son·horizon·et{4312}pour{9647}[RETURN][Movement][LEFT Click][Movement]{16199}voir·{20955}[Movement][LEFT Click]{12879}[BACK][BACK][BACK][BACK][BACK]entrer·en·contact·evec·[BACK][BACK][BACK][BACK][BACK]avec·{4128}d{3808}[RSHIFT]'autres·jeunes·{2400}qui·ont·{10149}[Movement]{2588}les·m[OEM_5]mes·[BACK][BACK][BACK][BACK][BACK][OEM_5]emescentres·d[RSHIFT]'intleret[BACK][BACK][BACK][BACK][BACK]{3048}[OEM_6]er[OEM_5]ets.·[Movement][LEFT Click] < · > [BACK][Movement][RETURN][Movement][Movement][LEFT Click][Movement][BACK]·[RSHIFT][RSHIFT]Mais·je{3024}ne·crois·pas·qu[RSHIFT]'{2544}·{4512}[RSHIFT](subjontiv[RSHIFT])[Movement][LEFT Click][Movement][LEFT Click][Movement][BACK][RSHIFT]'[Movement][BACK][Movement]{2578}tu·sois·[Movement][LEFT Click][Movement][BACK]

Figure 2. Inputlog linear analysis of participant F02’s writing stage 1 (Guggenbichler, 2020)

Furthermore, pauses >2,000 ms would often accumulate in certain moments of the writing process and, while in theory they could be seen as independent events or processes, they might in fact be related to the same concern or problem space from the participants’ point of view. Example (a) in Figure 3 illustrates this. The participant struggled with the correct lexico-grammatical form of the adjective différentes for a while, yet the two >2,000 ms pauses appear independently in


the output. Pauses also became more difficult for us to detect and analyse when participants displayed particularly non-linear writing behaviour and jumped back and forth in their text. This also applied to pauses towards the end of the writing session. In both instances, participants would make increasing use of scrolling functions (via mouse or key press), and Inputlog would record pauses between these on-screen movements as distinct events even though they were all part of the same monitoring phase (excerpt b in Figure 3).

beaucoupde·diff[OEM_6]érentes[BACK][BACK][BACK][BACK][BACK][BACK][BACK][BACK][BACK][BACK][BACK]langues·diff[OEM_6]érents,·[BACK][BACK]e[BACK]s,·[BACK][BACK][BACK][BACK]es,·{2392}[BACK]·{3584}e[BACK][OEM_5]être·capable·de·parler·au·moins·

(b)

[Movement]{4025}[Movement]{10952}[Movement][LEFT Click][Movement][Movement]{15856}[RIGHT]{12739}[Movement][Scroll][Movement][Scroll][Movement][Scroll][Movement][Scroll][Movement][Scroll][Movement][Scroll][Movement][Scroll][Movement][Scroll][Movement][Scroll]{20358}[Movement][LEFT Click][BACK][BACK][BACK][BACK][BACK],·[LCTRL][LCTRL + RIGHT]or·{3872}

Figure 3. Examples of non-distinct >2,000 ms pauses provided by participants F05 and E02 (Guggenbichler, 2020)
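The pause-targeting step described above can also be sketched programmatically. The snippet below assumes a simplified version of the linear notation shown in Figures 2 and 3 — pauses as {milliseconds}, keys and mouse events as bracketed tokens — and is an illustrative parser, not Inputlog’s own functionality:

```python
import re

# Tokens in a simplified form of Inputlog's linear notation (an assumption
# for illustration, not Inputlog's own parser or API):
#   {1234}  -> pause of 1,234 ms
#   [BACK]  -> special key or mouse event
#   any other character -> typed text ("·" marks a space)
TOKEN = re.compile(r"\{(\d+)\}|\[[^\]]+\]|.", re.DOTALL)

def pauses_above(linear_log, threshold_ms=2000):
    """Return (offset, duration_ms, left_context) for each pause above threshold."""
    hits = []
    for m in TOKEN.finditer(linear_log):
        if m.group(1) is not None and int(m.group(1)) > threshold_ms:
            # keep a little left context so the pause can be located later
            context = linear_log[max(0, m.start() - 20):m.start()]
            hits.append((m.start(), int(m.group(1)), context))
    return hits

log = "Dans·cet·essai·{20336}je·voudrais·[BACK][BACK]{2982}discuter{1500}·"
for offset, ms, context in pauses_above(log):
    print(f"{ms} ms pause after ...{context!r}")
```

A list of this kind, produced during the participants’ short break, is what makes the systematic prompting of every above-threshold pause replicable.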

In our stimulated recalls, we targeted these events separately. One advantage of our approach is certainly that researchers do not pre-interpret pauses as related to a certain phenomenon. In many cases, however, participants did not recall those events separately, as the stimulated recall data confirmed. In hindsight, another possibility could have been to adopt an approach similar to Murphy and Roca de Larios (2010) and define problem spaces based on a set of free-standing yet interrelated writing behaviours. Based on our experience, a manual post-hoc check of pauses against the stimulated recall data is inevitable in any elicitation procedure to obtain meaningful results and interpretations.
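One way to operationalise such problem spaces post hoc — purely an illustration of the idea, not the procedure used in either study — is to merge pauses whose onsets fall within a given window of each other:

```python
def merge_pauses(pauses, gap_ms=5000):
    """Group pause events (onset_ms, duration_ms) into candidate problem spaces.

    Pauses whose onsets lie within `gap_ms` of the previous pause in the group
    are treated as one candidate problem space. The 5,000 ms window is an
    arbitrary assumption; any real cut-off would need validating against
    stimulated recall data.
    """
    groups = []
    for onset, duration in sorted(pauses):
        if groups and onset - groups[-1][-1][0] < gap_ms:
            groups[-1].append((onset, duration))
        else:
            groups.append([(onset, duration)])
    return groups

# e.g., two nearby pauses (2,392 ms and 3,584 ms; onsets invented here)
# would fall into one group, a later pause into another
print(merge_pauses([(60000, 2392), (63000, 3584), (120000, 2120)]))
```

Each resulting group could then be replayed as a single stimulus rather than as several separate stops.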

Defining pause length thresholds

The issue of pause length thresholds is frequently discussed in the broader field of L2 writing research and extends to projects in the context of LT (Barkaoui, 2019). When investigating L2 writing processes in the context of language testing and assessment, the focus is on identifying factors that may affect test-taker performance, for instance, test setup, task design, or task difficulty. The difficulty of an L2 writing task, for instance, may impact conscious thinking efforts, which are often argued to be captured with a pause threshold setting above 2,000 milliseconds (e.g., Van Waes & Leijten, 2019). However, these effects may also




influence lower-level and automated processes, which are only isolated at lower threshold settings.

Our first study (Eberharter et al., 2020) was based on a larger sample (N = 30) that also allowed for group comparisons. To be able to isolate effects, we differentiated pause threshold settings depending on the part of the analysis. First, global time-based measures and P-bursts were reported at >2,000 ms, as this is the most frequent setting and allows comparison with other studies (Leijten et al., 2019). When comparing pausing behaviour across ability levels or tasks, data was filtered at three threshold levels: 30, 200 and 2,000 milliseconds, in line with a procedure suggested by Leijten et al. (2019), which allowed for a more nuanced analysis of effects. For instance, when we compared self-reported planning processes across two tasks, we found that participants unexpectedly engaged in more content planning in the easier task. This finding was corroborated via the keystroke-logging data at the between-sentences location, which showed a significant difference between the two tasks in terms of median pause frequency at the most sensitive setting. Through filtering keystroke-logging data at different thresholds, we found that task difficulty had a stronger effect on lower-level processes than on higher-level ones.

There are downsides to this approach, as analysing and reporting data at three threshold settings only postpones taking certain decisions. First, should researchers wish to include variables based on keystroke logging in statistical modelling – and parsimony becomes a concern – they will likely have to make a choice so as to limit the number of variables.
Second, this strategy triples the amount of data and may be challenging to handle, analyse and interpret for less experienced researchers or colleagues who are new to this area of research (particularly if pauses are intended to be employed for defining stopping rules for stimulated recalls, as described in the previous section).

In the case of our second study (Guggenbichler, 2020), the keystroke-logging output served to detect pauses of a particular length and to systematise the prompting of stimulated recalls. The pause length threshold was defined at 2,000 milliseconds, as this is the common threshold in the literature for conscious reflection and was assumed to best target writing processes (Galbraith & Vedder, 2019; Leijten et al., 2019).
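The multi-threshold filtering described above can be sketched as follows. This is an illustrative computation over raw inter-keystroke intervals, not Inputlog’s built-in analysis:

```python
# The three threshold levels used in the comparison across ability levels
# and tasks (30, 200 and 2,000 ms).
THRESHOLDS_MS = (30, 200, 2000)

def pause_frequencies(interkey_intervals_ms):
    """Count pauses above each threshold in a list of inter-keystroke intervals (ms)."""
    return {t: sum(1 for iv in interkey_intervals_ms if iv > t)
            for t in THRESHOLDS_MS}

# intervals for one (invented) writer: higher thresholds isolate fewer,
# presumably higher-level, pauses
print(pause_frequencies([15, 120, 250, 1800, 2600]))  # -> {30: 4, 200: 3, 2000: 1}
```

The nesting is deliberate: every pause counted at 2,000 ms is also counted at the lower settings, which is why reporting all three levels triples the data without deciding anything by itself.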

Interpreting keystroke-logging data

Although pauses, revisions, deletions and insertions are common pre-defined features within software packages such as Inputlog, we faced unanticipated decisions when analysing keystroke-logging data, given that previous LT studies appear to disagree on how to define and interpret these features within particular contexts.


One of the first decisions we had to take was how to clear the ‘noise’ (e.g., from launching the software) at the beginning of the recordings. This is of particular importance when writing processes are captured via several parallel methods (e.g., eye tracking and keystroke logging) and when data is analysed quantitatively and compared to similar studies. Révész, Kourtali, & Mazgutova (2017) excluded events such as “explicit planning episodes (i.e., when writers stop producing full text in order to plan on the screen), draft revisions (i.e., when individuals go back to the beginning of the text and systematically go through, edit, and revise their initial drafts), and end revisions (i.e., when writers revise text outside the final paragraph while working on the final paragraph)” from analysis (p. 219). In our opinion, however, this approach raises the question as to what parts of the writing session can or should be included in keystroke-logging analyses and how we can select them in line with the aims of our studies.

In our first study (Eberharter et al., 2020), we investigated writing fluency across proficiency levels and tasks. Similar to Révész, Kourtali, and Mazgutova's (2017) and Leijten et al.’s (2015) approach, we removed all data captured by Inputlog before the first and after the last keystroke set by the participant. However, contrary to Révész, Kourtali, and Mazgutova (2017), we did not remove instances of planning, draft revisions (i.e., when participants went back to the beginning of the text to start with systematic editing of drafts) or end revisions (i.e., when participants revised other parts of the text as they were writing the final paragraph of the entire text). This meant that our fluency measures reflected the more linear parts of the writing session as well as explicit planning episodes (e.g., notes) at the beginning, and revisions towards the end of the session.
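The trimming step — discarding everything logged before the first and after the last keystroke — can be sketched as below. The event structure is an assumption for illustration and does not reproduce Inputlog’s actual XML output format:

```python
def trim_session(events):
    """Keep only events between the first and the last keystroke (inclusive)."""
    key_positions = [i for i, event in enumerate(events)
                     if event["type"] == "keystroke"]
    if not key_positions:
        return []  # no typing at all: nothing to analyse
    return events[key_positions[0]:key_positions[-1] + 1]

session = [
    {"type": "focus", "start_ms": 0},         # software launch 'noise'
    {"type": "keystroke", "start_ms": 3200},  # first character typed
    {"type": "mouse", "start_ms": 4100},
    {"type": "keystroke", "start_ms": 4500},  # last character typed
    {"type": "focus", "start_ms": 9000},      # closing the window
]
print(trim_session(session))
```

Note that everything between the first and last keystroke — including planning episodes and end revisions — survives the trim, which is exactly the inclusion decision discussed above.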
While we agree with Révész, Kourtali, and Mazgutova (2017) that the processes involved in initial draft production might be “different” (p. 219) from other stages of the writing process, we think that it is important to see them as part of the writing process when investigating writing assessment tasks, as this kind of writing is shaped by aspects such as word count and time pressure. In addition, we were concerned about the reliability of delineating initial draft production processes when dealing with non-linear digitised writing behaviours. The example provided (see Figure 4) originates from our Study 2 (Guggenbichler, 2020) and illustrates the writing behaviour of one participant in the middle of a particularly non-linear writing episode. This participant started by taking notes for each section (in capital letters), which they then elaborated on and partly deleted later in the writing session. As shown in the output, the participant frequently moved to different parts of the text using the mouse [Movement] or backspaces [BACK] to delete previously produced text. If this hybrid writing phase, in which planning, note-taking and text production merge, were not considered part of the writing process, we would have to exclude this episode and



Elisa Guggenbichler, Kathrin Eberharter & Benjamin Kremmel

all the activities related to it from data analysis. Instances such as this one raise the question of whether note-taking and explicit planning processes should be defined as part of the writing process or not. If not, the analysis would, in our opinion, overlook the reprocessing of main ideas and goals which, as described by Bereiter and Scardamalia (1987), is an element of mature writing.

{29476}[RETURN]{114154}[Movement][CAPS LOCK]TITRE·[RETURN]{20112}[LSHIFT]nOUS·[BACK][BACK][BACK][BACK][BACK][CAPS LOCK][LSHIFT]Nous·vivons·dans·un·monde·[Movement]{11396}en·{4592}continuw·[BACK][BACK]e·changement·[BACK]o [....] ·{8912}[Movement][LEFT Click] < . > [Movement][BACK][BACK][BACK][BACK]{22262}[Movement][LEFT Click][Movement][CAPS LOCK]ETRE·OUVERTS·{2752}[CAPS LOCK][Movement][LEFT Click][Movement][CAPS LOCK]PROGRESSER·ET·{4220}[Movement][LEFT Click][Movement][BACK][BACK][BACK][BACK][BACK][CAPS LOCK]la·[BACK][BACK][BACK][CAPS LOCK]LA·MEILLEURE·{2216}VERSION·DE·NOUS·M{2968}[OEM_5]EME·{2765}[Movement][LEFT Click][Movement][RETURN][CAPS LOCK]conclusion[BACK][BACK][BACK][BACK][BACK][BACK][BACK][BACK][BACK][CAPS LOCK]CONCLUSIOM·[BACK][BACK]N·{5708}[Movement][LEFT Click][Movement]·APPRENDRE·{3431}[LSHIFT][OEM_6]A·SE·CONNAITRE·SOI·MEME·POUR·DEV{4344}ENIR·{4488}DES·CY{2120}[BACK]O[BACK]ITO[BACK]OYENS·RESPONSA[BACK][BACK][BACK][BACK][BACK][BACK][BACK][BACK]ACTI{3079}FS·ET·TE[BACK][BACK]RESPONSABLE·{2736}[BACK]S·[Movement][LEFT Click][Movement]{3182}[Movement][LEFT Click][Movement] [....] [LEFT Click] < e > [Movement][BACK][BACK][BACK][BACK][BACK][BACK][BACK][BACK][BACK][BACK][BACK][BACK][BACK][BACK][BACK][BACK][BACK][BACK][BACK][BACK][BACK][BACK][BACK][BACK][BACK][BACK][BACK][BACK][BACK][BACK][BACK][BACK][BACK][BACK][BACK][BACK][BACK][BACK][BACK][BACK][BACK][BACK]{16927}

Figure 4. Merging writing and planning phases. Example taken from Guggenbichler (2020), Participant F06

As our second study (Guggenbichler, 2020) looked into qualitative data on multilingual writing processes and used keystroke-logging data on pausing mainly for eliciting stimulated recall data, we deliberately included the pre-writing phase in our keystroke-logging outputs (unlike, for instance, Révész, Kourtali, & Mazgutova, 2017). To avoid excluding important planning and task conceptualisation events from the analysis, we used the manual filter to examine the pre-writing phase, i.e. the “temporal span from the beginning of a writing event upon topic assignment till the commencement of continuous textual output (…) following the beginning of a writing event” (Xu & Xia, 2019, p. 10). Automatically cleaning files or deleting the beginning of the writing session can, in our


opinion, be problematic, as we would argue that “the stage before words emerge on paper” (Flower & Hayes, 1981, p. 367) is an integral and relevant part of the writing process. To facilitate filtering and to determine precisely when participants actually started the pre-writing phase, we asked them to enter their assigned participant number as soon as they could see the task and the Word file in full screen. This time stamp was then used as a cut-off point. In our opinion, the approach taken in our Study 2 (Guggenbichler, 2020) proved to be more suitable when researchers are interested in theory-building and in exploring the nature of a phenomenon qualitatively, e.g., multilingual writing processes. In contrast, it did not allow us to compare fluency measures quantitatively with other studies, which was the focus of Eberharter et al. (2020). Finally, we need to be cautious with comparisons across similar studies in the LT literature (e.g., Révész, Michel, & Lee, 2017), as definitions of which elements of keystroke-logging data to exclude from fluency measures appear to differ widely.

A further challenge when analysing the data for revision and formulation pauses (as suggested by Révész, Michel, & Lee, 2017) was defining and interpreting “revisions” and “(re)formulations”. These phenomena appear in different studies related to L2 writing assessment, but there is no established consensus yet on how to delineate, operationalise and measure them. In our first study (Eberharter et al., 2020), we decided to identify these events retrospectively, i.e., one researcher watched the eye-tracking recordings and categorised events as revision or formulation based on the stimulated recall. While this procedure proved to be useful, it was time-consuming, and not all pausing events were commented on in the stimulated recall. We addressed this issue in our subsequent study (Guggenbichler, 2020). This, however, brought new challenges.
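The participant-number cut-off described above could be implemented roughly as follows; the (timestamp, character) event format and the function name are hypothetical illustrations, not part of Inputlog:

```python
# Hedged sketch: discard everything logged before the participant typed an
# agreed marker (here, their participant number). Event format is invented.
def cut_before_marker(events, marker):
    """Return events from just after the marker text was completed."""
    typed = ""
    for i, (_, char) in enumerate(events):
        typed += char
        if typed.endswith(marker):
            return events[i + 1:]
    return events  # marker never completed: keep everything

log = [(0, "f"), (120, "0"), (250, "6"), (4000, "N"), (4200, "o")]
session = cut_before_marker(log, "f06")  # keeps only post-marker events
```

The marker's timestamp then serves as the cut-off point for all subsequent analyses, so that the pre-writing phase is retained rather than automatically cleaned away.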
Keystroke-logging programmes offer technical definitions of revisions as well as algorithms that detect reformulations based on deletions and insertions, in contrast to fluent text production (Van Waes & Leijten, 2019). However, these operationalisations only partly capture the two phenomena. As keystroke-logging data only allows us to observe external revisions, such as deletions and insertions to previously written text (Van Waes & Leijten, 2019), it fails to provide insights into the interplay between revisions and the cognitive processes that trigger them, or into revisions of a conceptual nature, such as evaluating an idea (Lindgren & Sullivan, 2006). To address this shortcoming, we followed the recommendations of Révész, Michel, and colleagues (Michel et al., 2020; Révész, Michel, & Lee, 2017) and triangulated keystroke-logging data with stimulated recalls. Furthermore, previous studies of writing processes in L2 tests have varied in terms of how explicitly they defined the phenomena of revisions, deletions, insertions, substitutions, etc. When filtering lower-level pauses and revisions in Study 2 (Guggenbichler, 2020), Barkaoui’s (2019) definitions of formulation




pauses, as “pauses [of 2,000 milliseconds or more] immediately followed by the keyboarding of additional text at the point of inscription (i.e., immediately after the last-typed character)”, and revision pauses (immediate or delayed), as pauses of 2 seconds or more “followed by deletion of text at the point of inscription or followed by typing or deletion after cursor movement away from the point of inscription (using the mouse and/or the key arrows)” (p. 539), served as a helpful guide. This procedure excludes simple typos and is in line with Conijn et al. (2019), who maintained that any low-level revisions, such as typos, should be filtered out and excluded from analyses, an approach that, although problematic in and of itself, Lu and Révész (2021) also followed. Guided by this rationale, when categorising writing processes during revision pauses in our Study 2, we coded formulation pauses and excluded revisions of less than 2,000 milliseconds that were not deemed “structural revisions” (Leijten et al., 2015, p. 86).

Still, the above definitions do not address how to deal with text parts being moved and inserted (e.g., through cut and paste) in the analysis. From experience, some writers also choose to stop writing at the point of inscription, return to an earlier point and add an entire paragraph to the previously written text. This draws our attention to the question of the conditions under which the fluent production of a long stretch of text, which can be moved around easily in a digital writing space, should still be considered a revision. In any case, the non-linear character of the digital writing process, in which revisions are often added at a later point of the writing event or extend over a longer period (Leijten et al., 2015), makes revisions difficult to identify and requires manual checks and careful interpretation in addition to automated analyses.
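Barkaoui's (2019) distinction could be operationalised along the following lines; the event fields are illustrative, and real keystroke logs would of course need to be parsed into this shape first:

```python
# Hedged sketch of Barkaoui's (2019) pause categories: pauses of >= 2,000 ms
# followed by new text at the point of inscription count as formulation
# pauses; pauses followed by deletion, or by activity after cursor movement
# away from the point of inscription, count as revision pauses.
THRESHOLD_MS = 2000

def classify_pause(pause_ms, next_action, moved_away):
    """next_action: 'type' or 'delete'; moved_away: whether the cursor
    left the point of inscription before the next action."""
    if pause_ms < THRESHOLD_MS:
        return None  # below threshold: treated as typing-related, filtered out
    if next_action == "type" and not moved_away:
        return "formulation"
    return "revision"  # immediate (deletion) or delayed (after movement)

pauses = [
    (2500, "type", False),    # formulation pause
    (3100, "delete", False),  # immediate revision pause
    (800, "delete", False),   # filtered: likely a simple typo correction
    (2200, "type", True),     # delayed revision pause
]
labels = [classify_pause(*p) for p in pauses]
```

As discussed above, such an automated filter cannot by itself distinguish structural revisions from moved or pasted text, which is why manual checks remain necessary.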

Interpreting stimulated recall data

In both our projects we found that no other method discussed in this chapter is nearly as powerful as stimulated recall when it comes to providing direct insights into the writing process. However, several challenges arise when interpreting and coding participants’ self-reported cognitive processes, and these may be related to individual differences between participants. First, the quality of verbal reports is contingent on the participants’ ability to report on cognitive processes. Some participants may generally have easier access to their thinking processes than others or may find it easier to verbalise their observations. Learner proficiency may also play a role in that more proficient learners may foreground more or different writing processes than less proficient learners. Based on our studies, we conclude that variations in how participants verbalise their recalls might impact the interpretation of data.


Additionally, in order to be able to reconstruct L2 test response processes, researchers need to be careful to sample cohorts of participants that resemble the targeted test population. However, as we found in both Guggenbichler (2020) and Eberharter et al. (2020), the stimulated recall data produced by some participant groups is easier to interpret and code than that produced by others. Stimulated recalls in Guggenbichler (2020) were produced by English and French learners who had received formal L2 instruction and attended linguistics classes at university level. These learners had the meta-knowledge to explain with precision, or even pre-interpret, the problems they encountered, for example, with verb conjugations or the syntactic structures they consciously chose to employ (see examples a and b in Figure 5). In contrast, Eberharter et al.’s (2020) candidates, who had various educational and professional backgrounds, produced stimulated reports which were much more general and potentially difficult for the researchers to interpret. The example in Figure 6 was taken from the stimulated recall of a participant who produced relatively few codable sections, and even in this excerpt it remains ambiguous whether they were searching for a word or thinking about its spelling. When we compared the relative ease or difficulty with which we coded data from our two samples in these studies, we felt that while it is incredibly valuable to research learners from various backgrounds, participants with less metalanguage or awareness of language production processes might require different prompting or training to produce more easily interpretable data.

a. Yeah, because I tend to mix up third / third person singular and first person singular. That’s why. (Participant F01)
b. Then I thought, I’ll ask an inversion question here. (Participant F03)

Figure 5. Excerpts of stimulated recall data from Guggenbichler (2020)

I thought about how you (…) / how you say [German word] in English. (…) yes. Maybe that’s why I looked at the word count because I was thinking about words [laughs]. I don’t know. (…) I tried to know how you write this. (Participant 04)

Figure 6. Excerpts of stimulated recall data from Eberharter et al. (2020)




Comparing writing processes across languages

A particular challenge faced in one project (Guggenbichler, 2020) was designing and conducting a cross-language study including French and English as foreign languages in a non-English context and in the face of conflicting theoretical paradigms. On the one hand, there seems to be consensus in the literature that writing theories and methodological approaches can be transferred from the L1 to any further language. Accordingly, prominent writing models, such as Kellogg’s (1996), have been used to validate L2 (English) writing tests. On the other hand, while the theoretical knowledge gained from studies of L1 and L2 English writing remains a point of reference, there appear to be specific intricacies of non-English L2 writing that should be taken into account, such as a wide range of learning contexts and diverse learner and test-taker characteristics, alongside different teaching and writing traditions (Reichelt et al., 2012).

Comparing writing processes or macro-processes (i.e., planning, translation, monitoring) from the stimulated recalls across languages revealed the importance of rethinking methodological approaches and analyses with non-English L2 test takers. Previous LT studies mainly presented quantitative data from stimulated recalls to analyse to what extent these processes were activated by test tasks (Michel et al., 2020; Révész, Kourtali, & Mazgutova, 2017; Révész, Michel, & Lee, 2017). When comparing the cognitive processes reported by participants across two foreign languages, the results of a Kolmogorov–Smirnov two-sample test showed that writing processes, as measured by Kellogg’s (1996) model, were distributed similarly between the two test-taker groups in terms of frequency.
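The two-sample Kolmogorov–Smirnov statistic is the maximum distance between the two groups' empirical distribution functions; a small D suggests similarly distributed frequencies. A minimal pure-Python sketch (the per-participant counts below are invented for illustration, not the study's data):

```python
# Hedged sketch: two-sample Kolmogorov-Smirnov statistic, i.e. the maximum
# vertical distance between the two empirical CDFs.
def ks_2samp_stat(a, b):
    def ecdf(sample, x):
        return sum(v <= x for v in sample) / len(sample)
    points = sorted(set(a) | set(b))
    return max(abs(ecdf(a, x) - ecdf(b, x)) for x in points)

# Invented per-participant counts of reported macro-processes.
french = [12, 9, 15, 11, 8, 14, 10]
english = [11, 13, 9, 12, 10, 15, 8]
d = ks_2samp_stat(french, english)
```

In practice one would normally use a statistics package routine that also returns a p-value (e.g., scipy's `ks_2samp`).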
While this would be indicative of the cross-linguistic validity of the framework, the findings of a qualitative analysis of the stimulated recall data showed the use of multilingual strategies not typically covered in traditional writing models (see Figure 7), and also that there were indeed language-specific difficulties in processing. For example, participants in the two L2 groups seemed to approach language problems differently. The French learners reported more translation processes during formulation pauses. In comparison, English learners tended to think in chunks of language and showed more frequent monitoring behaviour. Arguably, models of writing need to be regarded as theoretical approximations of the skill; nonetheless, we discovered that when looking into processes, individual candidates and language groups demonstrated language-specific differences that are not captured in quantitative analyses based on writing models. Put differently, while participants might have, overall, activated the same type and perhaps even quantity of macro-processes (i.e., planning, translation, monitoring), the realisation of those processes was language-specific or even crossed language boundaries, as illustrated in the examples. These qualitative findings, albeit


I think that’s because the English word “gap year” comes up at this point (…) and I was thinking how I could formulate that and I decided to take the English word and put it in quotation marks. (Participant F05)

And then / then I was thinking in Italian. What’s the word again in Italian and ah “voluntary!” Now it has come back again (laughs) (Participant E03)

Figure 7. Examples of multilingual writing processes, excerpts from Guggenbichler (2020)

preliminary, raise questions as to whether currently employed writing models (Field, 2004; Flower & Hayes, 1981; Kellogg, 1996) can fully capture processing in diverse linguistic contexts. In light of this, there is a dire need for further research into multilingual and non-English L2 writing processes under exam conditions. In addition, as some test developers use translated prompts, it would seem important to scrutinise the equivalence of test tasks across L2s.

We encountered further methodological questions when investigating pausing and revisions across languages. When analysing writing behaviour in L2 French, we noted that many pauses were considerably longer than 2,000 milliseconds – the common cut-off threshold for conscious reflection, intended to exclude typing-related interkey intervals (Galbraith & Vedder, 2019; Leijten et al., 2019) – and linked to a comparably large number of execution problems. Accents and punctuation marks were challenging for French L2 writers on a standard QWERTZ keyboard. The participant reports are also revelatory of writing strategies, such as spell-checker use. Figure 8 provides illustrative examples of these processes. To avoid misinterpreting writing processes due to graphomotoric execution problems, we controlled for participants’ keyboard writing skills via Inputlog’s multilingual copy task (Van Waes et al., 2019). Following Barkaoui’s (2016) classification, none of the participants would have been categorised as having poor typing ability. We used the >2,000 ms threshold for identifying pauses in both languages, which should, according to the literature, distinguish conscious moments of reflection, sometimes referred to as higher-level processing (Leijten et al., 2019). However, this means that pauses caused by execution problems in one language would have been classified as higher-level processing pauses.
In addition, as a number of students consciously reflected on how to produce accents and applied different problem-solving strategies, this raises the question of whether these execution problems can still be considered mechanical in nature. Parallel to Roca de Larios et al. (1999), who challenged the view that processes related to execution




Strategies and processes | Examples

Accents
a. Then I paused because I knew that “eux-memes” is spelt with an “accent circonflexe”, but I didn’t have time for that. (Participant F06)

Punctuation
b. Yeah right. Now I was looking for the (…) the quotation marks, the French ones because I had used the usual quotation ones and I think the programme is set on English, so it put in the usual inverted commas and that’s why I then used the arrows. (Participant F05)

Strategies
c. Yeah, because of the “cédille” because I didn’t … […] I just didn’t look for it because I thought (…) that’s not / It will be clear that I wanted to use it and usually the spell checker / it just corrects it for me. (Participant F01)

Figure 8. Examples of execution processes reported by the French L2 participants

and (lexical) formulation are automatically considered lower-level in nature, we debated how to categorise graphomotoric execution problems based on this data. Linking back to Section 2 above in this chapter, we would advise future researchers to critically investigate the cross-linguistic validity of the >2,000 ms threshold as they set out to examine writing behaviour. Our observations are in line with Knospe’s (2017) research, which observed that writing in German as a third language (L3) led to increased motor transcription problems compared to L2 English writing, presumably due to keyboard use. Research by Roorda (2021) on pausing and revising across English and German in a Dutch context also indicates that the (sociolinguistic) role of a language affects automatisation and language processing. Recently, Lu and Révész (2021) did ground-breaking work on writing processes in a non-alphabetic language and concluded that using a less familiar orthographic system places additional demands on writers. These examples further illustrate that research into writing processes across languages, or in L2s other than English, is far from straightforward, and researchers must consider carefully whether conventional tools, approaches and interpretation models are suitable for their research project.

Conclusion

Research into writing processes offers plenty of untapped potential for the field of language testing and assessment, particularly when various data elicitation procedures are combined with careful consideration of possible issues in research


design and methodology, such as the combination, handling and interpretation of data. One key implication of the research reported in this chapter for future studies is that a triangulation of the three methods discussed is certainly worthwhile. We have illustrated some of the challenges faced in such studies and how we dealt with them based on our experience from two research projects. While we have attempted to cover a range of challenges, the list of issues presented here is by no means comprehensive. Nevertheless, the chapter exemplifies and highlights the impact of methodological decisions on the robustness of findings and claims in this area of research.

Our studies have shown that elicitation procedures for stimulated recalls need to be systematic so that they can be replicated. Unsystematic stopping rules, i.e., missing or non-comprehensive guidelines defining at which point researchers should stop the replay of the stimulus for the elicitation of a stimulated recall, might not provide the full picture of the cognitive processes that test takers employ and thus not allow for comprehensive and robust cognitive validation. For the definition of pause length thresholds, future studies need to further investigate ideal thresholds for various research purposes, and test validation studies should decide on the threshold in advance, bearing in mind potential limitations as to what processes may or may not be detected as a result. For the appropriate interpretation of keystroke-logging data, an important challenge was to strike a balance between definitions based on previous research and adapting approaches that are useful within the context of test validation. As far as the interpretation of stimulated recall data is concerned, our studies have highlighted how different samples provide very different insights into writing processes.
Given the great value of introspective self-report data for L2 writing process research, future studies might want to investigate and refine different participant briefing and training methods so that research can tap heterogeneous learner/test-taker samples more successfully. In terms of the specific methodological intricacies of comparing writing processes across languages, we would recommend incorporating qualitative analyses to investigate the nuanced differences between processing in different languages. In this way, the different challenges writers face in test situations, and thus their score interpretations, can be better understood, particularly when tests target the same level in different languages. We believe that the reflections presented in this chapter are relevant considerations, given that writing process research in test validation is still only starting to gain ground and will undoubtedly be expanded to new test-taker groups as well as different sociocultural, linguistic, and pedagogical settings in the future. We hope the methodological techniques and decisions we have discussed can constitute a useful basis for exploring various topics in the field of L2 writing assessment. Yet, when adapting our suggestions to new settings and foci, researchers will need to




critically use and further improve on current practices in L2 writing assessment research to contribute to a more coherent picture of writing processes during L2 writing tests and do justice to the multifaceted skill of writing.

References

Barkaoui, K. (2015). Test takers’ writing activities during the TOEFL iBT® writing tasks: A stimulated recall study. TOEFL iBT® Research Report TOEFL iBT–25.
Barkaoui, K. (2016). Examining the cognitive processes engaged by APTIS writing Task 4 on paper and on the computer. In British Council (Ed.), ARAGs research reports online AR-G/2016/1. Retrieved on 27 April 2023 from https://www.britishcouncil.org/sites/default/files/barkaoui.pdf
Barkaoui, K. (2019). What can L2 writers’ pausing behavior tell us about their L2 writing processes? Studies in Second Language Acquisition, 41(3), 529–554.
Bereiter, C., & Scardamalia, M. (1987). The psychology of written composition. Routledge.
Bowles, M. A. (2018). Introspective verbal reports: Think-alouds and stimulated recall. In A. Phakiti, P. De Costa, L. Plonsky, & S. Starfield (Eds.), The Palgrave handbook of applied linguistics research methodology (pp. 339–358). Palgrave.
Brunfaut, T., & McCray, G. (2015). Looking into test-takers’ cognitive processes whilst completing reading tasks: A mixed-method eye-tracking and stimulated recall study. In British Council (Ed.), ARAGs research reports online AR-G/2015/001. Retrieved on 27 April 2023 from https://www.britishcouncil.org/sites/default/files/brunfaut_and_mccray_report_final_0.pdf
Conijn, R., Roeser, J., & van Zaanen, M. (2019). Understanding the keystroke log: The effect of writing task on keystroke features. Reading and Writing, 32, 2353–2374.
Conklin, K., Pellicer-Sánchez, A., & Carrol, G. (2018). Eye-tracking: A guide for applied linguistics research. Cambridge University Press.
Council of Europe. (2001). Common European Framework of Reference for Languages: Learning, teaching, assessment. Cambridge University Press. Retrieved on 27 April 2023 from https://rm.coe.int/1680459f97
Council of Europe. (2020). Common European Framework of Reference for Languages: Learning, teaching, assessment: Companion volume with new descriptors. Retrieved on 27 April 2023 from https://rm.coe.int/common-european-framework-of-reference-for-languages-learning-teaching/16809ea0d4
Eberharter, K., Guggenbichler, E., Holzknecht, F., Schwarz, V., Zehentner, M., & Kremmel, B. (2020). Probing the cognitive validity of the Linguaskill writing component: Exploring test takers’ cognitive processes using eye tracking and keystroke logging (Unpublished report). Cambridge Assessment English Funded Research Programme Report Series (Round 9).
Field, J. (2004). Psycholinguistics: The key concepts. Routledge.
Flower, L., & Hayes, J. R. (1981). A cognitive process theory of writing. College Composition and Communication, 32(4), 365–387.
Galbraith, D., & Vedder, I. (2019). Methodological advances in investigating L2 writing processes: Challenges and perspectives. Studies in Second Language Acquisition, 41(3), 633–645.


Gass, S. M., & Mackey, A. (2016). Stimulated recall methodology in second language research (2nd ed.). Routledge.
Guggenbichler, E. (2020). Investigating writing across foreign languages: Same Matura task, different processes? (Unpublished MA thesis). Universität Innsbruck.
Kellogg, R. T. (1996). A model of working memory in writing. In C. M. Levy & S. Ransdell (Eds.), The science of writing: Theories, methods, individual differences and applications (pp. 57–71). Lawrence Erlbaum Associates.
Knospe, Y. (2017). Writing in a third language: A study of upper secondary students’ texts, writing processes and metacognition (Doctoral dissertation). University of Antwerp & Umeå University. http://umu.diva-portal.org/smash/get/diva2:1093554/FULLTEXT01.pdf
Leijten, M., Van Horenbeeck, E., & Van Waes, L. (2019). Analysing keystroke logging data from a linguistic perspective. In E. Lindgren & K. P. H. Sullivan (Eds.), Observing writing: Insights from keystroke logging and handwriting (pp. 71–95). Brill.
Leijten, M., & Van Waes, L. (2013). Keystroke logging in writing research: Using Inputlog to analyze and visualize writing processes. Written Communication, 30(3), 358–392.
Leijten, M., Van Waes, L., & Van Horenbeeck, E. (2015). Analyzing writing process data: A linguistic perspective. In G. Cislaru (Ed.), Writing(s) at the crossroads: The process–product interface (pp. 277–302). John Benjamins.
Lindgren, E., & Sullivan, K. P. H. (2006). Analysing online revision. In K. P. H. Sullivan & E. Lindgren (Eds.), Computer keystroke logging and writing: Methods and applications (pp. 157–188). Elsevier.
López-Serrano, S., Roca de Larios, J., & Manchón, R. M. (2019). Language reflection fostered by individual L2 writing tasks: Developing a theoretically motivated and empirically based coding system. Studies in Second Language Acquisition, 41(3), 503–527.
Lu, X., & Révész, A. (2021). Revising in a non-alphabetic language: The multi-dimensional and dynamic nature of online revisions in Chinese as a second language. System, 100, 1–13.
Michel, M., Révész, A., Lu, X., Kourtali, N.-E., Lee, M., & Borges, L. (2020). Investigating L2 writing processes across independent and integrated tasks: A mixed-methods study. Second Language Research, 36(3), 277–304.
Murphy, L., & Roca de Larios, J. (2010). Searching for words: One strategic use of the mother tongue by advanced Spanish EFL writers. Journal of Second Language Writing, 19(2), 61–81.
Reichelt, M., Lefkowitz, N., Rinnert, C., & Schultz, J. M. (2012). Key issues in foreign language writing. Foreign Language Annals, 45(1), 22–41.
Révész, A., Kourtali, N.-E., & Mazgutova, D. (2017). Effects of task complexity on L2 writing behaviors and linguistic complexity: Task complexity and L2 writing. Language Learning, 67(1), 208–241.
Révész, A., Michel, M., & Lee, M. (2017). Investigating IELTS Academic Writing Task 2: Relationships between cognitive writing processes, text quality, and working memory. In IELTS, British Council, idp IELTS Australia, & Cambridge English (Eds.), IELTS Research Reports Online Series 2017/3. Retrieved on 27 April 2023 from https://www.ielts.org/-/media/research-reports/ielts_online_rr_2017-3.ashx
Révész, A., Michel, M., & Lee, M. (2019). Exploring second language writers’ pausing and revision behaviours. Studies in Second Language Acquisition, 41(3), 605–631.




Roca de Larios, J., Manchón, R. M., & Murphy, L. (2006). Generating text in native and foreign language writing: A temporal analysis of problem-solving formulation processes. The Modern Language Journal, 90(1), 100–114.
Roca de Larios, J., Marín, J., & Murphy, L. (2001). A temporal analysis of formulation processes in L1 and L2 writing. Language Learning, 51(3), 497–538.
Roca de Larios, J., Murphy, L., & Manchón, R. (1999). The use of restructuring strategies in EFL writing: A study of Spanish learners of English as a foreign language. Journal of Second Language Writing, 8(1), 13–44.
Roca de Larios, J., Nicolás-Conesa, F., & Coyle, Y. (2016). Focus on writers: Processes and strategies. In R. M. Manchón & P. K. Matsuda (Eds.), Handbook of second and foreign language writing (pp. 267–287). De Gruyter.
Roorda, M. (2021, May 22). Writing processes in different languages [Conference presentation]. L2WR Seminar 2021, Murcia, Spain. Retrieved on 27 April 2023 from https://eventos.um.es/_files/_event/_56004/_editorFiles/file/Roorda.pdf
Sasaki, M. (2000). Toward an empirical model of EFL writing processes: An exploratory study. Journal of Second Language Writing, 9(3), 259–291.
Shaw, S. D., & Weir, C. J. (2007). Examining writing: Research and practice in assessing second language writing. Cambridge University Press.
Spelman Miller, K., Lindgren, E., & Sullivan, K. P. H. (2008). The psycholinguistic dimension in second language writing: Opportunities for research and pedagogy using computer keystroke logging. TESOL Quarterly, 42(3), 433–454.
Tobii Technology AB. (2012). Tobii Studio Pro (3.4.5). Tobii Technology AB.
Van Waes, L., & Leijten, M. (2019). Inputlog: Help documentation (in progress). Retrieved on 27 April 2023 from https://www.inputlog.net/wp-content/uploads/Inputlog_manual.pdf
Van Waes, L., Leijten, M., Pauwaert, T., & Van Horenbeeck, E. (2019). A multilingual copy task: Measuring typing and motor skills in writing with Inputlog. Journal of Open Research Software, 7(1), 30.
Weir, C. J. (2005). Language testing and validation: An evidence-based approach. Palgrave Macmillan.
Wengelin, Å., Torrance, M., Holmqvist, K., Simpson, S., Galbraith, D., Johansson, V., & Johansson, R. (2009). Combined eyetracking and keystroke-logging methods for studying cognitive processes in text production. Behavior Research Methods, 41(2), 337–351.
Xu, C., & Xia, J. (2019). Scaffolding process knowledge in L2 writing development: Insights from computer keystroke log and process graph. Computer Assisted Language Learning, 34(4), 583–608.
Yu, G., He, L., & Isaacs, T. (2017). The cognitive processes of taking IELTS Academic Writing Task 1: An eye-tracking study. In IELTS, British Council, idp IELTS Australia, & Cambridge English (Eds.), IELTS Research Reports Online Series, 2017/2. Retrieved on 27 April 2023 from https://www.ielts.org/-/media/research-reports/ielts_online_rr_2017-2.ashx

chapter 13

Methodology and multimodality
Implications for research on digital composition with emergent bilingual students

Mark B. Pacheco & Blaine E. Smith

University of Florida | Vanderbilt University

This chapter explores methodological challenges in understanding the relationships between processes, products, and perspectives within digital multimodal composition. Using examples from a research project concerning multilingual and multimodal composing in an eighth-grade classroom in the United States, the authors describe specific challenges – and possible avenues forward – in relation to screen-capture software, student retrospective design interviews, and multimodal timescapes. The chapter concludes with implications from these challenges for research and instruction for emergent bilingual students.

Introduction

Digital multimodal composition involves the orchestration of varied semiotic resources, including visuals, texts, animations, and sounds, within composing processes and products (Hull & Nelson, 2005; Jewitt, 2008). For individuals in the process of adding new linguistic resources to their expanding linguistic repertoires – whom we identify as emergent bilinguals (EBs) – digital composition with multiple modes offers unique opportunities within classrooms: to develop as multimodal designers, where students might consider multimodal affordances and their rhetorical possibilities (e.g., Dalton et al., 2015; Tan & Guo, 2009); to support content learning, where students might use composing to engage in discipline-specific practices, such as scientific modeling (e.g., Pierson & Grapin, 2021; Smith, 2019; Goulah, 2017); and to engage with readers across time and space, where students might draw on multimodal and multilingual resources to negotiate meanings with readers in and out of the classroom (e.g., Kim, 2018; Pacheco & Smith, 2015).

https://doi.org/10.1075/rmal.5.13pac © 2023 John Benjamins Publishing Company


While digital composition affords avenues for EBs to learn about composing, language, and content, it has also presented new challenges for researchers accustomed to more traditional, print-centric, and monolingual classroom contexts (Mills, 2007; Ware, 2008). These challenges include identifying the emergent resources that become available to students within the composing process (see Pennycook, 2017); capturing the relationship between composing processes, products, and composer perspectives; and understanding and representing such relationships within data analysis and presentation (see Smith, 2017). This chapter begins to address some of these obstacles, as well as possible directions forward for future research, by detailing the methodological decisions involved in a digital multimodal composing research project – a presentation that explored aspects of heroism in a non-fiction text and in students’ communities. We do so by offering a critical analysis of our research team’s methodological options when examining student composing in an eighth-grade classroom (ages 13–14) within the southeastern United States. We describe these methodological challenges in relation to (i) capturing composing processes, which include students’ learning about modal affordances, seeking information, and iteratively creating digital texts; (ii) analyzing composing products, which include the “My Hero” PowerPoint presentations; and (iii) understanding composers’ perspectives, which include students’ descriptions of design decisions and engagement in the project.

We begin below by situating this project amongst other research that focuses on products, processes, and perspectives with emergent bilinguals in secondary classrooms. We then describe the methodological considerations in our study and conclude with reflections on the robustness of our findings in light of our methodological decisions, as well as with implications for future research.

The research landscape: Products, processes, and perspectives

A growing body of literature has documented the complex ways in which students leverage multiple languages and other modalities within an array of digital products. We describe the students in our review as emergent bilinguals, rather than English learners or linguistic minority students. This terminology emphasizes that students are in the process of expanding their entire linguistic repertoires, rather than focusing solely on students’ emerging proficiency in English (see García, 2009 for an extended discussion). Research on EBs’ digital multimodal composing details their use of multiple modes (e.g., visuals, sounds, text, and movement) to make meaning with digital tools and in digital formats, including designing videos (e.g., de los Ríos, 2018), presentations (e.g., Pacheco & Smith, 2015), podcasts (e.g., Wilson et al., 2012), and claymations (e.g., Hepple et al., 2014), amongst other digital products. In what follows, we summarize this research in relation to (1) products, or the digital compositions designed by students either individually or collaboratively; (2) processes, or the activities that students engage in while creating these products, including gathering information, drafting compositions, discussing their designs, and interfacing with different composing technologies; and (3) perspectives, or the ways that students and teachers view digital composition, including their understandings of and intentionality within digital processes and products.

For studies within classroom spaces, analyses of students’ products stem primarily from a social semiotics perspective of digital composing (Halliday, 1978; Kress, 2010), where messages are conveyed through the orchestration of varied modalities (Unsworth, 2008), each with unique communicative affordances (Jewitt, 2008). We understand modalities as sets of socially and culturally shaped resources for making meaning. They are not fixed or universal, but fluid and created through social processes within specific contexts (Kress & van Leeuwen, 2001). Furthermore, the unique interweaving of modalities creates complex and generative messages. Wilson et al. (2012), for example, use multimodal concordance charts to explore how images as well as oral and written texts worked toward ideational, interpersonal, and textual functions. Using a similar perspective informed by Halliday’s work, D’warte (2014) explores images and texts within language maps, or multimodal compositions that detail where and how students use different languages in varied contexts. The author shows how these modalities work together to index students’ understandings of linguistic registers present within their lives as students shared how they “used a variety of languages and dialects to text, watch movies, sing, and Skype with grandparents and friends” (p. 354).
The past fifteen years of multimodal scholarship reflect a range of analytical methods used for analyzing digital products. Ajayi (2015), for example, employs Kress and van Leeuwen’s (1996) Grammar of Visual Design to analyze Facebook postings and multimodal drawings, showing how text layouts reflect known information and new information within a composition. Kim (2018), on the other hand, draws on Kress’s (2000) multimodal discourse analysis, showing how intertextuality is reflected in multimodal designs that include multiple voices and identities. Unsworth (2008) and Smith (2018) both use content analysis to examine how modes can combine to make meaning. Honeyford (2014) similarly investigates how modes interact to make meaning but uses critical discourse analysis to explore how “linguistic-discursive textual structures are attributed a crucial function in the social production of inequality” (Blommaert, 2005, pp. 28–29).

Analyses of students’ digital composing processes are less common within the research literature. However, studies of composing processes show rich opportunities for students to learn more about digital tools, such as specific composition platforms, and to orchestrate divergent semiotic resources, such as the use of multiple languages within a single composition (see Smith et al., 2021 for a review). Dagenais et al. (2017), for example, capture student digital processes through ethnographic methods, using field notes, video recordings, and artifacts of students’ use of ScribJab, a researcher-developed digital tool that supports composing in multiple languages. The authors note the importance of the composing tool in shaping this process and illustrate how aspects of the classroom context, including language norms and relations towards assessments, constrained multimodal and multilingual composing. Dagenais and colleagues describe the school’s strict policy of language separation, and how discussions about language served to alert “students to the dangers of language mixing” rather than “fostering wonder” about language (p. 274). Similarly, Deig and Pacheco (2020), in an attempt to investigate learning opportunities within composing processes, use activity theory (Engeström, 2015) to understand the relationship between the composer, the digital tool, and the composing context. The authors show the importance of tensions between different nodes of activity systems (such as the tension between the composer and the use of a digital tool) and describe them as opportunities for students to learn about science content and to use multilingual resources. One middle-grade emergent bilingual student’s composing process showed, for example, that she resolved her ambivalence towards using Spanish linguistic resources by discussing potential audience members with her teacher.
While such studies of processes are scant, Smith’s (2017; Smith et al., 2017) research suggests the importance of capturing processes with the aid of multimodal timescapes and illustrates how the multimodal composing process is complex and recursive, with students traversing multiple modalities to make meaning. While timescapes show students’ digital multimodal composing processes (e.g., search for information, collaborate with peers) both within individual composing sessions as well as across multiple days, the research as a whole sheds light on the rhetorical, linguistic, and technological challenges that students face within digital composition. Lastly, researchers have investigated student and teacher perspectives on digital composing to explore the intentionality of students as sign makers, their modal preferences, and their understanding of modal affordances (Jewitt, 2008; Kress, 2010; Smith, 2017). This work also illuminates challenges during the composing process, as well as the complexity of multimodal products. Dalton and colleagues (2015), for example, have offered a fruitful method for exploring these perspectives through retrospective design interviews. As students describe their specific design decisions connected to their digital projects, they can express “design intentionality and a metamodal awareness of how modes work together” (Dalton et al., 2015, p. 548). Burke and Hardware (2015) used a similar approach and described “screenside-shadowing interviews” in which they sat next to participants and asked questions about multimodal choices within video creations. These interviews revealed student choices concerning thematic music, headings, photographic images, and character dialogue. The authors, as well as others who compare in- and out-of-school composing (e.g., Karam, 2018; Kim, 2018), note the importance of gleaning student perspectives to help understand relationships between students’ “multimodal lived experiences” and in-school practices.

Despite the variety of foci on composing processes, products, and perspectives within the research, our own systematic review of literature focused on EB students and digital multimodal composing shows limited methodological variety (Pacheco et al., 2021; Smith et al., 2021). Nearly all of the research in secondary classrooms has relied on qualitative research methods, including case studies, ethnographies, and action research (see Vandommele et al., 2017 for an exception), and a few studies (e.g., Ajayi, 2015) have documented the relationship between composition processes and products (see Smith et al., 2021 for a review). We assert that how an individual accesses and shuttles resources during the composing process informs what their product might include. For teachers who seek to welcome the full range of students’ linguistic repertoires into the classroom, for example, understanding what shapes composing processes offers an opportunity to then figure out how to support student composing. We further assert that students’ perspectives can illuminate the multidimensional factors shaping their multimodal composing processes and products. In this chapter we describe our own methods for examining processes, products, and perspectives in a digital multimodal project.
More specifically, we address some of the challenges that we encountered when attempting to capture and analyze these three components of multimodal composing. Below, we describe how we examined EB participation in multimodal codemeshing (Canagarajah, 2011; Lee & Handsfield, 2018; Pacheco & Smith, 2015; Smith et al., 2017), a composing practice we describe as the strategic use of divergent linguistic resources and other modalities to negotiate meaning with a reader. We then describe how we attended to processes, products, and perspectives in data collection and analysis before describing the methodological challenges we faced in our study.

Multimodal codemeshing and the “My Hero Multimodal Project”

Two central theories guided our understanding of multimodal codemeshing. First, we drew upon a social semiotics lens to focus on modal affordances and
the ways in which these modes might interact to create complex messages (Kress, 2010). We understand modal affordances as “what [in a mode] is possible to express and represent easily” (Jewitt, 2008, p. 247). A modal affordance is thus shaped by “how a mode has been used, what it has been repeatedly used to mean and do, and the social conventions that inform its use” (p. 247). Second, we employed a translingual lens to focus on the ways in which an individual strategically accesses and orchestrates varied semiotic resources from an integrated meaning-making repertoire to negotiate meaning with a real or imagined interlocutor (Canagarajah, 2012; García & Li, 2014). This lens emphasizes how communicative contexts shape the ways that individuals negotiate meaning (e.g., Martínez-Roldán’s [2015] work that describes how communicative norms might privilege the use of English linguistic resources). It also frames translingual practices as strategic, where individuals envoice identities, entextualize resources, and recontextualize settings to help align understandings (Canagarajah, 2013). As such, our data collection methods sought to capture not only the resources that emergent bilingual students used to make meaning but also how the classroom context and interactions shaped the use of these resources. Similarly, as an individual’s use of resources can be both intentional and strategic, we sought to understand students’ perspectives on their composing.

The classroom in which this research took place was similar to many classrooms across the United States where students bring diverse and valuable linguistic resources to their learning. Nearly ten percent of students in US public schools are identified as English language learners (National Center for Education Statistics [NCES], 2019). Approximately one-third of the students in the classroom for this study were identified as English language learners.
The school used a sheltered immersion approach to support EBs, where teachers deliver instruction primarily in English but support EB content and language learning through various scaffolds, including graphic organizers, collaborative work, and heritage language supports. In this eighth-grade classroom, students composed multimodal presentations using English and were encouraged by the teacher to also incorporate their heritage languages, such as Spanish, Vietnamese, Pashto, and Bahdini (a Kurdish language). Table 1 below further describes the focal students for our study, who were selected to represent variation in heritage languages, English proficiency, technology experience, and classroom engagement.

Table 1. Focal student demographics

Pseudonym   Age   Heritage language & ethnicity   Heritage language/English proficiency   Birthplace
Valerie     14    Pashto/Afghani                  Intermediate/Advanced                   USA
Sandra      14    Viet/Vietnamese                 Intermediate/Advanced                   USA
Jonul       14    Bahdini/Kurdish                 Advanced/Intermediate                   USA
Megan       15    Spanish/El Salvadorian          Advanced/Beginner                       El Salvador

Our research centered on the “My Hero Multimodal Project”, a 4-week literature unit connected to the text The Warrior’s Heart (Greitens, 2012). The text explores issues of heroism by looking into a soldier’s experiences in the military, as well as the challenges he faced and overcame within various humanitarian efforts. In the “My Hero Multimodal Project,” students (a) read Greitens’ text over multiple weeks, (b) recorded an interview with a hero in their community using their phones, (c) created a multimodal PowerPoint that synthesized the major themes from their interview in connection to Greitens’ text, and then (d) shared these digital compositions with classmates and community members. Though students created individual compositions, they sat in groups of four and collaborated by sharing technological and linguistic expertise and reviewing one another’s work. At multiple points throughout the project, the students also participated in multimodal workshops in which they examined exemplar presentations created by the researchers and the teacher about heroes in their lives. These workshops were designed and delivered in collaboration between the teacher, Ms. Carr, and the researchers. During these sessions, students would analyze and discuss the effectiveness of different modes in the multimodal examples. They were asked to consider how specific modes – including the use of music or visuals – affected their interpretation of the digital project they examined. The teacher also showed a mentor text – in this case, an exploration of an Italian immigrant’s heroism – where she demonstrated a think-aloud about her process and modal decisions. The goal of these workshops was to provide multiple points of entry into multimodal composing and support students’ understanding of modal affordances as well as how to use the digital tool. Below, we offer a sample slide from one student’s presentation (see Figure 1).

Consistent with other ethnographic work that explores digital writing (see Manchón, 2021, for a recent example), our data sources included video-recorded whole-class and focused observations for each composing session, focal students’ multimodal compositions, interviews with students before the project in terms of their language and technology use, and retrospective design interviews in which students described their composing process, language use, and design decisions.
The table below shows the data that we collected, how this data relates to understanding processes, products, and perspectives, and how we analyzed each data source.


Figure 1. Sample slide from Sandra's composition

Table 2. Data collection and analysis for the “My Hero Multimodal Project”

Process
  Data sources: Video recording of small-group composing sessions; Camtasia screen-recording of individual composing; video recording of whole-class multimodal workshops; fieldnotes
  Methods of analysis: Multimodal timescapes (Smith, 2017; Smith et al., 2017) to provide insights into patterns of how students used modes while composing
  Sample research questions: How do students traverse across modes during their composing processes? How do students collaborate to support each other during composing?

Product
  Data sources: “My Hero” PowerPoint presentations; Camtasia screen-recordings of iterative composing
  Methods of analysis: Constant comparative method (Corbin & Strauss, 2015) to derive themes related to students’ use of modes and connections to class content
  Sample research questions: How do students use modalities to make meaning within their presentations? How do students respond to themes within the literature through different modalities?

Perspectives
  Data sources: Technology use questionnaire; home language use interview; retrospective design interview (Dalton et al., 2015)
  Methods of analysis: Phenomenological analysis (Moustakas, 1994) to understand the range of perspectives and differences in these perspectives on composing processes
  Sample research questions: How do students view their use of different modes within process and product? What aspects of composing context and student experiences shaped these perspectives?

In order to support similar investigations of multimodal composing with emergent bilingual students in school settings, we detail three methodological challenges that we encountered in our data collection and analysis. We begin by describing challenges in analyzing processes and products through the use of video recordings and individual screen recordings. We then focus on challenges in analyzing aspects of products and perspectives in retrospective design interviews. We conclude with challenges that we encountered in understanding and representing composing processes through the creation of multimodal timescapes.

Challenges in capturing composing processes

A first major challenge for researchers seeking to understand digital multimodal composition is capturing what Pennycook (2017) refers to as a spatial repertoire, or the resources that emerge within a space that can facilitate the negotiation of meaning. Pennycook and Otsuji (2014) emphasize that this space is characterized by linguistic and material resources, as well as practices and social relationships that intersect with these resources. The spatial repertoire takes shape through the interaction of these elements, which, in digital compositions, may include images, texts, sounds, animations, the oral and written language that emerge within composing processes, and the resources that develop through interactions, such as the shared understanding of a topic that could inform further meaning-making. Digital multimodal composition thus expands Pennycook’s notion of spatial repertoires by including both resources not noticeably present within the physical boundaries of a classroom – including interviews captured on a phone outside of school, a website, or a text to a friend – and the resources not immediately present
within an activity, such as a student’s recognition of Spanish as a valuable linguistic resource when seeking the assistance of a classmate. However, when we began data collection for this project, we held an individual-centric view of a student’s semiotic repertoire (e.g., a student’s Spanish linguistic resources) and assumed that individual students would choose to deploy resources from their individual meaning-making repertoires. For example, when we endeavored to capture how each student used linguistic resources coded in languages other than English, we planned to triangulate these decisions based on interviews at the end of data collection. We soon discovered, however, that the decision to use – or not use – certain resources was related to what resources were available to students within the classroom context. In other words, rather than focusing on the individual student at their computer, we realized that we needed to expand our scope to capture the social nature of composing.

We describe Sandra’s use of Vietnamese during composing to illustrate how resources emerged during her composing process through interaction with classmates. Sandra reported that using her home language within the classroom space “sounded funny,” and though she had interviewed her mother in Vietnamese for the project, and wished to include this interview in her composition, she initially chose to create her presentation entirely in English. If we had looked only at Sandra’s product, we would have failed to capture the complexity of her decision to include or exclude Vietnamese.
By video recording the group of four students composing together, we were able to capture her conversations with Jonul, a Kurdish student, who described how his use of languages other than English “sounded cool.” In choosing to video-record the group composing collaboratively, we were able to better capture the spatial repertoire available to the students within the classroom than if we focused on a single student’s process or the students’ final compositions. Sandra ultimately chose to include recordings of her mother speaking Vietnamese, and our data show that interactions with Jonul and other multilingual classmates during composing may have supported her choice in using this resource. We also note the importance of ethnographic methods in capturing the emergence of resources within the spatial repertoire over time. Sandra included an audio-recorded snippet of her interview with her mother alongside a paraphrase into English (To me, a hero has to be smart. They has (sic) to be able to live, work, and make money on their own). Sandra’s composing shows how spatial repertoires might change over time, thus presenting a key methodological consideration for researchers attempting to capture the resources that students might – or might not – have access to in time-constrained classroom composing. Pennycook and Otsuji (2014) emphasize how resources are shaped not only through their interaction with other resources and with individuals but through the repeated activities that might characterize a space. We found, for example, that Vietnamese – a valuable resource for Sandra’s composing – did not emerge until her final composing sessions after repeated interactions with classmates. Thus, for researchers attempting to understand what resources students might access, it is valuable to consider the activities students might engage in over time.

A second challenge in our data collection arose from our attempt to capture these spatial repertoires available to composers within the classroom. The digital nature of student composing extended the physical walls of the classroom to include students’ homes and online spaces. The spatial repertoire was composed of resources that emerged within students’ multilingual interviews with heroes, as well as websites and texts with friends. While video recording students allowed us to capture their interactions with one another, it did not help us capture the digital resources that students accessed within the composing process. To address this methodological challenge, we installed screen-capture software (Camtasia) on each laptop that recorded (1) students’ voices and faces as they composed; and (2) all aspects of on-screen activity during student composing. This perspective helped us understand how students accessed new linguistic resources, such as translations of non-English phrases through Google Translate. It also provided insights into how students avoided “sounding funny” in front of their classmates by leveraging non-English linguistic resources to chat with one another through Google Chat, search for information, and compose texts within their digital compositions before converting these texts into English. Figure 2 below shows the layout of the classroom and our camera positioning. We believe that addressing these two methodological challenges was critical for accurately capturing the ways in which students were strategic in their use of semiotic resources.
For example, though Megan might be able to leverage oral Spanish when composing, video recordings of the whole-group collaborative composing showed that her proximity to a Spanish-speaking classmate within the classroom possibly facilitated her ability to use written Spanish within her digital composition as she looked to this classmate to review her written texts. Similarly, though Sandra was hesitant to use oral Vietnamese in front of her classmates, the on-screen Camtasia recording showed that she was able to strategically use Vietnamese to access content and paste parts of her interview with her mother into her project. While traditional classrooms might encourage individual composing, we worked closely with the classroom teacher to facilitate collaborative composing by seating students in small groups in order that students might take advantage of the many resources within the classroom’s spatial repertoire.


Figure 2. Classroom layout and camera positioning

Challenges in understanding relationships between products and perspectives in retrospective design interviews

Another major challenge for researchers seeking to understand digital multimodal composition is capturing students’ strategic use of multiple modalities within their compositions. Though thematic analyses of student products might reveal the specific modalities used (e.g., a music clip on slide 3; the use of bold font on slide 4), they might not capture students’ design intentions in leveraging these modalities. At the onset of our project, we reviewed the research literature on composing with multiple modalities and multiple languages, and then deductively approached our data set to infer the different communicative functions associated with students’ uses of varied modalities. Martínez’s (2010) work in researching classroom “Spanglish” use, for example, showed students meshing linguistic resources to convey nuances in meaning and to communicate with different audience members. Our prior work (Smith, 2017) also revealed that students had distinct preferences for certain modes, and deployed them for specific purposes, such as using music to communicate emotional dimensions of identity. Our own discussions during the data collection process, however, revealed differences in our interpretation of some choices and comments made by students during the composing process. For example, we initially thought that Valerie included a picture of Afghanistan because of her lack of competency with writing
in Pashto (see Figure 3). She soon informed us, however, that this picture best captured the essence of her father’s home country. Rather than further inferring such authorial intent, we decided to collect specific data on students’ design decisions through retrospective design interviews (Dalton et al., 2015). In these interviews, we showed students their final digital compositions and asked them questions about design choices. Due to constraints on time, we did not show them multiple iterations of their compositions through Camtasia screen captures, although this approach would have yielded valuable information about the relationship between composing processes and products. Instead, we used the final compositions to ask students what modalities they chose to include on a particular slide (e.g., tell me about what you included on this slide), and then why they chose to use these modalities (e.g., why did you choose this picture?). Researchers have also included group interviews in cases where students collaboratively compose, revealing students’ design decisions and roles taken during the composing process (Jiang et al., 2020), as well as reflective surveys, revealing students’ perspectives on their language learning and engagement (Kim & Belcher, 2020).

Figure 3. Sample slide from Valerie's composition

Still, research has documented that students might not be aware of such choices (Leander & Boldt, 2013) and, in particular, of the reasons for using languages other than English in an English-centric classroom. To meet this challenge, we (a) asked questions about modal use and language use in our interviews; (b) showed students, prior to the project, exemplar texts that intentionally leveraged multiple modes and languages; and (c) asked students to consider different elements outside of modal use, including potential audiences, goals for composing, and collaboration with classmates. To further strengthen the reliability of our conclusions, we both coded each student interview and discussed our findings to generate categories and themes (e.g., home language use and communication with the community, respectively) through open and axial coding (Corbin & Strauss, 2015). Along with discussing any discrepancies in our interpretations, we also conducted member checks with the students to increase the trustworthiness of our findings (Erlandson et al., 1993). We believe that including retrospective design interviews helped address two prominent challenges in our data analysis. First, it allowed us to understand student perspectives on composing by encouraging students to talk about specific design choices in their compositions. We posit, however, that the fruitfulness of these discussions might have been limited if students had not participated in analyses of exemplar texts. Similar to Unsworth and Mills' (2020) research, which shows the importance of developing metalanguage to describe design choices, we found that examining exemplar texts at the beginning of the project encouraged discussions about modal affordances and design intent. For future research, discussions within students' home languages might further strengthen research findings, given that students may face challenges in describing design decisions if they are at the emerging stages of developing proficiency in English. Lastly, these design interviews allowed us to work towards trustworthiness by discussing design choices with students in order to triangulate our findings.
When investigating the different functions associated with varied modes (e.g., using Spanish and English to engage multiple readers), we triangulated findings by analyzing student products, interviews, and composing processes.

Challenges in representing composing processes with multimodal timescapes

A final major challenge that we encountered in seeking to understand digital multimodal composition was the need not only to describe the richness and variety within students' composing processes, but also to somehow quantify these processes for purposes of both assessment and instructional intervention (see also Tan et al., 2020, and Dalton, 2020, for discussions of multimodality and assessment). Research on multimodality is often constrained by print-centric publishing restrictions (i.e., high fees to publish images in color and limited uses of multimedia to supplement research reports). As noted above, few studies systematically research composing through quantitative measures, and there is a significant lack of research on assessing multimodal composition (see Silseth & Gilje, 2019, for a recent example). We attempted to address this research gap by adapting Smith's (2017) multimodal timescapes to include students' use of linguistic resources (see Figure 4). These timescapes document not only the variety of actions and modalities used within student composing but also the frequency with which these actions and modalities were used.

Figure 4. Multimodal timescape of composing processes

For instructional implications, the above timescape shows one student's extensive use of heritage languages throughout informational internet search activities. This finding, when compared alongside other students' heritage language use during the composing process, might hold implications for instructional approaches used with students with limited proficiencies in English. For research implications, this timescape might also suggest that though this student only included the heritage language on a few slides of their final project, it played a role in helping the student identify images, revise their text, and share their compositions with classmates. Moreover, this timescape could inform future research that might investigate the utility of certain activities, such as internet searches of non-essential or disreputable information (see Kohnen & Mertens, 2019). We note that the creation of timescapes helps position the researcher to frame the composing process as an iterative and collaborative activity. In order to capture student interactions with the screen and with one another, we analyzed both the Camtasia video recordings and recordings from video cameras that captured student interaction. When creating these timescapes, however, we were faced with two distinct challenges in relation to students' language use. The first challenge arose from distinct ideologies about language use within the classroom context. We describe language ideologies as "sets of beliefs about language articulated by users as rationalization or justification of perceived language structures and use" (Silverstein, 1979, p. 193). As Kroskrity (2000) notes, "language ideologies within a context might position individuals to view an activity like code-switching as an example of inappropriate linguistic borrowing" (p. 339), rather than strategic or intentional. Based on our conversations with the teacher, and our observations of instruction both before and after students' completion of the digital project, we noted few instances of languages other than English used in instruction. Officially, the classroom followed a sheltered English immersion (SEI) model, where instruction was delivered in English to students in the process of adding English to their expanding linguistic repertoires.
While non-English linguistic resources can be used as scaffolds in such an environment, students must learn both content and new language primarily through English, per state law. Unofficially, we noted instances when students talked to one another using non-English linguistic resources, but we did not observe students using these resources in whole-class discussions or when completing course assignments. As such, our Camtasia and small-group video recordings may not have captured the entirety of students' language use in the composing process (see Chapter 7, this volume, for an extended discussion of screen-capture technologies). One student reported in her interview, for example, that she had chosen to record her voice in Spanish on her phone in a classroom utility closet, outside of the ears of her classmates and the eyes of the camera. We assume that there were other instances of using languages other than English that were not captured by our recordings or within our field notes. Our timescapes, though extensive in detailing students' modal choices while seated at the desk or working outside of the classroom, did not capture student discussions off-camera. One way that future research could address this challenge is by providing students with individual microphones or through video-tracking hardware, such as SWIVL, which can audio and video record individual participants with the use of markers that the camera automatically tracks. Another methodological challenge relating to composing processes arose from the differences in language proficiencies between researchers and students. While we could understand students' use of Spanish within composing, we were unable to understand their use of Vietnamese, Pashto, and Bahdini (a Kurdish language). As discussed in the previous paragraph, we took field notes during our observations of student composing. One benefit of using field notes is that they offer the opportunity to engage in theoretical sampling (Corbin & Strauss, 2015), where ongoing observations and analyses during classroom observations lead to new trajectories for data collection. When we could not understand the language students were using, however, we were unable to pursue relevant trajectories in our data collection – such as asking a student about a choice of vocabulary or adding a new interview question. We believe, however, that we were able to address this challenge through the relationships we developed with students and the teacher in the weeks prior to and during the digital project. Though we could not use Bahdini, for example, we were able to comfortably ask Jonul questions in real time about his language choices. We were then able to use his and other students' responses to inform our field notes and the future creation of the multimodal timescapes. Lastly, these timescapes do not reflect the importance of certain features within the classroom space that undoubtedly shaped composing trajectories.
For example, the timescapes do not highlight moments when students interacted with one another outside of 'product review and sharing.' In this phase of our analysis, we made the decision to include broad-level categories for student composing processes (e.g., image search) rather than features within these categories (e.g., asking classmates for assistance when searching for an image). Future research could further explore the properties of these categories, such as the extent to which heritage languages were used when searching for information, as well as category dimensions, such as whether students accessed websites in languages other than English or included heritage languages in their search terms.

Conclusions and implications

In summary, this chapter shows methodological challenges in relation to capturing composing processes, products, and perspectives. First, we have described how we used video recordings to attempt to capture student use of semiotic resources during composing. We described the need to understand these resources as part of the classroom's spatial repertoire, and the need to focus on individual composing through Camtasia recordings and collaborative composing through video recording. Second, we have described how we used retrospective design interviews to understand the relationship between student perspectives and digital products. We summarized the challenges that students faced in articulating their design intentions, the need for interview questions that encouraged analysis of design choices, and the requirement to develop a metalanguage for describing choices through multimodal workshops. Lastly, we have described our use of multimodal timescapes to capture students' traversal across different modalities during composing processes. We discussed the challenges we faced in capturing student language use due to prominent language ideologies in the English-centric classroom space, as well as the challenges in understanding the content of student language when students used resources coded in languages other than English. Given these challenges, we offer three avenues forward for research on multimodal and multilingual composing. First, when considering methodological approaches to capturing students' strategic movements across modes, we recommend triangulating multiple data sources that elicit student perspectives, including post-composing-session retrospective interviews, real-time questioning during the composing process, and student reflective composing journals (see Kim & Belcher, 2020, for an example). Additionally, collaborative multimodal projects provide insights into how students discuss design decisions while composing. Such work could then illuminate how this work is not only possible for emergent bilingual students, but also powerful in shaping their learning of language, content, and composing. This triangulation will be helpful in uncovering ways in which student composing is strategic and intentional.
For example, a choice to use Spanish might not be a crutch that signifies a lack of English proficiency, but a purposeful choice to communicate with a family member or classmate. While extensive work documents the rich and complex products that students create, more work should examine how these products are realized through students' purposeful and deft movement across modalities and digital tools in order to negotiate meanings with their readers. Second, when considering processes, we argue for the importance of framing multimodal composing as a practice in which individuals, tools, goals, and aspects of the classroom context (e.g., language ideologies) shape composing processes. This practice perspective suggests that researchers must then account for the ways these contextual elements intersect within multimodal compositions. This would include collecting data that illuminate the relationship between composing and the classroom context (e.g., interviews with students and teachers about multimodal composition). Graham's (2018) writer(s)-within-community model helps draw attention to this practice perspective, where an individual's capabilities are shaped by the physical and social dimensions of a context. This practice perspective also suggests that researchers must account for the ways that composing processes extend beyond the moments when the student is seated in front of the computer. In our work, we found that multimodal workshops, for example, gave students the opportunity to develop an understanding of modal affordances which could then inform design choices during composition. We believe that creating a thick description, or a contextualized understanding of participants' activities (Geertz, 1973), through prolonged engagement (Lincoln & Guba, 1985) in the classroom, is helpful for understanding student composing processes. Such a description helps illustrate not only the ways that students use different modalities but also how aspects of the classroom and composing process inform this use. Similarly, we argue for the importance of considering composing as a social endeavour (see Bull & Anstey, 2018, for an extended discussion). Our study showed how student interactions with one another, family members, researchers, and the teacher offered new and unexpected trajectories for composing. We argue that future work might investigate how features of these interactions shape composing processes. Such features could include the ways that individuals offer different areas of expertise (e.g., knowledge of the language, technology, or content). Research could further investigate how gestures, proxemics, gaze, and other uses of the body also shape composing. Rowe et al. (2014), for example, show how gestures facilitated young students' technical expertise when learning to use a book creator app on iPads. Currently, public educators in the United States are grappling with a host of challenges due to the COVID-19 pandemic. Though US classrooms are in the process of returning to face-to-face instruction, the past two years have highlighted the growing importance of digital technologies within schools.
We join other scholars in bilingual education (see Chang-Bacon, 2021) in arguing for the need to understand the ways that digital learning platforms can include – or exclude – students in the process of adding English to their expanding linguistic repertoires. While this chapter does not touch on the implications of the global pandemic for future research endeavors, we believe that digital multimodal pedagogies will only continue to grow as new technologies are included in classrooms. We urge researchers to attend to processes and products but, perhaps most importantly, to the perspectives of students and teachers, so that these pedagogies can be both possible and productive for emergent bilingual students.

References

Ajayi, L. (2015). Critical multimodal literacy: How Nigerian female students critique texts and reconstruct unequal social structures. Journal of Literacy Research, 47(2), 216–244.


Blommaert, J. (2005). Discourse: A critical introduction. Cambridge University Press. Bull, G., & Anstey, M. (2018). Elaborating multiliteracies through multimodal texts: Changing classroom practices and developing teacher pedagogies. Routledge. Burke, A., & Hardware, S. (2015). Honouring ESL students’ lived experiences in school learning with multiliteracies pedagogy. Language, Culture and Curriculum, 28(2), 143–157. Canagarajah, A. S. (2013). Negotiating translingual literacy: An enactment. Research in the Teaching of English, 48, 40–67. https://www.jstor.org/stable/24398646 Canagarajah, S. (2011). Codemeshing in academic writing: Identifying teachable strategies of translanguaging. The Modern Language Journal, 95(3), 401–417. Canagarajah, S. (2012). Translingual practice: Global Englishes and cosmopolitan relations. Routledge. Chang-Bacon, C. K. (2021). Generation interrupted: Rethinking “Students with Interrupted Formal Education”(SIFE) in the wake of a pandemic. Educational Researcher, 50(3), 187–196. Corbin, J. M., & Strauss, A. (2015). Basics of qualitative research: Techniques and procedures for developing grounded theory (4th ed.). Sage. Dagenais, D., Toohey, K., Bennett Fox, A., & Singh, A. (2017). Multilingual and multimodal composition at school: ScribJab in action. Language and Education, 31(3), 263–282. Dalton, B. (2020). Bringing together multimodal composition and maker education in K – 8 classrooms. Language Arts, 97(3), 159–171. Dalton, B., Robinson, K. H., Lovvorn, J. F., Smith, B. E., Alvey, T., Mo, E., Proctor, C. P. (2015). Fifth-grade students’ digital retellings and the Common Core: Modal use and design intentionality. The Elementary School Journal, 115(4), 548–569. Deig, A., & Pacheco., M. B. (2020). Exploring opportunities: Disjunctures in multimodal composing. Paper presented at the Literacy Research Association Annual Meeting. Virtual. De los Ríos, C. V. (2018). 
Bilingual Vine making: Problematizing oppressive discourses in a secondary Chicanx/Latinx studies course. Learning, Media and Technology, 43(4), 359–373. D’warte, J. (2014). Linguistic repertoires: Teachers and students explore their everyday language worlds. Language Arts, 91(5), 352–362. http://handle.uws.edu.au:8081/1959.7 /542591 Engeström, Y. (2015). Learning by expanding. Cambridge University Press. Erlandson, D. A., Harris, E. L., Skipper, B. L., & Allen, S. D. (1993). Doing naturalistic inquiry: A guide to methods. Sage. García, O. (2009). Bilingual education in the 21st century: A global perspective. Wiley-Blackwell. García, O., & Li, W. (2014). Translanguaging: Language, bilingualism and education. Palgrave Macmillan. Geertz, C. (1973). The interpretation of cultures. Basic Books. Goulah, J. (2017). Climate change and TESOL: Language, literacies, and the creation of ecoethical consciousness. TESOL Quarterly, 51(1), 90–114. Graham, S. (2018). A revised writer(s)-within-community model of writing. Educational Psychologist, 53(4), 258–279.

Chapter 13. Methodology and multimodality

Greitens, E. (2012). The warrior’s heart: Becoming a man of compassion and courage. Houghton Mifflin Harcourt. Halliday, M. A. K. (1978). Language as social semiotic: The social interpretation of language and meaning. Edward Arnold. Hepple, E., Sockhill, M., Tan, A., & Alford, J. (2014). Multiliteracies pedagogy: Creating claymations with adolescent, post-beginner English language learners. Journal of Adolescent & Adult Literacy, 58(3), 219–229. Honeyford, M. A. (2014). From aquí and allá: Symbolic convergence in the multimodal literacy practices of adolescent immigrant students. Journal of Literacy Research, 46(2), 194–233. Hull, G. A., & Nelson, M. (2005). Locating the semiotic power of multimodality. Written Communication, 22(2), 224–261. Jewitt, C. (2008). Multimodality and literacy in school classrooms. Review of Research in Education, 32(1), 241–267. Jiang, S., Shen, J., Smith, B. E., & Kibler, K. W. (2020). Science identity development: How multimodal composition mediates student role-taking as scientist in a media-rich learning environment. Educational Technology Research and Development, 68(6), 3187–3212. Karam, F. J. (2018). Language and identity construction: The case of a refugee digital bricoleur. Journal of Adolescent & Adult Literacy, 61(5), 511–521. Kim, S. (2018). “It was kind of a given that we were all multilingual”: Transnational youth identity work in digital translanguaging. Linguistics and Education, 43, 39–52. Kim, Y., & Belcher, D. (2020). Multimodal composing and traditional essays: Linguistic performance and learner perceptions. RELC Journal, 51(1), 86–100. Kohnen, A. M., & Mertens, G. E. (2019). “I’m always kind of double-checking”: Exploring the information-seeking identities of expert generalists. Reading Research Quarterly, 54(3), 279–297. Kress, G. (2000). Design and transformation: New theories of meaning. In B. Cope & M. Kalantzis (Eds.), Multiliteracies: Literacy learning and the design of social futures (pp. 153–161). 
Routledge.
Kress, G. (2010). Multimodality: A social semiotic approach to contemporary communication. Routledge. Kress, G., & van Leeuwen, T. (1996). Reading images: The grammar of visual design. Routledge. Kress, G., & van Leeuwen, T. (2001). Multimodal discourse: The modes and media of contemporary communication. Edward Arnold. Kroskrity, P. V. (2000). Language ideologies in the expression and representation of Arizona Tewa identity. In P. V. Kroskrity (Ed.), Regimes of language: Ideologies, polities, and identities (pp. 329–359). School of American Research Press. Leander, K., & Boldt, G. (2013). Rereading “A pedagogy of multiliteracies”: Bodies, texts, and emergence. Journal of Literacy Research, 45(1), 22–46. Lee, A. Y., & Handsfield, L. J. (2018). Code-meshing and writing instruction in multilingual classrooms. The Reading Teacher, 72(2), 159–168. Lincoln, Y., & Guba, E. (1985). Naturalistic inquiry. Sage.


Manchón, R. M. (2021). The contribution of ethnographically-oriented approaches to the study of L2 writing and text production processes. In I. Guillén-Galve & A. Bocanegra-Valle (Eds.), Ethnographies of academic writing research: Theory, methods, and interpretation (pp. 84–103). John Benjamins. Martínez, R. A. (2010). Spanglish as literacy tool: Toward an understanding of the potential role of Spanish-English code-switching in the development of academic literacy. Research in the Teaching of English, 45(2), 124–149. https://www.jstor.org/stable/40997087 Martínez-Roldán, C. M. (2015). Translanguaging practices as mobilization of linguistic resources in a Spanish/English bilingual after-school program: An analysis of contradictions. International Multilingual Research Journal, 9(1), 43–58. Mills, K. A. (2007). Access to multiliteracies: A critical ethnography. Ethnography and Education, 2(3), 305–325. Moustakas, C. (1994). Phenomenological research methods. Sage. National Center for Education Statistics. (2019). English language learners in public schools. Retrieved on 27 April 2023 from https://nces.ed.gov/programs/coe/indicator_cgf.asp Pacheco, M. B., & Smith, B. E. (2015). Across languages, modes, and identities: Bilingual adolescents’ multimodal codemeshing in the literacy classroom. Bilingual Research Journal, 38(3), 292–312. Pacheco, M. B., Smith, B. E., Deig, A., & Amgott, N. (2021). Scaffolding multimodal composition with emergent bilingual students. Journal of Literacy Research, 53(2), 149–173. Pennycook, A. (2017). Translanguaging and semiotic assemblages. International Journal of Multilingualism, 14(3), 269–282. Pennycook, A., & Otsuji, E. (2014). Metrolingual multitasking and spatial repertoires: ‘Pizza mo two minutes coming’. Journal of Sociolinguistics, 18(2), 161–184. Pierson, A. E., & Grapin, S. E. (2021). A disciplinary perspective on translanguaging. Bilingual Research Journal, 44, 1–17. Rowe, D. W., Miller, M. E., & Pacheco, M. B. (2014).
Preschoolers as digital designers: composing dual language eBooks using touchscreen computer tablets. In R. S. Anderson, & C. Mims (Eds.), Handbook of research on digital tools for writing instruction in K-12 settings (pp. 279–306). IGI Global. Silseth, K., & Gilje, Ø. (2019). Multimodal composition and assessment: A sociocultural perspective. Assessment in Education: Principles, Policy & Practice, 26(1), 26–42. Silverstein, M. (1979). Language structure and linguistic ideology. In R. Clyne, W. Hanks, & C. Hofbauer (Eds.), The elements: A parasession on linguistic units and levels (pp. 193–247). Chicago Linguistic Society. Smith, B. E. (2017). Composing across modes: A comparative analysis of adolescents’ multimodal composing processes. Learning, Media & Technology, 42(3), 259–278. Smith, B. E. (2018). Composing for affect, audience, and identity: Toward a multidimensional understanding of adolescents’ multimodal composing goals and designs. Written Communication, 35(2), 182–214. Smith, B. E. (2019). Mediational modalities: Adolescents collaboratively interpreting literature through digital multimodal composing. Research in the Teaching of English, 53(3), 197–222.


Smith, B. E., Pacheco, M. B., & de Almeida, C. R. (2017). Multimodal codemeshing: Bilingual adolescents’ processes composing across modes and languages. Journal of Second Language Writing, 36, 6–22. Smith, B. E., Pacheco, M. B., & Khorosheva, M. (2021). Emergent bilingual students and digital multimodal composition: A systematic review of research in secondary classrooms. Reading Research Quarterly, 56(1), 33–52. Tan, L., & Guo, L. (2009). From print to critical multimedia literacy: One teacher’s foray into new literacies practices. Journal of Adolescent & Adult Literacy, 53(4), 315–324. Tan, L., Zammit, K., D’warte, J., & Gearside, A. (2020). Assessing multimodal literacies in practice: A critical review of its implementations in educational settings. Language and Education, 34(2), 97–114. Unsworth, L. (2008). Multiliteracies and metalanguage: Describing image/text relations as a resource for negotiating multimodal texts. In J. Coiro, M. Knobel, C. Lankshear, & D. J. Leu (Eds.), Handbook of research on new literacies (pp. 377–405). Lawrence Erlbaum Associates. Unsworth, L., & Mills, K. A. (2020). English language teaching of attitude and emotion in digital multimodal composition. Journal of Second Language Writing, 47, 100712. Vandommele, G., Van den Branden, K., Van Gorp, K., & De Maeyer, S. (2017). In-school and out-of-school multimodal writing as an L2 writing resource for beginner learners of Dutch. Journal of Second Language Writing, 36, 23–36. Ware, P. (2008). Language learners and multimedia literacy in and after school. Pedagogies, 3(1), 37–51. Wilson, A. A., Chavez, K., & Anders, P. L. (2012). “From the Koran and Family Guy”: Expressions of identity in English learners’ digital podcasts. Journal of Adolescent & Adult Literacy, 55(5), 374–384.


Chapter 14

Setting up a coding scheme for the analysis of the dynamics of children’s engagement with written corrective feedback
Triangulating data sources

Yvette Coyle

University of Murcia

This chapter describes the development of a coding scheme for the analysis of young English as a foreign language learners’ engagement with model texts. After outlining the theoretical rationale underlying our analytical procedure, and the methodological problems we experienced when attempting to apply constructs developed in research with adults to a younger and less proficient group of learners, I go on to explain the multiple steps involved in our process-product analysis. Careful triangulation of different measures including the children’s written texts, handwritten notes, and transcripts of their collaborative dialogue across two multi-stage tasks, enabled us to identify a series of trajectories involving diverse combinations of noticing, strategic problem-solving, and degrees of uptake. The coding categories and methodological decisions are illustrated with examples from the children’s data. Limitations in the procedure are also highlighted.

https://doi.org/10.1075/rmal.5.14coy © 2023 John Benjamins Publishing Company

Introduction

A growing number of written corrective feedback (WCF) studies have directed their attention to learners’ cognitive engagement with feedback in order to provide a more detailed picture of why WCF may or may not be effective in particular contexts. Using feedback strategies such as reformulations and model texts, these studies (Adams, 2003; Hanaoka, 2007; Qi & Lapkin, 2001; Swain & Lapkin, 2002; Yang & Zhang, 2010) have attempted to determine how processes such as noticing and metalinguistic awareness may be critical in determining the impact of feedback on learning outcomes. Process-oriented research has implemented a multi-stage task research design involving the writing of an initial draft, comparison of the draft with available WCF, and a subsequent rewriting task. In situating the analysis of learners’ cognitive processing within this sequential framework, the aim of this body of research has been to establish connections between the different stages of the writing task. This has meant identifying: (i) the linguistic problems learners experienced while writing and their outcome (solved or unsolved); (ii) the learners’ noticing (or not) of solutions to their problems in the WCF provided on their writing; and (iii) the successful incorporation of the solutions (or not) in the learners’ revised texts. Instruments used to gather data have included think-aloud protocols, note-taking, stimulated recall, and collaborative dialogue, all of which have offered a window into learners’ feedback processing at different stages of the task. Collectively, these studies afford interesting insights into adults’ noticing from different types of WCF and provide a methodological framework that can be extrapolated to different contexts and groups of learners. In this chapter I discuss key issues that emerged in our research on writing and WCF processing with young learners of English as a foreign language (EFL) in an instructed setting. Specifically, I describe the methodological challenges we faced, the decisions taken, and the solutions adopted in the development of a coding scheme intended to account for the dynamics of young learners’ noticing from models carried out within the framework of the multi-stage task outlined above. These challenges included: (i) identifying the full range of problems experienced by children while writing and the strategies they used for solving them; (ii) coding the written texts of very low proficiency learners; (iii) accounting for children’s noticing from model texts; (iv) tracing children’s problem-solving trajectories across the multi-stage task; (v) determining the language learning potential of those trajectories; and (vi) verifying the connection between the trajectories and the children’s L2 writing progress.
I begin by offering an overview of our research programme. I then describe the aims, theoretical rationale, and method of a recent study in which my colleagues and I triangulated data from children’s written texts, collaborative dialogue protocols, and handwritten notes to produce a coding scheme intended to describe their engagement with model texts (Coyle et al., 2018). I then describe the procedures involved in analyzing the data and provide examples of the categories used in setting up the coding scheme. Finally, I highlight the limitations of our work and suggest some implications for future research on WCF processing.

Yvette Coyle

Overview of the research programme

Framed within the writing-to-learn strand of second language acquisition research (Manchón, 2011), the aim of our study was to produce a systematic classification of the diverse routes or trajectories that young learners follow while writing, processing feedback, and rewriting their original texts. We essentially wanted to understand how the cognitive processes the children activated whilst engaged in the task might contribute to improvements in their writing and second language (L2) development. To this end, we designed a longitudinal intervention involving two multi-stage writing cycles, four months apart, following Wigglesworth and Storch’s (2012) suggestion that “to examine the impact of feedback on language learning, whether that feedback is given to pairs or individual writers, we need to provide learners with several cycles of feedback treatments of the same type over time” (p. 371). The participants in our study were eight pairs of low-proficiency children aged between 10 and 11 from two EFL classes in a small, rural primary school in Spain. Theoretically, our examination of children’s writing and engagement with WCF draws on socio-cognitive perspectives of L2 learning. It emphasizes the role of noticing and problem-solving in L2 learning, while simultaneously acknowledging L2 development as a social activity in which individual knowledge is co-constructed through collective reflection. In instructional contexts, WCF is seen as a valuable pedagogical tool for promoting attention to language when learners notice and act upon mismatches between their own output and the feedback. Underlying this concept is the idea that the ‘gap’ between learners’ output and the WCF must be consciously attended to, so that input can be converted into intake (Schmidt, 1990, 2001). At the same time, noticing has been linked to the problem-solving activity involved in producing written output, when learners search for appropriate linguistic resources to express their intended meaning (Manchón & Roca de Larios, 2007). Writing thus helps to prompt self-initiated noticing of forms that are absent from learners’ linguistic knowledge (holes), together with those they have only partially acquired (gaps).
Chapter 14. Setting up a coding scheme to analyze children’s engagement with WCF

This type of self-initiated noticing is held to be important as it can prime learners to search for the missing information in subsequent feedback, thus promoting noticing of new language forms (Izumi, 2013). The problems learners experience during written production may then lead them to reassess their existing L2 knowledge through processes of hypothesis testing and metalinguistic reflection (Swain, 1995, 2000). Studies of feedback processing have shed light on the effects of diverse feedback techniques on learners’ processing and uptake. Our previous studies with children (Cánovas et al., 2015; Coyle & Roca de Larios, 2014) contributed to the strand of research initiated by Hanaoka (2007) on the use of model texts as an alternative, more discursive feedback technique than explicit error correction. The rationale for using models was based on the notion that providing learners with a complete, well-written text would generate deeper reflection and discussion that might lead to more learning than simply correcting their errors explicitly. In fact, we found that models allowed children to stretch their linguistic resources by incorporating new lexis and content and improving the overall structure of their writing, a phenomenon which, in line with expanded notions of acquisition as ‘gradual and nonlinear changes in linguistic and metalinguistic behavior’ (Sachs & Polio, 2007, p. 75), may be considered as advancing their developing competence. Methodologically, research using model texts has examined learners’ noticing using mainly quantitative measures. Studies have identified and computed the linguistic focus of learners’ problems and their relationship to changes in rewritten texts, without delving into the processes underlying these surface changes (Hanaoka, 2007; Yang & Zhang, 2010). Deeper insights into the ways in which noticing may be conducive to language learning have been reported in research on learners’ processing of reformulations (Storch & Wigglesworth, 2010; Wigglesworth & Storch, 2012) through the detailed and integrated analysis of product (written texts) and process (collaborative dialogue). We considered, therefore, that to identify possible links between learning outcomes and the nature of children’s engagement with feedback, there was a real need for our research to examine actual instances of the learners’ engagement in WCF analysis as manifested in their collaborative talk. Consequently, our data collection involved the triangulation of different sources of information, including the children’s written texts, their collaborative dialogue protocols, and the handwritten notes they made while writing and analyzing feedback across both writing cycles. In what follows, I offer a critical reflection on the methodological decisions, challenges, and solutions that arose during the setting up of our coding scheme of children’s WCF processing trajectories.

Coding scheme of children’s writing and WCF trajectories

Since the participants in our study were low-proficiency children, it seemed to us both theoretically and methodologically useful to develop a more inclusive coding scheme for the difficulties children faced when writing and analyzing WCF than those used in research with adults. The data analysis enabled us to identify a set of trajectories that represented the diverse options the children followed across the writing, feedback, and rewriting stages. A subsequent microanalytic examination of the textual changes triggered by the different trajectories enabled us to pinpoint those that appeared to have more or less potential for fostering improvements in L2 writing and development. To arrive at this point, we analyzed the data in two phases: an initial within-stage analysis, in which we coded each stage of the task independently, and a second across-stage analysis, in which we made sequential connections between each of the three tasks (writing, WCF analysis, and rewriting). Figure 1 below represents graphically the procedure followed. The vertical arrows represent each stage of the multi-stage task and the categories used to code the data. The horizontal arrow represents the combined across-stage analysis, together with further categories that emerged during the cyclical and reiterative reading of the data. In this sense, the diagram should be read first vertically and then horizontally.

Uncovering children’s strategic problem-solving while writing

One of the first challenges we faced was identifying and coding the type of linguistic problems children experienced while writing and rewriting, as well as the solutions they found for these problems. In stages 1 and 3, we first coded the linguistic aspects learners attended to (lexis, form or clause) using the conventional construct of the language-related episode (LRE) (Swain & Lapkin, 2002). This unit was chosen on the grounds of its pervasiveness in existing research (e.g., Qi & Lapkin, 2001; Storch & Wigglesworth, 2010). As in Hanaoka (2007), the resolution of each LRE was coded as “solved” or “unsolved” depending on the outcome the pairs arrived at after discussing the problem. Noticing was operationalized as the children’s identification of problems when attempting to produce written output in the L2, either because they were unable to express a given idea, a morphological form or a lexical item (noticing a hole), or because they could only do so imprecisely (noticing a gap) (Izumi, 2013). However, we soon discovered that LREs (as essentially linguistic measures) failed to capture the strategic problem-solving mechanisms the children activated while writing and engaging with WCF. As a result, they told us very little about the precise cognitive operations the children engaged in to bridge the gap between what they wanted to write and what they were actually able to write, both before and after feedback provision. To overcome this limitation, we decided to pay close attention to the ways in which the children solved problems (or not) in an attempt to make their cognitive processes more visible and concrete. To do so, we drew on García Hernández et al. (2017; see also Roca de Larios et al., 2021), who had described the problem-solving searches and formulation strategies engaged in by young EFL learners when receiving reformulated feedback on their writing.
From their typology, we identified four strategies used by the children in this study: syntactic search, lexical search, morphological search, and spelling search (Table 1). These search procedures revealed how our low-proficiency pairs were able to pool their L1 and L2 linguistic knowledge to provide scaffolded assistance to each other in their efforts to find a joint solution to their language-related problems. They were also useful in illustrating their strategic thinking during the writing process, which we suspected might have a priming effect on some of what was subsequently noticed in the feedback.

Figure 1. (Reprinted from Journal of Second Language Writing, Coyle, Y., Cánovas Guirao, J., & Roca de Larios, J., Identifying the trajectories of young EFL learners across multi-stage writing and feedback processing tasks with model texts, 2018, with permission from Elsevier)

Table 1. Problem-solving procedures at stage 1

Syntactic search: searching for a syntactic structure via the L1.
P1: Let’s see, once upon a time a scientific
P2: que hace … [who does … ]
P1: como ‘hacer los deberes’, do my homework, entonces sería ‘does’. Does a … ¿mezcla? [Like ‘do homework’, do my homework, then it would be ‘does’. Does a … mixture?]
P2: a potion

Lexical search: suggesting lexical alternatives to fill a gap or linguistic problem.
P2: ¿Luego o después, más tarde, dentro de un rato? [Later, after that, later on, in a little while?]
P1: Espera, entonces … Nos haría falta poner “entonces” … No. Podemos poner “after” que es después, creo que es después. [Wait, then … We would need to put “then” … No, we can put “after”, which means later on, I think so.]
P2: Sí. After … [Yes. After … ]
P1: The witch …
P1: After, the witch …

Morphological search: suggesting morphological alternatives in both the L1 and the L2 to fill a gap or linguistic problem.
P2: Está cenando. ¿Cómo se escribía? [She is having dinner; how can we write it?]
P1: ¿Have?
P2: Have, have dinner …
P1: No, espera, espera … [No, wait, wait … ]
P2: Haves … ¿Haves?
P1: Have dinner

Spelling search: suggesting spelling alternatives in both the L1 and the L2 to fill a gap or linguistic problem.
P2: And dog is (they write it) atacar … ¿era con dos “t”? [Attack … was it with two “t”s?]
P1: Sí, y también con “c” antes de la “k”, attack a cat. [Yes, and also with a “c” before the “k”, attack a cat]
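As a rough sketch, the within-stage coding units just illustrated could be represented as a small data structure. The class and field names below are my own illustration for expository purposes, not part of the study’s instruments:

```python
from dataclasses import dataclass
from typing import Optional

# Category labels mirror the scheme described above; the class is hypothetical.
FOCUS = {"lexis", "form", "clause"}                      # linguistic aspect of the LRE
SEARCHES = {"syntactic", "lexical", "morphological", "spelling"}

@dataclass
class LRE:
    focus: str               # what the pair attended to
    search: Optional[str]    # problem-solving search observed, if any
    solved: bool             # outcome after the pair's discussion

    def __post_init__(self) -> None:
        assert self.focus in FOCUS
        assert self.search is None or self.search in SEARCHES

# e.g. an unsolved lexical problem approached through a lexical search
episode = LRE(focus="lexis", search="lexical", solved=False)
```

Coding each episode with both a linguistic focus and a search strategy is what lets the scheme capture the strategic dimension that LREs alone, as the chapter notes, left invisible.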


Coding the written texts of very low proficiency learners

A second challenge we faced in the development of the coding scheme was the selection of appropriate measures to analyze the children’s written output. The children’s L2 texts were characterized by inaccuracies, frequent L1 use and, at times, fragmented chunks of language. We felt that simply measuring the error ratios in the learners’ texts would be insufficiently sensitive to shed light on their writing, which, for the most part, was short and grammatically inaccurate and/or incomplete. So, in order to identify improvements or signs of progress in the acceptability and comprehensibility of the children’s written output from their original drafts to their rewritten texts, we opted to use Torras’s (2005) coding scheme, which had been specifically designed for the analysis of young EFL learners’ written texts (see also Coyle & Roca de Larios, 2014). Three units were identified in the data: pre-clause (a fragmented string of words whose meaning was unclear), proto-clause (grammatically inaccurate but comprehensible), and clause (comprehensible and acceptable) (see Table 2).

Table 2. Types of clausal units in the children’s written texts

Pre-clause: a grammatically incorrect unit of language consisting of fragmented or distorted strings of words, at times incomplete, in which the meaning intention is not always apparent. Example: “Entonces lo convierte in bat”

Proto-clause: a language unit in which the children’s meaning intention is clear but which contains grammatical inaccuracies or gaps in the clausal unit. Example: “The scientific it’s crazy.”

Clause: a grammatically accurate language unit which may present a slight inaccuracy in spelling, lexis, grammar or concordance. Example: “The scientific turns into a cat!”

While these categories proved subtle enough to capture differences in the children’s writing competence based on the degree of grammaticality (or lack thereof) in the clausal and subclausal linguistic units, they did not provide insights into the overall accuracy and fluency of the children’s writing. For this reason, we finally decided to use an error ratio to measure overall accuracy: [number of linguistic errors / total number of words] × 10 (van Beuningen et al., 2012). A 10-word rather than the more common 100-word ratio was used because of the shortness of the children’s texts. Following Torras et al. (2006), the total number of words written by the children was also considered, to provide an idea of changes in their writing fluency. These holistic measures were intended as a complement to the qualitative analysis of the children’s writing, with the aim of obtaining a more comprehensive insight into the development of their L2 output over time.
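The two holistic measures reduce to simple arithmetic. A minimal sketch (the function names are mine, not from the study):

```python
def error_ratio(n_errors: int, n_words: int) -> float:
    """Overall accuracy: [number of linguistic errors / total words] x 10.

    A 10-word base is used instead of the usual 100-word base because the
    children's texts were short (after van Beuningen et al., 2012).
    """
    return n_errors / n_words * 10

def fluency(text: str) -> int:
    """Fluency as the total number of words written (Torras et al., 2006)."""
    return len(text.split())

# A 20-word draft containing 6 errors yields a ratio of 3.0 errors per 10 words.
```

On this base, a draft and its revision can be compared directly: a falling ratio signals an accuracy gain, a rising word count a fluency gain.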

Accounting for children’s noticing from model texts

At the feedback analysis stage, we examined the children’s notes and dialogue transcripts to determine where and how they allocated their attention. Noticing was operationalized as the identification (full or partial) of a mismatch between the model and their own output (Izumi, 2013). Using previous classifications as a point of departure (Qi & Lapkin, 2001; Sachs & Polio, 2007), we attempted to describe the ways in which the children’s noticing was apparent in their collaborative talk. This led to the identification of four strategies: (i) spotting the difference; (ii) translation; (iii) filling the hole; and (iv) metalinguistic reasoning (Table 3). These were data-driven categories which emerged from careful and reiterative analysis of the children’s dialogue protocols and notes. Spotting the difference involved the superficial detection (via reading aloud or naming of a linguistic feature) of similarities and differences between their own texts and the model, but without any indication that the WCF had actually been understood. Translation (into the L1) and filling the hole (finding a solution in the model to a linguistic problem experienced while writing their drafts) indicated some degree of comprehension on the part of the learners. Metalinguistic reasoning, in contrast, was taken as evidence of comprehended input (Gass, 1997) and of higher levels of awareness than the previous strategies, in accordance with the notion of substantive noticing (Qi & Lapkin, 2001). Following Gass (1997), we assumed that feedback that was not comprehended was less likely to be recalled and incorporated into the children’s revised texts.

Table 3. Noticing strategies at stage 2

Spotting the difference: noticing a linguistic aspect (lexis, form or clause) that differs from or coincides with the original text, without further analysis or discussion.
Written text: no related output. Model: “She has an idea and uses her magic with the cat.”
P2: She has an idea and uses her magic with the cat (reading).
P1: We didn’t put anything about that.

Translation: noticing by translating a linguistic feature (lexis, form or clause) from the model into the L1.
Written text: no related output. Model: “At night”
P2: At night … Night means noche, no?
P1: At night
P2: Por la noche [At night]

Filling the hole: noticing by finding in the model the solution to a hole produced while writing the original text (lexis, form or clause).
Written text: “Su juice”. Model: “Her juice”
P1: We did not put “her” … That was what we were looking for! Her …
P2: We put “su” and they put “her”

Metalinguistic reasoning: noticing by reasoning about the language in the model and in their original text (lexis, form or clause).
Written text: “He drink”. Model: “He drinks”
P2: He drinks, we wrote it.
P1: No, they put “drinks” and we put “drink”, we forgot the third person -s
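Assuming an ordinal reading of the four strategies (which the discussion above suggests but never quantifies), the depth-of-processing hierarchy can be sketched as follows; the numeric levels are my own illustration, not part of the published scheme:

```python
# Depth of feedback processing implied by each noticing strategy:
# spotting the difference shows detection without evidence of comprehension,
# translation and filling the hole show some comprehension, and
# metalinguistic reasoning shows comprehended input (substantive noticing).
NOTICING_DEPTH = {
    "spotting_the_difference": 0,
    "translation": 1,
    "filling_the_hole": 1,
    "metalinguistic_reasoning": 2,
}

def deeper(a: str, b: str) -> str:
    """Return whichever of two observed strategies reflects deeper processing."""
    return a if NOTICING_DEPTH[a] >= NOTICING_DEPTH[b] else b
```

Such an ordering makes explicit why, following Gass (1997), episodes coded at level 0 were expected to be the least likely to surface in the revised texts.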

Tracing children’s problem-solving trajectories across the multi-stage task

The second phase of the data analysis was carried out transversally, across the three stages of the multi-stage task. The procedure involved further refining the coding categories (language-related problems and their solutions) so as to establish connections between (i) each problematic episode identified at stage 1; (ii) the potential solutions or alternatives offered by the model text; (iii) the strategic noticing (or not) of these solutions at stage 2; and (iv) the changes made to the revised texts at stage 3 (see Table 4).


Table 4. Coding categories across the three stages of the writing and WCF task

Stage 1 (initial writing) — problems:
- Not applicable: no initial written output
- Unsolved: LRE is incorrectly solved
- Unreported: written output is not considered problematic even when inaccurate
- Solved: LRE is correctly solved

Stage 2 (feedback comparison) — model:
- Not applicable: unsolvable
- Solvable: solution provided for the problem
- Partially solvable: partial solution provided for the problem
- Alternative: new ideas and content included

Stage 2 (feedback comparison) — noticing:
- Not applicable: not available for noticing
- Noticed: solution to a problem is noticed
- Partially noticed: part of a solution to a problem is noticed
- New input noticed: new language or content is noticed
- Unreported noticing: no evidence of noticing

Stage 3 (rewriting) — outcome:
- Not incorporated: solutions or alternatives from the model are not incorporated
- Partially incorporated: solutions or alternatives from the model are partially incorporated, either incorrectly or with slight inaccuracies
- Incorporated: solutions or alternatives from the model are incorporated correctly
- Original output deleted: problematic element from stage 1 is deleted
- Original output repeated: original stage 1 output is repeated
- Solved without the model: solution to a problem is found without using the model (e.g., recalling EFL classes, class textbook)
- Partially solved without the model: partial solution found without the model
- Addition of new content: new ideational content is included in stage 3


To begin with, each problematic episode identified (or not) by the children was meticulously traced from its origin in their initial drafts through the feedback analysis stage and into the final written output. This was achieved by engaging in a combined process-product analysis taking each of the six frames of the picture story writing prompt as a point of reference. Additional categories were established to describe the entire range of possible options available to the children at the different stages of the writing task. Some of these categories had appeared in previous research (solved and unsolved problems), and others emerged from our own data once we connected the children’s writing with the model (see Table 4). At the writing stage, these included children’s unreported problems (inaccuracies in their original texts that they did not consider problematic at the time) and instances when there was no related output in the initial texts, but which could be linked to the noticing of new content from the model. In the WCF analysis stage, we considered firstly whether the model offered solutions, partial solutions, or no solutions to all the children’s problems (those they reported themselves and those not initially noticed) as well as any new or alternative content and language, and, secondly, whether the children noticed, partially noticed or failed to notice these solutions or alternatives when available. At the rewriting stage, identifying whether the features noticed in the model had made their way into the children’s revised texts was done by using a broad array of descriptors, which ranged from no change in their texts to the partial or full incorporation of some aspect of the feedback, as well as repetitions, deletions and the addition of new content. We also accounted for incorporations in the final texts which were not associated with the model. As seen in Table 4, this led to a variety of possible combinations across each stage of the task. 
In practice, it meant that any of the options from stage 1 (problems) could be combined horizontally with any of the possibilities at stage 2 (solutions offered by the model and the degree of noticing involved) and again with any of the options available in stage 3 (rewriting). As a result, our coding scheme embodied a series of different developmental paths or trajectories for each language-related problem in the children’s initial texts and their eventual outcome. For instance, an unsolved problem at stage 1, whose solution in the model was noticed, could be followed in the rewriting stage by the deletion or repetition of the original output, or by a partial or full incorporation of the solution (see Coyle et al., 2018, for the full classification of trajectories). Our decision to focus comprehensively on all the options available to the children throughout the task broadened the scope of inquiry of previous research on models, which had focused exclusively on learners’ unsolved problems (Hanaoka, 2007). This had two advantages. First, by including the children’s solved problems in our analysis, we were able to (i) determine whether they then noticed similar L2 items in the model that might allow them to confirm their initial hypotheses, and (ii) check whether they maintained the solutions they had found themselves in their final texts, thus obtaining evidence of knowledge consolidation (Bitchener & Storch, 2016). Second, including unreported problems in our analysis enabled us to establish whether the model helped the children identify solutions for problems they were not initially aware of. At the same time, our expanded operationalization of noticing, which included both partial and unreported noticing from the model, together with our comprehensive coding of uptake (from the WCF) and of all other changes in the children’s texts, went beyond the dichotomous conceptualizations of noticing and uptake employed in earlier research (Adams, 2003; Hanaoka, 2007; Qi & Lapkin, 2001; Swain & Lapkin, 2002), which had coded both as ‘all or nothing’ categories. This decision was important in enabling us to account for incomplete and delayed noticing and for incipient signs of L2 development in this group of young learners.
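The horizontal combination of stage options can be made concrete as a Cartesian product. In the sketch below the labels paraphrase Table 4 in simplified form; note that the raw product deliberately over-generates, since some combinations (e.g., noticing a solution in a model that offers none) cannot occur in the data and are pruned in the published classification:

```python
from itertools import product

# Category labels paraphrased from Table 4 (simplified, illustrative only).
stage1_problem = ["no_output", "unsolved", "unreported", "solved"]
stage2_model = ["unsolvable", "solvable", "partially_solvable", "alternative"]
stage2_noticing = ["unreported_noticing", "noticed", "partially_noticed",
                   "new_input_noticed"]
stage3_outcome = ["not_incorporated", "partially_incorporated", "incorporated",
                  "deleted", "repeated", "solved_without_model",
                  "new_content_added"]

# Every horizontal path through the three stages of the task.
trajectories = list(product(stage1_problem, stage2_model,
                            stage2_noticing, stage3_outcome))
# 4 * 4 * 4 * 7 = 448 raw combinations before impossible ones are pruned.
```

The example trajectory in the text, an unsolved problem whose model solution is noticed and incorporated, is simply one tuple in this space.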

Determining the language learning potential of the trajectories

Having identified the interrelated trajectories, and in line with the theoretical view that linguistic processing might impact favourably on learning outcomes (Manchón, 2011), we then classified the trajectories according to their language learning potential. For coding purposes, two parameters were taken into consideration: (i) the degree of noticing the learners engaged in while analysing the WCF (no evidence of noticing, partial or complete noticing); and (ii) the impact of this noticing on their revised texts (not incorporated, partially or fully incorporated; repetitions, deletions, additions). The first of these methodological decisions was theoretically grounded in Bitchener and Storch’s (2016) account of the levels of conscious attention involved in the processing of a feedback episode. This ranges from learners’ predisposition to attend to input at the lowest level, through the cognitive registration of input for processing purposes, to their (partial) understanding of the nature of noticed gaps via processes of hypothesis formation and testing. We considered that it was along this continuum, from the detection of linguistic features in the WCF to the comprehension of L2 form-meaning-function relationships, that the different degrees of noticing embodied in the trajectories might create opportunities for L2 learning. Secondly, we classified the trajectories according to the impact they made on children’s L2 development, following Bitchener’s (2016) definition of development as “the processes or stages involved in developing knowledge of the L2” (p. 113). Consequently, we envisaged development not in a general sense as the overall process of language learning, nor in a specific sense as the final product of the acquisition process, but as the minor, ongoing and unstable progress made by the children in the direction of the L2.

Additionally, every change in the children’s rewritten texts was further coded as an improvement, a partial improvement, or a drawback (see Table 5). This decision to code the children’s texts for drawbacks as well as improvements was motivated by Kluger and DeNisi’s (1996) observation that feedback studies in general tend to ignore negative effects on performance. We considered that incorporating this idea into our research design would deepen our understanding of the potential of the trajectories to impact children’s L2 development, as any signs of progress would have to be weighed against the shortcomings in their writing. However, coding the children’s written production as an improvement sometimes proved difficult. Taking two examples from Table 5, we coded as an improvement the deletion in the pair’s final text of the sentence “And se asusta a cat” on the grounds that L1 lexis had been eliminated, albeit at the cost of ideational content. Likewise, the sentence “And loves the white bat”, which the pair changed in their rewritten text to “And loves with the white bat”, was coded as a partial improvement despite being inaccurate, since it showed the children’s active attempt at incorporating a sentence they had noticed in the model, “The black bat wakes up and falls in love with the white bat”. Drawbacks involving new inaccuracies or the deletion of accurate ideational content proved easier to identify.

Table 5. Coding of changes made to the revised texts

Improvement: changes that improve the accuracy of the original clause.
- Incorporation: “The dog jump the cat” → “The dog attacks the cat” (better choice of verb, and the third person -s is used correctly)
- Deletion of faulty output: “And se asusta a cat” (the cat is frightened) → no text (L1 deletion)

Partial improvement: a change that improves the original clause but is not fully correct.
- Partial incorporation: “And loves the white bat” → “And loves with the white bat” (closer to “falls in love with” in the model)
- Incorporation: “Entonces the witch look su juice” → “Then the witch look her juice” (incorporation of connector and possessive adjective)

Drawback: a change that downgrades the accuracy or quality of the original clause.
- “She eats” → “She eat” (the third person singular morpheme -s is missing)
- Deletion of accurate content: “The scientific head bum, bum” (explodes) → no text (ideational content is lost)

This combined analysis of noticing and its effects on the children’s writing led to the ranking of the trajectories along a continuum according to whether they were perceived as having more or less potential for L2 learning (Figure 2).

Figure 2. Criteria used to define the language learning potential of the trajectories (Reprinted from Journal of Second Language Writing, Vol. 42, Coyle, Y., Cánovas Guirao, J., & Roca de Larios, J., Identifying the trajectories of young EFL learners across multi-stage writing and feedback processing tasks with model texts, pp. 25–43, 2018, with permission from Elsevier).

The trajectories considered to embody more language learning potential included: (i) those in which the children successfully identified gaps between their own problematic output and the model, or noticed new linguistic or ideational content that allowed them to expand their existing knowledge; and (ii) those in which the result of noticing impacted positively on their rewritten texts in terms of improvements in the accuracy, acceptability and comprehensibility of their written output. Conversely, trajectories considered as having less language learning potential had a negligible impact on the learners’ final texts and included those in which the children discarded or repeated their original output when (i) they failed to notice, or only partially noticed, solutions to unsolved or unreported problems in the feedback; (ii) no solution or only partial solutions were available in the model for different problems; and (iii) new input in the model and solutions to problems were noticed but not incorporated.

The integrated analysis of the different data sources, and the valuable information provided by the children’s dialogic interactions and written texts, meant that the identification of the criteria used to classify the trajectories proved relatively straightforward. The entire procedure was facilitated by our previous within-stage analyses and the synthesis of all the evidence obtained from the multiple analytical categories we had already employed, including language-related problems, problem-solving and noticing strategies, and clausal analysis. When it came to determining the L2 learning potential of the trajectories, it was simply a question of bringing together the insights already provided by these categories separately. To illustrate the decision-making process underlying the trajectories, I will now present two examples, one with more and another with less L2 learning potential.

Table 6, for instance, illustrates a trajectory with more language learning potential. At stage 1, the solution for an unsolved problem expressed as a proto-clause (The dog jump the cat) was noticed in the model (and attacks the cat) and understood, as the pair filled a hole in their lexical knowledge, having searched previously for “fight”.
This noticed solution was later incorporated correctly into the pair’s rewritten text.

Table 6. Example of a trajectory with more language learning potential: an unsolved problem has a solution in the model which is noticed and incorporated

Proto-clause (stage 1): The dog jump the cat
Clause (stage 3): Finally, the dog attacks the cat

Stage 1 — L-LRE; lexical search (for “fight”); unsolved: jump
P1: Y ahora, ¿cómo es pelean? [And now, how do you say they fight?]
P2: Salta sobre él. [He jumps on him]
P1: The dog jump the cat

Stage 2 — Partially solvable: attacks; noticed; comprehended input; filling a hole
P2: and attacks the cat, no. Nosotras no escribimos eso. [We did not write that]

Stage 3 — L-LRE: direct production; incorporated: attacks
P2: The dog attacks the cat (They write it).

Table 7 illustrates a trajectory with less language learning potential. Here, the pair began their narrative by writing "bat" without reporting a problem, despite the absence of the indefinite article. At stage 2, the problem was solvable in the model but unnoticed by the pair, who did not comment on, underline or make a note of it, paying attention instead to the adjective "white". At stage 3, they again failed to notice the problem and repeated their original faulty output.

Table 7. Example of a trajectory with less language learning potential
An unreported problem has a solution in the model which is unnoticed and followed by the repetition of the original faulty output
Proto-clause: Bat sleeping
Proto-clause: Bat in the sleeping
Stage 1: Bat (with no article); Unreported problem

P2: Bat sleeping … (They write it).

Stage 2: Solvable: a bat; Unnoticed

P1: white … subráyalo [Underline it]
Underlined: a white bat

Stage 3: Bat (with no article); Unreported problem

P2: Bat in the sleeping (They write it).
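The contrast between Tables 6 and 7 boils down to a simple decision rule, which can be sketched as follows (a deliberate simplification of our own devising, since the full 14-trajectory scheme also distinguishes partial noticing and partial incorporation):

```python
def trajectory_potential(solution_in_model, noticed, incorporated):
    """Toy decision rule distilled from the two examples above: a trajectory
    has more L2 learning potential only when a solution was available in the
    model, noticed by the pair, and incorporated into the rewritten text."""
    if solution_in_model and noticed and incorporated:
        return "more"
    return "less"

# Table 6: solution noticed and incorporated -> "more"
print(trajectory_potential(solution_in_model=True, noticed=True, incorporated=True))
# Table 7: solvable problem left unnoticed, faulty output repeated -> "less"
print(trajectory_potential(solution_in_model=True, noticed=False, incorporated=False))
```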

I now turn to the final step in the coding process, which involved verifying whether our theoretical predictions regarding the potential of the trajectories for promoting L2 learning coincided with textual evidence of L2 development in the children’s writing. The procedure we followed is outlined below.

Chapter 14. Setting up a coding scheme to analyze children’s engagement with WCF

Verifying the connection between the trajectories and L2 writing development

Taking as our point of departure the clausal units described above (see Table 2), we engaged in a micro-analysis of all the clausal unit transitions between the pairs' initial stage 1 texts and their rewritten texts at stage 3. This involved identifying every minor linguistic change detected in each clausal unit from the original to the revised texts. Three types of clausal transitions were considered: i.

Transitions between different clausal units (Pre-clause > Proto-clause, Proto-clause > Clause, Pre-clause > Clause)
ii. Transitions between similar clausal units (Pre-clause > Pre-clause, Proto-clause > Proto-clause, Clause > Clause)
iii. Transitions involving the addition (X > Pre-clause, X > Proto-clause, X > Clause) or deletion of clausal units (Pre-clause > X, Proto-clause > X, Clause > X).

Within each of the three transitional patterns, the trajectories were now linked to the improvements, partial improvements and drawbacks they had articulated from the initial draft to the final text. The micro-analytic procedure we used (Table 8) was as follows. Each numbered trajectory (e.g., T5d), previously hypothesized as belonging to those with either more or less potential, was linked to its corresponding linguistic item in the children's texts. Every trajectory led to one of the three clausal transition types outlined above (e.g., Pre-clause > Clause), which, in turn, was considered as either a full improvement, a partial improvement, or a drawback. By computing the ratio of the textual outcomes associated with the use of the different trajectories across the multi-stage task, we were able to corroborate their potential (or not) for promoting L2 learning. Table 8 presents an example of this combined process-product analysis with two trajectories considered to have more potential for enhancing learning. In the first case (on the left of the table), unable to recall the L2 word 'then' in their original text, pair 1 wrote the L1 version 'entonces'. This unsolved lexical problem, which was solvable from the model, was noticed and incorporated, and corresponded to a trajectory (T5d) we had coded for greater learning potential. Likewise, the pair's initial use of the possessive adjective 'su' was replaced by 'her juice' after noticing and incorporating the solution from the model, again via the same trajectory.
Both changes led to a clausal transition from a pre-clause to a clause and were thus counted as having improved the original output.
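Once each trajectory has been coded, the transition typing and ratio computation described above are mechanical. A minimal sketch (the records and labels are an illustrative simplification of ours, not the study's actual coding instrument) might look like this:

```python
from collections import Counter

# Shorthand labels follow Table 8: PRE = pre-clause, PRO = proto-clause,
# CLA = clause, X = an added or deleted unit.

def transition_type(before, after):
    """Classify a clausal-unit transition into the three patterns above."""
    if before == "X" or after == "X":
        return "addition/deletion"
    return "similar units" if before == after else "different units"

def tally_outcomes(trajectories):
    """Tally improvements (IM), partial improvements (PIM) and drawbacks (DR)
    separately for more- vs. less-potential trajectories."""
    ratios = {"more": Counter(), "less": Counter()}
    for t in trajectories:
        ratios[t["potential"]][t["outcome"]] += 1
    return ratios

# Hypothetical coded records for Pair 1 (cf. Table 8)
coded = [
    {"id": "T5d", "potential": "more", "before": "PRE", "after": "CLA", "outcome": "IM"},  # Entonces -> Then
    {"id": "T5d", "potential": "more", "before": "PRE", "after": "CLA", "outcome": "IM"},  # Su juice -> her juice
    {"id": "T3b", "potential": "less", "before": "PRE", "after": "PRE", "outcome": "DR"},  # repetition of faulty output
]

ratios = tally_outcomes(coded)
print(transition_type("PRE", "CLA"))                     # different units
print([ratios["more"][k] for k in ("IM", "PIM", "DR")])  # [2, 0, 0]
print([ratios["less"][k] for k in ("IM", "PIM", "DR")])  # [0, 0, 1]
```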

Table 8. Example of trajectories and associated changes in written output across clausal transitions
Combined process-product analysis

More L2 learning potential (left column):
Pair 1 – 2 x T5d, PRE ➔ CLA
– Entonces ➔ Then
– Su juice ➔ her juice
Total trajectories: 2; Ratio IM/PIM/DR: 2/0/0

Less L2 learning potential (right column):
Pair 1 – 1 x T3b, PRE ➔ PRE
– the juice of the witch ➔ drawback (missed opportunity to incorporate from the model "the witch's orange juice")
Total trajectories: 1; Ratio IM/PIM/DR: 0/0/1

Code: 2 x = Number of trajectories in the transition; Pre = pre-clause; Pro = Proto-clause; Cla = Clause; T = trajectory; 3b = number and outcome of the trajectory (e.g., repetition, deletion, etc.); IM = improvement; PIM = partial improvement; DR = drawback

In contrast, in the second case (on the right) the same pair failed to notice the expression 'the witch's orange juice' in the model to replace their incorrect version 'the juice of the witch'. This trajectory led to the repetition of their original output and the maintenance of a pre-clause. The outcome of the trajectory (T3b), which entailed a drawback in the rewritten text, confirmed it as having less language learning potential. As a result of this exhaustive analysis connecting trajectories, clausal transitions and their positive or negative outcomes, we were able to identify which trajectories generated improvements that better supported learners' progression towards writing more acceptable and comprehensible clausal units, thus improving the overall quality of their written texts.

Conclusion

The coding scheme described in this chapter represents a systematic and comprehensive classification of the diverse trajectories followed by young low-proficiency L2 learners when engaging in a writing, feedback processing and rewriting task. It operationalizes within a single construct a wider range of interrelated problem-solving behaviours, degrees of noticing and uptake possibilities than previously acknowledged in existing process-oriented research with models. The trajectories are distinguishable in terms of their potential for contributing to language learning, taking as their defining criteria learners' noticing processes in combination with the impact of that noticing on the quality of subsequent written output. In this respect, they afford new evidence as to why some young learners move successfully from written output through to input processing and on to producing better rewritten output whereas others do not. All research, however, has its limitations and the analytical procedure presented here is no exception. Regarding the coding scheme itself, the wide range of possibilities available to the children at the different stages of the task translated into a similarly large number of trajectories, which proved difficult to systematize. We reduced the 24 trajectories initially identified in the data to a more manageable number by grouping together all the possible outcomes in the children's writing. Even so, each of the final set of 14 trajectories includes up to four different outcomes labelled from (a) to (d), as in "T5 An unsolved problem with a solution in the model, which is noticed, is followed by (a) the repetition of the original output, (b) the deletion of the original output, (c) the partial incorporation of the solution, or (d) the full incorporation of the solution". Referring to individual trajectories thus requires the use of number and letter codes (e.g., T5d in Table 8 above) which, in our experience, can make them difficult to recall. A further limitation concerns the fact that these trajectories were identified using model texts as a WCF technique. This means that their definition and number will necessarily vary with alternative types of feedback. Reformulation, for example, which involves the rewriting of the learners' original text, necessarily offers solutions to learners' problems, so coding whether a problem is solvable or not, as we did with models, becomes unnecessary. As for error correction and metalinguistic explanation, the trajectories would have to describe comprehension rather than noticing, given the explicitness of both feedback types.
Indirect WCF (e.g., underlining or error codes) highlights the location and/or type of error but does not provide an immediately identifiable solution, and none of the above-mentioned WCF techniques offer new ideational content. Since the identification of trajectories is dependent on the type of feedback provided, the categories and definitions included in our coding scheme in its present form would need to be partially modified in future studies to better reflect the characteristics and outcomes of specific feedback techniques. Furthermore, although we are confident in the textual evidence that confirmed our theoretical prediction with regard to the language-learning potential of the trajectories with this group of young learners, the exploratory and qualitative nature of our research means that any claims about the impact of the trajectories on L2 development in other contexts should not be overstated. More research with larger groups of learners, from different contexts and with different writing genres and feedback techniques would be advisable.

As for the data collection procedures, the triangulation of the collaborative dialogue protocols together with the children’s written notes proved advantageous as they enabled us to shed light on the problem-solving strategies learners activated while writing, and the noticing strategies they employed while processing feedback. They also offered preliminary insights into the depth of processing the children engaged in, which was essentially low level. However, it is also true that the children often failed to explicitly report what they had noticed in the feedback, which suggests that pair talk may need to be complemented by additional measures such as stimulated recall, questionnaires or interviews to fully account for learners’ noticing processes. Not having included these elicitation techniques in our research, we have no firm answers, for example, as to why some learners successfully incorporated language from the model they had apparently not noticed. Of course, it is inevitable that individual learner factors (motivation, affective disposition, developmental readiness, etc.) and pair dynamics (e.g., degrees of equality and mutuality) will have impacted on the children’s engagement with WCF. Accordingly, further efforts using more personalized measures are needed to provide deeper insights into why and how learners allocate their attentional resources differently. A further limitation concerns the selection of suitable instruments to examine children’s writing. Since the texts they produced were grammatically inaccurate and, at times, fragmented and hard to understand, standard measures of structural complexity (Norris & Ortega, 2009) or lexical diversity (Guiraud, 1954, in van Beuningen et al., 2012) were not considered suitable for this group of young learners. 
The clausal unit measure we adopted (Torras, 2005) proved valuable for describing the children’s writing in terms of its acceptability and comprehensibility, and gave a good idea of minor gains across texts, but it was perhaps too crude to yield more detailed information. For this reason, the descriptive specification of improvements, partial improvements and drawbacks was deemed necessary. Even so, although all the data was coded collaboratively and reiteratively, it was sometimes difficult to determine whether a change identified in a pair’s written output actually constituted an improvement or not. This occurred especially when children deleted their original output, which meant that the outcome of some trajectories could lead to improvements (when L1 use was deleted) as well as drawbacks (when ideational content was eliminated). In the future, more stringent measures of inter-rater reliability could be put in place to confirm difficult methodological decisions. In conclusion, the methodological initiative described above has theoretical and methodological implications for research on WCF processing. These include reconceptualizing existing analytical categories (LREs) to account for the strategic problem-solving nature of L2 writing (García Hernández et al., 2017; Roca

de Larios et al., 2021), providing evidence of these strategies as used by young learners and strengthening the links between writing, feedback processing and the writing of new output to avoid offering partial or incomplete accounts of their effects on L2 learning. The trajectories we identified have proven useful in doing this.

Funding

The study reported in this chapter is part of a wider research programme financed by the Spanish Ministry of Science and Innovation (Research grant PID2019-104353GB-100).

References

Adams, R. (2003). L2 output, reformulation and noticing: Implications for IL development. Language Teaching Research, 7(3), 347–376.
Bitchener, J. (2016). To what extent has the published written CF research aided our understanding of its potential for L2 development? International Journal of Applied Linguistics, 167(2), 111–131.
Bitchener, J., & Storch, N. (2016). Written corrective feedback for L2 development. Multilingual Matters.
Cánovas Guirao, J., Roca de Larios, J., & Coyle, Y. (2015). The use of models as a written feedback technique with young EFL learners. System, 52(1), 63–77.
Coyle, Y., & Roca de Larios, J. (2014). Exploring the role played by error correction and models on children's reported noticing and output production in an L2 writing task. Studies in Second Language Acquisition, 36(3), 451–485.
Coyle, Y., Cánovas Guirao, J., & Roca de Larios, J. (2018). Identifying the trajectories of young EFL learners across multi-stage writing and feedback processing tasks with model texts. Journal of Second Language Writing, 42, 25–43.
García Hernández, F. J., Roca de Larios, J., & Coyle, Y. (2017). Exploring the effect of reformulation on the problem-solving strategies of young EFL writers. In M. P. García Mayo (Ed.), Learning foreign languages in primary school: Research insights (pp. 193–222). Multilingual Matters.
Gass, S. (1997). Input, interaction, and the second language learner. Lawrence Erlbaum Associates.
Hanaoka, O. (2007). Output, noticing, and learning: An investigation into the role of spontaneous attention to form in a four-stage writing task. Language Teaching Research, 11(4), 459–479.
Izumi, S. (2013). Noticing and L2 development: Theoretical, empirical, and pedagogical issues. In J. M. Bergleithner, S. N. Frota, & J. Kei (Eds.), Noticing and second language acquisition: Studies in honor of Richard Schmidt (pp. 25–38). National Foreign Language Resource Center.

Kluger, A. N., & DeNisi, A. (1996). The effects of feedback interventions on performance: A historical review, a meta-analysis, and a preliminary feedback intervention theory. Psychological Bulletin, 119(2), 254–284. https://www.jstor.org/stable/20182507
Manchón, R. M. (2011). Writing to learn the language: Issues in theory and research. In R. M. Manchón (Ed.), Learning to write and writing to learn in an additional language (pp. 61–82). John Benjamins.
Manchón, R. M., & Roca de Larios, J. (2007). On the temporal nature of planning in L1 and L2 composing: A study of foreign language writers. Language Learning, 57(4), 549–593.
Norris, J. M., & Ortega, L. (2009). Towards an organic approach to investigating CAF in instructed SLA: The case of complexity. Applied Linguistics, 30(4), 555–578.
Qi, D. S., & Lapkin, S. (2001). Exploring the role of noticing in a three-stage second language writing task. Journal of Second Language Writing, 10, 277–303.
Roca de Larios, J., García Hernández, F. J., & Coyle, Y. (2021). A theoretically-grounded classification of EFL children's formulation strategies in collaborative writing. Language Teaching for Young Learners, 3(2), 300–336.
Sachs, R., & Polio, C. (2007). Learners' uses of two types of written feedback on an L2 writing revision task. Studies in Second Language Acquisition, 29(1), 67–100.
Schmidt, R. W. (1990). The role of consciousness in second language learning. Applied Linguistics, 11(2), 129–158.
Schmidt, R. W. (2001). Attention. In P. Robinson (Ed.), Cognition and second language instruction (pp. 3–32). Cambridge University Press.
Storch, N., & Wigglesworth, G. (2010). Learners' processing, uptake, and retention of corrective feedback on writing: Case studies. Studies in Second Language Acquisition, 32(2), 303–334.
Swain, M. (1995). Three functions of output in second language learning. In G. Cook & B. Seidlhofer (Eds.), Principles and practice in applied linguistics (pp. 125–144). Oxford University Press.
Swain, M., & Lapkin, S. (2002). Talking it through: Two French immersion learners' responses to reformulation. International Journal of Educational Research, 37, 285–304.
Torras, R. (2005). Procesos psicolingüísticos implicados en la adquisición del inglés en el contexto de la enseñanza primaria [Psycholinguistic processes involved in learning English in a primary school context]. Lenguaje y Textos, 23, 89–112. Retrieved on 28 April 2023 from http://hdl.handle.net/2183/8240
Torras, R., Navés, T., Celaya, M., & Pérez-Vidal, C. (2006). Age and IL development in writing. In C. Muñoz (Ed.), Age and the rate of foreign language learning (pp. 156–182). Multilingual Matters.
Van Beuningen, C. G., de Jong, N. H., & Kuiken, F. (2012). Evidence on the effectiveness of comprehensive error correction in second language writing. Language Learning, 62(1), 1–41.
Wigglesworth, G., & Storch, N. (2012). Feedback and writing development through collaboration: A socio-cultural approach. In R. Manchón (Ed.), L2 writing development: Multiple perspectives (pp. 69–97). Mouton de Gruyter.
Yang, L., & Zhang, L. (2010). Exploring the role of reformulations and a model text in EFL students' written performance. Language Teaching Research, 14(4), 464–484.

chapter 15

Methodological considerations in the analysis of synchronous and asynchronous written corrective feedback
The affordances of online technologies
Natsuko Shintani & Scott Aubrey

Kansai University | The Chinese University of Hong Kong

Providing written corrective feedback in computer-mediated communication (CMC) environments has increasingly attracted the interest of both researchers and practitioners. In this chapter we reflect on our study, Shintani & Aubrey (2016), which examined the comparative effects of synchronous and asynchronous written corrective feedback on the accurate production of target grammatical features in a guided writing task. The methodological challenges we experienced related to (a) operationalizing synchronous and asynchronous written corrective feedback; (b) designing treatment materials and procedures; (c) testing; and (d) analyzing the data. In each decision, we tried to find a balance between experimental control and ecological validity. This chapter not only provides a window into how we overcame these challenges but also gives suggestions for research methodologies that can be used in future studies to explore the provision of written corrective feedback through online technologies.

https://doi.org/10.1075/rmal.5.15shi © 2023 John Benjamins Publishing Company

Introduction

This chapter provides a critical reflection on key issues in data collection and data analysis for Shintani and Aubrey (2016). This classroom-based study aimed to investigate (a) the effect of synchronous written corrective feedback (SCF); (b) the effect of asynchronous written corrective feedback (ACF); and (c) the different effects of SCF and ACF on grammatical accuracy in a new writing task. We begin the chapter by discussing feedback provided within digital environments and contrasting SCF to other forms of feedback. We then provide a theoretical rationale for providing SCF during writing tasks, drawing on both cognitive and sociocultural perspectives. Next, we offer a detailed reflection on methodological

decisions faced in designing and conducting the study, highlighting challenges such as operationalizing SCF, designing suitable writing tasks that would provide a context for SCF, choosing a target language structure, and coding learners' production of that structure to capture its gradual learning as a result of the feedback received. Finally, we conclude with methodological implications for examining SCF in digital environments.

Overview of the research in focus: Rationale, aims and methods

Electronic writing, or writing that takes place in a digital environment, has become ubiquitous in second language (L2) classrooms, giving learners a greater capacity for individual participation and interactivity. In recent years, researchers have made use of computer-mediated communication (CMC) technology to explore collaborative writing, where learners communicate via online editing platforms (e.g., wikis, Google Docs) to produce joint texts (e.g., Kessler et al., 2012; Dao et al., 2021), as well as to analyze how learners can produce a spontaneous exchange of information through text chat (Adams et al., 2015). CMC technology has also expanded the ways in which teachers can deliver written corrective feedback (WCF) to learners, allowing them to provide WCF synchronously and asynchronously. Figure 1 shows the various types and examples that are possible in a CMC environment (Aubrey & Shintani, 2021).

Figure 1. Types of CMC feedback. Source: Aubrey & Shintani (2021)

While SCF is provided in an interactive manner when both the teacher and the students are online (e.g., via video chat, text chat), ACF does not allow for real-time interaction (e.g., email exchanges). For example, Ene and Upton (2014)

provided ACF in an EAP program on students’ writing drafts using Microsoft Word revision comments and track changes functions, which were then emailed to students who in turn used the feedback to make revisions. In an example of SCF, Odo and Yi (2014) used voice chat software to provide interactive feedback to students on their academic writing. The benefit of using voice chat to provide SCF was that learners were able to make clarification requests regarding the intention of the feedback, teachers could give follow-up commentary, and learners could make use of multimodality (e.g., taking notes while listening). However, both of these kinds of WCF (ACF using MS Word; SCF using voice chat) were provided some time after students had composed their texts, thus – similar to traditional pen-and-paper WCF – they constitute cases of delayed feedback. Less examined is the use of CMC technology to provide immediate SCF, that is, the provision of WCF to students as they write. This is made possible using online simultaneous editing programs, such as Google Docs, which allow for a document to be shared with multiple users who can then edit and comment in real time. Synchronous teacher-student interaction to support L2 writing in a CMC environment was initially documented in early exploratory studies (Aubrey, 2014; Kim, 2010). These studies served to highlight the role of the teacher in the digital classroom and the practical possibilities of implementing immediate SCF in a writing course, including how teachers could monitor learners as they write, with the intention of intervening at opportune moments to provide feedback. However, these studies stopped short of investigating its efficacy as a feedback tool in terms of potential L2 acquisition. Investigating immediate SCF is of both practical importance for language pedagogy and theoretical value for second language acquisition (SLA). 
In terms of the former, it is important to explore the range of tools afforded to teachers so that they can more fully support learners during the writing process. With the use of online editing programs, teachers have the option of engaging with students as they write, thus taking a more active role in the process of writing than they otherwise could. Additionally, there are potential logistical advantages of immediate SCF, such as the ability to monitor several students at once and to deliver feedback in a contingently responsive manner (i.e., when learners need it). Investigating SCF was thus, in part, an attempt to develop a principled approach for teachers who wish to use CMC technology to engage with students while they are writing. The theoretical motivation for our study is related to the issue of timing in WCF. We understood that the cognitive processes of learners who are responding to (immediate) SCF and (delayed) ACF may be different. As SCF is provided before learners finish their compositions, they are expected to revise while writing, and thus prompted to invoke a set of cognitive processes which are different

in nature from those involved in self-motivated revision, such as (a) noticing the feedback, (b) detecting differences between the correction and the original piece of writing, (c) deciding to revise or not, and (d) executing the revision. Additionally, learners are thought to use SCF to engage in self-monitoring by consulting the already provided feedback, and possibly making both planning and translating processes more efficient. As SCF is an interactive process, whereby the feedback-revision process occurs in a continuous manner throughout the writing task, providing SCF may also be seen as analogous to oral CF (e.g., recasts; see Lyster & Ranta, 1997; Nabei & Swain, 2002) in that both occur in exchanges where the message is at stake, and thus facilitate form-function mapping (Long, 2007). Furthermore, the permanent record of SCF may assist learners’ noticing processes and prompt uptake with repair (Lyster & Ranta, 1997), which is usually considered to be an important indication of L2 knowledge consolidation (Doughty, 2001; Long, 2007). Finally, providing SCF on formal issues (e.g., morphology, specific syntactic structures, spelling) may lessen learners’ cognitive burden by freeing them from the need to constantly evaluate the accuracy of their texts, and allow them as a result to attend to planning and translating their messages. In contrast, as ACF is necessarily provided after a piece of writing has been completed, planning, translating, and monitoring processes (during the composition) occur separately from text revision (after the composition). ACF requires learners to undertake a ‘revision task’ separate from the ‘composition task’ where attention may be shifted back to the content of the text once feedback has been provided and after the composition task has finished. ACF can be interactive but only in multi-draft writing, in which learners might attend to a new draft with alternative ideas. 
Thus, ACF arguably involves simpler cognitive processes (i.e., understanding the feedback and executing the revision), which may be less burdensome on working memory. Seen from a sociocultural perspective, ACF and SCF are different forms of WCF that facilitate learners’ cognitive development through the social interaction between teacher and student they prompt (Storch, 2018) and the collaborative and dialogic context they create (Aljaafreh & Lantolf, 1994). Yet, the timing of SCF may be more optimal for teachers to scaffold L2 writers’ gradual development while learners are drafting their compositions. With these assumptions in mind, our study (Shintani & Aubrey, 2016) adopted a quasi-experimental design with the aim of investigating the relative effects of SCF and ACF on grammar acquisition. The study took place at a private university in Japan and participants were recruited from three intact classes of an elective intensive English course. The 76 students from these classes that agreed to participate in this study were all 2nd-year students, majoring in Sociology, Business, Economics, and Theology. They had obtained a score range of 460–500 on

the Test of English for International Communication (TOEIC), placing them at an intermediate proficiency level. In each of the three classes, the participants were randomly assigned to three groups according to the feedback they would receive: an SCF group, an ACF group, and a comparison group that did not receive any corrective feedback on their writing. Students completed two timed writing tasks using the online editing program Google Docs, through which corrective feedback was provided to the treatment groups. Accurate use of a target feature was measured by a set of three text reconstruction tasks conducted as pre-, immediate post-, and delayed post-tests. Arriving at the final design of the study involved several decisions: (a) operationalizing SCF and ACF; (b) designing treatment materials and procedures; (c) testing; and (d) analyzing the data. The next section will focus specifically on methodological considerations, the challenges experienced, and the solutions we adopted.

Methodological decisions, challenges, and solutions

Operationalization of SCF and ACF

From the outset, we decided that SCF was to be provided after learners had made an error but before the timed writing task had finished (i.e., while writing). ACF, on the other hand, was to be provided only after the writing task had finished (i.e., after writing). A precursor study (Shintani, 2016) that implemented SCF and ACF in this way and compared eight learners' behaviours when receiving each suggested that those who received ACF tended to attend to the feedback without engaging in planning or translating new ideas. Thus, these operational definitions were assumed to trigger different cognitive processes (namely, planning, translating, revision, and monitoring), which we hypothesized would have a differential effect on learners' L2 acquisition. In other words, the distinguishing feature of SCF was that text revision (prompted by feedback) should take place during writing, when other processes were taking place (i.e., planning and translating). However, further decisions needed to be made in terms of how feedback was to be delivered; specifically, we needed to consider feedback scope (i.e., focused or unfocused) and type (i.e., direct or indirect). Regarding feedback scope, we were aware that research on WCF has extensively examined focused CF (see Kang & Han, 2015) while a relatively small amount of research has investigated unfocused CF (e.g., Ellis et al., 2008; Truscott & Hsu, 2008; van Beuningen et al., 2008), with the difference possibly being due to teachers' beliefs and preferences (Ene & Upton, 2014; Mao & Crosthwaite, 2019). Despite this research gap, we opted

for focused CF (directing feedback at a single grammatical feature) primarily for methodological reasons. This decision allowed us to control for the number of corrections on the writing and enabled us to measure learners' accurate use of a single targeted feature, thereby making it possible to ascertain gains in accuracy after the treatment (Bitchener & Ferris, 2012). We also believed that focused WCF would be more effective for our participants. Previous studies have indicated that focused WCF is preferable for restructuring learners' L2 knowledge as they are more likely to notice and understand corrections when they repeatedly receive them on the same error (Bitchener, 2008; Sheen, 2007; Shintani & Ellis, 2013). To employ focused WCF, we needed to select one target structure – a grammatical structure students would receive feedback on. We wanted the target structure to be (1) sufficiently complex, so that we could observe participants' (Japanese university students at an intermediate proficiency level, see above) gradual development of grammatical knowledge, and (2) reasonably difficult for them to produce accurately. The hypothetical conditional (e.g., If I had bought the ticket, I would have gone to the concert) was chosen as the target structure, as it satisfied these criteria. First, the structure is syntactically and semantically complex. It consists of two clauses (the if-clause and the main clause) and several grammatical features (past perfect, modals, past tense), which potentially makes incremental scoring of component aspects possible. Second, previous studies have shown that Japanese university students (similar to the participants in this study) tend to have difficulties when using this structure (Izumi et al., 1999; Shintani et al., 2014). The next decision was whether to provide direct or indirect WCF. Direct WCF provides students with the correct form and, as a result, gives them explicit guidance on how to revise their errors.
Indirect WCF, on the other hand, only indicates that the student has made an error without correcting it. While indirect WCF allows teachers to provide opportunities for guided learning and problem-solving, it may not suit learners with limited linguistic knowledge because they cannot make the correction if the structure is unfamiliar (Cerezo et al., 2019). For us, the argument for using direct CF was strengthened when considering the complex structure of the hypothetical conditional. We felt it was likely that, if we simply indicated the existence of an error, intermediate proficiency learners would not understand the nature of that error and that, even if they did, they might not have the linguistic knowledge to revise it. Furthermore, on a practical level, the problem-solving work required to revise errors identified by means of indirect feedback would be so time-consuming that it might not be suitable for the kind of time-constrained writing tasks the study would employ (see Treatment materials).

Chapter 15. Methodological considerations in the analysis of written corrective feedback

Eliciting the target grammatical feature

Decisions had to be made regarding, first, the writing task which would provide the context for the WCF treatments and, second, the amount of WCF treatment that could reasonably be expected to cause an effect. The writing task for the treatment had to be designed to engender opportunities for students to produce hypothetical conditional sentences. We initially intended to use freewriting activities, which would only provide a topic, allowing participants to decide the content and language freely. This would, after all, provide a more authentic context for learners to demonstrate their use of language. However, freewriting activities are problematic in controlling for variables such as opportunities for using the target structure. Participants could, for example, interpret the topic differently, resulting in no attempted use of the target structure. As Manchón (2014) points out, in such writing tasks the task environment is likely to change, "with resulting modifications of the representation of the task by the writer" (p. 34). Thus, we opted for a writing activity that: a) would elicit the target structure with similar frequency; and b) would be manageable for the participants of the study in terms of time constraints and difficulty.

We then had to consider how the treatment task could reliably elicit the target structure. In the past, many WCF studies had examined English definite/indefinite articles (e.g., Sheen, 2007), which are easily elicited in picture description tasks. However, eliciting the hypothetical conditional structure is not easy because the structure only occurs in a specific situation (i.e., when one reflects on the hypothetical [non-]occurrence of an event in the past and its consequences).
To address this, we designed a treatment writing task in which the parameters of the task were constrained by prompting learners first to reflect on specific past events and then to write about what would have happened if those events had not occurred. By creating a writing task around such a prompt, use of the hypothetical conditional was not essential but useful (Loschky & Bley-Vroman, 1993). To control for the frequency of feedback, we fixed the number of past events that students reflected on at five, which would elicit five potential target structures for feedback. We also included two treatment sessions, which necessitated the design of two separate writing tasks. These two treatment tasks, we believed, would provide sufficient feedback to restructure learners' knowledge of the target structure. They would also overcome the criticism that many WCF studies involve only one-shot treatments (Kang & Han, 2015). For each treatment task, we created a familiar topic for the participants: Treatment task 1, 'past events that changed your life', and Treatment task 2, 'past events that influenced the history of Japan'.

As for making the task manageable in terms of time constraints and difficulty, a concern was that the treatment task might be too challenging for students to
complete within the lesson period. To overcome this issue, we employed a two-step activity: (1) a pre-task in which participants wrote on a pre-task sheet five sentences describing the past events that actually occurred; and (2) a main task in which learners referred to their pre-task sheet and wrote an essay arguing what would have happened if the events had not occurred. The instructions for the pre-task (i.e., "Write five sentences in English describing five events that had changed your life") and the main task (i.e., "Write an essay in English about what would have happened if the events you listed had not occurred") were presented to students in their L1 (Japanese) so that learners could not borrow language from the instructions that might have helped them to use the hypothetical conditional in their writing. The pre-task preparation phase was completed as homework, where the participants were not restricted by time. The main task was completed in the classroom under a time limit, where the researchers would provide the WCF treatment. This controlled the occurrences of the target structure in the task and thus the opportunities to receive feedback on them. Through piloting the treatment task with learners of similar proficiency, we decided on a time limit of 20 minutes.

Designing the treatment procedures

To implement the treatment, we needed to decide the timing of SCF and ACF – that is, how much time should be given between when the learners attempted the target structure and when SCF was provided (i.e., the immediate aspect of SCF), and how much time should be given between when the learners finished their composition and when ACF was provided (i.e., the delayed aspect of ACF). No previous WCF studies had operationalized these constructs, so our decisions were based on experience and practical considerations (see Appendix for the instructions).

As this was a classroom-based study, a practical constraint was that we needed to provide SCF to multiple learners (almost) simultaneously. It was therefore not possible to consistently deliver feedback at precise time intervals from when learners attempted the target structure – a procedure that might have been feasible in a laboratory study with individual participants. The extent to which we could provide 'immediate' SCF was influenced by (1) the fluency of learners' writing, and (2) the number of compositions that the researcher had to monitor at once. Based on our prior experience conducting activities involving SCF with similar learners (e.g., Aubrey, 2014), we surmised that SCF could be provided to participants before they completed the sentence which contained the error while monitoring up to seven compositions. However, taking into account possible unforeseen classroom issues, we conservatively operationalized 'immediate'
SCF as corrective feedback that was provided before the learner had completed the writing task – that is, SCF occurred anytime between the moment when the learner completed the sentence in which the error was made and the moment when the learner completed the writing task. Additionally, we asked learners who received SCF during the treatment task to revise their writing as soon as they noticed the feedback. This ensured that the writing processes of text revision, planning, and translation (Kellogg, 1996) occurred during the task.

Regarding ACF, we decided to provide it on the same day participants completed their writing to control for possible confounding variables (e.g., opportunities for the learners to self-study between task completion and receiving feedback). Having piloted this procedure beforehand, we allotted 10 minutes to provide the ACF immediately after students completed the writing task and 5 minutes for learners to examine their feedback while making the appropriate revisions to their original texts. Although 5 minutes is a relatively short time for revision, we believed it to be sufficient, given that (1) direct corrective feedback was provided, which is more straightforward to revise than indirect corrective feedback; and (2) in most cases, feedback was provided on at most five sentences that included the target feature.

Further consideration was given to learners' time on task. The experiment involved two treatment groups (i.e., SCF and ACF). However, the SCF group's task demands were higher than those of the ACF group, as they had the additional burden of making revisions based on the feedback received during their completion of the task. We therefore provided an additional time allowance of 5 minutes for learners in the SCF group, which was equivalent to the amount of time the ACF group was given to examine their feedback and make revisions. Figure 2 shows the time limits for each of the three groups.

ACF:         Writing task without feedback (20 min); teacher provides feedback on their writing (10 min); participants receive feedback and revise their text (5 min)
SCF:         Writing task receiving feedback (25 min)
Comparison:  Writing task without any feedback (20 min)

Figure 2. Time allocation for the treatment sessions


Random assignment

We made three randomly formed subgroups (ACF, SCF, Comparison) in each of the three intact classes (22 to 24 students; see Figure 3), a decision taken to control for possible proficiency differences among the three classes and to make the provision of SCF manageable (the SCF subgroups comprised 7–9 students). To prevent learners in different subgroups from influencing each other, all treatments were done in a computer room and the seating was arranged so that the three groups were seated separately. Task procedures were explained individually to the students on the computer screen.

Class 1 (22):  ACF (n = 7), SCF (n = 8), Comp. (n = 7)
Class 2 (22):  ACF (n = 7), SCF (n = 8), Comp. (n = 7)
Class 3 (24):  ACF (n = 7), SCF (n = 9), Comp. (n = 8)

Figure 3. Grouping for treatment sessions
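The subgrouping procedure above can be sketched in a few lines; this is a minimal illustration, with the student IDs, the `assign_subgroups` helper, and the seed invented for the example (only the subgroup sizes come from Figure 3):

```python
import random

def assign_subgroups(students, sizes, seed=None):
    """Randomly split one intact class into named subgroups.

    `sizes` maps subgroup name -> number of students (e.g. from Figure 3).
    """
    rng = random.Random(seed)
    pool = list(students)
    rng.shuffle(pool)
    groups, start = {}, 0
    for name, n in sizes.items():
        groups[name] = pool[start:start + n]
        start += n
    # Sizes must account for every student in the class exactly once.
    assert start == len(pool), "subgroup sizes must sum to class size"
    return groups

# Example: Class 1 had 22 students split 7 / 8 / 7 (Figure 3).
class1 = [f"S{i:02d}" for i in range(1, 23)]
groups = assign_subgroups(class1, {"ACF": 7, "SCF": 8, "Comparison": 7}, seed=1)
```

Stratifying the assignment within each intact class (rather than randomizing across classes) is what keeps any class-level proficiency differences balanced across the three conditions.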

Dealing with on-site problems during data collection

The online editing platform, Google Docs, was used to perform the treatment tasks. This decision was based on the researchers' experiences using the program and on previous accounts that described its simultaneous editing function as a means to monitor and interact with learners' writing in real time (Aubrey, 2014; Kim, 2010) – an essential feature for delivering SCF. A number of challenges arose when preparing to use this software. First, it was essential that all participants had a Google account so that documents could be shared with the researchers. Second, to conduct the treatment sessions, we needed to decide how to share the co-edited documents. There were two options for this purpose: either (1) each learner opened a new document on Google Docs and shared it with the researcher using the researcher's email account; or (2) the researcher created documents for all participants and shared them individually with each learner. The first option involved a simpler and less time-consuming procedure than the second but posed a crucial problem: if learners created their own documents, the automatic review function in Google Docs would remain activated. This was undesirable because this function suggests ways to resolve grammatical errors. The second option was therefore chosen: we created documents for all participants in advance, with the automatic review function disabled, and shared them with participants at the beginning of the treatment sessions.


Providing SCF and ACF with Google Docs also required us to consider how two researchers could effectively provide different feedback to multiple learners (see Figure 3) in the same classroom. It was decided that, during the treatment sessions, each researcher would give feedback to a group of 7 to 9 learners under time constraints, with one researcher providing SCF and the other providing ACF. Preparing for this involved opening the 7–9 pre-shared documents in separate tabs on the researchers’ computers. The procedure for SCF included monitoring learners’ writing by scrolling through the tabs and looking for opportune moments to deliver feedback on learners’ attempts at using the hypothetical conditional. As shown in Figure 4, the SCF was delivered using the ‘Comment’ function in Google Docs.

Figure 4. Example of immediate synchronous CF using Google Docs

For the ACF, the researcher told learners to sign out of their Google Accounts immediately after the 20-minute task before providing the feedback. ACF was then provided via the ‘Comment’ function during a 10-minute period after learners had completed the task.

Assessing learners' grammatical knowledge

To measure the learners' ability to use the target structure (i.e., the hypothetical conditional), a text reconstruction (TR) task was employed, whereby learners listened to a story while taking notes and then reconstructed the story in writing. The task allowed us to elicit the target structure and to control the number of its occurrences by embedding a fixed number of target structures in the story. A benefit of this task was that asking the test-takers to "reconstruct" the story reduced the possibility that learners would avoid using the target structure. Although this was not a 'free' writing task, it was still meaning-based, as the learners needed to process the whole story to reconstruct it. To prevent the participants from focusing entirely on the language form or memorizing exact sentences, input was only provided orally, and learners were required to fill in a worksheet, which was then used to reconstruct the story as a text. Three stories of approximately 200 words were created for three parallel TR tasks to use as pre-,
post-, and delayed-post-tests. Each TR task included five sentences that expressed the target structure. To control for the familiarity and difficulty of the three stories, the TR tasks were counterbalanced, with the three stories (A, B, and C) being used in a different order for the pre-, post-, and delayed tests, as shown in Figure 5.

                      Class 1 (22)     Class 2 (22)     Class 3 (24)
                      ACF SCF Comp.    ACF SCF Comp.    ACF SCF Comp.
Pretest               TR Task A        TR Task C        TR Task B
Immediate posttest    TR Task B        TR Task A        TR Task C
Delayed posttest      TR Task C        TR Task B        TR Task A

Note. Comp. = Comparison Group; TR Task = Text Reconstruction Task.

Figure 5. The text reconstruction tasks in the three classes (N)
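The counterbalancing in Figure 5 is a 3 × 3 Latin square, which can be expressed as a simple rotation. A sketch, with the function and variable names our own (the resulting schedule is the one shown in Figure 5):

```python
# Latin-square rotation reproducing the counterbalancing in Figure 5.
TASKS = ["A", "B", "C"]                              # the three parallel stories
STARTS = {"Class 1": 0, "Class 2": 2, "Class 3": 1}  # index of each class's pretest story
TESTS = ["pretest", "immediate posttest", "delayed posttest"]

def tr_task(class_name, test_index):
    """Which story a class reconstructs at a given test occasion (0, 1, or 2)."""
    return TASKS[(STARTS[class_name] + test_index) % len(TASKS)]

schedule = {c: [tr_task(c, t) for t in range(len(TESTS))] for c in STARTS}
# Each story appears exactly once per class and once per test occasion,
# so story familiarity/difficulty is balanced across groups and tests.
```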

Analyzing writing outcomes

In analyzing the data, we felt it was important to examine both the process and the product of the treatment. To conduct the process analysis, we examined the participants' attempts to produce the target structure and their responses to the feedback they received. A within-group analysis was thus conducted to investigate their production of the target feature during the treatment tasks, the frequency of feedback received, and the changes in accuracy during the two repeated writing tasks. More precisely, all writing tasks were analyzed in terms of:

1. The number of hypothetical conditional sentences attempted.
2. The number of sentences that included errors in the hypothetical conditional.
3. A score indicating the degree to which hypothetical conditional sentences were accurate, which was determined by examining the if-clause features (perfect aspect, past tense, past participial form) and the main clause features (modal in the past tense, perfect aspect, modal form, and past participial form).
4. The frequency of feedback received and the number of revisions/instances of uptake (treatment tasks only).

Each measurement required several decisions to be made. To determine the number of attempts of the target structure (1), we needed to define 'attempt'. For the TR tasks, this was rather straightforward. We considered an 'attempt' of the hypothetical conditional to be any sentence (completed or not completed) that reconstructed the hypothetical situation embedded in the model story. For the treatment task, however, we could not adopt this definition as the writing was not based on a reconstruction. Upon examining the learners' actual writing, we
noticed that participants began sentences with "if" to answer the task prompt (i.e., "What do you think would have happened if the event had not occurred?"). We therefore set the criterion for an attempt as "any sentence beginning with 'if'".

To identify the number of inaccurate sentences (2) and determine the degree to which they were inaccurate (3), we needed to consider how the accuracy of the hypothetical conditional sentences should be scored. In previous studies, grammatical accuracy has often been scored as correct or incorrect (see Kang & Han, 2015). However, our experience as teachers suggested that students make a variety of errors when using complex structures. Considering that writing the accurate form of the hypothetical conditional requires the accurate production of multiple elements (i.e., accurate forms for the perfect aspect and past tense) (Izumi et al., 1999; Shintani et al., 2014), we created the scoring criteria in Table 1 based on identifiable features of the structure. To validate this scheme, we scored a random selection of sentences from the pre-test by the comparison group. As the scoring scheme differentiated the accuracy of sentences produced by different learners (see Table 2, also Shintani et al., 2014), which matched our experience, we adapted it for our study.

Table 1. Accuracy points criteria

Clause                             Criterion                        Component                     Points
if-clause (maximum 2 points)    1  the perfect aspect               have (aux) + verb             1.0
                                2  the past tense                   had                           0.5
                                3  the past participle form         correct PP                    0.5
main clause (maximum 3 points)  4  the modal in the past tense      past modal                    1.0
                                5  the perfect aspect               have (aux) + verb             1.0
                                6  the modal form                   correct form of have (aux)    0.5
                                7  the past participle form         correct form of PP            0.5
Total possible                                                                                    5


Table 2. Scoring examples (extracted from texts provided in the comparison group)

                                                         Criteria
Sentence                                                 1    2    3    4    5    6    7    Total
(a) If he marry to her, they will live in a
    big house.                                           0    –    –    0    0    –    –    0
(b) If he married her, he could meet a lot of
    famous people.                                       0    –    –    1.0  0    –    –    1.0
(c) If I have married to her, I would enjoy
    fantastic meals every night.                         1.0  0    0.5  1.0  0    0    0    2.5
(d) If he had married her, he would have live
    in big house.                                        1.0  0.5  0.5  1.0  1.0  0.5  0    4.5
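Because the criteria in Table 1 form a fixed weighted checklist, scoring a sentence reduces to summing the weights of the criteria a coder judged correct. A minimal sketch (the function names are ours; in the study, feature identification itself was done by human coders, not automatically):

```python
# Weights for the seven criteria in Table 1.
# if-clause:   1 perfect aspect (1.0), 2 past tense "had" (0.5), 3 past participle (0.5)
# main clause: 4 past modal (1.0), 5 perfect aspect (1.0), 6 modal form (0.5), 7 past participle (0.5)
WEIGHTS = {1: 1.0, 2: 0.5, 3: 0.5, 4: 1.0, 5: 1.0, 6: 0.5, 7: 0.5}

def accuracy_score(criteria_met):
    """Total accuracy points for one hypothetical conditional sentence.

    `criteria_met` is the set of Table 1 criteria the coder judged correct.
    """
    return sum(WEIGHTS[c] for c in criteria_met)

def is_attempt(sentence):
    """Treatment-task criterion for an 'attempt': any sentence beginning with 'if'."""
    return sentence.strip().lower().startswith("if")

# Sentence (c) in Table 2 meets criteria 1, 3, and 4 -> 2.5 points.
score_c = accuracy_score({1, 3, 4})
```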

Analyzing writing processes

Finally, to observe the learning process (4), we analyzed the participants' writing during the treatment sessions using the 'History' function of Google Docs. This function allowed us to retroactively see the specific WCF that was provided, as well as learners' responses to the WCF, in each document. We then coded the data in terms of a) the frequency of feedback, b) learners' corrections of their errors based on feedback, and c) their uptake (i.e., successful corrections). In the course of the analysis, the aggregate accuracy score for each treatment task did not tell us whether the SCF learners could produce the correct form as a result of receiving the moment-to-moment error corrections. Thus, we calculated the accuracy separately for each of the five hypothetical conditional sentences in the treatment tasks. By doing so, we were able to see that the learners in the SCF group reduced their errors on the hypothetical conditional as they wrote (see Figure 6).

Figure 6. Number of target structure errors in each attempt in treatment 1 and treatment 2
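The per-attempt tally underlying Figure 6 can be sketched as follows; the episode records here are invented for illustration (in the study they were coded from the Google Docs 'History' data):

```python
from collections import Counter

# Each record: (learner, attempt position 1-5, sentence contained an error).
# Illustrative values only, not the study's data.
episodes = [
    ("L1", 1, True), ("L1", 2, True), ("L1", 3, False),
    ("L2", 1, True), ("L2", 2, False), ("L2", 3, False),
]

def errors_per_attempt(records):
    """Number of erroneous target-structure sentences at each attempt position."""
    counts = Counter()
    for _learner, attempt, has_error in records:
        if has_error:
            counts[attempt] += 1
    return dict(counts)

per_attempt = errors_per_attempt(episodes)
```

Plotting these counts by attempt position, separately for each treatment task, is what makes a within-task decline in errors (as opposed to an overall task score) visible.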


In an attempt to obtain further information about why the learners in the SCF group reduced errors while working on the treatment tasks, we took a qualitative approach by examining actual writing processes, which uncovered the interactive scaffolding that occurred around repeated attempts at producing the target structure. Data on writing processes (including the error, feedback, and revision attempt) were retrieved using the 'History' function of Google Docs. As shown in Table 3, learners were not able to provide the correct form immediately. Rather, they made errors in different elements of the target structure (such as in the tense or in the perfect aspect) and gradually improved their accuracy as they received feedback and produced the same target form in subsequent sentences.

Table 3. Example of the process of writing, feedback, and revision in SCF condition

Sentence 1
  Student's writing:     If I did not joined a soccer club, I would not have a best friends.
  Researcher's feedback: 'did' highlighted, with the comment "had"; 'have' highlighted, with the comment "have had"
  Student's revision:    If I had not joined a soccer club, I would not have had a best friends.

Sentence 2
  Student's writing:     If I had give up playing soccer, I would not have got a confident.
  Researcher's feedback: 'give' highlighted, with the comment "given"; 'got' highlighted, with the comment "gotten"
  Student's revision:    If I had given up playing soccer, I would not have gotten a confident.

Sentence 3
  Student's writing:     If I had not experienced it, I would not get never give up mind and another confident.
  Researcher's feedback: 'get never give' highlighted, with the comment "have given"
  Student's revision:    If I had not experienced it, I would not have given up mind and another confident.

To investigate the effects of the treatment, we analyzed the test data in terms of the number and accuracy of the attempts. Following the coding criteria shown in Table 1, all test data were scored and subjected to between-group analyses to determine the effects of SCF and ACF on learners' accurate use of the target feature in the tests. Repeated measures ANOVAs (three groups × three tests) were conducted to test for significant mean differences in accuracy scores (see Table 2) across the pre-test, post-test, and delayed post-test. These tests, together with follow-up pairwise within-group comparisons, revealed that:

a. the comparison group did not show any significant change across the three tests;
b. both the ACF and SCF groups significantly increased their accuracy scores from pre-test to post-test and from pre-test to delayed post-test; and
c. the SCF group had significantly higher accuracy scores than the comparison group, while the ACF group's scores did not significantly differ from the comparison group's scores.

Our analysis thus provided insight into the learning process (what happened during the treatment) as well as the learning outcome (changes in accuracy of use on new pieces of writing), allowing us to conclude that SCF had robust positive learning effects compared to ACF.
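As a minimal sketch of how such test data can be arranged for this kind of analysis (the scores below are invented, and the inferential step itself, the repeated measures ANOVA, is not reproduced here):

```python
from collections import defaultdict
from statistics import mean

# Long-format records: (group, test, accuracy score). Invented illustrative values.
scores = [
    ("SCF", "pre", 1.0), ("SCF", "post", 4.0), ("SCF", "delayed", 3.5),
    ("ACF", "pre", 1.2), ("ACF", "post", 3.0), ("ACF", "delayed", 2.8),
    ("Comparison", "pre", 1.1), ("Comparison", "post", 1.3), ("Comparison", "delayed", 1.2),
]

def cell_means(records):
    """Mean accuracy per group x test cell - the descriptive input to the ANOVA."""
    cells = defaultdict(list)
    for group, test, score in records:
        cells[(group, test)].append(score)
    return {cell: mean(vals) for cell, vals in cells.items()}

means = cell_means(scores)
scf_gain = means[("SCF", "post")] - means[("SCF", "pre")]
```

The long format (one row per learner per test occasion) is the shape that repeated measures procedures in standard statistics packages expect, so keeping the coded scores in this form from the start avoids reshaping later.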

Methodological conclusions and implications for future studies

The study analyzed in this chapter adopted a quasi-experimental approach and used intact classes. As such, the process of designing the study reminded us that "classroom research virtually never meets all the requirements for experimental research in the true sense" (DeKeyser & Prieto-Botana, 2019, p. 2). Each of the
decisions made in that process was an attempt at finding a balance between experimental control and ecological validity (Spada, 2019). In many cases, we prioritized the exclusion of possible confounding factors over maintaining ecological validity (i.e., what teachers would actually do in real-life practice). The ACF group, for example, received feedback 10 minutes after completing their writing, which, we admit, is not common in real language classes. The learners in the ACF group were also only given five minutes to revise the corrected text, which is unusual in a writing class. To bring the ACF procedures more in line with actual classroom practice, future research could extend the time spent providing ACF and the time given to students for revising their texts. Furthermore, we adopted TR tasks, rather than freewriting activities, to prioritize the elicitation of the target structure in writing.

At the same time, the classroom setting brought with it certain practical constraints that would not be present in an equivalent laboratory study. Some of these constraints included the imprecise timing of SCF (due to groups of students) and the use of relatively short timed-writing tasks (due to limited lesson time). Though we felt our decisions allowed us to show conclusive results based on the learners available to us in this particular setting, the limitations of this study point to possibilities for further investigation involving different methodological decisions. As this was an exploratory study, many of the decisions we made should be further explored. For example, providing SCF with slower timing (e.g., after completing each paragraph) or using different types of feedback might lead to different outcomes. Indirect WCF, or a combination of both direct and indirect WCF, would provide gradual support for writers, which might be suitable for SCF in classroom practice.
It would also be worthwhile to investigate whether SCF is equally effective with simpler grammatical structures (see Spada & Tomita, 2010), although some of these, such as the indefinite article, are not rule-based, and their learning may, in principle, be less likely to be influenced by repeated written corrective feedback. Also, the current study only used one measurement tool for examining learners' ability to produce the target structure, the TR tasks. By including different types of tests, such as a freewriting task and a discrete-point grammar test, future studies could provide more robust evidence of learning. Finally, learner variables including computer literacy and attitudes towards new technology could also be investigated as mediating factors (Ayers, 2010).

Importantly, we also believe that future research should investigate teachers' and learners' perceptions of SCF. The pilot case study (Shintani, 2016) suggested that learners were appreciative of receiving SCF but also that they could feel distracted by it, as it forced their attention onto revision, likely at the expense of other writing processes, such as planning. While teachers might see SCF as a useful tool to provide support for writing, some of them might also question the pedagogical value of providing feedback while writing. If this is the case, SCF might be better utilized when learners signal that they have a problem (e.g., with a message via the comment box), which can alert teachers to respond with feedback at moments when students desire help.

The current study points to the usefulness of online editing tools in eliciting the process features of L2 writing. A methodological advantage of using online synchronous tools, such as Google Docs, is that they allow researchers to easily video-record the writing process, which shows moment-to-moment writing behaviours. The video data can also be used as a stimulus in think-aloud protocols to obtain information on learners' decision-making processes while writing (Shintani, 2016; see also Gass & Mackey, 2000). Making use of these technologies for data collection purposes may provide a window into the dynamic nature of the writing process.

The study has also provided pedagogical implications which may motivate future teacher-practitioner research. Our experience indicates that SCF, provided using Google Docs (and other similar computer platforms), is feasible as a pedagogical tool in the L2 writing classroom. First, we showed that one teacher can provide SCF to between seven and nine individual documents almost simultaneously, which would not be possible in a face-to-face classroom setting. Although providing individual SCF in this manner to regular-sized classes may not be manageable, SCF would be quite feasible during small-group collaborative writing activities. In Aubrey (2014), for example, a teacher provided SCF to five documents shared with five groups (22 students in total) who were each completing a writing task. Aubrey reported that the teacher managed this successfully and that students responded favorably to the feedback. By putting learners into groups to perform collaborative writing tasks, it may also be feasible for teachers to provide synchronous feedback on ideational problems.
Second, this study suggests that SCF is effective for a relatively complex grammatical feature (i.e., the hypothetical conditional) for EFL learners. The applicability of SCF should be further explored in terms of various types of learners (e.g., proficiency levels and age groups), classrooms (e.g., class size), tasks (e.g., picture description), and language features (e.g., grammar, vocabulary, and rhetorical features). There is thus potential to enhance the learning and teaching experience by exploring the use of SCF in various classroom activities.

This reflection has provided an 'inside look' into the methodological challenges experienced and the solutions adopted in Shintani and Aubrey (2016). The strength of this research is that it operationalized an innovative type of feedback (immediate SCF) and explored its effect on learners' accurate production of a target structure through an examination of both process and product measures (see Chapter 1 for a framework on writing processes, and Chapter 14 for a WCF
study integrating the analysis of product and process dimensions). It was inspired by the pedagogical possibility for teachers to engage with learners’ writing using online editing tools as well as the potential for immediate SCF to shed light on the underlying mechanisms of language acquisition through writing and receiving feedback.

References

Adams, R., Alwi, N. A. N. M., & Newton, J. (2015). Task complexity effects on the complexity and accuracy of writing via text chat. Journal of Second Language Writing, 29, 64–81.
Aljaafreh, A., & Lantolf, J. P. (1994). Negative feedback as regulation and second language learning in the zone of proximal development. The Modern Language Journal, 78(4), 465–483.
Aubrey, S. (2014). Students’ attitudes towards the use of an online editing program in an EAP course. Annual Research Review, 17, 45–57. Retrieved on 28 April 2023 from http://hdl.handle.net/10236/14771
Aubrey, S., & Shintani, N. (2021). L2 writing and language learning in electronic environments. In R. M. Manchón & C. Polio (Eds.), Handbook of second language acquisition and writing (pp. 282–296). Routledge.
Ayers, R. (2010). Learner attitudes towards the use of CALL. Computer Assisted Language Learning, 15(3), 241–249.
Bitchener, J. (2008). Evidence in support of written corrective feedback. Journal of Second Language Writing, 17(2), 102–118.
Bitchener, J., & Ferris, D. R. (2012). Written corrective feedback in second language acquisition and writing. Routledge.
Cerezo, L., Manchón, R. M., & Nicolás-Conesa, F. (2019). What do learners notice while processing written corrective feedback? A look at depth of processing via written languaging. In R. P. Leow (Ed.), The Routledge handbook of second language research in classroom learning (pp. 173–187). Routledge.
Dao, P., Nguyen, M., Duong, P., & Tran-Thanh, V. (2021). Learners’ engagement in L2 computer-mediated interaction: Chat mode, interlocutor familiarity, and text quality. The Modern Language Journal, 105(4), 767–791.
DeKeyser, R., & Prieto-Botana, G. (2019). Current research on instructed second language learning: A bird’s eye view. In R. DeKeyser & G. Prieto-Botana (Eds.), Doing SLA research with implications for the classroom: Reconciling methodological demands and pedagogical applicability (pp. 1–7). John Benjamins.
Doughty, C. (2001). Cognitive underpinnings of focus on form. In P. Robinson (Ed.), Cognition and second language instruction (pp. 1–69). Cambridge University Press.
Ellis, R., Sheen, Y., Murakami, M., & Takashima, H. (2008). The effects of focused and unfocused written corrective feedback in an English as a foreign language context. System, 36(3), 353–371.
Ene, E., & Upton, T. A. (2014). Learner uptake of teacher electronic feedback in ESL composition. System, 46, 80–95.


Natsuko Shintani & Scott Aubrey

Gass, S. M., & Mackey, A. (2000). Stimulated recall methodology in second language research. Lawrence Erlbaum Associates.
Izumi, S., Bigelow, M., Fujiwara, M., & Fearnow, S. (1999). Testing the output hypothesis: Effects of output on noticing and second language acquisition. Studies in Second Language Acquisition, 21, 421–452. https://www.jstor.org/stable/44486913
Kang, E., & Han, Z. (2015). The efficacy of written corrective feedback in improving L2 written accuracy: A meta-analysis. The Modern Language Journal, 99(1), 1–18.
Kellogg, R. T. (1996). A model of working memory in writing. In C. M. Levy & S. Ransdell (Eds.), The science of writing: Theories, methods, individual differences, and applications (pp. 57–72). Lawrence Erlbaum Associates.
Kessler, G., Bikowski, D., & Boggs, J. (2012). Collaborative writing among second language learners in academic web-based projects. Language Learning & Technology, 16(1), 91–109. https://hdl.handle.net/10125/44276
Kim, S. (2010). Revising the revision process with Google Docs. In S. Kasten (Ed.), TESOL classroom practice series: Effective second language writing (pp. 171–177). TESOL.
Long, M. H. (2007). Problems in SLA. Lawrence Erlbaum Associates.
Loschky, L., & Bley-Vroman, R. (1993). Grammar and task-based methodology. In G. Crookes & S. Gass (Eds.), Tasks and language learning: Integrating theory and practice (pp. 123–167). Multilingual Matters.
Lyster, R., & Ranta, L. (1997). Corrective feedback and learner uptake. Studies in Second Language Acquisition, 19(1), 37–66.
Manchón, R. M. (2014). The internal dimension of tasks: The interaction between task factors and learner factors in bringing about learning through writing. In H. Byrnes & R. Manchón (Eds.), Task-based language learning: Insights from and for L2 writing (pp. 27–52). John Benjamins.
Mao, S. S., & Crosthwaite, P. (2019). Investigating written corrective feedback: (Mis)alignment of teachers’ beliefs and practice. Journal of Second Language Writing, 45, 46–60.
Nabei, T., & Swain, M. (2002). Learner awareness of recasts in classroom interaction: A case study of an adult EFL student’s second language learning. Language Awareness, 11(1), 43–63.
Odo, D. M., & Yi, Y. (2014). Engaging in computer-mediated feedback in academic writing: Voices from L2 doctoral students in TESOL. English Teaching, 69(3), 129–150.
Sheen, Y. (2007). The effect of focused written corrective feedback and language aptitude on ESL learners’ acquisition of articles. TESOL Quarterly, 41(2), 255–283.
Shintani, N. (2016). The effects of computer-mediated synchronous and asynchronous direct corrective feedback on writing: A case study. Computer Assisted Language Learning, 29(3), 517–538.
Shintani, N., & Aubrey, S. (2016). The effectiveness of synchronous and asynchronous written corrective feedback on grammatical accuracy in a computer-mediated environment. The Modern Language Journal, 100(1), 296–319.
Shintani, N., & Ellis, R. (2013). The comparative effect of direct written corrective feedback and metalinguistic explanation on learners’ explicit and implicit knowledge of the English indefinite article. Journal of Second Language Writing, 22(3), 286–306.
Shintani, N., Ellis, R., & Suzuki, W. (2014). Effects of written feedback and revision on learners’ accuracy in using two English grammatical structures. Language Learning, 64(1), 103–131.

Chapter 15. Methodological considerations in the analysis of written corrective feedback

Spada, N. (2019). Discussion: Balancing methodological rigor and pedagogical relevance. In R. DeKeyser & G. Prieto-Botana (Eds.), Doing SLA research with implications for the classroom: Reconciling methodological demands and pedagogical applicability (pp. 201–215). John Benjamins.
Spada, N., & Tomita, Y. (2010). Interactions between type of instruction and type of language feature: A meta-analysis. Language Learning, 60(2), 263–308.
Storch, N. (2018). Written corrective feedback from sociocultural theoretical perspectives: A research agenda. Language Teaching, 51(2), 262–277.
Truscott, J., & Hsu, A. Y.-p. (2008). Error correction, revision, and learning. Journal of Second Language Writing, 17(4), 292–305.
Van Beuningen, C. G., de Jong, N., & Kuiken, F. (2008). The effect of direct and indirect corrective feedback on L2 learners’ written accuracy. ITL International Journal of Applied Linguistics, 156, 279–296.

Appendix.

Instruction in text reconstruction test

All three task sheets (for the SCF, the ACF and the Control groups) contained the following basic instruction.

Basic instruction for all groups
宿題で書いた人生を変えた5つの出来事について、もしそれが起こらなかったら(又は、もしあなたがそれをしなかったら)どうなっていましたか。5つのできごとすべてについて、できごとの簡単な説明と、もしそれが起きなかった場合、どうなっていたか、想像して英語で簡単に書いてください。
Translation: You have written about five events that changed your life. What would have happened if those five events had not occurred (or if you had not done them)? Imagine what would have happened, and write in English a brief explanation of each event and of what would have happened if it had not occurred. You must write about all five events.

This was followed by the instruction for each group.

The instruction for the SCF group
あなたが書いている途中、間違いがあると先生が、その場所に正しい英語を書き込みますので、そのときには、文を修正してください(resolvedをクリックしないで、訂正だけをしてください!)。( )に入ったコメントは先生の指示、入っていないコメントは正しい英語の表現です。時間は25分です。Stopのサインが現れたら、このファイルを閉じてください。
Translation: When you make errors while you are writing, the teacher will write the correct English at that point. When that happens, please revise the sentence (do not click “Resolved”; simply correct the error!). Comments in ( ) are the teacher’s instructions; comments without ( ) are the correct English expressions. You have 25 minutes. When you see the “Stop” sign, close this file.


The instruction for the ACF group
時間は20分です。Stopのサインが現れたら、このファイルを閉じてください。その10分後に、先生の修正と一緒に英作文をお返しします。
Translation: You have 20 minutes. When you see the “Stop” sign, close this file. Ten minutes later, your writing will be returned with the teacher’s corrections on it.

When the ACF group students received their writing with corrective feedback, the following instruction had been added to their document, written in red letters to attract the students’ attention.
あなたが先ほど書いた文に先生のコメントが入っています。今からそれをもとに、5分間で文を修正してください(resolvedをクリックしないで、訂正だけをしてください!)。( )に入ったコメントは先生の指示、入っていないコメントは正しい英語の表現です。
Translation: The teacher’s comments have now been added to the text you wrote earlier. Based on them, revise your text within 5 minutes (do not click “Resolved”; simply correct the errors!). Comments in ( ) are the teacher’s instructions; comments without ( ) are the correct English expressions.

The instruction for the Control group
時間は20分です。Stopのサインが現れたら、このファイルを閉じてください。
Translation: You have 20 minutes. When you see the “Stop” sign, close this file.

chapter 16

Analysing L2 writers’ processing of written corrective feedback via written languaging and think-aloud protocols
Methodological considerations

Sophie McBride & Rosa M. Manchón
University of Murcia

This chapter provides a reflection on the methodological decisions taken in a study that investigated the affordances of diverse data collection procedures for inspecting depth of processing of written corrective feedback, namely think-aloud protocols, written languaging, and a combination of both. We will start by formulating the overall aims and the specific questions guiding the study and by providing a synthetic account of the rationale behind our aims and methods. In the main part of the chapter, we will report (i) the main challenges and problems experienced when analysing the data as well as the solutions adopted; and (ii) the kind of data on feedback processing provided by the three data collection procedures used in the study. We will close with methodological conclusions for future studies intended to shed light on depth of processing of written corrective feedback.

https://doi.org/10.1075/rmal.5.16mcb © 2023 John Benjamins Publishing Company

Introduction

This chapter responds to the overall aims of chapters in Part 3 of the book to provide critical reflections on data collection instruments and analytical procedures in the study of L2 writing processes, in our case focusing on the processing of written corrective feedback (WCF). More precisely, we shall reflect on the methodological problems encountered and the solutions adopted when inspecting feedback processing via metacognitive think-aloud protocols, written languaging, and a combination of both. The research to be reported is part of a wider program of research currently underway at the University of Murcia that investigates the processing and the effects of WCF as a function of writing conditions (print-based vs. screen-based writing) and learner-related variables (cognitive and affective individual differences, L2 proficiency, academic background and resulting academic literacy skills). The ultimate aim of this global research is partly methodological in nature, as we planned to look into the affordances of diverse data collection procedures, understanding such affordances in terms of the light they could shed on how L2 writers process the feedback provided on their writing. As part of this overall methodological aim, in this chapter we report on a part of the global project in which we inspected the affordances and potential effects of three experimental conditions on feedback processing with one of the groups of participants in the wider project, namely, a group of university students with a background in language and Linguistics. In the three treatment conditions, participants were instructed to analyze the feedback provided on their writing, which was followed by a session in which they revised their original texts on the basis of the feedback received and their processing of it. The three conditions varied in terms of whether the participants’ processing activity was verbalized orally (metacognitive think-aloud), reflected in writing by filling in a written languaging table, or both. We will start with a synthetic account of the rationale behind our aims and methods, the specific questions guiding the study, and an overview of the methodological decisions taken. In the main part of the chapter, we will report the main challenges and problems experienced when analysing the data and the solutions adopted, and we will reflect on the way in which our varied data collection procedures allowed us to inspect our participants’ processing and use of the feedback they received on their writing. We will close with methodological conclusions for future studies intended to shed light on feedback processing.

Overview of the global research program and of the study: Rationale, aims, and methods

Rationale

Depth of processing (DoP) has become a key variable in WCF research interested in the connection between WCF processing and language learning (see Chapter 1), on the assumption that potential language learning gains crucially depend on how deeply L2 writers process the WCF provided on their initial writing. This basic tenet is primarily grounded in theoretical accounts of the role of attention in second language acquisition processes (Bitchener, 2012, 2016, 2017, 2019; Bitchener & Storch, 2016; Leow, 2020; Leow & Suh, 2021; Manchón, 2023; Manchón & Vasylets, 2019; Polio, 2012).


Most of the empirical studies in the domain (e.g., Caras, 2019; Cerezo et al., 2019; Kim & Bowles, 2019; Manchón et al., 2020) are framed within Leow’s (2015) conceptualization of DoP (further developed into his “Feedback Processing Framework”; Leow, 2020) and have set out to test ways in which DoP – alongside further task-related, feedback-related, and writer-related variables – can promote language learning. Yet, despite the notable empirical advancements made (see Roca de Larios & Coyle, 2021, for a review of this body of work; see also Chapter 3, this volume), some critics (e.g., Leow & Manchón, 2021; Manchón & Leow, 2020; Manchón et al., 2020) have problematized some of the methodological procedures used in this research strand, especially participant selection and data collection and analysis procedures, precisely two central considerations in our own global research. For our current purposes, the latter are especially relevant. Thus, regarding data collection procedures, and framed in a more global interest in making validity issues more central in disciplinary debates in L2 writing process studies, several voices have advocated for the conduct of more controlled studies in order to test the affordances of diverse instruments for capturing DoP. In this respect, Leow and Manchón (2021) claim that “the validity of instruments in relation to the research questions being posed must be a major consideration” (p. 20). In the case of feedback studies, a pending task is to test the validity (i.e., the capacity of the instruments to measure what is meant to be measured) of diverse data collection instruments for answering questions related to how L2 writers engage with and process the WCF provided on their writing. This validity issue was first brought to attention in Manchón et al.’s (2020) study on feedback processing in individual and collaborative writing conditions.
The researchers set out to investigate whether writing individually or collaboratively had an effect on, first, the levels of DoP observed and, second, the accuracy measures of the texts written before and after the feedback processing stage. Regarding DoP, no effect for writing conditions was observed, which contradicted previous findings. When trying to account for these discrepancies (especially regarding the results of the collaborative writing condition), the researchers speculated that methodological considerations could provide the answer, especially variables such as the “degree of explicitness of the WCF provided, the directiveness (or lack of) of the instructions given to participants for their languaging activity, the oral or written medium of languaging, and the time span between the languaging activity and the language use activity that follows” (p. 260). Important for our current purposes were their observations regarding the medium (oral/written) of the participants’ verbalizations while processing the WCF provided on their writing. When comparing their own findings (obtained via written languaging) with those of previous research (obtained via oral languaging that “was not guided or constrained by instructions or medium”, p. 258),


the researchers observed that their own approach probably allowed them to have access only to “the outcome of the languaging activity and WCF processing” (p. 258), but not to the WCF processing activity itself, as they did not analyse the interaction among the participants in the collaborative writing groups while completing the written languaging table. This led them to conclude (our italics): DoP of WCF, as manifested in written languaging behaviour, might not be a function of whether the WCF is processed individually or collaboratively in cases in which (i) such processing is done in writing (in contrast to the more open, extensive reflection that could derive from oral languaging in the form of individual think-aloud or collaborative dialogue), and (ii) learners’ languaging activity is guided and mediated by strict instructions that encourage higher levels of awareness. (p. 258)

Manchón et al. (2020) formulated these claims as empirical questions worth addressing in future studies, which is precisely what we attempted to do in our own global research and, more specifically, in the study within the global project we focus on in this chapter. While the ultimate aim of the study was to inspect the affordances of diverse data collection procedures (i.e., metacognitive think-aloud (TA), written languaging (WL), and a combination of both (TA + WL)) for the analysis of DoP of WCF, our aim in this chapter is to account for the methodological decision-making process we engaged in while analyzing the data obtained in these three treatment conditions for the purpose of setting up our DoP coding scheme. Important from the perspective of validity, the data analysis includes an examination of (i) the kind of feedback processing data provided by the three data collection procedures; and (ii) the participants’ degree of involvement with the feedback provided on their writing as a function of feedback processing conditions.

Research questions

Three main research questions guided our study:

1. To what extent is engagement with feedback influenced by the way in which participants are asked to reflect on the WCF provided on their writing?
2. What kind of data do TA, WL and TA + WL provide on WCF processing?
3. To what extent do TA, WL and TA + WL vary in their affordances for inspecting the depth of processing of WCF?


Methods

Participants and context

The participants (n = 18; 14 females and 4 males) were a group of undergraduate students at a Spanish university enrolled in an Applied Linguistics module in the fourth year of their undergraduate English Studies degree. They participated in the study on a voluntary basis and were rewarded with course credits following departmental regulations. All participants took an initial Oxford Placement Test (OPT) in order to confirm the homogeneity of their proficiency levels. No initial differences between the participants were observed, as their average proficiency was between B2 and C1 level of English, according to the Common European Framework of Reference for Languages (CEFR).

Tasks and procedures

Figure 1. Outline of data collection procedure

As shown in Figure 1, the study followed a pre-test/post-test design through four 50-minute sessions over the course of a one-week period. On day one the participants completed the OPT; on day two they wrote their initial text (pre-test); and on day three they received direct, unfocused WCF and then processed this feedback according to the treatment group to which they were assigned: (i) metacognitive think-aloud protocols; (ii) written languaging; and (iii) simultaneous written languaging and metacognitive think-aloud protocols. The decision to opt for unfocused direct WCF was taken to guarantee that all participants had similar information about the errors made and the corrections provided, and thus to avoid any additional intervening variables influencing feedback processing in the three experimental conditions. In the final session, on day four, participants were invited to rewrite their original text. In order to do so, they were provided with a clean copy of their original texts without WCF and were asked to rewrite the text, taking into consideration the errors they had previously languaged about in the WCF processing session. Upon finishing the rewriting, participants were asked to complete a questionnaire (Appendix 2) that included questions on their views of the


WCF provided as well as the experimental conditions involved in the study (written languaging and/or think-aloud protocols). In order to complete the WCF processing stage, all participants received the same instructions, which were provided orally (think-aloud condition) and in written form on the task sheet in English (written languaging condition), with clarification requests being responded to in the participants’ L1, Spanish. The participants were asked to reflect on the errors they had made in their original texts, categorise them according to the error type (lexical, grammatical, orthographic, etc.), and provide a metalinguistic explanation for each error. The participants belonging to the written languaging group were asked to complete these activities on a written languaging table (see Appendix 1); those in the metacognitive thinkaloud processing condition, to perform the activities orally; and, finally, those in the simultaneous condition, to carry them out both orally (while thinking aloud) and in writing (when completing the written languaging table). Before the participants took part in the experimental phase of the study, the groups who would be asked to think aloud during their feedback processing stage were first instructed on how exactly they were required to do so. More specifically, these participants were informed that the researchers were interested in “everything that crossed their minds the entire time they were working”, almost as if they were “talking to themselves out loud” in the language they felt most comfortable using (L1 or L2). Throughout the feedback processing stage, the participants concerned were reminded to think aloud whilst reflecting on their errors and encouraged to provide metalinguistic explanations for the errors made (via metacognitive TA), in line with what was expected of the participants who were asked to fill out the written languaging tables, a task which was also regarded as metacognitive in nature.

The writing task

The participants completed the complex version of the “Fire Chief” task (Gilabert, 2007), a problem-solving, picture-based writing activity, as seen in Figure 2 below. This task was selected as it had been implemented in various studies conducted within the wider research program and validated empirically in terms of benefits in the production of written output (Sánchez et al., 2020). As task-related variables were not part of the design of this study, we simply chose a task that had been shown to elicit both task engagement and written output with the population under study (but see the discussion of task-related considerations in the final part of the chapter).


Figure 2. “Fire Chief” task (adapted from Gilabert, 2007)

Data coding considerations: Challenges faced and decisions taken when developing a coding scheme for DoP of feedback

Operationalizing DoP

Following trends in recent studies of WCF processing, we started by analyzing our data in the three experimental conditions taking Leow’s (2015) definition of depth of processing as the primary lens. Thus, the processing data was viewed according to:

The relative amount of cognitive effort, level of analysis, elaboration of intake, together with the usage of prior knowledge, hypothesis testing, and rule formation employed in decoding and encoding some grammatical or lexical item in the input. (Leow, 2015, p. 204)

The adoption of this initial, theoretically-based operational definition of DoP was followed by a data-driven analysis in the process of setting up a coding scheme of


DoP. As detailed below, this proved to be a real challenge, especially regarding the written languaging data.

Setting up a coding scheme of DoP

Data and unit of analysis

Our first task was to segment the data to identify the unit of analysis for DoP. The first problem we faced resulted from the variation in the kind of data provided in the three experimental conditions: verbalizations in the metacognitive TA protocol data, annotations in the WL tables, and a combination of both in the metacognitive TA plus WL condition, as seen in the following examples:

TA data: Original
[1] [WHISPERS: would open the main door so that the people that are inside the building] … el de could … hmm … yo (2) bueno … I always (2) make this mistake … and (2) I don’t know … I just don’t … es como que no lo interiorizo ese … ese tiempo allí pero bueno (3) are trapped inside the building (2) can exit (2) vale … si … si porque como el are está en presente (2) pues el (2) can … o sea el próximo verbo también tiene que ir en el mismo tiempo verbal (2)

TA data: Translation
[1] [WHISPERS: would open the main door so that the people that are inside the building] … el de could … hmm … I (2) well … I always (2) make this mistake … and (2) I don’t know … I just don’t know … it is like I don’t interiorize this … this tense there but yeah (3) are trapped inside the building (2) can exit (2) ok … yes … yes because the are is in present (2) so the (2) can … I mean the next verb also has to be in the same verb tense (2)

WL data
[2]

Error          Correction    Code  Explanation
Fog            Smoke         V     “Fog” is a meteorological phenomenon, “smoke” is the correct word
Estinguished   Extinguished  SP    A spelling mistake, perhaps triggered by the influence of the Spanish “extinguir”, which is usually pronounced as “estinguir”


WL & TA data
[3] WL

Error    Correction  Code  Explanation
Origen   Source      V     The second term is more accurate

TA data: Original
[4] Vale … as they have the origin of the fire … the source (2) porque origen sería mas … vale … creo que origen (3) y source (2) al pensarlo en español creo que aqui en inglés hay una division que en español no lo tenemos [filling in the WL table] origin (2) source (6) porque no son … en español distinguimos (2) no … solo está origen y aquí está origen y source (2) esto también es vocabulario (3) [humming] (9) hmm … el segundo término … the second term [filling in the WL table] is … more … accurate (2)

TA data: Translation
[4] Ok … as they have the origin of the fire … the source (2) because origin would be … ok … I think that origin (3) and source (2) thinking about it in Spanish I think that here in English there is a distinction that we do not have in Spanish [filling in the WL table] origin (2) source (6) because they are not … in Spanish we distinguish (2) no … there is only origin and here we have origin and source (2) this is also vocabulary (3) [humming] (9) hmm … the second term … the second term [filling in the WL table] is … more … accurate (2)

As can be seen in these examples (and more fully discussed in a later section), the data provided in the three conditions essentially varied in the amount and precision of the details provided: It was clear from the outset that the TA data included information that was missing in the written languaging data. On account of these differences, we decided to start by analysing the TA and WL data separately. We assumed that this separate analysis could serve as the basis for the subsequent and key task of coming up with a common coding scheme of DoP. In the case of the metacognitive TA data, we could readily start by identifying the language-related episodes (LREs) found in the verbalisations produced by the participants. LREs were defined as “any segment of the protocol in which a learner either (i) spoke about a language problem he/she encountered while writing and solved it either correctly or incorrectly [or left it unresolved] or (ii) simply solved it without having explicitly identified it as a problem” (Swain & Lapkin, 1995, p. 378). In the case of the WL data, coding was based on the annotations provided in the different columns of the WL tables (see Appendix 1), particularly those where the participants were asked to indicate the error code and give an explanation. Globally, this was a recursive process, as we explain below.


Analysis of the think-aloud protocols

The metacognitive TA data recordings were transcribed by the first researcher and coded according to the depth of processing of each LRE. That is, each time a participant commented on an error or its correction, the corresponding section in the verbal report was identified and subsequently coded in terms of the component elements of our operational definition of DoP. To this end, we started by classifying the LREs in broad categories roughly corresponding to the actions implemented by the participants, as shown in Figure 3 below.

– Read/repeat target structure
– Disagrees with error correction (no explanation given)
– Disagrees with error correction (explanation given)
– Fully understands the error correction and provides an accurate explanation
  – The error was due to an incorrect translation from the L1
  – Rule formulation
– Attempts but incomplete/incorrect explanation
– Accepts the correction as they didn’t know how to say it in their L2
– Cannot explain the error, isn’t sure of the mistake
– Ignores the error
– Accepts the error as it was made due to rushing
– Understands, “always makes this error”, provides an (in)complete explanation
– Translates the error to the L1 and accepts with little/no explanation

Figure 3. Broad category descriptors of the TA protocols

Although informed by Leow’s definition of DoP, as noted above, a data-driven process was followed when establishing levels of participant engagement with the WCF provided. As shown in Figure 4, two macro-groups of actions were identified according to whether or not participants engaged with the error corrections provided on their writing.

Figure 4. Initial classification of participants’ engagement with WCF

Upon establishing the WCF processing trends found within the data, the next step was to categorise them according to levels of processing. Previous studies in the field had categorised DoP into two (Kim & Bowles, 2019) or three levels (Adrada-Rafael & Filgueras-Gómez, 2019; Caras, 2019; Cerezo et al., 2019; Park & Kim, 2019). Kim and Bowles (2019) distinguished between high and low levels of processing, in which high DoP corresponded to participants who showed a high level of cognitive effort when engaged in the formulation and application of the target rules and spent a long time processing the target forms. Conversely, low DoP was characterised by very little cognitive effort on the part of participants who limited themselves to recognizing the target forms just by reading or repeating them and showed a lack of understanding of the corrections provided as well as signs of minimal processing. As mentioned above, a third category (i.e., medium, or intermediate, DoP) had also been previously identified, in this case as the type of behaviour displayed by those participants who spent slightly more time processing target items than their counterparts at the low DoP level, commented on target items, and showed some indication of cognitive effort (e.g., Caras, 2019). With this in mind, we aimed at refining our initial categories according to whether they were indicative of low, medium, or high processing levels. Since we observed that there were certain verbal reports that did not necessarily fit within the criteria of low or deep processing, we decided that, in addition to these two levels, a medium level of processing should also be included in our coding scheme. However, when attempting to categorise the LREs into the three levels of processing already established, it became apparent that not only did we have to take into account the actions observed in the LREs (as described in Figure 3 above), but also that it was essential to take into consideration several dimensions of cognitive effort, including time spent on engaging with the feedback and


effort put into providing the (successful/unsuccessful) explanation of errors. Consequently, we finally decided to code the LREs according to the following criteria:

1. LREs were coded as high DoP when the participants
   a. provided correct formulation of rules;
   b. hypothesised about the error correction provided;
   c. translated the target item correctly and provided a metalinguistic explanation;
   d. demonstrated a high level of cognitive effort when processing the error, as seen in the time spent on processing and providing correct metalinguistic explanations; or
   e. disagreed with the error correction provided but included a sound metalinguistic explanation as to why.

The following is an example of high DoP, as the participant, by activation of prior knowledge, provides a metalinguistic explanation of the correct tense provided and formulates the corresponding rule which applies to the use of this tense, i.e., the present perfect is used for an action that started in the past but is still in progress.

TA data: Original
[5] (rule formulation): A ver … (5) as they have not enjoyed life enough time … enough … they did not enjoy life enough … (10) claro … porque si todavía están vivos pues todavía pueden disfrutar … have … sería presente perfecto … have enjoyed (2) have (2) not enjoyed y yo lo he puesto en pasado (3) pero sería en presente perfecto [filling in the table] enjoyed … grammar … this is grammar … and they are still alive (3) thus … we are talking about an action in progress [filling in the table] vale.

TA data: Translation
[5] (rule formulation): Let's see … (5) as they have not enjoyed life enough time … enough … they did not enjoy life enough … (10) of course … because if they are still alive well they can still have fun … have … it would be the present perfect … have enjoyed (2) have (2) not enjoyed and I have put it in the past (3) but it would be in the present perfect [filling in the table] enjoyed … grammar … this is grammar … and they are still alive (3) thus … we are talking about an action in progress [filling in the table] ok.

2. LREs were coded as medium DoP when the students
   a. translated the target form into the L1 (correctly or incorrectly) but did not provide any further information;

Chapter 16. Analysing L2 writers’ processing of written corrective feedback

   b. discussed repeatedly making the same mistake throughout the text (or, in general, when using the L2) while also providing a (generally very brief) metalinguistic explanation; or
   c. attempted to provide an explanation, usually very brief in terms of the time spent on processing, and at times even gave up on trying to explain.

In addition, depending on whether the linguistic information provided in the LREs was correct or incorrect, medium DoP was also considered according to whether the verbalizations showed more or less awareness at the level of understanding. This decision was based on the consideration that some LREs fit within the criteria of medium DoP even though the reflections provided were not correct (e.g., an incorrect translation into the L1). As seen in Excerpts [6] and [7] below, medium DoP is reflected in an example where the participant recognizes an error s/he had previously made (and received corrections for) in the text ([6]), and in another example ([7]) where the participant correctly translates the correction into their L1, showing greater awareness at the level of understanding (s/he has successfully understood the correction provided).
TA data: Original
[6] (recognising a repeated error): Vale … otra vez el mismo error … el mismo caso con also que va entre … hmm … auxiliar y verbo
[7] (translates the correction and accepts new L2 form): El siguiente error es de vocabulario … yo he escrito [Laughs] floor zero hmm … para referirme al primer … a la planta baja … la prim … no … no la primera planta no (2) la planta … que está … al entrar [Laughs] del edificio y sería … ground

TA data: Translation
[6] (recognizing a repeated error): Ok … again the same mistake … the same case with also which goes between … hmm … auxiliary and verb
[7] (translates the correction and accepts new L2 form): The next mistake has to do with vocabulary … I wrote [laughs] floor zero hmm … to refer to the first … to the ground floor … the firs … no … not the first floor no (2) the floor … that is … at entry level [laughs] of the building and it should be … ground

3. Comments coded as low DoP included instances where the participants
   a. simply read or repeated the target form;
   b. agreed (or disagreed) with the error and moved on to the next error without providing any explanation; or
   c. provided no signs or verbalisations of understanding the correction provided.




As seen in Excerpt [8], the participant recognises the error correction provided but simply reads the target form, repeats it, and moves on with no further processing involved.

TA data: Original
[8] (reads and repeats the target form): on the roof terrace … claro (3) vale puse on the roof y lo habéis cambiado por roof terrace … the roof terrace (3)

TA data: Translation
[8] (reads and repeats the target form): on the roof terrace … of course (3) ok I put on the roof and you have changed it for roof terrace … the roof terrace (3)

There were some verbalizations that did not fit into the three levels of DoP established. These instances, which corresponded to those cases in which the students ignored an error correction or showed no signs of processing the correct target form provided in the WCF, were eventually coded as null processing. As depicted in Table 1, the final DoP coding scheme consisted of four levels: null, low, medium, and high.

Table 1. Final coding of DoP in the metacognitive TA data

Null DoP
  Criteria: No cognitive effort and no time spent on processing the target form.
  Examples: Avoids the error correction or ignores it.

Low DoP
  Criteria: Low cognitive effort and minimal time spent on processing the target form.
  Examples: Reads the feedback. Repeats the target form. Spends very little time on the correction. Uses words such as yes, ok, hmm before moving on quickly.

Medium DoP
  Criteria: Target form is commented on but with little discussion/attempt to provide an explanation. Brief engagement with the target form.
  Examples: Translates into L1 (correctly/incorrectly) but quickly moves on. Recognises making the same error previously. Attempts at providing an explanation/discussion of the error, but very brief (correct/incorrect).

High DoP
  Criteria: High cognitive effort, accurate rule formulation, high engagement/time spent on each target item.
  Examples: Hypothesising about the target form. Translating the target from L2 to L1 correctly and providing an explanation for the error. High cognitive effort, as seen in time spent on processing. Correct rule formulation. Disagrees with the correction but provides a correct metalinguistic explanation.
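For readers who tally coded LREs computationally, the four-level scheme in Table 1 can be encoded as a simple lookup. The sketch below is purely illustrative (the level keys and summary strings are our own shorthand, not part of the authors' instrument):

```python
# Illustrative encoding of the four-level DoP scheme (Table 1).
# Keys and summaries are hypothetical shorthand for the published criteria.
DOP_LEVELS = {
    "null": "No cognitive effort; no time spent on processing the target form",
    "low": "Low cognitive effort; minimal time spent on processing the target form",
    "medium": "Target form commented on, but only brief engagement or explanation",
    "high": "High cognitive effort, accurate rule formulation, high engagement",
}

def tally(coded_lres):
    """Count how many LREs were assigned to each DoP level."""
    counts = {level: 0 for level in DOP_LEVELS}
    for level in coded_lres:
        counts[level] += 1
    return counts
```

A call such as `tally(["high", "low", "low", "null"])` would return one count per level, which is convenient when comparing the distribution of DoP levels across treatment groups.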

Analysis of the written languaging tables

The data set provided by the WL tables was of a different nature from the one collected through the metacognitive TA protocols. To begin with, there was no need to segment the former (the first step we took in the analysis of the latter), as the columns in the table readily provided information that could be equated with the LREs identified in the metacognitive TA data. Yet, when analysing the written languaging annotations, we realized that it was not possible to apply our initial categories of engagement with WCF established for the TA condition. This was so because, reiterating what was found by Manchón et al. (2020), it became evident that the WL tables were not providing data on the processing stage itself, but rather on the outcome of that stage, as seen in Excerpts [9] and [10]. Thus, rather than allowing us to gain insight into the levels of processing, the WL data mostly offered evidence of the quantity of metalinguistic information and, hence, the level of metalinguistic knowledge each participant was capable of providing/willing to provide. We therefore decided, as a starting point, to code the WL data according to levels of awareness (noticing, reporting, and understanding). To this end, we followed Cerezo et al.'s (2019) proposal of distinguishing between errors that had been noticed or unnoticed (as evidenced in the WL tables). Subsequently, the noticed errors (i.e., those about which there was some information in the WL tables) were coded according to the instances of correct information they included, especially in the "code" and "explanation" columns.

[9]
Error: Also
Correction: Change of position
Code: Word Order
Explanation: Word order error. "Also" should go before the verb

[10]
Error: All
Correction: All of
Code: GR
Explanation: I agree but it was a mistake that has to do with not paying attention.




Annotations were further divided according to how much correct information was provided in the written languaging table and, eventually, a total of five subcategories were established (as seen in Table 2). Level 1 corresponded to awareness at the level of noticing, as manifested by a participant providing only the error transcription. Levels 2 and 3 were defined as awareness at the level of reporting and corresponded to when participants provided the error transcription as well as either the correct error correction, the error category, or both. Levels 4 and 5, defined as awareness at the level of understanding, ranged from those instances in which, at a minimum, participants provided the error transcription and a correct metalinguistic explanation (Level 4) to those in which they successfully provided all relevant information in the table (Level 5).

Table 2. Categorization of the WL data

Awareness at the level of noticing
  Level 1: Error transcription alone

Awareness at the level of reporting
  Level 2: Error transcription plus either error correction or error category
  Level 3: Error transcription, error correction, and error category

Awareness at the level of understanding
  Level 4: Error transcription and metalinguistic explanation, OR error transcription, error correction, and metalinguistic explanation, OR error transcription, error category, and metalinguistic explanation
  Level 5: Error transcription, error correction, error category, and metalinguistic explanation
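Because the Table 2 scheme depends only on which table columns contain correct information, it lends itself to a mechanical check when coding large numbers of annotations. The following sketch is our own illustration (the boolean parameters are hypothetical shorthand for "this column contained correct information"), not part of the study's instruments:

```python
def awareness_level(transcription, correction, category, explanation):
    """Assign a WL awareness level (1-5) following the scheme in Table 2.

    Each argument is True if the corresponding WL-table column contained
    correct information, False otherwise. Returns None for unnoticed
    errors (no annotation at all).
    """
    if not transcription:
        return None  # unnoticed error
    if explanation:
        # Awareness at the level of understanding
        return 5 if (correction and category) else 4
    if correction and category:
        return 3  # reporting: both factual columns filled
    if correction or category:
        return 2  # reporting: one factual column filled
    return 1      # noticing: error transcription alone
```

For instance, an annotation containing only the error transcription and a correct metalinguistic explanation would be assigned Level 4, whereas one with all four pieces of correct information would be assigned Level 5.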

In short, the data output provided by each instrument reflected, as previously mentioned, two different phenomena, i.e., degree of awareness (WL) and levels of DoP (TA). As we explain next, given that it was not possible to equate the data provided by the two introspective measures to levels of DoP, we worked on finding a correspondence between the two that would allow us to explore the data conjointly.

Setting up a coding scheme for the global data

In order to establish such a correspondence and end up with a coding scheme that might be usable across instruments, we decided to rely on the criterion we had previously used with the metacognitive TA data, i.e., participants' engagement with the feedback provided. The application of this criterion to both data sets allowed us to (i) distinguish between participants who engaged with the EC and


participants who did not, and (ii) use this distinction as the basis on which the coding scheme shown in Table 3 was set up. It must be noted that the table does not provide new information. Rather, it reflects the way in which we were able to compare the information provided by both data collection procedures, despite them not providing the same DoP information. This allowed us to compare them in terms of metalinguistic information and engagement, two key components of our operational definition of depth of processing.

Table 3. Final coding for processing of WCF for TA and WL

(i) Participant does not engage with the EC
  TA: Ignores the error. Reads/repeats the error.
  WL: Leaves a blank space. Annotates error and correction with no further analysis.

(ii) Participant engages with the EC
  a. Disagrees with the EC
    TA: No further verbal explanation.
    WL: "I disagree" in the explanation section, but no further information provided.
    TA: Explains further.
    WL: Provides a written explanation.
  b. Agrees with the error correction
    TA: No further explanation/cannot provide an explanation.
    WL: Explanation section left blank, or evidence of not being able to provide an explanation.
    TA: Acceptance of new L2 form.
    WL: Evidence in the explanation section that a new L2 form has been accepted. "I didn't know that word/form"/"I didn't know how to say it".
    TA: Use of L1 translation.
    WL: Translation included in the explanation section.
    TA: Personal reasons (i.e., rushing, always making this mistake).
    WL: Personal reasons (i.e., rushing, always making this mistake).
    TA: Rule explanation/formulation.
    WL: Rule explanation/formulation included in the explanation section.
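When the two data sets are coded jointly, the correspondence in Table 3 amounts to a lookup from a shared engagement code to its instrument-specific manifestations. One possible encoding is sketched below; the short category keys are our own hypothetical labels, and only a few of the Table 3 rows are shown:

```python
# Illustrative cross-instrument lookup based on Table 3 (partial).
# The category keys are hypothetical; the values paraphrase the table.
SHARED_CODES = {
    "no_engagement": {
        "TA": ["ignores the error", "reads/repeats the error"],
        "WL": ["leaves a blank space",
               "annotates error and correction with no further analysis"],
    },
    "disagrees_no_explanation": {
        "TA": ["no further verbal explanation"],
        "WL": ["'I disagree' in the explanation section, nothing further"],
    },
    "disagrees_explains": {
        "TA": ["explains further"],
        "WL": ["provides a written explanation"],
    },
    "agrees_rule_formulation": {
        "TA": ["rule explanation/formulation"],
        "WL": ["rule explanation/formulation in the explanation section"],
    },
}

def manifestations(code, instrument):
    """Return the behaviours that realize a shared code in one instrument."""
    return SHARED_CODES[code][instrument]
```

Coding both data sets against the same keys is what makes cross-instrument comparison possible even though the raw evidence (verbalizations vs. written annotations) differs.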

Once we had this common coding scheme, we could approach our next task, which consisted of inspecting further, first, the kind of data on feedback processing provided by the three data collection procedures used in the study and, second, the participants' degree of involvement with the feedback provided on their writing as a function of feedback processing conditions, as discussed in the next two sections.




Nature of the data provided in different processing conditions and affordances for inspecting DoP

The data were first approached in order to determine which procedure(s) was/were more indicative of levels of DoP. Given that one of the main indicators of processing is cognitive effort, it was essential to examine the output in terms of how visible this dimension was in the data. As noted above, it became evident that, by looking at the WL tables in isolation, it was almost impossible to gauge how much cognitive effort participants were investing when processing their feedback. In the following example, taken from one participant in the WL + metacognitive TA treatment group, very little information was provided in the explanation section of the WL table. Yet, when viewed in conjunction with the participant's think-aloud protocol data, we were able to assess more accurately the effort the participant truly made when engaging with the WCF provided on his/her writing:

[11]

Error: Origin
Correction: Source
Code: V
Explanation: The second term is more accurate

TA data: Original
[11] Vale … as they have the origin of the fire … the source (2) porque origen sería más … vale … creo que origen (3) y source (2) al pensarlo en español creo que aqui en inglés hay una división que en español no lo tenemos [filling in the WL table] origin (2) source (6) porque no son … en español distinguimos (2) no … solo está origen y aquí está origen y source (2) esto también es vocabulario (3) [humming] (9) hmm … el segundo termino … the second term [filling in the WL table] is … more … accurate (2)

TA data: Translation
[11] Ok … as they have the origin of the fire … the source (2) because origin would be … ok … I think that origin (3) and source (2) thinking about it in Spanish I think that here in English there is a distinction that we do not have in Spanish [filling in the WL table] origin (2) source (6) because they are not … in Spanish we distinguish (2) no … there is only origin and here we have origin and source (2) this is also vocabulary (3) [humming] (9) hmm … the second term … the second term [filling in the WL table] is … more … accurate (2)


In this particular case, the participant reflected on the differences between their L1 (Spanish) and the English correction of the word "origin", and concluded that the term provided ("source") was in fact more accurate to describe the fire than the one used in the original writing (two terms in English, "origin" and "source", for one in Spanish, "origen"). Just by viewing the WL table, without any access to the TA transcription, it would have been impossible to know how the participant came to that conclusion. We thus realized that the metacognitive TA protocol was a crucial instrument for the analysis of cognitive effort, and found that this was a trend throughout the entire data set: without the TA transcriptions, it was extremely difficult to view the mental processes participants underwent. In the WL tables we were left solely with the outcome of the feedback processing, rather than with more tangible evidence of the processing stage itself, hence confirming the conclusions reached by Manchón et al. (2020) on their own written languaging data. Another common trend in the data was the abundance of instances in which categories within the written languaging table were left blank, particularly the explanation column. In these cases, without the availability of the corresponding think-aloud protocol, it was impossible to gauge the levels of DoP. In Excerpt [12], the participant, a member of the WL + TA group, did not include any explanation for the error or the correction in the written languaging table.

[12]
Error: Could
Correction: Can
Code: GR
Explanation: [left blank]

TA data: I have another mistake in relation to the tense verbs … I used the past … and I should have used the present tense as I was presenting a situation so it would be (2) present … present tense (2) as in the modal verb can instead of could [writing in the table] (33)

The corresponding TA protocol, however, shows that the participant did in fact process the error and even provided metalinguistic information regarding the use of tenses, something which was lost in the WL table. This provides further evidence both of the crucial role of the metacognitive TA protocols and of the limitations of the WL tables when used alone as an introspective instrument to shed light on the cognitive processes involved in the WCF processing stage. The transcription of think-aloud processing data provided crucial information not only on the levels of DoP but also on the participants' reactions to WCF processing, including, in some cases, their resistance to accepting the feedback provided. These insights can be added to the distinction that became evident, via the think-aloud transcriptions, between those cases in which a participant had made a genuine slip-of-the-pen mistake [13] and those that corresponded to a




new error, an issue with relevant implications for future research in this domain and for practice.

[13]
Error: All
Correction: All of
Code: GR
Explanation: I agree but it was a mistake that has to do with not paying attention.

TA transcription: Original
[13] El segundo error es … all of the emergency … se me ha olvidado la preposición [writes down the error and the correction (13)] sería un error de [writing] gra-ma-ti-ca … y … estoy de acuerdo pero fue un … despiste [Laughs and writes down the explanation in the table]

TA transcription: Translation
[13] The second error was … all of the emergency … I forgot the preposition [writes down the error and the correction (13)] it would be an error of [writing] gra-mmar … and … I agree but it was an … oversight [laughs and writes down the explanation in the table]

In this example, the participant explains that the error was an oversight and laughs upon processing the correction, thus signalling that this was a genuine mistake rather than a new error. In short, the metacognitive TA data proved essential in revealing insights not only into the participants’ cognitive effort but also into the types of errors they made and their attitudes toward the feedback received. However, when looking solely at the LREs from the metacognitive TA-only group, it became clear that the participants in this condition were the ones who spent the least time processing errors (5 minutes on average). In the following examples, we can see two transcriptions taken, respectively, from the metacognitive TA-only group [14] and the metacognitive TA + WL group [15]:


TA transcription: Original
[14] Hmm … on (16) as far away (14) ok … la gran mayoría de los errores son verbos (6) y preposiciones (5) y … no sé …
[15] Vale … another group equipment … vale … vocabulary vale equipment [copies down the error and the correction (4)] yes … group of firemen yo creo que es de vocabulario … este es de vocabulario porque lo he traducido del español … equipo de bomberos y equipment … vale … este es de vocabulario porque … [writes down the explanation] in English … in English, the collocation is a group of firemen (3) y porque equipment es equipamiento … no equipo (3) Qué desastre (3)

TA transcription: Translation
[14] Hmm … on (16) as far away (14) ok … the great majority of the mistakes are verbs (6) prepositions (5) and … I don't know …
[15] Ok … another group equipment … ok … vocabulary ok equipment [copies down the error and the correction (4)] yes … group of firemen I think it is vocabulary because I translated it from Spanish … "equipo de bomberos" and equipment … ok … this is vocabulary because … [writes down the explanation] in English … in English the collocation is a group of firemen (3) and because equipment is "equipamiento" … not team (3) What a disaster. (3)

As can be seen, not only were the LREs from the metacognitive TA-only group shorter, but the participants in this condition also tended to generalize and comment on clusters of errors, rather than focusing on one error at a time. In turn, the participants in the metacognitive TA + WL group [15] spent more time processing error corrections (37 minutes on average), focused on one error at a time, and showed signs of higher cognitive effort. We can therefore conclude from our data that it was the combination of instruments that promoted the most engagement with the feedback, compared to the use of either instrument in isolation. This combination appeared to be the most motivating for participants, with those in the TA + WL group being the most involved in terms of both time spent on task and overall cognitive effort. Consequently, we interpret our data as suggesting that the written guidance provided to participants through the WL tables in the simultaneous group proved essential in fostering deeper cognitive processing. This conclusion was further reinforced by the answers provided by the participants in the exit questionnaire, where, in addition to acknowledging that the metacognitive TA protocols were helpful in promoting their reflection on errors, they commented on the usefulness of the WL tables in helping them remember error corrections in subsequent text revisions, as seen in Excerpts [16] and [17].




[16] I think they (WL tables) are very useful because they have helped me to realize my mistakes when I am writing [ … ] I think they (corrections) will be retained in my mind longer.

[17] Because I wrote the errors down, I remembered them pretty well.

Relevant methodological conclusions and implications for future studies

Our analysis of the data collected in the study and reported in this chapter has provided empirical evidence of the affordances of diverse data collection instruments in terms of the light they may shed on the study of feedback processing. We would conclude that, for the population under study, written languaging tables elicited information about metalinguistic awareness but did not provide any insightful data on levels of DoP. Hence, they proved to be an invalid instrument for measuring DoP. Whether this was the case because of the form they took (i.e., the elements and instructions in the tables), or because of the written modality they involved, remains an open question that future research should address. The information on levels of DoP that was lost in the WL tables was, in contrast, evident in the metacognitive TAs, which, once again within the specific population studied, did provide significant data on the participants' cognitive processing of WCF. In this respect, our study supports previous findings (e.g., Caras, 2019; Park & Kim, 2019) and claims about the relevance of using TA in research into DoP (Leow & Manchón, 2021). Of relevance, the processing information captured in the TAs was only found to be truly complete when the participants also received guidance from the WL tables. Thus, a further conclusion from our analysis is that the two instruments in isolation may not suffice to engage L2 writers in full WCF processing, and they may even be too cognitively demanding for some students (see Chapter 6, this volume). As previously mentioned, the WL tables included many instances in which the explanation category was left blank, perhaps suggesting that students might have felt reluctant to provide a tangible representation of their L2 knowledge in writing, particularly when hesitations and doubts about some L2 target forms were the main content of their thoughts (Roca de Larios, personal communication).
Yet, despite the potential methodological relevance of these insights for future studies of WCF processing, we are fully cognizant that our reflections in this chapter are based on data collected from L2 writers with a background in Linguistics. Thus, exploring more diverse populations, particularly younger L2 learners and students with no experience in language or Linguistics, would enable a deeper


understanding of how effective WL and TAs are in engaging learners with WCF and in providing crucial insights into the DoP of WCF. It is also important to keep in mind the potential difficulties involved in implementing a combination of introspective measures in classroom-based studies. This is a relevant point, given recent calls to apply a curricular perspective to WCF research (e.g., Leow, 2020; Leow & Manchón, 2021; Manchón & Leow, 2020). Additionally, our study was focused on just one task, completed in time-compressed conditions, and based on one type of WCF. Therefore, future research should test the affordances of diverse research instruments for tapping into WCF processing across a variety of tasks, performed in diverse time-on-task conditions and with different types of feedback provided. Despite these limitations, we hope that the research reported in this chapter serves primarily to draw attention to the crucial relevance of both advancing disciplinary discussions on methodological considerations in the study of WCF processing and answering key empirical questions in this domain.

Funding

The global program of research on L2 writing mentioned in this chapter was financed by the Spanish Research Agency (Research Grant PID2019-104353GB-I00 and Pre-Doctoral Grant BES-2017-081873) and the Seneca Foundation (Research Grant 20832/PI/18).

References

Adrada-Rafael, S., & Filgueras-Gómez, M. (2019). Reactivity, language of think-aloud protocol, and depth of processing in the processing of reformulated feedback. In R. P. Leow (Ed.), The Routledge handbook of second language research in classroom learning (pp. 199–211). Routledge.
Bitchener, J. (2012). A reflection on the language learning potential of written CF. Journal of Second Language Writing, 21, 348–363.
Bitchener, J. (2016). To what extent has the published written CF research aided our understanding of its potential for L2 development? ITL – International Journal of Applied Linguistics, 167, 111–131.
Bitchener, J. (2017). Why some L2 learners fail to benefit from written corrective feedback. In H. Nassaji & E. Kartchava (Eds.), Corrective feedback in second language teaching and learning: Research, theory, applications, implications (pp. 129–140). Routledge.
Bitchener, J. (2019). The intersection between SLA and feedback research. In K. Hyland & F. Hyland (Eds.), Feedback in second language writing: Contexts and issues (pp. 85–105). Cambridge University Press.
Bitchener, J., & Storch, N. (2016). Written corrective feedback for L2 development. Multilingual Matters.
Caras, A. (2019). Written corrective feedback in compositions and the role of depth of processing. In R. P. Leow (Ed.), The Routledge handbook of second language research in classroom learning (pp. 188–200). Routledge.
Cerezo, L., Manchón, R. M., & Nicolás-Conesa, F. (2019). What do learners notice while processing written corrective feedback? A look at depth of processing via written languaging. In R. P. Leow (Ed.), The Routledge handbook of second language research in classroom learning (pp. 173–187). Routledge.
Gilabert, R. (2007). Effects of manipulating task complexity on self-repairs during L2 oral production. International Journal of Applied Linguistics, 45, 215–240.
Kim, H. R., & Bowles, M. (2019). How deeply do second language learners process written corrective feedback? Insights gained from think-alouds. TESOL Quarterly, 4, 913–938.
Leow, R. P. (2015). Explicit learning in the L2 classroom: A student-centered approach. Routledge.
Leow, R. P. (2020). L2 writing-to-learn: Theory, research, and a curricular approach. In R. M. Manchón (Ed.), Writing and language learning: Advancing research agendas (pp. 95–117). John Benjamins.
Leow, R., & Manchón, R. M. (2021). Directions for future research agendas on L2 writing and feedback as language learning from an ISLA perspective. In R. M. Manchón & C. Polio (Eds.), The Routledge handbook of second language acquisition and writing (pp. 299–311). Routledge.
Leow, R. P., & Suh, B-R. (2021). Theoretical perspectives on L2 writing, written corrective feedback, and language learning in individual writing conditions. In R. M. Manchón & C. Polio (Eds.), The Routledge handbook of second language acquisition and writing (pp. 9–21). Routledge.
Manchón, R. M. (2023). The psycholinguistics of L2 writing. In A. Godfroid & H. Hopp (Eds.), The handbook of second language acquisition and psycholinguistics (pp. 400–412). Routledge.
Manchón, R. M., & Leow, R. P. (2020). An ISLA perspective on L2 learning through writing: Implications for future research agendas. In R. M. Manchón (Ed.), Writing and language learning: Advancing research agendas (pp. 335–355). John Benjamins.
Manchón, R. M., Nicolás-Conesa, F., Cerezo, L., & Criado, R. (2020). L2 writers' processing of written corrective feedback: Depth of processing via written languaging. In W. Suzuki & N. Storch (Eds.), Languaging in language learning and teaching (pp. 241–265). John Benjamins.
Manchón, R. M., & Vasylets, O. (2019). Language learning through writing: Theoretical perspectives and empirical evidence. In J. W. Schwieter & A. Benati (Eds.), The Cambridge handbook of language learning (pp. 341–362). Cambridge University Press.
Park, E. S., & Kim, O. Y. (2019). Learners' engagement with indirect written corrective feedback: Depth of processing and self-correction. In R. P. Leow (Ed.), The Routledge handbook of second language research in classroom learning (pp. 212–226). Routledge.
Polio, C. (2012). The relevance of second language acquisition theory to the written error correction debate. Journal of Second Language Writing, 21, 375–389.
Roca de Larios, J., & Coyle, Y. (2021). Learners' engagement with written corrective feedback in individual and collaborative L2 writing conditions. In R. M. Manchón & C. Polio (Eds.), The Routledge handbook of second language acquisition and writing (pp. 81–93). Routledge.
Sánchez, A. J., Manchón, R. M., & Gilabert, R. (2020). The effects of task repetition across modalities and proficiency levels. In R. M. Manchón (Ed.), Writing and language learning: Advancing research agendas (pp. 121–144). John Benjamins.
Swain, M., & Lapkin, S. (1995). Problems in output and the cognitive processes they generate: A step toward second language learning. Applied Linguistics, 16, 371–391.




Appendix 1


Appendix 2


Afterword

Charlene Polio

Michigan State University

In this afterword, I summarize the strengths and challenges of different research methods used to study second language writing processes with reference to the various chapters in the volume. This is followed by a discussion of considerations researchers have to make when choosing a method. Next, I summarize themes that are apparent throughout the volume, and then focus on certain themes that I believe need more attention.

Introduction

In Understanding, Evaluating, and Conducting Second Language Writing Research (Polio & Friedman, 2017), a volume on approaches to and tools for studying second language (L2) writing, we included one chapter that focused on the study of the writing process. Specifically, the chapter on retrospective and introspective methods (i.e., primarily stimulated recalls and think alouds) explained how these techniques were used to better understand writers' processes and their processing of feedback. In the conclusion to the book, we briefly discussed emerging research methods, including eye-tracking and keystroke logging. Now, not many years later, research using these two tools has ballooned, as has research using the previously more established methods for gaining insights into writing processes. It is clearly now time to reflect on how previous research on the writing process and more recent research using newer technologies can complement each other. More specifically, we need to consider the affordances and shortcomings of the various methods or tools. But more importantly, we need to critically examine how data from different sources can be triangulated to avoid reductionism, better understand research on L2 writing processes and feedback processing, and improve L2 instruction.

Before discussing the themes addressed in this volume, as well as themes that I think can be more fully developed in future research agendas, I summarize the various tools and approaches discussed in the book in Table 1. It should be clear that there is no one-size-fits-all method and that no method can fully capture any participant's process, no matter the writing prompt, context, or person. As noted by Rijlaarsdam, Van Steendam, and van Weijen (Chapter 2), there is no ideal study. Furthermore, despite the prevalent use of the definite article with a singular count noun (i.e., the writing process), there is no one single process, and we are not seeking a universal truth as to how everyone writes (although I will use the singular and plural interchangeably). It should also be clear that many methods can and have been combined. The strengths and challenges of the different methods reported in Table 1 are varied, and because most are discussed at some point in the other chapters, I will not discuss each one individually. Rather, I first detail the various factors that need to be considered by researchers who study the writing process. This is followed by a summary of issues raised throughout the volume and a discussion of matters that I believe need further attention.

https://doi.org/10.1075/rmal.5.17pol
© 2023 John Benjamins Publishing Company

Table 1. Strengths and challenges of methods for studying the writing process

Method: Think aloud and oral languaging
Related chapters: Leow & Bowles; Roca de Larios; McBride & Manchón
Strengths:
– Ability to capture online reflection of what the writer is thinking.
– Can be used for both writing and feedback processing.
– Extensive research on reactivity and veridicality.
– Participants can speak in either their L1 or L2.
Challenges:
– Possible reactivity.
– Only certain types of information may be accessible.
– It is not clear if it is better to perform the think aloud in the L1 or L2 because both have shortcomings.
– Might need metalanguage and, hence, may not be appropriate for young writers.
– Training may bias participants if done on a related task.
– Cannot be used for collaborative writing tasks.
– Time-consuming to transcribe.
– Must generally be done in a lab setting.
– Needs clear coding schemes.
– Relies on self-reported data.

Method: Stimulated recall
Related chapters: Leow & Bowles; Guggenbichler et al.
Strengths:
– Can be conducted in either the L1 or L2 without affecting the writing process.
– The researcher can ask focused questions.
Challenges:
– Ideally needs to be done immediately after writing.
– Participants often report what they are thinking at the time of the recall and not at the time of writing.
– Time-consuming to transcribe.
– Must be done in a lab setting.
– Needs clear coding schemes.
– Relies on self-reported data.

Method: Eye-tracking
Related chapters: Johansson et al.; Guggenbichler et al.
Strengths:
– Possibly unobtrusive.
– Can access behaviors that are not reportable.
Challenges:
– Specialized equipment and training are needed.
– Documents only observable behavior and not thinking/higher-order processes.
– Must be done in a lab setting.
– Captures only one type of observable behavior.
– Used alone can be reductionist.

Method: Keystroke logging
Related chapters: Johansson et al.; Garcés et al.; Guggenbichler et al.
Strengths:
– Has screen capture software built in.
– Serves as an appropriate stimulus for stimulated recall.
– Provides comprehensive descriptions of what the writer is doing.
Challenges:
– Requires special (albeit free) software and some training.
– Generates massive amounts of cumbersome data.
– Cannot be used with handwritten tasks.
– Captures only one type of observable behavior.
– Used alone can be reductionist.

Method: Screen capture software/history function
Related chapters: Séror & Gentil; Pacheco & Smith; Shintani & Aubrey
Strengths:
– Serves as an appropriate stimulus for stimulated recall.
– Can capture behaviors beyond typing, such as accessing resources.
– Can capture more realistic types of composing, including those outside of the lab.
Challenges:
– Requires special (albeit free) software.
– Needs clear coding schemes.
– Captures only one type of observable behavior.

Method: Surveys
Related chapters: Hort & Vasylets
Strengths:
– Can include large numbers of participants.
– Can be conducted in the L1 or L2.
– The researcher can elicit specific information.
Challenges:
– Relies on self-reported data.
– Data are generally not collected immediately after a writing task and thus rely on writers' memories.
– Reliability and validity need to be assessed.

Method: Interviews
Related chapters: Hort & Vasylets; Pacheco & Smith
Strengths:
– The researcher can modify questions while collecting data.
– Can be conducted in the L1 or L2.
– Can consider writing holistically.
– The researcher can elicit specific information.
Challenges:
– Time-consuming.
– Relies on self-reported data.
– Greater chance for observer effects.
– Time-consuming to transcribe.
– Can be reactive in longitudinal studies.

Method: Process logs
Related chapters: Hort & Vasylets
Strengths:
– Can be conducted in the L1 or L2.
– Do not need to be transcribed.
– Can capture more realistic types of composing, including those outside of the lab.
Challenges:
– Relies on self-reported data.
– Can be reactive.

Method: Written verbalizations
Related chapters: Suzuki et al.; McBride & Manchón
Strengths:
– Can be conducted in the L1 or L2.
– Do not need to be transcribed.
Challenges:
– Generally limited to research on feedback.
– Relies on self-reported data.

Method: Collaborative dialogue
Related chapters: Pacheco & Smith; Coyle
Strengths:
– Real representations of the collaborative process, including choice of L1 or L2 use.
– Can be used as the stimulus for stimulated recall.
– Can be used with younger writers.
Challenges:
– Can be used only in collaborative writing settings.
– Generally done only in lab/classroom settings.
– Interaction will vary not only by individuals but also by group dynamics.
– Time-consuming to transcribe.
– It is not clear how the choice of language will affect results.

Method: Analysis of revisions to texts
Related chapters: Coyle; Shintani & Aubrey
Strengths:
– Real representation of what the writer is doing.
Challenges:
– Coding can be difficult.
– Provides information only on the final outcomes of the writing process and not on the writers' thoughts.

Factors affecting choice of method (and vice versa)

Ideally, the focus of and motivation for the research should be the primary factor determining which method or methods are used. However, this is not always the case, because sometimes the tool limits the scope of the research, the selected participants, the number of participants, and the writing conditions. For example, Roca de Larios (Chapter 10) wanted to discuss the challenges involved in elaborating a coding scheme intended to capture the role of genre knowledge in the composing process across writers' L1s and L2s. To do this, he revisited think-aloud data from two one-hour writing tasks completed in a lab. It is unlikely here that the focus of the research (genre knowledge) motivated the use of a timed task, as other studies of genre knowledge tend to use more realistic writing tasks (e.g., Kessler, 2021). Rather, the use of the method, think alouds, motivated the writing condition. Johansson, Johansson, and Wengelin (Chapter 9) say, with regard to eye-tracking, "The data output is often rich and time-consuming to interpret, and studies which limit their focus to well-defined (experimental) tasks will probably have an easier task" (p. 197), implying that the method affects the task. That said, the following factors can affect the choice of method.

Writing conditions and prompts

I would argue that the majority of studies on the writing process are conducted in laboratory settings with time limits. One reason for this is that researchers can limit writers' use of outside sources and assistance, although a more likely reason is the required access to the technological tools needed for data collection, such as computers and recording devices. For example, the growth of eye-tracking and keystroke logging research has only increased the proportion of lab-based studies. In these cases, the technology may actually influence the writing conditions and not the other way around. More specifically, if one wants to conduct a study using eye-tracking, the study needs to be done in a lab with some type of time limit and, generally, no use of outside sources. If researchers want to study how writers write in extended conditions, namely over time and while consulting other sources and collaborators, different methods need to be used, as in Pacheco and Smith (Chapter 13), who used screen capture software as well as observations of workshops and collaborative sessions.

Participant characteristics

A wide variety of participant characteristics, as well as sample size, can influence researchers' choices of data collection methods. These include age, language proficiency in the L2, literacy in the L1 and L2, education level and language awareness, and access to and comfort with technology and typing. As discussed by Andringa and Godfroid (2020), much research in Applied Linguistics has focused on western, educated, industrialized, rich, democratic (WEIRD) populations, and research on the writing process probably even more so. Given the tools that have been used to study the writing process, it makes sense that WEIRD populations have been more fully studied. For example, tools such as think alouds and stimulated recalls arguably require vocabulary for, and verbalization of, thought processes that younger and less proficient writers may not have access to. In addition, more educated and literate participants are more likely to be used to talking about writing. They are also more familiar with technology and typing. Children usually have less developed language and metacognitive awareness and are less proficient at typing, so using retrospective and introspective methods, as well as keystroke logging and screen capture software, may be more challenging. However, Garcés, Criado, and Manchón (Chapter 11) and Coyle (Chapter 14) were able to study children's processing of feedback using keystroke logging software and collaborative dialogue, respectively. Garcés et al. addressed the problem of children's typing skills, which might cause more or longer pausing, while Coyle was able to view children's processing of feedback through a regular classroom task. More educated participants might be able to talk about their writing while doing it or while reflecting back on the process. Nevertheless, their language proficiency, particularly in think alouds, may limit what they can say in the L2. Conducting a think aloud in multiple languages raises other possible concerns, particularly that it may not reflect how the participants actually use multiple languages and that it may increase reactivity.

Focus of the research

This volume covers research on both writing processes and feedback processing. This distinction alone will affect a researcher's choice of methods. I would also argue that how learners process and use feedback is somewhat easier to study than writing from scratch: processing and using feedback is a more defined task with clearer parameters. Any of the data collection methods shown in Table 1 can be used to study how learners process feedback, but just as with any studies of the writing process, no method can fully capture how learners process feedback. Regarding the writing process, certain research instruments lend themselves better to more cognitively oriented studies, while others suit more socially oriented studies. For example, eye-tracking studies, by their nature, can capture only a small part of what a writer is doing. The benefit, however, is that through eye-tracking, researchers can uncover cognitive processes that are not accessible to writers. Researchers interested in social factors that influence writers need to use methods that are more holistic. These issues are discussed in the next section, where I address the tension between quantitative and qualitative research as well as reductionism and authenticity.

One-shot or longitudinal research

Reactivity has long been a concern for those using think alouds to study the writing process (see Leow & Bowles, Chapter 5). Specifically, many studies have examined how, or whether, talking about what participants are thinking while they write changes the process of writing or the final written product. Of course, the reactivity of think alouds can be affected by participant factors, the language(s) of the think aloud, and the type of writing task. However, reactivity is also a major concern in the use of any method in longitudinal research. For example, retrospective methods such as stimulated recall sessions cannot affect the writing session under focus because it has passed, but they can affect how writers think about their writing. During stimulated recall sessions, writers may notice that they did not edit carefully, and this may then change their editing process in future writing tasks. This is addressed further in a later section, where I discuss how students can learn while being research participants and how we can use process research tools in the classroom, but anyone conducting longitudinal research needs to be aware of the effects of the research process on learning.


Summary of major themes addressed in the book

Several themes and subthemes appear throughout the chapters in the book, and I summarize four of them here. I also elaborate on themes that are not discussed in as much detail or that I think need more attention in future research.

Ecological validity, external validity, and authenticity

Much discussion is devoted to whether or not the data collection process is able to capture what we might call real-life writing. One issue is how the data collection process might change how one writes, namely, the reactivity issue (discussed further below), but related to this is whether or not the writing task itself is a representation of what we are trying to study, or what we might call authenticity. Much has been written about authenticity in language assessment, and there is not space here to review the various conceptualizations and facets of the term (but see Harsol, Zakarai, & Aryadoust, 2022, for a comprehensive overview). In its simplest terms, we can think of authenticity in terms of what Bachman (1991) called situational authenticity, namely, how well the features of the test (or research) task match the features of the actual task. The concept of authenticity is not directly addressed in the chapters in the book, at least in the sense in which it is used in language testing; rather, the representativeness of the task is discussed in the contexts of external and ecological validity. External validity refers to the extent to which the results of a study can be extended to other contexts or populations. So even if a data collection method does not change the process, the process associated with the writing tasks may not be, and almost certainly cannot be, extended to other writing tasks. Ecological validity is similar and refers to "whether an effect has been shown to operate in conditions that often occur for people in the population of interest" (Wegener & Blankenship, 2007, p. 275), which in the context of writing studies would be the classroom or real-life writing tasks.
Given that the majority of writing process studies use timed writing tasks, it is difficult to say how well study results extend to real-world tasks, unless that real-world task is a timed test (Guggenbichler, Eberharter, & Kremmel, Chapter 12). Because we can never fully sample writers' processes in a range of real-life conditions, Rijlaarsdam, Van Steendam, and van Weijen (Chapter 2) mention the option of using more than one writing task. Coyle, Nicolás-Conesa, and Cerezo (Chapter 3) note that most studies of written corrective feedback have used timed writing on topics proposed by the researcher, making them less authentic than assignments that might be part of a real language curriculum.


Finally, several chapters in the book discuss digital and multimodal writing, which is now the default in how most people write and, increasingly, in what they write. Thus, we need to explore methods that can tap into writers' dances among multiple screens and sources (see also Tan, 2023, for a recent study). For example, Séror and Gentil (Chapter 7) state that screen capture technologies provide "a window into processes that have frequently remained largely unobserved and invisible, opening the proverbial 'black box' of writing processes (Séror, 2013), especially when studying writing in natural, non-controlled, non-laboratory conditions" (p. 142).

Reductionism, triangulation, and the relationship between the parts and the whole

The majority of writing process research has been quantitative, which is likely because of its cognitive focus, but this is somewhat surprising given the messy and complex nature of writing. As a result, some of the research is reductionist in that it focuses on a very small aspect of writing, namely, the point of inscription. We thus sometimes lose sight of the whole picture, including what happens before the point of inscription. As Manchón (2021) said, "process-oriented research has traditionally prioritized the study of the act of writing itself" (p. 87). The same is generally true of revision studies: many studies of revision focus on changes in response to written corrective feedback and not on what happens when we step away from the text, think about it, and return to it later. Reductionism has been a concern throughout the social sciences, particularly among qualitative researchers, who often problematize the examination of specific phenomena in isolation. Some applied linguists use a complex systems framework to view language learning by examining individual parts (e.g., Larsen-Freeman & Cameron, 2008). I do not advocate this framework for studying the writing process. Instead, the trend toward triangulation, as seen throughout this volume, helps us broaden the description of what is happening. For example, through keystroke logging, we can identify pausing behavior, which in and of itself might not tell us much. When combined with stimulated recall, however, we can more thoroughly describe the process. Hort and Vasylets (Chapter 4) highlight the importance of triangulation when using interviews, questionnaires, and process logs, which all rely on learner self-report data. Leow and Bowles (Chapter 5) and Roca de Larios (Chapter 10) explain that verbal reports can be combined with a range of methods, and Johansson, Wengelin, and Johansson (Chapter 8) and Guggenbichler et al. (Chapter 12) discuss combining keystroke logging with other methods.


Related to reductionism is the tension between quantitative and qualitative research addressed throughout the volume. Séror and Gentil (Chapter 7) address this issue directly in their discussion of screen capture technologies (SCT): While SCT data does not always offer the same degree of quantitative precision as the automated breakdown and timing of composition events associated with keystroke logging, it does offer an opportunity to record students' writing processes in naturalistic environments since students can download and use screen capture software on their machines. This allows authentic and situated insights which are often closer to the everyday reality of students as they record themselves completing assignments on their computers, with access to their resources at a time and in a location of their choosing. Indeed, SCT offers valuable support for ethnographic and case study approaches to writing when combined with other forms of data.

We also should not lose sight of the body of ethnographic research, such as Lillis and Curry's (2010) work on how international scholars go about publishing, which illustrates the writing process at the macro level. Manchón (2021) highlights several studies that use ethnographic methods to study the process and that can serve as models for research with expanded foci, including Hort (2017), which used mobile process logs to capture a fuller picture of how students write.

Population variability and expansion to non-WEIRD populations

As mentioned earlier, external validity concerns the generalizability of findings to other contexts and populations. The field of second language learning and teaching is now acknowledging the importance of studying less traditional populations. Several chapters mention this issue in passing. For example, Leow and Bowles (Chapter 5) state that more research on think alouds and stimulated recall with different populations is necessary, and McBride and Manchón (Chapter 16) note that they are aware that their study was done with L2 writers with backgrounds in linguistics and languages. This is a particularly important point, given that those who study language have the vocabulary and concepts needed to talk about writing. Furthermore, Suzuki, Ishikawa, and Storch (Chapter 6) explain that written verbalizations may not be appropriate for students of limited proficiency (if the verbalization is completed in the L2). Garcés, Criado, and Manchón (Chapter 11) specifically contribute to the issue of working with younger learners by explaining how keystroke logging software analyses can be adapted. Their study of how learners engage with feedback is also significant because it was conducted in a classroom, which is arguably more important when working with young learners than with adults, as children might not understand the value of writing in a lab setting. Coyle (Chapter 14) also reports on a study of how 10–11-year-olds process feedback given through model texts. She illustrates how collaborative dialogue, despite some coding challenges, may be useful for working with children.

While there is now more discussion of the challenges of working with younger learners, research on adults with limited literacy is scarce. A special issue of the Journal of Second Language Writing (Pettitt, Gonzalves, Tarone, & Wall, 2021) recently addressed this population. Gonzalves (2021), in particular, was able to understand the challenges adult students faced in writing in any language for the first time by observing them in class and looking at their text production. She noted that Fiona, a 69-year-old Eritrean woman with no formal schooling, "knew to begin writing near the top of the page, starting from the left-hand side of the page and proceeding towards the right in a linear fashion, and then continuing on the line below" (p. 5), yet she had trouble with spacing between words and started writing with the notebook paper upside down. It becomes clear from Gonzalves's description that most of the tools covered in this book would not work well with adults with little formal schooling; as she explains, even obtaining consent is challenging. An additional understudied population is adults who write for nonacademic occupational purposes. While there are needs assessments of what such writers may need to do (e.g., Chan, 2019), few studies have attempted to follow non-student adults as they write workplace genres (but see Leijten et al., 2014, as an example of a case study). Such populations, whether well educated or not, most likely think about writing differently. Furthermore, the writing tasks would differ from those used in studies with language student populations.

Data reduction/presentation/coding

Despite the challenges in collecting data on the writing process, I would argue that the data collection process itself is actually easy compared to what is involved in coding and reducing the data for analysis and presentation. While we may be tempted to use keystroke logging and eye-tracking because the cool kids are using them, both generate huge amounts of data that may be difficult to interpret and present. Going into these methods without a clear sense of how to analyze the data can be a waste of time. As an example, think-aloud and stimulated recall data can be notoriously challenging to present in written form. Readers of the research should be able to see both what the writers have verbalized and what they have written. If the data are to be analyzed quantitatively, reliable coding is needed. Even dividing the transcripts into discrete units of analysis can be confusing, as the richness and complexity of the data may obscure the boundaries between categories (see, e.g., Roca de Larios, Chapter 10, on the distinction between listing and argumentative goals). Many researchers have addressed this issue and proposed coding schemes, notably López-Serrano, Roca de Larios, and Manchón (2019), who describe a complex system for coding language-related think-aloud episodes. In addition, Coyle (Chapter 14) describes a way to code children's engagement with feedback. Yet, as Coyle acknowledges, achieving intercoder reliability can be difficult, and it was not assessed in her study. Séror and Gentil (Chapter 7) explain some options for dividing screen capture videos into units of analysis and provide examples of studies in which intercoder reliability is addressed. Johansson et al. (Chapter 8) describe ways of dealing with the massive amount of data generated by keystroke logging, where even defining a pause is not straightforward. Furthermore, presenting the data to those unfamiliar with such data is not an easy task. Each method has its challenges, and there is no one correct way to present and code the data. In sum, researchers need to present data in ways that those who do not use the various tools can understand. They also need to be clear about their coding decisions and assess the reliability of their codes.
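The pause-definition problem can be made concrete with a small sketch. The keystroke times and threshold values below are invented for illustration and are not tied to any particular keystroke-logging tool; the point is simply that the number of "pauses" found in the same log shifts with the threshold a researcher chooses.

```python
# Hypothetical illustration: how a pause threshold changes what counts as a pause.
# The timestamps are invented; real keystroke logs contain thousands of events.

def count_pauses(timestamps_ms, threshold_ms):
    """Count inter-keystroke intervals at or above a pause threshold."""
    intervals = (b - a for a, b in zip(timestamps_ms, timestamps_ms[1:]))
    return sum(1 for gap in intervals if gap >= threshold_ms)

# Invented keystroke times in milliseconds: bursts of typing separated by gaps.
log = [0, 180, 350, 2350, 2500, 2650, 7650, 7800]

for threshold in (150, 2000, 5000):
    print(f"threshold {threshold} ms -> {count_pauses(log, threshold)} pauses")
```

With a 150 ms threshold, every interval in this toy log counts as a pause (7); at 2000 ms only two gaps qualify; at 5000 ms only one. The substantive interpretation of "pausing behavior" changes accordingly, which is one reason such analytic decisions need to be reported explicitly.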

Themes needing more attention

In this section, I address the theoretical underpinnings of writing process research (or lack thereof), the learning potential of writing process research methods, and the influence of writing process studies on teaching (or lack thereof). Because I believe that each of these areas needs more attention, I end each section with an example of a specific research task.

Theoretical bases for studying the writing process

Leow and Bowles (Chapter 5) are indeed correct when they say that think alouds are atheoretical. In fact, most of the tools addressed throughout this volume are not tied to any one theory of second language writing or second language acquisition. With regard to SLA-oriented writing research, Polio and Kessler (2019), in their discussion of SLA theory and writing, described studies conducted within usage-based frameworks, sociocultural theory, and skill acquisition theory, none of which, not surprisingly, are addressed in this volume. Polio (2012) argued that these theories could also be invoked in discussions of corrective feedback. Manchón and Roca de Larios (Chapter 1) frame studies in terms of "cognitive, socio-cognitive, sociocultural, sociomaterial, social semiotic and translingual literacy approaches" (p. 25), but I would argue that we are far from a predictive theory that we can use. The writing process, of course, draws on both unconscious cognitive processes and conscious decisions related to audience and context. We may turn to theories of genre to address the latter; I return to this point below.

Rijlaarsdam et al. (Chapter 2) state, "As with all data, these scores are just data, they do not tell a story. It is the researcher who has to construct the story, guided by theory or hypotheses" (p. 55). They further state, "[w]e hope that writing researchers will distill from this chapter those elements that they may take into account in their own context. In that way, we hope to have made a contribution to theory-building on the construct of (L2) writing process research and thus help move the field forward" (p. 55). In fact, they do not directly address how descriptions of the writing process build theory. They mention that several studies draw on Flower and Hayes's (1981) model, which some may argue is not a theory but a description of what goes into writing. Yet that model, even in the revised Hayes (1996) version, which took into account some of the more social aspects of writing, such as audience, is still heavily cognitive. The model was also modified to account for the use of digital resources (Leijten, Van Waes, Schriver, & Hayes, 2014) and includes more reference to social context because it was based on a case study of workplace writing. Theories (or schools) of genre, in contrast, are generally social and view writing either as conforming to social conventions (as in the ESP approach, among others) or as choices writers make as they produce language to convey certain meanings to certain audiences (as in systemic functional approaches). No one really sees writing as a purely cognitive process, and while some chapters try to include more contextual aspects of writing, not many get at genre knowledge or audience awareness. We write for audiences that hold genre expectations, but these issues are not addressed. Roca de Larios (Chapter 10) is the only author who brings in Tardy et al.'s (2020) model of genre knowledge and awareness. Ideally, we should be able to describe how writers generate text while drawing on both their linguistic knowledge and genre knowledge. As an example of a research task, we might ask writers to compose a text based on a specific set of task and audience parameters (such as an email to a professor). Through a modified stimulated recall, we could then ask questions that fall at the intersection of language and genre regarding their linguistic choices.

The learning potential of research methods Rijlaarsdam et al. (Chapter 2) characterize the methods used to study the writing process. In the column called impact on the learner, they characterize those meth-

Afterword

ods as being intrusive or non-intrusive. Another way to think about the impact is to consider the learning potential of the method. Awareness of how one writes has been considered a key component in what some have called process writing. In an early characterization of what a process approach is, Susser (1994) explained that making students aware of the writing process was inherently a good thing. The ultimate impact of how much teachers know about learners’ processes, as discussed in the next section, is unclear, but arguably, there seems to be an assumption that the more students know about how they write, the better they will write. With exception of maybe eye-tracking, most of the research methods addressed throughout this book have the potential to be used as a teaching tool. As one example, Lim, Tigchelaar, and Polio (2022) showed that simply having students watch a screen capture video of their writing (collected via Inputlog) resulted in the students noticing and correcting errors, without any prompting from the researchers. Furthermore, Brooks and Swain (2009) specifically discussed how the use of researcher-guided (thus modified) stimulated recall sessions resulted in a focus on language and improvements in written texts. Both of these examples relate nicely to sociocultural approaches to learning whereby learners can do more with help before moving on to completing a task independently. Sometimes the only intervention needed is for students to observe their writing process, and sometimes some researcher or teacher prompting is needed for them to address problems (e.g., as shown in Aljaafreh & Lantolf, 1994). McBride and Manchón’s (Chapter 16) research questions directly address this topic. 
They ask, “To what extent is engagement with feedback influenced by the way in which participants are asked to reflect on the WCF provided on their writing?” They argue that the students in the written languaging plus think-aloud group were “the most involved in terms of both time spent on task and overall cognitive effort” (p. 357). This is an important observation in that it shows how many methods slow down the writing process so that learners can focus on language (or other aspects of their writing). Their study, along with Lim et al. (2022) and Brooks and Swain (2009), suggests that research tools can act as an intervention for subsequent revisions. Suzuki, Ishikawa, and Storch (Chapter 6) say, “We argue that written verbalizations may also be valid data collection procedures, but that, under some circumstances, written verbalizations (and oral verbalizations, see Swain, 2006a, 2006b) are part of the L2 learning processes, not just a medium of data collection” (p. 123). In other words, reactivity may be advantageous from a learning perspective. None of the chapters in the book provides detailed explanations of how to implement these methods in the classroom, but it would not be difficult to conduct longitudinal classroom-based studies. For example, one group of students could watch videos of their own process and talk about what they were




doing while a second group simply reread their writing. We might predict that the stimulated recall group would pay more attention to the types of language errors they consistently make, with improvement in subsequent writing. A similar study could be conducted with think-alouds to examine the learning potential of such an activity.

How understanding the writing process has or has not improved teaching

Johansson et al. (Chapter 8) discuss how keystroke logging has been used to study writers who have participated in interventions, and McBride and Manchón (Chapter 16) examine different ways that learners might engage with feedback. Nevertheless, many studies of writing processes are descriptive and do not provide clear implications for teaching. It may be controversial to say that all studies need to have teaching implications, but my stance is that our goal should be to improve teaching and learning. Shintani and Aubrey (Chapter 15) attempt to consider instructional implications by, as they say, maintaining a balance between internal and ecological validity. Nevertheless, actually implementing their procedure in most classrooms would be challenging. Several of the chapters in the book address how the various tools can differently affect the writing of different individuals. For example, Suzuki et al. (Chapter 6) mention that working memory can affect written verbalizations, and Johansson et al. (Chapter 8) discuss the effects of typing and technological skills on the output from keystroke logging software. However, individual difference research (such as research focusing on working memory) does not translate well to the classroom. For example, we know that an expanding list of individual differences, such as motivation, aptitude, grit, and anxiety, affects learning. But, ultimately, teachers face varying populations, making it impossible to differentiate instruction for those with, for example, different degrees of aptitude. Even studies of engagement with written corrective feedback do not have clear links to acquisition. It is likely that learners need to be engaged with feedback, but the long-term effects of this engagement are not clear and have not been thoroughly investigated.
Thus, examinations of the writing and revision process need to be combined with longitudinal studies of feedback. For example, consider a widely cited feedback study such as Hartshorn et al. (2010). After implementing an intensive cycle of what they called dynamic feedback, they showed that writers’ accuracy improved. To supplement their study, they could have periodically collected think-aloud data during the revision process to help us better understand what about dynamic feedback led to improvement.


Conclusion

As noted throughout this volume and in this chapter, there is a tension between the writing process at the macro versus micro levels, qualitative versus quantitative research, natural versus laboratory settings, and socialization- versus SLA-oriented research. This tension is beneficial in that it means that the scope of research on the writing process has expanded; it is problematic in that what constitutes research on what we call the writing process has become difficult to delineate, although it is not my role to limit what we should be studying. Nevertheless, given the complexity and variability of how second language writers write, I think we need to pull back on descriptive research that may or may not add to models of the writing process. Rather, we need to focus more on authentic writing tasks, classroom settings, and diverse populations, as many chapters in this volume have done, as well as study how the writing process (and its outcomes) is affected by teaching interventions. Although I have ended on a somewhat negative note about the descriptive nature of the research and the lack of teaching implications, progress is obvious with regard to data triangulation, more authentic prompts, more rigorous coding methods, and studies of underrepresented populations.

References

Aljaafreh, A., & Lantolf, J. P. (1994). Negative feedback as regulation and second language learning in the zone of proximal development. The Modern Language Journal, 78(4), 465–483.
Andringa, S., & Godfroid, A. (2020). Sampling bias and the problem of generalizability in applied linguistics. Annual Review of Applied Linguistics, 40, 134–142.
Bachman, L. F. (1991). What does language testing have to offer? TESOL Quarterly, 25(4), 671–704.
Brooks, L., & Swain, M. (2009). Languaging in a collaborative setting. In A. Mackey & C. Polio (Eds.), Multiple perspectives on interaction: Second language research in honor of Susan M. Gass (pp. 58–89). Routledge.
Chan, C. S. (2019). Long-term workplace communication needs of business professionals: Stories from Hong Kong senior executives and their implications for ESP and higher education. English for Specific Purposes, 56, 68–83.
Flower, L., & Hayes, J. R. (1981). A cognitive process theory of writing. College Composition and Communication, 32(4), 365–387.
Gonzalves, L. (2021). Development of copying skills in L2 adult English learners with emergent print literacy. Journal of Second Language Writing, 51.
Hartshorn, K. J., Evans, N. W., Merrill, P. F., Sudweeks, R. R., Strong-Krause, D., & Anderson, N. J. (2010). Effects of dynamic corrective feedback on ESL writing accuracy. TESOL Quarterly, 44(1), 84–109.




Hasrol, S. B., Zakaria, A., & Aryadoust, V. (2022). A systematic review of authenticity in second language assessment. Research Methods in Applied Linguistics, 1(3), 1000023.
Hayes, J. R. (1996). A new framework for understanding cognition and affect in writing. In C. M. Levy & S. Ransdell (Eds.), The science of writing: Theories, methods, individual differences, and applications (pp. 1–27). Lawrence Erlbaum Associates.
Hort, S. (2017). Exploring the use of mobile technologies and process logs in writing research. International Journal of Qualitative Methods, 16(1).
Kessler, M. (2021). The longitudinal development of second language writers’ metacognitive genre awareness. Journal of Second Language Writing, 53.
Larsen-Freeman, D., & Cameron, L. (2008). Research methodology on language development from a complex systems perspective. The Modern Language Journal, 92(2), 200–213.
Leijten, M., Van Waes, L., Schriver, K., & Hayes, J. R. (2014). Writing in the workplace: Constructing documents using multiple digital sources. Journal of Writing Research, 5(3), 285–337.
Lillis, T., & Curry, M. J. (2010). Academic writing in a global context: The politics and practices of publishing in English. Routledge.
Lim, J., Tigchelaar, M., & Polio, C. (2022). Understanding text-based studies of linguistic development through goals for academic writing. Language Awareness, 31(1), 117–136.
López-Serrano, S., de Larios, J. R., & Manchón, R. M. (2019). Language reflection fostered by individual L2 writing tasks: Developing a theoretically motivated and empirically based coding system. Studies in Second Language Acquisition, 41(3), 503–527.
Manchón, R. M. (2021). The contribution of ethnographically-oriented approaches to the study of L2 writing and text production processes. In I. Guillén-Galve & A. Bocanegra-Valle (Eds.), Ethnographies of academic writing research: Theory, methods, and interpretation (pp. 83–104). John Benjamins.
Pettitt, N., Gonzalves, L., Tarone, E., & Wall, T. (2021). Adult L2 writers with emergent literacy: Writing development and pedagogical considerations. Journal of Second Language Writing, 51.
Polio, C. (2012). The relevance of second language acquisition theory to the written error correction debate. Journal of Second Language Writing, 21(4), 375–389.
Polio, C., & Friedman, D. A. (2017). Understanding, evaluating, and conducting second language writing research. Routledge.
Polio, C., & Kessler, M. (2019). Teaching L2 writing: Connecting SLA theory, research, and pedagogy. In N. Polat, T. Gregersen, & P. MacIntyre (Eds.), Driven pedagogy: Implications of L2 theory and research for the teaching of language skills (pp. 76–96). Routledge.
Susser, B. (1994). Process approaches in ESL/EFL writing instruction. Journal of Second Language Writing, 3(1), 31–47.
Swain, M. (2006a). Verbal protocols: What does it mean for research to use speaking as a data collection tool? In M. Chaloub-Deville, M. Chapelle, & P. Duff (Eds.), Inference and generalizability in applied linguistics: Multiple research perspectives (pp. 97–113). John Benjamins.
Swain, M. (2006b). Languaging, agency, and collaboration in advanced second language proficiency. In H. Byrnes (Ed.), Advanced language learning: The contribution of Halliday and Vygotsky (pp. 95–108). Continuum.


Tan, X. (2023). Stories behind the scenes: L2 students’ cognitive processes of multimodal composing and traditional writing. Journal of Second Language Writing, 59.
Tardy, C. M., Sommer-Farias, B., & Gevers, J. (2020). Teaching and researching genre knowledge: Toward an enhanced theoretical framework. Written Communication, 37(3), 287–321.
Wegener, D., & Blankenship, K. (2007). Ecological validity. In R. F. Baumeister & K. D. Vohs (Eds.), Encyclopedia of social psychology (Vol. 1, pp. 275–276). Sage.


Index

A Action research 273 Activity Theory 9, 13, 31, 101, 149, 152, 272 Applied Instructed Second Language Acquisition (ISLA) 63, 75, 76, 108 Appraisal 218 Aptitude 133, 204, 378 Argumentative essays 88, 110, 168, 169 Argumentative genre 205, 209, 214 Argumentative goals 22, 202, 205, 206, 211, 375 Argumentative moves 22, 205–211, 219 Argumentative strategies 216, 217 Argumentative tasks 38, 63, 88, 110, 204 Argumentative texts 22, 63, 169, 202, 206 Audio-logs 90 Automated writing evaluation (AWE) 64. see also Grammarly Awareness 61, 62, 73, 76, 77, 106, 116, 123, 156, 238, 261, 369, 376, 377 digital awareness 146 levels of 19, 20, 24, 66–71, 73, 115, 116, 300, 340, 349, 351, 352 metalinguistic awareness 60, 61, 292, 358 metamodal awareness 273 AZERTY keyboard 172 B Blogs 64 Blueberry Flash Recorder 148 Burst 36, 47–50, 149, 168, 170, 225, 230 Formulation burst 48 P-burst 38, 42, 47–49, 51, 169, 170, 256

R-burst 42, 47, 49, 51, 170 C Camtasia 148, 276, 279, 281, 284, 286 CAQDAS 150–152 Case study designs/research 62, 64, 91, 93, 144, 179, 183, 273, 373 Causal model of writing processes 34, 36, 38–45, 51, 52, 58 Classroom observation 65, 72, 285 Clausal transitions 309–310 Clausal units 299, 309, 310, 312 Clause 299–301, 305–310, 327 Pre-clause 299, 309–310 Proto-clause 299, 307–310 Cognitive validity (of writing tests) 22, 249, 250, 265 Cognitive (writing) processes/ processing 60, 74, 87, 91, 105, 123, 131, 135, 143, 152, 156, 185, 203, 210, 224, 229, 317–319 Cognitive (writing) strategies 68, 69, 71, 72, 74, 87, 95, 151 Collaborative dialogue 23, 70, 292, 293, 295, 312, 340, 367, 369, 374 collaborative dialogue protocols 23, 293, 295, 312 Collaborative talk 107, 295, 300 Common European Framework of Reference for Languages (CEFR) 90, 130, 251, 341 Composing Dynamics/temporal development of 40–42, 52 Space and visual elements while composing 21, 145, 152 Composing actions/moves 19, 40–43, 45–48, 50, 51, 53 Composing processes. see Writing Processes

Composing strategies. see Writing Strategies Comprehended input 300, 308 Computer-mediated communication (CMC) environments 315 Computer-mediated communication (CMC) feedback 316–317 Computer-mediated communication (CMC) technology 316 Concurrent data elicitation methods & procedures 19, 66, 60–71, 104, 109, 116. see also Think-aloud protocols, Noticing tables, Written languaging tables Construct validity. see Validity framework Copy task function (of Inputlog) 231, 263 Correlational designs/studies 16, 62, 94 CyWrite 163, 186, 187 D Depth of processing (DoP) 14, 24, 61, 66, 68, 69, 70, 71, 73, 74, 76, 77, 106, 112, 115, 116, 132, 312, 338–340, 343–359, Descriptive studies 16, 77, 194, 378 Diaries 20, 123, 125 Digital diaries 92 Learning diaries 91 Time-use diaries 91 Digital composing/writing 10, 11, 16–18, 21, 27, 60, 76, 148, 161, 167, 168, 183, 187, 195, 196, 231, 235, 241, 260, 270, 272, 275, 279, 316 Digital composing/writing processes 187, 195, 196, 260 Digital feedback 19, 23, 64, 76


Research Methods in the Study of L2 Writing Processes

Digital genres 64. see also Blogs, Digital stories, Tweets Digital L2 literacies 76 Digital multimodal composing 76, 269, 270, 272, 273, 277, 280, 282 Digital multimodal composing processes 272 Digital screen capture 67, 68, 70 Digital stories 64 Digital tools (to capture/record writing processes). see also Eye-tracking, Keystroke logging, Inputlog, Screen capture Digital tools (for writing) 141, 161, 270, 272, 286 Draft revisions 257 Dual writing process model 43 E Ecological validity 19, 97, 147, 149, 194, 195, 233, 315, 331, 371, 378 Elaboration 12, 14, 61, 73, 86, 115, 132, 202, 211, 218, 243 ELAN 151–153 Electronic writing. see Digital writing Emergent bilinguals (EBs) 23, 269, 270, 272, 274, 277, 286, 287 End revisions 170, 257 Engagement with feedback 16, 19, 23, 62, 64, 69, 73, 74, 76, 77, 292, 294, 295, 340, 346, 347, 350–353, 357, 378 Error correction 111, 132, 183, 294, 311, 346, 348, 352. see also Written corrective feedback Ethnographic research and methods 13, 144, 272, 273, 278, 295 Event detection algorithm 186, 193, 195 European Writing Survey (EUWRIT project) 85 Experimental designs/research 61, 62, 109, 112, 162, 177, 183, 187, 195, 196, 330 Explicit planning episodes/ processes 257, 258

Exploratory designs/research 63, 94, 183, 187, 311, 317 Ex post facto designs 196 External validity 371–373. see also Validity framework Eye & Pen 184, 186, 194 EyeLink 185 Eye–mind hypothesis 184 Eye-tracking 65, 68, 70, 71, 115, 116, 183–196, 219, 234, 247, 250, 252, 366, 368–370, 374. see also EyeWrite, Event detection algorithm, Gaze, Fixations, Saccades, Smooth pursuit Eye tracking glasses 192 EyeWrite 163, 184, 186, 187 F FAU-word 162 Features noticed 68, 73, 303 Feedback see Engagement with feedback, Feedback and language learning, Feedback literacy, Feedback processing, Feedback processing framework, Feedback processing studies, Multimodal feedback, Written corrective feedback, Written corrective feedback timing, Written corrective feedback types Feedback and language learning 60, 62–64, 69, 71, 74, 77, 116, 292–295, 304–313, 316, 320, 330–332, 338 Feedback literacy 74, 76, 77 Feedback processing 10, 14, 16, 60, 70, 72, 74, 76, 77, 112, 131–133, 137, 293, 294, 297, 306, 310, 313, 337–340, 353, 355, 358, 359, 365. see also Depth of processing, Engagement with feedback Feedback Processing Framework 339 Feedback processing model 339 Feedback processing studies 19, 60–65, 67, 68 Interventionists 19, 60–65 Non-interventionists 19, 60–64 Feedback scope 63, 319, 320

Feedback types 63, 111, 133, 319, 320. see also Models, Reformulations Fixations 67, 185, 187, 193–195 Fluency 36–38, 46, 48, 53, 106, 109, 170, 171, 189, 190, 230, 238, 241, 257, 259, 299 Formulation (as writing process) 11, 22, 36, 46, 47, 30–53, 205, 259, 260, 261, 264, 296. see also Transcription G Gaze 183–187, 189–197 GenoGraphiX-Log 163 Genre 22, 54, 63, 202–205, 209, 214, 218, 368, 374, 376. see also Argumentative Global planning 233, 240 Goals 20, 22, 36, 39, 40, 44–46, 51, 75–77, 125, 132, 152, 202–206, 209–219, 249, 282, 375. see also Argumentative goals Grammarly 64 I IKI (Interkeystroke interval) 230, 236 Individual differences 11, 76, 90, 115, 133, 146, 168, 169, 260, 338, 378. see also Learner factors/ variables Input variables/factors 39, 233 Inputlog 37, 163, 165, 166, 169–175, 177, 184, 194, 225, 226, 229–243, 249, 251, 253, 254, 257 Inputlog analyses General analysis 165, 225 Summary analysis 225 Pause analysis 166, 225, 236 Revision analysis 225–226 Source analysis 233 Instructed Second Language Acquisition (ISLA)-applied 63, 75 Interkeystroke interval (IKI) 230 Internal validity 116, 331. see also Validity framework Interviews 89, 91, 94, 96–99, 367 Open-ended interviews 151 Retrospective interviews 89, 146, 162, 179, 234


Retrospective design interviews 269, 272, 275, 277, 280–282, 286 Screenside-shadowing interviews 273 Semi-structured interviews 89, 142, 251 Stimulated recall interviews 142, 251, 253 Structured interviews 89 Unstructured interviews 89 Inventory of Processes of College Composition (IPIC) Questionnaire 86 J Journals 20, 90, 91, 94, 125, 129, 132, 286 K Keyboard gazers 189, 194 Keystroke logging 37, 50, 51, 67, 68, 70, 161–163, 165, 167–179, 183, 184, 189, 194, 224, 225, 233, 238, 242, 247, 248, 249, 250, 256–259, 265, 366, 369, 372, 373–375, 378 L L1 use in L2 writing 36, 38, 48, 92, 156, 298–301, 309, 312, 353, 355, in verbalizations/reporting 106, 112, 127, 130–132, 135, 138, 342, 346, 349, 350, 365, 367 Language-related episodes 73, 74, 345, Language-related problem spaces 74 Language testing (and writing processes) 247–250, 255, 264, Languaging 65–70, 72, 123, 125, 136, 337–346, 345, 351, 352, 355, 358, 365, 377 Learner factors/variables 38, 39, 133, 312, 331. see also Aptitude, Cognitive styles, Motivation, Learner strategies, Working memory Learner strategies. see Strategies Learning diaries. see Diaries

Lexical searches 205, 212, 213, 217, 296, 298, 307 Likert scale 85, 88 Linguaskill writing test 250, 251 Literacy. see Computer Literacy, Feedback literacy, Literacy approaches, Literacy development, Literacy events, Literacy practices, Multimodal literacy, Online literacies, Translingual literacy Literacy approaches 25 Literacy development 9, 13 Literacy events 146, 147 Literacy practices 7, 13 Logs Audio-logs 90 Process logs 84, 90, 91–94, 97, 99, 146, 367, 372, 373 Scripted logs 144 Writing logs 142, 172 M Metacognitive processes 36, 40, 67, 69, 72, 143. see also Writing processes Metacognitive think-aloud protocols 24, 69, 104, 110, 337, 338, 340, 341, 342, 344–346, 350–352, 355–358. see also Think-aloud protocols Metalinguistic awareness. see Awareness Metalinguistic analysis/ reasoning/reflection 69, 74, 131, 294, 300, 301 Metalinguistic explanation/ information (of errors) 72, 115, 132, 311, 342, 348, 349, 351–353, 355 Metalinguistic feedback 63 Metalinguistic knowledge 65, 351 Mixed methods designs/ approaches 61, 64, 94, 152, 191, 249, 250 Mobile head-mounted systems 192 Mobile technologies 20, 91, 92, 99 Modal/multimodal affordances 23, 269, 270, 272–275, 282, 287 Modalities (as resources for meaning making) 271–273, 276, 280–283, 286, 287

Models/Model texts 22, 23, 63, 70, 124, 131, 227, 228, 292–295, 300–312, 374 Monitoring processes 12, 37, 49, 90, 262, 318, 319 Morae 148, 151 Morphological search 74, 296, 298 Motivation (as a learner factor/ variable) 38, 62, 86, 94, 115, 131, 162, 204, 312, 378 Multilingual resources/strategies (while composing) 156, 262, 269, 272 Multimodal codemeshing 273 Multimodal composing. see Digital multimodal composing Multimodal compositions 23, 64, 76, 269–277, 280, 283, 286 Multimodal concordance charts 271 Multimodal feedback 19, 76, 143 Multimodal literacy processes 146 Multimodal lived experiences 273 Multimodal longitudinal journaling 91 Multimodal strategies 93 Multimodal timescapes 23, 269, 272, 276, 277, 282–286 Multimodal workshops 275, 276, 286, 287 N Non-concurrent data elicitation procedures 19, 66, 71–73. see also Stimulated recalls Non-metacognitive think-aloud protocols 67, 69, 72, 110 Note-taking (as data collection instrument) 65, 67, 68, 70, 257, 258, 293 Noticing 60–62, 66, 70, 73, 77, 92, 113, 116, 123, 153, 228, 238, 292–296, 300, 302–304, 306, 309, 310, 312, 318, 351, 352. see also Noticing strategies, Noticing tables Noticing strategies 300–301, 307, 312 Noticing tables 67–68 NVivo 151–152




O Online behaviours (as processes) 7, 11, 17, 25 Online editing platforms 316 Online literacies 16 Online planning 204, 210, 211, 212, 214, 217 Online resources (use of while writing) 12, 21, 145–146 Online revisions 170 Open Broadcaster Software 148 Oral verbalizations. see Stimulated recalls, Think-aloud protocols P P-burst. see Burst Pauses. see also Burst, Inputlog analyses Pause criteria 163, 174, 175, 176 Pause distribution 144, 168 Pause duration/length 22, 50, 167, 168, 176, 177, 179, 196, 225, 228, 229, 231, 240–242, 248, 255, 256, 265 Pause frequency 225, 228, 231, 236, 238, 256 Pause location 22, 167, 174, 175, 176, 196, 228, 229, 231–234, 236, 238, 240, 242 Pause threshold 22, 225, 229, 240, 242, 255, 256 Pause time 163–165, 167, 168, 177, 197, 236, 241 Pause type 251 Pause Logging File (of Inputlog) 231, 232, 236, 240, 241 Pausing/pausological behavior 168, 169, 224–230, 235, 236, 238–243, 256, 372 Parameters (of the construct of writing processes) 19, 34, 39, 45, 53, 55 Planning (as writing process) 36, 37, 115, 169, 318, 319. see also Explicit planning episodes/ processes, Global planning, Online planning, Planning episodes, Planning strategies, Pre-writing planning, Rhetorical planning, Synthetic content planning

Planning episodes. see Explicit planning episodes, Online planning, Pre-writing planning Planning strategies 87 Pre-writing planning 204, 209 Problem solving 12–14, 40, 74, 77, 104, 205, 292–294, 296–298, 301, 307, 312 Process learner corpus 153 Process logs. see Logs Q Quasi-experimental designs/ research 61, 318, 330 Questionnaires 84–89. see also IPIC Questionnaire, Writing Process Questionnaire QWERTY keyboard 172, 263 R Reaction times 116, 184 Reactivity 15, 19, 20, 68–72, 107–115, 124, 130–137, 176, 179, 365, 370, 371, 377 R-burst. see Burst Reformulations (as feedback) 63, 111, 112, 295, 311 Refutation strategies 215 Reliability 19, 25, 58, 69, 70, 72, 77, 92, 96, 98, 150, 195, 257, 282, 312, 367, 375 Research designs. see Case study designs/research, Correlational designs/ research, Experimental designs/research, Exploratory designs/research, Mixed methods designs/research, Quasi-experimental designs/ research Retrospective design interviews. see Interviews Retrospective verbal reports. see Stimulated recalls Revision 36, 37, 47, 62, 169–176, 178, 179, 189, 210, 257, 259–260, 263, 317–319, 368, 372, 378. see also Draft revisions, End revisions, Inputlog analyses, Online revisions, R-burst, Revision-based Linear Analysis, Writing styles Revision-based Linear Analysis 233

Revision strategies 87 Rhetorical knowledge 88 Rhetorical planning 43 Rhetorical transfer 88, 94 S Saccades 185, 186, 193 Screen capture technologies/ software 67, 68, 70, 115–117, 141, 142, 144, 146–151, 154, 269, 279, 281, 284, 366, 369, 372, 373, 375, 377 Screen recordings 89, 93, 94, 152, 153, 235, 277 Screencast-O-Matic 148 Screenside-shadowing interviews. see Interviews ScribJab 272 ScriptLog 162–165, 173, 177, 184, 186, 187, 194 Segmentation (of process data) 150, 151, 210. see also Unit of analysis Self-report instruments/ techniques/measures 19, 20, 72, 84, 93–95, 98, 176, 179, 194, 196. see also Interviews, Process logs, Questionnaires, Stimulated recalls, Think-aloud protocols SensoMotoric Instruments 185 Smooth pursuit (eye movement) 193 Snagit 148 S-Notation 226, 239 Spatial repertoire 277–279 Spelling searches 296, 298 Stimulated recalls 37, 50, 67–68, 71, 104–113, 142, 146, 151, 162, 169, 170, 173, 191, 194, 234, 235, 238, 239, 242, 243, 247–250, 252–262, 265, 365–367, 369–372, 374, 376 Strategies 9, 12, 14, 23, 36, 40, 62, 66, 68, 69, 86, 87, 90, 95, 105, 214, 215, 217, 219, 264, 296, 300, 307. see also Lexical searches, Morphological searches, Multilingual codemeshing, Noticing strategies, Planning strategies, Refutation strategies, Revision strategies, Spelling searches, Strategy clusters, Syntactic searches,


Synthesizing strategies, Weighing strategies, Writing strategies, Writing Strategy Questionnaire Strategy clusters 74 Syntactic searches 210, 296, 298 Synthesizing strategies 214–216 Synthetic content planning 43 T Task representation 22, 44, 149, 203, 206, 209, 210, 258, 321 Task variables (and writing processes) 9–11, 34–39, 54, 58, 60, 63, 68, 69, 77, 108, 115, 131, 147, 168, 169, 176, 249, 250, 255, 256, 263, 371–379 Test takers’ writing processes 248 Test task 248, 251, 252, 262, 263 Test validation 248–250, 265 Text production processes (vs. writing processes) 7, 9, 13, 15, 16 Theoretical frameworks (informing research on writing processes) 8–10, 143, 149–150, 161, 188, 375, 376 Think-aloud protocols 104–106, 108, 109, 114–116. see also Metacognitive think-aloud protocols, Non-metacognitive think-aloud protocols, Reactivity, Validity, Veridicality. Threats to validity 19, 20, 95, 115, 116 Time use diaries. see also Diaries Tobii 185, 251 TraceIt 162 Transana 151 Transcription/translating (as writing process) 188. see also Formulation Translanguaging 9, 13, 143, 156. Translog 163 Triangulation 20, 64, 75, 77, 90, 93, 98, 116, 142, 151–153, 234, 243, 265, 286, 292, 295, 312, 372, 379 TRAKTEXT 116, 184 Tweets 64

U Unit of analysis 150, 344. see also Segmentation Uptake 137, 238, 292, 294, 304, 310, 318, 326, 328 V Valid written explanations 123 Validity. see Cognitive validity, Ecological validity, External validity, Internal validity, Threats to validity, Validity framework Validity framework 19, 34, 35 Construct validity 19, 34, 35, 38, 46, 55 External validity 34, 35. see also External validity Internal validity 34, 35, 58. see also Internal validity Statistical validity 35, 58 Veridicality 19, 20, 70, 72, 107, 108, 112, 123, 124, 130–135, 137, 365 Verbal reports. see Stimulated recalls, Think-aloud protocols Verbalizations (as data collection procedures). see Diaries, Languaging, Noticing Tables, Process logs, Stimulated recalls, Think-aloud protocols, Valid written explanations, Written reflections, Written languaging, Written languaging tables W Weighing strategies 215 Working memory (and writing processes data collection procedures) 115, 124, 134, 135, 188, 190, 196, 318, 378 Writing conferences 65, 107, 133 Writing models 8–10, 13, 16, 22, 23, 40, 51, 84, 105, 161, 188, 203, 224, 249, 251, 262, 263, 376. see also Dual writing process model, Feedback processing framework, Feedback processing model Writing processes (vs. text production processes) 7, 8 Writing process activities. see Causal model of writing processes

Writing process construct 19, 25, 45, 46, 53 Writing process research variables. see also Causal model of writing processes Writing Process Questionnaire 85 Writing process time. see also Causal model of writing processes Writing process during L2 writing tests. see Test takers’ writing processes Writing strategies 9, 12, 14, 23, 36, 62, 66, 68, 74, 77, 85, 86, 87, 90, 92, 93, 94, 95, 123, 143, 151, 156, 162, 171, 214–219, 263, 264, 285, 286, 293, 296, 300, 307, 312 Writing Strategy Questionnaire 86, 87 Writing styles (taxonomy of) 86 Written corrective feedback. see Feedback scope, Feedback types Written corrective feedback timing 24, 62, 317–318. see also Computer-mediated communication feedback Written reflections (as feedback processing data elicitation instrument) 123, 126, 136 Written languaging 65, 67, 68, 70, 72, 123, 125, 127, 132, 136, 337, 340 Written languaging tables (as feedback processing data elicitation instrument) 338, 340, 342, 351–352, 355, 358 Written verbalizations 123, 124, 125, 127–133, 134–136. see also Diaries, Languaging, Noticing tables, Process logs, Valid written explanations, Written reflections, Written languaging, Written languaging tables Written verbalizations language 130, 131 Written verbalizations prompts 128–130, 136, 137 Written verbalizations training 134, 136


This volume brings together the perspectives of new and established scholars who have connected with the broad fields of first language (L1) and second language (L2) writing to discuss critically key methodological developments and challenges in the study of L2 writing processes. The focus is on studies of composing and of engagement with feedback on written drafts, with particular attention to methods of process-tracing through data such as concurrent or stimulated verbal reports, interviews, diaries, digital recording, visual screen capture, eye tracking, keystroke logging, questionnaires, and/or ethnographic observation. The chapters in the book illustrate how progress has been made in developing research methods and empirical understandings of writing processes, in introducing methodological innovations, and in pointing to future methodological directions. It will be an essential methodological guide for novice and experienced researchers, senior students, and educators investigating the processes of writing in additional languages.

isbn 978 90 272 1410 2

John Benjamins Publishing Company