Accuracy across Proficiency Levels: A Learner Corpus Approach




Accuracy across Proficiency Levels
A Learner Corpus Approach

Jennifer Thewissen

UCL Presses universitaires de Louvain


© Presses universitaires de Louvain, 2015

Registration of copyright: D/2015/9964/43
ISBN: 978-2-87558-430-4
ISBN PDF version: 978-2-87558-431-1
Printed in Belgium by CIACO scrl, printer number 92328

All rights reserved. No part of this publication may be reproduced, adapted or translated, in any form or by any means, in any country, without the prior permission of Presses universitaires de Louvain.

Graphic design: Marie-Hélène Grégoire

Distribution: www.i6doc.com, on-line university publishers
Available on order from bookshops or at Diffusion universitaire CIACO (University Distributors), Grand-Rue 2/14, 1348 Louvain-la-Neuve, Belgium
Tel: +32 10 47 33 78, Fax: +32 10 45 73 50, [email protected]

Distributor in France: Librairie Wallonie-Bruxelles, 46 rue Quincampoix, 75004 Paris, France
Tel: +33 1 42 71 58 03, Fax: +33 1 42 71 58 09, [email protected]

Corpora and Language in Use

Corpora and Language in Use is a series aimed at publishing research monographs and conference proceedings in the area of corpus linguistics and language in use. The main focus is on corpus data, but research that compares corpus data to other kinds of empirical data, such as experimental or questionnaire data, is also of interest, as well as studies focusing on the design and use of new methods and tools for processing language texts. The series also welcomes volumes that show the relevance of corpus analysis to application fields such as lexicography, language learning and teaching, or natural language processing.

Editorial Board

Kate Beeching (University of the West of England, Bristol)
Douglas Biber (Northern Arizona University)
Mireille Bilger (Université de Perpignan)
Benjamin Fagard (Université Paris)
Gaétanelle Gilquin (Université catholique de Louvain)
Stefan Th. Gries (University of California, Santa Barbara)
Hilde Hasselgård (University of Oslo)
Philippe Hiligsmann (Université catholique de Louvain)
Diana Lewis (University of Aix-Marseille)
Christian Mair (Universität Freiburg)
Fanny Meunier (Université catholique de Louvain)
Rosamund Moon (University of Birmingham)
Maj-Britt Mosegaard Hansen (University of Manchester)
Joanne Neff-van Aertselaer (Universidad Complutense de Madrid)
Marie-Paule Péry-Woodley (Université Toulouse-Le Mirail)
Paul Rayson (Lancaster University)
Ted Sanders (Utrecht University)
Anne Catherine Simon (Université catholique de Louvain)

Series Editors

Liesbeth Degand (Université catholique de Louvain)
Sylviane Granger (Université catholique de Louvain)

Editorial Management
Veronka Kéver (Université catholique de Louvain)
Contact: [email protected]
http://www.uclouvain.be/cluse.html

Published volumes

Granger, Sylviane, Gilquin, Gaétanelle & Meunier, Fanny (eds) (2013). Twenty Years of Learner Corpus Research: Looking back, Moving ahead. Proceedings of LCR 2011, Louvain-la-Neuve, 15-17 September 2011 [Corpora and Language in Use – Proceedings 1]. Louvain-la-Neuve: Presses universitaires de Louvain.
Bolly, Catherine & Degand, Liesbeth (eds). Across the Line of Speech and Writing Variation. Proceedings of LPTS 2011, Louvain-la-Neuve, 16-18 November 2011 [Corpora and Language in Use – Proceedings 2]. Louvain-la-Neuve: Presses universitaires de Louvain.
Sarda, Laure, Carter-Thomas, Shirley, Fagard, Benjamin & Charolles, Michel (eds). Adverbials in Use: From predicative to discourse functions [Corpora and Language in Use – Monograph 1]. Louvain-la-Neuve: Presses universitaires de Louvain.


Table of Contents

General Introduction

Part I: The Concept of Error in Previous Error Analysis Work
1. Errors in early error analysis studies
   Traditional error analysis
      Gathering error samples
      Error detection
         Errors of competence vs. errors of performance
         Overt vs. covert errors
         Local vs. global errors
      Error classification
      Error counting
         Obligatory occasion analysis
         T-unit analysis
         Using part-of-speech denominators
      Error explanation
      Error gravity
   Concluding remarks
2. A new-look error analysis: the learner corpus approach
   Learner corpora in error analysis
      Treatment of the proficiency level factor
      New methodological directions
   Taking stock of CEA findings
      Error rankings
      Grammatical errors
      Lexical errors
      Orthographic errors
   Concluding remarks

Part II: Rating Learner Performance, Annotating and Counting Errors
3. Annotating and rating the International Corpus of Learner English
   Data: The International Corpus of Learner English
   Error tagging the learner corpus sample
      Detecting, correcting and tagging the errors in ICLE
      The checking procedure
      Results of the checking procedure
   Rating the learner corpus sample
      The raters and rating guidelines
      Rating results
      Inter-rater reliability scores
      Assigning a final CEFR score to each learner essay
   Concluding remarks
4. Counting errors with potential occasion analysis
   Potential occasion analysis: definition
   Creating part-of-speech denominators
      The part-of-speech tagger
      The part-of-speech denominators
   The error tag-denominator pairings
   Interpreting potential occasion analysis results
   Concluding remarks

Part III: Capturing Corpus-Based Developmental Patterns: Findings for Second Language Acquisition and Language Testing Research
5. Capturing EFL accuracy developmental patterns
   Statistical method: ANOVA
   Three main error developmental patterns
      Breaking down the strong developmental patterns
         Strong pattern 1: B1>B2>C1>C2
         Strong pattern 2: B1>[B2/C1/C2]
         Strong pattern 3: B1>B2 and [non-adjacent levels]
         Strong pattern 4: B2>B1
         Strong pattern 5: [B1/B2]>[C1/C2]
      Breaking down the weak error developmental patterns
         Weak pattern 1: The [B1/C1]>[B2/C2]>[B1/C2] pattern
         Weak pattern 2: The B1>C1 & B1>C2 and B2>C2 & B1>C2 patterns
         Weak pattern 3: The B1>C2 and B1>C1 patterns
      The non-progressive error developmental pattern
   Adding to findings on developmental second language acquisition
      Scope of the L2 features studied
      L2 proficiency level establishment
      L2 learning context
      L2 developmental routes: fragmentary results
   Concluding remarks
6. Working towards L1- and L2-dependent proficiency descriptors
   What the CEFR critics have to say
   Dissecting the CEFR descriptors for linguistic competence: Layer 1
      Where are the cannot do's?
      Implied CEFR developmental patterns for linguistic competence
   Towards L1- and L2-dependent descriptor scales: Layer 3
      Grammatical accuracy snapshot
         Article errors
         Verb tense errors
         Noun number agreement errors
         Uncountable noun errors
         Adverb placement errors
         Dependent preposition errors
         Determiner and pronoun errors
      Vocabulary control snapshot
      Orthographic control snapshot
      Punctuation control snapshot
   Reconsidering the six-level proficiency scale
   Concluding remarks

General conclusion
References
Appendices
Subject index
Author index

Acknowledgements

I would like to express my gratitude to the many people who saw me through this book and who actively contributed to its improvement and completion.

The present volume started off as a doctoral thesis for which I wish to first of all acknowledge the vital contribution of Professor Sylviane Granger, director of the Centre for English Corpus Linguistics at the University of Louvain, Belgium. I consider myself privileged to have had such an excellent and dedicated supervisor who enabled me to step outside of my comfort zone so as to explore linguistic domains and methodologies that have inevitably contributed to the improvement of this book.

I am particularly indebted to Professor Yves Bestgen, also from the University of Louvain, who gave the statistical aspect of my research a much-needed sense of direction. Thanks to Professor Bestgen's sound advice and clear explanations, statistics have become an essential tool in this volume and have played a key role in the identification of L2 accuracy development patterns. Rather than an area to be feared, statistics gradually appeared as an accessible field of research in itself, a field that I very much enjoyed the challenge of getting acquainted with.

Investigating the issue of accuracy across proficiency levels was a fruitful and life-enriching endeavour which spanned several years and enabled me to come into contact with linguistic scholars who were always willing to act as a sounding board and provide assistance and support when needed. I am thinking in particular of Professor Fanny Meunier, Professor Gaétanelle Gilquin and Professor JoAnne Neff who have actively accompanied my error tagging journey. Special thanks also go to each member of the Centre for English Corpus Linguistics in Louvain for always finding the words to spur me on.

I wish to wholeheartedly thank the anonymous reviewer of this volume for having carefully read through each chapter and for pinpointing a number of essential elements that warranted further work and research. The comments made were both relevant and thought-provoking and have actively contributed to refining the contents of the present volume. Any remaining problems with the book are my own responsibility.

I am also grateful to the editors of the Corpora and Language in Use series, Professor Sylviane Granger and Professor Liesbeth Degand, for considering my work worthy of inclusion in one of their volumes. Last but not least, thank you to my family who has supported my linguistic project since day one.

List of Acronyms and Abbreviations

ANOVA: One-way between-groups analysis of variance
CEA: Computer-aided error analysis
CLAWS: Constituent Likelihood Automatic Word Tagging System
CLC: Cambridge Learner Corpus
CPE: Certificate of Proficiency in English
EA: Error analysis
EFL: English as a Foreign Language
ESL: English as a Second Language
FCE: First Certificate in English
FLT: Foreign language teaching
LLC: Longman Learners' Corpus
LONGDALE: Longitudinal Database of Learner English
MELD: Montclair Electronic Language Database
MSD: Morpho-syntactic deviances
PiKUST: Pilot learner corpus
RLD: Reference Level Descriptions
SLA: Second Language Acquisition
TEA: Traditional error analysis

LIST OF TABLES

Table 1.1: L2 English study design synthesis
Table 1.2: The three main TEA error families
Table 1.3: Errors of competence and errors of performance: key characteristics
Table 1.4: Linguistic taxonomy for verb phrase errors (based on Svartvik et al. 1973)
Table 1.5: The surface structure taxonomy (based on Ellis & Barkhuizen 2005: 61)
Table 1.6: Obligatory occasion analysis: example with and without cases of over-suppliance
Table 1.7: Calculating accuracy scores with obligatory occasion analysis (Dulay & Burt 1974: 44)
Table 1.8: TEA error counting methods
Table 2.1: Proficiency level assignation methods (based partly on Thomas 1994)
Table 2.2: Multi-proficiency learner corpora (based partly on Carlsen 2012)
Table 2.3: The ICLE CEFR rating procedure: 20 essays per L1 subcorpus (Granger et al. 2009: 12)
Table 2.4: Existing error tagging systems and their associated learner corpora
Table 2.5: Error taxonomy types used in CEA
Table 2.6: Accuracy rates: perfective aspect in ICLE-BU and ICLE-GE (based on Rogatcheva 2009)
Table 2.7: Acquisition order of grammatical morphemes
Table 2.8: Breakdown of error counting methods in the CEA research synthesis table
Table 2.9: Reshaping error analysis work with learner corpora
Table 2.10: Four comparable CEA studies
Table 2.11: Main error domains: error rankings compared
Table 2.12: Grammatical errors in the Louvain taxonomy
Table 2.13: Grammatical error subcategories: within-category rank
Table 2.14: Verb error tags in the Louvain error tagging system
Table 2.15: Grammatical errors on verbs: within-category ranking
Table 2.16: Percentage of correct and incorrect article use in obligatory contexts: Spanish learners
Table 2.17: Percentage of correct and incorrect article use in obligatory contexts: Chinese learners
Table 2.18: Main contexts for omitted third person singular -s
Table 2.19: Uncountable noun errors in ICLE Japanese (Kobayashi 2008: 78)
Table 2.20: The construct of lexical error in Chuang and Nesi (2006) and Dagneaux et al. (1998)
Table 2.21: Progress rates from A2/B1 to C2 for RV, AGV, IV and FV errors (Hawkins & Filipović 2010)
Table 2.22: Four main CEA study profiles
Table 3.1: ICLE task and learner variables
Table 3.2: Text selection criteria used
Table 3.3: Result of text selection procedure: learner corpus sample
Table 3.4: Example of text description file for FR67 in Excel
Table 3.5: Proportion of the most frequent ICLE topic when added to the three initial selection criteria
Table 3.6: Breakdown of timed/untimed texts in the corpus sample
Table 3.7: The eight broad error categories in the Louvain error taxonomy
Table 3.8: The Louvain error tagging taxonomy in full
Table 3.9: Checking procedures: methodological phases
Table 3.10: Undercorrection examples
Table 3.11: Tag assignation corrections
Table 3.12: Overcorrection examples
Table 3.13: Unchecked and checked error-tagged data: quantitative comparison (raw figures)
Table 3.14: Main error domains: boost and slump in the checked error-tagged data
Table 3.15: Individual error profile examples
Table 3.16: CEFR communicative competences: three components
Table 3.17: The six European levels of proficiency
Table 3.18: CEFR descriptors for essay writing (and reports) (CEFR 2001: 62)
Table 3.19: Descriptor table sent to raters for linguistic competences from B1 to C2 (based on CEFR 2001: 112-125)
Table 3.20: Rating guidelines sent to R1 and R2
Table 3.21: Individual proficiency profiles: examples
Table 3.22: Agreements, near agreements, disagreements (based on the global CEFR scores)
Table 3.23: Numerical 11-point scale for Pearson test
Table 3.24: Inter-rater reliability score across the corpus: Pearson test results
Table 3.25: Inter-rater reliability scores per L1 subcorpus
Table 3.26: Rerating the 34 D texts: R3 results
Table 3.27: The eleven-point scale used to calculate the final CEFR score
Table 3.28: Interpreting mean scores to assign final CEFR grade
Table 3.29: Assigning final CEFR scores: SP49 and GE22
Table 3.30: Number of texts per proficiency level per mother tongue background
Table 3.31: Detailed proficiency and error profiles per text
Table 4.1: UCREL CLAWS7 tagset
Table 4.2: Accuracy scores of three POS taggers on ICLE data (Van Rooy & Schäfer 2003)
Table 4.3: POS categories developed for potential occasion analysis
Table 4.4: ADJ, ADV, PREP and PUNC POS categories
Table 4.5: The NOUNall and NOUNcom POS categories
Table 4.6: Characterising the verb POS categories
Table 4.7: The five POS denominators for verbs
Table 4.8: The CONJall, CONJco, CONJsu POS categories
Table 4.9: The Det-Pro POS categories
Table 4.10: Counting the ditto tags
Table 4.11: Potential occasion analysis: error tag-denominator pairings
Table 4.12: Explaining some of the error type-POS denominator pairings
Table 4.13: Error tags counted out of the total tokens
Table 4.14: Error tags counted out of the total sentences
Table 5.1: Step 1: Calculating the potential occasion analysis score per text
Table 5.2: Step 2: Calculating the mean error percentage per level
Table 5.3: Step 3: ANOVA output for GPU errors
Table 5.4: Step 4: Ryan post-hoc test output
Table 5.5: Five strong error developmental patterns
Table 5.6: Error categories in the B1>B2>C1>C2 developmental pattern
Table 5.7: Denominator-based groups for B1>[B2/C1/C2] errors
Table 5.8: Error category in the B1>B2 and [non-adjacent levels] developmental pattern
Table 5.9: Error category in the B2>B1 developmental pattern
Table 5.10: Error category in the [B1/B2]>[C1/C2] developmental pattern
Table 5.11: Three weak error developmental patterns
Table 5.12: Error category in the [B1/C1]>[B2/C2]>[B1/C2] pattern
Table 5.13: Error categories in the B1>C1 & B1>C2 or B2>C2 & B1>C2 patterns
Table 5.14: Error categories in the B1>C1 and B1>C2 patterns
Table 5.15: Error categories in the [B1/B2/C1/C2] pattern
Table 5.16: Research synthesis of SLA developmental studies: research components considered
Table 5.17: Language features studied in SLA developmental work
Table 5.18: Proficiency level specification in SLA developmental studies
Table 5.19: Learning context in SLA developmental studies
Table 6.1: Three major guiding criteria for the elaboration of the CEFR
Table 6.2: Five criteria for the elaboration of the CEFR descriptors
Table 6.3: Fair and unfair criticism of the CEFR
Table 6.4: Dissecting the CEFR descriptors for linguistic competence
Table 6.5: Implicit cannot do's
Table 6.6: Grammatical accuracy developmental profiles
Table 6.7: Lexical error developmental profiles
Table 6.8: Breakdown of French-speakers' texts per proficiency level
Table 6.9: Top ten most frequent Louvain error tags for grammatical accuracy
Table 6.10: Grammatical accuracy snapshot grid for B2/C1 French-speaking learners of English
Table 6.11: Vocabulary control snapshot grid for B2/C1 French-speaking learners of English
Table 6.12: Orthographic control snapshot grid for B2/C1 French-speaking learners of English
Table 6.13: Punctuation control snapshot grid for B2/C1 French-speaking learners of English
Table 6.14: Number of discriminating and non-discriminating error types per adjacent proficiency levels

LIST OF FIGURES

Figure 1.1: The overt/covert cline
Figure 1.2: Psycholinguistic sources of errors (Ellis 1994: 58)
Figure 2.1: Lexical error taxonomy (Granger & Montfort 1993)
Figure 4.1: Potential occasion analysis
Figure 4.2: Potential occasion analysis formulae
Figure 4.3: Potential occasion analysis: horizontal reading of results for uncountable noun errors
Figure 4.4: Potential occasion analysis: vertical reading of results
Figure 5.1: Boxplot representation for the mean error percentage of total errors
Figure 5.2: Prototypical representation of the B1>[B2/C1/C2] developmental pattern
Figure 5.3: Article error development
Figure 5.4: Punctuation - conjunction of coordination confusion: development
Figure 5.5: Adverb placement error development
Figure 5.6: Modal auxiliary verb error development
Figure 5.7: Breakdown of accuracy developmental patterns in the ICLE corpus sample: general overview
Figure 6.1: The six CEFR proficiency levels (branching principle) (CEFR 2001: 23)
Figure 6.2: The CEFR's three possible levels of granularity
Figure 6.3: Overall developmental path implied in the CEFR descriptors
Figure 6.4: A comparison of intuitive and learner corpus-derived developmental patterns for spelling

GENERAL INTRODUCTION

The aim of this volume is to capture the construct of accuracy among learners of English as a Foreign Language who have been assessed at an intermediate or advanced level of proficiency. The research presented here is carried out within the framework of computer-aided error analysis (Dagneaux et al. 1998), which involves the study of language errors following their identification in a learner corpus. Importantly, accuracy will be captured from a developmental point of view. Our aim is to capture the developmental profiles of c. 40 error types across a four-tiered proficiency continuum ranging from threshold level B1 to mastery level C2, as defined by the Common European Framework of Reference for Languages (CEFR) (2001). Accuracy developmental profiles for the 40-plus error types will be defined in terms of progress, stabilization and regression patterns across proficiency levels B1 to C2. The publication of a volume which focusses on the construct of accuracy within a developmentally-oriented research framework is particularly timely for the reasons outlined below.

One of the added values of this study is that it exemplifies the potential displayed by learner corpus research in capturing L2 developmental patterns by taking accuracy as its object of study. Second language acquisition (SLA) research (see the volume by Ortega & Byrnes 2008) argues that, ideally, developmental profiling should be based on genuine longitudinal data, that is to say data from the same learners collected over time and which reveal the developmental processes at play in the acquisition of a given L2 feature. A longitudinal approach, Hasko (2013: 2) argues, enables researchers to gather valuable information on "the pace and patterns of changes in global and individual developmental trajectories of L2 learning". Although corpus linguists agree that the truly longitudinal approach is better suited to SLA research purposes (e.g. Meunier & Littré 2013; Vyatkina 2013; Zhang & Lu 2013), they are also forced to admit that "truly longitudinal learner corpora continue to be a rarity" (Meunier & Littré 2013: 63). Meunier and Littré (2013) give the example of the learner corpora listed on the "learner corpora around the world" webpage¹ and bring to our attention that, out of the 107 learner corpora listed, only 12 include truly longitudinal data that follow cohorts of learners over time. The present volume aims to show that the rarity of truly longitudinal data does not prevent learner corpus-based research into L2 developmental trends. In this study, we approach developmentally-oriented research via pseudolongitudinal (Granger 2004a; Ellis & Barkhuizen 2005; Jarvis & Pavlenko 2007) rather than truly longitudinal learner data.

1 See http://www.uclouvain.be/en-cecl-lcworld.html


The main characteristic of a pseudolongitudinal research design is that it includes "language users at successive levels of language ability (...) though not within the same language users, as would be the case in a true longitudinal study" (Jarvis & Pavlenko 2007: 37)². We are keen to make a distinction between cross-sectional learner data, that is to say data gathered from different categories of learners at a single point in time (Granger 2004a: 263), and pseudolongitudinal data, which are gathered at one point in time but from learners at different proficiency levels (Granger 2004a: 263). Although by comparing learner groups at proficiency levels B1, B2, C1 and C2 this volume admittedly analyses "cross-sections of L2 language use" (Hasko 2013: 2), we do so with the added proficiency level component, which enables us to infer L2 developmental trends. This study will show that, as argued by Hasko (2013: 5), "cross-sectional L2 data have been proven useful in select strands of developmentally oriented research when employed in a pseudolongitudinal design". Ellis and Barkhuizen (2005: 97) also argue in favour of pseudolongitudinal data as they claim that "a longitudinal picture can be then constructed by comparing the devices used by the different groups ranked according to their proficiency". Because of the nature of the data used in this volume, we are admittedly not in a position to add to the field of learner longitudinal research per se. This said, the results presented in this study will contribute to the area of L2 developmental research more generally by providing a detailed description of accuracy among EFL learner groups across the B1 to C2 proficiency continuum.

By studying accuracy from a developmental viewpoint we also aim to favour a rapprochement between the fields of SLA and learner corpus research. Though not fully excluded, learner corpus analysis has nevertheless largely remained a marginal procedure in SLA research (Myles 2005; Granger & Meunier 2010; Hasko & Meunier 2013). This is in part due to the fact that, although both fields aim to better understand learner language development, they do so via very different methodological approaches. Some of the core differences between the two involve SLA's focus on "patterns of individual learners' language development" (Zhang & Lu 2013: 46) vs. the focus on "group-level trends" (Zhang & Lu 2013: 46) in learner corpus research. SLA also tends to favour elicitation techniques, while learner corpus research has a marked preference for (semi-)authentic language use data (Meunier & Littré 2013). While both fields have greatly added to existing knowledge in linguistics, we argue that the group-level approach used in this volume is particularly valuable when research has an applied perspective such as the improvement of L2 descriptors in language testing. One of the aims of this study is to see how the results pertaining to the development of L2 accuracy can be used to empirically inform the CEFR descriptor scales, which are largely intuition-based. One of the applied outcomes of the volume will hopefully be to show that "[l]anguage testers (...) need to use corpus data to identify the linguistic exponents of a particular proficiency level" (Barker 2010: 636). There are similarities between our work and that carried out by the English Profile team at Cambridge (Hawkins & Filipović 2012), who also use learner corpus data to make the CEFR descriptors more specifically relevant to learners of L2 English and whose overall objective is to enable discrimination between proficiency levels based on "an explicit set of criterial features" (Hawkins & Filipović 2012: 16).

This volume is divided into three major parts, each of which is made up of two distinct chapters. The first part is devoted to the treatment of the concept of error in previous error analysis work. Chapter 1 goes back in time to the pre-learner corpus era and the early days of error analysis. It analyses the existing body of traditional error analysis literature and dissects research practices in terms of the error samples used and the ways in which errors were identified, classified, explained, counted and their level of gravity assessed. Chapter 1 is intended as a reassessment of the early error analysis period. While acknowledging the caveats that are usually put forward in the summaries of traditional error analysis, attention is also drawn to the sometimes forgotten areas where traditional error analysis laid important foundations for subsequent corpus-based error analysis work. Chapter 2 summarises and interprets the accumulated empirical evidence gathered from subsequent error analysis studies carried out with learner corpora within the framework of computer-aided error analysis. The observations put forward in this chapter rest on a detailed review of c. 70 studies. The chapter sets a number of studies side by side to compare results and also presents the main findings that have emerged for particular error areas (e.g. articles, tenses, noun number agreement, lexical errors, etc.).

The second part is largely methodological and consists of Chapters 3 and 4. Chapter 3 presents the learner corpus data used in this study (the International Corpus of Learner English, Granger et al. 2009) along with a detailed description of (a) the error annotation phase, i.e. the error tagging procedure which distinguishes between 40-plus error types, and (b) the rating process that was carried out to assign a proficiency level to each learner script. These two steps made it possible for each individual learner text to be linked up to a detailed error profile and L2 proficiency level, therefore enabling a series of developmental analyses to be carried out. Chapter 4 is dedicated to the error counting method developed to count the errors at each level of proficiency. This method has been termed "potential occasion analysis" and relies on an error-tagged and a part-of-speech-tagged version of the learner corpus data. The rationale for this method is that, rather than de facto counting different error types out of the total tokens per text, the counting denominator should be restricted to the environment of potential occasions for error: singular/plural errors on nouns, for example, are counted out of the total number of nouns used, as it is nouns, rather than the total tokens, which constitute the potential occasions for this error type (a minimal computational sketch is given at the end of this introduction). This method has very rarely been used in computer-aided error analysis before.

The third and final part of this book captures the development of the 40-plus error types across levels B1, B2, C1 and C2. It subsequently puts forward the contributions of the present corpus-based error developmental study to second language acquisition research and language testing. The different developmental trajectories that are followed by the error types across the B1 to C2 continuum are captured and described in Chapter 5. Using statistical tests (the ANOVA and Ryan post-hoc tests), areas of marked progress, stabilisation and regression are identified for each error type. The results are subsequently set against a research synthesis of previous studies that have taken a developmental approach to the study of second language acquisition, hence highlighting the novel contributions of our work to existing L2 developmental findings. Chapter 6 is the final chapter. It mainly considers the contributions of this volume to language testing research. This chapter first dissects the largely intuitively-developed CEFR descriptors for linguistic competence (grammatical accuracy, vocabulary control, vocabulary range, orthographic control, coherence/cohesion), pinpointing some inconsistencies in the CEFR document. This analysis also points to a number of differences between the contents of the CEFR descriptors and the developmental patterns yielded by the learner corpus analysis. In a second step, the errors made by French-speaking learners of English at the upper-intermediate/advanced proficiency level are qualitatively investigated, highlighting a number of error-inducing contexts that remain at the more advanced levels for this learner population. Corpus-attested error examples and error-inducing contexts are argued to provide a handy pool of items for language tests targeting this specific learner group.

Importantly, we acknowledge that zooming in as is done here on the construct of accuracy could raise eyebrows in certain circles. We are, for instance, acutely aware of current research into World Englishes and the subsequent accompanying wariness towards the very notion of "error", with a preference for speaking of emerging norms instead (Jenkins 2006).

2 The term "quasi-longitudinal" is sometimes used instead of "pseudolongitudinal" (Granger 2004a: 263). Both refer to data from different learners at different levels of language proficiency.


The move away from errors is further reflected in the CEFR compilers' wish to put can do's, i.e. positive achievements, rather than cannot do's at the forefront of assessment practices. Rather than a move away from errors, this book proposes a move back towards research into L2 errors. Hulstijn (2011: 244) recently suggested that a move be made in the same direction, stressing the need to put linguistic competences back at the heart of language proficiency and calling for a "revival of the testing of core components of LP [language proficiency], in particular control of vocabulary and grammar". This makes sense given the weight that linguistic accuracy continues to carry in assessment practices: "assessment practices at school, both formative and summative, heavily rely on counting errors and scoring them based on various types of 'gravity'" (Pallotti 2009: 159). Forsberg and Bartning (2010: 137) make a similar claim: "[w]hich professional language proficiency rater — be he/she trained in the CEFR or not — will not take linguistic form into account at some level, especially in the written modality? Can a person judge a written text, without noticing, for instance, grammar and orthography?".

Finally, the focus on accuracy in this volume should not be interpreted as an underhand way of implying that this component should be given more weight than the other concepts usually subsumed under the proficiency construct, namely complexity and fluency (see Housen & Kuiken (2009) for a discussion of the complexity, accuracy, fluency triad). We concur with Pallotti (2009: 159), who argues that understanding the processes of second language acquisition goes beyond the description of interlanguage "simply in terms of errors and deviations from L2 norms". Yet we also believe that accuracy deserves a description as much as any other aspect of the L2, with the hope that this will contribute thought-provoking findings for second language acquisition and language testing research.
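As a concrete preview of the potential occasion analysis developed in Chapter 4, the following minimal Python sketch illustrates the counting rationale described earlier in this introduction. The tag names and the single error tag–POS pairing used here are invented for illustration; the actual Louvain error tags, the CLAWS-derived part-of-speech denominators and the full tag–denominator pairings are presented in Chapters 3 and 4.

```python
from collections import Counter

# Hypothetical pairing: "GNN" (noun number errors) is counted against
# "NOUN" (all nouns). The real error tag-denominator pairings are
# given in Table 4.11 of this volume.
PAIRINGS = {"GNN": "NOUN"}

def potential_occasion_rates(error_counts, pos_counts, pairings):
    """For each error tag, divide the error count by the count of its
    potential-occasion POS category rather than by the total tokens."""
    rates = {}
    for error_tag, denominator_pos in pairings.items():
        occasions = pos_counts.get(denominator_pos, 0)
        if occasions > 0:
            rates[error_tag] = 100 * error_counts.get(error_tag, 0) / occasions
    return rates

# Toy text: 3 noun number errors, 120 nouns, 600 tokens in all.
errors = Counter({"GNN": 3})
pos = Counter({"NOUN": 120})
print(potential_occasion_rates(errors, pos, PAIRINGS))  # {'GNN': 2.5}
```

The point of the design is visible even in these toy figures: a token-based denominator (3/600 = 0.5%) dilutes the error rate, whereas the potential-occasion denominator shows that one noun in forty carries a number error.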



Part I

The Concept of Error in

Previous Error Analysis Work


1. ERRORS IN EARLY ERROR ANALYSIS STUDIES

Towards the late 1960s, error analysis, "the linguistic study and interpretation of errors made by second-language learners" (Dagut & Laufer 1982: 21), supplanted contrastive analysis and became known as the "darling" of the 1970s (Schachter & Celce-Murcia 1977: 442). After enjoying a heyday period from approximately the late 1960s to the late 1970s, error analysis eventually came under fierce criticism in the same way as contrastive analysis before it (Bell 1974; Hammarberg 1974; Schachter & Celce-Murcia 1977). This chapter is intended as a reassessment of error analysis and its five key steps as they were defined by Corder (1967): (1) the collection of data samples, (2) error detection, (3) error classification, (4) error explanation, and (5) error gravity. We further add an additional step which is core to this book, namely that of error counting, which usually follows the error classification phase. Of special interest to us in this section is the considerable knowledge that error analysis studies have bequeathed us, an aspect that literature reviews of the period sometimes fail to do justice to. In what follows we thus take a fresh look at non-corpus-based error analysis studies (henceforth TEA, for traditional error analysis) which have focused on L2 English learners.

1.1. Traditional error analysis

The knowledge gathered from TEA can be seen to result from two types of practices: (a) practices that were carried out at the time but which were later deemed to be ill-suited (e.g. the presentation of errors in decontextualised sentences) or in need of further refinement (e.g. the error classification systems used), and (b) practices which were implemented in early error analysis and which should continue to be encouraged today (e.g. the use of sophisticated error counting methods). The knowledge imparted by the collection, detection, classification, counting, explanation and gravity assessment of errors will be considered in the individual sections that follow. Each section will end with a 'what to do' and 'what not to do' box summarising practices in each step of the error analysis procedure.

1.1.1. Gathering error samples

A first preliminary step in error analysis involves gathering the data on which the analysis of errors will be carried out, i.e. "getting the errors to analyse" (James 1998: 19). A reproach that is sometimes voiced against data gathering practices in TEA studies is that, unfortunately, many EA studies remain decidedly tight-lipped on the make-up of their work, "with the result that they are difficult to interpret and impossible to replicate" (Ellis 1994). Dagneaux et al. (1998: 164) additionally stress that TEA work was too often based on "heterogeneous learner data" whose precise origins remain rather obscure. To verify these claims, we carried out a detailed analysis of the type of data information reported in a number of key TEA studies and present the results in an evidence synthesis (Table 1.1) which determines whether the necessary information is provided in terms of the object of study, the learners' L1, their proficiency level, the number of texts studied, the number of words in each text and the overall number of words analysed.

The ubiquitous presence of the 'unspecified' category in Table 1.1 does indeed point to a general tendency towards underspecification when it comes to describing the set-up of each individual study, hence supporting the claim concerning the rather difficult interpretability and replication of results. Concerning the size of the data samples, although the number of texts analysed is usually stated, we are generally left in the dark about the total number of tokens this represents. This number can be expected to vary quite substantially given that the number of texts analysed ranges from one (Singleton 1987) to 110 (Dulay & Burt 1974b). It also appears from the evidence synthesis that although TEA generally specifies the mother tongue background of the populations whose errors it sets out to analyse, studies tend to differ in their actual treatment of this factor. Richards (1974) and Ellis (1987), for instance, investigated errors by learners from a wide array of mother tongue backgrounds by grouping them all together. The aim here was therefore not to find traces of the L1 in the errors made. Others, however (Bertkau 1974; Schachter 1974; Bardovi-Harlig & Bofman 1989), made a point of studying multiple L1 groups separately in order to find out how the learners' L1 impacts the errors made. In terms of proficiency level description, Table 1.1 reveals that this factor is either left unspecified or, in cases where it is specified, this tends to be done in rather vague terms which lack the necessary common terminological ground to enable proficiency level comparison across studies. For example, it is unclear how Scott and Tucker's (1974) 'low intermediate' learners compare to Singleton's (1987) 'elementary' or Ellis' (1987) 'post-beginner' populations.

Table 1.1: L2 English study design synthesis. For each key TEA study of L2 English reviewed (among them Duskova 1969; Bertkau 1974; Dulay & Burt 1974b; Richards 1974; Schachter 1974; Scott & Tucker 1974; Taylor 1975; Ellis 1987; Singleton 1987; Bardovi-Harlig & Bofman 1989; Lennon 1991; Zhang 2003), the table records the object of study, the learners' L1, the proficiency level(s), the number of texts studied, the words per text and the overall number of words analysed; 'Unspecified' recurs throughout the design columns.


The influence of the learners' L2 proficiency level on error production is a factor that, to this day, has received rather scant attention, hence its importance in this volume. Thomas (1994) sounded the warning bell to emphasise the deleterious impact that ignoring the proficiency factor may have on the interpretation of research results. She remarks that "too little attention has been paid to the effects of varying levels of proficiency among subjects in studies of second language acquisition" (Thomas 1994: 327). She puts her finger on the dangers that overlooking proficiency levels may entail:

[I]t is vital to control for levels of proficiency in studies of the acquisition of a given L2 by different groups of learners - for example, learners with differing L1s, or learners who have or have not undergone some special training. If the two groups do not start out with comparable skills in L2, the research may spuriously attribute differences in knowledge or performance between the two groups to their differences in L1 or to their differences in training. (Thomas 1994: 310)

Despite the very limited attention given to the proficiency factor, close inspection of the small set of TEA studies which did take an early interest in this matter shows that, although few in number, these papers had already yielded influential qualitative findings about the impact of differing L2 levels of proficiency on error production. Larsen-Freeman and Strom (1977) is a case in point. The researchers warn against the dangers of overinterpreting the influence of certain factors if proficiency is not controlled for: "we argue about the influence of a certain factor on the language learning process while overlooking the probability that the influence of that factor might fluctuate depending upon the learner's proficiency level" (Larsen-Freeman & Strom 1977: 125). The authors analysed the written compositions of 48 non-native speakers from "a diversity of language backgrounds" (Larsen-Freeman & Strom 1977: 126) at five proficiency levels: poor, fair, average, good, and excellent³. One of the aims of the study was to identify discriminatory features, namely features that made each level distinct from the others. A thorough error analysis was run on the 48 texts, which were scanned for morphological, syntactic, preposition, tense, aspect, article, subject-verb agreement, case and negation errors. Counting the number of error-free T-units⁴, the authors report on the accuracy trends detected in their data (a minimal computational sketch of this measure follows the list below):


• An important contribution of their work, which also applies to the developmental findings reported in this volume (Chapter 5), is that error profiles do not always progress linearly. For example, the compositions that had been evaluated as "poor" included fewer article errors than those in the "fair" category, with the number of article errors decreasing again in the "good" and "excellent" categories. This corresponds to an inverted U-shaped pattern whereby error frequency increases along with proficiency level before decreasing again. On this basis, Larsen-Freeman and Strom claimed that articles were probably not a strong discriminating feature for the "poor" and "fair" groups.
• Errors in morphology (although it remains unclear exactly which types of errors are included in this category) were found to decrease slightly from the "fair" compositions to their "excellent" counterparts.
• As for prepositions, these were a constant problem throughout all five levels, with no marked developmental change across the five levels.
• Spelling and tense errors were found to decrease as proficiency increased, hence standing out as potential discriminatory error types.
• Because of the different developmental patterns displayed by the error categories, the authors concluded that "we are not optimistic (...) that the reduction of errors in any particular structure or group of structures will be the answer to our quest for an index of development" (Larsen-Freeman & Strom 1977: 132). In other words, changes in error behaviour may not help discriminate between levels of proficiency.

The absence of information about the number of errors in the different categories, as well as the lack of any concrete error examples, calls for caution in the interpretation of this conclusion, but the authors have certainly helped raise the relevant question of whether, and if so which, error types will help distinguish between learner levels of L2 proficiency.

3 The compositions were written in the framework of the English as a Second Language Placement Examination. The grades were assigned independently by two raters who displayed a high inter-rater reliability score (r = .9713).
4 A T-unit is defined as "a main clause with all subordinate clauses attached to it" (Hunt 1965: 20).
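Counting error-free T-units, as Larsen-Freeman and Strom did, is mechanical enough to be worth a minimal sketch. The data format below (a list of T-units paired with manually assigned error counts) is invented for illustration and stands in for a hand-annotated composition:

```python
def error_free_tunit_ratio(tunits):
    """Proportion of error-free T-units, the accuracy index used by
    Larsen-Freeman and Strom (1977). `tunits` is a list of
    (t_unit_text, number_of_errors) pairs, a made-up format standing
    in for a manually error-annotated composition."""
    error_free = sum(1 for _, n_errors in tunits if n_errors == 0)
    return error_free / len(tunits) if tunits else 0.0

composition = [
    ("He lives in London", 0),
    ("She have two childrens", 2),  # agreement error + noun number error
    ("They arrived yesterday", 0),
]
print(round(error_free_tunit_ratio(composition), 2))  # 0.67
```

The resulting ratio, the proportion of error-free T-units per composition, can then be compared across proficiency bands such as the five used by Larsen-Freeman and Strom.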

Taylor (1975: 125) also investigated error profiles against the backdrop of proficiency levels. He administered a translation test to 20 Spanish-speaking learners of English at two proficiency levels, the elementary and intermediate levels, and studied the auxiliary and verb phrase errors in the translated sentences. The error analysis revealed that the errors at both levels were not qualitatively but quantitatively different: the intermediate group made fewer errors than its beginner counterpart, but the most frequent errors were the same in both groups. The three most frequent error types concerned subject placement in questions, e.g. do work the children a lot in school?, the insertion of to after the auxiliary can, e.g. what guests can to bring to the party?, and the lack of subject-modal inversion in questions, e.g. I can go now?


Taylor's main finding concerned the psycholinguistic source of the errors, which he claimed differs depending on the learners' proficiency level: beginner learners appeared to rely much more on the strategy of transfer than their intermediate counterparts, who made a higher number of intralingual errors. In Taylor's (1975: 84) words, "reliance on overgeneralization is directly proportional to proficiency in the target language, and reliance on transfer is inversely proportional". Taylor's (1975) hypothesis on the link between proficiency level and transfer reliance was later confirmed by Ringbom (1987: 109) in his comparison of Swedish- and Finnish-speaking learners of English when he concluded that "it is clear, both from previous research and from the present study, that the L1 exerts a stronger influence on L2 learning in the early stages of learning than in later stages". The same trend also stood out from a later SLA study, Zhang (2003), who carried out an error analysis of texts written by "less proficient" and "more proficient" Chinese ESL learners. She found that the less proficient learners relied more on their mother tongue than their more proficient counterparts, who made a greater number of intralingual errors. These studies indicate that the amount of transfer displayed in different proficiency groups could constitute a potential "criterial feature" (Hawkins & Buttery 2010), one that may develop markedly across the proficiency continuum and which could help distinguish between learners at various stages of development.

This section has considered data gathering practices in TEA, providing the following type of knowledge on 'what to do' and 'what not to do':

Data gathering in TEA

What not to do:
• Avoid what James (1998: 19) calls a "broad trawl", i.e. "casting one's net to catch all and any sorts of errors that happen to be at large, indiscriminately".

What to do:
• Be highly specific concerning the provenance, size and nature of the data sample used for error analysis which, ideally, should be as homogeneous as possible.
• Carry out more developmental research into the discriminatory power of errors as proficiency develops.
• Bear in mind that L2 proficiency level and transfer-related errors are possibly closely intertwined: the literature suggests that the lower the proficiency level, the likelier learners are to make transfer-related errors⁵.

5 Note that some scholars have made the opposite claim (e.g. Håkansson, Pienemann & Sayehli 2002).


1.1.2. Error detection

The difficulty involved in error detection was sometimes brushed aside, or at least taken for granted, as shown by statements such as "errors are helpful since they are conspicuous and easy to identify" (Hammarberg 1974: 191) or "in beginners and intermediate foreign language learners, the problem of detecting and categorizing errors is not unduly complex in practice, since nearly all examples are fairly clear" (Ringbom 1987: 71). While this is true for some glaringly obvious error types, e.g. *mouses, *I runned, other deviances may be the cause of much discrepancy among correctors, who may disagree as to their potentially erroneous character (see Lennon 1991 and Andreu-Andrés et al. 2010 on the issue of inter-corrector agreement). As argued by Johansson (1978: 1), "no single standard of correctness" exists, which considerably complicates the error detector's task. Taylor (1986), for his part, notes that "errors are not primitive absolutes whose identification is unproblematical".

One of the reasons why it is so difficult to determine categorically and unhesitatingly what constitutes an error lies in the multi-facetedness of the concept itself, or what Enkvist (1973) calls the "functional relativity" of the notion of error. Rather than being easily captured in a single definition, an error is a relative concept that TEA researchers found could be broken down into three main families (Table 1.2).

Table 1.2: The three main TEA error families

Errors of competence vs. errors of performance (Corder 1967): analysed at the level of learner cognitive abilities.
Overt vs. covert errors (Corder 1971): take the breadth of context into account.
Local vs. global errors (Burt & Kiparsky 1974): analysed at the level of the intelligibility of the message conveyed.

1.1.2.1. Errors of competence vs. errors of performance

The first widely quoted distinction is that between errors of competence and errors of performance, which operates at the level of cognition. Errors of competence "are not physical failures but the sign of an imperfect knowledge of the code. The learners have not yet internalized the formation rules of the second language" (Corder 1973: 259). Their performance counterparts are generally considered to be of a less serious nature, due to temporary lapses in attention rather than actual competence issues:


These (...) are due to memory lapses, physical states such as tiredness and psychological conditions such as strong emotion. These are adventitious artefacts of linguistic performance and do not reflect a defect in our knowledge of our own language. We are normally immediately aware of them when they occur and can correct them with more or less complete assurance. (Corder 1967: 24)

The distinction between competence and performance errors is widely attested in the second language acquisition literature. Their key defining features are laid out in Table 1.3.

Table 1.3: Errors of competence and errors of performance: key characteristics

Errors of competence:
• Incomplete knowledge, i.e. lack of competence
• Not usually made by native speakers
• E.g. I want that you come here (Corder 1973: 279); He can sings (Ellis 1994: 59); She was led out by a niece (nurse) (Van Els et al. 1987: 55); The mouses (mice) (Van Els et al. 1987: 55)

Errors of performance:
• No defect in knowledge but rather processing problems
• Also made by native speakers
• E.g. It didn't bother me in the sleast... slightest... but those frunds... funds are frozen (Corder 1973: 258); They wanted they said they wanted to leave (correction + repeat) (Van Els et al. 1987: 55); On Wednesdays he always bruys two loaves of bread (anticipation) (Van Els et al. 1987: 55)

In the earlier days, this distinction was sometimes looked upon quite optimistically, as shown by Muskat-Tabakowska (1969: 50) who believed that “an analysis of errors and mistakes based on the theory of competence and performance seems very promising”. Two main operationalisation criteria were selected in order to tease apart errors and mistakes: TEA either (1) sought out an authoritative interpretation of the errors or (2) applied the 90% accuracy rate criterion. The authoritative interpretation (Corder 1973) involved asking the learners what they actually meant and testing whether or not they were able to self-correct: if they could self-correct, the error was classified as an error of performance. If they could not self-correct, however, the error was considered to be due to a deeper problem in competence. The second operationalisation criterion, the 90% accuracy score (Brown 1973;

Ellis & Barkhuizen 2005), stipulated that, once a language feature reached the


90% accuracy rate, it could be considered ‘acquired’. Any remaining errors were consequently interpreted as the result of a failure in performance. A learner essentially “got” a rule if it could be seen to be followed 90 percent of the time. These methods involved several caveats. First, one may wonder just how ‘authoritative’ authoritative interpretations actually were. James (1998: 79) makes a good point in explaining the problematic nature of the self-corrigibility criterion by pointing out that a learner may (a) sense that something is wrong without however being able to correct it, (b) overlook errors in exam conditions that they would spot and self-correct if given more time, or (c) may be able to self-correct provided an error is pointed out to them. Additionally, trying to dually classify errors as resulting from competence or performance issues means ignoring the concept of variability, the fact that “the learner varies between correctly applying and failing to apply the rule in question over a period of time” (Rogers 1982: 45). Fairbanks (1982), for instance, found that a Japanese learner of English almost never used the third person singular -s in casual speech, producing utterances such as if she have a children, because she have to care their son, he live with their children.

However, his careful style almost always included the -s ending for both singular and plural verbs, e.g. each store has own price, that store sells this transportation, some parts of town has a lots of food. Correct and non-correct uses of a form can hence exist side by side depending on the context of use. The above discussion seems to point to the gradient nature of the competence/performance distinction: “perhaps a less contrived, ad hoc way of dealing with error types might be to regard the competence error vs performance error distinction as a continuum of possible error types” (Martin 1982: 49). Despite seeing the theoretical value of this distinction, some researchers appear to be uncomfortable with its actual application as “it raises all the classic problems of the dualistic principle” (Enkvist 1973: 19).

1.1.2.2. Overt vs. covert errors

The overt/covert distinction is primarily context-dependent. Overtly erroneous sentences are “superficially ‘ill-formed’ in terms of the rules of the language” (Corder 1971: 166). Examples include: I runned all the way (Ellis 1994: 52); splitted (Lennon 1991: 189). Covert errors, in contrast, can only be uncovered by considering a wider stretch of text: “[t]he first stage in the technical process of describing the linguistic nature of errors is to detect them. The difficulty in doing this lies in the fact that what looks or sounds like a perfectly acceptable sentence may nevertheless contain errors” (Corder 1973: 272). A widely cited example is: I want to know the English (Corder 1973:


273). Out of context this can be seen to constitute an acceptable sentence. The larger context of the utterance, however, showed that the learner actually

meant the English language and not the English people, hence the presence of an error. Again, the overt/covert distinction is a strongly gradient one where the unit needed to detect the presence of an error can range from the word, to the phrase, to the sentence, to the extra-sentential discourse. The breadth of context needed for error detection can be tied to Lennon’s (1991: 191) notion of “error domain”, which he coined to mean the “rank of the linguistic unit which must be taken as context in order for the error to become apparent”. We propose Figure 1.1 below to illustrate Lennon’s gradient overt/covert cline. Each phase of the cline comes with a representative example, all of which were taken from Lennon (1991).

Figure 1.1: The overt/covert cline

OVERT ERRORS → COVERT ERRORS (covertness increasing from left to right)
- Word: *splitted
- Phrase: *a scissors
- Sentence: *Behind him stands a man, well, who looks somewhat naughty (man and naughty do not collocate well)
- Extra-sentential discourse: *The thief is lucky (vs. happy, as can be inferred from a picture shown)

Earlier TEA work has mainly focussed on the overt end of the continuum: “due to the problems of identifying covert errors, most schemes of error evaluation consider overt errors only” (Johansson 1978: 4). Overt errors were mainly studied on the basis of decontextualised sentences or parts of sentences which usually contained glaring errors, e.g. we are live in this hut; he did not found...; she cannot goes; they Ø running very fast; he always talk a lot; I am interesting in that (examples taken from Richards 1974).

6 Lennon (1991: 191) also coined the phrase ‘error extent’ to refer to “the rank of the linguistic unit, from minimally the morpheme to maximally the sentence, which would have to be deleted, replaced, or supplied in order to repair production”. Hence, in *a scissors, the error domain is the phrase a scissors and the error extent is the article a. 7 The sentences were reproduced as they were found in the original.


As stressed by Johansson (1978: 2), however, it would be wiser to take total learner performance into consideration: “covert errors can only be identified if the total performance of the learner” is considered. Providing too limited a context may even shed doubt on the erroneous nature of certain sentences, e.g. one of the most important decisions he has been forced to make, which is said to be erroneous in Svartvik et al. (1973). The authors argue that it is the tense which is wrong and propose one of the most important decisions he will be forced to make, which is rather difficult to understand given the limited amount of context supplied.

1.1.2.3. Local vs. global errors

Local vs. global errors (Burt & Kiparsky 1974) refer to the impact that errors have on the intelligibility of the learner message conveyed. Local errors are typically “agreement, articles, noun phrase formation etc.” (Burt & Kiparsky 1974: 73). A local error is “a linguistic error that makes a sentence appear awkward but, nevertheless, causes a native speaker little or no difficulty in understanding the intended meaning of a sentence, given its contextual framework” (Hendrickson 1979: 28) (e.g. Their mother didn't spanked them (Hendrickson 1979: 35)). Their global counterparts are known to affect intelligibility much more, e.g. the girl is surprising (instead of the girl is surprised) (Hendrickson 1979: 34). Global errors may stand in the way of efficient communication in one of two ways: (1) the error causes the addressee to understand a different message to the one intended or (2) the error renders the message completely incomprehensible. Again, the notions of local and global errors should be understood as relative terms in the sense that what constitutes a global error for one addressee may constitute a local error for another. Familiarity with the learners’ mother tongue may play a facilitative role in terms of understanding the message conveyed. TEA error detection practices have imparted the following type of knowledge concerning what to do and what not to do when detecting learner errors:


Detecting errors in TEA

What not to do
- Take the error detection step for granted and consider it as a straightforward process.
- Consider and present errors in decontextualised contexts.

What to do
- Consider errors as a functionally relative concept which can occur at different levels: cognition (the competence/performance error cline), breadth of context (the overt/covert cline), and message intelligibility (the local/global cline).
- Use the surrounding context to detect and present errors.
- Provide error examples to support the point made.

1.1.3. Error classification

Once the errors in the learner data have been detected, researchers undertake to classify them as optimally as possible so as to enable the subsequent quantification of the error data. TEA has substantially contributed to the elaboration of error classification systems, suggesting taxonomies that were either descriptive or explanatory. Descriptive error classification systems are further broken down into two subtypes, (a) linguistic taxonomies and (b) surface structure taxonomies. Linguistic taxonomies involve categorising errors into the linguistic categories that “correspond closely to those found in structural syllabuses and language text books” (Ellis 1994: 54), e.g. spelling, morphology, grammar, lexis. Subdivisions at finer levels of granularity are also possible. Depending on their research question, researchers may wish to further divide the grammatical error category into errors affecting articles, auxiliary verbs, tenses, verb number agreement or any other areas of interest. Svartvik et al. (1973) and Stenström (1975) developed detailed linguistic taxonomies for grammatical errors, which were the main research focus at the time. Part of Svartvik et al.’s (1973) classification for verb phrase errors is given below in Table 1.4. Unless one wishes to focus on verb phrase errors exclusively, the taxonomy proposed below may be considered overly detailed and rather impractical for broader error annotation purposes. However, it constituted a first important step towards research into error taxonomies.


Table 1.4: Linguistic taxonomy for verb phrase errors (based on Svartvik et al. 1973)

THE VERB PHRASE
- Verb forms: He had sneak in (sneaked); He had stealed downstairs (stolen)
- Passive: Every time a Parliament is gathered (gathers)
- Present progressive (simple present): An old building from which a narrow portico is leading into the church (leads)
- Past progressive (simple past): Swedish business men were following the incidents (followed)
- Perfect progressive (simple perfect): For a long time archaeologists have been agreeing that it was a place for worshipping (have agreed)
- Simple present (present progressive): A distinguished old gentleman who slowly goes to the door (is slowly going)
- Simple past (past progressive): The strongly unsimilar areas of walls make one feel as if one stood in an old cave (were standing)
- Present (past): Whoever explains by himself that he was not coming (explained)
- Present (present perfect): For a long time archaeologists agree that (have agreed)
- Present (future): Maybe in 1980 we don't laugh (will not laugh)
- Past (present): The charwomen at an old mansion went there (go)
- Past (present perfect): A picture that was reproduced many times (has been reproduced)
- Past (past perfect): Mary told that she was informed to be at the secretary's office (had been informed)
- Past (future): President Nixon who truly had to find a successor to Hoover (will have to)
- Past (modal past perfect): ... was not that an opportunity (would that not have been)
- Present perfect (past): It has mainly been a place for worshipping (was mainly)
- Present perfect (past perfect): I once visited my friend in his palace which he has inherited (had inherited)
- Present perfect (future): One of the most important decisions he has been forced to make (will be forced to)

8 In addition to verb forms, Svartvik et al. (1973) also provided detailed classifications for errors on nouns, pronouns, numerals, adjectives and adverbs, prepositions, etc. For more on this, the reader is referred to Svartvik et al.’s (1973) manuscript.


A criticism that can be levelled at linguistic error classifications concerns terminological fuzziness:

Terms such as “grammatical errors” or “lexical errors”, for instance, are rarely defined, which makes results difficult to interpret, as several error types - prepositional errors for instance - fall somewhere in between and it is usually impossible to know in which of the two categories they have been counted.

(Dagneaux et al. 1998: 164)

Larsen-Freeman (1978), for instance, explained that she counted learners’ “grammatical” and “lexical errors” but did not specify which elements she included under both these headings. The same goes for Arthur (1979), who set out to investigate “grammatical errors” as an undifferentiated mass. Additionally, TEA sometimes classified the same error type in different categories, e.g. Stenström (1975) treated prepositions as a grammatical error; Duskova (1969) listed them in a separate category of their own; and Bardovi-Harlig and Bofman (1989) considered them as a morphological problem.

The second descriptive classification system, surface structure taxonomies, involves classifying errors in terms of omission, addition, misformation and misordering, as illustrated in Table 1.5 below:

Table 1.5: The surface structure taxonomy (based on Ellis & Barkhuizen 2005: 61)

- Omission: absence of a morpheme or structure that must appear in a well-formed utterance, rendering it ungrammatical, e.g. my sisters Ø very pretty.
- Addition: presence of a morpheme or structure that must not appear in a well-formed utterance, rendering it ungrammatical.
- Misformation/misselection: use of the wrong form of the morpheme or structure, e.g. Do they be happy?; Me not happy.
- Misordering: incorrect placement of a morpheme or group of morphemes in an utterance, e.g. He every time comes home late; she fights all the time her brother.

As Ellis and Barkhuizen (2005: 62) themselves admit, the surface structure taxonomy alone is perhaps of “less obvious practical use” than the linguistic taxonomy, especially when it comes to unequivocally classifying errors into one and only one of these surface structure categories. For instance, where does one classify an error which involves both a misordering and an omission problem, e.g. she has usualy breakfast at 7.30 (vs. she usually has


breakfast at 7.30).

As will be shown in Chapter 3 when we present our own error taxonomy, later work into error classification has tried to make the most of the linguistic and surface structure taxonomies by combining them both within the same system.

The second major type of error taxonomy is explanatory. Rather than classifying errors following their linguistic membership, it explains errors by “determining their sources in order to account for why they were made” (Ellis & Barkhuizen 2005: 62). TEA generally distinguished between two main error sources: interlingual (transfer) vs. intralingual (developmental) errors. Intralingual errors have two main characteristics: (1) they “do not derive from transfers from another language” (Richards 1974: 173) and (2) they “represent English errors that are common to native speakers of many varied languages” (Berkoff 1982: 9). Examples of intralingual errors include overgeneralising target language rules (e.g. adding a faulty -s ending to verbs as in he can sings), ignoring rule restrictions (e.g. he made me to do it), or creating false hypotheses concerning specific concepts (e.g. believing that come and go can be used interchangeably) (see Richards (1974) for more subdivisions of intralingual errors). Dulay and Burt (1972, 1974), for instance, provide a representative example of an explanatory classification system as they immediately classified their learners’ errors into ‘interference goofs’ (interlingual errors) and ‘developmental goofs’ (intralingual errors). They also added another useful category, ‘ambiguous goofs’, or errors which could be either interlingual or intralingual. This was a sensible move as a large number of errors can be seen as the result of an interaction between transfer and developmental processes. Ellis (1994: 62) exemplifies the interlingual/intralingual interaction as follows: in no look my card, the source may be intralingual as no+verb was found to be a universal error. In the case of Spanish learners, however, this error may be strongly linked to the L1 as it was found to occur more frequently and for longer than in other learner populations.

A problem with explanatory error taxonomies in TEA is that the classification criteria were often left unexplained, most probably because the interlingual/intralingual categorisation was intuitively carried out. Reviewing the TEA literature reveals that the two types of error classification systems (descriptive and explanatory) were sometimes freely mixed. As noted by Dagneaux et al. (1998: 164), “the error typologies often mix levels of analysis: description and explanation”. Certain taxonomies thus require a distinction to be made between spelling errors, grammatical errors, vocabulary errors, and L1-induced errors. As Scholfield (1995: 189-190) warns, such a system presents “an exclusiveness problem, in that many spelling, grammar and vocabulary errors may also be L1 induced (due to the


transfer from the mother tongue)”. Granger (2003: 467) lends further support to this claim, stressing that “distinctions such as (...) ‘interlingual’ versus ‘intralingual’ errors (...) are difficult to assign and better left for a second stage in the analysis”. This is the stance that has been adopted in this book, where errors are classified linguistically first (Chapter 3) and where possible traces of the L1 are identified at a later stage (Chapter 6). The ‘what to do’ and ‘what not to do’ box for TEA error classification practices includes the following elements:

Classifying errors in TEA

What not to do
- Use a taxonomy that mixes error description and error explanation.

What to do
- First carry out “objective error analyses” (Svartvik et al. 1973: 1) by using a descriptive linguistic error classification system.
- Leave error explanation for a later analytical stage, bearing in mind that errors can sometimes be due to a combination of interlingual and intralingual processes rather than one or the other exclusively.
- Be transparent concerning what types of errors are included under the different headings (e.g. “grammatical errors”, “lexical errors”, “morphological errors”, etc.).

1.1.4. Error counting

Once errors have been classified into different clearly outlined categories, they can be quantified via various methods. Error counting is a crucially important step in any error analysis enterprise, but it is intrinsically dependent upon the quality of the preceding collection, detection and classification phases. As Grotjahn (1983: 237) pointedly explains: “the application of quantitative methods is only one step in the course of an empirical investigation and (...) the use of even the most sophisticated methods does not supply valid and reliable results if, for instance, the data have not been adequately collected”. Schachter and Celce-Murcia (1977: 446) claim that there were two main TEA approaches to error counting, with studies either making “very informal statements of error frequency” (examples from the literature review include Burt & Kiparsky 1972; Edström 1973; Jain 1974; Richards 1974; Berkoff 1982) or actually approaching error quantification more rigorously. Fine-combing the TEA literature enabled the identification of five main types of error counting methods (see Table 1.8), ranging from the most to the least detailed: (1) obligatory occasion analysis, (2) T-unit analysis, (3) counting the errors of a particular type out of the relevant part-of-speech category (e.g. verb tense errors out of the total number of verbs used), (4) error percentages (counting errors of each type out of the total errors), and (5) error frequencies (counting errors out of the total number of words used). Each of these methods is outlined below.

1.1.4.1. Obligatory occasion analysis

Obligatory occasion analysis became a particularly popular error counting method in the 1970s. It was initially devised by Brown (1973) for the study of the acquisition of grammatical morphemes in first language acquisition and was later taken up by SLA studies. Obligatory occasion analysis in SLA includes the following four steps (Ellis & Barkhuizen 2005: 73-92):

1. Determining the feature to investigate (e.g. articles, plural -s, progressive -ing, irregular past, regular past);
2. Going through the data to identify the obligatory contexts of use for the chosen feature and counting these contexts. Brown (1973) originally defined obligatory contexts as follows: “[G]rammatical morphemes are obligatory in certain contexts, and so one can set an acquisition criterion not simply in terms of output but in terms of output-where-required. Each obligatory context can be regarded as a kind of test-item which the child passes by supplying the required morpheme or fails by supplying none or one that is not correct” (Brown 1973: 296);
3. Establishing whether the correct morpheme has been supplied in each obligatory context, then counting the number of times it has been correctly supplied;
4. Calculating the accuracy rate using the following formula:

(n correct suppliance in contexts × 100) / (total obligatory contexts) = percent accuracy

Brown’s obligatory occasion analysis considers three elements of learner performance: correct use, misuse, and non-suppliance in an obligatory context. As Pica (1984) later pointed out, the formula was missing information on instances of over-suppliance, i.e. suppliance in non-obligatory contexts. She thus modified the original formula as follows:

(n correct suppliance in contexts × 100) / (n obligatory contexts + n suppliance in non-obligatory contexts) = percent accuracy

Table 1.6 exemplifies the difference between the two.

Table 1.6: Obligatory occasion analysis: example with and without cases of over-suppliance

An over-supplied form such as they dances every week is ignored by Brown's (1973) formula, which considers obligatory contexts only, but is added to the denominator in Pica's (1984) version.
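To make the two formulas concrete, here is a minimal sketch in Python (the language, function names and counts are our own illustration, not part of Brown's or Pica's work):

```python
def brown_accuracy(n_correct, n_obligatory):
    """Brown (1973): correct suppliance out of obligatory contexts only."""
    return n_correct / n_obligatory * 100

def pica_accuracy(n_correct, n_obligatory, n_oversupplied):
    """Pica (1984): the denominator also counts suppliance in
    non-obligatory contexts (over-suppliance)."""
    return n_correct / (n_obligatory + n_oversupplied) * 100

# Hypothetical third person singular -s data: 18 correct suppliances in
# 20 obligatory contexts, plus 4 over-suppliances ('they dances every week').
print(brown_accuracy(18, 20))    # 90.0 -> reaches the 90% 'acquired' criterion
print(pica_accuracy(18, 20, 4))  # 75.0 -> no longer reaches the criterion
```

On these invented figures, the learner would count as having ‘acquired’ the morpheme under Brown's formula but not under Pica's: this is exactly the information that over-suppliance adds.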

Obligatory occasion analysis is a counting method that has much to offer error studies. First, it enables researchers to precisely quantify the level of accuracy reached on different target language features, thereby revealing whether or not learners have attained the 90% acquisitional landmark. Dulay and Burt (1974) exemplified how they calculated accuracy scores for the 11 English morphemes that they studied (articles, third person singular, irregular past, etc.). They attributed a score to each obligatory occasion as follows: no functor is supplied = 0 (e.g. she’s dance_); misformed functor supplied = 1 (e.g. she’s dances); correct functor supplied = 2 (e.g. she’s dancing). They then proceeded to compute group scores in order to obtain an accuracy rate, as in the example in Table 1.7 for the irregular past.


Table 1.7: Calculating accuracy scores with obligatory occasion analysis (Dulay & Burt 1974: 44)

Each obligatory occasion is worth 2 points: the table scores three learners on five obligatory occasions for the irregular past, with, for example, this man taked it away receiving 1 point out of 2 (misformed functor).
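A minimal sketch of this scoring scheme (Python; the five per-occasion scores are hypothetical, but they are chosen to reproduce the group score worked out below):

```python
# Dulay and Burt (1974) scoring: 0 = no functor supplied ('she's dance_'),
# 1 = misformed functor ('she's dances'), 2 = correct functor ('she's dancing').
scores = [1, 2, 0, 2, 1]  # hypothetical scores for five obligatory occasions

# Each occasion is worth 2 points, so the group score is the points obtained
# out of the points available, expressed as a percentage.
group_score = sum(scores) / (2 * len(scores)) * 100
print(group_score)  # 60.0
```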

The accuracy score for the three above learners on past irregular morphemes was calculated as follows: group score = (6/10)*100 = 60%. The same procedure can be undertaken for other functors, the results of which can then be classified into “decreasing group scores” (Dulay & Burt 1974: 45) so as to gain insights into the developmental order in which learner groups acquire a series of morphemes.

Another strength of obligatory occasion analysis is that it very much measures what learners get right in addition to what they get wrong. Obligatory occasion analysis therefore constitutes a viable answer to Hammarberg’s (1974: 185) criticism according to which “error analysis (EA), unlike the related discipline of contrastive analysis, is limited by definition to the study of errors, whereas the non-errors are not taken into account”.

Despite its precision, obligatory occasion analysis is nevertheless limited in a number of ways. First, the method is particularly well suited, if not restricted, to calculating the accuracy rate of discrete items, e.g. articles, progressive -ing, plural -s, copula, or auxiliary be, to name but a few. Hakuta (1976: 323) emphasises this when he says that obligatory occasion analysis can only be used to count the accuracy scores of “grammatical morphemes, which are easy to score in terms of percent supplied in obligatory contexts”. More recent work by Andreu-Andres et al. (2010), however, may lead one to question the ease with which obligatory contexts can be identified even for discrete items. The authors analysed the degree to which six different error analysts agreed on the presence of article errors and found that this constituted an area of wide discrepancy, with one rater detecting as many as ten article errors against just three errors detected by another rater for the same text. The implication is that different analysts may not necessarily agree on what constitutes an obligatory occasion of use even for the more discrete items. As for the more covert-type errors such as lexis, for instance, obligatory occasion

analysis is mostly unworkable as deciding on obligatory contexts of use for a particular lexical item would be extremely challenging, to say the least. A second limitation is that, in order for obligatory occasion analysis results to be meaningful, a large number of obligatory contexts need to be considered. Grotjahn (1983) criticised the study by Tarone et al. (1976), “who use the 90 per cent-criterion even if the learner has supplied no more than five obligatory contexts for a certain variant” (Grotjahn 1983: 238). He pointedly argues that a high accuracy score may simply be the result of successful guesswork by the learners, hence the need for a substantial number of obligatory contexts.

1.1.4.2. T-unit analysis

T-units were another popular measure with which TEA studies chose to count errors (Gaies 1980; Wolfe-Quintero et al. 1998). A T-unit is defined as “a main clause with all subordinate clauses attached to it” (Hunt 1965: 20), as in the following example where the whole sentence (subordinate clause + main clause) constitutes one T-unit: When you make a milk shake, you mix it in a blender. Wolfe-Quintero et al. (1998) list two T-unit methods to count learner errors: researchers can calculate (1) the overall number of error-free T-units or (2) the number of errors per T-unit. Certain studies (e.g. Bardovi-Harlig & Bofman 1989) have refused to use the number of error-free T-units, mainly for the following reasons:

- Focussing on error-free T-units does not “give a picture of how the errors are distributed” (Bardovi-Harlig & Bofman 1989: 22).
- Moreover, a T-unit which contains, say, one error will be treated in much the same way as a T-unit with multiple errors and will be discarded from the error-free category.
- Calculating the number of error-free T-units does not reveal the different types of errors included in the learner data.
- Cutting up a text into T-units is not a straightforward task and can be complicated by sentence fragments, inconsistent use of punctuation, etc.

Importantly also, TEA studies differ in their operationalisation of what they count as an error-free T-unit: Larsen-Freeman and Strom (1977: 128) counted a T-unit as error-free “if the T-unit was perfect in all respects”. Scott and Tucker (1974), for their part, considered a T-unit to be error-free if it was accurate in the areas of syntax and function words. According to Wolfe-Quintero et al. (1998: 35), such an approach makes it “difficult to interpret the meaningfulness of comparisons across studies”.
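For concreteness, the two measures can be sketched as follows (Python; the per-T-unit error counts are invented and presuppose that the segmentation and error detection steps, both non-trivial as noted above, have already been carried out):

```python
# Number of errors found in each T-unit of a (hypothetical) learner text.
errors_in_tunits = [0, 2, 0, 1, 0, 0, 3]

# (1) Proportion of error-free T-units.
error_free = sum(1 for n in errors_in_tunits if n == 0) / len(errors_in_tunits)

# (2) Mean number of errors per T-unit.
errors_per_tunit = sum(errors_in_tunits) / len(errors_in_tunits)

print(round(error_free, 2))        # 0.57
print(round(errors_per_tunit, 2))  # 0.86
```

Note that the error-free measure treats the T-units containing one and three errors identically, which illustrates the loss of distributional information criticised above.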


1.1.4.3. Using part-of-speech denominators

Although it is not as detailed as obligatory occasion analysis, another promising way of counting learner errors is by calculating errors of a particular category out of the corresponding part-of-speech category, e.g. the overall number of verb tense errors out of the total number of verbs, as each verb represents an opportunity for a tense error:

In such a study the relative frequency refers to a fraction obtained by using as numerator the number of times an error was committed and as denominator the number of times the error type could have occurred. (Schachter & Celce-Murcia 1977: 446)

This method, which we have termed potential occasion analysis, will be further tackled in Chapter 4, where it is presented as the error counting methodology chosen in this book. Back in 1974, Scott and Tucker (1974: 71) deplored that “no studies have examined the frequency of specific types of errors as a function of the total usage of that structure”. Schachter and Celce-Murcia (1977: 446) similarly concurred that, in counting errors, one needs to consider “the number of times it would have been possible for the learner(s) to make a given error as well as the number of times the error occurred (articles and prepositions are frequent errors because the need to use them arises so often)”. As for Lococo (1976: 71), she is also of the opinion that “[t]he number of errors depends largely on the opportunity to commit the error”. One reason why researchers perhaps avoided potential occasion analysis is that, depending on the size of the data, manually counting the occurrences of relevant parts-of-speech can be rather time-consuming. As shown in Table 1.8, some studies preferred to stick to more easily applicable counting schemes which involved counting the number of errors of different types out of the total number of errors made (error percentages) (Duskova 1969; Stenström 1975), or out of the overall number of words produced by the learners (error frequencies) (Arthur 1979; Linnarud 1986).

9 I would like to thank Professor Sylviane Granger for having kindly suggested this term to me.
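The contrast between the part-of-speech denominator and the two coarser schemes can be sketched as follows (Python; all counts are invented for illustration):

```python
# Hypothetical counts from one learner text.
tense_errors = 12
total_errors = 40
total_verbs = 150
total_words = 1200

# (1) Potential occasion analysis: errors out of the relevant POS category.
print(tense_errors / total_verbs * 100)   # 8.0 tense errors per 100 verbs

# (2) Error percentage: errors of one type out of the total number of errors.
print(tense_errors / total_errors * 100)  # 30.0% of all errors are tense errors

# (3) Error frequency: errors out of the total number of words produced.
print(tense_errors / total_words * 1000)  # 10.0 tense errors per 1,000 words
```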

Table 1.8: Error counting methods in TEA studies

The table classifies the TEA studies reviewed according to the error types they investigated and the counting method they used: obligatory occasion analysis (e.g. Dulay & Burt 1974; Hakuta 1976; Lococo 1976; Rosansky 1976; Pica 1984), T-unit analysis (e.g. Scott & Tucker 1974; Larsen-Freeman & Strom 1977; Larsen-Freeman 1978; Bardovi-Harlig & Bofman 1989), potential occasion analysis (e.g. Schachter 1974), error percentages (e.g. Duskova 1969; Stenström 1975), and error frequencies (e.g. Arthur 1979; Linnarud 1986).

Table 1.8 is testimony to the range of error counting methods developed by TEA, and from which present-day error analysis studies can draw inspiration. As will be shown in Chapter 4, the present study has taken heed of the error counting legacy left by early error analysis work and has applied it to learner corpus research. The ‘what to do’ and ‘what not to do’ guidelines in terms of error counting are outlined in the summary box below.

Counting errors in TEA

What not to do
- Omit to provide any form of quantitative evidence.

What to do
- Consider TEA as a research period which developed a series of sophisticated error counting methods.
- Consider that different error types will require different counting methods, e.g. obligatory occasion analysis may be workable for discrete items but is less suitable for covert-type errors.
- If possible, use a relatively refined counting denominator: if not the total obligatory contexts, then perhaps the relevant parts-of-speech.

1.1.5. Error explanation

As was explained in Section 1.1.3 in relation to explanatory error taxonomies, TEA largely relied on intuitively tracing back errors to either interlingual or intralingual sources (or a possible interaction between the two). Figure 1.2 is a detailed representation of the psycholinguistic sources of error taken from Ellis (1994: 58).

Figure 1.2: Psycholinguistic sources of errors (Ellis 1994: 58)
- Errors of competence: interlingual transfer; intralingual (e.g. overgeneralisation); unique (e.g. induced)
- Errors of performance: processing problems; communication strategies


A number of comments are worth making about Figure 1.2. A first noteworthy element is the clear-cut separation between interlingual/intralingual/unique errors, which are positioned under the errors of competence branch, and, on the other hand, processing problems/communication strategies, which are listed exclusively under the errors of performance branch. Concerning interlingual, intralingual, and unique errors, it is perhaps debatable whether they are always de facto competence-related. As a reminder, competence errors were defined earlier as resulting from a lack of L2 knowledge which the user could generally not self-correct, while their performance counterparts were used as synonyms for ‘slips’, with learners normally being able to self-correct. We argue here that, in addition to positioning transfer errors (e.g. it depends of the context as produced by a French speaker) and intralingual errors (e.g. the DVD I lended him) under the competence branch, it might also be worth including these under the errors of performance section as it is possible for transfer and intralingual errors to be due to a temporary lapse in performance. The L2 learner could spot and self-correct depends of and lended upon revision of his L2 production. Additionally, the ‘unique’ category remains rather vague as an error source, with just teaching-induced errors being used to illustrate this type of error. A question that might be raised is whether ‘unique’ errors should indeed be listed as a separate category on the same level as transfer and intralingual errors, as is the case in Figure 1.2. Teaching-induced errors can in fact be due to potential transfer from the L1 (e.g. a French-speaking teacher who repeatedly uses the word journey (French = journée) instead of day may reinforce this error in their students’ interlanguage) or to intralingual issues (e.g. a teacher’s use of I wonder where are you going may induce similar-type word order errors among the learners).

Concerning the errors of performance branch, Figure 1.2 suggests that “it is helpful to recognize two different kinds of performance mistake: those that result from processing problems of various kinds, and those that result from such strategies as circumlocution and paraphrase, which a learner uses to overcome a lack of knowledge” (Ellis 1994: 58). The graph thus implies that L2 errors can be made because of (a) processing problems whereby L2 knowledge is not directly and easily accessible, and (b) communication strategies or “communication tactics” (Tarone 1980: 418) which are used to make up for the inadequacies of L2 knowledge. Arguably, deciding whether processing problems and communication strategies have actually resulted in

10 Unique errors include teaching-induced errors “which occur when learners are led to make errors by the nature of the instruction they have received” (Ellis 1994: 60). 11 This is an attested teacher error heard in the French-speaking part of Belgium.


an error may not be entirely straightforward. Processing problems, which are strongly linked to the pressures of L2 speech production, can lead to a series of ‘disfluencies’ such as hesitations, unintentional repetitions, self-corrections,

false starts or truncated words, which, as Gilquin and De Cock (2011) explain, can be difficult to distinguish from actual errors. The authors argue that the two “are rather complex and fuzzy notions that involve numerous factors and that could be seen to (at least) partially overlap” (Gilquin & De Cock 2011: 145). Similarly, the L2 items that result from communication strategies might not always be straightforwardly erroneous. Tarone (1977) distinguishes between three broad types of communication strategies:

1. Avoidance, i.e. avoiding reference to a salient object or giving up reference to an object because it is too difficult.
2. Paraphrasing, which encompasses (a) circumlocution, where the learner describes the characteristics of an object instead of using the appropriate TL item, as in ‘an instrument for grating cheese’ instead of ‘cheese grater’; (b) word coinage, where the learner makes up a new word such as ‘long worm’ instead of ‘caterpillar’; and (c) approximation, where the learner uses an item known to be incorrect but which shares some characteristics with the correct item, as in ‘worm’ instead of ‘silkworm’.
3. Conscious transfer, which includes cases of (a) literal translation, where the learner translates a word or phrase literally from the L1, as in ‘thing promised, thing owed’ (French = ‘chose promise, chose due’), which should be ‘promises are made to be kept’; and (b) language switch or borrowing, where the learner inserts words from the L1 into the L2, as in ‘charcuterie’ from French, for example.

12 According to Ellis (1994: 396), communication strategies are used primarily to deal with lexical problems.

Determining the erroneous nature of such communication strategies is by no means an easy feat and will, to a large extent, depend on how “overtly strategic” (Kasper & Kellerman 1997: 8) such instances are felt to be. Paraphrases which result from a gap in L2 knowledge and which are less efficient substitutes for a more precise target language term were flagged as errors in the present study. Examples from our data include ‘unmixed schools’ for ‘single-sex schools’, ‘way of eating’ for ‘eating habits’ (French-speaking data), ‘swimming costume’ for ‘swimsuit’, ‘playing fields’ for ‘playgrounds’ (German-speaking learners), ‘petrol without lead’ for ‘unleaded petrol’ or ‘foot-citizen’ for ‘pedestrian’ (Spanish-speaking learners). As for instances of conscious transfer, certain instances in our data were found to be more


acceptable than others. Using an L1 term in cases where “a concept is lexicalized in the source language but not in the target language” (Kasper & Kellerman 1997: 8) could be viewed as acceptable, e.g. ‘charcuterie’ in French, which has no direct equivalent in English. On the other hand, cases where the L1 term was used despite the existence of an L2 equivalent should indeed be marked as erroneous. For example, our data included the use of the Spanish ‘objeción de conciencia’ instead of the existing English equivalent, namely ‘conscientious objection’. This can therefore be considered a case of unnecessary borrowing. A certain amount of disagreement can be expected to arise among different analysts concerning the degree of acceptability of the L2 items that result from communication strategies. Moreover, identifying features that result from communication strategies may grow increasingly difficult as proficiency develops: “highly-proficient non-native speakers have been shown to be very good at anticipating and circumnavigating bottlenecks such that there is no obvious trace of difficulty in their speech protocols” (Kasper & Kellerman 1997: 3). It may thus be that although learners use a series of strategies to circumvent gaps in L2 knowledge, they may develop ways of doing so that are barely noticeable.

One last issue with respect to Figure 1.2 concerns the presence of communication strategies under the errors of performance branch exclusively. This suggests that the deviances that result from such strategies are de facto due to a temporary lapse in performance. However, it is argued here that learners sometimes resort to avoidance, paraphrases and conscious transfer because of competence issues. When learners say I like it when the little white stones fall from the sky, intending to refer to hailstones, or I didn't have the thing that you wear on your head when you ride a motorbike, meaning helmet, this could potentially be because they suffered from either “temporary ignorance” (Singleton 1987: 335), in which case the circumlocution might indeed be considered as a performance issue, or “absolute ignorance” (Singleton 1987: 337), in which case the circumlocution is due to a limitation in competence. Kasper and Kellerman (1997: 8) put this point forward when they claim that communication strategies result from cases where “a speaker wishes to label a concept for which she does not have the lexical resources or where these resources are available but cannot be recalled”. The main error explanation insights passed on by TEA are presented in the box below:


Explaining errors in TEA

What not to do
- De facto consider that interlingual and intralingual errors result from competence issues: they may occur at the level of performance and be the result of a heavy cognitive load placed on the learners.
- De facto consider items that result from processing problems and communication strategies as erroneous. If these are deemed to be deviant, however, the errors might be situated at the level of either competence or performance.

What to do
- See TEA as having laid the major groundwork for error explanation. Errors today are still regularly described in transfer vs. intralingual terms.
- Be cautious in pinpointing possible error sources and rather speak of ‘possible’ transfer or developmental errors. Bear in mind that an error may be due to a possible interaction between both transfer and developmental processes.

1.1.6. Error gravity

A final step considered in TEA which will just be touched upon here is error gravity. The evaluation of error gravity, which became a subject of interest in the 1980s, aimed to develop a commonly accepted hierarchy of errors (Olsson 1972; Albrechtsen et al. 1980; Hughes & Lascaratou 1982; Vann, Meyer & Lorenz 1984; Santos 1988; Rifkin & Roberts 1995).

Comprehensibility, irritation and acceptability were considered as the three main determinants of error gravity. An error which affected message understandability or which irritated interlocutors was considered ‘serious’. Because they are very relative concepts, these three determinants greatly compromised the development of an objective hierarchy of error gravity, if such a thing even exists. An important finding which emerged from this strand of error analysis is that an error is not intrinsically serious or minor. Albrechtsen et al. (1980: 393) rightly argue that deciding on the seriousness of an error “is perhaps not primarily a function of its inherent qualities, but of the context in which it occurs”. Hence, for instance, a teacher who has spent

several hours explaining the use of relative pronouns in English is likelier to find an error such as the person which is in front of me is tall to be more serious than a colleague who has not yet tackled this issue. Similarly, an error such

as she gave me advices may be considered more or less serious depending on whether it is made by a beginner or an advanced learner.


1.2. Concluding remarks

Like any emergent field, early error analysis suffered from a number of limitations which left considerable room for improvement. Regularly mentioned are sampling problems, the isolation of errors from their context, the use of sometimes impractical error classification systems and the tendency to generalise limited results. The point made here is that, in spite of their shortcomings, error analysis studies led the way for more refined work to be carried out on learner errors. In other words, it is felt that its actual contributions should not be overshadowed by the weaknesses it displays. The lack of transparency concerning the nature of the data used in the studies reviewed here has brought to the fore the importance of a clear description for replication and generalisability purposes. By digging deeper into the literature published at the time, it also appeared that a number of key qualitative findings had been put forward concerning the influence of the proficiency level factor on resulting error profiles. Additionally, early error analysis researchers greatly contributed to specifying the construct of error. Although the competence/performance, overt/covert and local/global distinctions are not frequently operationalised in present-day error analysis practices, knowing that errors can be made along the dimensions of cognition, context and intelligibility certainly helps in the error detection and explanation phases. The error taxonomies proposed in the 1970s were either overly detailed or mixed the description and explanation stages of error analysis, hence weakening the classification results. Nevertheless, it is such work which has constituted the basis for current-day error taxonomy development and which has raised subsequent awareness of the dangers of trying to simultaneously describe error types and explain their sources. Current error analysis work can be grateful to early error analysis for the wide array of error counting methods it developed, which range from the very detailed obligatory occasion analysis to more general error proportion/error frequency methods. Present-day researchers wishing to refine their error counting practices are likely to find a method that will suit their research purposes among the variety of methods proposed at the time. The suggestion according to which errors may result from interlingual or intralingual sources, or a mixture of both, can be traced back to early error analysis and is still widely used in current work.

The next step was to take on board some of the theoretical and methodological lessons learnt from traditional error analysis and incorporate them in a new-look error analysis carried out on the basis of computer learner corpora. This is the object of the next chapter.


2. A NEW-LOOK ERROR ANALYSIS: THE LEARNER CORPUS APPROACH

Despite steadily losing popularity in the late 1970s, error analysis never completely disappeared from the research scene, instead taking on a new look thanks to the compilation of computer learner corpora in the 1990s. The new-look error analysis conducted on learner corpora has since become known as computer-aided error analysis (CEA) (Dagneaux et al. 1998) and has aroused a substantial amount of research interest. We believe the time has now come to take stock of this work. In order to guide our observations of CEA practices throughout this chapter, a detailed research synthesis table was developed which describes the individual profile displayed by 69 corpus-based error analysis studies. For each study, the table provides information concerning the learners’ L1, the number of error types studied (single or multiple error types), the learner corpus used and its size, the learners’ proficiency level, the task type, whether the error detector was a native speaker or non-native speaker, the error annotation scheme, and the error counting method used. The research synthesis can be found in full in Appendix 1. This section focusses on two main issues: it first briefly reviews whether, and if so in what ways, learner corpora have reshaped error analysis practices in five of the steps considered in Chapter 1: data collection, error detection,

error classification, error counting and error explanation. The second section zooms in on the specific results yielded by CEA studies and synthesises the main research findings which are presently scattered throughout the CEA literature.

2.1. Learner corpora in error analysis

2.1.1. Treatment of the proficiency level factor

Granger (2002: 7) pioneered the compilation of learner corpora which she defined as: [E]lectronic collections of authentic FL/SL [foreign language/ second language] textual data assembled according to explicit design criteria for a particular SLA/FLT [foreign language teaching] purpose. They are encoded in a standardized and homogeneous way and documented as to their origin and provenance.


The advent of computer learner corpora and their “explicit design criteria” means that researchers are now in a position where they are able to compile their own “customised corpus” (Pravec 2002: 94), controlling for some of the criteria that are known to influence resulting error profiles. Some influencing variables include the learners’ L1, whether the task was timed or untimed,

the task topic or the genre required of the learners (e.g. graph descriptions, argumentative essays, literary essays). One variable that has received comparatively scarce attention in learner corpus compilation so far, and which warrants further comment, is that of learners’ proficiency level (Chapelle & Pendar 2008; Wulff & Römer 2009; Carlsen 2012). Wulff and Römer (2009: 131), for example, await large corpora “that control for proficiency levels”. Carlsen (2012) distinguishes between learner corpora which represent one level of proficiency, such as the Montclair Electronic Language Database (MELD) (Fitzpatrick & Seegmiller 2001), which focuses exclusively on ‘advanced’ learner writing, and those which claim to include production at different proficiency levels, such as the Cambridge Learner Corpus, which is stratified according to the level of each of the Cambridge tests. Multi-proficiency learner corpora have relied on three main methods to assign proficiency levels. The methods are listed and explained in Table 2.1. They can be seen to go from grouped assessment (method 1), to general individual assessment (method 2), to specific individual assessment methods (method 3).

Table 2.1: Proficiency level assignation methods (based partly on Thomas 1994) Proficiency assignation method

Method 1: Institutional status assessment (= grouped assessment)

60

Description

Strengths/caveats

Learner production is |Quick and easy method but | assigned to a proficiency} there is no guarantee that level depending on _ the | learners in the same school learners’ position in an] year are at the same level. educational hierarchy, e.g.| This is a rapid group-based | students in their first and | assignation method. third year at university will be assigned to different levels.

A New-Look Error Analysis: The Learner Corpus Approach

Method 2: In-house and standardised test assessment

(= general individual assessment)

Learner production is assigned to a_ proficiency level on the basis of a locally developed (=in-house) or internationally recognised (=standardised) test such as the TOEFL or Cambridge ESOL tests.

Standardised tests are internationally recognised which are benchmarks considered to bess more reliable than their in-

house counterparts. In such instances, the test indicates proficiency as a whole rather

than proficiency on a specific skill (e.g. written essays, oral interviews). Method 3: Assessment of individual production

(= specific individual assessment)

Each individual production Proficiency is assessed for is assessed according to a each individual student on number of specific rating the basis of the specific task criteria and is subsequently included in the corpus. The assigned a corresponding proficiency score attributed is limited to the task represented proficiency score. in the corpus (e.g. writing) and does not necessarily extend to other skills.

Table 2.2 lists the multi-proficiency level learner corpora that, to our knowledge, have been developed to date. The table additionally provides information on the number of proficiency levels distinguished, where available, and the proficiency assignation method used.

Table 2.2: Multi-proficiency learner corpora (based partly on Carlsen 2012)

Institutional status assessment (method 1):
- Chinese Learner English Corpus: 5 levels; year of study
- Corpus of Learner German (CLEG): year of study + time abroad
- Interlangue française (INTERFRA): year of study
- Israeli Learner Corpus: year of study
- Japanese EFL Learner Corpus (JEFLL): year of study
- Polish and English Language Corpora for Research and Application (PELCRA): year of study
- Corpus Écrit de Français Langue Étrangère (CEFLE): year of study

In-house and standardised test assessment (method 2):
- Cambridge Learner Corpus (CLC): ESOL examination levels
- Corpus Escrito del Español L2 (CEDEL2): the University of Wisconsin college-level placement test
- NICT Japanese Learner English Corpus (NICT JLE): Standard Speaking Test (Tono et al. 2001)
- Norwegian AndreSpråksKorpus (ASK): 2 levels; two tests of Norwegian L2

Assessment of individual production (method 3):
- Longman Learners' Corpus: number of levels unknown;13 teacher assessment
- Norwegian AndreSpråksKorpus (ASK): CEFR-based assessment
- Written Corpus of Learner English (WriCLE): combination of error analysis and automatic grammatical sophistication analysis
- Corpus PARallèle Oral en Langue Étrangère (PAROLE): CEFR-based assessment (ongoing project)

13 Readers interested in the Longman Learners' Corpus may visit the related website: http://www.pearsonlongman.com/dictionaries/corpus/learners.html, although information as to the actual number of proficiency levels is not provided.

Of the fifteen learner corpora listed in Table 2.2, as many as seven have made use of the quick and easy institutional status method. However, the proficiency-related work carried out on the International Corpus of Learner English (ICLE) (Granger et al. 2009) is proof of the overall unreliability of the institutional criterion. The initial aim of the ICLE project in the early 1990s was to collect texts from advanced learners of English. Following Atkins and Clear (1992), external institutional status was considered a safe basis on which to determine proficiency: because the ICLE learners were all English-major undergraduate university students in their third and fourth year of study, they were de facto believed to be at an advanced level of proficiency. However, because the ICLE team had a hunch that certain texts did not qualify as advanced, they randomly selected 320 essays from their database (20 for each of the 16 L1 backgrounds) and had them professionally rated according to the Common European Framework of Reference (CEFR) (2001) descriptors. The findings confirmed the researchers' hunch and revealed that the ICLE texts were actually spread across a proficiency range, from B2 and lower to C2. The results are provided in Table 2.3.14

Table 2.3: The ICLE CEFR rating procedure: 20 essays per L1 subcorpus (Granger et al. 2009: 12)
[The body of Table 2.3, a per-L1 breakdown of the CEFR ratings, is not recoverable from this copy.]

14 Although revealing, these results should nevertheless be taken as indicative as they concern only a small sample of texts graded by just one rater.

Although the majority of the corpus sample (194 out of the 320 texts; 60% of the data) was rated at the advanced C-levels (especially C1), the proportion of C-level texts is much higher in certain L1 corpora (e.g. Dutch, German, Swedish) than in others (e.g. Tswana, Chinese, Japanese, Turkish), which are mainly representative of the B2 or lower range. In describing the
overall proficiency level reflected in ICLE, it may therefore be more suitable to say that "it falls in the intermediate-advanced range" (Granger 2004a: 130).
A very interesting case in the assessment of individual production category is that of the ASK corpus for learners of L2 Norwegian (Tenfjord et al. 2004, 2006; Carlsen 2012). This learner corpus had initially been proficiency-stratified on the basis of method 2, namely two standardised tests of Norwegian as a second language, the first measuring intermediate language proficiency and the second advanced proficiency. Carlsen (2012) nevertheless wished to have the essays reassessed on the basis of method 3, the assessment of each individual essay in terms of CEFR proficiency levels. She called upon as many as 10 raters and had them reassess the texts from eight L1 backgrounds, namely Dutch, English, German, Spanish, Russian, Polish, Somali and Vietnamese, representing 1222 texts out of the total 1700 in the corpus (72% of the corpus). An extremely interesting finding was that texts selected from the same test and originally placed at the same level of proficiency (intermediate or advanced depending on the test taken) actually showed differences across L1 backgrounds: learners who had taken and passed the intermediate or advanced level test were not necessarily assigned an intermediate or advanced CEFR score. The statistical analysis of the grades given by the ten raters showed that while the English, Polish, and Russian learners were mainly representative of the B1 intermediate level, their Somali and Vietnamese counterparts were rated lower, at A2/B1. The German group displayed the highest proficiency, having been rated at C1/C2 overall. The assessment procedure carried out on this corpus now enables researchers working on Norwegian L2 to tightly control for proficiency scores when comparing learner groups.

2.1.2. New methodological directions

Error detection in CEA

In terms of error detection, TEA has taught us that errors are best detected and presented within their larger context of use. Although learner corpora have the advantage of enabling such an approach to errors, they have not magically 'solved' the error detection problem, which remains one of the greatest challenges in CEA work. This is due to three main factors: (1) the learners' level of proficiency, (2) the blurred border between actual errors and
infelicitous language, and (3) the equally blurry issue of the norm. Concerning the role of proficiency level, advanced learners are renowned for making errors that can be particularly difficult to spot because they are in the more "shadowy areas" (Neff & Bunce 2006: 698) of language. This subsequently raises the issue of the distinction between indisputable errors and infelicitous yet non-erroneous language use. Ellis and Barkhuizen (2005: 59) refer to this as "absolute errors" vs. "dispreferred forms" (infelicities), a notoriously grey area for anyone who has ever got their hands dirty doing error analysis work. Knowing which norm to adopt further adds to the detection difficulty. As shown in the research synthesis table, CEA researchers have mainly called upon native-speaker judges (and hence the 'native-speaker norm') to carry out the error detection work. Others, however, e.g. Lewandowska-Tomaszczyk et al. (2000), Ju Lee (2007), Waibel (2007), Diez-Bedmar (2005), Eriksson (2008), have either (a) relied on non-native speaker judges exclusively or (b) detected the errors themselves as non-native speakers and only presented them to a native-speaker judge at a later stage.15 Spotting errors against a native-speaker norm is not free of difficulty either: native speakers do not all share the same intuitions (Guo 2006) and they themselves also make errors (Ringbom 1987). The issue of register can also be contentious, as accuracy in writing will be looked upon differently from accuracy in speech (O'Keeffe, McCarthy & Carter 2007). Degrees of prescriptiveness are also likely to vary among error analysts, who will have different approaches to the use of less + countable nouns (e.g. less children, less ambitions), for example. Despite the difficulties intrinsic to taking on the native-speaker English norm, North (2000) rightly points out that there is at present no better, workable alternative norm against which to detect learner errors.

15 See the instances marked NNS (and NS) in the research synthesis table.

The error detection difficulties are at the source of two main methodological caveats in CEA: (a) overdetection, also known as "overflagging" (Granger et al. 2007: 258), or flagging elements which are in fact correct or largely acceptable, and (b) underdetection or underflagging, namely overlooking errors, which may be a particularly acute problem if the analyst is a non-native speaker of English.
In spite of the above-mentioned difficulties, which are intrinsic to any error analysis endeavour, learner corpora have made it possible to detect and study a much wider set of error types. As explained by Granger (2009a: 16), learner corpus researchers are in a position where they can move away from overt errors and also investigate more covert errors: "[t]he traditional overemphasis on morphology is progressively being replaced by a greater attention to lexis, phraseology (...) and many other hitherto neglected aspects of learner language". This is not to say that discrete-point items are not studied anymore: the research synthesis table shows that errors involving the English morphemes (e.g. third person singular -s, plural -s, article use, etc.) are still of interest in learner corpus research (e.g. Milton 2001; Izumi & Isahara 2004; Osborne 2007; Diez-Bedmar & Papp 2008). However, the more covert errors have recently also been drawing a considerable amount of interest. The research synthesis points to CEA studies that have investigated phraseological errors, such as learners' collocational errors (Chi et al. 1994; Nesselhauf 2005; Gilquin 2007; Osborne 2008b), as well as more covert grammatical errors, such as those affecting tenses (Granger 1999; Eriksson 2008; Rogatcheva 2009).
The learner corpus error detection procedure today remains largely manual. Although the field of automatic error detection is very active in trying to encourage the machine-led detection of errors (Izumi et al. 2003; Izumi et al. 2004; Dodigovic 2005; Chen 2007; Chodorow et al. 2007; Tetreault & Chodorow 2008), this still constitutes work in progress. On the whole, research so far tends to concur that the more overt-type errors, e.g. certain spelling and morphological errors, are much more amenable to automatic error detection (via spellcheckers, for instance) than their more covert-type counterparts (Chen 2007). The power of spellcheckers should not be overestimated, however, as they may overlook some homophone errors, e.g. it's/its, their/there/they're, hence the importance of the wider context provided by the corpus and the human error detector (Rimrott & Heift 2008; Bestgen & Granger 2011). In order to assist research into the automatic detection of learner errors, more manually error-annotated learner corpora should ideally be made available to the research community so as to be used as gold standards for the elaboration of error-detection algorithms. The problem, however, is that once researchers have carried out the painstaking error annotation procedure, most are indeed reluctant to readily pass on the finished product.

Error classification in CEA

Annotating learner corpora for errors has become known as error tagging, which McEnery et al. (2006: 42) define as "assigning codes indicating the types of errors occurring in a learner corpus". A crucially important aspect to remember in relation to the definition of error tagging is that it constitutes a methodology, not a theory: "error recording and error coding is not methodologically misguided since error analysis is not a theory of SLA but rather a method, a method that can, in principle, service any theory" (Tenfjord et al. 2006: 93). Although 'error tagging' is strongly linked to the computer learner corpus methodology, the underlying principle of detecting and coding errors is not intrinsically different from that practised in TEA.
Learner corpus research has nevertheless contributed to the error classification step by bringing some much-needed rigour and consistency to the annotation process, with researchers now working on the basis of better-defined error classification systems (Diaz-Negrillo & Dominguez-Fernandez 2006). Table 2.4 below lists some of the learner corpora which, to this day, have been annotated for errors. As can be seen from the table, error tagging endeavours mainly target L2 English and concern written rather than spoken learner production.

Table 2.4: Existing error tagging systems and their associated learner corpora

Error tagging system | Learner corpus | Target language | Learners' L1 | Medium
Fitzpatrick & Seegmiller (2001, 2004) | MELD | English | Multiple | Writing
Milton & Chowdury (1994) | HKUST Learner Corpus | English | Chinese | Writing
Isahara et al. (2002); Izumi et al. (2005) | NICT JLE | English | Japanese | Speech
Gillard & Gadsby (1998); Biber & Reppen (1998) | Longman Learners' Corpus16 | English | Multiple | Writing
Tono (2002) | JEFLL | English | Japanese | Writing
Nicholls (2003) | Cambridge Learner Corpus | English | Multiple | Writing
Lewandowska-Tomaszczyk (2003) | PELCRA | English | Polish | Writing
Kammerer (2009) | LINDSEI | English | Multiple | Speech
Diaz-Negrillo & Garcia-Cumbreras (2007) | Unnamed corpus | English | Spanish | Writing
Chuang & Nesi (2006) | Chinese Learner Corpus | English | Chinese | Writing
Granger (2003) | FRIDA | French | Multiple | Writing
Martelli (2008) | CLEI | English | Italian | Writing
Tenfjord et al. (2004) | ASK | Norwegian | Multiple | Writing
Stritar (2009) | PiKUST | Slovene | Multiple | Writing

16 The Longman Learners' Corpus website (http://www.pearsonlongman.com/dictionaries/corpus/learners.html) does not provide any information concerning the error annotation scheme.

Although error tagging work is usually, and quite understandably, described as "time-consuming", "laborious" and "painstaking", Table 2.4 is testimony to the fact that error taxonomies are being actively developed and used by the research field. The table also highlights that error tagging is not a fully standardised procedure, as researchers do not all rely on the same system to flag errors. Rastelli and Frontini (2008: 447) suggest this may be a weakness of CEA: "researchers have yet to agree about [sic] general taxonomy, the standardization of [sic] error tagset still a long way from being at hand". We, on the other hand, argue that standardisation need not be seen as a necessary requirement, as different research objectives and different target languages can justify the use of different annotation methods. For example, studies intending to carry out a detailed investigation of tense errors (e.g. Meunier & Littré 2013) will devise detailed annotation schemes to allow the in-depth study of this feature according to their specific research questions. Similarly, CEA studies of L2 German, for example, will understandably be based on different taxonomies than those for L2 English, as German will include errors in areas that do not apply to English (e.g. the German article system, adjective and noun declensions). However, it would be desirable for researchers who study the same L2 and similar features to rely on a standardised taxonomy to facilitate the comparison of results. The research synthesis table revealed the following breakdown concerning the CEA studies that have used individually-compiled taxonomies, those that have made use of the Louvain system (which is used in this volume and presented in Chapter 3) and those that have not
error tagged their data (but relied on other means such as the extraction of specific erroneous strings from part-of-speech tagged corpora).

Table 2.5: Error taxonomy types used in CEA

Error taxonomy | Proportion of CEA studies
Louvain taxonomy | 29%
Individually-compiled taxonomy | 51%
No error tagging | 20%
Total | 100%

Although the main trend is to rely upon an individually-compiled error tagging system (51% of the CEA studies listed used this type of system), Table 2.5 nevertheless shows that the Louvain taxonomy (Dagneaux et al. 2008) is one that is quite regularly used in CEA (29% of the CEA studies), probably because of its availability to the research community (it can be obtained from the Centre for English Corpus Linguistics (CECL)) and its transparency in the form of the accompanying error tagging manual, which is regularly updated by the CECL team as error tagging work progresses.
Inserting error tags into a learner corpus can be done in one of two ways: (a) via a traditional embedded mark-up system, whereby the error tags are directly integrated in the learner file; in this system, annotators are usually required to choose one error tag per error; or (b) via a multilevel standoff annotation system (Lüdeling et al. 2005; O'Donnell et al. 2009), which CEA researchers have more recently started making use of; in this case, the error annotation is stored in a parallel file, thereby enabling the marking of multiple error tagging alternatives, e.g. say as a possible lexical (say vs. tell) or lexico-grammatical error (say vs. say to). At present, however, multilevel standoff annotation systems are much less frequent than their embedded mark-up counterparts. This volume relies on the embedded mark-up system (Chapter 3).
Whether one opts for embedded mark-up or multilevel standoff annotation, the return on investment once the learner corpus has been error tagged is huge. By using text retrieval software programs such as WordSmith Tools (Scott 2012), researchers can (1) generate concordances for particular error types, (2) sort these in particular ways so as to make certain error trends stand out, and (3) submit the errors to different error counting methods. An error-tagged learner corpus also gives learners access to errors which would be impossible to spot if unannotated; omission errors are a case in point. A minimal sketch of what retrieving errors from embedded mark-up can look like is given below.
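By way of illustration, the following Python sketch pulls (error tag, erroneous form, correction) triples out of embedded mark-up. The convention assumed here, an error tag in parentheses followed by the erroneous span and a correction between dollar signs, with illustrative tag names such as GVT and GNN, is a simplified stand-in loosely modelled on the Louvain system rather than the exact format used in this volume:

import re
from collections import Counter

# Simplified embedded mark-up, loosely modelled on the Louvain convention:
# (TAG) erroneous words $correction$
SAMPLE = ("They (GVT) watched $have watched$ the television all day "
          "and it gave them many (GNN) informations $information$ .")

# One tag in parentheses, the erroneous span, then the correction between $...$.
PATTERN = re.compile(r"\((?P<tag>[A-Z]+)\)\s*(?P<error>[^$(]*?)\s*\$(?P<corr>[^$]*)\$")

def extract_errors(text):
    """Return (tag, erroneous form, correction) triples from embedded mark-up."""
    return [(m.group("tag"), m.group("error").strip(), m.group("corr").strip())
            for m in PATTERN.finditer(text)]

errors = extract_errors(SAMPLE)
print(errors)                                # each error, with its correction
print(Counter(tag for tag, _, _ in errors))  # error-type counts for the text

Once a whole corpus has been processed in this way, the resulting triples can be sorted, concordanced and counted in the ways described above.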

Error counting in CEA

Chapter 1 (section 1.1.4) emphasised that the traditional error analysis
period had developed sophisticated error counting methods, all of which served to calculate an accuracy score for each individual learner considered. Although a favourite counting method in SLA, obligatory occasion analysis represents the exception rather than the rule in CEA. There are several reasons why this may be so: (a) learner corpus researchers may not be used to applying this counting method; (b) identifying obligatory occasions of use would require a rather burdensome manual analysis of the large amounts of data used in CEA; and (c) obligatory occasion analysis is impossible to apply to certain error types studied in CEA, e.g. the more covert lexical errors. Of the 69 CEA studies listed in the research synthesis table, only four have made use of obligatory occasion analysis, namely Izumi and Isahara (2004), Diez-Bedmar and Papp (2008), Rogatcheva (2009) and Crompton (2011). In her analysis of the misuse of the perfective aspect by German and Bulgarian EFL learners in ICLE, Rogatcheva (2009) compared both formulae and found that including cases of redundant use in the denominator, as suggested by Pica (1984), did indeed make a difference to accuracy rates, as shown in Table 2.6. While the accuracy rate went down slightly in the Bulgarian corpus, the decrease was more significant in the German data when redundant use was considered.

Table 2.6: Accuracy rates: perfective aspect in ICLE-BU and ICLE-GE (based on Rogatcheva 2009)

Obligatory occasion analysis | ICLE-BU | ICLE-GE
Redundant use disregarded | 98.3% | 91.4%
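For clarity, the two formulae being compared can be written out as follows. These are the standard formulations generally attributed to Brown (1973) (suppliance in obligatory contexts) and Pica (1984) (target-like use); they are given here as a general sketch rather than as the study's exact operationalisation:

accuracy (redundant use disregarded) = correct suppliances in obligatory contexts / obligatory contexts × 100

accuracy (redundant use included) = correct suppliances in obligatory contexts / (obligatory contexts + suppliances in non-obligatory contexts) × 100

Because redundant uses enlarge the denominator of the second formula, a learner group that oversupplies the perfective will see its accuracy rate drop when the second measure is applied, which is the effect observed in the German data.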

Diez-Bedmar and Papp (2008) also included redundant use in their calculations of article errors by Chinese and Spanish EFL learners. The authors calculated an accuracy measure for each of the English articles, the, a/an, and the zero article. Overall, it was found that the Spanish learners had a higher accuracy rate than their Chinese counterparts. As for Izumi and Isahara (2004), they used the original formula proposed by Brown (1973) and did not consider redundant use. The authors used obligatory occasion analysis to investigate the order of acquisition of grammatical morphemes by Japanese EFL learners in the NICT JLE corpus. They then compared the resulting order of acquisition with that proposed in SLA by Dulay and Burt (1973), who claimed theirs to be universally applicable across L1 populations. As it happens, Izumi and Isahara (2004) found a low degree of correlation between the two acquisition orders (see the bolded elements in Table 2.7) and hence argued that differences in learner background may in fact cause differences in the acquisition order of English morphemes.17

17 See Milton (2001) and Tono (2002) for two other corpus-based studies of morpheme acquisitional order. Both of these studies also found significant differences compared with the initial order proposed by Dulay and Burt (1973).

Table 2.7: Acquisition order of grammatical morphemes (Dulay & Burt 1973; Izumi & Isahara 2004)

Acquisition order (Dulay & Burt 1973) - 'universal' order | Acquisition order (Izumi & Isahara 2004) - Japanese EFL learners
[The two ranked morpheme lists are not recoverable from this copy beyond isolated items such as plural -s and possessive -s.]

The research synthesis table has revealed that the most frequent error counting methods in CEA involve:
(a) counting the errors of each type out of the total number of errors in the data (e.g. total article errors/total errors) or out of the number of errors in a specific domain (e.g. total article errors/total grammatical errors); these are known as error percentages or error-based calculations;
(b) counting the number of errors of each type out of the total tokens in the corpus, often on a normalised basis (e.g. per 1,000 words) if several learner corpus samples of different sizes are compared; these are 'word-based calculations' (Chen 2006).
These two methods constitute different ways of looking at the number of errors in a text: word-based calculations show the actual frequency with which a particular error is made when considered out of the total tokens in the data. Error percentages, for their part, assess the weight of the individual error categories within learner errors as a whole. Hence, saying that a Chinese EFL learner corpus includes 10% article errors means that, when a Chinese learner makes an error, there is a 10% chance that it will turn out to be an article error.
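To make the difference between the two counting methods concrete, here is a minimal Python sketch; the figures are invented and do not come from any of the studies cited:

def error_percentage(n_errors_of_type, n_total_errors):
    """Error-based calculation: the share of one error type among all errors."""
    return 100 * n_errors_of_type / n_total_errors

def errors_per_1000_words(n_errors_of_type, n_tokens):
    """Word-based calculation: the frequency of an error type per 1,000 tokens."""
    return 1000 * n_errors_of_type / n_tokens

# A 620-token essay containing 40 errors, 4 of which are article errors:
print(error_percentage(4, 40))        # 10.0  -> 10% of all errors are article errors
print(errors_per_1000_words(4, 620))  # ~6.45 -> article errors per 1,000 words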

A small number of CEA researchers (Milton 2001; Ke 2004; Abe & Tono 2005; Abe 2007) have found a happy medium between the detailed nature of obligatory occasion analysis, on the one hand, and the quick and easy methods of error-based and word-based calculations, on the other. This happy medium comes in the form of potential occasion analysis and involves calculating errors "as ratios of each relevant word class" (Milton 2001: 46), e.g. the number of grammatical errors on nouns out of the total number of nouns in the learner data (a minimal sketch of this method is given after Table 2.8). One characteristic shared by the learner corpus studies which have relied on potential occasion analysis is that they do not make explicit how exactly they implemented their counting method (the part-of-speech tagger used, the types of denominators extracted, etc.). This volume transparently describes its operationalisation of potential occasion analysis in Chapter 4.
The CEA studies in the research synthesis have each been classified according to the counting method they used. As shown in Table 2.8, the two most frequently used methods to date are error-based and word-based counting, sometimes in combination with each other. Of all the CEA studies reviewed here, 75.5% relied on this quick and easy counting method.

Table 2.8: Breakdown of error counting methods in the CEA research synthesis table

Counting method | Description | Number of studies
Error-based and word-based counting | Error-based calculations: error types/total errors (or total errors of one type); word-based calculations: error types/total tokens; studies often use these two methods in tandem | 52 (75.5%)
Potential occasion analysis | Counting the error types out of the potential occasions for error | 8 (11.5%)
Mixed method | Use of potential occasion analysis plus another method, e.g. word-/error-based calculations | 3 (4%)
Obligatory occasion analysis | Counting the error types out of the number of obligatory contexts of use | 4 (6%)
No quantitative evidence | No figures are provided because the study is work-in-progress | 2 (3%)
Total | | 69 (100%)
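The logic of potential occasion analysis can be sketched as follows; the data structure, part-of-speech labels and error counts are invented for the example and do not reflect the tagger or error tagset used in this volume:

# Hypothetical mini-corpus: for each text, the part-of-speech tags produced by
# a tagger and the number of noun number errors found during error tagging.
texts = [
    {"pos_tags": ["DET", "NOUN", "VERB", "DET", "NOUN", "NOUN", "PUNCT"],
     "noun_number_errors": 2},
    {"pos_tags": ["PRON", "VERB", "DET", "NOUN", "PUNCT"],
     "noun_number_errors": 0},
]

def potential_occasion_rate(text, error_key="noun_number_errors", pos="NOUN"):
    """Errors of one type divided by the potential occasions for that error,
    here the number of nouns in the text (cf. Milton 2001)."""
    occasions = sum(1 for tag in text["pos_tags"] if tag == pos)
    return text[error_key] / occasions if occasions else 0.0

for i, text in enumerate(texts, 1):
    print(f"text {i}: {potential_occasion_rate(text):.2f} noun number errors per noun")

The same scheme extends to other denominators, e.g. total verbs for verb errors or total sentences for punctuation errors.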

Error explanation in CEA

Error explanation in CEA is still widely established on the basis of the psycholinguistic processes identified by TEA, namely interlingual vs. intralingual processes (section 1.1.5). A frequently used method to pinpoint possible transfer or interlingual errors is that of back-translation (Granger 1998; Tanimura et al. 2004; Nesselhauf 2003, 2005; Borgatti 2006; Martelli 2007; Waibel 2007; Thewissen 2008), whereby erroneous target language items are translated back into the learners' L1 to establish possible congruence between the L1 form and the interlanguage item. In Thewissen (2008) I used back-translation to establish whether certain multi-word unit errors could possibly be L1-related. L1 transfer was established as a possible cause for the following errors, for instance: Anti-authoritarian parents willingly swim along with the current $go with the flow$ and promote a permissive upbringing (German = mit dem Strom schwimmen) or I hope that kids are not watching their small screen $television$ (French = petit écran) (see also Chapter 6 for the use of this method). Although a very worthy enterprise, back-translation remains a tentative approach to pinpointing the possible presence of transfer. In order to provide further empirical support to the transfer hypothesis, researchers may wish to draw on a recently developed corpus-based method which relies on evidence from learner corpora, native English corpora and corpora in the learners' L1 (see Gilquin 2008; Paquot 2010; Jarvis & Crossley 2012).
Although a very helpful tool, learner corpora should not be considered as having magically done away with the difficulties initially associated with the error analysis enterprise. The contributions and remaining caveats are summarised in Table 2.9 for each error analysis step.

Table 2.9: Reshaping error analysis work with learner corpora

Data collection
Positive contribution: Learner corpora were compiled on the basis of strict design criteria that enable the study of homogeneous datasets. Errors are not randomly gathered.
Remaining difficulties/caveats: Largely unreliable indication of individual student proficiency level.

Error detection
Positive contribution: Errors are presented in context, which helps detect both overt and covert error types. Learner corpora also enable the study of error types that have hitherto remained under-researched.
Remaining difficulties/caveats: This remains an intrinsically difficult step because of the fuzzy line between errors and infelicities, especially for higher-level learners.

Error classification
Positive contribution: Error tagging has helped systematise error classification procedures. Accompanying manuals, when they exist, heighten tagging consistency. Error tagging highlights errors that would otherwise remain undetected (e.g. omission). An error-tagged corpus can be submitted to concordancing software tools that enable error counting and the identification of error trends.
Remaining difficulties/caveats: Certain errors allow for several possible error tags (see the appearance of multilevel standoff systems for such cases). Error tagging manpower may be difficult to come by and be quite costly.

Error counting
Positive contribution: Although this is still rarely done, combining an error-tagged and part-of-speech tagged version of the same learner corpus allows for relatively refined counting methods such as potential occasion analysis.
Remaining difficulties/caveats: CEA studies have tended to depart from the sophisticated error counting methods used in TEA (e.g. obligatory occasion analysis) and have more heavily relied on error-based and word-based counting.

Error explanation
Positive contribution: Corpus linguistics enables the more reliable identification of transfer vs. intralingual error sources.
Remaining difficulties/caveats: The identification of transfer and intralingual errors tends to remain largely intuition-based and relies on back-translation processes. Explanatory results, though insightful, therefore remain tentative.

2.2. Taking stock of CEA findings

This section delves into the findings yielded by corpus-based error analysis studies to date. Going from the general to the more specific, results are first presented in terms of error rankings, that is to say the general position of error types with respect to each other. Focus will be on findings pertaining to grammatical, lexical and orthographic errors respectively, as these are the areas where most error analysis studies have been conducted so far. Fine-combing through CEA findings serves as a necessary backdrop against which to situate the present volume and its added value for the field of corpus-based error analysis work.

2.2.1. Error rankings

Of the CEA studies presented in the research synthesis table, rare were those that decided to present general error rankings and consider the position of error types with respect to each other. Among the studies which did opt for a presentation of the resulting error rankings, we identified four (Dagneaux et al. 1998; Neff et al. 2007; Ju Lee 2007; MacDonald 2003) which were amenable to comparison given their two main common features: (a) the use of the same error tagging system, the Louvain error taxonomy (albeit in slightly different versions), and (b) the reliance on the same error counting method, namely error-based counting. For the sake of clarity, the research profiles of each of the four studies are reproduced in Table 2.10.

Table 2.10: Four comparable CEA studies

Make-up similarities
- Error taxonomy: Louvain taxonomy (all four studies)
- Error counting: word-based and error-based calculations (Dagneaux et al. 1998; Neff et al. 2007; MacDonald 2003); error-based calculations (Ju Lee 2007)

Make-up differences
- Learners' L1: French (Dagneaux et al. 1998); Spanish (Neff et al. 2007); Korean (Ju Lee 2007); mixed French, German, Spanish, Latvian and Norwegian (MacDonald 2003)
- Single/multiple errors: multiple error types in all four studies
- Learner corpus: ICLE (Dagneaux et al. 1998; Neff et al. 2007); a self-compiled Korean learner corpus (Ju Lee 2007); a self-compiled corpus of computer-mediated communication, i.e. chat and email writing (MacDonald 2003)
- Learner corpus size: 150,000 tokens; 50,000 tokens; 20,000 tokens; 84,684 tokens respectively
- Proficiency level(s)18: multiple levels, intermediate and advanced (Dagneaux et al. 1998); advanced (Neff et al. 2007); a single level, elementary (Ju Lee 2007); a single level, intermediate to advanced (MacDonald 2003)
- Task type: writing (Dagneaux et al. 1998; Neff et al. 2007; Ju Lee 2007); chat and email writing (MacDonald 2003)

18 We distinguish between studies which investigate intermediate and advanced proficiency levels, i.e. which use both an intermediate and an advanced corpus sample, and those which study intermediate to advanced learners, i.e. which analyse one proficiency level that happens to be situated between the intermediate and advanced levels.

As seen in Table 2.10, differences in the make-up of these four studies nevertheless outnumber the features they have in common, meaning that the comparison of results should be carried out tentatively. The studies notably differ in terms of the learners' level of proficiency, which should be cautiously interpreted as 'supposed proficiency' given the proficiency assignation methods used, namely institutional status and impressionistic judgement. Neff et al. (2007) described their Spanish EFL learners as advanced on the grounds that ICLE used initially to be presented as including advanced EFL learner production. In order to compare the errors of intermediate and advanced learners, Dagneaux et al. (1998) similarly used a sample of ICLE French as their advanced subcorpus and then supplemented this database with a similar-sized corpus of essays by learners in their first year at university, which constituted their intermediate subcorpus. MacDonald (2003) used impressionistic judgement (Thomas 1994) to describe her learners' proficiency, positing that all the L1 groups in her project were at an intermediate to advanced level of proficiency, except maybe the German group which, according to her, seemed to display higher proficiency. Ju Lee (2007: 93) presented his learners' proficiency as a given, "Grade 10 students who are at high beginner/ elementary level of English".
The respective error profiles yielded by these studies are reviewed in terms of (a) main error domain rankings and (b) within-category rankings for grammatical errors, which were found to be the most frequent error type in the groups investigated. Table 2.11 below displays the error rankings for the broad error domains considered in each of the four studies, namely form, grammar, lexis, lexico-grammar, style, word order/redundant/missing, punctuation, register, infelicities and code-switching, and sets the results side by side. [Table 2.11 is not recoverable from this copy.] The table shows that grammar is the most densely populated category in all four studies. Grammatical errors happen to be on a par with lexical errors in the intermediate
French corpus (31% of all errors) and MacDonald's (2003) mixed L1 corpus (25%). Surprisingly, lexical errors rank only fourth in the beginner Korean group, where they account for 11.5% of all errors compared to the 25%-31% range in all the other studies. Given the Korean learners' low proficiency, one might have expected lexical errors to rank higher in this group. Undercorrection may have led to this result, as the main error detector was a non-native speaker of English with L1 Korean. Another explanation might be that the elementary learners used very simple lexis, hence reducing the number of errors in this domain. The researchers all found it disheartening that, in spite of the heavy teaching focus on grammar in the university curriculum, this still stood out as the most frequent error domain: "while the high proportion of lexical errors was expected, the number of grammatical errors, in what were untimed activities, was a little surprising in view of the heavy emphasis on grammar in the students' curriculum" (Dagneaux et al. 1998: 169). The prominence of grammatical errors may, however, have partly been skewed by the make-up of the Louvain taxonomy, where grammar represents the largest error category.
Another noteworthy difference in the main domain rankings concerns the percentage of formal errors. It stands out from Table 2.11 that the percentage of formal errors in MacDonald (2003) is double that found in the other three studies, 24% vs. 9-10% elsewhere. This is likely to be due to the task type considered in MacDonald (2003), namely computer-mediated writing (chat- and email-type writing), where it is implicitly more acceptable not to conform as tightly to the written norm as in essay writing. Given the more lenient approach to spelling, MacDonald's (2003) detection method for the identification of formal errors is questionable: concerning erroneous non-capitalisation, for instance, she decided, on rather unclear grounds, to tolerate (a) sentence-initial missing capitalisation and (b) non-capitalisation of other participants' names, but to error tag the non-capitalisation of names of countries, nationalities, languages, days of the week, months of the year, and the use of the pronoun 'I' in lower case.
Concerning register errors (e.g. kids vs. children), there appears to be quite a large discrepancy between French- and Spanish-speaking learners, 10% and 12% in the intermediate and advanced French data vs. only 2% in the Spanish data.19 This difference may be due to the amount of attention that the error analysts paid to this category, which is admittedly more peripheral in the sense that it is not concerned with errors per se but more with sensitivity to formal/informal language use. Analysts may understandably choose to consider errors proper more carefully, thereby leaving register aside.
The two studies that considered punctuation errors (Ju Lee 2007; Neff et al. 2007) found that these ranked quite high in their data: punctuation ranked third (12% of all errors) for the Spanish learners and second (14.5%) for the Korean group. Lexico-grammatical and word missing/redundant/order errors are all situated towards the lower end of the rankings.

19 MacDonald (2003) did not consider register errors as it would have been problematic to identify register problems in chat- and email-type communication.


Little (2006: 179) points out that "the higher the level, the more difficult it is to define level-specific linguistic resources". Jones (2005: 18) makes a similar point concerning the link between the development of proficiency and learning gains:

In the early stages learning proceeds quickly. A relatively small amount of effort produces a very substantial change in observable behaviour - enough to warrant identifying a level and offering accreditation of it. As learning proceeds, it takes progressively more time to make a substantial difference, and indeed, many learners plateau or drop out on the way. The higher levels are separated by smaller observable differences, but each level is needed because it accredits a final learning achievement or provides an interim target for those who wish to go further.

The findings yielded by the learner corpus analyses have opened up the debate as to whether, as Jones (2005) suggests above, it is desirable to keep B2, C1, C2 as separate entities or whether it would make sense to distinguish between, on the one hand, a B level (the current B1) and a broader C level which would encompass current levels B2, C1, C2. Importantly, our questioning applies to accuracy only and does not imply that levels B2-C1-C2 do not exist. They may well represent distinct learning stages which may be better captured via another construct than accuracy (such as complexity perhaps). Although we raise rather than answer this question, our findings have pointed out the important underlying issue of the “naturalness” of the six-level proficiency yardstick which should be borne in mind in the subsequent quest for CEFR-related criterial features. We may indeed want to ask ourselves whether we should really be looking for criterial features that distinguish between six proficiency levels from Al to C2 or whether the empirical evidence suggests that certain levels should either be grouped or further broken down. In other words, using learner corpus evidence to capture developmental paths may lead to a very different picture of L2 proficiency evolution than that currently presented in the Framework.


6.5. Concluding remarks

This chapter has approached the CEFR from a variety of new angles: by analysing the validity of present-day criticisms of the document, by pinpointing the place occupied by cannot do's in the current linguistic competence descriptors, and by making suggestions about how a learner corpus methodology could help towards the elaboration of L1- and L2-dependent proficiency descriptors. The important point was made that while some criticism is indeed warranted (e.g. the intuition-based methodology behind the elaboration of the descriptors or the unclear differences between certain adjacent proficiency levels), other 'caveats' such as the underspecification of the descriptors constituted rather unfair criticism given the very ambitious aim of the Framework, which was to be applicable across L1 and L2 backgrounds. The point made here is that the CEFR should not be criticised for failing to achieve an aim that it was in fact not pursuing. In a way, the CEFR's ambitious quest for a universal, L1/L2-neutral description of L2 language proficiency can be seen as mission impossible: "[t]he question remains: how can one document be applicable to all learners of English? It would be impossible to provide a universal list of language points that is of relevance to all learners wherever they are in the world" (Sheehan 2010).
That said, the in-depth analysis of the descriptors for linguistic competence in their current L1-L2-neutral form nevertheless pointed to a series of more detailed inconsistencies which, to our knowledge, have not been brought to light before. One such inconsistency was the fact that quite a number of implicit and explicit cannot do statements are in fact hidden behind the widely promoted 'can do' approach. Instead of beating around the bush and trying to wrap up 'cannot do's' in 'can do' wording (as in the "errors occur, but..." phrasing), it may perhaps be best to explicitly state what it is that learners at different levels of proficiency still cannot do. A second noteworthy inconsistency was the unexplained use of plus-levels for certain descriptors (grammar and coherence/cohesion), but not others (vocabulary control, vocabulary range, orthographic control). Importantly also, we have highlighted the random approach to L1 influence in the present descriptors: L1 influence appeared for grammar at B1+ and orthographic control at B2 but remained conspicuously absent from the descriptors for vocabulary control and range, for example. This is questionable, as the snapshot evidence in this chapter has shown that traces of L1 influence visibly remain in this area even at the higher B2/C1 level. In fact, it is very debatable whether even referring to L1 influence in the current CEFR descriptors is a wise move, as the Framework is expected to be L1-L2 neutral.
Not mentioning the L1 would be understandable and would be seen as part of the "necessary underspecification" of the document in its current general form. Hopefully, the accuracy performance snapshots provided in this chapter show that error-tagged learner corpus data could serve towards the concrete elaboration of L1- and L2-dependent language descriptors. The snapshots given here are limited to L2 performance for one broad macro-level (B2/C1) and a single L1 group (French speakers) and therefore cannot claim to be reference level descriptors for French learners of L2 English. However, they can nevertheless be seen to be of value at more local levels for testing and teaching purposes at the B2/C1 level, providing teachers of French-speaking EFL learners with an attested pool of remaining error-inducing contexts and useful learner corpus-derived examples. Two main steps will be necessary towards transforming such snapshot descriptions into actual L1- and L2-dependent reference level descriptors:

a. The identification of mother tongue transfer via a well-established learner corpus methodology (e.g. Gilquin 2008; Paquot 2010) and tracking the development of the amount of L1 influence across the proficiency levels;

b. A minute developmental qualitative analysis of error-inducing contexts across levels and mother tongue groups. This would make it possible to determine whether certain types of error-inducing contexts appear or disappear as proficiency increases (e.g. do cases of hypercorrection markedly increase with proficiency?).
GENERAL CONCLUSION

The overarching aim of this study was to provide learner corpus-derived developmental insights into the construct of accuracy. This research question has situated our work at the crossroads between four main fields: (1) error analysis studies, (2) learner corpus studies, (3) second language acquisition developmental studies, and (4) language testing studies. The general conclusion takes stock of the main findings yielded in this volume and puts forward its contributions to each of the aforementioned fields. The section finishes off by proposing a number of worthy avenues for future research.

Contributions to error analysis studies

Error analysis, especially in its earlier, non-corpus-based version, has rarely been acknowledged for the substantial contributions it has made to second language acquisition research (Ellis 1994: 70). Rather, second language acquisition materials often seem keen to pinpoint the areas where error analysis shows limitations (e.g. invented error examples, decontextualised presentation, poor sampling methods, etc.), with some scholars even going so far as to call it a "pseudo-procedure" (Bell 1974). The accuracy-oriented approach adopted here has made it necessary to re-legitimise this research field without, however, brushing fundamental limitations under the carpet. This has been done via a research synthesis of non-corpus-based error analysis work and a detailed critical analysis of each of the steps involved in the error analysis procedure: error detection, error classification, error explanation, error counting and error gravity. While acknowledging the limitations displayed by each of these phases, we were also keen to bring the positive contributions to the fore. These include, among others, the use of sophisticated error counting methods, the development of error classification schemes, and discussions about possible error sources which have remained relevant to present-day research.
Additionally, this volume provides a state-of-the-art research synthesis of the work that has been carried out in computer-aided error analysis to date. The analysis of c. 70 computer-aided error analysis studies has yielded a number of observations which are of general relevance for researchers wishing to embark on a corpus-based study of L2 errors. It has been stressed that learner corpora should perhaps not be seen as the cure to all woes when it comes to the practice of error analysis. While learner corpus data have undeniably led to a number of key improvements, such as the use of more homogeneous datasets, the presentation of errors in their wider context and the widening of the scope of the construct of error itself, they have not necessarily done away with more tenacious issues such as the difficult distinction between actual errors vs.
dispreferred forms, or the multiple interpretations that can be attributed to one and the same error. One key issue that stood out from the analysis of corpus-based error analysis studies and which was core throughout this book was that of proficiency level assignation. Proficiency-stratified learner corpora were shown to be few and far between and tended to largely rely on less than ideal proficiency assignation methods such as institutional status. The present study has strongly argued in favour of moving away from institutionally-derived proficiency assignation towards the assessment of individual learner production. Additionally, analysing the results yielded by previous error analysis work led to an important general word of caution in relation to the interpretation of error rankings. Error rankings are, in and of themselves, treacherous, as certain error types may rank high because they concern parts of speech that occur very frequently in the first place (e.g. articles, nouns, verbs). Rankings should hence not de facto be interpreted as representing the intrinsic difficulty of different language elements. Finally, in breaking down the main existing computer-aided error analysis study profiles, it was discovered that the dominant general trend in the field to date has been to study errors cross-sectionally at one point in time rather than developmentally across proficiency levels.

Contributions to learner corpus studies

This study has made use of learner corpus data to capture the development of the construct of accuracy and investigate the impact of the proficiency variable on resulting error profiles. It has contributed to learner corpus methodology in two major ways: (a) by combining error tagging and proficiency rating work and (b) by proposing a novel error counting method.
In terms of error tagging, this study was intent on adopting a macro-approach to the study of accuracy by capturing the development of a large number of error types. This has made it possible to analyse error profiles in many accuracy domains, some of which had rarely been submitted to a corpus-based error analysis. Particularly relevant domains pertain to grammar, lexis, lexico-grammar, form, and punctuation. A contribution of this volume is that it has shown how additionally carrying out a detailed proficiency assessment procedure can considerably increase the value of the information provided by the manual error annotation work. Our work has stressed the great amount of care that should go into stratifying a learner corpus into proficiency levels. First, the assessment phase has relied on the rating of each individual learner production rather than on broad-brush procedures such as institutional status. Importantly, it was stressed that the raters should preferably have professional L2 assessment experience and, in the event where an actual training session is not feasible, should at least be provided with clear guidelines about the rating procedure itself (clear descriptor scales, accompanying instruction sheets, etc.). This volume has also illustrated how final scoring decisions could be reached by converting proficiency scores into numerical values which were subsequently averaged. An outcome of the combined error tagging/proficiency assessment procedure is that it has enabled the developmental investigation of errors in terms of progress, stabilisation and regression across proficiency levels. We hope that the error profiles which have emerged from the analysis of the ICLE corpus sample used in this study will provide a useful basis for further L2 developmental research. As explained in this volume, the results put forward here are preliminary in the sense that the data are limited in type (written argumentative essays), in number (c. 150,000 words in total), and in learner populations (different learners at different proficiency levels rather than the same learners across proficiency levels). Rather than be considered as a "completed research product" (Hawkins & Filipović 2012: 17), the ICLE-derived accuracy results yielded here would be worth testing in a truly longitudinal research format.
An additional methodological outcome of our work for learner corpus research concerns the error counting phase. While early error analysis functioned on the basis of a series of sophisticated counting methods, the sheer amount of data that comes with learner corpora means that computer-aided error analysis studies have tended to shift away from more detailed counting methods, preferring agglomerative types of error counts instead. A contribution of the present study is that it has sought to find a happy medium between the very detailed methods suggested in traditional error analysis and the broad counting outlook adopted by its computer-aided counterpart. In proposing the potential occasion analysis method, our work has shown that combining an error-tagged and a part-of-speech tagged version of the same corpus data can help determine "how many errors there are relative to correct instances of the relevant type" (Hawkins & Filipović 2012: 28). In applying potential occasion analysis we proceeded to identify the denominator that was most suitable per error type (e.g. noun number errors out of the total number of nouns per text). This step has shown that different error types require a different type of counting universe, be it a part-of-speech denominator, the total number of sentences or the total number of words per text.
On a related counting note, computer-aided error analysis studies have tended to remain at a comfortable distance from the more statistical aspect of learner corpus research. Rather, the field has generally displayed an over-reliance on the agglomerative chi-square and log-likelihood measures, which tend to find significant differences between groups where there are in fact none (known as a Type I error). Our work has made use of a series of more detailed tests such as the Pearson test, the one-way between-groups analysis of variance (ANOVA) test and the Ryan post-hoc test, all of which are based on frequency information from each individual text, thus taking variability between learner essays into consideration.
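As a minimal illustration of the difference in approach, the sketch below runs a one-way between-groups ANOVA on per-essay error rates; the figures are invented, and scipy is assumed to be available:

from scipy.stats import f_oneway

# Per-essay error rates (errors per 1,000 words) for three proficiency groups;
# the values are invented for illustration.
b1 = [42.1, 38.5, 45.0, 40.2, 39.9]
b2 = [30.4, 28.8, 33.1, 29.5, 31.0]
c1 = [27.9, 30.2, 26.5, 29.1, 28.4]

# Working from per-text rates keeps essay-to-essay variability in the model,
# unlike a chi-square or log-likelihood test run on pooled error counts.
f_stat, p_value = f_oneway(b1, b2, c1)
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")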

Contributions to developmental second language acquisition studies

This volume has made it clear from the outset that the developmental accuracy trends presented here were derived from pseudolongitudinal rather than truly longitudinal data. Ortega and Iberri-Shea (2005: 26) have said that "any claims about "learning" (or development, progress, improvement, change, gains, and so on) can be most meaningfully interpreted only within a full longitudinal perspective". Far from questioning the validity of this assertion, the present study has had to consider practical constraints, among which the dearth of available longitudinal learner corpus data. By using pseudolongitudinal data, we have attempted to find a happy medium between a truly longitudinal approach and a purely cross-sectional research perspective. Our research is in keeping with the large-scale project carried out by Cambridge in the context of the English Profile Programme, which also relies on pseudolongitudinal data to find criterial features for L2 English across the CEFR levels. In other words, although Larsen-Freeman (2009: 584) argues that we "must think longitudinally", we hope to have shown in this volume that we can also "think pseudolongitudinally" and gain valid insights into the development of accuracy across levels of proficiency.
Focus in this book has been on the development of accuracy at the intermediate and advanced levels, more specifically at Common European Framework levels B1, B2, C1, and C2. The general developmental trends to emerge from our analyses are that accuracy at these levels rarely shows signs of what Jones (2005: 18) calls "constant observable distance" between levels, namely steady marked improvement. Instead, accuracy developmental trends have been shown to be mainly dominated by a mixture of both improvement and stabilisation patterns and stabilisation-only patterns. The majority of error types investigated have all been shown to plateau off at some point along the developmental scale. The specific area where accuracy showed the most marked signs of progress was between B1 and B2, while B2-C1-C2 were often found to display a tendency towards stabilisation. Although the results would need to be tested on truly longitudinal data, this finding seems to point towards the fact that, as claimed by Larsen-Freeman (2009: 584), we must think "nonlinearly". Importantly, the point has also been made that certain error types can be seen as positive errors in the sense that they are
Based on a research synthesis of existing SLA studies with a developmental focus, we have been able to pinpoint a number of ways in which our work complements the studies that have already been carried out. SLA work to date has been shown to favour a micro-approach to L2 developmental analysis: it typically takes individual language features as its object of study, thereby providing useful and detailed qualitative insights into the development of those features. Our study, for its part, has adopted a wider macro-approach, showing the quantitative and qualitative insights that can also be gained from investigating a large range of features. SLA developmental work has also shown a strong preference for tracing the development of morpho-syntactic L2 features. While also considering L2 morpho-syntax, our work set out to provide developmental information on under-researched areas such as lexis, spelling and punctuation. Additionally, this volume has taken English as a Foreign Language (EFL) learners rather than English as a Second Language learners as its object of study, thereby adding to the still rather small stock of SLA developmental studies carried out on EFL learners.

Contributions to language testing studies

One of the original aims of this volume was to propose a series of “new” linguistic competence descriptors for grammatical accuracy, vocabulary control, vocabulary range and orthographic control at each of the CEFR levels under investigation. With the benefit of hindsight, however, we came to the realisation that this was a rather over-ambitious pursuit. We were curtailed not only by the nature of the data (the limited size of the error-tagged sample precluded the elaboration of generalisable descriptor scales) but also by the object of investigation (descriptor scales would need to include information beyond errors alone). Nevertheless, our work has hopefully provided a number of thought-provoking methodological insights into the future elaboration of empirically derived descriptor scales for a given L2.

Successfully distinguishing between language descriptors at different proficiency levels depends on the prior identification of a number of discriminatory features. A first important outcome of the learner corpus developmental results for the field of testing concerns the discriminatory power of errors. It has been shown here that errors, or negative features as they are also called, may discriminate better between certain proficiency levels than others. Specifically, the errors in our data have been found to be particularly helpful in teasing apart the intermediate (B1) and upper-intermediate (B2) levels, but their power on the B2-C1-C2 section of the proficiency continuum has been shown to be much more questionable. This finding has subsequently led us to question the “naturalness” of the six-level proficiency scale posited by the CEFR. While B1 has appeared to stand out as an independent “threshold level”, levels B2-C1-C2 have generally been found to lack marked signs of developmental change in accuracy. A key question is thus whether it makes sense to claim the existence of four separate levels at the intermediate and advanced stages (B1, B2, C1, C2) or whether a different proficiency level breakdown would tally more closely with the developmental reality. Our corpus-based developmental analyses raise the hypothesis that accuracy development may not adhere to the B1-B2-C1-C2 trajectory suggested in the CEFR. Rather, our results hint at the presence of (a) a general B-level, which would encompass current level B1, and (b) a grouped C-level, which would be representative of current levels B2, C1 and C2. This hypothesis is offered in connection with the development of L2 accuracy exclusively and does not therefore imply that these levels do not exist in more absolute terms. It has indeed been suggested that turning our attention to more positive features, such as levels of L2 complexity, could reveal marked developmental differences from the upper-intermediate level onwards which would then warrant the existence of separate B2, C1 and C2 levels. Looking at the construct of accuracy exclusively, however, suggests that relying on a ‘default’ six-level proficiency scale may not correspond to developmental reality.
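By way of illustration, the following minimal sketch (in Python, with invented figures rather than corpus data) shows the general logic of the ANOVA-based level comparisons reported in this volume: per-text error rates are grouped by CEFR level and the omnibus test asks whether the level means differ.

```python
# A minimal sketch, with invented figures, of the one-way ANOVA logic:
# per-text error rates (errors per 100 words) are grouped by CEFR
# level and the omnibus test asks whether the level means differ.
from scipy.stats import f_oneway

rates_b1 = [6.1, 5.4, 7.2, 6.8, 5.9]   # hypothetical B1 texts
rates_b2 = [3.8, 4.1, 3.5, 4.6, 3.9]   # hypothetical B2 texts
rates_c1 = [3.6, 3.9, 3.2, 4.0, 3.7]   # hypothetical C1 texts
rates_c2 = [3.5, 3.8, 3.4, 3.6, 3.3]   # hypothetical C2 texts

f_stat, p_value = f_oneway(rates_b1, rates_b2, rates_c1, rates_c2)
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")

# A significant omnibus F says only that at least one level mean
# differs; post-hoc pairwise comparisons are needed to locate the
# discriminating boundary (here, B1 vs. B2 rather than B2 vs. C1).
```

On data patterned like those reported here, such a test would owe its significance almost entirely to the B1 group, which is precisely what motivates the two-tier B-level/C-level hypothesis above.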


In addition to questioning the naturalness of the CEFR proficiency scale, our work has pointed out a number of inconsistencies in the wording currently found in the CEFR descriptors for linguistic competence (grammatical accuracy, vocabulary control, vocabulary range, orthographic control, coherence and cohesion). We believe it was important to point these out so that they can be taken on board in future work on the elaboration of more refined language descriptors. The scheme used to analyse the descriptors revealed that, despite the CEFR’s official “can do” approach to language learning and testing, the wording of the descriptors hides a number of implicit and more explicit references to cannot do’s. This led to the suggestion that future work aiming to develop empirically derived descriptor scales may wish to make explicit both what L2 learners can and cannot do at distinct levels of proficiency. A second major inconsistency concerned the random references to mother tongue influence in the descriptors for grammar and orthography. It has been argued that peppering the linguistic descriptors with unsubstantiated claims about L1 influence is rather unwise, especially given the L1/L2-neutral make-up of the Framework in its current version. Another weakness involved the intuition-based method behind the developmental trends expressed in the descriptors. By comparing the intuitively derived developmental trajectories with those obtained from our learner corpus analyses, we have shown that what teachers intuit about L2 development and what the learner corpus shows can differ quite substantially. Hopefully, these preliminary comparisons of intuitive and learner corpus-derived L2 behavioural patterns have shown that learner corpus data can help express developmental behaviour more concretely and reliably.

In addition to using learner corpus data, we have proposed multi-layering the Framework to make it work better. While in its current L1/L2-neutral version the CEFR cannot reasonably be expected to be anything but “vague”, “imprecise” and “undefined” (Alderson 2007), this volume has argued that two further layers of analysis could help improve matters. Layer 2 consists in the elaboration of L2-dependent/L1-independent reference level descriptors, namely descriptors which encompass discriminatory features for a given L2 that are relevant for all learners irrespective of their mother tongue background. It is encouraging to see that the elaboration of reference level descriptors at Layer 2 is well underway within the English Profile Programme for learners of L2 English. Layer 3 involves going one step further and developing L1- and L2-dependent reference level descriptors, that is to say descriptors for learners of one specific L1 acquiring a given L2. The accuracy snapshots for French-speaking learners of English at levels B2/C1 are a concrete example of the shape that L1- and L2-dependent descriptors could take pending further work. The current snapshots are preliminary but nevertheless provide valuable information concerning remaining error-inducing contexts at the higher levels, as well as learner corpus-derived example banks for French-speaking EFL learners. Considerable work remains to be done, however, to transform such snapshots into actual reference level descriptors. The English Profile Programme mentioned above and throughout this volume has, in the meantime, contributed significantly in this direction.

Avenues for future research

Working towards L1- and L2-dependent reference level descriptors constitutes a promising field for future research. Not only would this provide considerable insights for language testing itself, but also for second language acquisition developmental studies. We argue that the elaboration of such language descriptors involves research into the following four main areas:

(a) tracing the development of L1 influence across proficiency levels, (b) tracing the development of error-inducing contexts across proficiency levels, (c) going beyond the analysis of accuracy exclusively, and (d) trying out alternative statistical methods to the ANOVA test.

The issue of L1 influence was initially intended to be given much more weight in this book. The original objective was to study the combined impact of both the proficiency level and the learners’ L1 background (French, German, Spanish) on the resulting error profiles. Once the CEFR rating procedure had been carried out, however, it became clear that the very uneven breakdown of texts per L1 background and proficiency level hindered a detailed analysis of the data from a combined L1 and proficiency point of view. Nevertheless, this combined perspective remains a promising avenue for future research, as highlighted by a preliminary study carried out on our data (Bestgen et al. 2012) which identified L1 effects on the error profiles of French and German learners at the same level of proficiency. Additionally, there is, to our knowledge, no existing corpus-based research which aims to capture the development of L1 influence across proficiency levels. Using corpus data to determine empirically whether L1 influence increases, decreases or stabilises across the proficiency continuum constitutes a truly worthwhile enterprise, especially from an L1- and L2-dependent language descriptor perspective.
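Should future researchers wish to operationalise this, the toy sketch below shows one way of tabulating rates of L1-attributable errors per L1 group and proficiency level. All values are invented, and the genuinely difficult step — deciding which errors are L1-attributable in the first place — is left aside here.

```python
# A toy aggregation, with invented values, showing how rates of
# L1-attributable errors could be tracked per L1 group and CEFR level.
import pandas as pd

data = pd.DataFrame({
    "l1":    ["French", "French", "French", "French",
              "German", "German", "German", "German"],
    "level": ["B1", "B1", "B2", "C1", "B1", "B1", "B2", "C1"],
    # hypothetical per-text rates of L1-attributable errors per 100 words
    "rate":  [2.6, 2.2, 1.9, 1.8, 2.3, 1.9, 1.2, 0.9],
})

# Mean rate per L1 background and proficiency level; a falling row
# would suggest decreasing L1 influence across the continuum.
profile = data.groupby(["l1", "level"])["rate"].mean().unstack("level")
print(profile)
```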

In addition to developmentally tracing the amount of L1 influence across the levels, another avenue worth pursuing involves capturing the development of specific error-inducing contexts across the proficiency stages. Our work so far has provided a number of partial, preliminary insights into the error-inducing contexts that remain for French-speaking learners of English at levels B2/C1. It has been shown, for example, that noun-number errors at this level are still likely to occur after the phrase ‘one of’ (e.g. one of the *function of religion) or in contexts which require the distributive plural (e.g. English people with their *cup of tea). Future error analysis work may wish to carry out a more rigorous classification of error-inducing contexts for given error types at different levels of proficiency so as to trace marked changes in the sorts of contexts which lead learners to err.

This would help answer questions such as whether noun-number errors in the early stages of learning predominantly occur in rather ‘straightforward’ contexts (e.g. after a clearly plural numeral, as in three *chair), as opposed to the more complex error-inducing environments detected at the higher levels. Such concrete qualitative information would certainly take pride of place in L1-/L2-dependent descriptors.
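One possible way of automating such a classification is sketched below: noun-number errors are grouped according to their immediate left-hand context. The inline (NN) tag, the $...$ correction format and the example sentences are all invented for this illustration and do not reproduce the annotation conventions of the actual error-tagged corpus.

```python
# A sketch of classifying error-inducing contexts automatically.
# Assumed format: a noun-number error is marked "(NN) correct $wrong$".
import re
from collections import Counter

tagged_sentences = [
    "one of the (NN) functions $function$ of religion",
    "English people with their (NN) cups $cup$ of tea",
    "there were three (NN) chairs $chair$ in the room",
]

def classify_context(sentence: str) -> str:
    """Assign a noun-number error to a coarse category based on the
    words immediately preceding the error tag."""
    before = sentence.split("(NN)")[0].strip().lower()
    if re.search(r"\bone of( the)?$", before):
        return "after 'one of'"
    if re.search(r"\b(two|three|four|five|six|\d+)$", before):
        return "after plural numeral"
    if re.search(r"\b(their|our|your)\b", before):
        return "distributive plural context"
    return "other"

counts = Counter(classify_context(s) for s in tagged_sentences if "(NN)" in s)
print(counts)
```

Comparing such context counts at B1 and at C1 would then show directly whether the ‘straightforward’ contexts drop out of the error profile while the more complex ones persist.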


Crucially, we believe that the accuracy-oriented approach should be complemented by the study of development in L2 complexity. Considering errors in complete isolation can indeed be treacherous if development in other areas is disregarded: some students may make very few errors because they tend to remain within their comfort zone, while other, more adventurous learners may end up making a higher number of language errors (Murcia & MacDonald 2011). However, to say anything meaningful about the relationship between accuracy and complexity, the learner corpus data need to be submitted to complexity measures (see, for example, the Coh-Metrix tool for lexical complexity (Graesser et al. 2004), or the use of parsed learner corpora to trace grammatical complexity). This would provide relevant information about possible areas of parallel development (e.g. simultaneous improvement in levels of accuracy and complexity) and areas where a trade-off effect may be at play (e.g. lack of improvement in accuracy but improvement in levels of complexity).

Another avenue for future research consists in trying out alternative statistical tests to the ANOVA procedure used here. One method that would be particularly useful in tracing L2 development is trend analysis, which identifies patterns of change across ordered levels (Field 2005); trends are classified as linear, quadratic or cubic. Another statistical avenue worth exploring is multiple regression analysis, with one dependent variable (proficiency level) and several predictors (error types). This method investigates whether errors can be relied upon to automatically identify levels of proficiency and, if so, pinpoints the specific error types which are most predictive of given proficiency levels. In light of the results presented in this volume, multiple regression may be expected to predict levels B1 vs. B2 with a high degree of accuracy. Hopefully, this more sophisticated statistical method may also help to distinguish levels B2, C1 and C2 better than has so far been possible.
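As a purely illustrative sketch of this idea — with invented figures and an ordinal 1-4 coding of the CEFR levels that is itself a simplifying assumption — such a regression could be set up as follows:

```python
# A toy multiple regression: proficiency level coded ordinally
# (B1=1 ... C2=4) and regressed on per-text error rates for three
# hypothetical error categories. All figures are invented.
import numpy as np
from sklearn.linear_model import LinearRegression

# Rows are learner texts; columns are errors per 100 words for
# hypothetical grammar, lexis and spelling categories.
X = np.array([
    [4.2, 2.1, 1.3], [3.9, 2.4, 1.1],   # B1 texts
    [2.1, 1.8, 0.6], [2.4, 1.6, 0.7],   # B2 texts
    [1.9, 1.7, 0.5], [2.0, 1.5, 0.6],   # C1 texts
    [1.8, 1.6, 0.4], [1.9, 1.4, 0.5],   # C2 texts
])
y = np.array([1, 1, 2, 2, 3, 3, 4, 4])

model = LinearRegression().fit(X, y)
print("coefficients per error type:", model.coef_)
print("R^2 on the training data:", model.score(X, y))

# Coefficients with large absolute values flag the most predictive
# error types; near-identical predictions for levels 2-4 would mirror
# the B2-C1-C2 stabilisation observed in the ANOVA results. A real
# analysis should also respect the ordinal nature of the CEFR scale
# (e.g. via ordinal regression) and evaluate on held-out texts.
```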

I embarked on my error analysis adventure a few years ago at the Centre for English Corpus Linguistics at the Université catholique de Louvain, Belgium. Although computer-aided error analysis was already the object of active research at the time, this volume has hopefully helped steer the field in innovative directions. New challenges were taken up, but many more remain to be met. I hope that this book will bring new research impetus to many of the issues raised here, so as to continue building bridges between error analysis, learner corpus research, developmental second language acquisition and language testing.

Appendices

[Appendix table: overview of previous error analysis studies, summarising for each study its reference, learner corpus and size in tokens, the learners’ L1(s), proficiency level(s), task type (essay writing), error taxonomy (individual or Louvain), error types investigated, and counting method (error-based or word-based). The rotated table is not recoverable from the source scan.]

APPENDIX 2
Detailed Correlation Tables between CEFR and Error Types

[Pearson correlation coefficients, with two-tailed significance values and Ns, between the texts’ CEFR scores and the frequencies of individual error types. The rotated tables are not recoverable from the source scan.]