Fundamentals of Formulaic Language: An Introduction 9780567186416, 9780567278982, 9781474218771, 9780567332172

This is the first book to address formulaic language directly and provide a foundation of knowledge for graduates and re


English Pages [209] Year 2015

Table of contents :
Cover
Contents
Preface
1 Formulaic Language Research in a Historical Perspective—Across Decades and Continents
2 Identifying Formulaic Language—Frequency, Psychological Representation, and Judgment
3 Categories of Formulaic Language—Labels and Characteristics
4 Mental Processing of Formulaic Language—Holistic and Automatized
5 Formulaic Language and Acquisition—First and Second Language
6 Formulaic Language and Spoken Language—Fluency and Pragmatic Competence
7 Formulaic Language and Written Language—Academic Discourse in Focus
8 Lexical Bundles—Corpora, Frequency, and Functions
9 Formulaic Language and Language Teaching—Research and Practice
10 Current and Future Directions in Formulaic Language Research—Gaps and Pathways
References
Index



Author: David Wood

Fundamentals of Formulaic Language

ALSO AVAILABLE FROM BLOOMSBURY
Formulaic Language and Second Language Speech Fluency, David Wood
Language in Education, Rita Elaine Silver and Soe Marlar Lwin
Linguistics: An Introduction, Second Edition, William B. McGregor
Perspectives on Formulaic Language, edited by David Wood
Research Methods in Applied Linguistics, edited by Brian Paltridge and Aek Phakiti
Why Do Linguistics?, Fiona English and Tim Marr

Fundamentals of Formulaic Language An Introduction

DAVID WOOD

Bloomsbury Academic An imprint of Bloomsbury Publishing Plc

LONDON • OXFORD • NEW YORK • NEW DELHI • SYDNEY

50 Bedford Square, London WC1B 3DP, UK
1385 Broadway, New York, NY 10018, USA

www.bloomsbury.com

BLOOMSBURY and the Diana logo are trademarks of Bloomsbury Publishing Plc

First published 2015

© David Wood, 2015

David Wood has asserted his right under the Copyright, Designs and Patents Act, 1988, to be identified as the Author of this work.

All rights reserved. No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording, or any information storage or retrieval system, without prior permission in writing from the publishers.

No responsibility for loss caused to any individual or organization acting on or refraining from action as a result of the material in this publication can be accepted by Bloomsbury or the author.

British Library Cataloguing-in-Publication Data
A catalogue record for this book is available from the British Library.

ISBN:
HB: 978-0-5671-8641-6
PB: 978-0-5672-7898-2
ePDF: 978-0-5673-3217-2
ePub: 978-0-5672-7777-0

Library of Congress Cataloging-in-Publication Data
Wood, David (David Claude), 1957
Fundamentals of formulaic language : an introduction / David Wood.
pages cm
Includes bibliographical references and index.
ISBN 978-0-567-18641-6 (hb) – ISBN 978-0-567-33217-2 (epdf) – ISBN 978-0-567-27777-0 (epub)
1. Linguistic analysis (Linguistics) 2. Linguistic models. 3. Discourse analysis. 4. Language acquisition. 5. Applied linguistics. 6. Psycholinguistics. I. Title.
P126.W66 2015
410–dc23
2015014502

Typeset by Integra Software Services Pvt. Ltd.


Preface

My first encounters with formulaic language date back to the mid-1990s, during my time as a teacher of English as a Second Language (ESL) and English for Academic Purposes (EAP) at a large university. I became particularly intrigued with the teaching of spoken language and the challenges presented for second language learners by the real-time, ephemeral nature of speech. In looking around for resources and background knowledge to help, I found myself encountering the term fluency very often in the literature. I began to look for the underlying psycholinguistic mechanisms and the research on the nature of fluency and found some reference to the role of formulaic language. Some of the papers I read alluded to the notion that formulaic language might play some role in facilitating fluent speech, or that formulaic language might be a fundamental aspect of spoken communication in several ways. This elusive phenomenon came to haunt my dreams for some years to come, as I soon thereafter embarked on doctoral studies with a goal of attempting to measure or examine the relationship between formulaic language and fluent speech.

I am still fascinated by the study of formulaic language, and have since examined it from several other perspectives, including pedagogical and corpus-based. I have seen my students in graduate programs become interested in formulaic language too and seen them set out to study formulaic language from various perspectives: in academic writing, in the speech of autistic children, in textbooks, in the discourse of official meetings, and more. I have taught seminars on the topic and supervised a range of master of arts and doctoral projects. Throughout all of this, I have seen students struggle with the sheer volume and range of literature. The multidisciplinary nature of the field means they need to quickly grasp concepts from areas as diverse as psycholinguistics, vocabulary research, and discourse analysis, to name but a few.
To add to the burden, it became painfully clear early on that much of the written work in the area is not particularly reader-friendly, especially for those new to the field. The combination of complex concepts, diverse sources, and opaque prose has made the establishment of a foundation in this area an uphill climb indeed.


This has led me to take on the task of creating an overview of the area for newcomers. The present volume is meant to be a resource for new researchers first and foremost, but may also be a reference source for established scholars. The content in this book is not mine; it is a distillation of the work of many others from across the decades. It is taken from a wide range of sources, from original research reports, review and summative state-of-the-art papers, edited collections, and so on. The content of this book is not meant to be a complete presentation of every bit of research conducted to date; it is meant instead to be a start, a place for readers to get a sense of what exists, and then to go further beyond this book as needed. Some parts of the book go more deeply into the literature than others, partly due to space constraints, partly due to my own perceptions of what needs to be foregrounded. I encourage those who use this book as a teaching resource to bear that in mind and to point out to students what more is needed in any area. The book ranges widely, stops to scrutinize, and at the same time dances across many areas of study. This is inevitable for an overview like this, crafted by a single author. Please feel free to pick, choose, adapt, adjust, dismiss, embrace, or elaborate on anything you find herein.

I hope this book will be a support for teachers and students in this area. I fervently wish for you to be inspired to take some risks in research, to push some boundaries in the area, and to go ahead and create new knowledge.

I must give thanks to those whose work was so useful in creating this book: to my students Randy Appel, Ridha Ben Rejeb, Lina Al Hassan, Alisa Zavialova, Olga Makinina, Lin Chen, and Joelle Doucet; to Joshua Romancio, with special thanks, for editing assistance; and to the others who have inspired me.

Ottawa, February 2015

1 Formulaic Language Research in a Historical Perspective— Across Decades and Continents

Some years ago I had two interesting experiences with students of English as a second language, which sparked in me an early interest in the formulaic nature of language. A clever student whose first language was Spanish came to me during a break in class and asked “Teacher, what means festival?” I was a bit puzzled by the question, since the recent lessons had had no content related to this, but I manfully attempted to explain the word as best as I could. It was the student’s turn to be puzzled, as he tried to fit my definitions with what he was trying to understand. After some struggle, he interrupted me to ask “Why do you say this word at the start?” It slowly dawned on me that what he had heard as festival was in fact first of all, which I did indeed tend to use at the start of lessons and to give instructions. It surprised me that he would interpret a three-word sequence as a single word, but I put it down to some confusion about phonology and a general lack of vocabulary knowledge.

Another incident occurred with a very active and alert student from Cambodia, who arrived in my beginner class quite late in the course, with little to no English proficiency. She bravely plunged into the job of becoming a member of the group and learning what she could. Her English output after several lessons began with I no stan, whenever anything was addressed to her in English or if she had to participate in anything by speaking. Later, this was modified to I don no stan, and later still it became a closer replication of the sequence I don’t understand. I noticed this at the time as an amusing example of a student mustering one resource to cope with a really challenging situation. It also appeared odd to me that she had taken a three-word sequence and interpreted it as a single word. Later, in hindsight, this story and the festival incident came to represent to me the power of formulaic language and a glimpse of the process of language acquisition and the importance of formulaic language in it.

To begin any discussion of formulaic language, it is important to lay some foundations, to establish the terminology that is used to refer to it, and to look at a definition or definitions. Formulaic sequence is generally used to refer to one such item, formulaic language is the uncountable noun referring to these items as a collective, and phraseology is a term often used to refer to the study of formulaic language. As we will see later, phraseology also does double duty as a specific term for a particular type of analysis of formulaic language.

These days, formulaic language is a language phenomenon that is quite well known among researchers and students in linguistics and applied linguistics. Articles with a focus on formulaic language are appearing in an expanding range of journals at an ever more frequent rate, papers on topics related to formulaic language are being presented at congresses around the world, and graduate theses on this theme are appearing everywhere. It is remarkable that all of this is occurring in the absence of a journal devoted to formulaic language, and with only one widely attended recurring conference, that of the Formulaic Language Research Network (FLaRN), which has been held at various locations in Europe every second year since 2004. The Yearbook of Phraseology, a creation of Europhras, a European organization dedicated to the study of phraseology, and published by Mouton de Gruyter, is the only periodical publication currently in existence that concerns itself with formulaic language research.
The real source of information about formulaic language has been a range of books, both edited collections and monographs, published over the past fifteen to twenty years. Sinclair’s (1991) Corpus, Concordance, Collocation was a landmark; others include Nattinger and DeCarrico’s (1992) Lexical Phrases and Language Teaching; Cowie’s (1998) Phraseology: Theory, Analysis, and Applications; Wray’s (2002) Formulaic Language and the Lexicon and Formulaic Language: Pushing the Boundaries (2008); Allerton, Nesselhauf, and Skandera’s (2004) Phraseological Units: Basic Concepts and Their Applications; Schmitt’s (2004) Formulaic Sequences: Acquisition, Processing and Use; Granger and Meunier’s (2008) Phraseology: An Interdisciplinary Perspective; and Wood’s (2010b) Perspectives on Formulaic Language. This list is by no means exhaustive, but gives a taste of the range and quantity of formulaic language-focused work that exists on library shelves around the world.

So what is the attraction to formulaic language despite the relative lack of a coherent set of venues in which to present or find research and information about it? It has become apparent over the years that formulaic language is, despite its marginalization in classic generative linguistic theory, a fundamental aspect of language and communication. Formulaic language is as essential to us as words or grammar. Virtually every aspect of communication and language is linked to it: pragmatics; discourse; fluency in speech and writing; first and second language acquisition; and cognitive processing of language.

So what is formulaic language? What do the following word sequences have in common?

Good morning
Look up
On the other hand
Don’t let him take you for a ride
By and large
Haste makes waste
At top speed
Speed limit
Camera speed
Computer desk
In the case of
Up to date

We can certainly identify these sequences as fairly common in English, but they do have a certain common element which is a bit elusive at first glance. Some of them appear to have a specific meaning or function unto themselves, as if a single word (for example, good morning, on the other hand, computer desk). Some of them seem to be preferred ways of expressing or identifying something, be it a concrete or an abstract thing (for example, haste makes waste, at top speed, in the case of). Some of them look a bit mysterious at close examination, and we wonder how they came to be used as they are: how did we ever come to agree on the use of strange items such as by and large or look up?

The general consensus on a definition of formulaic language seems to be that the items will:

1 Be multiword
2 Have a single meaning or function
3 Be prefabricated or stored and retrieved mentally as if a single word


The third criterion is still under quite a bit of scrutiny, particularly by psycholinguists, and Chapters 4 and 5 in this book will introduce you to that research.

The research on formulaic language has generally tended to fall into two broad categories in terms of research methodology. The phraseological methods are top-down methods that look to classify formulaic language using certain criteria such as semantic or syntactic composition. Phraseologists will tend to examine lists of selected formulaic sequences in order to do this, or they will examine texts and isolate formulaic sequences using criteria such as syntactic or semantic characteristics. On the other hand, the distributional methods of research into formulaic language are more bottom-up in nature. These types of research use corpus analysis and frequency cutoffs to identify formulaic sequences, often classifying them according to discourse function.

Not terribly long ago there was little research being conducted on formulaic language, and the research that did exist was anything but mainstream or recognized. The turning point was probably around 1970, when some structural linguists began to pay some attention to formulaic language. This helped to mainstream the research to a certain extent, as the work on this aspect of language had, up to that point, been conducted by people from a diverse range of areas of interest, from literary scholars, anthropologists, and educational psychologists to neurologists and experimental psychologists, to language teaching methodologists and lexicographers. Linguists began to establish their own schools of inquiry during the 1970s, and the 1980s through to the present have seen a remarkable expansion of effort.
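The distributional approach described above, in which candidate sequences are pulled out of a corpus by frequency cutoff, can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the function names, the toy corpus, the choice of four-word sequences, and the cutoff of three occurrences are all invented for the example and are not taken from any study discussed in this book.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase a text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def frequent_ngrams(texts, n=4, min_freq=3):
    """Count every n-word sequence and keep those at or above a frequency cutoff.

    The cutoff (min_freq) is an illustrative assumption; real studies set it
    relative to corpus size.
    """
    counts = Counter()
    for text in texts:
        tokens = tokenize(text)
        # Slide a window of n tokens across the text.
        counts.update(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return {" ".join(gram): c for gram, c in counts.items() if c >= min_freq}


# Hypothetical toy corpus, for illustration only.
toy_corpus = [
    "On the other hand, the results were clear.",
    "The data, on the other hand, were messy.",
    "First of all, on the other hand is a common phrase.",
]
candidates = frequent_ngrams(toy_corpus, n=4, min_freq=3)
print(candidates)
```

A procedure of this kind only yields candidates. Whether a frequent string is formulaic in the sense of the three definitional criteria is a separate question.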

Early research

Lacking the technology to perform extensive corpus research, and hampered by the scattered nature of linguistic knowledge, researchers before the 1970s paid scant attention to formulaic language. However, there were pockets of work being conducted in diverse fields outside of linguistics proper.

Collocation researchers

Early work in the area of collocations was initiated by Firth (1951, 1957) in the 1950s, although the term itself had been around much longer. Firth’s basic definition of collocation was the co-occurrence of words in proximity, with several possible types of variation. One type is the habitual collocation, in which words occur together quite frequently. Firth’s example of a habitual collocation is silly ass, a popular colloquial pejorative label at that time. Another type of collocation discussed by Firth is the idiosyncratic collocation, a co-occurrence of words that happens relatively rarely and yet has a function. Firth points to some word combinations from literature as examples of this, such as sleek supple soul from a poem by Swinburne (see Nesselhauf, 2004 for more). Firth further complicates matters by sometimes referring to noncontiguous words as collocations, such as dark and night occurring in a sentence separated by other words. It is rather unclear from Firth’s work how distantly separated words can be before the collocational bond is broken.

This overall approach to collocations was, of course, later developed by other researchers. These included Halliday, Mitchell and Greenbaum, Sinclair, and Kjellmer. Halliday extended and refined the definition to specify that a collocation is a function of the frequency of a word appearing in a certain lexical context as compared to its frequency in language as a whole.
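Halliday's refinement, which compares how often a word appears in a particular lexical context with how often it appears overall, lends itself to a small worked example. The sketch below is one possible reading of that idea, not a formula from Halliday or from this book: the function name, the span of words on either side of the node, and the toy sentence are assumptions made for illustration. The adjustable span also shows how Firth's noncontiguous collocations, such as dark and night, can be captured.

```python
def collocation_ratio(tokens, node, collocate, span=2):
    """Relative frequency of `collocate` near `node`, divided by its
    relative frequency in the whole token list.

    Values above 1 suggest the collocate is more common near the node than
    elsewhere. Overlapping windows are not de-duplicated; this is a
    simplification for illustration.
    """
    context = []
    for i, tok in enumerate(tokens):
        if tok == node:
            context.extend(tokens[max(0, i - span):i])   # words to the left
            context.extend(tokens[i + 1:i + 1 + span])   # words to the right
    if not context or collocate not in tokens:
        return 0.0
    in_context = context.count(collocate) / len(context)
    overall = tokens.count(collocate) / len(tokens)
    return in_context / overall


# Hypothetical toy text, for illustration only.
tokens = "the dark night fell and the night was dark and the day was bright".split()
ratio = collocation_ratio(tokens, node="night", collocate="dark", span=2)
print(round(ratio, 2))
```

In this toy text dark makes up a larger share of the words within two positions of night than of the text as a whole, so the ratio comes out above 1.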

Research traditions in formulaic language

Pawley (2007) outlines eight research traditions that laid the groundwork for much of what came later.

Literary scholars working on epic sung poetry

One of the first to turn attention to formulaic language was Parry (1928, 1930, 1932), in the 1920s and 1930s. He examined formulaic language in the works of Homer, and later turned attention to the South Slavic tradition of public performance of epic poems. The performers and composers of these very lengthy poems were not literate, and Parry and Lord noted that formulas in such poems were often serving two functions simultaneously: they allowed the performance to be more fluent, rhythmic, and smooth, while at the same time allowing for some creative variations. Many formulas in the epic poems showed a degree of lexical substitution, allowing them to represent a particular meaning with a different number of syllables and fitting with a variety of metrical patterns. For a comprehensive look at this body of work, see Lord (1960).

Anthropologists and folklorists

Given the nature of the early examinations of formulaic language, focused on epic poetry and oral traditions in various cultures, it is not surprising to find that anthropologists were among those to pay attention to this language phenomenon. Their work covered a range of types of spoken language from everyday speech to magical incantations to child language play. Hymes (1962) was a trailblazer in the application of anthropological research methods to work in linguistics. His work on what he called “the ethnography of speaking” focused on performance routines and recurrent patterns in everyday speech, which stimulated research from linguistic anthropologists. Prior to this, however, others investigated particular spoken genres and found evidence of formulaic language playing a strong role. For example, Malinowski, in 1935, noted that the Trobriand islanders used fixed formulaic language in their magical incantations, designed to control invisible spirits. Similarly, Opie and Opie (1959) showed that the game chants and sayings and rhymes of six- to ten-year-old children were packed with formulaic language.

Philosophers and sociologists

During the 1960s, the study of everyday communication grew to focus on the use of routine utterances to accomplish speech acts. The work of Goffman (1971) and the ethnomethodologists led to the emergence of conversation analysis as a discipline within linguistics. Goffman pointed out that many conversational moves are accomplished by means of conventional word usage, an observation that amounted to an early type of formulaic language research. In addition, Austin (1962) and Searle (1968) focused on speech acts and discourse functions and the ways in which these take the form of expressions or formulas.

Neurologists and neuropsychologists

Work on brain structure dating back to Broca in the 1860s showed that the left hemisphere of the brain is key to spoken expression. Certain types of aphasia resulting from damage to Broca’s area of the left hemisphere reduced or eliminated the ability to use propositional speech, but left the sufferers with the ability to use familiar expressions.

Learning psychologists

Goldman-Eisler in the late 1960s, with her book Psycholinguistics: Experiments in Spontaneous Speech (1968), was among the first to discover that fluent speech in particular is characterized by patterns of temporal variables such as pause phenomena and length of runs of speech. This was the start of a tradition of research, with a psycholinguistic flavor, into speech fluency. The researchers discovered that patterns of fluent speech hint at a possible role for formulaic language, as it has become apparent that fluent speakers have many more automatized chunks of language to use while speaking spontaneously. This helps them to save mental effort so as to conceptualize and formulate the next stretch of discourse while maintaining a certain pace and rhythm of speech. It has been discovered that skillfully blending formulaic sequences with newly assembled strings of words is probably an important factor in the ability of proficient speakers to produce the longer runs between pauses that characterize fluency.

Certainly, the nature of fluent speech is distinctive. As Chafe (1980) notes, speech is produced in bursts, in which vocalization is broken up by pauses at junctures of meaning and syntax. This type of skill requires an ability to juggle plans that may compete for mental attention and cause a sort of “traffic jam” of speech conceptualization, formulation, and utterance. When such a “jam” occurs, one produces speech which is disfluent, marked by slow speed, pauses at mid-clause, sentence, or phrase, and brief, incomplete, or simplified runs between pauses. Rehbein (1987, p. 104) states that “one may propose that fluency in a second language requires the capability of handling routinized complex speaking plans.” The plans to which he refers need to be stored in long-term memory so as to be easily retrieved and produced as speech. To complicate matters further, one must generate new words and constructions to encode novel elements while simultaneously producing the automatized sequences. See Chapter 6 in this book as well as Wood (2010a) for an extensive review of this research.
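The temporal variables mentioned above, pauses and the length of runs between them, can be made concrete with a short Python sketch. It is a simplified illustration and not the procedure of any study cited here: the function name, the timed toy transcript, the 0.25-second pause threshold, and the counting of run length in words rather than syllables are all assumptions.

```python
def runs_between_pauses(words, min_pause=0.25):
    """Group timed words into runs separated by silences of at least
    `min_pause` seconds. Each word is a (text, start_time, end_time) tuple.

    The 0.25 s threshold is an illustrative assumption.
    """
    runs, current = [], []
    prev_end = None
    for text, start, end in words:
        # A long enough gap since the previous word closes the current run.
        if prev_end is not None and start - prev_end >= min_pause and current:
            runs.append(current)
            current = []
        current.append(text)
        prev_end = end
    if current:
        runs.append(current)
    return runs


# Hypothetical timed transcript (seconds), for illustration only.
timed = [
    ("i", 0.0, 0.1), ("don't", 0.1, 0.4), ("understand", 0.4, 0.9),
    ("what", 1.5, 1.7), ("you", 1.7, 1.8), ("mean", 1.8, 2.1),
    ("first", 2.2, 2.5), ("of", 2.5, 2.6), ("all", 2.6, 2.9),
]
runs = runs_between_pauses(timed)
mean_length_of_run = sum(len(run) for run in runs) / len(runs)
print(runs, mean_length_of_run)
```

On this toy input the only gap long enough to count as a pause falls after understand, so the speech divides into one three-word run and one six-word run.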

Grammarians

Early grammarians hinted at the importance of formulaic language: Jespersen (1924) examined the phenomena of free and fixed expressions, and the structure of idioms was a focus of the works of Chafe (1968) and Fraser (1970). Work on phrasal dictionaries by Hornby, Gatenby, and Wakefield (1942) and Palmer (1938) influenced how phrasal units were handled in later works. Meanwhile, in Eastern Europe phraseology was taking off as a legitimate area of study in its own right. Researchers including Amosova (1963), Mel’čuk (1988), and Vinogradov (1947) compiled lists of idioms and collocations and classified them. The resulting categories include pure idioms, which are expressions whose literal meanings are totally divorced from their idiomatic meanings, for example, chew the fat and beat around the bush. Figurative idioms are expressions in which the figurative meaning is an obvious derivation from the literal meaning, for example, hold water or steal one’s heart. Restricted collocations are word combinations in which the interpretation of one word is dependent on its relationship with the other, for example, pay a visit or meet one’s needs.


Since the 1970s

The 1970s represent a turning point in formulaic language research, with a number of linguists pursuing research in the area and some major areas of research coming to be defined. Lexicographers began to assemble information about multiword chunks, and research on speech acts and pragmatics grew. A major event was a course taught by Charles and Lily Wong Fillmore at the 1977 Linguistic Institute. Krashen and Scarcella (1978) published a review article, and Coulmas (1981) an edited collection of papers. Pawley and Syder in 1983 published a landmark paper pointing out that formulaic language is likely key to second language fluency and nativelike selection, that is, the tendency we have to use routine ways of expressing things, despite the supposed infinite potential of language. For example, we say “how are you?” rather than creative alternatives such as “what is the nature of your current well-being?” The reasons for this, according to Pawley and Syder, have to do with processing restrictions and the probability that we acquire language in chunks and often store and retrieve word strings as wholes from long-term memory to fit the meanings and functions that arise in communication. In 1991, Sinclair posited the idiom principle and the open-choice principle. The idiom principle expresses a somewhat similar idea, that most texts are largely composed of multiword expressions that constitute single choices in the mental lexicon. Many new areas of focus arose over the years.

Oral formulaic genres

A fascinating area of inquiry that has yielded some remarkable information is the study of oral genres of production in traditional societies and in specific areas of communication. Balkan epic poetry, the storytelling traditions of peoples of New Guinea (Rumsey, 2001), and the rapid-fire speech routines of auctioneers (Kuiper, 1996) are so formulaic that almost every utterance is a formula. Kuiper and collaborators isolated some distinct features of auctioneering language: strict discourse structure rules of topics and sequencing; a high concentration of formulas; special grammatical rules for formulas; prosodic and musical patterns; and exceptional fluency, speed, and very few pauses within clauses. In stock auctions, for example (Kuiper & Haggo, 1984), first is the description of the lot, second is the search for a first bid, third is a call for the bids, and fourth is the sale. Similar research by Pawley (1991) into cricket match commentary identified it as a formulaic genre.


Identification Issues began to develop around the actual identification of formulaic language in texts and discourses. Some cases are clear such as true idioms, phrasal verbs, nominal compounds, and so on (see Chapters 2 and 3 for detailed discussions), but many gray areas persisted. For example, discontinuous expressions are hard to identify, as fillable slots and two-part expressions tend to blend into surrounding text—for example, not only … but also. Pawley (1986) elaborated a list of twenty-seven diagnostics, and Moon (1998), among others, also presents lists of diagnostic criteria. Wray (2002) laid out a set of criteria for determining if multiword combinations might be prefabricated. Structure or form of the sequence is one such criterion, and it is often the case that strings begin with conjunctions, articles, pronouns, prepositions, or discourse markers (p. 31). Compositionality or internal structure of strings is also important, as Wray observes that “the string is no longer obliged to be grammatically regular or semantically logical” (p. 33). Fixedness, or the tendency for prefabricated sequences to be of invariable form, is another such criterion, although Wray does allow that a large subset of formulaic sequences often have fillable slots (p. 34). Other criteria relate to phonological or prosodic aspects of the articulation of a sequence, including intonation contour and speed of articulation, and fluency criteria such as lack of internal pausing (p. 35). For spoken language in particular, an important point of Wray’s to bear in mind is that “it may simply be that identification cannot be based on a single criterion, but rather needs to draw on a suite of features” (p. 43). Somewhat later, Wray (2008) came to emphasize that the processing of formulaic sequences as wholes likely results from the ways acquisition processes operate with respect to input. 
She notes that much language input in first language acquisition is left unanalyzed unless necessary, a phenomenon she terms needs only analysis, or NOA (p. 17). If there is a strong form-meaning link with a particular string, for example, How do you do, as a standard greeting among previously unacquainted adults, with no variation, then the string will remain unanalyzed. Over the course of first language acquisition, acquirers may note some variation in such strings, such as lexical insertion (e.g., Have you seen my boots/shirt/watch?, or I’d like a Coke/cheeseburger/3-month plan), but analysis will likely stop at the recognition of the existence of a fillable slot and the possible word types that may fill the slot. For adult second language learners, this process may be much less frequent or slower, since the tendency of adults and language programs is to analyze second language input for patterns, not
to mention the fact that second language learners receive greatly reduced input compared to children in a first language. Some researchers have linked formulaic sequences to the lexicogrammar (Tucker, 2005) and to systemic models of functional grammar (Butler, 2003). These researchers have noted that formulaic sequences have a place in models of language that prioritize the lexicogrammar and levels of structure related to speech act realizations. They acknowledge the role of formulaic sequences in integrating extraclausal or partially clausal expressions into functional grammars of discourse. One of the best checklists to aid in identifying formulaic language is that of Wray and Namba (2003) (see Chapter 2 for details).

Classification There are many categories of formulaic language, including collocations, idioms, phrasal verbs, lexical phrases, lexical bundles, and so on (see Chapter 3 for a detailed discussion). For formulaic sequences with pragmatic functions, Pawley (2007) outlines seven identifying criteria:
1 Segmental phonology
2 Music, that is, intonation, rhythm, and stress of production
3 Grammatical category
4 Grammatical structure
5 Idiomaticity constraints
6 Literal meaning pragmatic function
7 Accompanying body language

Nattinger and DeCarrico (1992) identified a subset of formulaic language with pragmatically specialized functions and meanings, which they labeled as lexical phrases. They classified the phrases into two large categories: strings of specific lexical items and generalized frames. The former are mostly unitary lexical strings and may or may not be canonical in the grammar, while the latter consist of category symbols and specific lexical items. In addition, four criteria further classify the phrases: length and grammatical status; canonical or noncanonical shape; variability or fixedness; and whether the phrase is a continuous, unbroken string of words or discontinuous, allowing lexical insertions (pp. 37, 38). Nattinger and DeCarrico also identify four large categories of lexical phrases that display aspects of the four criteria: polywords, which operate as single words, allowing no variability or lexical insertions, and including two-word collocations (e.g., “for the most part,” “so far so good”); institutionalized expressions, which are sentence-length, invariable, and mostly continuous (e.g., “a watched pot never boils,” “nice meeting you,” “long time no see”); phrasal constraints, which allow variations of lexical and phrase categories, and are mostly continuous (e.g., “a ___ ago,” “the ___er the ___er”); and sentence builders, which allow construction of full sentences, with fillable slots (e.g., “I think that X,” “not only X but Y”) (pp. 38–45). A more recent descriptive scheme for formulaic sequences is that of Wray and Perkins (2000), in which they focus on semantic and syntactic irregularities of the sequences. A vital aspect of formulaic sequences, according to Wray and Perkins, is their semantic irregularity. They are not composed semantically, but are holistic items, like idioms and metaphors. Another key element of formulaic sequences is their syntactic irregularity, which is manifest in two qualities: a restriction on manipulation, for example, one cannot pluralize beat around the bush or passivize face the music or say you slept a wink, or feeding you up; and the fact that in formulaic language normal restrictions are flouted, such as the sequences that contain an intransitive verb + direct object, for example, go the whole hog, or other gross violations of syntactic laws like by and large.

Prevalence Researchers have worked to identify what proportion of discourse in a given genre or register is in fact formulaic. For example, Altenberg (1998), examining the London-Lund Corpus, found that over 80 percent of words are in formulaic sequences. A well-known and often cited figure comes from the study by Erman and Warren (2000), which found that formulaic sequences made up 52 to 58 percent of the texts in a corpus.

Speech production and comprehension A certain amount of research has focused on the links between fluent speech production and formulaic language. Wood (2006, 2009a, 2009b, 2010a) has examined the role of formulaic language in speech fluency with second language learners, finding that increased use of formulaic language appears to facilitate improvements in speech fluency. Wood (2006) notes that learners appear to use formulaic sequences to facilitate speech fluency by relying on one sequence, repeating a particular sequence, stringing together multiple sequences, and using them as self-talk or rhetorical structuring devices.

One of the earliest observations about the nature of spoken discourse was that of Pawley and Syder (1983), who described the way clauses tend to be chained. They noted that everyday fluent conversational speech is composed largely of strings of more or less independent clauses, without much grammatical integration. For example, subordination is only present in limited amounts in spontaneous speech (Pawley & Syder, 1983, pp. 202–204). Pawley and Syder presented an analysis of two types of native-speaker production to illustrate how fluency relates to clause-chaining. One speaker, George Davies, produced speech in which fluent units were separate clauses: /we had a /fan tastic time – [slows] (1.1) /there/were/ all kinds of re/lations /there/ [accel] [slows ] /I dun/no where they/all come /from/ [accel] [slows] I didn’t know/‘alf o’ them – [accel] (0.9) and’ ah—the kids/sat on the floor – (0.2) (1.5) and ol’/ Uncle Bert/he/ah o’/course /he was the life and soul of the party [accel] [slows] /Uncle /Bert ‘ad a /black bottle – [accel] [slows] (1.5) an ah—‘e’d t/tell a/few stories (0.2) [accel] [slows] an ‘e’d/take a /sip out of the/black bottle [accel] [slows] n’ the/more sips he /took /outa / that bottle – [accel] (1.0) the worse the /stories got – (1.6) (Pawley & Syder, 1983, p. 203) Another speaker, Q., produced comparatively nonfluent speech, in a PhD dissertation oral defense: and it/seems to be – [accel] if a /word is/fairly—/high on the frequency /list/ –
[slow] [accel] I /haven’t /made /any count – [accel] but—/just—im/pression istically—um [slow] um—the /chances are – that you get a—com /pound – [slow] or—a /nother—phono /logically deviant—form – [slow] with ah/which is al/ready in other /words [accel] [slow] /which is /fairly frequent—ly the /same—/phono /logical [accel] [slows] shape – (Pawley & Syder, 1983, p. 201) It is apparent that Q is planning only a few words at a time, unlike Davies. The context and the content of the discourse is novel for him, and it is obvious that he struggles with formulating and conceptualizing and articulating, due to the considerable stress of the experience. To make matters even more arduous, Q tries to use a clause-integrating strategy, in which each new clause depends to some extent on the structure of the previous one—for example, his false start or reformulation of the final clause in the sample, beginning with “with,” repaired to begin with “which.” The genre, register, and relative lack of interactivity of the speech production leave Q little choice but to use this style of speech. On the other hand, Davies is speaking more spontaneously and comfortably, and his speech shows clause-chaining of independent clauses linked by conjunctions such as “and,” in most cases. According to Pawley and Syder, this style is most effective in narrative speech: With the chaining style, a speaker can maintain grammatical and semantic continuity because his clauses can be planned more or less independently, and each major semantic unit, being only a single clause, can be encoded and uttered without internal breaks … we may speak, then, of “a one clause at a time facility” as an essential constituent of communicative competence in English: the speaker must be able regularly to encode whole clauses in their full lexical detail, in a single encoding operation and so avoid the need for mid-clause hesitations. (Pawley & Syder, 1983, pp. 203, 204)

There is a clue here about one prime function of formulaic language in spoken communication. The tendency to chain clauses in conversational and narrative speech in English implies that a speaker should be able to encode whole clauses and avoid hesitations in mid-clause. If we look at the temporal variables most often associated with fluency, we find that they include pauses at clause junctures and a certain length of speech runs between such pauses. A speaker is able to maintain this pattern of pausing by recalling most clauses more or less intact and chaining them automatically. In other words, much of everyday speech is formulaic. In fact, Pawley and Syder (1983, p. 205) suggest that memorized chunks form a high proportion of the speech of everyday conversation. The benefits of this are obvious: if speech does not have to be formulated and articulated word-for-word, a speaker’s attention is freed to focus on rhythm, variety, combining memorized chunks, or producing creative connections of lexical strings and single words.

Acquisition One early researcher in the area of formulaic language in child first language acquisition is Lily Wong Fillmore (1976), who examined the language development of six-year-old children. A later work by Peters (1983) elaborated on children’s use of strategies to extract formulaic sequences from input and retain them while at the same time breaking them down to build grammar and lexical competence. Later, Wray and Perkins (2000) identified four stages for children’s use of formulaic language in first language acquisition: a purely holistic strategy whereby they extract multiword sequences from input without analysis; an analytic stage where grammar and lexical knowledge are acquired; fusion of sequences and use of processing shortcuts; and a balance that favors holistic processing except where circumstances require analytic processing. The evidence for a role of formulaic language in adult second language acquisition is less clear than that for children. Adults tend to take an analytic approach to language learning and only under certain circumstances will they acquire multiword sequences holistically. Yorio (1980) was one of the early investigators of adult language development and formulaic sequences. In an examination of studies of instructed adult learners’ writing, he found that, unlike children, adult learners do not appear to use formulaic language to any great extent and that when they do, they seem not to use it to develop overall language knowledge. Instead, they appear to use it more as a production strategy, to save effort and attention in spontaneous communication.

Schmidt (1983) conducted a well-known case study of the English language development of a Japanese adult in Hawaii and found that formulaic sequences were an essential aspect of language gain. The participant used a large and growing number and range of formulaic sequences as a communication strategy, at the same time seeming to be fossilized and grammatically inept. Schmidt found that the research subject resisted error correction and was able to develop his language ability and acculturate through using formulaic sequences. There was no evidence of the processes of segmentation and analysis that Peters (1983) found in child language acquisition. Ellis (1996) asserts that much of language acquisition is really acquisition of memorized sequences, and that short-term repetition and rehearsal permit the development of long-term language ability. Long-term storage of frequent language sequences permits a learner to more easily use them for meaning reference, and they can be accessed more automatically. This allows for more fluent language use, as attention is freed for dealing with conceptualizing and meaning. Similarly, Bolander (1989), studying learners of Swedish as a second language, found that formulaic sequences contributed to ease of learning and use. The participants in the study consistently used prefabricated language units that contained target language structures well in advance of demonstrating that they had actually acquired the structures themselves. It appears that adults in naturalistic L2 learning environments, like children, tend to acquire and use formulaic sequences. However, the established cognitive and learning styles of adults, their diverse acquisition contexts, knowledge of L1, and other factors make for more variety in the route of language acquisition generally, and with regard to use of formulaic sequences specifically. 
Some adults may be more analytic and seek to infer rules from chunked units or from pieces of input, while others, such as Schmidt’s (1983) subject, may rely heavily on acquired formulas and not attempt to break them down or analyze them. Furthermore, degree of literacy and type and degree of instruction may play a part.

Lexical bundles research With the rise of corpus analysis tools and other technological aids to researching formulaic language has come lexical bundle research (see Chapter 8 for a detailed discussion). In essence, lexical bundles are combinations of three or more words that are identified in a corpus of natural language by means of corpus analysis software programs. As well, lexical bundles occur across a range of texts in a corpus, or, in the case of academic language, a range of disciplines. Lexical bundles are quite frequently used in published academic writing such as journal articles, and particular types of the bundles are characteristic of particular disciplines (Cortes, Jones, & Stoller, 2002). It has become apparent that acquisition and use of lexical bundles do not come naturally, but may require focused instruction (Biber & Conrad, 1999). An excellent piece of research in the area is a monograph written by Biber (2006), which presents a comprehensive corpus-based analysis of university language, containing a thorough examination of lexical bundles in textbooks. Biber discovered that academic disciplines use lexical bundles differently, with natural and social sciences using them more than the humanities. Overall, the distribution of lexical bundles across functional categories in Biber’s study shows that referential bundles—making direct reference to real or abstract entities or to textual content or their attributes—are the most common. Stance bundles—expressing attitudes or assessments of certainty—are the second most common type of function for lexical bundles in the textbooks, whereas discourse organizers—reflecting relationships between previous and subsequent discourse—are the least common. Within the category of referential functions, it appears that quantity and intangible framing subfunctions represent the largest categories.

In summary … From this short and dense overview of the research history of formulaic language, some patterns and themes emerge. One image remains, however. It still seems that we are working with something quite elusive about language. Like the characters in the tale of the blind men and the elephant, we can only feel for a certain aspect of the phenomenon at a time. Luckily, we can all pool our impressions from these encounters with particular aspects and create a fuller image through reading and researching over time. A few of the many themes and patterns that the research shows are:
• Formulaic language is important in spoken and written language.
• Formulaic language is defined in certain ways.
• Formulaic language has been studied from a wide range of research and disciplinary traditions.
• Formulaic language study has only been synthesized and pulled together over the past two decades or so.
• There are still a wide range of questions about formulaic language.

Certainly, not all the questions have been answered yet in any particular area. How do we know whether a formulaic sequence is stored and retrieved as a whole in spoken language? Do the basic assumptions about formulaic language in the processing of spoken language also apply to written language, to any extent? How valuable is it to elaborate lists of categories of formulaic language?

POINTS TO PONDER AND THINGS TO DO
1 Think back to your assumptions and images of formulaic language before you read this chapter. How have these changed now that you have read and thought about it?
2 From the descriptions in this chapter, can you draw a timeline and a mind map of the history of research on formulaic language?
3 Which of the various areas of investigation over time seem to have been the most powerful in helping us understand formulaic language? Why?
4 Can you imagine any areas of study that have not been covered in the research traditions described in this chapter?
5 Based on what we have read here, where do you feel the most important areas of investigation are likely to be in the coming years?
6 If you were to begin a plan of research in this area, what would you focus on?
7 Does the study of formulaic language challenge any common assumptions about the nature of language and how it is produced?
8 Choose a particular area of focus in the history of formulaic language research. Read the relevant sources. Write a short paper of five to ten pages and share it with those who have focused on other areas. If you compile these papers, you will have a fuller historical view of what is presented in this chapter.
9 Based on what you have read here, imagine an area of investigation. Can you envision a particular research method or methods to employ in such an investigation?
10 Based on what you have read here, compose a short guide to formulaic language for a particular type of language professional or student. What are the implications of this area of research for students of second languages? For parents and early childhood educators? For writers and editors?

2 Identifying Formulaic Language—Frequency, Psychological Representation, and Judgment

Having an idea of what formulaic language is, at least in definitions elaborated by scholars, and understanding some major categories of formulaic language takes us to a certain point in dealing with it. However, the proverbial or formulaic elephant in the room is bound to make his presence felt sooner or later: how can one identify formulaic sequences in texts, spoken and/or written? To make this issue seem more like an issue, take a look back at the first two sentences in this chapter. Identify the formulaic sequences. It is not at all easy, is it? A few strings jump out as being idiomatic in some way, for example, elephant in the room, make its presence felt, sooner or later. But can we be comfortable with this identification? What clues or features of the word combinations helped us to make these decisions? What other formulaic strings are lurking under the surface, invisible to untrained eyes, or accessible only to digital corpus analysis tools? This particular concern is central to the work of virtually all researchers in this field. After all, how can you present a study of formulaic language from any source without a good indication of how you isolate word strings which are formulaic? In the end, these are going to be your units of analysis. Take a look at the following list of word strings, taken from Nick Ellis (2012, p. 27), and see if you can determine which are formulaic:
1 Put it in.
2 Put it in the fridge.
3 Polly put the kettle on.
4 Put the butter on the table.
5 Put that in your pipe and smoke it.
6 Put another nickel in the Nickelodeon.
7 Gabe cleared the music stands from the stage.
8 Why don’t you kids ever clear the dishes from the table?
9 Boy, you gonna carry that weight, carry that weight a long time.
10 Dad’s spilled Digestive crumbs all over the kitchen floor again, typical!

Ultimately, you may simply resort to remarking that some are formulaic, some are not, and some are more formulaic than others. But how can you even make those decisions? You could look at how often they are used in a particular context, study the prosodic features of the string (see Chapter 6), or look in a corpus. It is encouraging to know that a variety of means of identifying formulaic sequences have been developed. However, the processes are in many cases more inexact than we might expect. Some means of empirical measurement-based identification are discussed below, followed by a more detailed look at criteria-based checklists, which rely on the decisions of judges as opposed to measurement instruments.

Frequency and statistical measures Formulaic sequences are generally recurrent. The lexical bundle approach, as discussed in Chapter 8, uses this as the primary criterion. The general idea is that a word string which is used often is likely formulaic. However, we would agree that a string needs to be more than just frequent; it needs to have a unitary meaning or function, and perhaps a particular way of being mentally stored, retrieved, or produced as well. Statistical identification of formulaic language in corpora is a foundation of the frequency-based approach to formulaic language. In this approach to identification of formulaic sequences, researchers set certain specifications before scanning and analyzing a corpus. Generally, minimum lengths of word combinations and minimum frequency cutoffs are determined, and then the corpus is scanned and analyzed for word combinations that fit within the parameters. Frequency cutoffs can range from 10 to 40 occurrences per million words (e.g., Biber, Johansson, Leech, Conrad, & Finegan, 1999; Simpson-Vlach & Ellis, 2010). This approach often yields word combinations which are not complete structural units (Cortes, 2004), and are generally labeled as lexical bundles (e.g., Biber et al., 1999), or multiword constructions (Liu, 2012; Wood & Appel, 2014). Some researchers, using this set of criteria as only part of a more complex identification protocol, simply use the term formulaic sequences (e.g., Simpson-Vlach & Ellis, 2010). This frequency-based method is most appropriate for large corpora of hundreds of thousands of words, if not millions, taken from specific registers of language and/or academic disciplines. It has many limitations for use with small data sets, as the standard minimum cutoffs for frequency established in the field may not be met in such cases. It would be difficult, for example, to use only frequency as a criterion for identifying formulaicity in a set of transcribed conversations on a range of topics. Some items which we might consider formulaic might arise only once or twice in such a data set. Another drawback of frequency-based analysis is that it does not give any information about the psycholinguistic validity of the formulas. This particular issue arose in a study by Schmitt, Grandage, and Adolphs (2004), who identified formulas from a corpus and presented them to subjects in spoken dictation tasks designed to overtax short-term memory capacity. After an analysis of the participants’ reconstructions of the dictations, it was concluded that the holistic storage of the sequences, which were formulaic according to frequency in the corpus, varied among participants (Schmitt, Grandage, & Adolphs, 2004). A further limitation of using frequency alone as a criterion for formulaicity is that additional steps are required to eliminate meaningless combinations of words for functional analyses of formulaic language.
Sequences which are salient or readily recognizable as chunks, such as on the other hand or how do you do, may or may not be frequent in any particular corpus or genre, but they do have coherence in that they represent elements which usually stick together in this order and which always have a particular meaning or function. This tendency for words to stick together can be measured statistically using measures of association such as mutual information (MI), which determines how likely the items are to appear together compared to chance. MI has no particular statistical significance cutoff and is most useful for purposes of comparison. A higher MI score indicates a higher likelihood of co-occurrence and, taken together with frequency measures, can provide objective evidence of formulaicity. Other measures of the relative “stickiness” of word strings are also used. In corpus linguistics, for example, Gries (2008, 2012) uses the Fisher-Yates exact probability test to help determine the degree of association between a word and a construction.
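The frequency-based procedure described above (fix a minimum length, fix a minimum frequency cutoff, then scan the corpus) can be sketched in a few lines of Python. This is only an illustrative sketch, not the procedure of any particular study or software package: the function names, the whitespace tokenization, and the extra raw-count floor `min_count` are assumptions introduced here. The floor is included because, in a very small corpus, a single occurrence already exceeds any per-million cutoff once frequencies are normalized.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return every contiguous n-word sequence in a list of tokens."""
    return zip(*(tokens[i:] for i in range(n)))


def frequent_sequences(tokens, n=4, min_per_million=10.0, min_count=2):
    """Return n-word strings that meet a normalized frequency cutoff.

    min_per_million follows the 10 to 40 per million range cited in the text;
    min_count is a hypothetical extra safeguard for small corpora.
    """
    if not tokens:
        return {}
    counts = Counter(ngrams(tokens, n))
    scale = 1_000_000 / len(tokens)  # converts raw counts to per-million rates
    return {
        " ".join(gram): count * scale
        for gram, count in counts.items()
        if count >= min_count and count * scale >= min_per_million
    }


# Toy input only; real studies use corpora of hundreds of thousands of words.
sample = "on the other hand we see that on the other hand".split()
candidates = frequent_sequences(sample, n=4)
```

Strings returned this way are only candidates: as noted above, they may not be complete structural units, and further steps are needed to filter out meaningless combinations.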

Studies are also triangulating data from various sources such as corpus measures of association together with eye tracking and response latency data—procedures often referred to as psycholinguistic measures. Often, researchers with small or quite specific corpora will refer to a large general corpus such as the British National Corpus (BNC) or the Corpus of Contemporary American English (COCA) for information about particular word strings. For example, Wood and Namba (2013) identified formulaic sequences of potential value for Japanese university students to perform oral presentations. The sequences were all generated using native speaker/proficient speaker intuition, and were then confirmed as formulaic with reference to the spoken language subcorpus of the COCA, at a frequency cutoff of at least ten occurrences per million words and with an MI score of at least 3.0 in the corpus (for an overview of MI see Schmitt, 2010). This ensured that the sequences were frequent in spoken discourse and that they were strings of items highly likely to stick together—two powerful markers of formulaicity. Other researchers have used the hits generated by online search engines such as Google to aid in determining what is formulaic. Shei (2008) illustrated how no popularly available corpus seems large enough to provide adequate instances of formulaic sequences for close investigation. Shei proposes that researchers and teachers use the Internet as a sort of vast corpus, employing a search engine like Google to help identify and retrieve multiword units for linguistic research and language teaching and learning. Simply Googling a particular word string and examining the resulting hits can yield valuable information about its frequency, form, variability, and functions.
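To make the combined use of frequency and MI cutoffs concrete, the following Python sketch computes a mutual information score for a two-word string and checks it against both thresholds. It is offered under stated assumptions rather than as the procedure of any study cited here: it uses the common pointwise formulation of MI (log base 2 of observed over expected co-occurrence), it handles two-word strings only (longer sequences call for extended formulas), and the function names are hypothetical. Any counts passed to it would come from a reference corpus.

```python
import math


def mutual_information(pair_count, count_x, count_y, corpus_size):
    """Pointwise MI (log base 2) for a two-word string.

    Compares how often the words occur together with how often they
    would co-occur by chance, given their individual frequencies.
    """
    observed = pair_count / corpus_size
    expected = (count_x / corpus_size) * (count_y / corpus_size)
    return math.log2(observed / expected)


def meets_cutoffs(pair_count, count_x, count_y, corpus_size,
                  min_per_million=10.0, min_mi=3.0):
    """Check a candidate against a per-million frequency cutoff and an MI cutoff.

    The default thresholds mirror the values mentioned in the text
    (ten occurrences per million words, MI of at least 3.0).
    """
    per_million = pair_count * 1_000_000 / corpus_size
    mi = mutual_information(pair_count, count_x, count_y, corpus_size)
    return per_million >= min_per_million and mi >= min_mi
```

Because MI rewards exclusivity rather than frequency, a string made up of two very common words can be frequent yet score low, which is why the two measures are checked together.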

Psycholinguistic measures As seen in Chapter 4, a number of studies of formulaic language have been carried out using measures of processing speed. Conklin and Schmitt (2012) summarize a list of studies that have incorporated a variety of measurements, including reaction times (e.g., Conklin & Schmitt, 2012), eye movement (e.g., Underwood, Schmitt, & Galpin, 2004), and electrophysiological (ERP) measures (e.g., Tremblay & Baayen, 2010). These studies use eye tracking or response latencies involving reading. While psycholinguistic measures are useful for determining which sequences have been stored holistically by individual speakers, they provide us with a partial view of the use of a sequence—for example, they do not usually help us to know how common a given sequence may be in actual use in the community, and the formulaic sequences identified in these ways may include rare, unusual, or one-off sequences that the speaker has tended to use for a variety of idiosyncratic reasons.

Phonological characteristics Another measure used to identify formulaic sequences in spoken language can be phonological coherence, discussed at some length in Chapter 6. Formulaic sequences tend to be uttered with particular prosodic features such as alignment with pauses and intonation units, resistance to internal dysfluency, no internal hesitations, fast speech rhythm, and stress placement restrictions (see Lin, 2010, 2012, for a discussion). Some cautions are important here: as with psycholinguistic methods, phonological coherence provides a limited or partial sense of formulaicity. For one thing, phonological coherence is limited to analysis of spoken language only. As well, it only relates to formulas used by a particular speaker, and analysis is limited by the quality of the audio data recorded.

Criteria checklists and native speaker intuition If the measures of frequency, psycholinguistic processing, or acoustic analysis taken individually fail to provide satisfactory results in most cases, what is a researcher to do? One thing the researcher can do is use criteria checklists that combine characteristics typically associated with formulaic language. These work especially well with spoken language samples or corpora. Wray (2002) reviews approaches to the issue of what constitutes a formulaic sequence and how to detect formulaic sequences in corpora. She notes that use of corpus analysis computer software is one possible method of identification, but presents some serious concerns: It seems, on the surface, entirely reasonable to use computer searches to identify common strings of words, and to establish a certain frequency threshold as the criterion for calling a string “formulaic” … (however) problems regarding the procedures of frequency counts can be identified. Firstly, corpora are probably unable to capture the true distribution of certain kinds of formulaic sequences … The second serious problem is that the tools used in corpus analysis are no more able to help decide where the boundaries between formulaic sequences fall than native speaker judges are (pp. 25, 27, 28).

It seems that use of computer corpus analysis software has certain limitations. For one thing, the specific nature of the type of speech elicited in some types of research and the relatively small word counts which make up some corpora mean that frequency alone cannot be a satisfactory criterion for identifying formulaic sequences. Some formulaic sequences may be used only once or used idiosyncratically in such a situation. Wray’s second concern is even more worthy of attention. Many formulaic sequences tend to blend into the linguistic context in transcripts, and many are frames or have larger fillable slots, which present real challenges for corpus analysis software. As well, if the participants in a study are second language learners, many formulaic sequences may be nonstandard or idiosyncratic. In the end, it appears that the best compromise is to employ what Wray terms “the application of common sense” (p. 28) in determining what is formulaic in corpora. This is especially true for spoken corpora.

Native speaker judgment We can examine second language performance to see how it conforms to native speaker use of formulaic sequences. O’Donnell, Romer, and Ellis (2012), for example, look at this. Native speaker judgment is another possible means of identifying formulaic sequences in a corpus. However, Wray (2002, p. 23) identifies five weaknesses in this method:
1 It has to be restricted to smaller data sets.
2 Inconsistent judgment may occur due to fatigue or alterations in judgment thresholds over time.
3 There may be variation between judges.
4 There may not be a single answer as to what to search for.
5 Application of intuition in such a way may occur at the expense of knowledge we do not have at the surface level of awareness.

We have only to return to our two identification tasks at the beginning of this chapter to see also how challenging it can be to start attempting to isolate formulaic sequences from a text or a corpus without any guidelines. This is where the idea of use of a checklist of specific criteria to guide judgments comes into play. The procedure would then involve having judges study the criteria which inform a checklist, and then go through a corpus to apply the criteria to determine what is formulaic and what is not.


While some checklists have been developed for specific populations, others are more general. Let’s take a look at four checklists which are well developed and have been used in various studies: an early checklist elaborated by Coulmas (1979); a checklist used to identify formulaicity in child language acquisition (Peters, 1983); a checklist used to identify formulaicity in second language acquisition of speech fluency (Wood, 2006, 2009b, 2010); a checklist applicable to a range of child and adult native or nonnative speakers (Wray & Namba, 2003).

Early list of criteria: Coulmas (1979)

Coulmas (1979, p. 32) outlines conditions which need to be met if a word sequence is to be considered formulaic. Two conditions, that the unit must be at least two morphemes long and cohere phonologically, are identified as necessary for formulaicity. Utterances which are formulaic, then, are polymorphemic and produced without internal hesitation or pausing. Coulmas also specifies that a formula may be more grammatically advanced than surrounding language, exhibiting a level of syntactic and phonetic complexity beyond the norm for the language produced by the learner. Other criteria laid out by Coulmas for formulaic sequences are that they are typically shared within a community, situationally dependent, and repeatedly used in the same form:

1 at least two morphemes long (i.e., two words)
2 coheres phonologically
3 individual elements are not used concurrently in the same form separately or in other environments
4 grammatically advanced compared to other language
5 community-wide formula
6 idiosyncratic chunk
7 repeatedly used in the same form
8 situationally dependent
9 may be used inappropriately

Formulaicity in child first language speech: Peters (1983)

Similarly, Peters (1983), in an effort to elaborate criteria for identifying formulas in child first language, focuses on:

1 phonological coherence
2 greater length and complexity than other output
3 nonproductive use of rules underlying a sequence
4 situational dependence
5 frequency and invariance in form

Gradience of formulaicity: Wray and Namba (2003)

A sophisticated checklist which can be used for a range of applications is that of Wray and Namba (2003). Originally designed for use in assigning formulaicity to utterances of bilingual children, the checklist is remarkably comprehensive. It consists of eleven criteria, each of which would be applied to the researcher’s perception of formulaicity of a word string using a Likert Scale of 1 to 5. This cleverly deals with the issue of gradience or ranges of formulaicity:

1 By my judgment, there is something grammatically unusual about this word string.
2 By my judgment, part or all of the word string lacks semantic transparency.
3 By my judgment, this word string is associated with a specific situation and/or register.
4 By my judgment, the word string as a whole performs a function in communication or discourse other than, or in addition to, conveying the meaning of the words themselves.
5 By my judgment, this precise formulation is the one most commonly used by this speaker/writer when conveying this idea.
6 By my judgment, the speaker/writer has accompanied this word string with an action, use of punctuation, or phonological pattern that gives it special status as a unit, and/or is repeating something s/he has just heard or read.
7 By my judgment, the speaker/writer, or someone else, has marked this word string grammatically or lexically in a way that gives it special status as a unit.
8 By my judgment, based on direct evidence or my intuition, there is a greater-than-chance-level probability that the speaker/writer will have encountered this precise formulation before, from other people.
9 By my judgment, although this word string is novel, it is a clear derivation, deliberate or otherwise, of something that can be demonstrated to be formulaic in its own right.
10 By my judgment, this word string is formulaic, but it has been unintentionally applied inappropriately.
11 By my judgment, this word string contains linguistic material that is too sophisticated, or not sophisticated enough, to match the speaker’s general grammatical and lexical competence.

These are the eleven diagnostic criteria for identification of formulaic sequences (Wray & Namba, 2003, pp. 29–32).
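To make the idea of rating a word string against the eleven criteria more concrete, here is a minimal sketch of how a researcher might record and summarize one set of ratings. It is an illustration only, not a procedure given by Wray and Namba: the short criterion labels are paraphrases invented here, the assumption that 5 means strong agreement is ours, and the mean and the cut-off of 4 are hypothetical conveniences rather than part of the checklist.

```python
from statistics import mean

# Hypothetical short labels for the eleven criteria (paraphrases, not the authors' wording).
CRITERIA = [
    "grammatical irregularity",
    "lack of semantic transparency",
    "situation/register association",
    "pragmatic function",
    "speaker's usual formulation",
    "performance marking or repetition",
    "grammatical/lexical marking",
    "previously encountered",
    "derivation of a formula",
    "inappropriate application",
    "mismatch with competence",
]


def summarize_ratings(ratings):
    """Summarize one judge's ratings for one word string.

    ratings maps a criterion label to an int from 1 to 5 (assumed here:
    5 = strongly agree), or to None when the criterion does not apply.
    """
    unknown = set(ratings) - set(CRITERIA)
    if unknown:
        raise ValueError(f"unknown criteria: {sorted(unknown)}")
    applicable = {c: r for c, r in ratings.items() if r is not None}
    for criterion, rating in applicable.items():
        if rating not in range(1, 6):
            raise ValueError(f"rating for {criterion!r} must be 1..5")
    return {
        "n_applicable": len(applicable),
        # The mean is only a rough aggregate; the checklist itself sets no threshold.
        "mean": mean(applicable.values()) if applicable else None,
        # Criteria the judge agreed with (hypothetical cut-off of 4 or above).
        "high": sorted(c for c, r in applicable.items() if r >= 4),
    }


example = summarize_ratings({
    "lack of semantic transparency": 5,
    "pragmatic function": 4,
    "previously encountered": 3,
    "inappropriate application": None,  # judged not applicable
})
print(example)
```

Keeping the per-criterion ratings alongside any aggregate preserves the gradience the checklist is meant to capture, since two strings with the same mean may be formulaic for quite different reasons.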

Native speaker judgment: Wood (2010a)

Wood (2010a) describes a study in which the speech of second language learners of English was analyzed with a focus on the role of formulaic language in facilitating fluency (see details in Chapter 6). The participants, from three separate language backgrounds, retold the narratives from silent film prompts six times in six months, and the resulting corpus of second language speech was analyzed. Identifying formulaic sequences in the data was a central concern in this study, and Wood takes pains to explain the checklist and the procedures used. In the study, native speaker judgment was used to determine what constitutes a formulaic sequence, and each of Wray’s (2002) concerns about native speaker judgment was addressed in the procedures used:

1 It has to be restricted to smaller data sets: The small corpus accords with Wray’s first concern about native speaker judgment.
2 Inconsistent judgment may occur due to fatigue or alterations in judgment thresholds over time: The concern with inconsistent judgment was addressed by having judges individually listen to as well as read the transcripts.
3 There may be variation between judges: Variation among judges was addressed by having a discussion and benchmark identification session before actual individual judging began. The samples used for the benchmark session were not included in later judgment processes, but were set aside as complete after the benchmark session ended. In the benchmark session, two random transcripts were analyzed individually and judges presented the formulaic sequences they had marked.
4 There may not be a single answer as to what to search for: The idea that there might not be a single answer as to what to search for was at least partly addressed by having the judges read relevant literature about formulaic sequences and study and apply a set of five criteria drawn from that literature.
5 Application of intuition in such a way may occur at the expense of knowledge we do not have at the surface level of awareness: As for knowledge beyond the surface level of awareness of judges, all judges read the most salient literature on criteria for identifying formulaic sequences.

In the benchmark sessions, the criteria taken from the background literature were used as justification for selecting particular items as formulaic sequences in the transcripts, and features of the recorded speech such as speed and volume changes were also used as guides. Given the small and very specific corpus obtained, it was logical to avoid complete reliance on frequency counts as would be required when using computer corpus analysis. As noted earlier, some formulaic items might be uttered only once or be highly idiosyncratic. As well, a researcher would need to use a great deal of judgment in determining what is or is not actually a formula after a list was compiled by means of corpus analysis software, since word combinations are not necessarily formulaic just because they occur together often. Recall that formulaic sequences need to have a more or less unitary meaning or function, and/or be produced or comprehended more or less as a whole. And finally, it is vital to remain aware that Wood’s participants were second language English speakers with a less than solid grasp of the nuances of English phraseology. The major reason for using native speaker judgment in Wood’s study was that the corpus consisted of spoken language, and listening to speech and noting intonation and pause patterns cannot be done by machine. In other words, human judgment was required if all the factors relevant to formulaicity in speech were to be determined.

Judgment criteria

Five criteria, drawn from previous research on formulaic sequences, were applied in deciding whether a sequence was a formula. No particular criterion or combination of criteria was deemed essential for a word combination to be marked as formulaic; these were only guides:

1 Phonological coherence and reduction. In speech production formulaic sequences may be uttered with phonological coherence (Coulmas, 1979; Wray, 2002), with no internal pausing and a continuous intonation contour. Phonological reduction may be present as well, such as phonological fusion, reduction of syllables, and deletion of schwa, all common features of the most high-frequency phrases in English, but much less common in low-frequency or more constructed utterances, according to Bybee (2002). Phonological reduction can be taken as evidence that “much of the production of fluent speech proceeds by selecting prefabricated sequences of words” (Bybee, 2002, p. 217).
2 The taxonomy used by Nattinger and DeCarrico (1992) (see a description in Chapter 3). This includes syntactic strings such as “NP + Aux + VP” (…), collocations such as curry favor, and lexical phrases such as how do you do?, all of which have pragmatic functions (…) (p. 36). This taxonomy is not necessarily applicable in every case; it was used as a guide to possible formulaicity. For example, if a sequence matched other criteria and fit into a category in this taxonomy, it might be marked as formulaic.
3 Greater length/complexity than other output. Examples would include using I would like … or I don’t understand, while never using would or negatives using do in other contexts. Judges were able to see and hear the entire output of a particular participant to help in applying this criterion.
4 Semantic irregularity, as in idioms and metaphors. Wray and Perkins (2000, p. 5) note that formulaic sequences are often composed holistically, like idioms and metaphors, and not semantically. Examples of this were apparent in the background literature for the judges, and many formulas readily match this criterion.
5 Syntactic irregularity. Formulaic sequences tend to be syntactically irregular. This criterion was readily applied to some sequences, but it was important to check syntactically irregular sequences against other criteria on this list.

(Wood, 2010a, pp. 111, 112).

Features of the recorded speech such as speed and volume changes were also used as guides. A sequence was marked as formulaic if two or all three of the judges agreed. Idiosyncratic or nonnative-like sequences were accepted, the idea being that various criteria were employed by the judges in making determinations. Some types of productions, which still met all or most of the criteria, were examples of several phenomena marginally relevant to the study. For example, a sequence might have been stored and retrieved by a participant as a whole, but in an inaccurate or misperceived form, for example, what’s happened instead of what happened, or thanks god instead of thank god. The retell situation placed heavy demands on the communicative and cognitive resources of the participants, as they were required to recall events seen in the film while creating a running narrative, causing articulatory slips or gaps and inaccuracies of some components of sequences. Because of these realities of the spontaneous speech situation, it was decided that a sequence could match the criteria and still be idiosyncratic, misperceived and stored with errors, or misarticulated due to stress.

Judgment procedure

The expert judges were two graduate students in applied linguistics, and the researcher himself. All had read Coulmas (1979), Nattinger and DeCarrico (1992), Peters (1983), and Wray and Perkins (2000) prior to the judging process. A preliminary benchmarking and discussion session was held in which the judgment criteria and the procedure as a whole were clarified, and two transcripts were jointly examined and coded by all three judges, in an effort to standardize the overall approach to identification of formulas. Because the speech samples were very specific narrative retells, the formulas identified covered a wide range, from idioms (love your neighbor, that’s it, instead of) to two-word verbs (throw away, come back, let out, give up, got mad, fall down), to repeated prepositional and participial phrases (living in the same house, taking a bath, started fighting, out of the house, at the moment, in the middle). The judges individually coded the rest of the transcripts, following the time sequences of the speech samples, beginning with sample number one for a given participant and continuing to sample two and on through sample six for the same participant. After this, marked items were accepted as formulaic if two or all three of the judges were in agreement. In some cases, issues such as location of the boundaries between formulas and the surrounding language, or judges’ determination that some items were possibly but not definitely formulaic, were decided by the researcher.
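The acceptance rule described here, in which an item counts as formulaic when at least two of the three judges marked it, can be sketched as a simple vote count. This is an illustrative sketch rather than Wood's actual tooling: the function name, the representation of markings as sets of strings, and the sample markings are all assumptions made for the example.

```python
from collections import Counter


def accept_by_majority(judge_markings, min_votes=2):
    """Return the items marked by at least `min_votes` judges.

    judge_markings: one collection of marked sequences per judge.
    Each judge counts at most once per item, even if they marked it twice.
    """
    votes = Counter()
    for marked in judge_markings:
        votes.update(set(marked))
    return {item for item, n in votes.items() if n >= min_votes}


# Hypothetical markings for one transcript, using sequences mentioned in the text.
judge_1 = {"give up", "at the moment", "taking a bath"}
judge_2 = {"give up", "at the moment"}
judge_3 = {"give up", "in the middle"}

print(sorted(accept_by_majority([judge_1, judge_2, judge_3])))
```

Exact string matching treats two markings with slightly different boundaries as different items, which is one reason boundary questions had to be settled by the researcher rather than by the vote alone.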

The four checklists: Coulmas, Peters, Wray and Namba, and Wood

All four checklists are designed so that no single criterion is necessary, nor must all of the criteria be met, for a sequence to be identified as formulaic. Wray and Namba (2003) propose that each applicable criterion on their list should be rated on a 5-point Likert scale from strongly agree to don’t know, to strongly disagree, where strongly disagree indicates “the absence of a trait that sometimes indicates it [formulaicity]” (p. 26). All four checklists represent a departure from the methods described above in that they place considerable importance on native speaker intuition. Wray and Namba’s (2003) checklist is the most ambitious, having a total of eleven criteria that address thirteen points. Peters (1983), on the other hand, lists five criteria that address eight points, while Wood’s (2010a) checklist is based on five criteria. Taken together, the checklists show remarkable agreement on the range of characteristics that may be indicative of formulaicity; all of them make reference to phonological characteristics and complexity. As seen elsewhere in this book, phonological markers of formulaicity can include phonological coherence, reduction, or distinctive phonological patterns, including phonological fusion, reduction of syllables, or deletion of schwa (Wood, 2010a). Complexity refers to the fact that a given sequence may be noticeably more advanced or less advanced than the individual’s typical nonformulaic language use in terms of syntactic and morphological features. Wood’s (2010a) is the only checklist to specifically reference form, linked to Nattinger and DeCarrico’s (1992) taxonomy of lexical phrases. Frequency is also considered by Wray and Namba (2003) and Peters (1983) to be a mark of formulaicity, although frequency in this context refers to frequent use by the speaker, not an arbitrary threshold for identification set in corpus research. As already discussed, Wood (2010a) does not consider frequency in and of itself to be a criterion for identification.
It is also interesting to note that both Peters (1983) and Wray and Namba (2003) allow for the two social extremes of formulas: idiosyncratic uses and community-wide uses. Wood (2010a) does not consider idiosyncratic uses a mark of formulaicity, though they are accepted given the developing competence of nonnative speakers participating in the study. Wray and Namba’s (2003) checklist features criteria that can be applied to either correct or inappropriate forms. It also takes into account local repetitions, including reading, derivations, and functional uses as possible indicators of formulaicity. Clearly, all the checklists rely on native speaker intuition to classify word combinations as formulaic. For a number of types of research in this area, judgment checklists can help to overcome the limitations of frequency-based, psycholinguistic, or phonologically focused identification methods, and provide a sort of aggregate measure of formulaicity. This is first and foremost a useful means of identifying formulas in spoken language corpora, but, as will be seen in Chapters 7 and 8, judgment can be useful in working with written corpora too. The work of, for example, Simpson-Vlach and Ellis (2010) into academic formulas uses judgment protocols having to do with salience and “teachability” to help narrow down lists of frequency-derived word combinations.

In summary …

From this general overview of approaches and means and methods of identifying formulaic sequences in corpora and texts, some interesting possibilities come into sight. It is possible to determine formulaicity by using frequency statistics, either from one’s own corpus or from taking sequences from one’s own text or small corpus and checking their frequency or MI in very large corpora such as the BNC or the COCA. It is even feasible to use Internet search engines to guide decisions about formulaicity. Psycholinguistic or acoustical features of a sequence and its processing can also yield useful guidance about possible formulaicity. In working with language data, expert or native speaker judgment about formulaicity may be employed, measures well suited to smaller or quite specific data sets. In these cases, a checklist of characteristics of the strings and their uses can be a useful guide for judges. A few of the many themes and patterns which the research shows are:

• Formulaic language is challenging to identify from texts, transcripts, and corpora.
• Formulaic language can be identified by various means.
• Formulaic language may best be identified by use of a combination of measures.
• Formulaic language can be identified by expert or native speaker judges using checklists as guides.
• Regardless of the measures used to determine formulaicity, absolute certainty is elusive.

Even if you use corpus frequency and MI statistics and acoustical features and judges and checklists, you are likely to remain guarded about your decisions about formulaicity. As the body of research grows, however, it is more and more likely that new and more reliable or confidence-inspiring means of determining formulaicity will emerge.
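As a reminder of what checking frequency and MI involves in practice, the following is a minimal sketch that computes a mutual information score for a two-word sequence from raw counts. It assumes the common corpus-linguistic formulation MI = log2((f(xy) × N) / (f(x) × f(y))), where N is the number of tokens in the corpus; tools built on the BNC or COCA may use different spans or corrections, and the toy corpus and function name here are invented for illustration.

```python
import math
from collections import Counter


def mutual_information(tokens, w1, w2):
    """MI score for the adjacent pair (w1, w2) in a list of tokens.

    Assumes MI = log2(observed / expected), with
    expected = f(w1) * f(w2) / N and N = number of tokens.
    Returns None when the pair never occurs.
    """
    n = len(tokens)
    word_freq = Counter(tokens)
    pair_freq = Counter(zip(tokens, tokens[1:]))
    observed = pair_freq[(w1, w2)]
    if observed == 0:
        return None
    expected = word_freq[w1] * word_freq[w2] / n
    return math.log2(observed / expected)


# Toy corpus, far too small for real conclusions; for illustration only.
toy = "at the moment he was at the door at the moment".split()
print(mutual_information(toy, "the", "moment"))
```

A high score shows only that two words co-occur more often than chance would predict. As the chapter stresses, that is not in itself proof of formulaicity, which is why such statistics are usually combined with judgment.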

POINTS TO PONDER AND THINGS TO DO

1 Before reading this chapter, how would you have approached the task of determining what is formulaic in a given text?
2 Look at the list of sentences taken from Ellis (2012). Are there particular items in the list which might be better matched to any of the measures of formulaicity than others?
3 Take a word string which you think may be formulaic from a particular text. Check its frequency in the BNC or COCA. Does the result help confirm your intuitions?
4 For the word string you used in #3 above, check its MI in the BNC or the COCA. Does this confirm your intuitions or does it change the picture? What are the implications of the frequency and MI results you have found?
5 Take a word string you think may be formulaic from a particular text. Take another which you think may not be formulaic. Enter each string into a Google search box. What do the resulting lists of hits tell you about the strings in terms of their frequency and function or meaning? Are your original intuitions confirmed or not?
6 Record some spoken language from the media or from real-life communication. Can you apply some knowledge of the acoustic characteristics of spoken formulaic sequences and find some possibly formulaic sequences? What guides your decisions in this exercise?
7 Survey the checklists of criteria described in this chapter. Choose one of them and see if you can employ it to help you isolate some formulaic sequences from a text or transcript. What are some of the complications you experience in doing this?
8 Using a set of similar texts or transcripts in a group of three, attempt the checklist and judging procedure described above as used in Wood’s (2010a) research. How consistent are your judgments? What are some ways to make the judging process more consistent?
9 Can you think of a way to determine formulaicity not discussed in this chapter?
10 What are the implications of the identification procedures for researchers? For language teachers? For language testers and assessors?

3 Categories of Formulaic Language—Labels and Characteristics

Over the history of formulaic language research the units of analysis have been labeled in a wide range of ways. This is largely because researchers were not all examining the exact same phenomenon, and were frequently working in quite separate areas of linguistics and, as we saw in Chapter 1, even in areas outside of linguistics in fields as diverse as social anthropology and neurology. It took time for anyone to survey the existing research and actually attempt to draw a picture of the phenomenon under examination and sketch out the state of knowledge about it. In fact, it was not until Wray (1999) took a step back and examined the growing body of research that the umbrella term formulaic language/formulaic sequence came into widespread use. Since then, that term has more or less gained and held traction in the literature. We have seen special issues of journals devoted to formulaic language, for example, in 2012 a special volume of Annual Review of Applied Linguistics. We have seen particular academic symposia devoted to the field, for example, the Symposium at University of Wisconsin Milwaukee in 2007. Wray herself was instrumental in developing the Formulaic Language Research Network (FLaRN), which has a social networking site on the Web with hundreds of members, and has spawned a series of well-attended seminars over the years at which researchers share their work. It was Wray and Perkins (2000, p. 3) who noted that formulaic language at that point had been labeled by as many as forty terms (Table 3.1):


Table 3.1 Terms used to refer to formulaic language

Amalgams
Automatic
Chunks
Cliches
Co-ordinate constructions
Collocations
Composites
Conventionalized forms
FEIs
Fixed expressions
Formulaic language
Formulaic speech
Formulas/formulae
Fossilized forms
Frozen phrases
Gambits
Gestalt
Holistic
Holophrases
Idiomatic
Idioms
Irregular
Lexical(ized) phrases
Lexicalized sentence stems
Multiword units
Noncompositional
Noncomputational
Nonproductive
Petrification
Praxons
Preassembled speech
Prefabricated routines and patterns
Ready-made expressions
Ready-made utterances
Rote
Routine formulae
Schemata
Semi-preconstructed phrases that constitute single choices
Sentence-builders
Stable and familiar expressions with specialized subsenses
Synthetic
Unanalyzed chunks of speech

As we can see from the list, the range is remarkable. In the years since the publication of Wray and Perkins, new terms have been added, including corpus technology-derived labels such as n-grams and concgrams. However, a survey of the main categories in recent literature shows that there is considerable common ground among researchers as regards exactly what they are studying, but, at the same time, categories exist for valid reasons. The main areas of focus which have emerged over the years are collocations, idioms, lexical phrases, lexical bundles, metaphors, proverbs, phrasal verbs, n-grams, concgrams, and compounds. If we examine each of these in turn, we will end up with a strong sense of what is actually meant by formulaic language.

Collocations

The term collocation is a bit of a puzzler for many, because it appears to simultaneously refer to a specific type of word combination and to all multiword phenomena. There are many possible definitions of collocation, but, in linguistics, they mostly boil down to the notion of a syntagmatic relationship among words which co-occur. The syntagmatic relationship may be defined quite generally, or it may be restricted to relationships which conform to certain syntactic and/or semantic criteria. Over the years, two perspectives on collocations have emerged from the literature: frequency-based and phraseological (see Granger & Paquot, 2008, for an overview of these). The frequency-based approach, with roots in the work of Firth (1951, 1957), is concerned mostly with the statistical likelihood of words appearing together, while the phraseological approach, with roots in Soviet phraseology, tends much more toward restrictive descriptions of multiword units, with a narrower view of what to label as a collocation. In addition to all this, after Firth’s work in the 1950s, work on other types of word combinations began to expand, with researchers using the term collocation in more creative ways.

The influence of Firth, the pioneer

The frequency-based approach to dealing with collocations was basically initiated by Firth in the 1950s, although the actual term itself had been around much longer. Firth developed the concept of collocation as a functional description of language in line with his overall theories of meaning (1951, 1957). As touched on earlier in Chapter 1, Firth’s definition of collocation was essentially the co-occurrence of words in proximity to one another. There are several types of variation: the habitual collocation, in which words occur together quite frequently, exemplified by Firth’s use of the pejorative label, silly ass, as well as the idiosyncratic collocation, a co-occurrence of words that occur relatively rarely, but retain a useful function. Firth’s pointed examples from literature, such as sleek supple soul from a poem by Swinburne (see Nesselhauf, 2004 for more), help to further complicate the otherwise straightforward definition given. Reference to noncontiguous words as collocations, such as dark and night occurring in a sentence separated by other words, may, for the inexperienced, deal the final blow. Indeed, even for those better versed, it is rather unclear from Firth’s work how distantly separated words can be before the collocational bond is broken. This overall approach to collocations was later developed by Halliday, Mitchell and Greenbaum, Sinclair, and Kjellmer. Halliday extended and refined the definition to specify that a collocation is a function of the frequency of a word appearing in a certain lexical context as compared to its frequency in language as a whole. Mitchell and Greenbaum, working separately but based on the Firthian tradition, refined the study of collocation by including syntactic and semantic aspects in the descriptions. Sinclair (1991) worked to refine things by focusing specifically on the issue of what span of words to consider a collocation. Jones and Sinclair (1974) found that the span of words which is optimal for a collocation is four words to the right or left of a node, or core word. Kjellmer, Stubbs, and Altenberg took the computerized methods espoused by Sinclair some steps further. Kjellmer worked on the Dictionary of English Collocations (1984), defining a collocation as a continuous and recurring sequence of two or more words which are grammatically well formed. These efforts were the genesis of the computer-based, frequency-driven study of collocations.
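The notion of a span around a node word can be illustrated with a short sketch that counts the words occurring within four positions to the left or right of each occurrence of a node. This is a simplified illustration, not the procedure used by Jones and Sinclair: the function name and the toy sentence are invented, and real studies would also handle sentence boundaries, lemmatization, and statistical testing.

```python
from collections import Counter


def collocates(tokens, node, span=4):
    """Count words occurring within `span` positions of each occurrence of `node`."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        left = tokens[max(0, i - span):i]      # up to `span` words before the node
        right = tokens[i + 1:i + 1 + span]     # up to `span` words after the node
        counts.update(left + right)
    return counts


# Toy example echoing the dark/night pair; the two words are not always adjacent.
toy = "the night was very dark and the dark night was long".split()
print(collocates(toy, "night").most_common(3))
```

With a span of four, dark is picked up as a collocate of night even when other words intervene, whereas a span of one captures only the adjacent occurrence. The choice of span therefore directly shapes which combinations are treated as collocations.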

Phraseological approaches to collocation research

Phraseologists have typically viewed collocations as combinations of words whose relations are fixed or variable to varying degrees, and the meaning of which is somewhat transparent (Nesselhauf, 2005). Cowie (1994) sees word combinations as occurring along a scale from composites, combinations below the sentence level with lexical or syntactic functions, to formulae, often sentence-length and having pragmatic functions. Composites can be fully opaque and/or invariable, as in pure idioms. Figurative idioms can have both a literal and figurative meaning, and restricted collocations are those in which at least one element is literal and the other figurative. Cowie gives no restrictions on the number of words or the span of words in a collocation. The phraseological approach has roots in the 1940s–1960s as Russian pioneers in the area such as Vinogradov (1947) and Amosova (1963) classified phraseological units according to their semantic and pragmatic functions. In his definition of phraseological units, Vinogradov (1977) identifies such important characteristics as noncompositionality and nonsubstitutability, classifying the units into four broad types according to the degree of their opacity, structural fixedness, literal/figural meaning, and contextual boundaries. A similar classification was elaborated by Amosova (1963), using the term phrasemes instead of phraseological combinations and outlining specific parameters within a word combination. The Russian phraseologists were conscious of relationships between the components of collocations, and identified that one of the words in a multiword unit might have a leading position. Igor Mel’čuk (1998) founded the Meaning-Text Theory, using semantics and pragmatic functions of formulaic sequences as the basis of classification. Mel’čuk (1998) observed that collocations are not free and noncompositional, and that the specific relations between the words in a collocation cause it to be perceived as a single unit of meaning. Mel’čuk pointed out that in a multiword combination one of the components is leading, while another depends on it. Both components combine and participate in the creation of meaning as a whole unit. Mel’čuk (1998) attempted to classify collocations on the basis of the relations between the components. He differentiated four types: (1) collocations with “light” delexicalized verbs such as do a favor; (2) collocations in which the meaning of the dependent word is clarified only through its relation with the main word such as well-chilled beer; (3) collocations in which one of the elements has a synonym, yet this synonym is impossible in a given word combination such as strong (but not powerful) coffee; and (4) collocations in which a dependent word embraces the meaning of the main word such as artesian well or aquiline nose (p. 31).

Lexicography and collocations Lexicographers and lexicographists, whose primary interest was in creating dictionaries, had been interested in collocations even before the advent of phraseology. An early reference to the concept collocation was in the works of Palmer (1933) and Hornby, Gatenby, and Wakefield (1942). These scholars examined collocations in the context of phraseology and lexicology, and Palmer (1933) attempted to classify collocations into verb, noun, preposition, and adverb morphologo-syntactical types. Collocations were not the primary research focus of phraseologists and lexicographists. In fact, in early stages, the Russian classical school of phraseology concentrated on formulaic language in general. It wasn’t until the latter part of the twentieth century that Mel’cˇuk (1998) and Cowie (1998) focused on collocations as a specific phenomenon.

Idioms

Definitions of idiom

Unfortunately, the definition of idiom is in some ways just as fraught as that of collocation. Some researchers use the term in an extremely broad sense, encompassing proverbs, slang expressions, and even individual words of certain types. Many others, however, use the term in a much narrower sense, to refer only to word strings which are, in the words of Moon (1998, p. 4), “fixed and semantically opaque or metaphorical,” for example, kick the bucket or spill the beans. It may be that no single definition can do justice to such a complex language phenomenon.

The most encompassing definition of idiom is that which includes even single words. The definition elaborated by Hockett (1958) is a classic in this category, labeling any language item whose meaning is not visible from its structure as an idiom, even if it is a single morpheme, for example, -ed. To Hockett, the -ed morpheme is idiomatic because its meaning cannot be interpreted from its structure, whereas attaching a lexical word to the morpheme makes it not idiomatic, since the meaning of, for example, worked can be deduced from the structure of the two morphemes it contains. In the end, this definition of idiom is far too broad to be of practical use to most researchers. Still, some researchers have extended the definition of idiom to include single words which are polymorphemic or compounds, such as lighthouse or television (see Katz & Postal, 1963; Makkai, 1972).

More focused and limiting criteria have been employed by some scholars to define and identify idioms. For example, Weinreich (1969) maintained that only multiword phenomena which have both literal and figurative meanings can be termed idioms, which rules out such intuitively idiomatic noncompositional and purely figurative expressions as by and large or as of. Weinreich also excludes from the category of idioms phenomena he calls stable collocations, such as two wrongs don’t make a right, because they lack a figurative interpretation. Weinreich’s definition of idioms may also be somewhat too narrow to work easily for most scholars.

Transformational-generative grammar featured in the origins of the work of Katz and Postal as well as Weinreich, but it was Fraser (1970) who strictly defined idioms as word strings with transformational power. Looking at the range of idioms, Fraser elaborated a six-level hierarchy to encompass the variety of manipulation and transformation a given idiom may allow (adapted from Liu, 2008, pp. 7, 8):

6. Unrestricted: no real idioms allow this much transformation
5. Reconstruction: only nominalization of a verb, for example, she laid down the law becoming her laying down of the law
4. Extraction: passivization, for example, the buck has been passed too often, and particle and noun inversion, for example, look up the information/look the information up
3. Permutation: inversion of direct and indirect object, for example, cannot teach an old dog new tricks, and particle and noun inversion when the noun is part of the idiom, for example, put on some weight/put some weight on
2. Insertion: insertion of a nonidiomatic item into the idiom, for example, she read the class the riot act
1. Completely frozen: no transformation or manipulation is possible

Another strict definition of idiom is centered around semantic noncompositionality and nonproductiveness of form. Wood (1981) adopts this sort of definition, holding that the meaning of an idiom must not be merely the sum of the meaning of its parts, and that the structure of an idiom must not allow any transformations. Word combinations may then be placed on a sort of continuum of idiomaticity, crossing a range from idioms at one end to expressions, formulas, and free forms at the other. Later scholars dealt with idioms in similarly restricted ways. Moon (1998, p. 5) defines idioms as “semi-transparent and opaque metaphorical expressions such as spill the beans and burn one’s candle at both ends.” She separates idioms from what she terms fixed expressions, which are word combinations such as routine expressions, sayings, similes, and so on (Moon, 1998, p. 2). Grant and Bauer (2004) go a step further than Moon in the direction of exclusivity in defining idioms, adding the qualification that an idiom is not only noncompositional, that is to say, nonliteral, but also nonfigurative, in that its meaning cannot be interpreted from the constituent parts. To Grant and Bauer, a sequence such as kill two birds with one stone is not an idiom because it can be interpreted as nonliteral, and then reinterpreted by means of studying its pragmatic intent. It is unlikely or rare to actually kill two birds by casting one stone. When we see this word string, we recognize that fact and then look at the context and likely arrive at a good sense of what it actually means. On the other hand, to Grant and Bauer, by and large is a classic true idiom, because it is not only nonliteral, but it also gives no clue as to what its figurative meaning might be.

Categorizations of idioms

As for categorization of types of idioms, various scholars have elaborated taxonomies. Makkai (1972) identifies six subcategories (adapted from Liu, 2008, pp. 17, 18):

1 Phrasal verbs: verb and one or two particles, for example, come across
2 Tournure: a verb and at least two words (often noun phrases), for example, take the bull by the horns
3 Irreversible binomials: two nouns or adjectives in a fixed sequence, for example, safe and sound
4 Phrasal compounds: compound nouns and adjectives, for example, high-handed
5 Incorporating verbs: compound verbs, for example, brainwash
6 Pseudo-idioms: compound words or phrases in which one item has no meaning by itself, for example, chit-chat

Moon (1998), meanwhile, classified idioms into three broad categories (adapted from Liu, 2008, pp. 19, 20):

1 Anomalous collocations: uniquely formed collocations, which may:
  a violate grammatical rules, for example, day in and day out
  b contain items specific only to the collocation and with no meaning outside of it, for example, to and fro
  c be somehow defective, for example, foot the bill, in which the word foot carries a meaning unique to this collocation
  d be phraseological, or allow variation in structure, for example, with regard to or in regard to
2 Formulae: grammatical in structure and compositional in meaning, yet pragmatically specialized in function
  a Sayings, for example, an eye for an eye
  b Proverbs, for example, every cloud has a silver lining
  c Similes, for example, as right as rain
3 Metaphors: expressions which link the concrete and the imaginary or abstract, with three degrees of transparency
  a Transparent, for example, stepping stone
  b Semi-transparent, for example, throw in the towel
  c Opaque, for example, pull one’s leg

The history and range of definitions of idiom surveyed above is somewhat complex, but the definition can be summarized as centered around five defining criteria (see Skandera, 2004):

1 At least two words in length: this is common to all categories of formulaic language.
2 Semantic opacity (adding up meanings does not yield the whole): spic and span and to and fro are examples of this phenomenon, although here the lexical items involved are in and of themselves opaque, since we do not see spic, span, or fro used in other contexts (see Allerton, 1984 for more on this). Other examples of semantic opacity have roots in history, such as kick the bucket (die), which derives from a practice in the slaughter of pigs, and look for a needle in a haystack, which is actually more a figurative use of a sequence which also has a possible literal interpretation.
3 Noncompositionality: similar to semantic opacity, but more the idea that an idiom is unanalyzable in terms of meaning or function. If we look again at the previous examples, we can see that this criterion is a bit flexible, in that many idioms are actually figurative interpretations of a word sequence which can also be taken literally.
4 Mutual expectancy: also referred to as lexicality, this means that the items which comprise an idiom tend to occur together in a more or less fixed way, frequently making the idiom appear more like a single lexical item than a collection of individual words.
5 Lexicogrammatical invariability/frozenness/fixedness: similar to the idea of lexicality, this implies that the words in an idiom are fixed and cannot be substituted by synonyms. Some idioms are fixed to the point of not allowing any syntactic or morphological variation, such as hook, line, and sinker, by the way, or beat around the bush; we cannot pluralize any of the items in these word sequences nor, for example, passivize the last one to read the bush is beaten around. However, some idioms allow a limited amount of such variation, such as red herring or teach an old dog new tricks; it is possible to say red herrings, plural, and to reorder the latter idiom as teach new tricks to an old dog.

Lexical phrases

Lexical phrases are a particular subset of formulaic language first publicized by Nattinger and DeCarrico (1992), based largely on previous work by Becker (1975). They outline two large categories of the phrases: strings of specific lexical items and generalized frames. The former are generally unitary lexical strings and may or may not be canonical in the grammar, while the latter consist of category symbols and specific lexical items. Four criteria help in classifying the phrases: length and grammatical status; canonical or noncanonical shape; variability or fixedness; and whether the phrase is a continuous, unbroken string of words or is discontinuous, allowing lexical insertions (pp. 37, 38). They also identify four large categories of lexical phrases which display aspects of the four criteria: polywords, which operate as single words, allowing no variability or lexical insertions, and including two-word collocations (e.g., “for the most part,” “so far so good”); institutionalized expressions, which are sentence-length, invariable, and mostly continuous (e.g., “a watched pot never boils,” “nice meeting you,” “long time no see”); phrasal constraints, which allow variations of lexical and phrase categories, and are mostly continuous (e.g., “a ___ ago,” “the ___er the ___er”); and sentence builders, which allow construction of full sentences, with fillable slots (e.g., “I think that X,” “not only X but Y”) (pp. 38–45). Nattinger and DeCarrico’s comprehensive taxonomy covers a large proportion of the types of utterances which are produced in a language.
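The idea of a frame with fillable slots can be made concrete with a small sketch. The following Python fragment is only an illustration under stated assumptions: the FRAMES inventory, the function name find_frames, and the regular expressions themselves are hypothetical choices made here, not part of Nattinger and DeCarrico's description. Each frame fixes some lexical items and leaves one or two open slots.

```python
import re

# Hypothetical frame inventory: fixed lexical items plus open slots.
# The patterns are illustrative assumptions, not an established resource.
FRAMES = {
    "a ___ ago": re.compile(r"\ba (\w+(?: \w+)?) ago\b"),
    "the ___er the ___er": re.compile(r"\bthe (\w+er) the (\w+er)\b"),
    "not only X but Y": re.compile(r"\bnot only (.+?) but (.+?)(?=[.,;!?]|$)"),
}


def find_frames(text):
    """Return (frame label, slot fillers) pairs for every frame match in text."""
    lowered = text.lower()
    hits = []
    for label, pattern in FRAMES.items():
        for match in pattern.finditer(lowered):
            # match.groups() holds the words that filled the open slots
            hits.append((label, match.groups()))
    return hits
```

Calling find_frames on a sentence such as I saw her a year ago reports the frame label together with the slot filler year. A polyword or institutionalized expression, by contrast, would need no slots at all and could be matched as a fixed string.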

Lexical bundles

Lexical bundles (Biber & Conrad, 1999; Biber, Johansson, Leech, Conrad, & Finegan, 1999) are a category of formulaic language characterized by the means by which they are identified and by their purely functional nature: they are not meaning units per se, but rather units of function which serve to characterize particular types of discourse. The work on lexical bundles has been overwhelmingly conducted on academic language, especially academic written text. Lexical bundles are combinations of three or more words which are identified in a corpus of natural language by means of corpus analysis software programs. An additional characteristic of lexical bundles is that they occur across a range of texts or, in the case of academic language, a range of disciplines. Biber and Conrad (1999) noted that these word combinations “are so common, it might be assumed that lexical bundles are simple expressions, and that they will be acquired easily” (p. 188). However, the acquisition and use of lexical bundles does not appear to occur naturally.

Lexical bundles have been shown to be used at high frequency in published academic writing, and particular types of the bundles are characteristic of particular disciplines (Cortes, Jones, & Stoller, 2002). Academic disciplines have different ways of seeing the world, connected with different communicative conventions (Hyland & Hamp-Lyons, 2001). Biber (2006) presented a comprehensive corpus-based analysis of university language, including an examination of lexical bundles in textbooks. He found that academic disciplines differed in their use of lexical bundles, with natural and social sciences relying on them more than the humanities. Overall, the distribution of lexical bundles across functional categories in Biber’s study shows that referential bundles (making direct reference to real or abstract entities or to textual content or their attributes) are the most common. Stance bundles (expressing attitudes or assessments of certainty) are the second most common type of function for lexical bundles in the textbooks, whereas discourse organizers (reflecting relationships between previous and subsequent discourse) are the least common. Within the category of referential functions, it appears that quantity and intangible framing subfunctions represent the largest categories.

Other researchers have deviated somewhat from the narrowly defined methodologies stipulated by lexical bundle researchers. Simpson-Vlach and Ellis (2010) note that many of the items identified by lexical bundle research are characterized by context-specific lexical components and are of limited utility for the purposes of teaching L2 learners to be competent with academic discourse. They conducted a modified type of such research wherein they added elements of native speaker judgment to the process of identifying lexical bundles, producing lists of items they call instead formulaic sequences. Liu (2011) took this a step further and, analyzing items extracted from the British National Corpus (BNC), produced lists of what he terms multiword constructions, which stand as units of meaning or function. This terminology was also employed by Wood and Appel (2014) in an analysis of multiword phenomena in first-year university textbooks.
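The identification procedure described above, in which word combinations are counted in a corpus and retained only if they are frequent and occur across a range of texts, can be sketched in a few lines of Python. This is a minimal illustration rather than any published extraction procedure: the function names, the simple whitespace tokenizer, and the cutoff values min_count and min_texts are assumptions chosen for the example, and real studies typically normalize frequency per million words.

```python
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase, split on whitespace, and strip surrounding punctuation."""
    tokens = (word.strip(".,;:!?\"'()[]") for word in text.lower().split())
    return [token for token in tokens if token]


def lexical_bundles(texts, n=4, min_count=3, min_texts=2):
    """Return n-word sequences that meet a frequency and a range cutoff.

    min_count and min_texts are illustrative assumptions, not published
    thresholds. Sequences may cross sentence boundaries in this sketch.
    """
    counts = Counter()
    text_ids = defaultdict(set)
    for text_id, text in enumerate(texts):
        tokens = tokenize(text)
        for i in range(len(tokens) - n + 1):
            ngram = tuple(tokens[i:i + n])
            counts[ngram] += 1            # overall frequency
            text_ids[ngram].add(text_id)  # range: which texts contain it
    return {
        " ".join(ngram): count
        for ngram, count in counts.items()
        if count >= min_count and len(text_ids[ngram]) >= min_texts
    }
```

Note that nothing in this procedure consults meaning or speaker intuition, which is the point of the criticism raised by Simpson-Vlach and Ellis (2010): the output is whatever recurs, whether or not it forms a unit that a teacher would want to present to learners.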

Metaphors

Metaphor is essentially a semantic principle centered around an unconventional act of reference: a word is used to describe an entity which is essentially outside of its denotational range, and there is tension between a literal and a metaphoric interpretation. A metaphor has three structural components. The vehicle is the term used in an interpreted sense, which cannot be understood literally because of the unusual context of use. The topic is the referent of the vehicle. The grounds are the analogies or features shared between vehicle and topic. Take, for example, life is a highway. Highway is the vehicle, the word being used in an interpreted, not literal, sense. Life is the topic, and the grounds are the analogy between the passing of time and the covering of distance. The metaphor can, of course, use a marker such as is like or kind of, as in life is like a box of chocolates. The vehicle can be a single word, a phrase, or a full clause or sentence. The strength of a metaphor depends on the degree of semantic tension between vehicle and topic, on linguistic markers such as like and kind of, and on the implicitness or prominence of the vehicle. The metaphor calls on us to compare the two components, the vehicle and the topic.

Proverbs

Proverbs are hard to define, but their key features are the opacity of the relationship between literal and figurative meanings and their sentence-like length. Pragmatic characteristics of proverbs include advice and warning (better late than never, don’t put the cart before the horse), instruction and explaining (an apple a day keeps the doctor away, the ball is in your court), and communicating common experience and observations (you can’t get blood from a stone, just as the sun rises in the east). They are not the words of the speaker, but quotations from a canon of proverbs shared by members of a community. They feature the linguistic characteristics of brevity and directness, simple and/or parallel syntax, metaphorical quality, and sometimes archaic structures.

Compounds

Compounds are special cases in formulaic language study, being more a branch of word formation. A compound is, in fact, the creation of a word with a unique meaning by combining two existing words, and in English many compounds are written as two separate words (see ten Hacken, 2004). Compounds show asymmetry, with the second of the two words usually the head or core of the combination; for example, desk computer describes a type of computer and computer desk describes a type of desk (see Williams, 1981). The head word is subject to rules of pronoun reference and has some freedom of syntactic form. The head represents a type and the nonhead serves to classify the head. There are three forms of compound words:

• closed form, in which the words are written as one, such as secondhand, childlike, or notebook;
• hyphenated form, in which the lexical items are separated by hyphens, such as mother-in-law or mass-produced;
• open form, in which the two words are written separately, such as post office or real estate.

Compounds written as single words indicate a stronger lexicalization. Words are combined into compound structures in various ways and they can change over time. Two words may be joined by a hyphen and then be blended into a single word. The rules for writing compounds are not universal or specific in English, and it is common for even experienced and highly educated writers to need to consult dictionaries or online resources to determine whether a given item is two words, a hyphenated compound, or a single word.

Words modified by adjectives, for example, an old school, are different from a compound word, for example, a high school, in the degree to which the nonhead word changes the essential character of the head, or the degree to which the modifier and the noun are inseparable. In the example of high school, the compound represents a single entity, a particular type of school which is always identified as such, whereas old school is simply a school being described as old. The adjective slot in the latter combination can be filled by any number of items.

Modifying compounds are often hyphenated: an old-furniture salesman sells old furniture, but an old furniture salesman is an old man. When compound modifiers precede a noun, they are often hyphenated: part-time worker, high-speed chase. Adverbs ending in -ly are not hyphenated when compounded with other modifiers: highly rated university, a partially refundable purchase. In pluralizing, the most significant word, the head, takes the plural form. Examples include also-rans, fathers-in-law, and go-betweens.

Phrasal verbs

Phrasal verbs are a particularly English type of formulaic language phenomenon. They are verbs combined with a preposition or particle, or both, often with nonliteral meanings, or with both literal and figurative interpretations, like idioms. Three structural categories exist:

Verb + preposition (prepositional phrasal verbs)
Help me look after Jake’s dog for the weekend.
Other children often picked on Sebastian.
What if you run into your ex-wife at the party?

Verb + particle (particle phrasal verbs)
You should bring that up at the next meeting.
Try not to give in when you see the dessert table.
Come over and let’s hang out for the afternoon.

Verb + particle + preposition (particle-prepositional phrasal verbs)
I am not putting up with any more outbursts from her.
Jane is looking forward to a long sunny vacation.
The kids loaded up on chocolates before we got there.

Three criteria exist for determining whether an item is in fact a phrasal verb (adapted from Liu, 2008, p. 22):

1 A phrasal verb does not permit insertion of an adverb between its components, for example, we cannot say, The kids loaded slowly up on chocolates before we got there.
2 A phrasal verb particle cannot be fronted in a sentence, for example, we cannot say, Up with I am not putting any more outbursts.
3 A phrasal verb never exists as only literal in meaning, but must have some degree of figurative meaning, as seen in the examples above.

Some researchers in the area of idioms have actually included phrasal verbs as a subcategory. As is the case with idioms, the meaning of these items cannot be understood in a literal fashion, or interpreted from the component words. For example, pick on has little to do with actually picking, not to mention on. Hanging out has nothing to do with actually hanging. Phrasal verbs are common in everyday, informal speech, and their synonyms, which are often borrowings from Latin, Greek, or French, are reserved for more formal discourse registers or for more specific high-level usage. We tend to say get together rather than congregate, put off rather than postpone.

Concgrams

A concgram is, like all formulaic language, a combination of two or more words. However, a concgram is a noncontinuous sequence, in which the constituent words are separated by others. The idea dates back to the 1980s when the Cobuild team at the University of Birmingham tried to find a way to search corpora by machine for noncontiguous sequences of associated words. The ability to discover noncontinuous word combinations in corpora increases the likelihood that researchers will discover not only a more extensive description of patterns of collocation and their meanings, but also, and more importantly, new patterns of language use.
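A machine search for noncontiguous sequences of this kind can be sketched as follows. The Python fragment below is a simplified illustration, not the procedure used by any particular concordancing program: the function name find_concgrams, the max_gap window, and the decision to accept the two words in either order are all assumptions made for the example.

```python
def find_concgrams(tokens, word_a, word_b, max_gap=4):
    """Return (i, j) index pairs where word_a and word_b co-occur with
    between 1 and max_gap intervening tokens, in either order.

    max_gap is an illustrative assumption; adjacent pairs are excluded
    because they would be ordinary contiguous sequences.
    """
    hits = []
    for i, token in enumerate(tokens):
        if token not in (word_a, word_b):
            continue
        partner = word_b if token == word_a else word_a
        # start at i + 2 so that at least one token separates the pair
        for j in range(i + 2, min(i + max_gap + 2, len(tokens))):
            if tokens[j] == partner:
                hits.append((i, j))
    return hits
```

Applied to the tokens of she played a very important role in the project, a search for played and role finds the pair even though three words intervene, which a search restricted to contiguous sequences would miss.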

In summary …

From this general overview of categories of formulaic language, it can be surprising how many and varied the types are. The phenomenon we are dealing with is by no means unitary, and the classifications and the taxonomies are somewhat leaky or slippery: the distinctions between, for example, a collocation and an idiom are blurry, and it also appears that particular researchers have somewhat arbitrarily composed their own sets of descriptions, classifications, and definitions of the various types. Other types of formulaic sequences seem lost in the shuffle, uncategorized but intuitively formulaic. Look at items like and then or sooner or later. Where do they fit? The advent of corpus analysis technology and techniques has done much to help us identify new types of formulaic sequences, but what makes the exact determination of a lexical bundle different from a sequence identified using frequency and other statistical measures such as mutual information (see Chapter 2)? Is the distinction even worthy of debate? A few of the many themes and patterns which the research shows are:

• Formulaic sequences can be classified in various ways.
• The nature of the classifications and the criteria used to determine them has changed over time.
• The classifications which exist are by no means exhaustive, and some types of word strings are difficult to classify.
• Some categories overlap with others.
• There is no firm consensus that all the categories are similarly processed semantically or psycholinguistically.

One interesting conclusion to be drawn from this survey of types and categories of formulaic language is that the classifications are in some ways arbitrary. We might even wonder why the categories are even valuable to us as researchers or teachers. Does it matter whether a sequence is a phrasal verb or a collocation? Or are many of the classifications really just carryovers from an early era of armchair phraseology, with little particular relevance to those who do applied research or language teaching?

POINTS TO PONDER AND THINGS TO DO

1 Before reading this chapter, what categories of formulaic sequences would you have identified?
2 Look at the list of word strings at the beginning of Chapter 1 in this book. Can you classify them on the basis of the information presented in this chapter?
3 Take several word strings which you think may be formulaic from a particular text. Can you categorize them according to the information in this chapter? Does this exercise help enhance your understanding of the word strings in some way?
4 For each of the categories of formulaic language, make a list of its characteristics. Find several examples of each one from the literature and/or from your own intuitions.
5 For each of the categories you have dealt with in activity #4, see if you can find examples from a text. Do these examples differ from those you drew from intuition?
6 What is a workable definition of formulaic language which takes all the categories and their characteristics into account?
7 How would your methods of conducting research into collocations differ from those you might use to conduct research into phrasal verbs? Lexical phrases? Lexical bundles?
8 How would your methods of teaching collocations differ from those you might use to teach phrasal verbs? Lexical phrases? Lexical bundles?
9 Which of the categories of formulaic language make intuitive sense, and which appear to require specific research techniques to uncover?
10 What are the implications of the nature of the categories for language testers and assessors? For writers and editors?

4 Mental Processing of Formulaic Language—Holistic and Automatized

In any discussion of formulaic language and its definition, one is bound to encounter the assertion made most famously by Wray (2002) that a formulaic sequence is “a sequence, continuous or discontinuous, of words or other elements, which is, or appears to be, prefabricated: that is, stored and retrieved whole from memory at the time of use, rather than being subject to generation or analysis by the language grammar” (p. 9). This claim is attractive to many because it allows us to make additional claims for formulaic language, most notably that it facilitates fluent communication by allowing us to bypass laborious creative construction, and to produce and comprehend chunks of words with particular meanings or functions, helping with fluent and accepted use of language. Rather than assemble utterances or sentences in a word-by-word, step-by-step manner, we are able to use holistically stored formulaic sequences instead. Does this sound like an attractive model of language use? Of course it does, and it handily explains away some tricky things about communication. It simply makes sense, right?

Attractive claims are one thing, but it is essential that they not be taken on faith. What is really meant by “stored and retrieved as wholes,” what does that imply for formulaic language research, and, most importantly, what is the evidence that it is so? To answer this it is necessary to take a look at some basic concepts of mental processing and models of language production and see where formulaic language may fit. As well, we need to survey the evidence of holistic storage and retrieval of formulaic sequences.

Mental processing: A primer

Language production, and, to a great extent, comprehension, is largely a function of mental processes and skills. Some key concepts, such as declarative and procedural knowledge, automatization and proceduralization, and controlled and automatic processing, are essential to an understanding of these mental processes.

Declarative and procedural knowledge

One foundational distinction which is essential to an understanding of language production is that between declarative knowledge and procedural knowledge. Declarative knowledge is information which is consciously known, while procedural knowledge is more a sense of how to accomplish things, and is linked closely to skilled behavior. The distinction can be reduced to a comparison between knowledge of what, as opposed to knowledge of how. For example, it is one thing to know a particular grammatical rule or transformation. It is quite another to be able to accurately and fluently produce an utterance containing that rule or transformation. Knowing the rule is basically declarative knowledge. We can explain things we know declaratively. We can analyze things we know declaratively. We can produce accurate language containing a declaratively known rule or transformation if we have the time and focus to deal with it in an explicit way, such as answering test questions, filling in blanks on grammar worksheets, or even practicing it in pair work which is contrived to force us to produce the form.

Procedural knowledge, however, is quite different. Use of procedural knowledge gives us the ability to simply feel what is accurate in a particular context, and to produce spontaneous speech with accuracy, or correct forms, and fluency, or speed and flow. Procedural knowledge is related to skill, or ability to do or perform, rather than encyclopedic or analytic knowledge of things. If you know how to perform skilled behavior, you can relate to this distinction. Remember learning to swim, ride a bicycle, or drive a car? Not even a world of abstract instruction about strokes or pedals, or brakes and steering, could have made you actually capable of these skills. Only some subconscious sense of what to do and how to do it can really enable you to perform in a skilled manner. That is procedural knowledge.
But can declarative knowledge become proceduralized? Can an explicit understanding of something be transformed into relatively effortless skill in using or implementing it? Yes, and connected to all of this is the concept of automatization, or proceduralization, by which declarative knowledge may be changed into procedural knowledge. Think of how you might have “learned” to swim, ride a bicycle, or drive a car. The key to your ultimate success was probably practice, right? In fact, automatization or proceduralization occurs generally through a process of repetition and repeated use and recall. Through automatization, content originally stored in the conscious mind can become available for efficient use in real time. Knowledge which is proceduralized can be said to be available for use more or less subconsciously, and one may perform other tasks simultaneously.

Let’s take one of our examples, learning to drive a car, and break it into steps. At first all the necessary steps must be explained and shown. The novice driver struggles to control the steering, the brakes, the accelerator, and, in a standard shift vehicle, the delicate coordination of clutch, shift, and accelerator while changing gears. Any distraction will tend to make the novice driver lose control: noise, the need to carry on a conversation, or any effort in addition to the driving itself, such as controlling windshield wipers, attending to other traffic, and so on. With repetition, however, the performance becomes much smoother and less effortful. Eventually you get to the point where the driving itself is more or less automatic, and you can chat, sing along to music, drink beverages, smoke cigarettes, and so on, while driving. This is an example of how a complex set of knowledge or actions can, over time and with practice, become automatized, skilled, and performable concurrently with other tasks.

So it is with language. Think about someone progressing to be more and more fluent over time. He or she first struggles to produce even content words, let alone anything approaching grammar. With exposure and practice he or she can start to laboriously create roughly grammatical utterances. But this takes quite a bit of effort.
His or her cognitive and affective resources are almost entirely taken up with formulating utterances—retrieving words from the mental lexicon, applying rules of syntax and morphology to them, lining them up, articulating them—all of this takes up most of his or her “head space.” Distractions, stresses, or interruptions may cause him or her to lose the train of thought or communication and need to start all over. With time, aspects of this process become automatized and the student is able to produce utterances with less excruciating effort, depending on context and so on. He or she can attend to coming up with ideas to communicate, to planning the next things to say, and so on, while actually producing language. Like the driver of a car, he or she is now able to “multitask” and be more skilled and flexible with the passing of time and plenty of practice.

Spontaneous speech Producing spontaneous language is a shockingly complex thing, if you pause to really look at it. Watch a person speaking in conversation or in a skilled way in any context. Mind and the muscles of articulation are operating in synch in

a truly remarkable way. Ideas roll around the brain while the speaker weighs contextual factors such as a sense of who is listening and what shared knowledge and opinion exist. Clauses and phrases and words encapsulating the ideas and meanings and nuances of the speaker’s mind clump together and roll out into the air. And, remarkably, a listener can attend to and react to the utterances in real time, showing comprehension, and even be driving a car, cooking a meal, or watching a television program at the same time. This seems miraculous and may be at least partly explained by the automatization or proceduralization processes described earlier. It seems that one’s memory of language bits and pieces, rules, and so on is the underpinning of all this. We recall what we have learned and implement it using another type of memory. But if we take a look at the nature of human memory, we find a surprising limitation which at the same time helps us to understand a bit more about the role of formulaic language in communication.

Long- and short-term memory

The concepts of short- and long-term memory are extremely relevant to how language may be processed mentally. Long-term memory is essentially a repository of all kinds of language knowledge. But digging around this store of memory is difficult and time consuming. For this knowledge to be used, therefore, it must first be assembled in short-term memory. From long-term memory, words need to be selected to express concepts, and then morphological, syntactic, and phonological rules need to be applied to them. And here is where the nature of human cognition presents a huge barrier to our ability to assemble language in this way. Unfortunately, or fortunately, human short-term memory is limited to approximately seven or eight items at a time (Anderson, 1983), making construction of utterances from scratch as described earlier unlikely. There is a role here for another component of memory termed working memory. The model of working memory originates with Baddeley (1988), who proposed the concept together with an accompanying phonological loop, conceived of as a site in which items encountered in input or retrieved from long-term memory are rehearsed for production. So between long-term memory and the actual point of utterance, working memory represents a launch pad for language. Here the form-meaning relationship can be developed and retained. An example of how this may work is the mental repetition of a newly encountered seven-digit telephone number so as to be able to recall it later. What happens when someone tells you a seven-digit telephone number and you have no immediate place to write it down or otherwise record it? What happens if you need to cross the room to your phone or a pen and paper to record the

number? You need to resort to chanting or repeating the sequence of digits as you go. If there is a distraction or interruption in the meantime, you may forget the number. So you keep it in a loop in your mind in order to retain it. Your short-term memory has been maxed out. As for language production, Hulstijn (2001) and Robinson (1995) point out that such rehearsal is important in lexical acquisition in particular. This may help to explain how formulaic sequences may be acquired. They are then retrieved directly from long-term memory as chunks, bypassing the restrictions of short-term memory. Think of it this way: assembling the number of words we need to produce any reasonable sentence in English would overtax the short-term memory rapidly. But assembling seven or eight formulaic sequences allows for many words and clauses and phrases to be lined up and produced with much less effort. Assembling seven or eight formulaic sequences in short-term memory certainly allows for a greater volume of language and thought to be produced or comprehended than seven or eight individual words!
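
The arithmetic behind this point can be made concrete with a small sketch. It treats short-term memory as a fixed number of slots, each of which can hold either a single word or a whole formulaic sequence. The slot limit of seven, the example utterance, and its division into sequences are all assumptions made for illustration; none of them comes from a study.

```python
# Toy illustration: short-term memory as a fixed number of slots.
# SLOT_LIMIT, the utterance, and its segmentation are invented for illustration.
SLOT_LIMIT = 7

def words_held(units, slot_limit=SLOT_LIMIT):
    """Count the words carried by the units that fit into the available slots."""
    kept = units[:slot_limit]
    return sum(len(unit.split()) for unit in kept)

utterance = "to be honest I don't understand what you mean by and large"

# Option 1: every word occupies its own slot.
as_words = utterance.split()

# Option 2: each (hypothetical) formulaic sequence occupies one slot.
as_chunks = ["to be honest", "I don't understand", "what you mean", "by and large"]

print(len(as_words), "words in the utterance")
print(words_held(as_words), "words fit when each slot holds one word")
print(words_held(as_chunks), "words fit when each slot holds one sequence")
```

In this toy case the word-by-word route runs out of slots before the utterance is complete, while the chunked route carries the whole utterance in four slots and leaves three free.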

Second language acquisition theory and spontaneous speech

Second language acquisition theory provides clues as to how the ability to produce spontaneous speech in these ways may develop over time. It may be that frequency of exposure to input and language experience is key to acquisition and automatic processing. From a connectionist perspective (see Chapter 5), language production and comprehension are determined by vast amounts of statistical information about how language behaves, how words fit together, how they collocate, and so on. This experience with input allows us to implicitly understand the likelihood of certain language items occurring together. Ellis (2002) distinguishes between explicit and implicit memory, noting that explicit memory is “a conscious process of remembering a prior episodic experience or fact such as questions like what did you have for breakfast?” (p. 145), whereas implicit memory is a result of repeated encounters with a particular stimulus, and does not require any conscious recollection or knowledge of particular events. In connectionism, each encounter with or repetition of a stimulus strengthens memory connections between the stimulus and the category to which it belongs, as well as the characteristics of the stimulus which link it to the category. One basically subconsciously recalls past encounters and assesses their similarity to the new one, which is then classified accordingly. These associations and categorizations are likely made based on multiple sources. Research evidence with children segmenting words from continuous speech shows that, using just phonotactic information, they can achieve a

success rate of 47 percent. Using utterance boundary and relative stress increases the success rate to 70 percent (Ellis, 2002, p. 140). There is also evidence of a cohort effect in lexical retrieval, in which exposure to or retrieval of an initial phoneme of a word activates all words in the mental lexicon which share that initial phoneme. As more information is retrieved, we reduce the range of possibilities and the most high-frequency words are activated the most in memory. For formulaic sequences, then, it would seem that they are probably automatized through repeated exposure in input, constrained of course by the pragmatic requirements of the communication contexts one encounters most often. They are probably stored and retrieved over time by a number of cue sources, including initial phoneme classification and so on.
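
The cohort effect described above can be sketched as a narrowing search. In the sketch below, letters stand in for phonemes, and the small lexicon with its frequency counts is invented purely for illustration. It is a simplified picture of the idea, not a model taken from the research cited.

```python
# Toy sketch of the cohort effect: words that match the input so far stay
# active, and higher-frequency words are ranked as more strongly activated.
# The lexicon and its counts are invented; letters stand in for phonemes.
LEXICON = {"bat": 40, "bad": 120, "band": 60, "bush": 25, "beat": 80, "cat": 90}

def cohort(heard_so_far, lexicon=LEXICON):
    """Return the words consistent with the input so far, most frequent first."""
    candidates = [word for word in lexicon if word.startswith(heard_so_far)]
    return sorted(candidates, key=lambda word: lexicon[word], reverse=True)

# As more of the word is heard, the cohort shrinks.
for prefix in ["b", "ba", "ban"]:
    print(prefix, "->", cohort(prefix))
```

Each additional segment removes candidates from the cohort, and within whatever remains the most frequent item comes first, which mirrors the claim that high-frequency words are activated most strongly.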

Storage and retrieval of formulaic sequences as wholes

What exactly is meant by the notion of formulaic sequences being produced and recalled as wholes? Obviously this has not been proven beyond a doubt, and there are likely a number of possible answers to the question. It may be, as Weinert (1995) pointed out some years ago, that they are retrieved based on the order of their parts, by phonological cues, or by recall of first and last words first, which then trigger recall of the entire sequence. Alternatively, the sequences may be cognitively stored as clusters or bundles, and retrieved based on salient pragmatic aspects, functions, meanings, and so on. As well, there may be a continuum or spectrum of processing. Depending on a particular context or due to cognitive stress from a communication situation, a sequence might be retrieved sometimes as a whole automatically, at other times only partially as a whole, and sometimes in a controlled step-by-step way. It is also likely the case that some types of sequences are retrieved more holistically than others. For example, a frame or discontinuous string such as “not only X but also Y,” or a frame with a fillable slot such as “a X ago,” may be retrieved less holistically than a syntactically and semantically opaque string such as “by and large,” an idiomatic item such as “beat around the bush,” or a proverb such as “a stitch in time saves nine.” It is one thing to consider how formulaic sequences (or language in general, for that matter) may be retrieved and processed, but how can strings of words become holistic units like this? There are several means by which this might occur, but it is important to bear in mind that the answers are fairly speculative. Perhaps it is a process of recognizing a meaning or function in a string as a whole and storing it accordingly. Perhaps it is because of the utility

of a particular string—as an example, as reported earlier in this book, in my past as an ESL teacher, I encountered a woman in my class named Marie who had just arrived from Cambodia and spoke absolutely no English. She was observant and silent for some time, and her first English utterance was something which sounded like “I no stan.” Later, this adapted to sound more like “I do no stan.” Clearly, she picked up the most useful utterance she could, as a complete beginner, the utterance which could help to deflect attention and make clear she was unable to communicate—“I don’t understand.” In this case, a learner took, stored, and retrieved a lexical string based on its pragmatic utility and function first and foremost. Perhaps, not unlike children acquiring their first language, learners initially may segment formulaic sequences from input and alter, fuse or combine them. Perhaps consciousness, noticing, and awareness of sequences in input leads to initial registration of the sequence as a lexical item and it is then automatized through repeated exposure and/or use. There are a number of potential research projects in this area, some virgin territory for new researchers looking to uncover some of the processes involved in acquisition of and perception of formulaic language.

Wray’s heteromorphic mental lexicon

Wray (2002, 2008), a theorist, commentator, and critic in this area, has elaborated a model of the mental lexicon which applies to formulaic language, based on syntheses of research conducted by a number of scholars. The model basically consists of three rather commonsense ideas (adapted from Wray, 2008, pp. 15–21):

The mental lexicon is heteromorphic. The mental lexicon consists of a variety of linguistic material which ranges from single morphemes to lengthy multiword units. Some, but not all, of the multiword units are formulaic. A formulaic sequence is in essence the equivalent of a single morpheme in terms of the space it takes in the lexicon and the processing effort required to produce or comprehend it. The term Wray (2008) uses here for formulaic sequences is morpheme equivalent unit.

The content of the lexicon is determined by needs-only analysis. In dealing with input as children in first language acquisition, or as adults in second language acquisition, we will only break down or segment linguistic material to the extent that it is necessary. In other words, if a string of words is readily assigned to a meaning or function as a whole, we will not break it into parts for analysis or productive use because the string functions perfectly well as a unit. For example, the string “How do you do?” requires no analysis, since it is readily assigned to the role of what is said when meeting someone, and is fixed and invariant in form. Other multiword strings may be broken into useful separate pieces, yet also remain stored as a whole.

Morpheme equivalent units allow the speaker to connect with the hearer. Paying attention to the hearer often requires a speaker to be more formulaic, so that the hearer may more efficiently grasp meaning and affect. The more formulaic the utterances, the less processing required.

Evidence of holistic processing

Despite the amount of theorizing and reasoning which can be brought to bear on the notion that formulaic sequences are stored and retrieved as wholes, researchers have struggled to find ways to uncover empirical support. Some of the concepts outlined earlier in this chapter are helpful in getting an overall sense of how formulaic sequences may be integrated into cognitive theories of language storage and processing. But the burning question still remains from the very first paragraph here: what is the evidence that formulaic sequences are “stored as wholes”? It is obvious that we store individual words in the mental lexicon, but it is less clear whether we store multiword items in a similar way. Researchers have been chipping away at this topic for some years, and we have reached a point where there is growing support in the research for a claim that formulaic language is processed faster and differently from nonformulaic language.

Research on idiom processing

One category of formulaic language which has received attention from researchers who are curious about mental processing is idioms. Recall that idioms are generally defined as having unitary meanings. But the research linking idioms to mental processing as wholes has tended to use the fact that there are basically two ways of interpreting many idioms, figurative and literal. So the researchers in this area have zeroed in on the activation of the figurative or literal meaning of idioms, and have given us some interesting ideas. One such example is the work of Swinney and Cutler (1979), who propose that when an ambiguous (literal and figurative interpretations both possible) word string is encountered by a proficient speaker, both the analysis of the literal meaning and the retrieval of the figurative interpretation are initiated, but the

figurative one will be activated first because it is faster. A figurative meaning is generally holistic, in that it is a singular interpretation of the word string, whereas a literal meaning might more likely involve some analysis of the string as a syntactic/morphological unit, which would be significantly slower than merely interpreting it as a whole. One obvious angle of attack for researchers in trying to determine whether idioms are processed holistically is to compare how we deal with them to how we deal with novel or literal word strings. This type of idiom-focused research has investigated the activation of idioms in comparison to novel phrases with nonfigurative interpretations. One way to determine whether idioms are processed differently from nonidiomatic strings is, of course, by measuring the speed at which people react to them. Perhaps unsurprisingly, a range of studies have found that native speakers process idioms faster than they process novel strings of words (e.g., Gibbs & Gonzalez, 1985; Swinney & Cutler, 1979; Van Lancker, Canter, & Terbeek, 1981). However, similar research using nonnative speakers as subjects is ongoing and in some ways quite complex. For example, Van Lancker-Sidtis (2003) found that native speakers were able to use prosodic (pronunciation and accent/ stress/word blending) cues to determine whether particular idioms were used in a figurative or literal way, whereas nonnative speakers did not. Cieslicka (2006) found that, in contrast to native speakers, as discussed earlier, nonnative speakers reacted more quickly to literal interpretations of idioms than to figurative ones. Conklin and Schmitt (2008) and Underwood, Schmitt, and Galpin (2004) found that native speakers and proficient nonnative speakers shared an ability to process idioms in texts faster than novel language. 
Siyanova-Chanturia, Conklin, and Schmitt (2011) conducted a study using idioms in reading passages and eye-movement tracking, and found that native speakers processed the idioms more quickly than novel language, while nonnative speakers did not, and in fact often processed idioms more slowly than other language. These studies are described in more detail elsewhere in this book. So we have, from the studies described briefly here, a sense of idioms being processed differently from nonidiomatic strings. But, while this research involving idioms presents some tantalizing evidence of processing phenomena, unfortunately it is hard to generalize from it to any great extent. For one thing, the fact that idioms often have two interpretations, the literal and figurative, can slow processing as choices have to be made while processing them. As well, idioms are not all equally transparent or decompositional, which makes it hard to present sweeping conclusions even about idiom processing, let alone processing of all other types of formulaic language. And finally, it is a fact that idioms represent a subset of formulaic language and are actually not very frequent in language, meaning that nonnative speakers are unlikely

to have encountered many idioms very often (Conklin & Schmitt, 2012). In light of all of these restrictions on concluding from the idiom research, we still need to approach the study of mental processing with a mix of caution and enthusiasm. While the idiom research provides us with a few tantalizing bits of knowledge, the study of mental processing of other types of formulaic sequences may be a much richer and more rewarding area to investigate.

Research on formulaic language other than idioms

Fortunately, a look at some research focusing on nonidiomatic formulaic language shows us that there is some evidence of processing advantages. Classic work by Kuiper (1996, 2004) noted that the speech production of certain performers such as auctioneers and sports commentators, who need to speak at much faster rates than normal, is replete with formulaic language. This gives us an impression that perhaps it is the formulaic language which is easing the processing load of producing utterances at such rapid and unrelenting speed, a clue to a processing advantage. A range of quite delicately and carefully designed studies have attempted to tease out the nature of processing of various formulaic sequence types. The key variable in most of these has been the relative frequency of sequences. One such study by Sosa and MacFarlane (2002) investigated the word “of” used in high-frequency and low-frequency two-word combinations. Using auditory word monitoring, they found that native speakers reacted more slowly to the word when it was in a high-frequency combination, indicating that they were used to processing the frequent two-word sequences as wholes and were slowed down by the need to deal with the sequence word by word. Similarly, Bod (2000, 2001) had native speakers read frequent and less frequent three-word sentences and found that they reacted more quickly to the higher frequency items. Arnon and Snider (2010) conducted a similar study using compositional (comprehensible from individual words) four-word phrases at different frequency levels and found that higher frequency items were processed faster than lower frequency ones.
Tremblay, Derwing, Libben, and Westbury (2011) found that sentences containing lexical bundles (see Chapter 8) were processed faster than those without lexical bundles, and Tremblay and Baayen (2010) found that electrophysiological measures show that native speaker processing of frequent four-word sequences was faster than for less frequent sequences, and that the frequent items appeared to be stored as both parts and wholes. Eye-tracking work by Siyanova-Chanturia et al. (2011) found that both native and nonnative speakers read more frequent formulaic sequences faster than less frequent ones. Taken together, these studies appear to be unanimous in indicating a faster processing speed for

frequent sequences compared with less frequent ones. It is intriguing to see that in some cases this effect was present in the processing by nonnative speakers as well as native speakers.

Research on brain-damaged individuals

The study of processing by brain-damaged individuals has also helped us to understand how formulaic sequences are processed. Van Lancker and Kempler (1987) studied how left- and right-brain-damaged people processed formulaic and novel phrases. Note that the right brain is generally understood to be the site of holistic processing of information, and that the left brain is the analytic and discrete-point processor. Using picture-matching auditory comprehension tests, they discovered that left-brain-impaired participants performed better on recognizing formulaic phrases, whereas right-brain-impaired participants dealt more readily with novel phrases. This indicates that the right brain, which specializes in holistic processing, has much to do with formulaic language processing, giving some support to the notion that formulaic sequences may be processed as wholes. In a later study, Van Lancker-Sidtis and Postman (2006) found that left-hemisphere-damaged individuals produced significantly more formulaic language than participants with right hemisphere damage, extending the notion of right brain holistic processing into production as well as recognition. These types of studies offer much in the way of understanding of how formulaic sequences may be processed. The comparison of the two hemispheres in damaged individuals gives us a sense of where processing may be happening in undamaged brains. Taken together with the frequency studies and the idiom studies discussed above, this brings the nature of processing of formulaic language into clearer focus.

Other types of research

Other types of research into processing of formulaic language have used dictation tasks as a means of data collection. A study examining whether formulaic sequences extracted from corpora are also psycholinguistically processed as formulaic is that of Schmitt, Grandage, and Adolphs (2004). The researchers extracted a range of types of formulaic language from corpora and embedded them in a dictation task for both native and nonnative speakers. The utterances in the dictations contained more words than short-term memory could manage, generally twenty to twenty-five words, forcing participants to reconstruct the language. Many of the formulaic sequences were in fact reconstructed intact, reflecting some evidence of holistic processing, but some were not, perhaps indicating that some individuals may

process formulaic sequences as wholes under some circumstances, and as strings of individual words under other circumstances. This type of research has rich implications for our understanding of these phenomena, however, and certainly bears replication and refinement in future work.

What can be concluded?

An overview of some basic concepts in cognition and language, as well as a review of some relevant research, leads us to an interesting place. The evidence, as discussed earlier, certainly appears to indicate that adult native speakers and proficient nonnative speakers have, in some ways, mental representations of formulaic sequences as wholes. Thinking back to the discussions of automatization and procedural knowledge, it makes some sense that frequency seems to play an important role in the holistic storage and processing of formulaic sequences. Automatization will occur since repeated exposure to a word string with a particular meaning or function over time likely leads to it becoming entrenched in memory and stored as a whole. Researchers who take a generative grammar approach to language such as Pinker (1999) would likely argue that any effect of frequency would apply only to single words, since, in their view, the lexicon and the grammar are separate. However, when dealing with formulaic language, it is much more logical to integrate more data-based and encompassing theories of language. For example, usage-based (e.g., Goldberg, 2006; Tomasello, 2003) and exemplar-based models of language (e.g., Bod, 2006) see language learning as the acquisition of constructions which vary in length and complexity. Constructions are seen as units of form and meaning which may be as small as a single morpheme or word, or as long as an entire sentence. These models would include formulaic language as a type of construction just as a single word would be, and so a formulaic sequence could be subject to the effects of frequency in acquisition, storage, and retrieval. In this view, faster processing of any frequent item, be it a word or a sequence, is logical. Formulaic sequences, then, do indeed appear to be represented mentally as if they were single words, and are processed faster than novel language.
The exact nature of the processes underlying the faster processing is still extremely uncertain. It may be that words which co-occur often are more strongly connected mentally and semantically. For example, when we encounter a word or set of words such as “fish and,” our brain activates “chips.” Interestingly, this particular combination occurs in standard American English as well as British English, despite the fact that outside of this combination the term “chip” is not used for this type of fried potato in American English, which labels it a French fry or a fry instead. In accordance with the idea that language

acquisition is fundamentally a matter of exposure to input and subconscious compilation of statistics of frequency of co-occurrence, it may be that the probability of these words occurring together is greater than other possible combinations, and so chips wins the race for what is to be activated next. If this is the case, then the sequence is not being dealt with as a chunk, but as a rational and probable combination of items. Regardless of the details of whether sequences are dealt with as rational combinations or as chunks in and of themselves, we have seen some interesting evidence of some type of unitary mental representations of formulaic sequences. Idioms tend to be dealt with faster by means of their figurative or holistic meanings. Higher frequency word sequences are processed more quickly and efficiently by both native speakers and high proficiency nonnative speakers. Taken together with the brain lateralization evidence, all of this indicates that formulaic sequences are processed as wholes, or, if we take the approach that words activate each other according to subconsciously held frequency data, as sets of probability formulas.
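
The idea that one continuation “wins the race” because it is statistically the most probable can be sketched with simple co-occurrence counts. The mini-corpus below is invented for illustration, and the function name is hypothetical. The sketch shows only the counting logic, not how speakers actually compile such statistics.

```python
from collections import Counter

# Invented mini-corpus, used only to illustrate the counting logic.
corpus = (
    "we had fish and chips . they ordered fish and chips . "
    "she cooked fish and rice . he likes salt and pepper ."
).split()

def next_word_probabilities(context, tokens):
    """Estimate how likely each word is to follow the given context words."""
    n = len(context)
    followers = Counter(
        tokens[i + n]
        for i in range(len(tokens) - n)
        if tuple(tokens[i:i + n]) == tuple(context)
    )
    total = sum(followers.values())
    return {word: count / total for word, count in followers.items()} if total else {}

probs = next_word_probabilities(("fish", "and"), corpus)
winner = max(probs, key=probs.get)
print(probs)
print("most probable continuation:", winner)
```

In this toy corpus “chips” follows “fish and” two times out of three, so it is the candidate most strongly predicted by the counts. On this account no stored chunk is needed, only the relative frequencies of what has followed the same context before.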

In summary …

From this general overview of research and theory on mental processing of formulaic language, it is really interesting to note that despite the quantity of research which has explored the notion of “retrieved and stored as wholes,” there are still unanswered questions. It may be stated that formulaic language is likely to some extent dealt with holistically. But is this always the case? Can a given sequence sometimes be dealt with holistically and sometimes constructed in a more synthetic, conscious manner? If so, what are the factors, cognitive and contextual, which might influence whether the sequence is dealt with holistically or not? Is it also perhaps the case that sequences may fit on a spectrum of holistic processing, with, for example, collocations and idioms being on the holistic end of the spectrum, and lexical bundles or lexical phrases being dealt with in a more constructed way? Are there answers in second language acquisition theory that we have not yet encountered? A few of the many themes and patterns which the research shows are:

• Formulaic sequences are probably mentally processed more or less holistically.
• Frequency and automatization likely play a role in the holistic processing.
• A large amount of the evidence for holistic processing comes from research on idioms.
• A great deal of the research in this area is highly experimental and does not deal much with real-life language use.
• Studies of brain lateralization of language processing indicate that formulaic language is dealt with holistically in the right hemisphere.

One interesting conclusion to be drawn from this survey of research in the area of mental processing is that higher frequency sequences appear to be processed holistically. This makes us wonder about the nature of input and the power of exposure to language, especially naturally occurring language. It may be that the research in this area does indeed reinforce the ideas of the associative, usage-based schools of second language acquisition theory. But do the evidence and the theory have real implications for our work as researchers and as language teachers, assessors, editors?

POINTS TO PONDER AND THINGS TO DO

1 Before reading this chapter, were you willing to take the notion of holistic processing as a given?
2 Summarize the evidence for holistic processing based on the idiom research. What is it about idioms that lends them to this type of work?
3 Summarize the evidence for holistic processing based on brain lateralization research. How relevant is this to our work as researchers and practitioners?
4 Summarize the evidence for holistic processing involving nonidiom sequences. Does this research bolster the claim of holistic processing? If so, how?
5 Is holistic processing the central element of a definition of formulaic language? What other features are important?
6 Is holistic processing important for other areas of research into formulaic language, for example, corpus studies, acquisition and teaching research, and so on?
7 As a language teacher, how does the notion of holistic processing affect how you might present and introduce formulaic language to learners?
8 As a language teacher, how does the notion of holistic processing affect how you might provide feedback to learners?
9 Does the idea of holistic processing affect the spoken language differently from written language? If so, how?
10 Create or obtain a small corpus of spoken language, either first or second language. Can you detect evidence of holistic processing in the way in which the speech is produced?

5 Formulaic Language and Acquisition—First and Second Language

It is interesting to note that, despite the wealth of research which exists on formulaic language from a range of perspectives, there is relatively little empirical work on its role in language acquisition, or how it is itself acquired. The reasons for this are obscure, but it is likely the case that the fixation of linguistics and applied linguistics on acquisition of morphosyntax is at least partly to blame. Also, while the research on vocabulary acquisition grows ever richer, there is only a certain amount of overlap between that body of work and that which concentrates on formulaic language. Vocabulary research tends to focus on single words and their meanings, while much of formulaic language constitutes, as we have seen, more than mere meaning units, often with functions in discourse and so on. This is not to say that the work which has been done on formulaic language and language acquisition is of low quality or marginal worth. On the contrary, the relatively small body of work into this area has yielded some tantalizing and useful results in the area of child first language acquisition. For adult second language acquisition, however, the research results, while still tantalizing, are much thinner. This is an area which is crying out for quality research. The coming years will no doubt bring us plenty to consider and learn.

First language acquisition

The study of formulaic language in child language acquisition has given us some serious knowledge with which to understand child language. For one thing, there is a certain amount of evidence of formulaic sequences being used as learning and communication strategies by children in first and

second language acquisition. It appears that initial first and second language acquisition in children includes attending to formulaic sequences in language input, adopting them for use, and later segmenting and analyzing them. The analysis may take place later partly as a result of neurological development and a resultant increase in analytic cognitive skills.

Early research

The first serious study of formulaic language in child language acquisition dates back to the 1970s. The first such studies were basically case studies of individual children and their progress through the acquisition of language. Wong-Fillmore (1976) was one of the very first to study the second language acquisition of a child and find that one prominent process involved formulaic chunk acquisition. Her data further revealed that this was followed by a process of segmentation or syntactic and semantic analysis and breakdown of the acquired chunks. This in turn furthered development of overall linguistic competence. Another early researcher in the area, Hakuta (1974), conducted a sixty-week study of the second language acquisition of a Japanese child and found evidence of initial acquisition of prefabricated chunks later analyzed and used to facilitate overall language development. Much later, in a similar vein, Hickey (1993), in a longitudinal examination of the acquisition of Irish Gaelic of a child, also discovered a role for formulas in acquisition. Again, she found that they were later broken down and analyzed, providing grist for the linguistic competence mill.

A turning point in the first language acquisition research came in the early 1980s with Anne Peters’ seminal piece of work on child language acquisition. Peters (1983) documented how the process of formulaic chunk acquisition and later segmentation, as established by Wong-Fillmore and Hakuta, might actually work. Peters claims that there is evidence for eight assertions about the process:

1 First acquisition units by children often consist of more than one morpheme.
2 There is no difference between these units and minimal ones in terms of storage.
3 All of the polymorphemic units can be segmented (broken down).
4 Smaller units from segmentation are stored in the lexicon.
5 Both the original unit and the segmented ones can be stored in the lexicon.
6 Segmentation produces structural information, starting with simplest frames with slots, then generalized into patterns.
7 The lexicon grows through units perceived in conversation and their segmentation, as well as fusion (storage of combinations).
8 Fusion continues into adulthood.

According to Peters, the child very early and quickly develops strategies for extracting meaningful chunks from the flow of conversation. This may be based on any of a range of cues, for example:

1 the utility of the chunk for his/her own needs
2 the result he/she observes occurring when the chunk is used among adults
3 the frequency with which he/she is exposed to the chunk
4 some attractive aspect of the phonetics or prosody of the chunk

He/she is able to remember the chunks, compare them phonologically with others, and remember them as new lexical units. They are initially stored as wholes in the lexicon as individual words or as multiword units. Later in his/her cognitive development, he/she is able to analyze the stored chunks and then recognize and remember structural patterns and information about distribution classes revealed by the analysis. He/she is then ready to develop an ability to utilize lexical and syntactic information already acquired to analyze new chunks in the linguistic environment.

Wong-Fillmore, Hakuta, and Peters set the stage for further analysis of these child language acquisition processes. These dynamics of acquisition of formulaic sequences and their use as a basis for creative construction were investigated much later by Myles, Hooper, and Mitchell (1998) and Myles, Mitchell, and Hooper (1999) in child learners of French as a second language in a classroom context. As expected, Myles and her research associates found that the young learners in their studies did in fact acquire and use formulaic sequences as wholes, but they also used segmentation of the formulas to enhance their increasingly complex communication needs over the two years of the research project. In other words, as the learners’ worlds expanded due to increased experience and cognitive development, and they became more aware of the world around them and the need to participate in communication, they were forced to move away from simply using multiword chunks to communicate. Initially, the learners were able to use unanalyzed wholes to communicate simply, but they began to break the formulas apart and use components in different ways as their routine classroom communication needs developed beyond simple communication of personal information into a need to discuss third person activities and characteristics. When the third person communication needs grew, the segmentation process began and then accelerated (Myles, Hooper, and Mitchell, 1998, p. 359).

A role for pragmatic competence

What might underlie the need to expand language abilities with increased exposure to others? Some researchers have been able to determine that processes related to pragmatic competence are at work when children acquire formulaic sequences. Bahns, Burmeister, and Vogel (1986) investigated the second language acquisition of a group of children and found evidence of a formula segmentation process at work. They found two particular pragmatic factors at work in the use of formulas by the children, namely, participation in situational frames requiring their use, and frequency of occurrence of the formulas. The authors note that it was common for researchers to discover exceptionally sophisticated language in stretches of child learner speech in research:

In their attempts to write grammars for different stages of development, mainly in structural areas like negation or interrogation, child language researchers were very often confronted with utterances of a rather complex nature. The structure of these utterances was somehow “outside” the rules written to account for the bulk of data representing syntactic development for the stage in question. (pp. 696, 697)

In their study, Bahns et al. found a large range of formulas used by the children, accounting for the complex utterances noted by earlier researchers. The categories found included:

1 Expressive formulas: indicators of a sudden state of mind, for example, shut up, stupid idiot, thank you
2 Directive formulas: intended to change the hearer’s behavior, for example, let’s go, knock it off, wait a minute
3 Game or play formulas: tied to specific play activities, for example, who’s up, you’re out
4 Polyfunctional formulas: exceed a single semantic-pragmatic value, for example, what is it? I don’t know
5 Question formulas: elicit information, for example, how come? What time is it?
6 Phatic formulas: to establish, prolong, or discontinue interaction, for example, good bye, see you later, You wanna see X?


The researchers also found signs of a pattern of development of use of the formulas, starting with use of the simpler expressive and game formulas. This was followed by a broadening of the range of formulas and scope of use as pragmatic awareness and ability grew, and, eventually, full nativelike selection and use of formulas with more precise knowledge of when an expression is pragmatically appropriate.

A double role for formulaic language

An important and interesting point to note is the double role of formulaic sequences as an element of child language acquisition. First, they are acquired and retained in and of themselves, linked to pragmatic competence and expanded as this aspect of communicative ability and awareness develops. Second, overlapping in time, they are segmented and analyzed, broken down, and combined as cognitive skills of analysis and synthesis grow. Both the original formulas and the pieces and rules which come from analysis are retained. It has been observed that children basically acquire their first language by attending to and imitating the speech of others. Traditionally, linguists operated on the assumption that the acquisition process in children passed through four general stages:

1 acquisition of words
2 classifying words into categories
3 inferring rules for combining words
4 producing and understanding by combining or analyzing word sequences on the basis of rules

However, recent theory and research tend to focus more on the guidance and scaffolding provided by attending to and communicating with adults in a child’s environment, rather than how children may break down and reconstruct language in a linear fashion. Children in fact tend to pick up and use pieces of language which are useful for them in satisfying their needs, or which provide opportunities for language play. The form–meaning combinations which they pick out of input are stored separately yet overlap in memory, and provide a source of growing or emergent grammar through subconscious cognitive processes on various levels, forming schematic patterns which eventually become available for analysis. Yet, at the same time, the original chunks perceived and taken from input remain for use as wholes, which helps explain the formulaic nature of language use.


Studies which examine children’s communications with caregivers provide plenty of data about the nature of the language children hear and what they pick up. The quantity of data has increased over the past five decades or so, and evidence has grown that the language children experience is, for one thing, quite repetitive. Cameron-Faulkner, Lieven, and Tomasello (2003) note that over 50 percent of all utterances mothers direct at two-year-old children begin with fifty-two item-based phrases. Most of these contained two words or two morphemes. Therefore, we can conclude that children are exposed to language which is highly formulaic in nature, restricted in variety, and very repetitive. As well, given the nature of this input, and the restricted linguistic needs of a child, it stands to reason that he/she would take the word sequences more or less as wholes and retain them in memory.

Moving beyond the nature of the input, if we look at children’s language production, we can see that indeed they tend to reproduce multiword constructions from the input. A typical research method used to examine children’s language output is the traceback method, in which researchers look at the multiword sequences produced by a child after a period (perhaps weeks) of recording, and compare them with the productions from the previous time periods. In a classic such study, Lieven, Salomo, and Tomasello (2009) studied thirty hours of the speech of each of four 2-year-old children. It was found that 20 to 40 percent of the utterances in the final two hours were sequences the children had used in the previous twenty-eight hours. Furthermore, 40 to 50 percent of these were identical to previous sequences, except for single fillable content word slots, generally references to a person, place, or thing. This constitutes strong evidence that children’s speech, at early stages at least, is highly formulaic and limited in range.
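The logic of the traceback method described above can be illustrated with a minimal sketch. This is a deliberate simplification and not the procedure used in the published studies: the function names (`tokenize`, `differs_in_one_slot`, `traceback`) are hypothetical, the tokenizer is naive, the toy utterances are invented for illustration, and a "slot" is approximated as a single differing word in otherwise identical utterances of equal length.

```python
from collections import Counter

def tokenize(utterance):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return tuple(utterance.lower().split())

def differs_in_one_slot(a, b):
    """True if two multiword utterances have equal length and differ in exactly one word."""
    return len(a) > 1 and len(a) == len(b) and sum(x != y for x, y in zip(a, b)) == 1

def traceback(earlier, later):
    """Label each later utterance as 'exact', 'one_slot', or 'novel'
    relative to the set of earlier utterances (assumed simplification)."""
    seen = {tokenize(u) for u in earlier}
    labels = Counter()
    for utterance in later:
        tokens = tokenize(utterance)
        if tokens in seen:
            labels["exact"] += 1          # verbatim repeat of an earlier utterance
        elif any(differs_in_one_slot(tokens, old) for old in seen):
            labels["one_slot"] += 1       # earlier frame with one word substituted
        else:
            labels["novel"] += 1          # no close precedent found
    return labels

# Invented toy data, standing in for the earlier and final recording sessions.
earlier = ["I want juice", "where's the ball", "more milk"]
later = ["I want juice", "I want cookie", "where's the dog", "daddy go work now"]
result = traceback(earlier, later)
print(dict(result))
```

In this toy example, one later utterance repeats an earlier one exactly, two reuse an earlier frame with a single word swapped, and one has no close precedent. Real traceback analyses work with far larger corpora and more careful definitions of slots and operations.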
How, then, do children create mental representations of formulaic language? Intriguing evidence has been uncovered of the power of frequency in this process. Looking at the nature of the input we also see evidence of children’s creation of representations of frequent sequences. A key study investigating this is that of Bannard and Matthews (2008), who had children repeat back frequent word sequences and less frequent sequences which were different in the final word only. They found that the children were faster and more accurate in uttering the initial, shared part of the sequence when repeating the higher frequency one. This speed and accuracy of uttering the common, shared part of a high frequency sequence points to a strong role for frequency in retention and storage of formulaic sequences by children. It is as if they store the entire high-frequency sequence for ready accessibility, while a similar sequence with lower frequency will be stored more or less as a whole too, but in a less readily accessed way.


Earlier, researchers noted that the morphosyntax of language in child language develops from segmentation and breaking down of initially stored multiword chunks. The growth of grammar from this process of creation of representations of strings has actually been tested. In an important such study, Bannard, Lieven, and Tomasello (2009) analyzed the speech of two children when they were at age two and again at age three, discovering that reliance on formulaic language to communicate lessens over time. The researchers inferred grammars from the data and found that lexically specific grammars covered 84 and 75 percent of the utterances at age two, and 70 and 81 percent at age three. At age two, 60 percent of one child’s utterances consisted of a single simple grammatical operation, and at age three roughly 50 percent consisted of only two operations. For the other child, at age two, 60 percent of utterances consisted of two operations, and at age three, 60 percent of utterances were accounted for by five operations. It appears, then, that at age two the children are communicating in a very formulaic fashion, and by age three their language starts to appear more productive. In general, evidence points to a process in which children extract formulaic sequences from input and use them to develop productive language. This seems to be accomplished by perceiving and playing with the phonological and semantic shared features of the formulaic sequences. What is the nature and dynamic of the balance between formulaic and productive language in child language acquisition? In classic linguistic research fashion, looking at the errors children produce while speaking can help shed light on competition between embedded formulas and productive language. This helps to show how formulaic language remains active even as productive grammar emerges.
For example, children make errors in which they use me in the subject position, which can be taken as evidence of their extracting formulaic pieces out of more complex sentences. Kirjavainen, Theakston, and Lieven (2009) found that this type of error was linked to a grammatical structure common in caregiver speech, the use of me directly before a nonfinite verb, for example, let me do it. The seventeen children in the study even tended to use the me form incorrectly in utterances in which their caregivers had used these particular types of verbs. As well, children tend to make errors of using nonfinite forms when a finite one is correct, for example, go to work instead of goes to work. Some researchers (e.g., Freudenthal, Pine, & Gobet, 2010) suggest that this type of error may vary according to how often nonfinite verb forms occur at the end of caregivers’ utterances. Utterance-final word sequences in caregiver speech are much more likely to be adopted by children for use in their own speech. Question inversion is another area of errors which has significance for determining whether children retain formulas even while generating productive grammar. In making these types of errors, children will tend to simply add the interrogative word to the front of the declarative form of the sentence, as in what he is doing or what does he doesn’t want. Rowland and Pine (2000) examined these types of errors in the utterances of and the input received by a child between the ages of two and five, and found more errors if the combination was relatively rare in the input, and fewer if the combination was frequent in the input. Ambridge, Rowland, Theakston, and Tomasello (2006) noted this type of error in the utterances of three- and four-year-olds and found that the most likely cause of errors was in fact the nature of the specific word combinations. Taken together, these studies provide some evidence of the possibility that children take and retain sequences as wholes from caregiver input, and that these sequences still have a strong influence on speech even as productive grammar begins to emerge. The evidence of a sort of competition between the retained formulas and their use as data to structure emergent grammar lends credence to the claims of Peters (1983) and Wong-Fillmore (1976) from many years ago.

Second language acquisition

A body of evidence has also been collected over the years of a role for formulaic sequences in the process of adult language acquisition, but the development processes uncovered by researchers in this area are not exactly like those found in the child language acquisition studies. The evidence is also more limited for adult language acquisition. It was the 1980s before serious work in this area was undertaken. Yorio (1980) was an early investigator of adult language development and formulaic sequences. Examining several longitudinal studies based on instructed adult learners’ written work, he found that unlike children, adult learners do not make extensive use of prefabricated formulaic language, and when they do, they do not appear to use it to further their language development. Instead, they appeared to use it more as a production strategy, to economize effort and attention in spontaneous communication. A keynote study by Schmidt (1983) consisted of an in-depth case study of the English language development of a Japanese adult in Hawaii, uncovering a definite role for formulaic sequences. In fact, the learner under study used a great and ever-increasing number and range of formulaic sequences as a communication strategy, while appearing fossilized and grammatically inept in other aspects of language. Schmidt found that, while the research subject was highly motivated and rapidly acculturating, he remained resistant to error correction and yet managed to develop linguistically and adapt socioculturally almost exclusively through use of formulaic sequences. It is important to note that, in this case at least, there was little or no evidence of the processes of segmentation and analysis which so characterize the child acquisition studies.


Other early researchers found that, as appears to be the case with child language learners, adult learners tend to use formulaic sequences as communication and learning strategies. For example, as noted earlier, Bolander (1989), in a study of acquisition of Swedish by adults, found that formulaic sequences contributed to a greater facility and economy in learning and use. The adults in this longitudinal study consistently used prefabricated language units which contained target language structures well in advance of demonstrating that they had actually acquired the structures themselves. Like the child subjects of Hickey (1993) and Peters (1983), they produced formulaic sequences which contained language which outstripped their normal abilities. As well, Bolander noted that the learners appeared to sometimes use standard or reliable “canonical” formulas to help in acquiring specific rules of Swedish syntax. Much later, in the 1990s, Ellis (1996), in an overview of sequencing in language acquisition, finds a role for formulas in adult language acquisition. He asserts that much of language acquisition is really acquisition of memorized sequences, and that short-term repetition and rehearsal permit the development of long-term sequence information for language. In turn, this information allows chunking of working memory contents to these established patterns. Long-term storage of frequent language sequences allows them to more easily serve as labels for meaning reference, and they can be accessed more automatically. The result is more fluent language use, freeing attentional resources for dealing with conceptualizing and meaning. Ellis asserts that multiword units in long-term storage serve as a database for grammar acquisition. It appears that adults in naturalistic second language learning environments, like children, tend to acquire and use formulaic sequences. 
However, the established cognitive and learning styles of adults, their diverse acquisition contexts, knowledge of the first language, and other factors make for more variety in the route of language acquisition generally, and with regard to use of formulaic sequences specifically. Some adults may be more analytic and seek to infer rules from chunked units or from pieces of input, while others, such as Schmidt’s (1983) subject, may rely heavily on acquired formulas and not attempt to break them down or analyze them. Furthermore, degree of literacy and type and degree of instruction may play a part.

Second language acquisition theoretical models

Perhaps the most promising area of study of adult language acquisition and formulaic language has been in examining the links between use of formulaic language and specific second language acquisition theoretical models.

Emergentist or associative models of second language acquisition lend themselves to connections to formulaic language research. Ellis (1996, 2002, 2012) has been a strong proponent of emergentist models of language acquisition and the importance of formulaic language. In 2002 he pointed out that second language learner sensitivity to sequence information, or the statistical probabilities of linguistic elements, was likely evidence of implicit knowledge of formulaic language. A model of the developmental sequence of acquisition therefore is: formulaic sequence, then limited-scope slot-and-frame pattern, then productive grammar. Note that, in emergentist or associative models of language acquisition, this principle should apply to second language as well as first language acquisition.

The power of memory in language acquisition and its link to the role of formulaic language in acquisition is undisputed. Phonological short-term memory (PSTM) appears to be crucial to language acquisition, with evidence showing that learners with better ability to sequence linguistic items in PSTM are more successful in acquiring vocabulary and grammar (Ellis, 1996). Other researchers have worked to show the power of PSTM for various aspects of language use. O’Brien, Segalowitz, Collentine, and Freed (2006) showed that fluency gains in second language were linked to strong PSTM in second language learners, and Kormos and Safar (2008) found that PSTM correlated to a certain extent with second language writing ability and with speech fluency and vocabulary in intensive English as a foreign language study. Wen (2011) found that PSTM correlated with lexical diversity and syntactic complexity in second language speech. Martin and Ellis (2012), using an artificial language, discovered that vocabulary and grammar development were strongly influenced by PSTM. It appears, then, that PSTM affects the learning of word forms and the retention of sequences of forms.

Attention to formulaic sequences in comprehension and production

A certain amount of recent research has focused on attention to formulaic sequences in comprehension and production. Some of this research has been conducted with first language participants, but the implications for second language acquisition are clear. The following several paragraphs present some of the relevant research in these areas.

It seems that both lexical processing and phonetic processing are influenced by knowledge of formulaic language. As for phonetic processing, a classic example of this type of research is that of Hilpert (2008), who used the make-causative construction. In this construction, make occurs with cry seventy-three times and with the verb try just eleven times. However, try is some ten times more frequent as a word in general discourse, making the make X cry construction appear quite formulaic. In the study, first language participants were required to identify whether they heard cry or try after the carrier phrase they made me, and the signal ranged from try to cry on an eight-step continuum. The ambiguous sounds were more often perceived as cry, showing that the formulaic nature of the make-causative construction was quite powerful.

Reading time also appears to be influenced by formulaic knowledge. Bod (2001) showed that higher frequency three-word sentences such as I like it were reacted to faster by native speakers than low frequency ones. Ellis, Frey, and Jalkanen (2008) showed that native speakers are quick to read and process frequent collocations with verb agreement and booster-maximizer adjectives. Arnon and Snider (2010) showed that more frequent phrases were processed faster than less frequent ones even when they were matched as to the frequencies of individual words. Tremblay, Derwing, Libben, and Westbury (2011) used three self-paced reading tasks with lexical bundles (see Chapter 8) and matched control sentence fragments to show that the lexical bundles were read faster.

Studies involving retention of material in short-term memory and accurate subsequent reproduction show an influence of knowledge of formulaic language. Bannard and Matthews (2008) found that children were more likely to reproduce familiar sequences correctly than less frequent or familiar ones, and to reproduce them faster. Studies of priming, in which sequences recently encountered in communication are reproduced later, show a priming effect for hearing, speaking, reading, or writing sequences (see e.g., McDonough and Trofimovich, 2008 and a growing body of subsequent work).

Most of the aforementioned studies involved native speaker or child participants. With second language participants, some remarkable work has also been done.
Conklin and Schmitt (2008) found that formulaic phrases were read faster than matched nonformulaic phrases by both native and second language participants. Ellis and Simpson-Vlach (2009) and Ellis, Simpson-Vlach, and Maynard (2008) found that second language learners processed formulaic language more effectively if it was of high frequency, as opposed to native speakers, who processed faster those sequences which also exhibit a high rate of mutual information (MI), which measures the statistical likelihood of words collocating.

Extensive exposure to formulaic language appears to aid fluency of speech. For example, Taguchi (2007) studied the development of speech abilities in students drilled in word chunks and found that they used more correct chunks after instruction and that they were more aware of discourse features. Wood (2006, 2009a, 2009b) and Wood and Namba (2013) have shown that exposure to and practice with formulaic sequences has positive effects on speech fluency and effective communication. See Chapters 6 and 9 of this book for details on these studies.
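The mutual information (MI) measure mentioned above can be made concrete with a short sketch. It assumes the common pointwise formulation for a two-word sequence, MI = log2((f(xy) × N) / (f(x) × f(y))), where f is a raw frequency and N is the corpus size in tokens. The function name `mutual_information` and the toy token list are hypothetical, and individual studies may compute MI differently (for example, for longer sequences or with different normalization).

```python
import math
from collections import Counter

def mutual_information(tokens, w1, w2):
    """Pointwise MI (base 2) of the adjacent pair (w1, w2) in a token list.

    Assumed formula: log2((f(w1 w2) * N) / (f(w1) * f(w2))).
    Returns -inf when the pair never occurs.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n = len(tokens)
    pair_count = bigrams[(w1, w2)]
    if pair_count == 0:
        return float("-inf")
    return math.log2(pair_count * n / (unigrams[w1] * unigrams[w2]))

# Invented toy corpus of 12 tokens, for illustration only.
tokens = "you know what i mean you know i know what you mean".split()

# "know what" occurs 2 times; "know" 3 times; "what" 2 times; N = 12.
print(mutual_information(tokens, "know", "what"))  # 2.0
```

A higher MI score indicates that two words co-occur more often than their individual frequencies would predict, which is why a sequence can have high MI without being especially frequent.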

Developmental sequence of acquisition

To determine the extent to which formulaic language affects the overall acquisition process would entail finding a process somehow similar to that we have seen with children, in which embedding of formulaic sequences somehow “seeds” or “bleeds into” overall language acquisition. This is complicated. For example, some formulaic sequences are not necessarily acquired as wholes by adults, but are easily learnable by virtue of their high frequency and extreme functionality. Hasselgren (1994) remarks that even advanced second language learners will use high frequency words rather than risk the time and cognitive energy to search for and utter alternatives. These safe words and phrases, “islands of reliability” (Dechert, 1980) or “teddy bears” (Ellis, 2012; Hasselgren, 1994), are most probably the sources of the “seeding” which may happen between formulaic language and overall second language acquisition in adults. Other less frequent and more semantically opaque and less functionally obvious sequences are much harder to acquire, which explains why second language learners tend to underuse formulaic language.

In any case, evidence does exist of learners using formulaic sequences with much more complex syntax than their creatively constructed language. This is clearly a communication strategy of great value to learners. Myles (2004) and Myles et al. (1999) examined the oral language of second language learners of French and found that their use of formulaic sequences was full of complex syntax which did not show up in creatively constructed language until later, and those learners who acquired formulaic sequences readily at first appeared to be the ones who acquired complex grammar faster later on, likely as a result of analyzing the sequences: “these chunks seem to provide these learners with a databank of complex structures beyond their current grammar, which they keep working on until they can make their current generative grammar compatible with them” (Myles, 2004, p. 153). Eskildsen and Cadierno (2007) noted that a Mexican learner of English only used do-negation correctly in English L2 in the utterance I don’t know, but gradually expanded this to use with other pronouns and verbs as he abstracted from the exemplar. Sugaya and Shirai (2009) in a ten-month analysis of acquisition of Japanese tense-aspect morphology by a Russian learner discovered that she tended to use particular verbs only with particular aspect markers. A follow-up study with groups of low- and high-proficiency learners found that lower proficiency learners tended to link particular verbs to aspect markers but that this tendency shifted as proficiency developed, which can be taken to indicate that learners tend to begin with very context- and item-specific pattern control which evolves over time to allow more actual control over the syntactic rules themselves.

In summary …

From this short overview of the research into first and second language acquisition of formulaic language, some patterns and themes emerge. One important element is the notion of segmentation of formulaic sequences from the input, and subsequent breakdown of the stored sequences and use of their constituent elements for development of the language system, grammar, and so on. The research into adult second language acquisition of formulaic language is heavily concentrated in naturalistic contexts of acquisition, and leaves little for classroom or formal language teaching practitioners to work with. A great deal of the work with adults has involved native speakers or second language speakers of high proficiency. Is it possible that second language learners do acquire some formulaic language units as wholes at first and then break them down as time passes? Or do they tend to recognize the sequences as strings of discrete and separately recognizable units, and only when instructed to, perceive them as wholes? Is there a blend of both of these types of processing and acquisition? More research is needed to determine whether this is so and if so, how it works: what makes some sequences salient as wholes and others not? A few of the many themes and patterns which the research shows are:

• Formulaic sequences are acquired as wholes by children.
• Formulaic sequences are likely retained as wholes by children and also later broken down and their constituent parts used as material for subsequent acquisition of morphosyntax and so on.
• Formulaic language appears to be dealt with holistically by adult native speakers and highly proficient second language speakers.
• Formulaic language may be used as a strategy for second language acquisition by adult learners.
• There are still a wide range of questions about how adult language learners perceive and acquire formulaic language.

Certainly, not all the questions have been answered yet in any particular area. How can we determine whether and how an adult learner might perceive and process formulaic language? Do the basic assumptions about formulaic language acquisition in children apply wholesale to adult acquisition of a second language? What types of research methods are needed in order to answer these questions? Clearly, this is fertile ground for future researchers!

POINTS TO PONDER AND THINGS TO DO

1 In your own second language learning experience, have you ever perceived and recalled multiword items and then later realized that they were actually composed of single elements? Did you use this as a source of language information later? Give an example or two.
2 If you have had experience as a caregiver of a small child, see if you can recall some examples of a child using a formulaic sequence as a whole. Give an example or two. Perhaps the child’s pronunciation and prosody were nonstandard? Why would this be?
3 Think of a research plan to study an aspect of child first language acquisition of formulaic language. If possible, collect a corpus of speech samples and conduct a small study.
4 Think of a research plan to study adult second language acquisition of formulaic language. If possible, collect a corpus of spoken or written samples, or create a small corpus, or conduct an intervention study.
5 From the descriptions of the research on first language acquisition in this chapter, can we draw any conclusion of value for early childhood educators, kindergarten teachers, or caregivers in facilitating child language acquisition?
6 As a parent, what would you look for in early child speech to help with language acquisition? Can you imagine any areas of study which have not been covered in the research traditions described in this chapter?
7 From the descriptions of the research on second language acquisition in this chapter, can we draw any conclusion of value for second language teachers?
8 Based on what we have read here about adult second language acquisition, where do you feel the most important areas of investigation are likely to be in the coming years?
9 If you were to begin a plan of research in the area of adult second language acquisition, what would you focus on?
10 Does the study of the acquisition of formulaic language have any serious implications for the work of language testers and assessors and editors?

6 Formulaic Language and Spoken Language—Fluency and Pragmatic Competence

As we have already seen, a great deal of the research around formulaic language has to do either directly or indirectly with speech. It has been noted that formulaic sequences may make up as much as 58.6 percent (Erman & Warren, 2000) to 80 percent (Altenberg, 1998) of spoken language. Indeed, the historical roots of the study of formulaic language are in examination of speech; recall deKuyper, Pawley and Syder, and Sinclair, discussed elsewhere in this book. The discussions of cognitive processing focus on spoken language; recall, for example, the idea of “stored and retrieved as wholes.” A good portion of the categories described earlier in this book are also quite speech-focused: idioms, collocations, phrasal verbs. As far as direct links between formulaic language research and spoken language are concerned, however, three main areas are worthy of some in-depth consideration: speech fluency, phonological characteristics, and speech pragmatics. Let’s take each in turn and see what the research may have to tell us.

It has become fairly well established that formulaic language is fundamental to spoken language, and that second language speakers can certainly benefit from using it. In a pivotal study, Boers et al. (2006) illustrate that formulaic sequences may help second language speakers in three important ways: they may help speakers appear more nativelike, as they provide ready-made chunks of language which are appropriate to specific contexts; they provide an opportunity for “error-free” speech and may allow speakers to produce language that outstrips their actual competence; and they facilitate fluent speech.

Lists of formulaic sequences in spoken language

A number of researchers have worked to construct lists of formulaic sequences used in spoken language. These are largely centered around corpus analysis of academic registers. Coxhead (1998) was among the first to construct a list of words used in academic language, the influential Academic Word List (AWL). She later focused on formulaic language to a certain extent (2008). Biber, in a 2006 book reporting on corpus study of academic language from various perspectives, identified lists of lexical bundles (multiword sequences identified by means of frequency and range) in university spoken language across a range of registers: lectures and other classroom discourse, service encounters, and so on. See Chapter 8 of this book for details about lexical bundle research.

While the research into lexical bundles has contributed immensely to descriptions of academic discourse in general, researchers such as Simpson-Vlach and Ellis (2010) and Liu (2012) point out that the lexical bundle research has often provided us with lists of multiword units which are semantically and structurally incomplete, such as “to do with the” or “I think it was.” The obvious problem with these sequences is that they are “neither terribly functional nor pedagogically compelling” (Simpson-Vlach & Ellis, 2010, p. 493).

Largely in response to the perceived weaknesses of some of the lists of lexical bundles found in previous research, Simpson-Vlach and Ellis (2010) created the Academic Formulas List (AFL). The AFL was developed by comparing a corpus of 2.1 million words of academic speech and writing to a nonacademic corpus. As such, the AFL represents an effort to identify formulas which are both genuinely academic and classroom-worthy. The corpora used to develop the AFL were scanned at a frequency cutoff of ten per million, using a range criterion of three out of four academic disciplines for the written corpus.
To avoid the often limited psychological saliency and pedagogical inutility of the types of units listed in the lexical bundle research, the AFL was compiled using mutual information (MI) scores as a measure of collocation strength, combined with frequency data and a rating by instructors and testers, to produce a composite score which determined the final lists. Table 6.1 presents the top most frequent formulaic sequences in spoken language uncovered by Simpson-Vlach and Ellis (2010):

Table 6.1 Top most frequent sequences in spoken language (Simpson-Vlach and Ellis, 2010)

Blah blah blah
This is the
You know what I mean
You can see
Trying to figure out
A little bit about
Does that make sense
You know what
The university of Michigan
For those of you who
Do you want me to
Thank you very much
Look at the
We’re gonna talk about
Talk a little bit
If you look at
And this is
If you look at the
No no no no
At the end of
We were talking about
In Ann Arbor
It turns out that
You need to
See what I’m saying
Take a look at
You have a
Might be able to
At the end
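The two quantitative ingredients mentioned for the AFL, a normalized frequency cutoff and mutual information (MI) as a measure of collocation strength, can be illustrated with a small sketch. This is not the procedure Simpson-Vlach and Ellis used: it assumes the common corpus-linguistic formulation of MI for a two-word sequence, log2(observed / expected), and the function names `per_million` and `bigram_mi` are hypothetical. The instructor ratings and the weighting behind the composite score are not reproduced here.

```python
import math
from collections import Counter


def per_million(count, corpus_size):
    """Normalize a raw count to occurrences per million running words."""
    return count * 1_000_000 / corpus_size


def bigram_mi(tokens, w1, w2):
    """Pointwise mutual information of the adjacent pair (w1, w2).

    Assumed formulation: MI = log2(observed / expected), where
    expected = f(w1) * f(w2) / N and N is the corpus size in tokens.
    """
    n = len(tokens)
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    observed = bigrams[(w1, w2)]
    if observed == 0:
        return float("-inf")  # the pair never occurs together
    expected = unigrams[w1] * unigrams[w2] / n
    return math.log2(observed / expected)


# Toy corpus, for illustration only.
tokens = "you know what i mean you know it turns out that you can see".split()
print(round(bigram_mi(tokens, "you", "know"), 2))
# 21 occurrences in a 2.1-million-word corpus is ten per million.
print(per_million(21, 2_100_000))
```

A sequence that clears the frequency cutoff but has a low MI score (for example, a string of very common function words) is the kind of item that a frequency-only approach would list and an MI-informed approach would rank lower.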

Shin and Nation (2008) worked to identify the most common or high-frequency collocations in English using specific criteria. Using the spoken subcorpus of the BNC as a basis, they took the most common single content words (nouns, verbs, adjectives, and adverbs) as a starting point, and applied a frequency cutoff of thirty per ten million running words. The list is presented in Table 6.2:

Table 6.2 Most frequent sequences in spoken language (Shin and Nation, 2008)

You know
I think (that)
A bit
(always/never) used to + infinitive
As well
A lot of
# pounds
Thank you
# years
In fact
Very much
# pound
Talking about (something)
(about) # percent (of something/in something/on something/for something)
I suppose (that)
At the moment
A little bit
Looking at (something)
This morning
(not) any more
Come on
Number (#)
Come in (somewhere/something)
Come back
Have a look
In terms of (something)
Last year
So much
(#) years ago
Determiner (the/this/a) county council

Martinez and Schmitt (2012) developed a phrasal expressions list consisting of 505 multiword items extracted from the British National Corpus (BNC) by means of frequency and distributional criteria. The resulting list was then narrowed down by applying a range of judgment criteria, including whether the sequence was semantically transparent and whether it had a one-word equivalent. The researchers included the entire BNC in their search; the list therefore contains sequences relevant to both spoken and written registers, and it is focused on receptive skills rather than both receptive and productive. The most frequent items are presented in Table 6.3:

Table 6.3 Most frequent sequences in spoken language (Martinez and Schmitt, 2012)

I mean
A lot
Rather than
So that
A little
A bit (of)
As well as
In fact
(be) likely to
Go on
Is to
A number of
At all
As if
Used to (past)
Was to
Not only
Those who
Deal with
Lead to (cause)
Sort of
The following
In order to
Have got (+NP)
Have got to
Set up
As to
As well
Based on
Carry out

Speech fluency

If you look around for descriptions of spoken language abilities, such as second language syllabuses or assessment criteria and so on, you are likely to see the word fluency represented somewhere. It is often used as a synonym for “nativelike” ability in a second language, or for having “good command” of language. In terms of speech in particular, ask people what it means and you may hear words like “smoothness” or “flow” of speech. In the end, though, the research literature on fluency has generally attended to temporal variables of speech. These are measurable, quantifiable aspects of speech: speed; pauses and hesitations; length of runs.

Speed of speech is usually measured as syllables uttered per second or per minute. Research shows that second language learners tend to produce more rapid speech over time as the acquisition process unfolds, and that speed of speech correlates with judges’ perceptions of fluency. Examples include Towell’s (1987) case study of a learner of French over four years, which found that speech rate increased by 65 percent, and Riggenbach’s (1991) study of Chinese students learning English, which found that syllables uttered per minute were linked to judges’ ratings of fluency; in other words, faster speakers were rated as more fluent. Another well-known large study of fluency, by Freed (1995), found that speech rate was the only thing that actually increased when American students of French spent time abroad in France.

A more complex aspect of fluency is pause phenomena. By this we mean the amount and frequency of hesitations and pauses (the two terms pause and hesitation are interchangeable as used here), as well as their location. Location means, basically, syntactic location. In general, pause times are measured by the proportion of the total speaking time spent in pauses or silence. This is called the pause/time ratio, or PTR, in the literature. Less time spent pausing is, not surprisingly, an indicator of higher fluency. Researchers like Möhle (1984) and Riggenbach (1991) found that shorter and less frequent pauses were linked to higher fluency.

Thinking about fluency as a function of increased speed of speaking, or of less pausing, tells us something useful, but something is missing at the same time. The missing element is an understanding of how fluent speech occurs, and what role there is for formulaic language. For this, it is productive to look at pause location. The research shows that pauses which occur at phrase and clause boundaries are more linked to fluency than pauses occurring at other syntactic locations.
Dechert (1980) found that, after a study abroad experience, a student telling a story was more able to pause at breaks between story segments which established the setting, locations, and so on. Lennon (1984) found that L1 speakers paused at clause boundaries, whereas L2 speakers also paused within clauses. Deschamps (1980), Riggenbach (1991), and Freed (1995) found similar phenomena in their studies. What this means is that the production of clauses and phrases more or less as wholes is a sort of hallmark of fluent speech production.

The most important variable of speech associated with fluency is the length of runs of speech produced between pauses. An early investigation, which focused on temporal variables in L2 speech, is Raupach’s (1980) study of participants telling a story from picture prompts in their L1s and L2s, in which the L2 speech displayed shorter runs between pauses. Later, Möhle (1984) discovered the same dynamic at play when the participants in her study produced shorter runs between pauses in L2 than in L1 speech. Another key study was that of Towell (1987), in which a British learner of French over a four-year period increased the mean length of runs by 95 percent over the first three years. Lennon (1990b) found that the mean length of runs between pauses in the L2 speech of his participants increased by 20–26 percent over a 23-week period. As well, in a large-scale examination of the development of speech fluency by American students of French, Freed (1995) identified a strong tendency toward longer runs over time.

Here is where we can find a possible role for formulaic language in fluency. It may be that a repertoire of formulaic sequences can help speakers to produce phrases and clauses more or less as wholes, without internal pausing, which would account for the less frequent pausing and the longer runs of speech between pauses which appear to be key indicators of fluency.

A key study, which investigates the role of formulaic language in L2 speech fluency, is that of Wood (2010). Wood examined the fluency development of a group of eleven L2 learners in an intensive study abroad English program over a six-month period. The participants were two female Japanese L1 speakers, two male Japanese L1 speakers, two female Spanish L1 speakers, two male Spanish L1 speakers, one female Mandarin L1 speaker, and two male Mandarin L1 speakers. Drawing on the research into temporal variables of speech and the possible role of formulaic language, Wood hypothesized that, with continuous experience with their L2 over the six months, the participants’ spontaneous speech in English would show a faster speech rate, a lower ratio of pause time to speech time, longer runs of speech between pauses, and more frequent occurrence of formulaic language within the longer runs of speech. Participants met once a month over the six months to watch a silent animated film prompt and then spontaneously retell the narrative of the film.
Three films were used, 8 to 10 minutes long, with eight narrative moves each and a similar number of characters and level of plot complexity. The films were seen in a staggered sequence so as to ensure that participants would not become overly familiar with any one story: one film was seen in months one and four, another in months two and five, another in months three and six. The resulting data were analyzed first for evidence of fluency gain over the six months. Speed of speech was calculated as speech rate (SR), syllables uttered per minute, and as articulation rate (AR), syllables uttered per minute with pauses removed. Pause time was calculated as phonation/time ratio (PTR), time spent actually articulating divided by the total speech time. Mean length of runs (MLR) was calculated by dividing the total number of syllables uttered by the total number of runs for each sample. Finally, a formula/run ratio (FRR) was calculated by dividing the number of formulaic sequences by the number of runs in each sample.

As noted in Chapter 2, determining what constitutes a formulaic sequence in speech can be challenging. In Wood’s study, native speaker judgment was used. Three informed judges who had read key pieces of research literature on formulaic language used five judgment criteria to make decisions about what constituted a formulaic sequence:

• The taxonomy used by Nattinger and DeCarrico (1992). It was a guide for selection, as sequences from the transcripts were identified with an eye to categories:

  1 Syntactic strings are strings of category symbols, such as “NP + Aux + VP” (…).
  2 Collocations are strings of specific lexical items, such as rancid butter and curry favor, that co-occur with a mutual expectancy greater than chance (…).
  3 Lexical phrases are collocations, such as how do you do? and for example, that have been assigned pragmatic functions (…) (p. 36).

  The authors go on to refine these categories and further refine their shared characteristics.

• Phonological coherence. Coulmas (1979) and Peters (1983) state that if a sequence is to be considered formulaic, it must be at least two morphemes long and cohere phonologically, that is, be produced without internal hesitation or pausing. This was one of the most important aspects of the formula identification process in the present study. For more information about phonological coherence see later in this chapter.

• Greater length/complexity than other output. Also pointed out by Coulmas (1979) and Peters (1983), chunks which are uttered in a longer run and/or show greater semantic or syntactic complexity than the rest of a speaker’s output are likely formulas. Examples would include using I would like … to express a desire for something, or I don’t understand to show a lack of comprehension, while never using would or negatives using do in other contexts.

• Semantic irregularity. According to Wray and Perkins (2000), formulas are often not composed semantically, but are holistic items like idioms and metaphors.

• Syntactic irregularity. Formulas often do not follow rules of syntax (Wray & Perkins, 2000). This can restrict the manipulation of elements in a formula (one cannot pluralize beat around the bush or passivize face the music), or require the flouting of normal syntactic restrictions as in the intransitive verb + direct object construction of go the whole hog or the gross violation of syntactic laws in by and large.

No one or combination of these criteria were deemed necessary for a sequence to be determined formulaic, they were to serve as general guidelines only. The judges had a discussion and benchmark session in which they worked together on two transcripts to prepare them to then work alone on the rest (Wood, 2010, pp. 111, 112).

As might be expected, the participants showed significant gains in fluency on the SR, AR, and MLR measures over the six months. They also showed strong gains in FRR, or formula/run ratio. This indicated that the use of formulaic language actually may have facilitated the reduction of pauses and increased length of runs. Wood (2006, 2010) also noted that the study participants used formulaic sequences in certain ways and for certain functions in order to facilitate their fluency. He compared sequences of the speech samples in which a participant retold a certain narrative move after the first viewing of a film, and again three months later. Five categories of uses and functions emerged from this analysis:

• repetition of a formula;
• stringing together of multiple formulas;
• reliance on one formula;
• use of self-talk and filler formulas;
• use of formulas as rhetorical devices.

Repetition of a formula helped the speakers to lengthen runs and avoid pausing in later retells. An example of this would be the following, from a Japanese female participant’s speech:

• And he came back the cat came back to the his house and ah

This results in a run of thirteen syllables, only one of which is a filler nonlexical item ah.

• I forget I forget the order but maybe the f he went to the forest

Here she appears to think aloud, buying time to recall the next event in the narrative, and uses a very simple subject + verb formula to repeat her lack of clear recall. It helps her to produce a 19-syllable run.

Stringing together multiple formulaic sequences also helped speakers to avoid pausing and to extend runs. For instance, in later retells of the film Strings (speech samples 3 and 6; these examples are from sample 6), several participants described the old man in the story making music by himself in his room, a combination of three short formulas: making music, by himself, and in his room. This produces a very fluent ten-syllable run.

Reliance on a single formulaic sequence also helped speakers to avoid pausing. To introduce the next action in the story, for example, it was common to use and then, or and next.

Use of self-talk and fillers was a relatively sophisticated strategy used by the speakers. This includes use of self-referential collocations such as I know, or I think, or I guess. Also included in this category are long strings used for self-talk or circumlocution such as I don’t know, or I don’t know the thing’s name. These allowed them to produce longer runs.

Similarly, use of formulaic sequences as rhetorical devices was a relatively sophisticated means of avoiding pauses and extending runs. Wood notes that, in later retells, the study participants tended to use beginning formulas such as at the beginning, narrative move markers such as when the story is go ahead, and endings such as that is the end of the story. All of these add greatly to the length of runs as well as to the effectiveness of the storytelling.

Interestingly, this study still stands alone as the only effort to determine the role of formulaic language in speech fluency. It shows both quantitatively and qualitatively the importance of formulaic language in second language speech fluency development.
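The temporal measures described in this section can be made concrete with a short sketch. It assumes that a speech sample has already been segmented into runs (stretches of speech between pauses), each annotated with a syllable count, a duration, and the number of formulaic sequences the judges identified in it. The `Run` class and the `fluency_measures` function are hypothetical names used for illustration, and the figures are invented. PTR follows the phonation/time definition given for Wood's study, and FRR is taken to mean formulaic sequences per run, which is an assumption based on the measure's name.

```python
from dataclasses import dataclass


@dataclass
class Run:
    """One stretch of speech between two pauses (hypothetical structure)."""
    syllables: int
    seconds: float
    formulas: int = 0  # formulaic sequences identified within the run


def fluency_measures(runs, pause_seconds):
    """Compute SR, AR, PTR, MLR, and FRR for one speech sample."""
    if not runs:
        raise ValueError("a sample needs at least one run")
    syllables = sum(r.syllables for r in runs)
    speaking = sum(r.seconds for r in runs)   # time spent articulating
    total = speaking + sum(pause_seconds)     # articulation plus pauses
    return {
        "SR": syllables * 60 / total,         # syllables per minute
        "AR": syllables * 60 / speaking,      # same, with pauses removed
        "PTR": speaking / total,              # phonation/time ratio
        "MLR": syllables / len(runs),         # mean length of runs
        # Assumption: formulas per run, following the measure's name.
        "FRR": sum(r.formulas for r in runs) / len(runs),
    }


# Invented sample: three runs separated by two pauses of 1.5 seconds.
sample = [Run(13, 4.0, 2), Run(19, 6.0, 1), Run(10, 2.0, 3)]
for name, value in fluency_measures(sample, [1.5, 1.5]).items():
    print(name, round(value, 2))
```

On these invented figures the sample has 42 syllables in 12 seconds of articulation and 15 seconds overall. Comparing such values for the same speaker across monthly retells is the kind of longitudinal comparison the study describes.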

Phonological characteristics

As noted in Chapter 2, formulaic sequences appear to display particular phonological characteristics in speech. Lin (2010, 2012) cataloged these characteristics. Phonological coherence is the term most often used to summarize the nature of the prosody of formulaic sequences in speech. It was Peters (1977, 1983) who first noted that children can be observed to produce sequences which surpass their grammatical competence and which exhibit no internal hesitations and a smooth intonation contour, making them stand out from the rest of the speech flow. The idea is, then, that these characteristics of formulaic sequences can give researchers a clue as to what has been acquired as a “chunk.” Child language researchers such as Hickey (1993) note that phonological coherence is basic to spoken formulaic sequences in children’s speech. As for adult speech, it is not quite so clear, although numerous researchers have assumed that a similar dynamic applies as to child language (e.g., Moon, 1997; Wray, 2004).

It has been shown that high-frequency phrases undergo phonological reduction more quickly than other phrases (e.g., Bybee, 2002, 2006). This means reduced schwa, sometimes t/d deletion (Bybee, 2000). As for stress placement, Ashby (2006) looked at idioms and determined that there are actually three classifications of idioms according to their accent patterns in comparison with literal uses of the same word sequences: in one case idioms have the same pattern as a literal version, for example, to have a CHIP on one’s shoulder, to have a BEE on one’s shoulder. In a second case, the accent pattern is different from a literal version, for example, POUR down (idiomatic, as in heavily raining) versus pour DOWN (literal). In a third case, the idiom is very restricted, for example, I could eat a HORSE (falling tone) versus I could eat a horse (falling-rising tone—never used). Ashby notes that the second type of case signals or invites a listener to note an interpretation of the utterance which is nonliteral. Theories of holistic storage of formulaic sequences, as discussed in Chapter 4, are often linked to phonological coherence. Lin (2010, p. 179), however, notes that this connection is often an assumption and used in a circular fashion, with researchers pointing out that phonological coherence is an indicator of holistic storage, while others state that sequences are phonologically coherent because they are holistically stored. If we look instead from a frequency-based perspective, we may conclude that the phonological coherence of a frequently used/frequently encountered sequence may simply be a result of the fact that it is uttered and heard often. The neurological and motor underpinnings of the processing and production of the sequence may simply have become faster with frequent use. Research has indicated that the nature of pausing before the uttering of formulaic sequences in spontaneous speech differs from that before nonformulaic language. 
Erman (2006, 2007) examined the London-Lund Corpus and the Bergen Corpus of London Teenage Language, and noted that retrieval of formulaic language appeared to entail shorter pre-sequence pauses. This may be taken as something of an indication that the formulaic sequences are processed more quickly.

Other research has pointed to an alignment between formulaic sequences and intonation units. Lin and Adolphs (2009) found that the sequence “I don’t know why” taken from the Nottingham Corpus of Learner English was a single intonation unit in over 50 percent of cases. In further research, Lin (2010) investigated formulaic sequences in a university lecture and found that 82 percent aligned with intonation boundaries on one side of the sequence, and 40 percent aligned on both sides.

Clearly, there is plenty of evidence that formulaic sequences are marked by certain phonological characteristics in speech. The evidence points to phonological coherence being a function of holistic mental storage and retrieval of formulaic sequences, possibly in addition to frequency effects.

Pragmatics

One extremely important aspect of spoken language is pragmatic competence. The idea of communicative competence (e.g., Bachman, 1990) led the language teaching field into new ways of approaching its purpose. In addition to grammar, communicative competence includes knowledge of discourse, genre and text, social aspects of language, and a focus on the learner and what he or she does. Current models of communicative competence see communication as composed of knowledge or competence in four key areas: organizational competence, pragmatic competence, sociolinguistic competence, and strategic competence. Pragmatic competence is key to successful ability to communicate in social interactions, and is the basis of what we might call “small C” culture. We can define pragmatic competence as the knowledge and skill necessary for successful and appropriate use of language in communication, and subdivide it into two broad categories:

• Pragmalinguistics: actual language ability to perform language functions such as requesting help, inviting, refusing invitations, making requests, giving commands, and so on.
• Sociopragmatics: ability to assess the context in which the function occurs. What is the appropriate means of achieving what I need, given the nature of the situation and the people involved?

Pragmatic competence involves both of the above, in real-time communication. For example, in a communication situation we might need to determine what grammar and vocabulary are needed to refuse an invitation, and, at the same time, assess whether such a refusal is acceptable, and what to say, under the specific circumstances. Clearly, knowledge of conventional formulaic language units to use in specific contexts is a key part of sociopragmatic competence.

According to Bardovi-Harlig (2012, p. 207), research on formulaic language and pragmatics involves three areas of focus:

1 the form as a recurring sequence
2 its use in specific situations
3 the social contract or bonds, which include members of a speech community

Formulaic language has been referred to by a range of labels in pragmatics, including conventional expression, pragmatic routine, and situation-bound utterance (SBU). The general agreement in pragmatics research is that “formulas in pragmatics are conventional expressions representing ways of saying things agreed upon by a speech community” (Bardovi-Harlig, 2012, p. 209).

Frequency-based research is relatively rare in pragmatics. Formulaic sequences are identified in various ways depending largely on the goals of the research. It is common either to start with the formulaic sequence and work from there, or to start with the situation and context, specifically the illocutionary force or the speech acts, and determine the sequences used.

Specific studies of formulaic language in pragmatics

Some studies have identified a formulaic sequence in a specific situation and identified its use in a quantitative way. Manes and Wolfson (1981) looked at almost 700 examples of compliments and determined that 53.6 percent took the form NP (is/looks) (really) ADJ, while two others, I (really) (like/love) NP and PRO is (really) ADJ NP, made up another 30 percent. Culpeper (2010) examined impoliteness in a range of contexts, identifying the impoliteness sequences by means of examining the reactions of interlocutors and addressees. The resulting list was compared with occurrences in the Oxford English Corpus, in which at least 50 percent of the occurrences had to be impolite. In this way, a list of impoliteness sequences was compiled. Interestingly, corpus-based studies are still rather uncommon in pragmatics research into the use of formulaic sequences. One of the relatively rare examples is a study by Wong (2010), examining the Hong Kong Interactional Corpus of English for sequences expressing thanks.

How formulaic sequences are used in pragmatics is a prime focus of research. Terkourafi (2002), analyzing fifteen hours of spontaneous conversation in Greek, found that formulaic sequences tend to bear the responsibility for conveying politeness in spoken discourse. Reiter, Rainey, and Fulcher (2005) explored speaker expectations and the use of sequences, finding that making requests by means of formulaic sequences was linked to greater certainty of compliance with the request. In other words, use of formulaic language to make requests gets better results, despite some cross-cultural differences.

Another important question for pragmatics research is how common the use of formulaic language is in pragmatics.
We might expect that pragmatic competence is very tightly connected to use of formulaic sequences, given their importance in realizing language functions, their frequency and ubiquity in spontaneous speech, and their role in establishing speech communities. Surprisingly, the research shows that nonformulaic options are often available, and that the degree of use of formulaic sequences in a given context may vary by culture. For example, Traverso (2006) noted that service encounters in France and in Syria differ in the amount of formulaic language used in certain aspects of the exchange, such as solicitation–request–request uptake, as well as in verbal acceptance of items. This may lead to difficulty in L2 teaching and learning of pragmatic competence, as one’s L1 community norms for use of formulaic language in a given situation may vary from the norms of the L2 target cultural community. As well, certain contexts may require more formulaic language use to achieve pragmatic goals than others within a given community. Bardovi-Harlig et al. (2010), for example, found that, in a university community, requests required relatively little use of formulaic language, but expressing thanks was much more formulaic in nature.

It is interesting to think about learner or user attitudes toward use of formulaic language in pragmatics. A fascinating study by Sifianou and Tzanne (2010) found that 92 percent of Greek respondents to a survey felt that routine polite business uses of expressions such as “How can I help you?” or “Thank you for calling us” were not sincere and were simply means of marketing and maintaining or increasing business. This level of skepticism may be more or less common in other cultures. Attitudes toward particular formulaic sequences from particular regions or cultures can have implications for language educators and proponents of English as an international language. Korean ESL learners, when asked to choose among Australian, British, or American formulaic sequences and then asked attitude questions, expressed a desire to learn American or British language (Davis, 2007).

What is the role of formulaic language and pragmatics in second language acquisition?
The notion of pragmatic competence as key to communicative competence (see e.g., Bachman, 1990) is an indicator of the potential importance of acquisition of formulaic language in pragmatics. A range of research studies have examined L2 acquisition of pragmatics. Some early research looked at how learners recognize and interpret meanings of formulaic sequences. Kecskes (2000) found that learners recognized literal rather than figurative or idiomatic meanings first in sequences such as “a piece of cake,” or “get out of here.” Researchers have examined the effects of exposure to a target language over time on use of pragmatic formulaic sequences. Barron (2003), in a study of development of pragmatic formulaic sequences by L1 English speakers learning German, found that after a year of study abroad they increased their use of accurate and appropriate formulaic sequences. Similarly, Roever (2005) found that even a three-month term in study abroad helped learners to select appropriate formulaic sequences in multiple-choice tasks. Bardovi-Harlig and Bastos (2011) discovered that recognizing and producing authentic formulaic sequences was helped by intensity of interaction in an L2, but that length of stay in the L2 milieu alone had little influence. Other researchers have looked at the phenomenon of transfer of formulaic sequences from first to second language. For example, Sharifian (2008) found that Persian L1 learners of English in Iran, when asked to respond to compliments in English, were likely to use translated formulas from their first language. In a study by Barron (2003), such transfer appeared to decrease with increased acquisition, and learners sometimes weighed the option of using target language-sounding sequences against a need to be themselves and use L1 conventions.

Formulaic language and the teaching of pragmatic competence

In teaching pragmatics generally, what is the role of formulaic sequences? A lot of the research in this area has focused on thanking formulas (Bardovi-Harlig, 2012). Schauer and Adolphs (2006) compared two pedagogical means of dealing with thanking formulas: identifying thanking formulaic sequences in the CANCODE corpus, and a discourse completion task in which one, for example, fills in the utterances of one of the interlocutors in a dialogue. They found that native speakers in the discourse completion task contexts produced sequences not even found in the corpus data, a testament to the power of real-life communication. De Pablos-Ortega (2011) looked at the thanking expressions present in sixty-four Spanish language textbooks, and found that the sequences were present in good numbers, but more in lower level materials than in higher level ones. The study also presented the textbook thanking scenarios to native speakers and found, like Schauer and Adolphs (2006), that the native speaker participants produced a greater range of sequences than the textbooks did. Research (see Taguchi, 2011) shows that explicit explanation can be of benefit. Setting up consciousness-raising situations is helpful, especially when students are required to compare their performance with target-like performance to promote “discovery” of pragmatic conventions. On the other hand, expecting students to “pick up” how pragmatics work by simply providing examples leads to weak “uptake”; to a great extent this is because learners tend to pay attention to meaning rather than form in input unless their attention is specifically directed to form.

Structured input tasks seem to work well. These include:

• Listen to a dialogue containing target forms
  – Complete discourse: for example, fill in the utterances of one of the interlocutors in a dialogue
  – Role play
  – Problem-solving: what to say or do in situation X

After some explicit instruction, it is best to move from declarative (conscious, “taught” knowledge) to procedural (subconscious, skill focused) knowledge. Research shows that more practice leads to faster and more accurate use of pragmatic conventions in both reception and production. Some specific activities might include the following:

• Receptive skills
  – Listen and rate appropriateness of how someone deals with a communication situation related to a specific function
  – Listen and select appropriate forms from a list

• Productive skills
  – Role play
  – Structured conversation: students are provided with a sense of the nature of the situation and the participants, plus a set of functions to execute, then they generate appropriate language to suit the context
  – Discourse completion: students fill in missing parts of a dialogue or other type of communication situation
  – Cloze: students insert appropriate words into a text with blanks

In summary …

I am sure we all agree that this extremely compressed overview of research into formulaic language and spoken language reveals some important information about the power of formulaic language. Formulaic language appears to be a fundamental aspect of the dynamics and the linguistic content of spoken communication, comprising a significant proportion of the words we speak, and playing vital roles in the production (and, by implication, the comprehension) of fluent speech, the ways we achieve communicative goals in communication, and more. Yet, it also feels as if we are still only scratching the surface of what formulaic language does in spoken language. Deeper investigations are needed into the areas outlined in this chapter. Richer data are needed to help us understand more fully what importance formulaic language has in spoken communication. A few of the many themes and patterns which the research shows are:

• Formulaic language is important in spoken language.
• Formulaic language comprises a large proportion of spoken language in a range of registers.
• Formulaic language may be a key element of second language speech fluency.
• Formulaic language is uttered with particular phonological characteristics.
• Formulaic language is integral to the ways we use language to achieve particular communication goals; it is inextricably linked to pragmatic competence.

Certainly, not all the questions have been answered yet in any particular area. How much of spoken language overall is formulaic? Have we really gotten to a point where we can confidently claim that a given word sequence is formulaic (see Chapter 2)? Do we fully agree on what constitutes a formulaic sequence? Given the limited amount of research on formulaic language and fluency, how confident can we be in claiming that formulaic language is important, and how do we know whether a formulaic sequence is stored and retrieved as a whole in spoken language?

POINTS TO PONDER AND THINGS TO DO

1 Reviewing the information covered in this chapter, summarize the importance of formulaic language in spoken language. Are there any aspects of spoken language which appear to be missing from this overview?
2 How is the value of formulaic language in spoken language different from and similar to that in written language (see Chapter 7)?
3 In the first part of this chapter we see lists of formulaic sequences taken from corpus research. What similarities and differences do you see in these lists?
4 In second language fluency research we see that certain temporal variables of speech are highlighted as indicators of fluency. Does it appear that any qualities of fluency might be missing from these variables?
5 The preoccupation in Wood’s (2010a) research seems to be fluent language production. Can we safely assume that there is a similar effect of formulaic language on second language fluent listening comprehension?
6 A great deal of the work summarized here is centered around second language learning. Which pieces of research also seem relevant to first language speech?
7 Review Wood’s 2010 study. Using his methods and findings as a start, outline a study of your own which deepens or focuses our understanding of formulaic language and second language fluency.
8 Try conducting an ethnographically focused study of formulaic language in pragmatics. Record or take field notes on spoken communication in a particular context. Try to identify key formulaic sequences used in this type of situation.
9 Elaborate a lesson designed to teach the content you uncovered in the activity described in #8 above.
10 Read one of the pragmatics-focused research studies described in this chapter. Would the results be the same with a different group of participants? Try duplicating it with a different group of participants. Are your findings similar or different from those in the original study?

7 Formulaic Language and Written Language—Academic Discourse in Focus

As will be discussed in Chapter 8 concerning lexical bundle research, the study of formulaic language in written language tends to concentrate on academic discourse. There are some obvious reasons for this, not least among which is the fact that the language skill of writing is a primary concern in academia. Outside of the academy, specific professional fields have their own requirements and demands with regard to writing, but the world at large tends to actually write rather little outside of particular workplaces, beyond filling forms and perhaps emailing and texting. As seen in Chapter 5, some rather experimental and psycholinguistically oriented research has been conducted into formulaic language and its effect on reading abilities. For example, reading speed seems to be affected by knowledge of high-frequency formulaic language. Research with native speakers by Bod (2001) showed that higher frequency three-word sentences such as I like it were processed more quickly than lower frequency ones, and Ellis et al. (2008) found that frequent collocations with verb agreement and booster-maximizer-adjectives were more quickly read and processed. Similarly, Arnon and Snider (2010) discovered that more frequent phrases were processed faster than less frequent ones, and Tremblay et al. (2011) found that sentences with lexical bundles were read faster in self-paced reading tasks than those without lexical bundles.

The nature of writing

Writing is a language skill which carries a great deal of communicative power. The study of writing has a long history and is a research discipline all of its own. Writing theorists such as Kress (1994), Llach (2011), and Paré (2009) point out that writing is a remarkably potent means of manipulation of language to share information. A leading figure in the field of writing studies, Raimes (2002) notes that writing is distinguished by its complex nature, as writers conceive of ideas, present them, and examine the production in terms of products and the writing process itself:

Writing, unlike speaking, provides us with a way not only to generate ideas before presenting them, but also to scrutinize the ideas and language we produce; this re-vision, this seeing again, lets us receive feedback from ourselves and from others and, learning as we go, make changes and corrections. (p. 309)

The act of writing has become even more embedded in modern societies and cultures as advances in technology have enabled people to share ideas more rapidly and comfortably by means of texts and numerical data. In the academic world, writing is central to communication. It plays a strong role in evaluation and assessment, as in schools, colleges, and universities most evaluation is based on students’ abilities to craft pieces of writing which present ideas, including essays, summaries, and so on (Connor, 2003; Leki, 2006; Llach, 2011; Nation, 1990; Paltridge, 2004; Zhu, 2006). In addition to the power of writing as a communication medium and a means of academic growth and assessment, writing is also generally seen as an engine of both first and second language acquisition. Llach (2011), for example, notes that writing is a key means of furthering lexical acquisition.
For second language writers, the act of writing provides a chance to put into practice and employ the syntax and vocabulary of the target language, and, as such, it can be considered a remarkable means of strengthening and broadening learners’ entire linguistic repertoire (Llach, 2011). Llach (2011) also notes that skill in writing is seen as a tool for assessing language proficiency, as writing tasks are present in virtually all major university placement and proficiency tests; the assessments which result from these tests tend to focus on accuracy of grammar and word use. It is almost universally accepted that writing is a challenging and complex activity, especially in academic environments. This is particularly true for second language learners who struggle to express complex ideas and may fear being seen as having undeveloped language skills due to their challenges with writing (e.g., Bacha, 2002; Cook & Bassetti, 2005; Silva, 1993). The basis of this type of fear on the part of learners may be linked to the fact that academic writing in particular is characterized by a high level of formal structure and strict expectations of the types of words and structures needed, a set of expectations which tends to be specific to particular registers, disciplines, and genres. Unlike more creative or purely communicative writing such as fiction or letters and text messages, academic prose is characterized by its restricted and formal nature. Nowhere is the particular nature of academic writing more apparent than in its phraseology and use of formulaic sequences. It was Lewis (2000) who put it best, remarking that, “in academic writing, where the focus is almost exclusively on accurate communication of information, among colleagues with a shared background in a particular topic, standard words, phrases, collocations and other chunks are an essential prerequisite for effective communication” (p. 189). This can be taken to mean that academic writing skills go well beyond the need to handle lexicon and syntax, and require a high level of ability to incorporate the formulaic sequences which are so fundamental to the creation of academic discourse. Formulaic sequences are, in essence, a major part of the foundation of successful academic writing skills because they comprise the basic elements of academic discourse and are specific to particular disciplines, registers, and genres. It is not uncommon to find discussions among teachers and assessors, especially in second language education contexts, in which the topic may be the “naturalness” of a piece of writing. A given sentence may be grammatically correct, yet appear to be unnatural or odd because in some hard-to-define way it fails to read comfortably as an academic piece of prose.
The particular area of sentence frames and heads is presented as a case in point by Lewis (1997):

The frustration of reading a student’s essay and thinking “I know what you mean, but that’s not the way to say (= write) it,” is most frequently caused by the student’s failure to use this type of lexical item. Some are comparatively short and easy (sequences such as secondly, … and finally); some are sentence heads serving similar pragmatic purposes (We come now to a number of important reservations). (p. 259)

This type of deviation from the norm can have serious effects on the evaluation of a learner’s writing in an academic environment. It has been posited that EAP programs perhaps should be designed to provide learners with awareness and control of large numbers of formulaic sequences (see Jones & Haywood, 2004). This makes excellent sense, since acceptable production of academic text by learners is strongly linked to accurate use of words and sequences characteristic of academic discourse. Indeed, Cowie (1992) notes that “it is impossible to perform at a level acceptable to native users, in writing or in speech, without controlling an appropriate range of multiword units.” Writing is seen as, to a great extent, a skill requiring manipulation of vocabulary in all its forms. For example, Coxhead and Byrd (2007) point out that successful academic writing is linked to specialized knowledge of academic genres (p. 133), which in turn implies knowledge of particular recurrent vocabulary. This includes, of course, formulaic sequences, which are vital as they are apt to be stored and retrieved more or less holistically. This particular characteristic of formulaic sequences can make for a comfortable reading experience for consumers of academic text, allowing for efficient expression of often extremely abstract content, highly complex relationships among ideas, delicate nuances of assertions and stances, all to be understood in the blink of an eye. The act of reading, with its necessary speed and flow, is dependent upon the recognition of familiar or expected word sequences, which help the reader, like the listener in spoken discourse, to grasp meaning without resorting to the arduous and ultimately unproductive task of processing word by word. In the end, academic writing combines the control of grammar and vocabulary, clear knowledge of academic genres, and vocabulary essential to a specific field (Coxhead & Byrd, 2007). Clearly, proficient second language writing requires not only control of grammar and vocabulary, but also an ability to meet the expectations and predictions of readers by using the formulaic sequences typical of the specific discourse. Hyland (2008) points out that weak awareness of and skill with formulaic language is a major problem for second language writers in academic contexts.
Skilled use of formulaic sequences in academic writing, as Hyland (2008) suggests, can help in the crafting of logical and coherent academic texts, and the absence or misuse of the sequences can mark writing as low proficiency or nonnative. Written texts in academic disciplines are characterized by specific sets of formulaic sequences. Ellis et al. (2008) remark that research on English for academic purposes has shown us that each discipline has a high frequency of multiword sequences. Logically, learning to write effectively in a particular discipline requires use and control of these frequent word combinations. Ellis et al. (2008) further note that a writer may, for example, have advanced knowledge of grammar and vocabulary, and produce grammatically correct sentences which may, at the same time, appear unnatural and foreign. The challenge in crafting prose in appropriate academic style can be attributed at least in part to a lack of awareness of formulaic language relevant to specific disciplines on the part of second language writers. Ellis et al. (2008) state that along with control of grammar rules and vocabulary, second language writers should be able to handle useful and frequent formulaic sequences to be able to achieve a more nativelike result. Similarly, Li and Schmitt (2009) note that prefabricated chunks are abundantly used in academic discourse, and they serve a range of functions; it can be noted that they are so important to academic text that insufficient use of them can mark a writer as a novice or as inadequate. In light of this, it appears that learning to write effectively requires successful integration of formulaic sequences into prose, and lack of skill in this area may result in inappropriate, nonnative writing. This writing may be grammatically accurate, yet still judged as awkward (Li & Schmitt, 2009). It is clear that formulaic sequences are foundational to academic writing. Second language learners who are writing need to be able to successfully use formulaic sequences which are common to particular registers, disciplines, and genres, which entails mastering the often complex and varied structures and functions of formulaic sequences. This is key to constructing coherent and developed pieces of writing. This means that second language learners should be skilled in the use of formulaic language, since errors in their use can be as deleterious to their academic success as simply avoiding them. In other words, both failing to use formulaic language and using it in error-prone or inappropriate ways can lead to being labeled as an outsider (Handl, 2008). In addition to the need for frequent and appropriate use of formulaic sequences, second language learners also need to be in control of a broad range of sequences. It has become apparent that overuse of a limited range of prefabricated sequences, or lack of use of certain sequences, may also result in weak writing. Granger (1998) notes that “while the foreign-soundingness of learners’ productions has generally been related to the lack of prefabs, it can also be due to the excessive use of them” (p. 155).
On the same theme, Hyland (2006) points out that second language learners often rely on a limited number of formulaic sequences, likely in order to avoid grammatical errors, resulting in repetitive writing (p. 60). Paquot (2008) observes that the “non-nativeness” or “unconventionality” of second language writing is often due to a tendency “to overuse a limited number of frequent English collocations and prefabs but to underuse a whole set of native-like phraseological units” (p. 102).

A historical perspective

As concerns academic discourse, writing proficiency has been connected to the use of formulaic sequences since at least the 1980s (Bamber, 1983; McCulley, 1985). Specific words and word combinations also appear more frequently in academic contexts than in other registers (Coxhead, 2000; Simpson-Vlach & Ellis, 2010). A preoccupation in the research in this area is the realization that, while adapting to and learning to master academic discourse can be challenging for both native speakers and second language learners, the challenge is vastly greater for second language learners, who are, in most cases, still struggling with various aspects of language knowledge and use as well. By means of corpus analysis, studies of second language writing have highlighted a number of the challenges and pitfalls many second language English learners encounter. Although corpora have long been used to study proficient and/or native speaker language users in a variety of registers and genres (Biber & Barbieri, 2007; Hyland, 1998, 2008; Poos & Simpson, 2002), computerized corpora of learner English have been in use for a shorter time, having come on the research scene only in the early 1990s (Granger, 1998). In the several decades since learner corpora began to come into use, they have been used to study the discourse of second language English users and to distinguish between first language and second language English writing in a range of ways (Altenberg & Tapper, 1998; Granger & Rayson, 1998; Salazar & Verdaguer, 2009; Virtanen, 1998; Yeung, 2009). A key study is that of Yeung (2009), who analyzed the use of besides in expert English and learner English corpora to see how its use differed in their writing. The study showed that second language learners of English tended to overuse besides, using it overwhelmingly to convey the meaning of in addition, whereas the expert English users did not tend to do so. A related study is that of Virtanen (1998), who used corpus analysis to examine the use of direct questions in argumentative essays by first and second language English writers. Virtanen discovered that second language English writers tend to rely heavily on direct questions in their writing, as compared to native speaker writers.
Similarly, Granger and Rayson (1998) found that second language English writers in an academic setting used much more informal conversational language in their writing and underused features more commonly associated with academic writing.

A look at learner corpora

Some researchers have studied differences in formulaic language use at different proficiency levels in academic writing. For example, Levy (2003) found that proficient post-secondary writers used discourse organizing lexical bundles (e.g., on the other hand, at the same time) more often than did less proficient writers. Similarly, Connor (1990) and Ferris (1994) discovered that more proficient second language writers generally employ more prepositional phrases, passives, and nominal forms. As discussed in several other chapters in this book, a key study by Boers et al. (2006) used an experimental design to test the effectiveness of targeted instruction of formulaic language on second language speech proficiency. The study indicated that targeted instruction of formulaic language was linked to greater perceived proficiency by outside judges. These types of studies have helped us better understand areas where second language English users experience difficulties, and, by the same token, areas where instruction can be more appropriately focused to help learners increase their writing proficiency. Although the growth of corpus studies investigating differences between second and first language English users has provided glimpses of how these two groups differ, relatively few corpus studies have focused on differences among L2 English users of differing proficiency levels. Appel and Wood (in press) used a corpus of graded test-taker writing sourced from archives of the Canadian Academic English Language Assessment (CAEL) to compare the writing of high and low level L2 English writers, with the goal of identifying how the use of recurrent word combinations differed. The CAEL requires test takers to read several texts on a particular academic topic, listen to a brief lecture on the same topic, and then write a short essay responding to a prompt based on the test topic. Using qualitative and quantitative measures of analysis, including application of the functional classifications of Biber, Johansson, Leech, Conrad and Finegan (1999) for lexical bundle research (see Chapter 8), the authors discovered that lower level writers tended to use more stance and discourse organizing types of combinations.
Also, they found that these writers used copying strategies in their writing, taking chunks of language verbatim from source reading texts in crafting their essays. Higher level writers, on the other hand, were much less dependent on the language in the reading texts, and used many more referential word combinations in their writing. In a study related in theme to that of Appel and Wood (in press), Staples, Egbert, Biber, and McClair (2013) analyzed formulaic sequences and test taker performance in the writing section of the Test of English as a Foreign Language (TOEFL). Staples et al. (2013) examined the use of lexical bundles in the test across three proficiency levels, using Biber et al.’s (1999) functional classification. They found that lower level test takers tended to use more bundles in general, but that many of their bundles were taken from the writing prompts. In contrast, higher level test takers used fewer bundles from the prompts. Writers at all levels, however, used discourse and stance bundles in similar ways, and used few referential bundles at all. An important piece of work on the use of formulaic language in academic writing is that of Ädel and Erman (2012). The authors investigated the use of lexical bundles in a large corpus of English writing of advanced Swedish first language learners of English, and compared it to the writing of first language English students at a British university. Results showed that the native speakers used a much greater range of bundles in their writing, including bundles used in hedging or softening propositions. An oft-cited piece of work on formulaic language in second language writing is a study on lexical bundles in first and second language writing by Chen and Baker (2010). Using standard lexical bundle research methods (see Chapter 8), the researchers compared bundles in a corpus of published academic texts and in two corpora of student writing. Unsurprisingly, the published writing showed a wider range of lexical bundle use than did the student writing, and some bundles which were used at high frequency in the published writing were used comparatively infrequently in student writing. Meanwhile, the second language student writers overused a number of bundles which were rarely used in the published writing. There exists some limited evidence of the value of instructed formulaic language in improving second language academic writing. One effort to uncover the effectiveness of explicit teaching of formulaic sequences and their use by second language writers in an EAP context is a study by Al Hassan and Wood (2015), in which EAP learners were engaged in focused instruction of formulaic language. The participants were explicitly taught a range of useful formulaic sequences for describing line graphs, and a timed writing sample of this particular genre was elicited in a pre-test, a post-test, and a delayed post-test several weeks later. Three blind judges were asked to rate the overall effectiveness of the writing samples. The results overall showed a significant increase in the participants’ use of appropriate formulaic sequences from the instruction in both the post-test and the delayed post-test, as well as significantly higher ratings by the judges.
These results can be taken to indicate that formulaic language appropriate for academic writing can indeed be taught successfully, and that increased use of sequences can boost the overall quality of such writing.

Some key studies and lists of formulas

Besides the lexical bundle work, a number of corpus-focused studies have been conducted which produce lists of formulaic sequences, using different labels for the sequences due to the nature of the extraction methods used, or the specific nature of the sequences themselves. These studies all focus on academic written language to some extent or other, so it is valuable to take a look at what they have done to isolate the sequences, and what the resulting lists look like. The following is a brief description of each study, followed by a list of the thirty most frequent sequences in descending order.

A multi-corpus study by Liu (2012) concentrates exclusively on academic written English. Liu queried the COCA and the BNC, identifying what he terms multiword constructions (MWC). Similar to Simpson-Vlach and Ellis (2010), Liu notes that the lexical bundle research has resulted in lists of items of incomplete structure and limited functional utility. Liu (2012) suggests that a shift in focus toward a more usage-based construction learning (e.g., Ellis, 2002) set of criteria is a way to solve this problem. In other words, after extracting a set of multiword units from a corpus using frequency and range criteria, a researcher should examine the items and select the core construction from those which appear semantically and structurally incomplete. He opted to search for a broad range of formulaic sequences instead, including idioms and phrasal verbs in addition to lexical bundles. The result is a list of 228 MWCs. The most frequent of these are listed in Table 7.1.

Table 7.1 Most frequent MWC from Liu (2012, p. 33)

1   Such as X
2   For example
3   As well as X
4   X suggest that
5   According to X
6   (Be) based on X
7   There be X
8   There be no X
9   A/the large/small number of
10  Out of X
11  One of X
12  X show that
13  Be/to be able to X
14  Focus on (X)
15  (As) (a) part of X
16  X argue that
17  In addition (to)
18  (modal verb) Lead to X
19  The fact that X
20  (be) Associated with X
21  In order to X
22  (be) Used to VP/in/as
23  (to) Deal with (determiner + noun)
24  Tend to VP
25  NP say that
26  The use of (determiner + noun)
27  In fact
28  Refer/(be) referred to (as) (determiner + noun)
29  NP indicate that
30  In + the name of a country/state/region (e.g., The U.S.)

Simpson-Vlach and Ellis (2010) compiled a list of formulaic sequences they call the academic formulas list. The list was created in part in response to the same types of issues identified by Liu (2012) concerning lexical bundles, namely the structural incompleteness of bundles and their lack of salience or utility for teaching. Simpson-Vlach and Ellis (2010) assembled a large corpus of both spoken and written academic language taken from a variety of sources, and derived their list by means of a combination of measures. The academic writing corpus consisted of Hyland’s (2004) research article corpus of 1.2 million words, along with selected BNC files from across academic disciplines. Using a frequency cutoff of ten per million, the researchers extracted all three-, four-, and five-word sequences from the corpus. They then applied a log-likelihood statistical measure, comparing sequences found in their academic corpus with a nonacademic corpus, thereby ensuring that the items they identified were indeed academic in nature. They used a specific range criterion, making sure that items identified as formulaic in the written corpus occurred at least ten times per million words in three out of four academic disciplines. They then applied a mutual information (MI) statistical test to the items, MI being designed to determine the degree to which the words in a phrase occur together more frequently than would be expected by chance. Simpson-Vlach and Ellis used this measure to further eliminate some items from their initial lists. They then took a random subset of the total list and used a composite measure to determine the saliency or practicality for teaching of each sequence:

We then asked twenty experienced EAP instructors and language testers at the English Language Institute of the University of Michigan to rate these formulas, given in a random order of presentation, for one of three judgments using a scale of 1 (disagree) to 5 (agree):

1 whether or not they thought the phrase constituted “a formulaic expression, or fixed phrase, or chunk.” There were six raters with an inter-rater α = 0.77.

2 whether or not they thought the phrase has “a cohesive meaning or function, as a phrase.” There were eight raters with an inter-rater α = 0.67.

3 whether or not they thought the phrase was “worth teaching, as a bona fide phrase or expression.” There were six raters with an inter-rater α = 0.83 (Simpson-Vlach and Ellis, 2010, p. 496).

The top academic formulas in written language, according to Simpson-Vlach and Ellis, are presented in Table 7.2, in rank order.

Table 7.2 Most frequent written academic formulas from Simpson-Vlach and Ellis (2010)

1 On the other hand
2 Due to the fact that
3 On the other hand the
4 It should be noted
5 It is not possible to
6 A wide range of
7 There are a number of
8 In such a way that
9 Take into account the
10 As can be seen
11 It is clear that
12 Take into account
13 Can be used to
14 In this paper we
15 Are likely to
16 In the next section
17 A large number of
18 The United Kingdom
19 On the basis of the
20 That there is no
21 Over a period of
22 As a result of the
23 Can be seen in
24 A wide range
25 There are a number
26 It is interesting to
27 It is impossible to
28 It is obvious that
29 It is possible to
30 It is not possible
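The log-likelihood comparison in the procedure described by Simpson-Vlach and Ellis (2010) can be illustrated with a minimal sketch. The text does not give the exact formula the authors used, so the code below assumes the two-corpus log-likelihood form commonly used in corpus linguistics; the function name and the counts in the example are hypothetical and serve only as illustration.

```python
import math


def log_likelihood(freq_a, size_a, freq_b, size_b):
    """Two-corpus log-likelihood for one sequence (assumed formula).

    freq_a, freq_b: raw counts of the sequence in corpus A and corpus B.
    size_a, size_b: total number of words in each corpus.
    """
    total = freq_a + freq_b
    # Counts expected if the sequence were equally common in both corpora.
    expected_a = size_a * total / (size_a + size_b)
    expected_b = size_b * total / (size_a + size_b)
    score = 0.0
    for observed, expected in ((freq_a, expected_a), (freq_b, expected_b)):
        if observed > 0:  # a zero count contributes nothing to the sum
            score += observed * math.log(observed / expected)
    return 2 * score


# Hypothetical counts: 50 occurrences in a 1-million-word academic corpus
# versus 5 occurrences in a 1-million-word nonacademic corpus.
ll_score = log_likelihood(50, 1_000_000, 5, 1_000_000)
same_rate = log_likelihood(10, 1_000, 10, 1_000)
print(round(ll_score, 2), same_rate)
```

A higher score indicates a larger difference between the two corpora. The score alone does not show which corpus favors the sequence, so the relative frequencies still need to be compared directly.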

In another corpus-driven study of formulaic language in academic written language, Wood and Appel (2014) took a different approach to the issue of academic relevance and teachability of formulaic sequences. Adopting from Liu (2012) the term multiword construction (MWC), they extracted formulaic language from a corpus of first-year university engineering and business textbooks, which they identified as likely to be the essential reading sources and introductions to academic written language encountered by EAP students. The authors note that engineering and business are the two most popular majors for EAP students, and that textbooks from these areas are likely more representative of student reality than the corpora often used in such research, which have usually been drawn from specific academic disciplines and have included journal articles. The resulting corpus contained texts from a range of disciplines, including mathematics, economics, and so on. Each textbook was scanned into digital form, creating a corpus of 1.6 million words containing text from a total of ten textbooks divided equally between required/recommended readings from first-year classes in Business and Engineering programs. Instructional language (i.e., end of chapter problem sets, instructional exercises/activities/etc.) was removed from the corpus, and MWCs were identified at a frequency cutoff of twenty-five per million words and a minimum range of two textbooks, that is, one textbook from each discipline.

After compiling a list of four-word MWCs, the authors analyzed the list to determine which of the four-word sequences could potentially be perceived as a three-word MWC with variable slots at its boundaries (e.g., as a result [the/of]). Two methods of analysis were used to modify the original list of four-word structures to reflect this possibility.

The first was to divide each four-word structure into its two constituent three-word sequences (words 1–3 and words 2–4) and to note the frequency of each. As an example, the sequence as long as the divides into two three-word sequences, as long as and long as the. Using the frequency data, it was apparent that as long as occurred much more frequently than long as the. This was taken as evidence that the four-word sequence in question might in actuality be a three-word MWC which happens to occur with a fourth word (the) at the end.

The second means of identifying which four-word MWCs could actually be three-word MWCs involved examining overlap with other four-word sequences. When two four-word sequences contained identical three-word sequences, the three-word string common to both was used as the “root” structure. For instance, at the end of and the end of the were combined to create the condensed listing (at) the end of (the).

The authors ended up with two lists: one a list of three-word MWC, and the other a list of four-word MWC. The top three-word MWC are presented in Table 7.3. Note the use of () brackets to indicate the three- and four-word versions of each construction:

Table 7.3 Most frequent three-word MWC from Wood and Appel (2014)

1 (Shown/as/illustrated) in figure #
2 (Is/to) the number of
3 (Discussed) in section #
4 (As) in example #
5 (Discussed) in chapter # (we)
6 The cost of (the)
7 In this case (the)
8 In terms of (the)
9 The amount of (the)
10 (In) the United States (and)
11 (That) there is a
12 (In) the direction of (the)
13 (Is) the sum of (the)
14 The fact that (the)
15 (Is/as) shown in figure
16 As a result (of/the)
17 The graph of (f/the)
18 With respect to (the)
19 # Percent of (the)
20 Is given by (the)
21 In other words (the)
22 The rate of (change)
23 As well as (the)
24 (At) the end of (the)
25 (#) We see that (the)
26 The price of (the)
27 A and B (are)
28 (Is) the same as (the)
29 (As/is) a function of
30 The center of (mass/the)
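The two reduction steps described for Wood and Appel (2014), splitting a four-word sequence into its two three-word halves and merging overlapping four-word sequences, can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' actual procedure: the function names, the frequency counts, and the `ratio` threshold are hypothetical, since the text says only that one three-word sequence occurred "much more frequently" than the other.

```python
def split_four_gram(four_gram):
    """Return the two three-word sequences (words 1-3 and words 2-4)."""
    words = four_gram.split()
    return " ".join(words[:3]), " ".join(words[1:])


def dominant_trigram(four_gram, trigram_freq, ratio=2.0):
    """Condense a four-word sequence if one three-word half dominates.

    `ratio` is an assumed threshold; the source does not specify one.
    """
    words = four_gram.split()
    left, right = split_four_gram(four_gram)
    f_left = trigram_freq.get(left, 0)
    f_right = trigram_freq.get(right, 0)
    if f_left > 0 and f_left >= ratio * f_right:
        return f"{left} ({words[3]})"   # optional slot at the end
    if f_right > 0 and f_right >= ratio * f_left:
        return f"({words[0]}) {right}"  # optional slot at the start
    return None                         # no clear three-word root


def merge_overlapping(first, second):
    """Merge two four-word sequences that share a three-word root."""
    a, b = first.split(), second.split()
    if a[1:] == b[:3]:
        return f"({a[0]}) {' '.join(a[1:])} ({b[3]})"
    return None


# Hypothetical frequencies, for illustration only.
freqs = {"as long as": 120, "long as the": 30}
condensed = dominant_trigram("as long as the", freqs)
merged = merge_overlapping("at the end of", "the end of the")
print(condensed)
print(merged)
```

The bracketed output follows the notation of Table 7.3, where parentheses mark the optional word that turns a three-word MWC into its four-word version.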

A seminal study by Byrd and Coxhead (2010) used the corpus originally created for development of the Academic Word List (Coxhead, 2000), consisting of 3.6 million running words taken from a range of text types including journal articles, manuals, course notes, book chapters, and so on, from forty-eight academic disciplines within the four areas of arts, commerce, law, and science. Using a lexical bundle focus with a twenty per million frequency cutoff and measures to ensure that the bundles were well distributed across disciplines, they compiled a list of thirty-five lexical bundles. Comparing this list to the lists compiled by Biber et al. (2004) and Hyland (2008), they found a common core of twenty-one bundles. The bundles were classified into three categories of functions: presentation of content, creation of within-text connections, and expression of the writer’s attitudes. The researchers identify six areas of difficulty in using bundles in EAP teaching:

1 Making sure that the findings and the lists in published research reports are clear and that one is aware of the issues around the identification and teaching of bundles.

2 Determining the length of bundles to teach when shorter bundles appear inside longer ones. Byrd and Coxhead (2010) found that 21–64 percent of the three-word bundles were folded into four-word bundles. This point is dealt with in some detail in the work of Wood and Appel (2014), as discussed above.

3 Ensuring awareness of the context within which published bundles appear, since lists alone give only part of the usage picture.

4 Encouraging learners to see the validity of a focus on bundles, despite the fact that some academic vocabulary seems more challenging than bundles.

5 Balancing the idea that bundles need to be a focus in classroom teaching with the notion that such formulaic sequences are dealt with as unanalyzed chunks.

6 Having learners read sufficient volumes of text to encounter the bundles frequently.

The study represents a milestone in research on lexical bundles, presenting a rigorously compiled list of bundles and addressing issues of their actual implementation in the EAP curriculum. The highest-frequency bundles from Byrd and Coxhead’s (2010) work are presented in Table 7.4.

Table 7.4 Most frequent lexical bundles from Byrd and Coxhead (2010, pp. 37–39)

1 On the basis of
2 On the other hand
3 As a result of
4 The end of the
5 At the end of
6 At the same time
7 The nature of the
8 In the form of
9 In terms of the
10 In the absence of
11 At the time of
12 As well as the
13 It is clear that
14 In the United States
15 That there is a
16 The way in which
17 Is likely to be
18 It is possible to
19 It is important to
20 As part of the
21 In the same way
22 That there is no
23 It is difficult to
24 The case of the
25 It is necessary to
26 A result of the
27 A wide range of
28 The relationship between the
29 The rest of the
30 The development of the

There is a remarkable amount of variation among the lists elaborated by Liu (2012), Simpson-Vlach and Ellis (2010), Wood and Appel (2014), and Byrd and Coxhead (2010). However, there are some items which appear on more than one list, for example, on the other hand. The four lists were created using criteria and corpora which differ considerably, of course, which explains at least a good part of the differences among lists. A lesson to be learned from all of this would seem to be that there is really no master list of formulaic sequences which are characteristic of written language. This point is even more obvious when we consider that all of the researchers focused their efforts on academic registers of writing rather than on written language in its entirety. So, the main point to take away here is that, while written language would seem to be highly formulaic, there is no compelling body of research which catalogues the formulaic sequences most prevalent in written language. Even academic written language appears to be largely unexplored territory for formulaic language researchers in many ways.

Another significant point to remember here is that formulaic language seems to be important in academic writing and, by implication, writing in general, yet we know very little about its actual role in writing. We can say that formulaic language is useful in making writing appear to fit certain norms in a given register. But, unlike the research into formulaic language and spoken language (see Chapter 6), we do not know much at all about the ways that knowledge of formulaic language can help with ease or fluency of writing, for example. Even more to the point, we know next to nothing about any role that formulaic language may have in reading. Does our knowledge about formulaic language in writing imply anything for reading? Does the use of formulaic language aid in reading comprehension, speed, or fluency, for example? These are areas in which new research is definitely warranted!
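The observation that a few items recur across lists can be checked mechanically. The sketch below compares only the two lists that use fixed word strings, Table 7.2 (Simpson-Vlach and Ellis, 2010) and Table 7.4 (Byrd and Coxhead, 2010). The variable names are hypothetical, the items are transcribed from the tables and lowercased, and exact string matching is an assumption that ignores near matches such as on the basis of and on the basis of the.

```python
# Items transcribed from Table 7.2 (Simpson-Vlach and Ellis, 2010).
afl_written = {
    "on the other hand", "due to the fact that", "on the other hand the",
    "it should be noted", "it is not possible to", "a wide range of",
    "there are a number of", "in such a way that", "take into account the",
    "as can be seen", "it is clear that", "take into account",
    "can be used to", "in this paper we", "are likely to",
    "in the next section", "a large number of", "the united kingdom",
    "on the basis of the", "that there is no", "over a period of",
    "as a result of the", "can be seen in", "a wide range",
    "there are a number", "it is interesting to", "it is impossible to",
    "it is obvious that", "it is possible to", "it is not possible",
}

# Items transcribed from Table 7.4 (Byrd and Coxhead, 2010).
byrd_coxhead = {
    "on the basis of", "on the other hand", "as a result of",
    "the end of the", "at the end of", "at the same time",
    "the nature of the", "in the form of", "in terms of the",
    "in the absence of", "at the time of", "as well as the",
    "it is clear that", "in the united states", "that there is a",
    "the way in which", "is likely to be", "it is possible to",
    "it is important to", "as part of the", "in the same way",
    "that there is no", "it is difficult to", "the case of the",
    "it is necessary to", "a result of the", "a wide range of",
    "the relationship between the", "the rest of the",
    "the development of the",
}

# Exact-match overlap between the two top-thirty lists.
shared = sorted(afl_written & byrd_coxhead)
print(shared)
```

Only five of the thirty items on each list match exactly, which is consistent with the point that differing criteria and corpora produce quite different lists.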

In summary …

This brief overview of research into formulaic language and written language reveals some important information about the utility of formulaic language. Formulaic language can be said to be a marker of proficient writing, in academic contexts in particular, and it comprises a significant proportion of the words in such writing. It can be said to play important roles in the production (and, by implication, the comprehension) of competent writing, and is a key element in the communication of ideas in written channels. The research reviewed here, and the lists of formulaic sequences generated by the research covered in more detail, leave us feeling as if we are still only scratching the surface of what formulaic language does in written language. It would be a great step forward for researchers to focus on nonacademic registers of written language. It would also be valuable to investigate the psycholinguistic processes engaged by the use of formulaic language in writing: does use of formulaic language allow for faster or more efficient writing, for example? And finally, it is time for research to look at the roles of formulaic language in reading: does formulaic language allow for more fluent, faster, more effective reading?

A few of the many themes and patterns which the research shows are:

• Formulaic language is important in written language.
• Formulaic language comprises a large proportion of written language.
• Formulaic language may be a key element of competence in academic writing.
• Lists of frequent formulaic sequences in academic writing have been compiled by researchers from various perspectives and using various types of corpora.
• Formulaic language may be integral to competent reading abilities, although much more research activity is needed in this area.

For sure, there are a great number of questions which have not been answered yet in any particular area. How much of written language overall is formulaic? Have we really gotten to a point where we can confidently claim that a given word sequence is formulaic (see Chapter 2)? Is the research on lexical bundles (see Chapter 8) distinct from the research discussed in this chapter? What is the use of the distinction between a lexical bundle and a formulaic sequence in written language? Given the limited amount of research on formulaic language and reading, how confident can we be in claiming that formulaic language is important in written language abilities as a whole? Does the question of whether a formulaic sequence is stored and retrieved as a whole have relevance for written language processing?

POINTS TO PONDER AND THINGS TO DO

1 Reviewing the information covered in this chapter, summarize the importance of formulaic language in academic written language. Are there any aspects of written language which appear to be missing from this overview?

2 How is the value of formulaic language in written language different from and similar to that in spoken language (see Chapter 6)?

3 In the last part of this chapter, we see lists of formulaic sequences taken from corpus research. What similarities and differences do you see in these lists?

4 In the lists presented in the latter part of this chapter, do you see any items which seem surprising or unexpected? If so, why?

5 The preoccupation in much of the research presented here seems to be the need for use of certain types of formulaic sequences to fit with the expectations of particular genres. Can we safely assume that there is a similar effect of formulaic language on reading in these genres?

6 Which pieces of research discussed here seem relevant to first language writing, and which to second language writing?

7 Review in depth one of the four studies (Byrd & Coxhead, 2010; Liu, 2012; Simpson-Vlach and Ellis, 2010; Wood and Appel, 2014). Can you outline a study of your own which deepens or focuses our understanding of formulaic language and academic writing?

8 Try conducting a corpus-based study of formulaic language in a nonacademic genre. Compile a list of the most frequent formulaic sequences you find. How does this list differ from those presented in this chapter?

9 Elaborate a second language writing lesson designed to teach the use of some of the formulaic sequences presented in the lists in this chapter, or from your own findings in question #8 above.

10 What is the relevance of the content of this chapter to those working in language assessment? In curriculum design? In editing and publishing?

8 Lexical Bundles—Corpora, Frequency, and Functions

One of the most productive trends in formulaic language research in the past several decades has been in the area of lexical bundles and related corpus-based research into multiword sequences. The majority of this research has centered around particular genres or academic registers and, as a result, we know a great deal more now about academic discourse in particular. Lexical bundles appear to be key to the creation of discipline-specific discourse, as they carry particular types of functions which help to signal referential aspects of content, convey the stance of the writer/speaker, and stitch together ideas in characteristic ways.

What makes a lexical bundle

Unlike other categories of formulaic language, it is not generally claimed that lexical bundles and other corpus-derived sequences are mentally stored and retrieved as wholes. Instead, it is claimed that the bundles are the most frequent multiword sequences in a corpus of texts, and that they are essentially extended collocations with particular functions in discourse. There are three main components of lexical bundle research:

1 They are identified by means of corpus analysis tools such as WordSmith Tools (Scott, 2007), using the criterion of frequency first and foremost. In other words, researchers assemble a corpus of texts from a particular register and scan it with analysis software to create a list of the most frequent multiword sequences by setting a frequency cutoff. This procedure will yield a long list of sequences, but a further criterion is employed to ensure that the resulting list is more representative of the corpus register as a whole.

2 Researchers set a distribution or range criterion in addition to frequency; for a sequence to be deemed a lexical bundle, it must appear in, for example, at least five of the texts in the corpus, or in a certain percentage of the texts therein. This helps to eliminate the possibility that certain sequences might be used more by a particular author, or when a particular topic is under consideration.

3 A third characteristic of the lexical bundle research is the practice of assigning functions to the sequences identified by employing frequency and range criteria. Researchers take the lists generated by the analysis software and classify all the sequences according to functions in a range of categories. The result is a list of lexical bundles which represents the most frequently used word sequences in the register, with attached functions. These lists tell us a great deal about the bundles which are distinct to a register, and how they link to the expression of particular discourse functions.

In sum, then, lexical bundles have typically been defined as combinations of three or more words which are identified in a corpus of natural language by means of corpus analysis software programs, identified using a specific frequency cutoff, and present in a particular range of texts within the corpus. Biber and Conrad (1999) noted that these word combinations “are so common, it might be assumed that lexical bundles are simple expressions, and that they will be acquired easily” (p. 188). The origins of lexical bundle research date back to Altenberg (1993, 1998), who was one of the first to use frequency as a prime criterion in identifying particular word combinations, and the first to employ functional analysis to categorize them.

The actual frequency cutoffs used in lexical bundle research vary rather widely, from a threshold of ten occurrences per million words (e.g., Biber et al., 1999; Biber, 2006) to forty occurrences per million words (e.g., Biber, Conrad, & Cortes, 2004). The smaller corpora tend to use lower cutoffs, but there is little consistency in the literature about how to devise a particular frequency cutoff threshold for a corpus of any given size. The determination of frequency cutoffs appears to be generally based on the levels used by earlier researchers rather than strictly grounded in any empirical, statistical set of standards. The range and length criteria are vital aspects of lexical bundle research.
The general consensus in the literature is that three to five texts (e.g., Biber & Barbieri, 2007) is the minimum number in which a sequence must appear before it can be labeled a lexical bundle. Other researchers (e.g., Hyland, 2008) have employed a percentage criterion, requiring a sequence to appear in at least 10 percent of the texts in a corpus in order for it to be considered a lexical bundle.

The procedures to follow after a corpus analysis tool has yielded a list of four-word sequences which meet frequency and range criteria include elimination of certain types of word sequences. A prime example of a type of word sequence which is excluded from the lists is one which contains a noun or verb phrase that is lexically too rooted in the surrounding text or too specific to the topic. Also, a certain amount of judgment is required to determine whether to accept or reject certain sequences. For example, a classic such as on the other hand is quite salient as a unit unto itself, whereas others such as is one of the are a bit more problematic. Many such sequences will appear in lists if only frequency and range criteria are applied, which has led to a certain amount of controversy about how to prune lists to useful items only.

Some researchers have tried to apply a combination of criteria in order to, for example, produce a list of items useful for teaching English for academic purposes (EAP). Simpson-Vlach and Ellis (2010) created the Academic Formulas List (AFL). Using a corpus of 2.1 million words of academic speech and writing, and comparing it to a nonacademic corpus, the AFL attempts to delineate which formulas are truly academic. The corpora were scanned for formulaic language at a frequency cutoff of ten per million, and a range of three out of four academic disciplines was set for the written corpus.
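The frequency and range criteria described in this section can be sketched in a few lines of code. This is a minimal illustration, not a reproduction of what tools such as WordSmith Tools do: the function name, the parameter names, and the naive whitespace tokenization are assumptions, and the demo texts are invented.

```python
from collections import Counter, defaultdict


def extract_bundles(texts, n=4, cutoff_per_million=20, min_texts=3):
    """Return n-word sequences meeting a frequency and a range criterion.

    texts: list of strings, one per text in the corpus.
    cutoff_per_million: minimum normalized frequency (per million words).
    min_texts: minimum number of different texts containing the sequence.
    """
    counts = Counter()
    text_range = defaultdict(set)
    total_words = 0
    for text_id, text in enumerate(texts):
        words = text.lower().split()  # naive tokenization (assumption)
        total_words += len(words)
        for i in range(len(words) - n + 1):
            gram = " ".join(words[i:i + n])
            counts[gram] += 1
            text_range[gram].add(text_id)
    bundles = {}
    if total_words == 0:
        return bundles
    for gram, count in counts.items():
        per_million = count * 1_000_000 / total_words
        spread = len(text_range[gram])
        if per_million >= cutoff_per_million and spread >= min_texts:
            bundles[gram] = (per_million, spread)
    return bundles


# Invented demo texts; with so few words the per-million figure is
# meaningless, so here only the range criterion does any real filtering.
demo_texts = [
    "on the other hand we see",
    "but on the other hand it",
    "on the other hand they",
    "a single text about something else entirely here",
]
demo_bundles = extract_bundles(demo_texts, n=4, min_texts=3)
print(demo_bundles)
```

A percentage-based range criterion, such as the 10 percent requirement used by Hyland (2008), could be substituted by comparing the number of texts containing a sequence against the total number of texts rather than against a fixed count.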
In an effort to address the issue of low psychological salience and pedagogical inutility of the lists of units uncovered in the lexical bundle research, Simpson-Vlach and Ellis (2010) used mutual information (MI) scores as a measure of collocation strength, combined with frequency data and a rating by instructors and testers, to produce a composite score which determined the final lists. MI is a statistical measure of the coherence of a phrase, or of the relative “stickiness” of the words which make up a phrase. It is intended for use with two-word collocations, though, and may not provide particularly valid information about strings of three or more words; it may also ignore the order of the words in a sequence (Hyland, 2012).

Some multiword units may not be contiguous, may have optional fillable slots, or may have extra lexical items outside of the four-word limit used in lexical bundle research. The standard lexical bundle research tools have generally been incapable of identifying such items, but some recent software developments have allowed for their identification. These items are labeled concgrams and include noncontiguous collocations in which a slot may be filled by a range of words, or in which the order of items may be flexible. Research into academic discourse using these types of units of analysis stands to tell us a great deal that is beyond the scope of lexical bundle research. Other research methods employed to identify multiword units beyond the four-word items common to lexical bundle research include the modified procedures developed by Wood and Appel (2014), discussed in detail below.
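To make the idea of MI as a measure of "stickiness" more concrete, the following sketch computes a pointwise MI score by comparing the observed frequency of a phrase with the frequency expected if its words occurred independently. The extension of the score beyond two words shown here is one common convention and is an assumption, as are the function name and the counts; the text does not state how Simpson-Vlach and Ellis (2010) computed MI for longer sequences.

```python
import math


def mutual_information(phrase_freq, word_freqs, corpus_size):
    """Pointwise MI: log2 of observed over expected phrase frequency.

    phrase_freq: raw count of the whole phrase in the corpus.
    word_freqs: raw counts of each word in the phrase, in any order.
    corpus_size: total number of words in the corpus.
    """
    # Expected count of the phrase if its words were independent.
    expected = corpus_size
    for freq in word_freqs:
        expected *= freq / corpus_size
    return math.log2(phrase_freq / expected)


# Hypothetical counts: a two-word phrase seen 100 times in a corpus of
# one million words, where each of its words occurs 1,000 times.
mi_score = mutual_information(100, [1_000, 1_000], 1_000_000)
print(round(mi_score, 2))
```

A score near zero means the words co-occur about as often as chance predicts, and higher scores indicate stronger association. Because the word counts are simply multiplied together, the expected frequency is the same whatever order the words are listed in, which is related to the limitation noted above for longer strings.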

Acquisition of lexical bundles

Given their importance in discourse, it may seem surprising that acquiring and using lexical bundles and other such multiword sequences is challenging and by no means a natural process. In fact, there have been examinations of whether it is possible to acquire lexical bundles commonly used in academic writing in the natural course of learning. This research has largely focused on native speaker university students, not second language EAP learners. For example, in a particularly noteworthy corpus-based study, Cortes (2004) examined the use of formulaic sequences in published and student writing in history and biology. Results clearly showed that student writers rarely used the formulaic sequences which were frequent in published writing, and that when students did use the most frequent sequences, it was for different pragmatic purposes. Subsequent work by Cortes (2007) went on to investigate whether teaching lexical bundles to student writers would help them to use bundles more effectively in their writing. In this case, results showed that students hardly benefited at all from brief direct lessons in the use of bundles. Similarly, Levy (2008), in a comparison of student and professional writers’ use of lexical bundles, discovered that professional writers and the most proficient students used bundles more to structure discourse than did less proficient students. If these regular university students rarely used the target lexical bundles in their writing, and struggled to learn to use them, we can surely expect that EAP learners will have even less facility with them. This assumption is borne out in a noted study by Chen and Baker (2010), who compared L1 and L2 academic student writing and found that L2 learners used fewer lexical bundles in their writing and that they had particularly limited ability to use bundles for certain discourse functions such as hedging.

Structural characteristics of lexical bundles

Lexical bundles have been classified as to their structural features, most notably by Biber et al. (2004, p. 381).

Lexical bundles that incorporate verb phrase fragments

a (Connector+) 1st/2nd person pronoun + VP fragment
Example bundles: you don’t have to, I’m not going to, well I don’t know

b (Connector+) 3rd person pronoun + VP fragment
Example bundles: it’s going to be, that’s one of the, and this is a

c Discourse marker + VP fragment
Example bundles: I mean you know, you know it was, I mean I don’t

d Verb phrase (with nonpassive verbs)
Example bundles: is going to be, is one of the, have a lot of, take a look at

e Verb phrase with passive verb
Example bundles: is based on the, can be used to, shown in figure n

f Yes–no question fragment
Example bundles: are you going to, do you want to, does that make sense

g WH-question fragments
Example bundles: what do you think, how many of you, what does that mean

Lexical bundles that incorporate dependent clause fragments

a 1st/2nd person pronoun + dependent clause fragments
Example bundles: I want you to, I don’t know if, I don’t know why, you might want to

b WH-clause fragments
Example bundles: what I want to, what’s going to happen, when we get to

c If-clause fragments
Example bundles: if you want to, if you have a, if we look at

d (verb/adjective+) to-clause fragment
Example bundles: to be able to, to come up with, want to do is

e That-clause fragments
Example bundles: that there is a, that I want to, that this is a

Lexical bundles that incorporate noun phrase and prepositional phrase fragments

a (connector+) noun phrase with of-phrase fragment
Example bundles: one of the things, the end of the, a little bit of

b Noun phrase with other post-modifier fragment
Example bundles: a little bit about, those of you who, the way in which

c Other noun phrase expressions
Example bundles: a little bit more, or something like that, and stuff like that

d Prepositional phrase expressions
Example bundles: of the things that, at the end of the, at the same time

e Comparative expressions
Example bundles: as far as the, greater than or equal, as well as the

Functional categories of lexical bundles

The functional classification of lexical bundles has been established quite firmly in the literature. Biber et al. (2004, pp. 389–396) identify categories of functions of lexical bundles, categories which have been used by a number of other researchers as well, resulting in a sort of standard type of functional classification:

Stance bundles

Stance bundles are a means of providing a frame for interpretation of the subsequent proposition. They may fall into one of two categories, epistemic or attitudinal/modality. Epistemic stance bundles convey a sense of the knowledge status of the information which immediately follows, whereas attitudinal/modality stance bundles convey the speaker or writer’s attitudes toward the content of what immediately follows. Stance bundles may also be categorized as personal or impersonal depending on whether or not the judgments and attitudes can be attributed directly to the speaker/writer.

Most epistemic bundles are personal. These may express uncertainty, as in I don’t know what, I don’t think. They may convey a sense of imprecision, as in and stuff like that or what do you think. Impersonal epistemic bundles, on the other hand, tend to express degrees of certainty, as in are more likely to be or in the fact that.

Attitudinal/modality stance bundles are most often personal and may be divided into several subcategories. Desire bundles frame wishes or desires, be they those of the speaker/writer or those of an interlocutor or reader. Examples include I don’t want to, what I want to do, and I would like to. Obligation and directive bundles usually contain a second-person pronoun and express obligations and directions, such as you need to know, you have to do, I want you to, you might want to, and take a look at. Intention/prediction bundles are usually personal and indicate the speaker/writer’s intention to do something. Examples include what we’re going to and the impersonal bundle is going to, indicating that some external dynamic is headed in a certain direction. Ability bundles are a small subset which express the ability to do or accomplish something, as in to come up with or to be able to.

Discourse organizing bundles

These bundles help to organize the flow of ideas in discourse. They fall into two categories: topic introduction/focus and topic elaboration/clarification. The former provide signals that a new topic is being introduced, while the latter are used to provide detail and clarification of points. Topic introduction/focus bundles may overlap with stance bundles to some extent but serve different functions. Examples include I want to talk about, going to talk about, and if you look at. Elaboration/clarification bundles may include it has to do with, on the other hand, or as well as the.

Referential bundles

Referential bundles represent a large number and range of lexical bundles. They usually identify an entity or point out some specific quality of an entity. Four main subcategories exist: identification/focus, imprecision indicators, specification of attributes, and time/place/text-deixis reference. The four subcategories all serve different purposes. Identification/focus bundles generally identify the subsequent proposition as important and noteworthy, and are often used in classroom teaching. Examples include that’s one of the, and this is a, and one of the things. Imprecision indicators, as the name implies, usually signal that a reference is not exact, or that there are other similar references which could be provided. Examples include or something like that and and things like that. Specification of attributes bundles identify specific qualities of the noun which follows. Some deal with quantity or amount, as in have a lot of, and a little bit about; some describe size or form of the noun, as in the size of the or in the form of; some identify abstract qualities, as in on the basis of, the nature of the, and in terms of. Time/place/text-deixis bundles refer to particular times, places, or locations within the surrounding text. Examples include in the United States (or other geopolitical entity), the end of the year (or other time period), and the end of the X (which could be a place or the end of the present chapter, volume, etc.).

A similarly workable and user-friendly taxonomy was elaborated by Hyland (2007), which reflects the three major metafunctions of language: ideational, textual, and interpersonal. The ideational functions are termed research-oriented by Hyland. Research-oriented bundles help structure experience and activity of the real world:

• Location: at the same time, at the beginning of
• Procedure: the use of the, the purpose of the
• Quantification: a wide range of, one of the most
• Description: the structure of the, the size of the
• Topic: in the United States, the currency board system

The textual functions are labeled by Hyland as text-oriented. Text-oriented bundles deal with meaning of text and its organization:

• Transition signals: on the other hand, in addition to the
• Resultative signals: as a result of, it was found that
• Structuring signals: in the present study, in the next section
• Framing signals: in the case of, on the basis of

The interpersonal functions are labeled participant-oriented. Participant-oriented bundles are focused on the writer or the reader:

• Stance features: may be due to, it is possible that
• Engagement features: as can be seen

Non-lexical bundle corpus-based research

Corpus analysis of academic text has yielded some valuable information from perspectives other than that of lexical bundles. In response to some concerns about the sequences identified by lexical bundle research, work on formulaic sequences in academic prose was conducted by Simpson-Vlach and Ellis (2010), who perceived that lexical bundle research often results in lists of units which appear semantically and structurally incomplete, such as to do with the, or I think it was. A prime concern with items such as these is that they appear “neither terribly functional nor pedagogically compelling” (Simpson-Vlach & Ellis, 2010, p. 493). The research of Simpson-Vlach and Ellis was designed to modify the means of identifying formulaic language in academic language by adding extra procedures to the standard research protocols of frequency, range, and functional classification. A key change in Simpson-Vlach and Ellis’s (2010) research is the addition of Mutual Information (MI) tests (a statistical measure of the likelihood of words collocating) and EAP expert judgment in order to refine the lists identified by means of frequency and range.

Similar to the lexical bundle research, Simpson-Vlach and Ellis’s (2010) resulting lists classify formulaic sequences into three major groups, namely referential expressions, stance expressions, and discourse organizing expressions, under which other subcategories are listed. The most common of the three groups, referential expressions, is divided into five subcategories (Simpson-Vlach & Ellis, 2010). The first is labeled specification of attributes and is itself divided into three types of functions in academic discourse, that is, intangible framing attributes, tangible framing attributes, and quantity specifications (Simpson-Vlach & Ellis, 2010). Simpson-Vlach and Ellis (2010) explain that the vast majority of intangible framing attributes take the structure of “a/the N of” (e.g., the notion of …) and are often seen following a preposition (e.g., on the basis of …). These attributes may frame concrete entities (e.g., based on the total volume …) or abstract concepts (e.g., even with the notion of eminent …). More importantly, they may create or frame an attribute of a following phrase, formulate a whole clause, or connect a verb with a subsequent clause (Simpson-Vlach & Ellis, 2010). Tangible framing attributes serve to identify physical or measurable attributes of a following noun or noun phrase as, for example, “the level of shade …” (Simpson-Vlach & Ellis, 2010). A final type of specification of attributes is quantity specification, including expressions that specify the quantity of the following noun phrase as in “there are three …”; they can also be anaphoric and refer to the preceding noun, as in “the combination of these two …” (Simpson-Vlach & Ellis, 2010).

Identification and focus is the second subcategory, characterized by explicatory phrases such as “as an example” and clause stems like “this would be … ” (Simpson-Vlach & Ellis, 2010). Simpson-Vlach and Ellis (2010) point out that identification and focus attributes are extremely important in academic discourse, given that academic contexts frequently require listing of examples and identification. The third subcategory, contrast and comparison, includes comparative and contrastive bundles such as “as opposed to … ” (Simpson-Vlach & Ellis, 2010). Deictics and locatives are the fourth subcategory, involving expressions such as proper nouns (the United Kingdom) or referring to “physical locations in the environment (e.g., the real world) or to temporal or spatial reference points in the discourse (e.g., a and b, at this point)” (p. 505). The fifth and rather restricted subcategory, vagueness markers, includes basically four phrases (and so on, and so forth, and so on and so, and blah blah blah), three of which are typical of spoken discourse (Simpson-Vlach & Ellis, 2010). The second category of formulaic sequences identified by Simpson-Vlach and Ellis (2010) is stance expressions, divided into six subcategories, namely, hedges, epistemic stance, obligation and directive, ability and possibility, evaluation, and intention/volition (Simpson-Vlach & Ellis, 2010). To start with, hedges comprise all the formulaic sequences that serve hedging functions in discourse such as “there may be …” (Simpson-Vlach & Ellis, 2010). Epistemic stance, a second subcategory, includes formulaic sequences that help present beliefs, claims, thoughts, etc., for example, “… assume that it …” (Simpson-Vlach & Ellis, 2010). 
Obligation and directive formulas, the third subcategory, help in giving directions as in “it should be noted …” Ability and possibility formulas are the fourth subcategory, mainly used in speaking and helping form actions or propositions such as “you can see …” (Simpson-Vlach & Ellis, 2010). Evaluation formulas are the fifth subcategory, which includes sequences such as “it is obvious that …” or “important role in …” (Simpson-Vlach & Ellis, 2010). The final subcategory, commonest in spoken discourse, is intention/volition, including sequences that help convey the speaker’s intentions as in “so let me just …” (Simpson-Vlach & Ellis, 2010). The final category of formulaic sequences is discourse organizing expressions, which is subdivided into four subcategories (Simpson-Vlach & Ellis, 2010). The first subcategory, metadiscourse and textual reference, is genre-specific and includes formulaic sequences which introduce topics, for example, in the next section … (Simpson-Vlach & Ellis, 2010). Topic introduction and focus formulas create the second subcategory and help frame entire subsequent phrases as in “take a look at …” (Simpson-Vlach & Ellis, 2010). Topic elaboration, the third subcategory, is divided into noncausal topic elaboration, where a phrase like “it turns out that …” helps elaborate on a topic without any explicit causal relations whatsoever, and cause and effect formulaic sequences, whose main function is to introduce results, for example, as a result … (Simpson-Vlach & Ellis, 2010). Finally, discourse markers, the fourth subcategory, can connect either sentence constituents or clauses together as in “… as well as” or “in other words …”; they may express agreement, disagreement, gratitude, or surprise such as “no no no” (Simpson-Vlach & Ellis, 2010).
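The Mutual Information test that Simpson-Vlach and Ellis (2010) added to the frequency and range criteria can be illustrated with a small calculation. The sketch below is only an illustration and not their actual procedure. It assumes one common formulation of MI: the base-2 logarithm of the observed probability of a sequence divided by the probability expected if its words occurred independently. The function names `ngrams` and `mutual_information`, the toy corpus, and the whitespace tokenization are all hypothetical choices made for the example.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return every contiguous n-word sequence in a list of tokens."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def mutual_information(tokens, sequence):
    """Score a word sequence by (an assumed formulation of) MI.

    MI = log2(observed probability / expected probability), where the
    expected probability is the product of the individual word
    probabilities. As a simplification, both probabilities are
    normalized by the total number of tokens.
    """
    sequence = tuple(sequence)
    total = len(tokens)
    word_counts = Counter(tokens)
    seq_counts = Counter(ngrams(tokens, len(sequence)))

    observed = seq_counts[sequence] / total
    if observed == 0:
        return float("-inf")  # the sequence never occurs in the corpus

    expected = 1.0
    for word in sequence:
        expected *= word_counts[word] / total
    return math.log2(observed / expected)


# Toy corpus (hypothetical): "other hand" is built from rarer words than
# "on the", so it receives the higher association score.
corpus = "on the other hand the cat sat on the mat".split()
strong = mutual_information(corpus, ["other", "hand"])
weak = mutual_information(corpus, ["on", "the"])
print(strong > weak)
```

A higher score suggests that the words co-occur more often than their individual frequencies would predict, which is why a measure of this kind can help filter out frequent but loosely associated strings.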

Specific lexical bundle studies and findings

Lexical bundles are very frequently used in published academic writing and are also discipline-bound (Cortes, Jones, & Stoller, 2002), which shows that each discipline has different purposes or ways of seeing the world, associated with distinct communicative conventions (Hyland & Hamp-Lyons, 2001). Hyland (2008), in a study of variation in lexical bundles across four disciplines, found that fewer than half of the fifty most frequent bundles in his corpus occurred in all four disciplines, and that there were clear differences in the structure, form, and functions of bundles across the four disciplines.

Academic textbook language and teaching EAP

Biber (2006) conducted a wide-ranging corpus-based analysis of university language, including an examination of lexical bundles in textbooks. He found that academic disciplines differed in their use of lexical bundles, with natural and social sciences relying on them more than the humanities. Overall, the distribution of lexical bundles across functional categories in Biber’s study shows that referential bundles, which make direct reference to real or abstract entities or to textual content or their attributes, are the most common. Stance bundles, which express attitudes or assessments of certainty, are the second most common type of function for lexical bundles in textbooks, whereas discourse organizers, which reflect relationships between previous and subsequent discourse, are the least common. Within the category of referential functions, it appears that quantity and intangible framing subfunctions represent the largest categories.

An important study of lexical bundles in language teaching materials compared to discipline-specific texts is that of Chen (2010), in which bundles in an electrical engineering textbook corpus were compared to those used in ESP materials geared toward electrical engineering. Using Biber’s (2006) taxonomy of categories of functions of bundles, Chen found that the ESP materials neglected to present many of the functional categories found in the engineering corpus. This implies that the ESP materials actually misrepresented the language used in the discipline and failed to adequately prepare students for the discourse of the field.

The actual nature of the discourse presented to EAP learners in EAP textbooks was investigated by Wood (2010) in a small-scale corpus study of EAP textbooks. Since range criteria were not applied in this study, the units of analysis were termed lexical clusters rather than lexical bundles. Analysis of a 539,210-word corpus indicated that the highest frequency lexical clusters appeared in the instructional materials in the textbooks and not in the reading texts. The majority of the clusters had referential functions related to location and tangible framing and showed a rather different pattern of function types from those which Biber (2006) found in university textbooks. In addition, the textbook activities paid very little attention to lexical clusters or formulaic language of any type. Compared to the textual subcorpus, the instructional subcorpus contained both the largest number of lexical clusters and the highest frequency lexical clusters. The clusters present in the instructional subcorpus were of a different functional type from those in the texts. Many were referential, mostly dealing with description and location, but a significant proportion were stance-oriented or dealt with engaging the reader. The text subcorpus clusters were mostly referential, dealing with location and tangible framing.

Length of sequences to study: three, four, or five words?

As for the length of formulaic sequences under investigation in corpus-based research, a general consensus has emerged that four words is the most productive unit to study. Three-word sequences are quite common in discourse and tend to be difficult to categorize as bundles, especially in terms of functions. Meanwhile, five- or six-word sequences are quite rare and often contain shorter sequences within them. Four-word sequences have become the standard for such research: they are much more frequent than five-word sequences and show a broad range of functions and other characteristics. Some researchers such as Wood and Appel (2014) have attempted to address issues of three- vs. four-word units by truncating four-word sequences down into three words if they appeared to have extraneous words. Wood and Appel focused on multiword constructions (MWCs), a particular type of formulaic sequence similar to lexical bundles in terms of the frequency and range criteria, but identified using a judgment criterion of whether or not the item is of pedagogical value in and of itself. Wood and Appel (2014) began their analysis of first-year university textbooks in engineering and business by looking for four-word sequences.

Noticing that there appeared to be significant overlap among identified sequences in their corpus (e.g., at the end of and end of the day), and hoping to find a way to identify formulaic sequences of differing lengths, they took extra steps to modify their initial list of sequences and arrive at a better list of the formulaic language present in the corpus. Taking the lead of Simpson-Vlach and Ellis (2010) and Liu (2012), Wood and Appel altered some procedures to construct a list of more pedagogically relevant sequences, a list which could be of actual use for teachers and learners of EAP. With this in mind, the original list of four-word structures was scrutinized in an effort to see which of the four-word sequences could potentially be better understood as shorter MWCs with variable slots at the periphery (e.g., as a result [the/of]).

Two separate methods of analysis were used to refine the original list of four-word structures. The first method was to divide each four-word structure into its two constituent three-word sequences (i.e., words 1–3 vs. words 2–4) and then record the frequency of each of these three-word structures. For example, the four-word sequence as long as the divides into two three-word MWCs, “as long as” and “long as the.” “As long as” occurs with much greater frequency than “long as the” (95 occurrences vs. 47 occurrences). These frequency differences were interpreted to mean that the four-word sequence was likely a shorter three-word MWC that often occurred with a fourth word (the) at the end. By employing these means of analysis, Wood and Appel could identify the three-word MWCs contained within the previously identified four-word structures and reduce the number of MWCs ending in a/the. Whenever one of the two three-word structures within a four-word sequence was at least twice as frequent as the other, the higher frequency three-word sequence was identified as the “root” or base structure, and the remaining word was simply treated as a commonly occurring word appearing at the periphery of these constructions.

The second procedure used to identify three-word MWCs “hiding” within four-word sequences involved checking for overlap with other four-word structures in the original list. Wood and Appel (2014) noted that previous corpus-driven research focusing on four-word units (e.g., Biber & Barbieri, 2007; Chen, 2010; Chen & Baker, 2010; Cortes, 2004) often produced lists of sequences with significant overlap (e.g., at the end of and the end of the). Wood and Appel (2014) decided that, in their corpus, if four-word sequences contained identical three-word structures, the three-word string common to both sequences would be classified as the “root” structure. For example, at the end of and the end of the were combined to create the condensed listing (at) the end of (the). This makes it possible to see both the base structure of an MWC and the most likely words to occur at its periphery.
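The two refinement procedures described above can be sketched in a few lines of Python. This is a hedged illustration rather than Wood and Appel's actual implementation: the function names `root_by_frequency` and `merge_overlapping`, the dictionary of three-word frequencies, and the use of parentheses to mark peripheral words are assumptions made for the example. Only the 95 vs. 47 counts, the "at least twice as frequent" rule, and the two example sequences come from the text.

```python
def root_by_frequency(four_gram, trigram_freq, ratio=2.0):
    """Procedure 1: compare the two three-word sequences inside a four-word one.

    If one three-word sequence is at least `ratio` times as frequent as the
    other, treat it as the root and mark the leftover word as peripheral.
    Otherwise return the four-word sequence unchanged.
    """
    words = four_gram.split()
    if len(words) != 4:
        raise ValueError("expected a four-word sequence")
    left = " ".join(words[:3])    # words 1-3
    right = " ".join(words[1:])   # words 2-4
    f_left = trigram_freq.get(left, 0)
    f_right = trigram_freq.get(right, 0)
    if f_left > 0 and f_left >= ratio * f_right:
        return f"{left} ({words[3]})"
    if f_right > 0 and f_right >= ratio * f_left:
        return f"({words[0]}) {right}"
    return four_gram


def merge_overlapping(first, second):
    """Procedure 2: condense two four-word sequences sharing a three-word core.

    Returns the condensed listing, or None when the last three words of
    `first` are not the first three words of `second`.
    """
    a, b = first.split(), second.split()
    if len(a) == 4 and len(b) == 4 and a[1:] == b[:3]:
        core = " ".join(a[1:])
        return f"({a[0]}) {core} ({b[3]})"
    return None


# Frequencies for the worked example reported in the text.
freqs = {"as long as": 95, "long as the": 47}
print(root_by_frequency("as long as the", freqs))
print(merge_overlapping("at the end of", "the end of the"))
```

With the counts given in the text, 95 is just over twice 47, so the first call treats as long as as the root; the second call reproduces the condensed listing (at) the end of (the).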

Benefits of use of lexical bundles

Researchers have pointed out benefits of using lexical bundles similar to the benefits of formulaic language for L2 speakers laid out by Boers et al. (2006) and discussed elsewhere in this volume. Coxhead and Byrd (2007) note that lexical bundles have three main beneficial effects in academic writing:

1 They offer ready-made sets of words to use as a partial foundation for crafting academic prose.
2 They facilitate and represent fluent language use and signal that a writer is a “member” of a discourse community.
3 They represent register-specific ways of expressing particular meanings.

The use of bundles also helps guide readers through text, by signaling linkage of ideas, the writer’s stance, or the attitudes implicit in prose. The special value of lexical bundles in academic language is clearly identified in research. Biber et al. (1999), Hyland (2008), and Simpson-Vlach and Ellis (2010) all show that high-frequency lexical bundles in academic language are low-frequency items in nonacademic genres. Biber et al. (1999) also reveal that some 70 percent of all academic lexical bundles have the structure preposition + noun phrase, noun phrase + of, or anticipatory it fragment. In contrast, these structures rarely occur in conversation. Lexical bundles are extremely frequent in academic prose. Biber et al. (1999) indicated that four-word bundles appear over 5,000 times per million words in academic writing. Table 8.1 shows the occurrence of Hyland’s (2008) four-word bundles in a 3.5-million-word corpus of academic language.

Table 8.1 Occurrence of bundles by discipline from Hyland (2012, p. 162)

Discipline               Different bundles   Total cases   % of total words in bundles
Electrical engineering   213                 4562          3.5
Business studies         144                 3728          2.2
Applied linguistics      141                 4631          1.9
Biology                  131                 2909          1.7

Studies of lexical bundles in spoken language are relatively uncommon. The works of Biber (2006), Biber and Barbieri (2007), Biber et al. (2004), and Cortes and Csomay (2007) are among those which have looked at spoken language. The contexts from which the spoken corpora were extracted include instructional contexts, that is, classroom talk and study groups, and noninstructional contexts, including office hours, student advising, and service encounters. Results have tended to show that classroom spoken language uses a broader range of bundles than does academic writing, and that noninstructional spoken language uses an even more diverse array of bundles. A large proportion of this body of bundles consists of stance bundles, particularly used at the start and middle of lectures (Cortes & Csomay, 2007), when professors are dealing with classroom management topics or attempting to elicit student engagement. Simpson (2004) noted, based on analysis of the Michigan Corpus of Academic Spoken English, that a large number of discourse organizing bundles, which focus and organize information, are used in academic speech.

Different academic genres display different lexical bundles. Biber (2006) illustrates this by pointing out that lectures tend to be a sort of blend of both oral and written language, and so they use twice as many bundles as conversation and four times as many as textbooks. In addition, the bundles in lectures tend heavily toward discourse and stance functions. In written genres this type of variation can be seen too. Chen and Baker (2010) note a large difference between student academic writing and the types of prose found in published academic work. The student writing contained many more discourse organizing bundles, whereas the published work showed a high proportion of referential bundles. Cortes (2004) also found differences in the types of bundles used by students and published writers.

Disciplinary variation is another key determinant of the nature of lexical bundles. Hyland (2008) conducted a study comparing the use of bundles in electrical engineering, business studies, applied linguistics, and biology. He noted that the broadest range of bundles was found in electrical engineering texts, and that the bundles used there were often not found in the writing in other disciplines. He also noted that the functions and forms of these bundles were distinct, in that many contained an of-fragment or began with a prepositional phrase. Finally, the scientific texts contained a relatively high proportion of referential bundles, whereas the social science texts contained more discourse and stance bundles.

In summary …

From this short overview of the research into lexical bundles and academic discourse, some patterns and themes emerge. One important element is the notion of a lexical bundle being strictly defined within three parameters: frequency, range, and function. Researchers who study multiword sequences taken from corpora and who use additions or modifications to these parameters are, technically speaking, not investigating lexical bundles but some other phenomenon. This accounts for the use of other terminology by, for example, Simpson-Vlach and Ellis (2010), who call their units of analysis “formulaic language,” Wood and Appel (2014), who employ the term “multiword constructions,” and Liu (2012), who also uses the term “multiword constructions.” Regardless of the terminology, the fact is that the advent of corpus analysis tools and the study of lexical bundles have allowed us to uncover previously hidden elements of language. The nature of lexical bundles and their structural characteristics make them less perceptually salient than some other types of formulaic sequences. Also, and most importantly, lexical bundles open a gateway into the engine of academic discourse, providing us with an observable and tangible element of language which is key to the construction of discourse. A few of the many themes and patterns which the research shows are:

• Lexical bundles are an important type of formulaic sequence.
• Lexical bundles are discoverable by means of corpus analysis software from a corpus using frequency and range criteria.
• Lexical bundles are not meaning units, but rather, they serve specific functions in discourse.
• In academic discourse lexical bundles and their use differ from discipline to discipline, and there is still a wide range of questions about how adult language learners perceive and acquire formulaic language.
• Lexical bundles may be key to helping tertiary education students to achieve proficiency in academic writing.
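The frequency and range criteria mentioned in the list above can be made concrete with a short sketch. It is a simplified illustration, not the behavior of any particular corpus analysis package: the function name `extract_bundles`, the lowercase whitespace tokenization, and the default thresholds (20 occurrences per million words, five texts) are placeholder assumptions rather than values taken from this chapter. Function, the third defining parameter, still requires human classification and is not modeled here.

```python
from collections import Counter, defaultdict


def extract_bundles(texts, n=4, min_per_million=20.0, min_texts=5):
    """Return n-word sequences that meet a frequency and a range threshold.

    texts           -- list of strings, one per corpus text
    min_per_million -- frequency cutoff, normalized per million words
    min_texts       -- range cutoff: number of different texts required
    """
    counts = Counter()              # raw frequency of each sequence
    text_range = defaultdict(set)   # which texts each sequence occurs in
    total_words = 0

    for text_id, text in enumerate(texts):
        tokens = text.lower().split()  # naive tokenization (assumption)
        total_words += len(tokens)
        for i in range(len(tokens) - n + 1):
            gram = " ".join(tokens[i:i + n])
            counts[gram] += 1
            text_range[gram].add(text_id)

    if total_words == 0:
        return {}

    bundles = {}
    for gram, freq in counts.items():
        per_million = freq * 1_000_000 / total_words
        spread = len(text_range[gram])
        if per_million >= min_per_million and spread >= min_texts:
            bundles[gram] = (freq, per_million, spread)
    return bundles


# Toy corpus (hypothetical); the range cutoff is lowered to three texts
# because the corpus contains only four.
toy_corpus = [
    "on the other hand we see",
    "on the other hand it is",
    "this is on the other hand fine",
    "nothing here at all now",
]
print(sorted(extract_bundles(toy_corpus, min_texts=3)))
```

Normalizing to occurrences per million words allows corpora of different sizes to be compared, and the range requirement guards against sequences that are frequent only because a single text or author repeats them.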

Certainly, there are a number of issues yet to be addressed fully in the research. For example, how can we be sure that knowledge and awareness of lexical bundles will help students to improve their writing abilities? Similarly, how can we be sure that lexical bundles really improve academic writing? How can we teach the bundles effectively? What about lexical bundles in spoken language and their contribution to spoken discourse? How do lexical bundles work in nonacademic discourse? Clearly, this is fertile ground for future researchers!

POINTS TO PONDER AND THINGS TO DO

1 In your own academic writing instruction, have you been made aware of discourse devices? Have any of them been multiword units? Can any of them be characterized as lexical bundles?
2 Take an academic text and see if you can identify lexical bundles by means of your own intuition. Compare your decisions with a partner or two. Are your decisions similar? What are the drawbacks of this type of identification technique, compared to use of corpus analysis software?
3 Consider the primary importance of frequency in lexical bundle research. Can you think of any drawbacks to reliance on frequency to identify formulaic sequences in text? What do we miss by depending heavily on frequency?
4 The research is quite firm on using three or more words as a length cutoff for lexical bundles. Why would two-word units not be included? How about longer units of five or more words?
5 Does the research on lexical bundles have any implications for teaching reading and writing in kindergarten and elementary schools?
6 Think of a specific group of university students in a particular discipline. How would you approach selecting appropriate lexical bundles to teach them? Bear in mind that the bundles are not meaning units, but functional units.
7 For the group of students in #6 above, think of some ways to teach them the appropriate lexical bundles.
8 Are there specific ways second language learners/EAP students need to acquire lexical bundles compared to regular university students? Why or why not?
9 Outline a research project on lexical bundles in a register different from academic discourse. How do you imagine the resulting lists of bundles might differ from the ones from the academic discourse research?
10 Does the study of lexical bundles have implications for the work of language assessors? Editors?


9 Formulaic Language and Language Teaching—Research and Practice

Despite the wide range of research that has been conducted on the nature of formulaic language and its use and acquisition, surprisingly little effort has been put into investigating how to teach it, particularly to second language learners. This is all the more perplexing when you consider that evidence shows that only the most advanced of learners approach anything resembling a nativelike facility with formulaic language (e.g., Forsberg, 2010; Laufer & Waldman, 2011). Development in this area of language is generally quite slow, with learners persisting in, for example, showing limited ability to intuit usage norms, and utilizing first language-based sequences which end up inappropriate. A review of the literature on pedagogical interventions with a focus on formulaic language follows here, and the chapter ends with an overview of what seems to have worked. With this, we can begin to imagine a way in which formulaic sequences may actually be taught, or their acquisition fostered, so as to help avoid the problems of slow acquisition and faulty intuitions and usage even at advanced levels.

What might be the actual benefits of acquisition of a range of formulaic sequences with good mastery or depth of knowledge or degree of proceduralization? If we think of formulaic language as a type of vocabulary, we can readily get to an answer. Formulaic language works in much the same way as vocabulary does: many sequences have similar meanings to content words, for example, collocations and complex verbs (Boers & Lindstromberg, 2012, p. 84). Others such as exclamations and idioms carry discourse functions. Pragmatic formulas have social interaction functions, and many serve to organize discourse in the same way as function words. The size of one’s vocabulary is linked to language proficiency (Staehr, 2009), and so is knowledge of multiword sequences. Keshavarz and Samili (2007), Hsu and Chiu (2008), Boers, Eyckmans, Kappel, Stengers, and Demecheleer (2006), and Dai and Ding (2010) report that increased use of formulaic sequences in speech or in writing is linked to higher assessments on speech and writing tasks. There are some excellent corpus-based resources available for help with vocabulary learning, a prime example being Tom Cobb’s Lextutor, a compilation of links to major corpora and activity ideas for teachers and learners.

Learners struggle with formulaic sequences, however, often failing to interpret their meanings accurately. Boers, Eyckmans, and Stengers (2007) found that learners tended to misinterpret figurative idioms even when given ample contextual clues. Martinez and Murphy (2011) found that learners often attribute false meanings to formulaic sequences because they focus on the meanings of the individual words which make up the sequence; they give the example of it’s about time, in which learners mistakenly tend to see about as a topic marker. It is equally unfortunate to note that learners also fail to process formulaic language in the same way as native speakers do, thus losing the valuable processing advantage which it provides. As seen in Chapter 4 of this volume, sequences deeply entrenched in memory provide ready-made stretches of language and serve as a valuable processing shortcut. They allow us to bypass the limitations of short-term memory and to free up attentional energy for conceptualizing and formulating other aspects of the discourse (see Wood, 2010a, for an overview). Learners may recognize sequences as formulaic, but they may not process them as such.
Evidence from eye-tracking and reaction-time studies (Columbus, 2010; Conklin & Schmitt, 2008; Jiang & Nekrasova, 2007) shows that learners process formulaic language faster than nonformulaic language, just as native speakers do, but that their processing is, all the same, slower than that of native speakers. Interestingly, evidence exists (Ellis et al., 2008) that native speaker facility in processing formulaic sequences is positively correlated with the MI scores of the sequences, MI being a statistical measure of the strength of association between collocating words. Part of the reason for this may be, as Ellis et al. (2008) note, that native speakers have had vast opportunity to encounter even low-frequency sequences, while learners obviously have not. Also, the figurative meanings of many formulaic sequences may tend to jam the system for learners: native speakers will link the entire word string as a chunk to its figurative meaning, whereas learners will often have to juggle literal and figurative options for a given sequence.

So far, it is quite clear that mastery of formulaic language has many benefits and poses many challenges for L2 learners. So what are some ways in which language educators can facilitate this mastery? What are the possible ways in which one might begin to actually teach or facilitate the acquisition of formulaic language? An overview of some intervention studies in which researchers implemented teaching strategies is a place to start.

Pedagogical intervention studies

Some types of teaching techniques for formulaic language involve chunking of text. In these activities, learners are required to highlight sequences in texts which they think are formulaic, and their decisions are compared to nativelike decisions or verified by means of online sources. Boers et al. (2006) conducted a classic study in which this procedure was key. Compared to a control group, the learners who had experienced chunking activities appeared to use more formulaic language in later retells of new reading texts, although closer examination showed that this was in part due to their having repeated some word strings from the texts verbatim. Another study by Stengers, Boers, Housen, and Eyckmans (2010) repeated the basic procedures and used a final input text in the learners’ first language to try to avoid the verbatim reproductions found in the earlier study. Unfortunately, this time no significant difference was found from the control group in use of formulaic language in retells. Why would this be the case? Perhaps raising awareness of sequences several times is simply not enough to foster deep processing of them and to embed them in memory. Too much time may elapse between encounters with a given sequence, so that no memory of the previous encounter is left. Eyckmans, Boers, and Stengers (2007) tested whether text chunking facilitated recognition of formulaic sequences in new texts, and found a positive result for chunking. However, when compared to native speakers, these learners had marked as formulaic many sequences which clearly were not so. The moral of the story appears to be that raising awareness of the formulaic nature of language may encourage learners to attend to it, but the raised awareness does not necessarily translate into anything of real practical value. What difference does it make if a learner is aware of formulaic language, yet cannot accurately perceive or use it?
The jury is still out on the practical pedagogic value of text chunking as dealt with in the studies mentioned here. It is important that, as is the case with individual vocabulary items, learners be given ample opportunities to use a formulaic sequence in a range of contexts in order to acquire it and have it available for retrieval at a later time. An awareness-raising activity related to chunking is typographic enhancement and glossing, in which formulaic sequences are highlighted in texts using, for example, bold and enhanced typeface, and/or are explained in margins or between lines. Peters (2012) found some positive effect of this procedure, in that learners were better able to recall enhanced and glossed sequences from reading. Flooding the input is another technique with potential for facilitating acquisition of formulaic sequences. It involves creating or abridging text so that a particular formulaic sequence appears a number of times within a stretch of discourse. Webb, Newton, and Chang (2013) tested this in a study in which they abridged a graded reader so that target collocations appeared one, five, ten, or fifteen times, with each of four groups of intermediate ESL learners reading and listening to one version. Post-test results show that more exposure to a collocation yields better receptive and productive knowledge of it, although even the fifteen-encounter group only succeeded in the productive post-test roughly 50 percent of the time. Here we have evidence of the power of frequent exposure, although this type of study needs to be replicated to extend the number of exposures beyond one reading and listening experience with a given abridged text. Another type of activity for vocabulary and, by extension, formulaic language is the use of dictionary resources or corpus information. Laufer (2011) had students supply the missing word in collocations in sentences and then consult monolingual and bilingual dictionary resources. A post-test revealed only a 40 percent success rate despite the use of dictionary information, and a one-week delayed post-test only a 25 percent success rate. Wu, Witten, and Franken (2010) created a corpus of formulaic sequences and had learners use it to locate collocates of words, with a 67 percent success rate. Chan and Liou (2005) did something similar with verb–noun collocations and found a gain of nine collocations, but a post-test two months later showed the gain reduced to almost nothing. So, what is the problem with these intervention studies which have such weak results?
Part of the problem may be that little effort was put into getting learners to actually try to remember the formulaic sequences they were given in the interventions. In other words, there was little engagement with the sequences and little need for deeper processing. Let’s see if studies which require more cognitive engagement with the sequences yield better results. Laufer and Girsai (2008) had three groups of ESL learners read a text containing ten new single words and ten new collocations, with the meanings explained. One group was asked to discuss the text and the moral issues involved in its theme, a second group was given multiple choice and completion activities focusing on the new vocabulary and formulaic sequences, and the third group was required to complete translation exercises involving the new vocabulary. A one-week post-test of the vocabulary and formulaic sequences showed that the translation activities were the most helpful, followed by the other vocabulary activities. Boers and Lindstromberg (2005) and Lindstromberg and Boers (2008a, 2008b) tested the efficacy of drawing learner attention to the alliteration, rhyme, and assonance characteristics of formulaic sequences. Learners were required to deal with dictation and writing of the sequences in these studies, and all results showed a significant influence of the activities for particular formulaic sequences. Another mnemonic aspect of formulaic sequences, specifically figurative idioms, is their imageability. Steinel, Hulstijn, and Steinel (2007) found that idioms which link to a mental image are better retained than those which do not. The dual coding hypothesis (Sadoski, 2005) holds that concrete vocabulary is easier to retain than abstract vocabulary. This idea is linked to cognitive semantics, which is used in teaching figurative idioms and abstract vocabulary by grouping them into metaphoric themes. These types of techniques tend to use categorization and drawing, pictures, and mimed actions. Beréndi, Csábi, and Kövecses (2008) found that grouping idioms under metaphoric themes helps retention. Helping learners to cope with the literal meanings of idioms can help them to manage the figurative meanings as well (Boers, Demecheleer, & Eyckmans, 2004). This type of work has also extended from work on idioms to work on phrasal verbs, based on the observation that these are metaphorical.

Practical applications and materials

It is interesting to note that, while there has been a growth in knowledge and awareness of formulaic language in general among researchers and academics, there has been only limited effort to utilize this knowledge in language teaching methodology. Some pedagogical approaches have been developed which are based on formulaic sequences, in areas including overall language proficiency, English for specific purposes (ESP), and English for academic purposes (EAP). For example, Nattinger and DeCarrico (1992) outline some general principles for the teaching of lexical phrases; Willis (1990) integrates awareness of formulaic sequences into a set of general teaching materials based in large part on the COBUILD corpus; and Lewis (1997) also elaborates a set of principles and a syllabus design for using knowledge of formulaic sequences in language teaching. It is noteworthy that there exists a quite limited group of materials for English as a second language (ESL) which focus to any great extent on teaching formulaic language. The options available include a corpus-inspired series by McCarthy and O’Dell (2002, 2004, 2006) of volumes which deal separately with idioms, collocations, and phrasal verbs. While these types of resources may help teachers to begin to address the importance of formulaic language, the pedagogical methods represented in them are quite old-fashioned present–practice–produce sequences, and they contain little of the most recent methods such as task-based language teaching (TBLT). A small collection of useful dictionaries of formulaic sequences is available, including several from Hill and Lewis (1997), Benson, Benson, and Ilson (1997), and Spears, Birner, and Kleinelder (1994). However, it is important to bear in mind that dictionaries are not really teaching materials unless they are used in conjunction with other resources. Lewis’s lexical approach (1997, 2008) represents one of the most comprehensive efforts to date to develop a pedagogy based on formulaic language. The lexical approach proposes that syllabus and curriculum designs be based on lexis instead of grammar, building on words, collocations, fixed and semi-fixed expressions, and so on, with less attention paid to sentence grammar and uncollocated nouns (Lewis, 2008, p. 15). The actual pedagogical techniques inherent in the approach emphasize repetition, noticing, and consciousness-raising. In Lewis’s practice-oriented book Implementing the Lexical Approach (2008), readers can find classroom reports from experienced teachers who have used elements of the lexical approach with their students. Also, the approach inspired Boers and Lindstromberg (2009) to summarize a range of pedagogical techniques which may optimize the acquisition of formulaic sequences, including tips on selecting what to teach and means of semantic and structural elaboration. They also stress that sequences need to be automatized, practiced, and used to the point where they are available for use in spontaneous communication with minimal conscious effort. It seems that most ESL textbooks provide little to draw on for teaching formulaic sequences. While formulaic sequences are sometimes presented in commercially available teaching materials, the selection may not be useful for learner needs.
Koprowski (2005), in a survey of contemporary course books, found that, while many textbook authors pay attention to formulaic sequences in their syllabuses, their selections may rely on intuition rather than being grounded in language data such as that available from corpus research. Teachers who wish to facilitate the acquisition of formulaic sequences and who rely on commercial materials to provide the best input and tasks for the job should probably be cautious in their choices of resources. Specific types of textbooks such as those used in teaching EAP and ESP have been examined to see how they deal with formulaic language. A good example is Chen’s (2010) examination of ESP materials, focusing on lexical bundles, which revealed that only around one-third of the bundles commonly used in regular engineering textbooks were present in ESP materials, and that they covered a much narrower range of discourse functions in the ESP materials. Looking at lexical clusters in EAP textbooks, Wood (2010b) discovered that the highest frequency of clusters was in instructional language and not the texts used for input, and that the textbook material generally ignored formulaic language. In an examination of multiword constructions in first-year university textbooks, Wood and Appel (2014) found that the formulaic sequences common in them were minimally present in a set of five commercially available EAP textbooks, and that the materials failed to pay much attention to formulaic language at all.

Principles of pedagogy of formulaic language

As discussed in Chapter 4 of this book, automatization of formulaic sequences is a vital part of their acquisition. For automatization to occur, it is useful for learners to produce extended stretches of discourse (see Swain, 1985, 1995, for studies on the importance of output). Production of this sort promotes automatization by requiring juggling of planning, processing, and encoding, all necessary in real time as the production tasks are performed. Presentations, dramatizations, role plays, and extended written text are samples of productions which can be used in connection with virtually any thematic context. Some important elements of the process are preparation, practice, and feedback:

Preparation

Preparation time and activity are vital to ensure that learners integrate formulaic language which is useful for successful completion of a task. Ellis (2005) notes that pre-task planning may be either rehearsal of production or strategic planning, in which learners take time to prepare how to express the content they need to encode. Another valuable type of planning and preparation is within-task planning, which entails manipulation of the time needed to complete a given task. These types of planning help learners to notice the formulaic sequences useful for a task, which in turn supports the transition from controlled processing to more automatic processing. Accuracy and complexity of speech are aided by within-task planning, according to Ellis and Yuan (2005).

Practice

Practice is necessary for automatization. Practice with peers can help, as students check the comprehensibility of the output and the organization of ideas, and try to pick up speed as the number of practices increases. A full process of practice should allow the learner to produce extended speech which goes beyond his or her usual level of proficiency. A classic type of activity to help with this is Nation’s (1989) 4/3/2 technique. In this technique, individual students prepare a talk and deliver it to a peer. The first talk period is allotted four minutes, after which the speaker moves to a second peer and delivers the talk again within a three-minute time limit. Speakers change partners a third time and deliver the talk within two minutes. Nation observed that students became more fluent as this process unfolded:

“In all except one case study there was an increase in the rate of speaking from the first to the third delivery … the number of false starts, hesitations … and repeated words … decreased in each case study from the first to the third delivery of each talk.” (Nation, 1989, p. 379)

Focused practice like this can push speech abilities beyond normal levels. Further research by de Jong and Perfetti (2011) showed that this very procedure can indeed boost spoken fluency, and attributed the gains to changes in underlying psycholinguistic mechanisms. Repetition can be built into tasks in three main ways: task-based, involving the repetition of a particular task in its entirety, such as a presentation or a role play; meaning-focused, whereby the particular meaning inherent in a task is repeated in different ways, such as converting written language to speech; and form-focused, with a focus on improving particular points of language such as formulaic sequences. Wood and Namba (2013) found that having learners memorize and repeat specific formulaic sequences useful for a subsequent in-class presentation resulted in retention for the purpose of the presentation. Also, it appears that this procedure contributed to a growth in awareness among learners of the utility of formulaic language in communication. The beneficial effects of memory are highlighted in research with a specific focus on teaching methodology. Gatbonton and Segalowitz (1988, 2005) elaborate sets of principles for encouraging memory and automatization with implications for dealing with formulaic sequences. These are dealt with in more depth below.
Some phonemic aspects of formulaic sequences, including alliteration and assonance (Lindstromberg & Boers, 2008a, 2008b), have been shown to facilitate the learning or memorization of formulas, and an emerging theme in research centers around the positive effects of harnessing memory as a means of pushing acquisition. Some researchers, notably Alison Wray and collaborators, have examined how memorizing formulaic language can improve communication. In one study (Wray, 2004, 2008), a beginner learner of Welsh memorized phrases and sentences necessary in order to provide a cooking demonstration broadcast on television, all within a one-week period. The learner conducted a competent and quite fluent demonstration and nine months later still recalled a significant amount of the language. In another study (Wray, 2008; Wray & Fitzpatrick, 2008), advanced learners memorized nativelike formulaic ways of expressing useful ideas in their everyday encounters with native speakers. After a period of practice and rehearsal, the learners recorded themselves in these encounters and reported that the memorized language improved their confidence, satisfaction, and feeling of being nativelike communicators. Not only does memorization of specific sequences appear to support proficiency, but memorization of long texts and full dialogues has also been shown to be beneficial, at least from the perspective of learners. Ding (2007) found positive effects for memorization of large amounts of English text by Chinese university students, who reported that the practice had made them better communicators in English by enhancing their fluency, focusing attention on collocations and formulas, and enabling the transfer of these to real-life communication. In Japanese as a second language, Walker and Utsumi (2006) found that learners valued memorizing dialogues as a learning technique, and as a boost to fluency which is readily transferred to real-life communication. It appears also that memorization can improve second language writing abilities. Dai and Ding (2010) discovered that Chinese L2 learners of English used more formulaic sequences in their L2 writing after memorizing text, and that their writing ability was rated better than that of learners who did not memorize texts.

Feedback

Focus on form (FonF) approaches are an important element of current language teaching methods. FonF basically calls for teachers to intervene in the acquisition process by directing learner attention to language form and structure while learners are engaged in activity with a primary focus on meaning. FonF pedagogy relies strongly on corrective feedback, with teachers providing online feedback in the form of repetition, recasts, or other strategies while a learner is in the process of expression (Nassaji, 1999; Ellis, Basturkmen, & Loewen, 2001, 2002). While the focus of such research is overwhelmingly on grammar, it seems that this type of feedback could work with formulaic language as well. We might imagine a teacher, for example, recasting a learner’s utterance with a more appropriate formulaic sequence, or providing alternatives.

Formulaic sequences as vocabulary

While formulaic sequences are not simply lexical meaning units, they can be taught using a range of vocabulary teaching activities. Some strategies for dealing with formulas as vocabulary, adapted from Liu (2008), are presented below, along with specific associated activity types.

Macro strategies

• Watch for the use of formulaic sequences in daily life encounters.
  Learners can be encouraged to listen to formulaic sequences being used in encounters in the community. If they are unclear, they may attempt to combine a phonetic image of the sequence with a guess at its possible function. See below for more applications of this ethnographic approach to growing a repertoire of formulaic sequences.

• Make a list or notebook of formulaic sequences heard.
  Learners can keep a list of the sequences they hear outside of the classroom. The list could be organized or reorganized as to function and so on.

• Listen to the media and note how recurring formulaic sequences are used.
  Learners can listen for sequences in the media and use phonetic and context cues to determine meanings and functions.

• Interpret functions and meaning of sequences from the context or from analyzing their component parts.
  Learners use phonetic, context, or component analysis to determine the meanings or functions of sequences encountered outside of class.

• Explore the cultural metaphors underlying sequences.
  Learners can investigate the cultural concepts and metaphors underlying some formulaic sequences. According to Lakoff and Johnson (1980), in English, time appears to be similar to money (for example, “don’t waste a minute”), argument is related to war (for example, “you can never win an argument with her”), and ideas are dealt with as if they were tangible objects (for example, “can you give me an idea?”).

• Use mnemonics to aid in the storage of sequences.
  Use of mnemonic supports such as alliteration and assonance (Lindstromberg & Boers, 2008a, 2008b) or mental imaging may help in the mental processing, storage, and production of formulaic sequences as well.

Specific activity types

• Listen to spoken texts and dialogues and mark sequences on a transcript.
  Detailed descriptions of how this might work and how to integrate it into a sequence of activity are presented below.

• Compare L1 and L2 ways of expressing a particular function or meaning.
  Making comparisons to L1 means of expression can potentially aid in mental processing.

• Search corpora for concordances of sequences.
  Corpora available online are often easily searched for specific formulaic sequences, and concordance lines can illustrate many ways in which a sequence can be used or integrated with surrounding text.

• Replace single words with sequences.
  This is a classic vocabulary activity, as are the three following.

• Fill in blanks in transcripts with sequences.

• Complete a transcript with a sequence.

• Use sequences in a narrative or other monologue.
  See below for more detail on how this and the following activity may be integrated with a sequence of activity.

• Use sequences in a role play or dramatization/simulation.

• Retell a spoken text.

• Use mnemonics.
  ◦ Catalogue by phonological features.
  ◦ Make a semantic web.
  ◦ Catalogue by key lexical item.
  ◦ Link to a mental image.

• Describe a picture or picture sequence using sequences.

• Explain sequences.
  ◦ Define them.
  ◦ Elaborate on their meanings/functions.
  ◦ Paraphrase a text with or without sequences.
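
For teachers who want to try the concordance activity listed above without an online corpus tool, the following is a minimal sketch of a keyword-in-context search over any plain text. It is an illustration only: the function name, the context width, and the sample sentences are assumptions made here and are not drawn from any particular corpus or concordancer.

```python
import re

def concordance(text, sequence, width=30):
    """Return keyword-in-context lines for each occurrence of `sequence`."""
    lines = []
    # Match the sequence as whole words, ignoring case.
    pattern = re.compile(r"\b" + re.escape(sequence) + r"\b", re.IGNORECASE)
    for match in pattern.finditer(text):
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        # Right-align the left context so the hits line up in a column.
        lines.append(f"{left:>{width}} [{match.group(0)}] {right}")
    return lines

# Invented sample sentences, for illustration only.
sample_text = (
    "It's about time we left. She said it was about time for a change. "
    "The lecture is about time management."
)
hits = concordance(sample_text, "about time")
for line in hits:
    print(line)
```

Lining up the hits in this way lets learners compare uses side by side. In the invented sample, the formulaic use of the sequence sits next to a use in which “about” really is a topic marker.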

Specific activities for formulaic language and speech

1 Shadowing

Shadowing and tracking are typically used in the teaching of second language pronunciation (Ricard, 1986), but have potential for formulaic language as well. The process requires dealing with a recorded native speaker and a transcript. The learners read the transcript aloud along with the voice on the recording, then record their best effort in imitating the model. The instructor provides feedback and learners then use this as a guide to try again. Before dealing with the transcript, formulaic sequences in the model should be highlighted.

2 Mingle jigsaw

Described by Wood (1998) as an information-sharing technique, this involves repeated information delivery by learners to peers, while listening to the varied information the peers have to convey. Wood instructs students to mingle and share specific information:

“Divide (six pieces of information) equally among class members. Each group should carefully prepare to explain their assigned (information) as simply and clearly as possible. All class members should rise from their seats and mingle as at a party. No notes or readings should be carried around the room. Each student should explain his or her (assigned information) to other students, and listen to classmates explaining theirs. After each brief session, students may return to their seats when ready and jot down a few notes in the appropriate boxes in the table below. Return to the party to continue sharing your information. Continue mixing and returning to your seat until everyone has completed all the boxes in the table.” (Wood, 1998, p. 59)

The mingle jigsaw technique does not necessarily need to be reading-based; any content, including opinions or personal experiences, could be “mingled.”

3 4/3/2

As noted above, Nation’s (1989) 4/3/2 technique requires students to prepare a talk and perform it for peers, with incrementally reduced time deadlines. A similar procedure is described by Schloff and Yudkin (1991), in which speakers practice reading aloud a 180-word passage with a sixty-second time limit, without sacrificing clarity. Note that de Jong and Perfetti (2011) discovered that this seemingly simple process has effects on psycholinguistic processing of language.

4 Chain dictations

Dictation is an activity which readily lends itself to variations (see Davis and Rinvolucri, 1989, for a resource). In chain dictation, small groups of learners collaborate to write down a dictated text. The students in the groups are assigned numbers, and the recorded or read-aloud dictation text is delivered in the corridor outside the classroom, or in a quiet corner of the room. First, the students numbered one go to hear the text and return to their group to share what they recall. The group transcribes this, and then students numbered two go to listen, then return and the group repairs or completes their written text. This process continues until all students have had a chance to listen. The returning students need to retain the text in short-term memory even though it is too lengthy for them to retain in a word-by-word fashion. In order to minimize processing overload, they must attend to the chunks or formulas contained in the text. Back at the main group, the dictation text will grow and come into focus as chunks or formulas are fixed and clarified.

5 Student dictations

In student dictations, participants are given half a dictation text and assigned to dictate it to a partner who has the other half. The learners in this task must notice and retain formulaic sequences in working memory in order to complete the task.

6 Chat circles

Chat circles involve dividing a class into two large groups which stand in two concentric circles, the inner circle facing out and the outer circle facing in. Each face-to-face pair talks spontaneously for a minute or two on a topic assigned by the instructor. Participants then step one partner to the right or left and talk spontaneously about a newly assigned topic for the same amount of time as in the previous round. The circle ends when every outer circle member has spoken with every inner circle member. The topics should move from the immediate and personal or familiar, to the more abstract and opinion-oriented as the activity progresses.
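
For instructors planning a chat circle, the rotation described above can be sketched as a simple schedule. This is a hypothetical planning aid rather than part of the published technique: it assumes the two circles contain the same number of learners, which the description does not state, and the function name and learner names are invented.

```python
def chat_circle_rounds(inner, outer):
    """Yield one list of (inner, outer) pairs per talk period.

    Assumption (not stated in the activity description): both
    circles hold the same number of learners.
    """
    if len(inner) != len(outer):
        raise ValueError("both circles need the same number of participants")
    n = len(inner)
    for shift in range(n):
        # Each round, the outer circle steps one partner along.
        yield [(inner[i], outer[(i + shift) % n]) for i in range(n)]

inner = ["Ana", "Ben", "Chen"]   # hypothetical learner names
outer = ["Dee", "Eli", "Fay"]
rounds = list(chat_circle_rounds(inner, outer))
for number, pairs in enumerate(rounds, start=1):
    print(number, pairs)
```

With equal circles of n learners, the schedule runs for n talk periods, after which every outer circle member has spoken once with every inner circle member. The instructor therefore needs n topics, ordered from personal to abstract.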

Fluency workshop: formulaic sequences

What follows is a description of a fluency program consisting of three blocks of six hours each, with each block following the sequence outlined below. Formulaic language is key to activities. A full description is to be found in Wood (2006, 2010a).

Input stage

Learners listen to a recorded native speaker spontaneously talking about a topic. They discuss the content of the speech and the speaker’s attitudes and feelings. They listen again while following along on a transcript and then mark hesitations. The instructor points out formulaic sequences which occur between the hesitations and remarks on their linguistic and discourse functions.

Automatization stage

Learners shadow the recording in a language laboratory. First, the entire group practices and then they shadow at least eight times alone. They pay attention to the formulaic sequences and hesitation patterns, and repeat the harder parts as many times as needed. The learners participate in two activities designed to further automatization of the formulaic sequences. First, students listen to a dictogloss of sentences containing key formulaic sequences taken from the input text. Dictogloss (Wajnryb, 1990) is a procedure originally developed for grammar awareness. A brief (usually five sentences) text is read aloud at normal speed, with students jotting down whatever they can catch, usually key words. They work in teams to reconstruct the entire text, and then compare their reconstruction with the original text, with their attention directed to stretches of discourse in which they missed the mark. Dictogloss texts rich in formulaic sequences raise awareness of the sequences and their functions in speech. A mingle jigsaw (Wood, 1998) is then used to further fluency through automatization, followed by a chat circle to consolidate the experience gained. In the chat circle, a topic from the original taped model is used for each talk period. The partners then comment on their production and reflect on the speed, hesitations, and difficulties.

Practice and production stage

Learners then prepare their own brief talk, connected to the topic of the original model. In preparation they are guided through Nation’s (1989) 4/3/2 procedure, after which they record their talk without using notes. The recordings are collected and the learners review their performances and make note of aspects in which they feel they have improved from the first to the third attempt.

Free talk stage

Learners then form groups and generate topics related to the original native speaker model. These topics are distributed randomly, and small groups take turns listening to individuals speaking spontaneously about the topics, commenting on the production, and reflecting on the successes and difficulties of their own efforts.

Evidence

A case study based on this fluency workshop is presented in Wood (2009a). The participant, Sachie (pseudonym), produced narratives before and after a six-week set of fluency workshops, and the speech was analyzed with respect to fluency and use of formulaic sequences. The first sample was produced on the first day of the fluency workshops, before the start of the cycles of activities, and the second sample was produced roughly six weeks later, a week following the end of the workshops. Two fluency measures were calculated: speech rate (SR) was calculated as the number of syllables uttered per minute; mean length of runs (MLR) was calculated as the total number of syllables uttered divided by the number of runs. Formulaic sequences in the transcripts were identified using the procedures developed for Wood’s (2010a) research on formulaic language and fluency, reported in detail in Chapter 6 of this book. Sachie’s speech was more fluent in the second sample, with MLR increasing by 26.3 percent and SR improving by 13.8 percent. The samples show important effects of the workshop experience. The first sample contained eighteen formulaic sequences, and the second sample, six weeks later, contained fifty formulaic sequences. Also, 11.8 percent of the formulaic sequences in the first speech sample were from the workshop, and 36 percent of those in the second speech sample were from the workshop activities. The formulaic sequences used in the second sample were marked by increased length and complexity and nativelike semantic aspects. The mean length of the formulaic sequences in sample two is 4.46, while that of the first sample is 3.17, an increase of 40.7 percent. It seems that the participant recalled the formulaic sequences from the models, integrated them into her repertoire, and effectively used them in her own narrative. This resulted in increased fluency, particularly in longer length of runs between pauses.
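
The two fluency measures defined above, and the percentage gain reported for mean sequence length, can be reproduced with a few lines of arithmetic. The sketch below is illustrative only: the run lengths and speaking time are invented, the function names are hypothetical, and the choice to measure speaking time in seconds over the whole sample is an assumption, since the measurement details are not given here. Only the figures 3.17 and 4.46 come from the case study.

```python
def speech_rate(total_syllables, speaking_time_seconds):
    """SR: syllables uttered per minute."""
    return total_syllables / (speaking_time_seconds / 60)

def mean_length_of_runs(run_lengths):
    """MLR: total syllables uttered divided by the number of runs."""
    return sum(run_lengths) / len(run_lengths)

def percent_change(before, after):
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100

# Hypothetical data: syllables in each run between pauses,
# and an assumed total speaking time for the sample.
runs = [4, 7, 3, 6, 5]
sr = speech_rate(sum(runs), speaking_time_seconds=15)
mlr = mean_length_of_runs(runs)

# Figures reported in the case study: mean sequence length 3.17 -> 4.46.
growth = percent_change(3.17, 4.46)

print(sr, mlr, round(growth, 1))
```

The last line confirms that a rise in mean sequence length from 3.17 to 4.46 corresponds to the 40.7 percent increase reported in the text.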

Discourse analysis activity

Riggenbach (1999) presents a paradigm for designing ethnographic discourse analysis activities in the classroom:

Step 1: Predict. Learners make predictions about the target structure.
Step 2: Plan. Learners set up a research plan that will produce samples of the target structure.
Step 3: Collect data. Learners observe and/or record the target structure in its discourse environment.
Step 4: Analyze. Learners analyze the data and explain results/make conclusions.
Step 5: Generate. Learners discuss the target structure or produce the target structure in its appropriate context.
Step 6: Review. Learners summarize their findings or reanalyze the data that they produced, asking whether the data conform to their conclusions in Step 4 (Riggenbach, 1999, pp. 45, 46).

Wood (2009b) describes a course designed for engineering cooperative education students in which training in formulaic sequences was undertaken using an ethnographic and discourse analysis approach largely based on that of Riggenbach (1999). The learners in the study were able to improve their spoken language abilities using formulaic sequences. The participants were placed into a specialized spoken English course by means of an assessment instrument which consisted of three simulation tasks related to the types of speech situations they were likely to encounter in their job interviews and placements. The assessment required the candidates to listen to the instructions and input through a headset and record their responses and output through a microphone. First, they were to take a maximum of two minutes to describe their interests in their field of study. They talked about their motivation to enter the field, what contributions they could make, and the benefits they expected from their cooperative work placement. The second task simulated a meeting in which the candidate listened to a manager introduce a change in the work environment, with three main points: what the planned change was; reasons for the change; and contributions expected from employees. The candidate then relayed the information to a colleague. In the third task candidates read a memo and answered oral questions from a customer. The entire simulation was scored using a rubric which covered content, organized presentation of material, overall comprehensibility, grammatical and phonetic accuracy, and fluency.

In the course, learners basically followed the fluency workshop as described earlier in this chapter. The native speaker models they dealt with included talking about work, career choice, and making changes to work plans. With each model, the candidates went through the automatization and free talk stages as explained above.

In the ethnographic section of the course, learners were paired with teacher education students from a university education faculty. They were required to conduct an interview with the education students on topics related to those of the simulations in the initial cooperative education placement:

1 Why they chose teaching as a career.
2 What contributions they feel they can make to the teaching profession.
3 What benefits they expect to gain from a teaching career.
4 When to schedule another appointment to meet.
5 The history of their studies until now.
6 An example of changes in their studies or work and what they had to do to cope.
7 An example of a problem with meeting a deadline and how they dealt with it.

They recorded and transcribed the speech of the education students, and then pooled their transcripts, recordings, and findings. They used the transcripts to identify sample phrases, words, and expressions used for five purposes:

1 Explaining a personal choice
2 Listing personal characteristics and/or qualifications
3 Telling about life experiences
4 Introducing the idea of contributions
5 Discussing benefits

These phrases, words, and expressions were then pooled, and the participants returned to the original recorded samples of themselves describing their interests, contributions, and benefits. They used three sources to help modify and improve their speech:

• The instructor’s feedback.
• The experience with the teacher trainees, including the recorded samples and transcripts and our in-class discussions about it.
• Their sample recording with transcript.

After this, they re-recorded themselves talking about a change in a workplace, explaining what the change is, why it is happening, and what contributions are expected from employees. In sum, these participants were given a range of focused support to improve their spoken performance on the three themes related to the test of speaking: why they chose a particular career, what they expected to contribute to it, and the benefits they expected from it; workplace changes, their causes, and the expected contributions of employees; and challenges in meeting deadlines, the reasons for them, and plans to rectify the situation. The support included instructor feedback, the ethnographic data they collected from the university teacher trainees, and in-class experiences and peer feedback.

The course ended with a repeat of the simulation tasks used to place the learners in the course. Student peers and three ESP teachers all completed an evaluation template for each participant, and results showed that the participants generally scored higher on speed, hesitation, intonation and rhythm, and vocabulary, phrases, and grammar in the final presentation. At the beginning, overall scores on a scale of 1 to 5 for six of the learners were 3, with the other ten scoring 4. In the final presentation, all scores were in the range of 4, with six of them advancing from 4 to 4.5. Their oral proficiency improved, and they were able to use more formulaic language taken from the speech models of native speakers.

In summary …

From this short overview of the research and practice in teaching formulaic language, it appears that there is a great deal of potential for successful means to facilitate acquisition of these essential elements of language. The intervention studies reported here appear to have some weak results, but interventions which encourage remembering, including chunking, flooding, and cognitive engagement, appear to have been more successful. As far as integrating formulaic sequences into language pedagogy is concerned, there are ways to use them with sensitivity to the power of input, interaction, and even form-focused instruction. Specific activities which can foster automatization of formulaic sequences are plentiful, along with course plans and syllabus designs with a similar focus. Research studies which investigate the benefits of focused instruction of formulaic sequences on, for example, spoken fluency show an effect of instruction. In sum, there are a number of potentially fruitful ways for teachers to integrate formulaic language in their plans or to devise activities and lessons with a specific focus on it. A few of the many themes and patterns which the research and practice in this area show are as follows:

• Integrating formulaic-language-specific activities in second language classroom lessons can facilitate its acquisition.
• Input flooding, chunking, and cognitive engagement are useful ways of dealing with this element of language.
• It is possible to integrate formulaic language into input, interaction, and form-focused instructional plans.
• There are a number of specific activity types which can help facilitate acquisition of formulaic sequences.
• Evidence shows that encouraging automatization of formulaic sequences can have positive effects on spoken fluency.

Certainly, a number of issues have yet to be addressed fully in research and practice. For example, how can we be sure that acquisition of specific formulaic sequences is sufficient to bring about general language gain? Are there particular strategies which learners can be taught which will help them to perceive formulaic sequences more readily and integrate them into their own language repertoire without explicit teaching? Similarly, how can we be sure that our instructional techniques have lasting effects? Clearly, the teaching of formulaic language is an area still ripe for investigation by researchers and teachers alike.

POINTS TO PONDER AND THINGS TO DO

1 In your own second language acquisition process, were you ever explicitly made aware of formulaic language in any way?
2 In your own second language acquisition process, did you ever notice formulaic sequences without being instructed? Did that noticing represent a turning point in your acquisition process in any way?
3 How does teaching formulaic language differ from teaching of vocabulary in general? Why?
4 Think of a specific group of second language learners. Think of a specific speech communication context relevant to their needs and goals. Draft a lesson plan to help teach relevant sequences to them.
5 Think of a specific group of second language learners. Think of a specific reading and writing communication context relevant to their needs and goals. Draft a lesson plan to help teach relevant sequences to them.
6 Are there benefits to teaching formulaic language to native speakers of a language? If so, what are they, and why?
7 Are there benefits to teaching formulaic language to children developing literacy skills? If so, what are they, and why?
8 Draft a plan for a research project focused on teaching specific formulaic sequences. What data elicitation and analysis methods might you use?
9 Do the different types of formulaic sequences as outlined in Chapter 3 require different teaching strategies?
10 Outline a teacher training workshop on formulaic sequences and how to teach them.

10 Current and Future Directions in Formulaic Language Research— Gaps and Pathways

From the information covered in this volume, it is quite obvious that the study of formulaic language has developed to spread across a range of areas of interest. Psycholinguists, language educators, corpus researchers, pragmaticians, and others will appreciate what the study of formulaic language has contributed to fields and subfields in language studies. We appear to be getting a stronger sense of what formulaic language is, how it is processed, how it is acquired, how it contributes to spoken and written language and to particular registers of language, how it may be taught, and so on. Over the decades since the 1970s we have indeed come a long way. But is the state of knowledge satisfactory, can we turn the page, and has the book been written on formulaic language, so to speak?

Let’s take a bit of time and space to consider what we do know about formulaic language, how well we know it, and what remains to be examined. It is useful to do so, because this is where we can start to envision future research projects and potential new contributions to knowledge. As is often the case, when we review the information in an area, new ideas start to tumble all over one another as we stand back and reflect on what gaps still exist. This is how researchers tend to operate … read, think, reflect, and wait for the inspiration to design new projects and set out on new avenues of exploration.

History

From a look at the research history of formulaic language, it still seems that we are working with a quite elusive aspect of language. We seem to be stuck, capable of dealing with only one aspect of the phenomenon at a time. The big story of formulaic language is that it is important in spoken and written language and can be defined in certain ways, some more specific than others. Formulaic language has been studied from a wide range of research and disciplinary traditions, and the research has only been synthesized and pulled together over the past two decades or so. A wide range of questions about formulaic language remains. Certainly, not all the questions have been answered yet in any particular area. How do we know whether a formulaic sequence is stored and retrieved as a whole in spoken language? Do the basic assumptions about formulaic language in the processing of spoken language also apply to written language to any extent? How valuable is it to elaborate lists of categories of formulaic language?

Identifying

The identification of formulaic language in spoken or written texts remains a fraught enterprise, although some interesting possibilities do exist. We can identify formulaic sequences by using frequency, either by drawing on a specially created corpus of language from a specific register, or by taking sequences from such a corpus and checking their frequency (or other measures such as mutual information) in huge and comprehensive corpora such as the BNC or the COCA. Some benefits may even accrue from using internet search engines such as Google to help determine formulaicity. The ways in which formulaic sequences are uttered give us clues as to what multiword combinations might be formulaic. A truly multifaceted and potentially fruitful means of determining formulaicity may involve expert or native speaker judgment, a means probably especially well suited to smaller or very specific data sets. This method of determining formulaicity likely needs to involve use of checklists of various characteristics of the strings and their uses.

The overall picture which emerges from the literature in this area shows us that formulaic language is challenging to identify from texts, transcripts, and corpora, but it can in fact be identified by various means. It is best to try to use a combination of measures, and identification by expert or native-speaker judges using checklists as guides is a potentially useful technique. In the end, however, in many cases absolute certainty in identification is likely difficult to achieve. Even using combinations of corpus frequency and MI statistics, acoustic features, judges, and checklists, one is probably going to have to hedge one’s claims about formulaicity. We can hope that new and more reliable or valid means of identifying formulaic language will come along in time.
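To give a sense of what checking frequency and mutual information involves, here is a minimal Python sketch for two-word sequences. It is an illustration under stated assumptions, not the procedure of any study discussed in this book: the function name `bigram_mi_scores` and the `min_freq` cutoff are hypothetical, the toy sentence stands in for a real corpus, and MI is computed in a commonly used simplified form, log2((f(xy) × N) / (f(x) × f(y))), with N taken as the number of tokens.

```python
import math
from collections import Counter


def bigram_mi_scores(tokens, min_freq=2):
    """Return {(w1, w2): MI} for bigrams occurring at least min_freq times.

    MI compares how often the pair occurs with how often it would be
    expected to occur if the two words were independent of each other.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n = len(tokens)  # simplification: the same N for words and for pairs
    scores = {}
    for (w1, w2), pair_freq in bigrams.items():
        if pair_freq < min_freq:
            continue  # frequency criterion: ignore rare pairs
        scores[(w1, w2)] = math.log2(
            (pair_freq * n) / (unigrams[w1] * unigrams[w2])
        )
    return scores


# Toy data for illustration; a real study would use a large corpus.
toy = "sooner or later we left and sooner or later they came".split()
for pair, mi in sorted(bigram_mi_scores(toy).items()):
    print(pair, round(mi, 2))
```

A high score only indicates that two words co-occur more often than chance would predict. As the discussion above stresses, such a score is one clue among several, and claims about formulaicity still need to be hedged.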

Categorizations

We may feel surprised at the range and scope of the types of formulaic language. The actual construct of formulaic language is definitely not monolithic, and the categories themselves, when examined, show quite a bit of overlap, imprecision, and susceptibility to interpretation. The distinctions between, for example, a collocation and an idiom are hard to pin down, and a number of researchers have tended to assign particular characteristics to the various types. Some formulaic items seem to fall into gray zones or into cracks; for example, sequences like “and then” or “sooner or later” are a real challenge to categorize. While we can be grateful for the growth of corpus analysis technology and techniques, which have certainly helped us to uncover new types of formulaic sequences, a sort of orthodoxy has set in which can be puzzling to an observer of the research activity. You might wonder how significant it is to determine a lexical bundle by means of frequency only, as compared to a sequence identified using frequency in combination with other statistical measures such as Mutual Information (see Chapter 2).

In any event, a number of themes and patterns emerge from a review of the literature. Clearly, formulaic sequences can be classified in various ways, and the nature of the classifications and the criteria used to determine them have changed over time. As well, the classifications which exist are by no means firmly fixed, and some types of word strings are difficult to classify. Some categories overlap with others, and there is no firm consensus that all the categories are similarly processed semantically or psycholinguistically. It is natural to question whether the categories are even of much use to us as researchers or teachers. Does it matter to an educator whether a sequence is, for example, a phrasal verb or a collocation? Are many of the classifications really just artifacts of an early era of phraseology, more an abstract intellectual exercise than an effort to deal with concerns particular to applied research or language teaching?

Mental processing

The chapter in this volume which deals with mental processing begins with a sort of challenge: is the notion that formulaic sequences are mentally processed and retrieved as wholes something which has been proven? It is striking that the large body of research in this area has still left us with unanswered questions. It has become reasonably clear that formulaic sequences are to some extent holistically processed. But this is by no means always the case. It may be that a given sequence is sometimes dealt with holistically and sometimes constructed using controlled, conscious processing. What might be the cognitive and contextual factors influencing whether a sequence is dealt with holistically or not? Perhaps sequences are more likely situated on a spectrum of holistic processing, with, for example, collocations and idioms being on the holistic end of the spectrum, and lexical bundles or lexical phrases being dealt with in a more constructed way. There may well be answers in second language acquisition theory to some of these concerns, waiting to be uncovered.

The research shows some themes and patterns, not least of which is that formulaic sequences are probably mentally processed more or less holistically. This is probably largely a result of frequency and automatization. However, it is important to bear in mind that a large amount of the evidence for such holistic processing comes from research on idioms, which have specific characteristics not common to all formulaic sequences. Also, a great deal of the research in this area is highly experimental and does not deal much with real-life language use. Neurological research has been helpful, as studies of brain lateralization of language processing indicate that formulaic language is dealt with holistically in the right hemisphere. One of the most provocative results of the research in the area of mental processing is that higher frequency sequences appear to be processed holistically. This makes us wonder about the power of exposure to language through input, especially naturally occurring language. This may in fact support the tenets of the associative, usage-based schools of second language acquisition theory. But it is the job of experts such as applied researchers, language teachers, and assessors to determine whether the research evidence and the theories have any real implications for real-life language acquisition and use.

Acquisition

Some interesting patterns and themes emerge from a look at the research on formulaic language in first and second language acquisition. We have seen that, at least in first language acquisition, formulaic sequences are segmented from the input and subsequently broken down, with their constituent elements used for development of the language system, grammar, and so on. The research into adult second language acquisition of formulaic language focuses strongly on acquisition in naturalistic contexts, providing very little of practical value for classroom or formal language teaching practitioners. A further limitation of the research on adults is that much of it has involved either native speakers or high-proficiency second language speakers. Do second language learners acquire some formulaic language units as wholes at first and then break them down over time? Do they generally perceive the sequences as strings of separate items and only deal with them as wholes after long periods of focused instruction? Do both of these types of processing and acquisition work together or under different circumstances? Certainly, more research is needed to determine whether such a blended dynamic exists, and if so, how it operates and why.

Some things have been established by the acquisition research so far. It has become apparent that formulaic sequences are acquired as wholes by children, and that they are likely retained as wholes and also later broken down, with their constituent parts used as material for subsequent acquisition of morphosyntax and so on. As for adults, formulaic language appears to be dealt with holistically by adult native speakers and highly proficient second language speakers, and formulaic language may be used as a strategy for second language acquisition by adult learners. However, there is still a wide range of questions about how adult language learners perceive and acquire formulaic language. One question in need of investigation is how we can determine whether and how an adult learner might perceive and process formulaic language. Is it safe to assume that findings about formulaic language acquisition in children are applicable to adult second language acquisition?

Spoken language

One area in which we have discovered some fascinating information about formulaic language is its role in spoken language production, although little research has been conducted into its role in listening. Formulaic language has come to be seen as a foundation of the dynamics and the building blocks of speech, making up a large part of the words in speech, and playing a crucial part in the production (and, by implication, the comprehension) of fluent speech, the ways we achieve communicative goals, and more. As is the case in other areas of study, however, it seems we are still only at the beginning of understanding what formulaic language does in spoken language. Deeper research and richer data are needed to help us understand more fully what importance formulaic language has in spoken communication.

So far, we can identify several key areas of understanding. It is obvious that formulaic language is important in spoken language, comprising a large proportion of spoken language in a range of registers. Formulaic language may be a key element of second language speech fluency. Formulaic language is uttered with particular phonological characteristics. Formulaic language is key to the ways we use language to achieve particular communication goals; it is tightly linked to pragmatic competence. A number of important questions remain. How much of spoken language overall is formulaic? Are we really able to safely determine that a given word sequence is formulaic (see Chapter 2)? Do we know whether a formulaic sequence is stored and retrieved as a whole in spoken language?

Written language

The overview of research into formulaic language and written language in this book shows us some important information about the value of formulaic language in academic writing in particular. Formulaic language is clearly an indicator of proficiency in writing in academic contexts, and it makes up a great proportion of the vocabulary in academic writing. It has important functions in the production (and, by implication, the comprehension) of competent writing, and it is a vital element in the communication of ideas in written discourse. The research reviewed in this book, and the lists of formulaic sequences presented from corpus-based studies, still represent just the beginning of an understanding of what value formulaic language has to written language. We need to advance on several fronts. It is time for a body of research on formulaic language in nonacademic registers of written language. It is important to examine the psycholinguistic processes involved in producing and processing formulaic language in written language. For example, does the use of formulaic language facilitate faster or more efficient writing? There are also great opportunities for research into the roles of formulaic language in reading, examining, for example, whether formulaic language allows for more fluent, faster, and more effective reading.

In general, the research into formulaic language in writing shows that formulaic language is important in written language and that it comprises a large proportion of written language. The strongest area of focus has been academic written discourse, and formulaic language seems to be a key element of competence in academic writing. Lists of frequent formulaic sequences in academic writing have been compiled by researchers from various perspectives using various types of corpora. Formulaic language may be integral to competent reading abilities, although it is apparent that much more research activity is needed in this area.

Perhaps the biggest lesson to take away from a survey of research into formulaic language and writing is that there are a great number of questions which have not been answered yet in any particular area. For example, how much of written language overall is formulaic? Is the research on lexical bundles (see Chapter 8) any more informative than the research discussed in Chapter 7? Is it of practical utility to distinguish between a lexical bundle and a formulaic sequence in written language? The research on formulaic language and reading skills is thin, so is it safe to assume that formulaic language is important in dealing with written language abilities overall? And finally, does the idea that formulaic language is stored and retrieved as a whole apply to written language processing, and if so, what practical value might it have for language teaching and assessment?

Lexical bundles

A survey of the research on lexical bundles and academic discourse reveals some intriguing patterns and themes. A key component of the research is the tight definition of a lexical bundle within three parameters: frequency, range, and function. Researchers who conduct corpus-based research into multiword sequences using modified applications of these parameters, or who add other measures to their studies, are seen not to be investigating lexical bundles but some other aspect of language. This type of research is surveyed in Chapter 7 of this book and kept separate from the lexical bundle research. The non-lexical-bundle studies use different terms to refer to the multiword phenomena under investigation: Simpson-Vlach and Ellis (2010) call their units of analysis “formulaic language,” while Wood and Appel (2014) and Liu (2012) both use the term “multiword constructions.” Despite variation in terms used and methods employed, it is obvious that the development of corpus analysis tools and the study of lexical bundles have uncovered once unobserved elements of language. Lexical bundles were basically invisible until corpus analysis brought them to light. The structural and functional characteristics of lexical bundles render them perceptually less salient than some other elements of vocabulary and types of formulaic sequences. The remarkable and paradigm-shifting effects of the discovery of lexical bundles have uncovered the internal workings of academic discourse, providing us with an observable and tangible element of language which is woven deeply into the fabric of discourse.

A number of themes and patterns emerge from a look at the research, which shows first and foremost that lexical bundles are an important type of formulaic sequence, extractable by means of corpus analysis software from a corpus using frequency and range criteria. Lexical bundles are not units of meaning but serve specific functions in discourse. In academic discourse, lexical bundles and their functions differ from discipline to discipline, and lexical bundles may be key to helping tertiary education students to achieve proficiency in academic writing.
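The frequency and range criteria mentioned above can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the workings of any particular corpus analysis package: the function name `extract_bundles` and its parameters are hypothetical, the default cutoffs are placeholders that each study would set for itself, and the third parameter of the definition, function, is not automated here because it is assigned by an analyst.

```python
from collections import Counter, defaultdict


def extract_bundles(texts, n=4, min_per_million=20.0, min_range=3):
    """Return {n-gram: (raw frequency, number of texts)} for candidate bundles.

    texts is a list of token lists, one per text in the corpus.
    Frequency is normalized per million words; range is the number of
    different texts in which the sequence occurs.
    """
    counts = Counter()
    texts_containing = defaultdict(set)
    total_tokens = 0
    for text_id, tokens in enumerate(texts):
        total_tokens += len(tokens)
        for i in range(len(tokens) - n + 1):
            gram = tuple(tokens[i:i + n])
            counts[gram] += 1
            texts_containing[gram].add(text_id)

    bundles = {}
    for gram, freq in counts.items():
        per_million = freq / total_tokens * 1_000_000
        text_count = len(texts_containing[gram])
        # Keep only sequences that meet both the frequency and range cutoffs.
        if per_million >= min_per_million and text_count >= min_range:
            bundles[gram] = (freq, text_count)
    return bundles


# Toy corpus of three very short "texts", for illustration only.
corpus = [
    "on the other hand it is".split(),
    "but on the other hand we".split(),
    "and on the other hand they".split(),
]
print(extract_bundles(corpus))
```

With the toy corpus, only “on the other hand” occurs in all three texts, so it is the only sequence retained. In a corpus this small the frequency cutoff is trivially met, which is one reason thresholds are meaningful only for corpora of substantial size.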

Of course, there are a number of issues yet to be addressed fully in the lexical bundle research. The proverbial elephant in the room is how we can be sure that knowledge and awareness of lexical bundles will help students to improve their writing abilities. It is one thing to have a repertoire of lexical bundles or to learn to use them, but it remains to be studied whether this actually does translate into improved writing ability. Similarly, we have not delved very deeply into investigating how we can teach the bundles. We have not spent much effort in studying lexical bundles in spoken language, and their contribution to spoken discourse. We have hitherto confined our research in this area to academic language almost exclusively, and have not looked at how lexical bundles work in nonacademic discourse.

Language teaching

One area in which the research and practice in formulaic sequences has borne fruit is in teaching formulaic language. There is great potential for development of ways to facilitate acquisition of this important aspect of language. The intervention studies reported in this volume may have some frustrating results, but it is clear that interventions which encourage remembering, including chunking, flooding, and cognitive engagement, show promise. The work on integrating formulaic sequences into language pedagogy indicates that there are plenty of ways to integrate them into input, interaction, and even form-focused instruction. There are many classroom activities which can encourage automatization of formulaic sequences, and these can be used in tandem with course plans and syllabus designs which include a focus on formulaic language. For example, investigations into the benefits of focused instruction of formulaic sequences on speech fluency show an effect of instruction. All in all, it appears there are a number of ways for teachers to integrate formulaic language into their plans or to devise activities and lessons with a specific focus on it.

Many themes and patterns emerge from a consideration of the research and practice in this area, not least of which is that integrating formulaic-language-specific activities in second language classroom lessons can facilitate its acquisition. Methods such as input flooding, chunking, and cognitive engagement are useful in encouraging acquisition of formulaic language, and it is possible to integrate formulaic language into input, interaction, and form-focused instructional plans. There are a number of specific activity types which can help facilitate acquisition of formulaic sequences, and evidence shows that encouraging automatization of formulaic sequences can have positive effects on spoken fluency.

As is the case with every aspect of formulaic language, there are a number of issues yet to be addressed fully in research and practice. We are not yet sure that acquisition of specific formulaic sequences contributes to overall language gain. We have not tried to identify specific strategies which learners can be taught which will help them to be more readily capable of perceiving and integrating formulaic sequences in their own language repertoire without explicit teaching. We have not adequately tested whether our instructional techniques have lasting effects.

The big issues

Probably the biggest issue in the study of formulaic language is the lack of a unifying theory to explain its nature and roles. As is probably fully clear from a perusal of this volume, we have been looking at the phenomenon or phenomena from a range of different perspectives over a fairly long period of time. The information we have accumulated is substantial and in most cases helpful in understanding many things about communication and language in general. But it still can appear scattershot or ungrounded, in some ways.

A theory or model of formulaic language

As noted repeatedly in this book, the study of formulaic language has taken place from many different perspectives. While this has yielded a rich trove of knowledge, the field at the same time lacks a common foundation in a theory of language and a revised perspective of language based on the role of formulaic language in acquisition, mental processing, language in use, and so on. Nevertheless, there are some areas of theory which have intriguing possibilities for us, including Mel’čuk’s Meaning-Text Theory, usage-based models of language, construction grammar, and lexical semantics and priming, to name a few.

Mel’čuk’s Meaning-Text Theory

We do have Mel’čuk’s (1998) Meaning-Text Theory, but it seems too focused on collocation to be broadly applicable. It does, however, take us a step toward an understanding of the mechanisms underlying the unitary meanings of many multiword phenomena. As described in Chapter 2, Mel’čuk pointed out that the relationships between the words in a collocation are what create its sense of unitary meaning. The relationships can be represented by a formula: “A collocation AB of language L is a semantic phraseme of L such that its signified ‘X’ is constructed out of the signified of one of its two constituent lexemes—say, of A—and a signified ‘C’ [‘X’ = ‘AC’] such that the lexeme B expresses ‘C’ only contingent on A” (Mel’čuk, 1998, p. 30). This formula is daunting, but it is meant to chart the nature and structure of multiword combinations, in which one of the words is leading while the other is in a dependent role. In the end, however, both elements merge to create meaning as a single unit. Mel’čuk (1998) attempted to classify collocations on the basis of the relations between the components.
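Because the quoted formula is dense, it may help to see it laid out schematically. The sketch below is only an illustrative restatement, not Mel’čuk’s own notation: the combination symbol and the example (strong coffee, with coffee as the leading word A and strong as the dependent word B) are assumptions introduced here for clarity, and the snippet assumes the amsmath package for the text command.

```latex
% Schematic restatement of the quoted definition.
% Assumptions: \oplus stands for "combined with", and the example
% (A = coffee, B = strong, 'C' = 'concentrated') is illustrative only.
\[
  \text{`X'} \;=\; \text{`A'} \oplus \text{`C'},
  \qquad \text{where B expresses `C' only contingent on A}
\]
\[
  \text{`strong coffee'} \;=\; \text{`coffee'} \oplus \text{`concentrated'}
\]
```

On this reading, strong carries the sense of “concentrated” only because it is paired with coffee, which is the sense in which one word leads and the other depends on it.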

Usage-based models of language

The emerging and growing area of usage-based theories of second language acquisition may have implications for overall understandings of how formulaic language works. Usage-based models deal with usage events, that is, real-world instances of language use. These usage events are the primary basis for analyzing language processing, understanding, and acquisition. Usage-based models also see language ability as the result of the sum total of a language user’s linguistic experiences (usage events), together with the retention and subsequent categorization of the memories of those experiences in the speaker’s mind. These stored memories of utterances represent both input and output: language the user is exposed to and utterances the user produces. They are incorporated into one’s linguistic repertoire for future use.

Usage-based models blur or even erase distinctions between syntax and semantics in favor of a focus on meaning. In usage-based models, grammar is not seen as a collection of rules that can be used to create clauses and phrases; instead, it is viewed as a collection of “symbolic units” that combine both meaning and form (Geeraerts, 2006). The patterns and regularities within the store of symbolic units in the mind of a language user form the base of “grammar” in usage-based models. Unlike generative grammar models of language, which tend to deal with syntax and semantics separately, usage-based models tend to deal with them along a continuum. The distinctive basis of usage-based models is that meaningful representations are central, not syntax and lexicon as separate systems or individual items. This is where usage-based models help to create a theory of formulaic language: form-meaning units are the foundations of language.

Another advantage of usage-based models of language is that they take into account the importance of frequency effects in language production and understanding. This is an important development in that it can help explain some interesting linguistic features that are not effectively dealt with by Chomskyan/generative grammar approaches to language study.

The term usage-based can be seen as an umbrella term that incorporates many different models. Kemmer and Barlow (2000) point out that all usage-based models hold nine basic views in common:

1 An intimate relationship between linguistic structures and instances of language use.
2 A recognition of the importance of frequency.
3 A view that comprehension and production are integral to the linguistic system.
4 A focus on the role of learning and experience in language acquisition.
5 Recognition of representations as emergent rather than fixed.
6 Attention to the importance of usage data in theory construction and description.
7 Attention to the intimate relationship between usage, synchronic variation, and diachronic change.
8 Awareness of the interconnectedness of the linguistic system with other nonlinguistic cognitive systems.
9 Appreciation of the crucial role of context in the operation of the linguistic system.

Above all is the common belief that grammar is not represented as a series of rules but instead emerges from the linguistic repertoire of all usage events and utterances the user has experienced, both as input and output. Each of these events/utterances is categorized and stored within the mind of the language user, where linguistic schemas are created. These “schemas” can be thought of as mental representations of the associations among linguistic units of various sizes that give the user the ability to make generalizations about acceptability of constructions within a language.

Construction grammar

Linked to usage-based models of language is the notion that language is composed of form-meaning units which can be termed constructions. A main driver of this trend has been the work of Goldberg (1995, 2006) and her insights into cognitive construction grammar. In her work, grammatical constructions are seen as the foundational units of language. Constructions are links between form and meaning or function. All levels of grammar are involved with constructions, be they words, morphemes, idioms, or lexical patterns of various types. The actual form of a construction can be linked with any of a range of linguistic information: syntactic, morphological, or phonological, for example. Goldberg (2006) gives some categories of constructions and examples. Table 10.1 shows the categories.

Table 10.1 Examples of constructions from Goldberg (2006, p. 5)

Construction                      Form or example
Morpheme                          e.g., Pre-, -ing
Word                              e.g., Avocado, anaconda, and
Complex word                      e.g., Daredevil, shoo-in
Complex word (partially filled)   e.g., (N-s) (for regular plurals)
Idiom (filled)                    e.g., Going great guns, give the Devil his due
Idiom (partially filled)          e.g., Jog (someone’s) memory, send (someone) to the cleaners
Covariational conditional         the Xer the Yer (e.g., The more you think about it, the less you understand)
Ditransitive (double object)      subj V Obj1 Obj2 (e.g., He gave her a fish taco; he baked her a muffin)
Passive                           subj aux VPpp (PPby) (e.g., The armadillo was hit by a car)

Discussions of construction grammar go deep and require readers to draw on theoretical linguistic background information. However, the concept of a construction is clearly of value to those of us who seek to find a theoretical groundwork for the study of formulaic language. It appears that usage-based models of language acquisition and cognitive construction grammar may be of use in elaborating a theoretical basis for formulaic language study. Usage-based models handily integrate notions of holistic storage with frequency of input. They give us a psycholinguistic model of language development which fits well with the idea of multiword units being stored as wholes. Construction grammar, meanwhile, gives us a linguistic model of the structure of language which easily allows for formulaic language to take a primary place.

Lexical priming and lexical semantics

Another area of potential growth of theoretical knowledge is lexical priming theory (Hoey, 2005). Hoey proposes that we reconceive of language first and foremost as a system of interactions among words rather than as a series of grammatical structures. Drawing on corpus analysis, Hoey observes that words in essence prime each other in a complex but systematic web of collocation and association, and that this web of associations develops in the mind through exposure to real-world language in use. This theory is not only an intriguing and seemingly sensible alternative to more orthodox views of language; it also suggests that the phenomenon of formulaic language is a potential springboard to a deeper understanding of how language is constructed and acquired. Research which incorporates and tests Hoey’s theory is certainly to be welcomed.

Similarly, the field of lexical semantics, as refined by Stubbs (2002), has great potential as a theoretical underpinning of the study of formulaic language. According to Stubbs, on the basis of corpus analysis, evaluative meanings are conveyed by word combinations and phrases which are widely shared in particular discourse communities. Together with Hoey’s (2005) discoveries about lexical priming, lexical semantics would appear to be a logical place to look for theories or models of language which incorporate what we know about formulaic sequences.

Teaching models

In spite of the bulk of research and practical work on teaching formulaic language (see Chapter 9), we are still not at the stage where we can say we have a methodology of teaching formulaic language per se. Nor do we have much evidence of formulaic language affecting how we teach. It is frustrating to see that a major aspect of language, one which affects meaning and function and which has an influence on every component of communicative competence, has so little impact on language teaching itself. Lewis (1993, 1997) tried to integrate the knowledge of formulaic language into the lexical approach, arguing for a focus on the word as a primary unit of knowledge and trying to foster a way of learning lexical chunks as wholes. Since then, others have attempted to integrate key aspects of the lexical approach into certain types of language teaching: for example, McCarthy and O’Dell (2002, 2004, 2006) in teaching particular types of formulaic sequences, Wood (2010) with a focus on speech fluency, and Boers and Lindstromberg (2009) in an update of Lewis’ methodology. However, we have yet to see a truly new way of teaching language which starts with formulaic sequences. This is all the more frustrating when we consider that adult learners have been shown to struggle with the formulaic aspect of language at all levels (Chapters 5 and 9, this volume).

In the area of technology-assisted language learning, there seem to be places where the study of formulaic language fits perfectly. This is an area of importance and a place for practitioners and researchers to work together. More intervention studies are needed, along with more effort to link knowledge of formulaic language with state-of-the-art methods such as task-based teaching and focus-on-form. It is also important to attempt to integrate this knowledge with post-method theory and practice such as Dogme (Meddings & Thornbury, 2009), Kumaravadivelu’s (2002) post-method ideas, and ecological approaches (Kramsch, 2008; van Lier, 2004), to name a few.

Research focuses

We certainly seem obsessed with academic language in our research on formulaic language. Indeed, virtually the entire body of work on lexical bundles and formulaic language in writing is focused on academic writing. It is time to move the focus out of the academy and look at other areas of importance. For example, studying the use of formulaic language in service encounters, doctor–patient discourse, debates, counseling, and psychotherapeutic situations, to name a few, can provide insights into how these types of communication are structured, the nature of the discourse therein, and how to foster facility with this language in native speakers and second language learners. Ben Rejeb (2014) conducted a study of formulaic language used in official business meetings of a university student government, using a corpus of meeting minutes spanning many years. More research of this type would be useful.

Similarly, perhaps it is time to shift the focus away from productive skills and toward receptive skills. We have studied the use of formulaic language in speech and in writing, but comparatively little in reading and listening. More research in these areas can help us not only to discover the ways in which language users handle formulaic sequences, but also to uncover more psycholinguistic processes. These types of research can help augment our ways of teaching language as well.

A final area ripe for research is the way formulaic language works with discourse analysis and writing studies. In empirical discourse analysis such as conversation analysis, there are some obvious roles for formulaic language to play, but we have not yet examined them. In critical discourse analysis and the systemic-functional linguistic ways of deconstructing text and determining ideologies, there seem to be many ways that research can incorporate knowledge of formulaic language. Similarly, in writing studies, the ways in which people learn to write and the ways that discourses evolve seem to be natural places for an interface with formulaic language study. Collaborative and multifaceted research projects can help to push formulaic language to the forefront and help us all to generate new knowledge in a wide range of fields.

References

Adel, A. & Erman, B. (2012). Recurrent word combinations in academic writing by native and non-native speakers of English: A lexical bundles approach. English for Specific Purposes, 31, 81–92.
Al Hassan, L. & Wood, D. (2015). The effectiveness of focused instruction of formulaic sequences in augmenting L2 learners’ academic writing skills: A quantitative research study. Journal of English for Academic Purposes.
Allerton, D. J. (1984). Three (or four) levels of cooccurrence restriction. Lingua, 63, 17–40.
Allerton, D. J., Nesselhauf, N., & Skandera, P. (Eds.). (2004). Phraseological units: Basic concepts and their application. Basel: Schwabe.
Altenberg, B. (1993). Recurrent word combinations in spoken English. In J. D. Arcy (Ed.), Proceedings of the Fifth Nordic Association for English Studies Conference (pp. 17–27). Reykjavik: University of Iceland.
Altenberg, B. (1998). On the phraseology of spoken English: The evidence of recurrent word combinations. In A. P. Cowie (Ed.), Phraseology: Theory, analysis and application (pp. 101–122). Oxford: Clarendon Press.
Altenberg, B. & Tapper, M. (1998). The use of adverbial connectors in advanced Swedish learners’ written English. In S. Granger (Ed.), Learner English on computer (pp. 3–18). New York: Longman.
Ambridge, B., Rowland, C. F., Theakston, A. L., & Tomasello, M. (2006). Comparing different accounts of inversion errors in children’s non-subject wh-questions: “What experimental data can tell us?” Journal of Child Language, 33, 519–557.
Amosova, N. N. (1963). Osnovui angliiskoy frazeologii [The foundations of English phraseology]. Leningrad: University Press.
Anderson, J. (1983). The architecture of cognition. Cambridge, MA: Harvard University Press.
Appel, R. & Wood, D. (in press). Recurrent word combinations in EAP test-taker writing: Differences between high and low proficiency levels. Language Assessment Quarterly.
Arnon, I. & Snider, N. (2010). More than words: Frequency effects for multi-word phrases. Journal of Memory and Language, 62, 67–82.
Ashby, M. (2006). Prosody and idioms in English. Journal of Pragmatics, 38(10), 1580–1597.
Austin, J. L. (1962). How to do things with words. Oxford: Clarendon Press.
Bacha, N. N. (2002). Developing learners’ academic writing skills in higher education: A study for educational reform. Language and Education, 16(3), 161–177.

Bachman, L. F. (1990). Fundamental considerations in language testing. Oxford: Oxford University Press.
Baddeley, A. D. (1988). Working memory. Oxford: Oxford University Press.
Bahns, J., Burmeister, H., & Vogel, T. (1986). The pragmatics of formulas in L2 learner speech: Use and development. Journal of Pragmatics, 10, 693–723.
Bamber, B. (1983). What makes a text coherent? College Composition and Communication, 34, 417–429.
Bannard, C. & Lieven, E. (2012). Formulaic language in L1 acquisition. Annual Review of Applied Linguistics, 32, 3–16.
Bannard, C. & Matthews, D. (2008). Stored word sequences in language learning: The effect of familiarity on children’s repetition of four-word combinations. Psychological Science, 19, 241–248.
Bannard, C., Lieven, E., & Tomasello, M. (2009). Modeling children’s early grammatical knowledge. Proceedings of the National Academy of Sciences, 106, 17284–17289.
Bardovi-Harlig, K. (2012). Formulas, routines, and conventional expressions in pragmatics research. Annual Review of Applied Linguistics, 32, 206–227.
Bardovi-Harlig, K. & Bastos, M.-T. (2011). Proficiency, length of stay, and intensity of interaction and the acquisition of conventional expressions in L2 pragmatics. Intercultural Pragmatics, 8, 347–384.
Bardovi-Harlig, K., Bastos, M.-T., Burghardt, B., Chappetto, E., Nickels, E., & Rose, M. (2010). The use of conventional expressions and utterance length in L2 pragmatics. In G. Kasper, H. T. Nguyen, D. R. Yoshimi, & J. K. Yoshioka (Eds.), Pragmatics and language learning: Vol. 12 (pp. 163–186). Honolulu, HI: University of Hawaii, National Foreign Language Resource Center.
Barron, A. (2003). Acquisition in interlanguage pragmatics: Learning how to do things with words in a study abroad context. Amsterdam: John Benjamins.
Becker, J. D. (1975). The phrasal lexicon. TINLAP ’75 Proceedings of the 1975 workshop on theoretical issues in natural language processing (pp. 60–63). Stroudsburg, PA: Association for Computational Linguistics.
Ben Rejeb, R. (2014). Lexical bundles in meeting minutes: The case of a graduate students association. Unpublished Master of Arts Research Essay. Ottawa, School of Linguistics and Language Studies, Carleton University.
Benson, M., Benson, E., & Ilson, R. (1997). The BBI dictionary of English word combinations. Amsterdam: John Benjamins.
Beréndi, M., Csábi, S., & Kövecses, Z. (2008). Using conceptual metaphors and metonymies in vocabulary teaching. In F. Boers & S. Lindstromberg (Eds.), Cognitive linguistic approaches to teaching vocabulary and phraseology (pp. 65–99). Berlin, Germany: Mouton de Gruyter.
Biber, D. (2006). University language: A corpus-based study of spoken and written registers. Philadelphia, PA: John Benjamins.
Biber, D. & Barbieri, F. (2007). Lexical bundles in university spoken and written registers. English for Specific Purposes, 26, 263–286.
Biber, D. & Conrad, S. (1999). Lexical bundles in conversation and academic prose. In H. Hasselgard & S. Oksefjell (Eds.), Out of corpora: Studies in honour of Stig Johansson (pp. 181–190). Amsterdam: Rodopi.
Biber, D., Conrad, S., & Cortes, V. (2004). If you look at…lexical bundles in university teaching and textbooks. Applied Linguistics, 25, 371–405.
Biber, D., Johansson, S., Leech, G., Conrad, S., & Finegan, E. (1999). Longman grammar of spoken and written English. Harlow, UK: Pearson.

Bod, R. (2000). The storage vs. computation of three-word sentences. Paper presented at AMLaP2000, University of Leiden, Leiden, the Netherlands.
Bod, R. (2001). Sentence memory: Storage vs. computation of frequent sentences. Paper presented at CUNY 2001, University of Pennsylvania, Philadelphia, PA.
Bod, R. (2006). Exemplar-based syntax: How to get productivity from exemplars. Linguistic Review, 23, 291–320.
Boers, F. & Lindstromberg, S. (2005). Finding ways to make phrase learning feasible: The mnemonic effect of alliteration. System, 33, 225–238.
Boers, F. & Lindstromberg, S. (2009). Optimizing a lexical approach to instructed second language acquisition. Basingstoke, UK: Palgrave Macmillan.
Boers, F. & Lindstromberg, S. (2012). Experimental and intervention studies on formulaic sequences in a second language. Annual Review of Applied Linguistics, 32, 83–110.
Boers, F., Demecheleer, M., & Eyckmans, J. (2004). Etymological elaboration as a strategy for learning figurative idioms. In P. Bogaards & B. Laufer (Eds.), Vocabulary in a second language: Selection, acquisition and testing (pp. 53–78). Amsterdam, the Netherlands: John Benjamins.
Boers, F., Eyckmans, J., & Stengers, H. (2007). Presenting figurative idioms with a touch of etymology: More than mere mnemonics? Language Teaching Research, 11, 43–62.
Boers, F., Eyckmans, J., Kappel, J., Stengers, H., & Demecheleer, M. (2006). Formulaic sequences and perceived oral proficiency: Putting a lexical approach to the test. Language Teaching Research, 10, 245–261.
Bolander, M. (1989). Prefabs, patterns and rules in interaction? Formulaic speech in adult learners’ L2 Swedish. In K. Hyltenstam & L. K. Obler (Eds.), Bilingualism across the lifespan: Aspects of acquisition, maturity, and loss (pp. 73–86). Cambridge: Cambridge University Press.
Broca, P. (1863). Localisations des fonctions cérébrales. Siège de la faculté du langage articulé. Bulletin de la Société d’Anthropologie, 4, 200–208.
Butler, C. S. (2003). Multi-word sequences and their relevance for recent models of functional grammar. Functions of Language, 10(2), 179–208.
Bybee, J. (2000). The phonology of the lexicon. In M. Barlow & S. Kemmer (Eds.), Usage-based models of language (pp. 65–85). Stanford, CA: CSLI Publications.
Bybee, J. (2002). Phonological evidence for exemplar storage of multiword sequences. Studies in Second Language Acquisition, 24, 215–221.
Bybee, J. (2006). From usage to grammar: The mind’s response to repetition. Language, 82(4), 711–733.
Byrd, P. & Coxhead, A. (2010). On the other hand: Lexical bundles in academic writing and in the teaching of EAP. University of Sydney Papers in TESOL, 5, 31–64.
Cameron-Faulkner, T., Lieven, E., & Tomasello, M. (2003). A construction-based analysis of child-directed speech. Cognitive Science, 27, 843–873.
Chafe, W. (1968). Idiomaticity as an anomaly in the Chomskyan paradigm. Foundations of Language, 4, 109–127.
Chafe, W. L. (1980). Some reasons for hesitating. In H. W. Dechert & M. Raupach (Eds.), Temporal variables in speech (pp. 169–180). The Hague: Mouton.
Chan, T.-P. & Liou, H.-C. (2005). Effects of web-based concordancing instruction on EFL students’ learning of verb-noun collocations. Computer Assisted Language Learning, 18, 231–250.

Chen, L. (2010). An investigation of lexical bundles in ESP textbooks and electrical engineering introductory textbooks. In D. Wood (Ed.), Perspectives on formulaic language: Acquisition and communication (pp. 107–128). London/New York: Continuum.
Chen, Y. & Baker, P. (2010). Lexical bundles in L1 and L2 academic writing. Language Learning & Technology, 14(2), 30–49.
Cieslicka, A. (2006). Literal salience in on-line processing of idiomatic expressions by second language learners. Second Language Research, 22, 115–144.
Columbus, G. (2010). Processing MWUs: Are MWU subtypes psycholinguistically real? In D. Wood (Ed.), Perspectives on formulaic language: Acquisition and communication (pp. 194–210). New York/London: Continuum.
Conklin, K. & Schmitt, N. (2008). Formulaic sequences: Are they processed more quickly than nonformulaic language by native and non-native speakers? Applied Linguistics, 29, 72–89.
Conklin, K. & Schmitt, N. (2012). The processing of formulaic language. Annual Review of Applied Linguistics, 32, 45–61.
Connor, U. (1990). Linguistic/rhetorical measures for international students persuasive writing. Research in the Teaching of English, 24, 67–87.
Connor, U. (2003). Changing currents in contrastive rhetoric: Implications for teaching and research. In B. Kroll (Ed.), Exploring the dynamics of second language writing (pp. 218–240). Cambridge: Cambridge University Press.
Cook, V. & Bassetti, B. (2005). An introduction to researching second language writing systems. Second Language Writing Systems, 1–67.
Cortes, V. (2004). Lexical bundles in published and student disciplinary writing: Examples from history and biology. English for Specific Purposes, 23, 397–423.
Cortes, V. (2007). Teaching lexical bundles in the disciplines: An example from a writing intensive history class. Linguistics and Education, 17(4), 391–406.
Cortes, V., Jones, J., & Stoller, F. (2002, April). Lexical bundles in ESP reading and writing. Paper presented at TESOL Conference, Salt Lake City, Utah.
Coulmas, F. (1979). On the sociolinguistic relevance of routine formulae. Journal of Pragmatics, 3(3/4), 239–266.
Coulmas, F. (Ed.). (1981). Conversational routines. The Hague: Mouton.
Cowie, A. P. (1992). Multiword lexical units and communicative language teaching. In P. J. L. Arnaud & H. Béjoint (Eds.), Vocabulary and applied linguistics (pp. 1–12). Basingstoke: Macmillan.
Cowie, A. P. (1994). Phraseology. In R. E. Asher (Ed.), The encyclopedia of language and linguistics (pp. 3168–3171). Oxford: Pergamon.
Cowie, A. P. (Ed.). (1998). Phraseology: Theory, analysis and application. Oxford: Clarendon Press.
Coxhead, A. (1998). An academic word list. English language institute occasional publication No. 18. New Zealand: Victoria University of Wellington.
Coxhead, A. (2000). A new academic word list. TESOL Quarterly, 34, 213–238.
Coxhead, A. (2008). Phraseology and English for academic purposes: Challenges and opportunities. In F. Meunier & S. Granger (Eds.), Phraseology in language learning and teaching (pp. 149–161). Amsterdam: John Benjamins.
Coxhead, A. & Byrd, P. (2007). Preparing writing teachers to teach the vocabulary and grammar of academic prose. Journal of Second Language Writing, 16(3), 129–147.

Culpeper, J. (2010). Conventional impoliteness formula. Journal of Pragmatics, 42, 3232–3245.
Dai, Z. & Ding, Y. (2010). Effectiveness of text memorization in EFL learning of Chinese students. In D. Wood (Ed.), Perspectives on formulaic language: Acquisition and communication (pp. 71–87). New York/London: Continuum.
Davis, J. (2007). Resistance to L2 pragmatics in the Australian ESL context. Language Learning, 57, 611–649.
Davis, P. & Rinvolucri, M. (1989). Dictation: New methods, new possibilities. Cambridge: Cambridge University Press.
De Jong, N. & Perfetti, C. A. (2011). Fluency training in the ESL classroom: An experimental study of fluency development and proceduralization. Language Learning, 61(2), 533–568.
De Pablos-Ortega, C. (2011). The pragmatics of thanking reflected in the textbooks for teaching Spanish as a foreign language. Journal of Pragmatics, 43, 2411–2433.
Dechert, H. W. (1980). Pauses and intonation as indicators of verbal planning in second-language speech productions: Two examples from a case study. In H. W. Dechert & M. Raupach (Eds.), Temporal variables in speech (pp. 271–285). The Hague: Mouton.
Deschamps, A. (1980). The syntactical distribution of pauses in English spoken as a second language by French students. In H. W. Dechert & M. Raupach (Eds.), Temporal variables in speech (pp. 255–262). The Hague: Mouton.
Ding, Y. (2007). Text memorization and imitation: The practices of successful Chinese learners of English. System, 35, 271–280.
Durrant, P. & Mathews-Aydınlı, J. (2011). A function-first approach to identifying formulaic language in academic writing. English for Specific Purposes, 30(1), 58–72.
Ellis, N. C. (1996). Sequencing in SLA: Phonological memory, chunking, and points of order. Studies in Second Language Acquisition, 18, 91–126.
Ellis, N. C. (2002). Frequency effects in language processing. Studies in Second Language Acquisition, 24, 143–188.
Ellis, N. C. (2012). Formulaic language and second language acquisition: Zipf and the phrasal teddy bear. Annual Review of Applied Linguistics, 32, 17–44.
Ellis, N. C. & Simpson-Vlach, R. (2009). Formulaic language in native speakers: Triangulating psycholinguistics, corpus linguistics, and education. Corpus Linguistics and Linguistic Theory, 5, 61–78.
Ellis, N. C., Frey, E., & Jalkanen, I. (2008). The psycholinguistic reality of collocation and semantic prosody (1): Lexical access. In U. Romer & R. Schulze (Eds.), Exploring the lexis-grammar interface. Amsterdam, the Netherlands: John Benjamins.
Ellis, N. C., Simpson-Vlach, R., & Maynard, C. (2008). Formulaic language in native and second-language speakers: Psycholinguistics, corpus linguistics, and TESOL. TESOL Quarterly, 42, 375–396.
Ellis, R. (2005). Planning and task-based performance: Theory and research. In R. Ellis (Ed.), Planning and task performance in a second language (pp. 3–36). Amsterdam: John Benjamins.
Ellis, R. & Yuan, F. (2005). The effects of careful within-task planning on oral and written task performance. In R. Ellis (Ed.), Planning and task performance in a second language (pp. 167–192). Amsterdam: John Benjamins.
Ellis, R., Basturkmen, H., & Loewen, S. (2001). Learner uptake in communicative ESL lessons. Language Learning, 51, 281–318.

Erman, B. (2006). Non-pausing as evidence of the idiom principle. Paper presented at the first Nordic Conference on Syntactic Freezes. University of Joensuu, Finland. May 19–20, 2006.
Erman, B. (2007). Cognitive processes as evidence of the idiom principle. International Journal of Corpus Linguistics, 12(1), 25–53.
Erman, B. & Warren, B. (2000). The idiom principle and the open choice principle. Text, 20(1), 29–62.
Eskildsen, S. W. & Cadierno, T. (2007). Are recurring multi-word expressions really syntactic freezes? Second language acquisition from the perspective of usage-based linguistics. In M. Nenonen & S. Niemi (Eds.), Collocations and idioms 1: Papers from the First Nordic Conference on Syntactic Freezes. Joensuu, Finland: Joensuu University Press.
Eyckmans, J., Boers, F., & Stengers, H. (2007). Identifying chunks: Who can see the wood for the trees? Language Forum, 33, 85–100.
Ferris, D. (1994). Lexical and syntactic features of ESL writing by students at different levels of L2 proficiency. TESOL Quarterly, 28, 414–420.
Firth, J. R. (1951). Modes of meaning. In J. R. Firth (Ed.), Essays and studies (pp. 118–149). London: Oxford University Press.
Firth, J. R. (Ed.). (1957). Papers in linguistics 1934–1951. Oxford: Oxford University Press.
Forsberg, F. (2010). Using conventional sequences in L2 French. International Review of Applied Linguistics, 48, 25–51.
Fraser, B. (1970). Idioms within a transformational grammar. Foundations of Language, 6, 22–42.
Freed, B. F. (1995). What makes us think that students who study abroad become fluent? In B. F. Freed (Ed.), Second language acquisition in a study abroad context (pp. 123–148). Philadelphia, PA: John Benjamins.
Freudenthal, D., Pine, J. M., & Gobet, F. (2010). Explaining quantitative variation in the rate of Optional Infinitive errors across languages: A comparison of MOSAIC and the Variational Learning Model. Journal of Child Language, 37, 643–669.
Gatbonton, E. & Segalowitz, N. (1988). Creative automatization: Principles for promoting fluency within a communicative framework. TESOL Quarterly, 22(3), 473–492.
Gatbonton, E. & Segalowitz, N. (2005). Rethinking communicative language teaching: A focus on access to fluency. The Canadian Modern Language Review, 61(3), 325–353.
Gibbs, R. & Gonzales, G. (1985). Syntactic frozenness in processing and remembering idioms. Cognition, 20, 243–259.
Goffman, E. (1971). Relations in public. London: Allen Lane/The Penguin Press.
Goldberg, A. (1995). Constructions: A construction grammar approach to argument structure. Chicago, IL: University of Chicago Press.
Goldberg, A. (2006). Constructions at work: The nature of generalization in language. Oxford: Oxford University Press.
Goldman-Eisler, F. (1968). Psycholinguistics: Experiments in spontaneous speech. New York, NY: Academic Press.
Granger, S. & Rayson, P. (1998). Automatic profiling of learner texts. In S. Granger (Ed.), Learner English on computer (pp. 119–131). New York, NY: Longman.
Granger, S. (1998). Prefabricated patterns in advanced EFL writing: Collocations and formulae. In A. P. Cowie (Ed.), Phraseology: Theory, analysis and applications (pp. 145–160). Oxford: Clarendon Press.

Granger, S. & Meunier, F. (Eds.). (2008). Phraseology: An interdisciplinary perspective. Philadelphia, PA: John Benjamins. Granger, S. & Paquot, M. (2008). Disentangling the phraseological web. In S. Granger & F. Meunier (Eds.), Phraseology: An interdisciplinary perspective (pp. 27–50). Philadelphia, PA: John Benjamins. Grant, I. E. & Bauer, L. (2004). Criteria for redefining idioms: Are we barking up the wrong tree? Applied Linguistics, 25, 38–61. Greenbaum, S. (1974). Some verb-intensifier collocations in American and British English. American Speech, 49, 79–89. Gries, S. T. (2008). Corpus-based methods in analyses of SLA data. In P. Robinson and N. C. Ellis (Eds.), Handbook of cognitive linguistics and second language acquisition (pp. 406–431). New York, NY: Routledge, Taylor & Francis. Gries, S. T. (2012). Frequencies, probabilities, association measures in usage-/ exemplar-based linguistics: some necessary clarifications. Studies in Language, 36(3), 477–510. Hakuta, K. (1974). Prefabricated patterns and the emergence of structure in second language acquisition. Language Learning, 24(2), 287–297. Handl, S. (2008). Essential collocations for learners of English: The role of collocational direction and weight. In F. Meunier & S. Granger (Eds.), Phraseology in foreign language learning and teaching (pp. 43–66). Amsterdam: John Benjamins. Hasselgren, A. (1994). Lexical teddy bears and advanced learners: A study into the ways Norwegian students cope with English vocabulary. International Journal of Applied Linguistics, 4, 237–260. Hickey, T. (1993). Identifying formulas in first language acquisition. Journal of Child Language, 20, 27–41. Hill, J. & Lewis, M. (1997). LTP dictionary of selected collocations. EMEA British English. Hilpert, M. (2008). New evidence against the modularity of grammar: Constructions, collocations, and speech perception. Cognitive Linguistics, 19, 491–511. Hockett, C. F. (1958). A course in modern linguistics. New York: MacMillan. Hoey, M. 
(2005). Lexical priming: A new theory of word and language. London/ New York: Routledge. Hornby, A. S., Gatenby, E. V., & Wakefield, H. (1942). Idiomatic and syntactic English dictionary. Oxford: Oxford University Press. Hsu, J.-Y. & Chiu, C.-Y. (2008). Lexical collocations and their relation to speaking proﬁciency of college EFL learners in Taiwan. Asian EFL Journal, 10, 181–204. Hulstijn, J. H. (2001). Intentional and incidental second language vocabulary learning: A reappraisal of elaboration, rehearsal and automaticity. In P. Robinson (Ed.), Cognition and second language instruction (pp. 258–286). Cambridge: Cambridge University Press. Hyland, K. (1998). Hedging in scientific research articles. Amsterdam: John Benjamins. Hyland, K. (2003). Second language writing. Cambridge: Cambridge University Press. Hyland, K. (2004). Disciplinary discourses: Social interactions in academic writing. University of Michigan Press. Hyland, K. (2006). English for academic purposes: An advanced resource book. Abingdon.

Hyland, K. (2007). Genre pedagogy: Language, literacy and L2 writing instruction. Journal of Second Language Writing, 16(3), 148–164. Hyland, K. (2008). As can be seen: Lexical bundles and disciplinary variation. English for Specific Purposes, 27(1), 4–21. Hyland, K. (2012). Bundles in academic discourse. Annual Review of Applied Linguistics, 32, 150–169. Hyland, K. & Hamp-Lyons, L. (2001). EAP: Issues and directions. Journal of English for Academic Purposes, 1, 1–12. Hymes, D. (1962). The ethnography of speaking. In T. Gladwin & W. C. Sturtevant (Eds.), Anthropology and human behaviour (pp. 13–53). Washington, DC: Anthropological Society of Washington. Jespersen, O. (1924). The philosophy of grammar. London: Allen and Unwin. Jiang, N. & Nekrasova, T. M. (2007). The processing of formulaic sequences by second language speakers. The Modern Language Journal, 91, 433–445. Jones, M. & Haywood, S. (2004). Facilitating the acquisition of formulaic sequences. In N. Schmitt (Ed.), Formulaic sequences: Acquisition, processing and use (pp. 269–300). Amsterdam: John Benjamins. Jones, S. & Sinclair, J. (1974). English lexical collocations. Cahiers de Lexicologie, 24, 15–61. Katz, J. J. & Postal, P. (1963). The semantic interpretation of idioms and sentences containing them. MIT Research Laboratory of Electronics Quarterly Progress Report, 70, 275–282. Kecskes, I. (2000). Conceptual fluency and the use of situation-bound utterances. Links & Letters, 7, 145–161. Kemmer, S. & Barlow, M. (2000). Introduction: A usage-based conception of language. In M. Barlow & S. Kemmer (Eds.), Usage-based models of language. Chicago, IL: University of Chicago Press. Keshavarz, M. H. & Salimi, H. (2007). Collocational competence and cloze test performance: A study of Iranian EFL learners. International Journal of Applied Linguistics, 17, 81–92. Kirjavainen, M., Theakston, A., & Lieven, E. (2009). Can input explain children’s me-for-I errors? Journal of Child Language, 36, 1091–1114. 
Kjellmer, G. (1984). A dictionary of English collocations: Based on the Brown Corpus. Oxford: Clarendon Press. Koprowski, M. (2005). Investigating the usefulness of lexical phrases in contemporary coursebooks. ELT Journal, 4, 322–332. Kormos, J. & Safar, A. (2008). Phonological short-term memory, working memory and foreign language performance in intensive language learning. Bilingualism: Language and Cognition, 11, 261–271. Kramsch, C. (2008). Ecological perspectives on foreign language education. Language Teaching, 41(3), 389–408. Krashen, S. & Scarcella, R. (1978). On routines and patterns in language acquisition and performance. Language Learning, 28(2), 283–300. Kress, G. (1994). Learning to write. London: Routledge. Kuiper, K. (1996). Smooth talkers: The linguistic performance of auctioneers and sportscasters. Mahwah, NJ: Lawrence Erlbaum. Kuiper, K. (2004). Formulaic performance in conventionalised varieties of speech. In N. Schmitt (Ed.), Formulaic sequences: Acquisition, processing and use (pp. 37–54). Amsterdam: John Benjamins.

Kuiper, K. & Haggo, D. (1985). The nature of ice hockey commentaries. In R. Barry & J. Acheson (Eds.), Regionalism and national identity: Multidisciplinary essays on Canada, Australia and New Zealand (pp. 167–175). Christchurch: Association for Canadian Studies in Australia and New Zealand. Kumaravadivelu, B. (2002). Beyond methods: Macrostrategies for language teaching. New Haven, CT: Yale University Press. Kunin, A. V. (1955). English-Russian phraseological dictionary (2nd ed., 1967; 3rd ed., 1984). Moscow: Russkii Yazik. Lakoff, G. & Johnson, M. (1980). Metaphors we live by. Chicago, IL: University of Chicago Press. Laufer, B. (2011). The contribution of dictionary use to the production and retention of collocations in a second language. International Journal of Lexicography, 24, 29–49. Laufer, B. & Girsai, N. (2008). Form-focused instruction in second language vocabulary learning: A case for contrastive analysis and translation. Applied Linguistics, 29, 694–716. Laufer, B. & Roitblat-Rozovski, B. (2011). Incidental vocabulary acquisition: The effects of task type, word occurrence and their combination. Language Teaching Research, 15, 391–411. Laufer, B. & Waldman, T. (2011). Verb-noun collocations in second language writing: A corpus analysis of learners’ English. Language Learning, 61, 647–672. Leki, I. (2006). The legacy of first-year composition. In P. K. Matsuda, C. Ortmeier-Hooper & X. You (Eds.), The politics of second language writing: In search of the promised land. West Lafayette, IN: Parlor Press. Lennon, P. (1984). Retelling a story in English. In H. W. Dechert, D. Möhle & M. Raupach (Eds.), Second language productions (pp. 50–68). Tübingen: Gunter Narr Verlag. Lennon, P. (1990a). The advanced learner at large in the L2 community: Developments in spoken performance. International Review of Applied Linguistics in Language Teaching, 28, 309–321. Lennon, P. (1990b). Investigating fluency in EFL: A quantitative approach. Language Learning, 40(3), 387–417. Levy, S. (2003). 
Lexical bundles in professional and student writing (Doctoral dissertation). Retrieved from CSA Linguistics and Language Behaviour Abstracts. (ISSN: 0419–4209). Levy, S. (2008). Lexical bundles in professional and student writing. Saarbrücken: VDM Verlag. Lewis, M. (1997). Pedagogical implications of the lexical approach. In J. Coady & T. Huckin (Eds.), Second language vocabulary acquisition (pp. 255–270). Cambridge: Cambridge University Press. Lewis, M. (2000). Materials and resources for teaching collocation. In M. Lewis (Ed.), Teaching collocations: Further developments in the lexical approach (pp. 186–204). Boston, MA: Heinle. Lewis, M. (2008). Implementing the lexical approach. London: Heinle. Li, J. & Schmitt, N. (2009). The acquisition of lexical phrases in academic writing: A longitudinal case study. Journal of Second Language Writing, 18(2), 85–102. Lieven, E., Salomo, D., & Tomasello, M. (2009). Two-year-old children’s production of multiword utterances: A usage-based analysis. Cognitive Linguistics, 20, 481–508.

Lin, P. (2010). The phonology of formulaic sequences: A review. In D. Wood (Ed.), Perspectives on formulaic language: Acquisition and communication (pp. 174–193). New York/London: Continuum. Lin, P. (2012). Sound evidence: The missing piece of the jigsaw in formulaic language research. Applied Linguistics, 33(3), 342–347. Lin, P. M. S. & Adolphs, S. (2009). Sound evidence: Phraseological units in spoken corpora. In A. Barfield and H. Gyllstad (Eds.), Collocating in another language: Multiple interpretations. Basingstoke, UK: Palgrave MacMillan. Lindstromberg, S. & Boers, F. (2008a). The mnemonic effect of noticing alliteration in lexical chunks. Applied Linguistics, 29, 200–222. Lindstromberg, S. & Boers, F. (2008b). Phonemic repetition and the learning of lexical chunks: The mnemonic power of assonance. System, 36, 423–436. Liu, D. (2008). Idioms: Description, comprehension, acquisition, and pedagogy. New York/London: Routledge. Liu, D. (2011). The most frequently used English phrasal verbs in American and British English: A multicorpus examination. TESOL Quarterly, 45(4), 661–688. Liu, D. (2012). The most frequent multiword constructions in academic written English; A multi-corpus study. English for Specific Purposes, 31(1), 25–35. Llach, A. (2011). Lexical errors and accuracy in foreign language writing. Bristol: Multilingual Matters. Lord, A. (1960). The singer of tales. Cambridge, MA: Harvard University Press. Makkai, A. (1972). Idiom structure in English. The Hague: Mouton. Malinowski, B. (1935). Coral gardens and their magic: A study of the methods of tilling the soil and of agricultural rites in the Trobriand Islands. New York: Routledge. Manes, J. & Wolfson, N. (1981). The compliment formula. In F. Coulmas (Ed.), Conversational routine: Explorations in standardized communication situations and prepatterned speech (pp. 115–132). The Hague, the Netherlands: Mouton. Martin, K. I. & Ellis, N. C. (2012). 
The roles of phonological STM and working memory in L2 grammar and vocabulary learning. Studies in Second Language Acquisition, 34, 379–413. Martinez, R. & Murphy, V. A. (2011). Effect of frequency and idiomaticity on second language reading comprehension. TESOL Quarterly, 45, 267–290. Martinez, R. & Schmitt, N. (2012). A phrasal expressions list. Applied Linguistics, 33(3), 299–320. Matsuda, P. K., Ortmeier-Hooper, C., & You, X., (Eds.). (2006). The politics of second language writing. West Lafayette, Indiana: Parlor Press. McCarthy, M. & O’Dell, F. (2002). English idioms in use. Cambridge: Cambridge University Press. McCarthy, M. & O’Dell, F. (2004). English phrasal verbs in use intermediate. Cambridge: Cambridge University Press. McCarthy, M. & O’Dell, F. (2006). English collocations in use. Cambridge: Cambridge University Press. McCully, G. (1985). Writing quality, coherence, and cohesion. Research in the Teaching of English, 19, 269–282. McDonough, K. & Troﬁmovich, P. (2008). Using priming methods in second language research. London: Routledge. Meddings, L. & Thornbury, S. (2009). Teaching unplugged: Dogme in English language teaching. Peaslake UK: Delta.

Mel’čuk, I. (1988). Semantic description of lexical units in an explanatory combinatory dictionary: Basic principles and heuristic criteria. International Journal of Lexicography, 1(3), 165–188. Mel’čuk, I. (1998). Collocations and lexical functions. In A. P. Cowie (Ed.), Phraseology: Theory, analysis and applications (pp. 23–53). Oxford: Clarendon Press. Meunier, F. (2012). Formulaic language and language teaching. Annual Review of Applied Linguistics, 32, 111–129. Mitchell, T. F. (1971). Linguistic ‘goings-on’: Collocations and other lexical matters arising on the syntactic record. Archivum Linguisticum, 2 (new series), 35–69. Möhle, D. (1984). A comparison of the second language speech production of different native speakers. In H. W. Dechert, D. Möhle & M. Raupach (Eds.), Second language productions (pp. 26–49). Tübingen: Gunter Narr Verlag. Moon, R. (1997). Vocabulary connections: Multi-word items in English. In M. McCarthy (Ed.), Vocabulary: Description, acquisition and pedagogy (pp. 40–63). Cambridge: Cambridge University Press. Moon, R. (1998). Frequencies and forms of phrasal lexemes in English. In A. P. Cowie (Ed.), Phraseology: Theory, analysis and applications (pp. 79–100). Oxford: Clarendon Press. Myles, F. (2004). From data to theory: The over-representation of linguistic knowledge in SLA. Transactions of the Philological Society, 102, 139–168. Myles, F., Hooper, J., & Mitchell, R. (1998). Rote or rule? Exploring the role of formulaic language in classroom foreign language learning. Language Learning, 48(3), 323–363. Myles, F., Mitchell, R., & Hooper, J. (1999). Interrogative chunks in French L2: A basis for creative construction. Studies in Second Language Acquisition, 21, 49–80. Nassaji, H. (1999). 
Towards integrating form-focused instruction and communicative interaction in the second language classroom: Some pedagogical possibilities. Canadian Modern Language Review, 55(3), 386–404. Nation, I. S. P. (1990). Teaching and learning vocabulary. Boston, MA: Heinle & Heinle. Nation, P. (1989). Improving speaking fluency. System, 17(3), 377–384. Nattinger, J. R. & DeCarrico, J. S. (1992). Lexical phrases and language teaching. Oxford: Oxford University Press. Nesselhauf, N. (2004). What are collocations? In D. J. Allerton, N. Nesselhauf & P. Skandera (Eds.), Phraseological units: Basic concepts and their application (pp. 1–21). Basel: Schwabe. Nesselhauf, N. (2005). Collocations in a learner corpus. Philadelphia, PA: John Benjamins. O’Brien, I., Segalowitz, N., Collentine, J., & Freed, B. (2006). Phonological memory and lexical, narrative, and grammatical skills in second-language oral production by adult learners. Applied Psycholinguistics, 27, 377–402. O’Donnell, M. B., Römer, U., & Ellis, N. C. (2013). The development of formulaic sequences in first and second language writing: Investigating effects of frequency, association, and native norms. International Journal of Corpus Linguistics, 18(1), 83–108. Opie, I. & Opie, P. (1959). The lore and language of schoolchildren. Oxford: Oxford University Press.

Palmer, H. E. (1933). Second interim report on English collocations. Institute for Research in English Teaching. Palmer, H. E. (1938). A grammar of English words. London: Longmans Green. Paltridge, B. (2004). Academic writing. Language Teaching, 37(2), 87–105. Paquot, M. (2008). Exemplification in learning writing: A cross-linguistic perspective. In F. Meunier & S. Granger (Eds.), Phraseology in foreign language learning and teaching (pp. 101–119). Amsterdam: John Benjamins. Paré, A. (2009). What we know about writing, and why it matters. Compendium 2, 2(1), 1–12. Parry, M. (1928). L’Epithète traditionnelle dans Homère. Paris: Société Editrice Les Belles Lettres. Parry, M. (1930). Studies in the epic technique of oral verse-making. I: Homer and Homeric style. Harvard Studies in Classical Philology, 41, 73–147. Parry, M. (1932). Studies in the epic technique of oral verse-making. II: The Homeric language as the language of an oral poetry. Harvard Studies in Classical Philology, 43, 1–50. Pawley, A. (1986). Lexicalization. In D. Tannen & J. Alatis (Eds.), Georgetown round table in languages and linguistics: The interdependence of theory, data and applications (pp. 98–120). Washington, DC: Georgetown University. Pawley, A. (1991). How to talk cricket: On linguistic competence in a subject matter. In R. Blust (Ed.), Currents in Pacific linguistics: Papers on Austronesian languages and ethnolinguistics in honour of George Grace (pp. 339–368). Canberra: Pacific Linguistics. Pawley, A. (2007). Developments in the study of formulaic language since 1970. In P. Skandera (Ed.), Phraseology and culture in English (pp. 3–45). Berlin/New York: Mouton de Gruyter. Pawley, A. & Syder, F. H. (1983). Two puzzles for linguistic theory: Nativelike selection and nativelike fluency. In J. C. Richards & R. W. Schmidt (Eds.), Language and communication (pp. 191–226). New York, NY: Longman. Peters, A. M. (1977). Language learning strategies: Does the whole equal the sum of the parts? 
Language, 53(3), 560–573. Peters, A. M. (1983). Units of language acquisition. Cambridge: Cambridge University Press. Peters, E. (2012). Learning German formulaic sequences: The effect of two attention-drawing techniques. Language Learning Journal, 40, 65–79. Pinker, S. (1999). Words and rules: The ingredients of language. New York, NY: HarperCollins. Poos, D. & Simpson, R. (2002). Cross-disciplinary comparisons of hedging: Some findings from the Michigan corpus of academic spoken English. In R. Reppen, S. Fitzmaurice & B. Douglas (Eds.), Using corpora to explore linguistic variation (pp. 3–23). Amsterdam: John Benjamins. Raimes, A. (2002). The steps in planning a writing course and training teachers of writing. In J. C. Richards & W. A. Renandya (Eds.), Methodology in language teaching: An anthology of current practice (pp. 306–314). Cambridge: Cambridge University Press. Raupach, M. (1980). Temporal variables in first and second language speech production. In H. W. Dechert & M. Raupach (Eds.), Temporal variables in speech (pp. 263–270). The Hague: Mouton.

Rehbein, J. (1987). On fluency in second language speech production. In H. W. Dechert & M. Raupach (Eds.), Psycholinguistic models of language production (pp. 97–105). Norwood, NJ: Ablex. Reiter, R. M., Rainey, I., & Fulcher, G. (2005). A comparative study of certainty and conventional indirectness: Evidence from British English and Peninsular Spanish. Applied Linguistics, 26, 1–31. Ricard, E. (1986). Beyond fossilization: A course on strategies and techniques in pronunciation for advanced adult learners. TESL Canada Journal Special Edition, 1, 243–253. Riggenbach, H. (1991). Toward an understanding of fluency: A microanalysis of nonnative speaker conversations. Discourse Processes, 14, 423–441. Riggenbach, H. (1999). Discourse analysis in the language classroom. Ann Arbor, MI: University of Michigan Press. Robinson, P. (1995). Attention, memory and the “noticing” hypothesis. Language Learning, 45, 283–331. Roever, C. (2005). Testing ESL pragmatics: Development and validation of a web-based assessment battery. Berlin, Germany: Peter Lang. Rowland, C. F. & Pine, J. M. (2000). Subject-auxiliary inversion errors and wh-question acquisition: “What children do know!” Journal of Child Language, 27, 157–181. Rumsey, A. (2001). Tom yaya kange: A metrical narrative genre from the New Guinea Highlands. Journal of Linguistic Anthropology, 11(2), 193–239. Sadoski, M. (2005). A dual coding view of vocabulary learning. Reading & Writing Quarterly, 21, 221–238. Salazar, D. & Verdaguer, I. (2009). Polysemous verbs and modality in native and non-native argumentative writing: A corpus-based study. International Journal of English Studies, 9, 209–219. Schauer, G. A. & Adolphs, S. (2006). Expressions of gratitude in corpus and DCT data: Vocabulary, formulaic sequences, and pedagogy. System, 34, 119–134. Schloff, L. & Yudkin, M. (1991). Smart speaking: Sixty-second strategies. New York, NY: Henry Holt and Company. Schmidt, R. (1992). 
Psycholinguistic mechanisms underlying second language fluency. Studies in Second Language Acquisition, 14, 357–385. Schmidt, R. W. (1983). Interaction, acculturation, and the acquisition of communicative competence: A case study of an adult. In N. Wolfson & E. Judd (Eds.), Sociolinguistics and language acquisition (pp. 137–174). Rowley, MA: Newbury House. Schmitt, N. (2004). Formulaic sequences: Acquisition, processing and use. Philadelphia, PA: John Benjamins. Schmitt, N. (2010). Researching vocabulary: A vocabulary research manual. London: Palgrave Macmillan. Schmitt, N., Grandage, S. & Adolphs, S. (2004). Are corpus-derived recurrent clusters psycholinguistically valid? In N. Schmitt (Ed.), Formulaic sequences: Acquisition, processing, and use (pp. 127–151). Philadelphia, PA: John Benjamins. Scott, M. (2007). Oxford Wordsmith Tools: Version 5.0. Released June 2007 from http://www.lexically.net. Searle, J. (1968). Speech acts: An essay on the philosophy of language. Cambridge: Cambridge University Press.

Sharifian, F. (2008). Cultural schemas in L1 and L2 compliment responses: A study of Persian-speaking learners of English. Journal of Politeness Research: Language, Behavior, Culture, 4, 55–80. Shei, C. C. (2008). Discovering the hidden treasure on the internet: Using Google to uncover the veil of phraseology. Computer Assisted Language Learning, 21(1), 67–85. Sifianou, M. & Tzanne, A. (2010). Conceptualizations of politeness and impoliteness in Greek. Intercultural Pragmatics, 7, 661–687. Silva, T. (1993). Toward an understanding of the distinct nature of L2 writing: The ESL research and its implications. TESOL Quarterly, 27(4), 657–677. Simpson, R. (2004). Stylistic features of academic speech: The role of formulaic expressions. In T. Upton & U. Connor (Eds.), Discourse in the professions: Perspectives from corpus linguistics (pp. 37–64). Amsterdam: John Benjamins. Simpson-Vlach, R. & Ellis, N. (2010). An academic formulas list: New methods in phraseology research. Applied Linguistics, 31, 487–512. Sinclair, J. (1991). Corpus, concordance, collocation. Oxford: Oxford University Press. Sinclair, J. (2005). Corpus and text – basic principles. In M. Wynne (Ed.), Developing linguistic corpora: A guide to good practice (pp. 1–16). Oxford: Oxbow Books. Siyanova-Chanturia, A., Conklin, K., & Schmitt, N. (2011). Adding more fuel to the fire: An eye-tracking study of idiom processing by native and non-native speakers. Second Language Research, 27, 1–22. Siyanova-Chanturia, A., Conklin, K., & van Heuven, J. B. (2011). Seeing a phrase “time and again” matters: The role of phrasal frequency in the processing of multiword sequences. Journal of Experimental Psychology: Learning, Memory, and Cognition, 37, 776–784. Skandera, P. (2004). What are idioms? In D. J. Allerton, N. Nesselhauf & P. Skandera (Eds.), Phraseological units: Basic concepts and their applications (pp. 23–36). Basel: Schwabe AG. Smirnitsky, A. I. (1956). Lexicology of the English language. 
Moscow: Foreign Literature. Sosa, A. & MacFarlane, J. (2002). Evidence for frequency-based constituents in the mental lexicon: Collocations involving the word of. Brain and Language, 83, 227–236. Spears, R. A., Birner, B., & Kleinelder, S. (1994). NTC’s dictionary of everyday American English expressions. New York, NY: McGraw-Hill. Staehr, L. S. (2009). Vocabulary knowledge and advanced listening comprehension in English as a foreign language. Studies in Second Language Acquisition, 31, 577–607. Staples, S., Egbert, J., Biber, D., & McClair, A. (2013). Formulaic sequences and EAP writing development: Lexical bundles in the TOEFL iBT writing section. Journal of English for Academic Purposes, 12(3), 214–225. Steinel, M. P., Hulstijn, J. H., & Steinel, W. (2007). Second language idiom learning in a paired-associate paradigm: Effects of direction of learning, direction of testing, idiom imageability, and idiom transparency. Studies in Second Language Acquisition, 29, 449–484. Stengers, H., Boers, F., Housen, A., & Eyckmans, J. (2010). Does “chunking” foster chunk-uptake? In S. De Knop, F. Boers & A. De Rycker (Eds.), Fostering

language teaching efficiency through cognitive linguistics (pp. 99–117). Berlin, Germany: Mouton de Gruyter. Stubbs, M. (2002). Words and phrases: Corpus studies of lexical semantics. Oxford: Blackwell. Sugaya, N. & Shirai, Y. (2009). Can L2 learners productively use Japanese tense-aspect markers? A usage-based approach. In R. Corrigan, E. Moravcsik, H. Ouali & K. Wheatley (Eds.), Formulaic language: Vol. 2. Acquisition, loss, psychological reality, functional applications. Amsterdam, the Netherlands: John Benjamins. Swain, M. (1985). Communicative competence: Some roles of comprehensible input and comprehensible output in its development. In S. Gass & C. G. Madden (Eds.), Input in second language acquisition (pp. 235–253). New York, NY: Newbury House. Swain, M. (1995). Three functions of output in second language learning. In G. Cook & B. Seidlhofer (Eds.), Principles and practice in the study of language (pp. 125–144). Oxford: Oxford University Press. Swinney, D. & Cutler, A. (1979). The access and processing of idiomatic expressions. Journal of Verbal Learning and Verbal Behaviour, 18, 523–534. Taguchi, N. (2007). Chunk learning and the development of spoken discourse in a Japanese as a foreign language classroom. Language Teaching Research, 11, 433–457. Taguchi, N. (2011). The effect of L2 proficiency and study-abroad experience on pragmatic comprehension. Language Learning, 61, 1–36. ten Hacken, P. (2004). What are compounds? In D. J. Allerton, N. Nesselhauf & P. Skandera (Eds.), Phraseological units: Basic concepts and their applications (pp. 53–66). Basel: Schwabe AG. Terkourafi, M. (2002). Politeness and formulaicity: Evidence from Cypriot Greek. Journal of Greek Linguistics, 3, 179–201. Tomasello, M. (2003). Constructing a language: A usage-based theory of language acquisition. Cambridge, MA & London, UK: Harvard University Press. Towell, R. (1987). Variability and progress in the language development of advanced learners of a foreign language. In R. 
Ellis (Ed.), Second language acquisition in context (pp. 113–127). Toronto: Prentice-Hall. Traverso, V. (2006). Aspects of polite behaviour in French and Syrian service encounters: A data-based comparative study. Journal of Politeness Research: Language, Behavior, Culture, 2, 105–122. Tremblay, A. & Baayen, H. (2010). Holistic processing of regular four-word sequences: A behavioural and ERP study of the effects of structure, frequency, and probability on immediate free recall. In D. Wood (Ed.), Perspectives on formulaic language: Acquisition and communication (pp. 151–172). New York/London: Continuum. Tremblay, A., Derwing, B., Libben, G., & Westbury, C. (2011). Processing advantages of lexical bundles: Evidence from self-paced reading and sentence recall tasks. Language Learning, 61(2), 569–613. Tucker, G. (2005). Extending the lexicogrammar: Towards a more comprehensive account of extraclausal, partially clausal, and non-clausal expressions in spoken discourse. Language Sciences, 27, 679–709. Underwood, G., Schmitt, N., & Galpin, A. (2004). The eyes have it: An eye-movement study into the processing of formulaic sequences. In N. Schmitt

(Ed.), Formulaic sequences: Acquisition, processing, and use (pp. 153–172). Philadelphia, PA: John Benjamins. Van Lancker, D. & Kempler, D. (1987). Comprehension of familiar phrases by left- but not by right-hemisphere damaged patients. Brain and Language, 32, 265–277. Van Lancker, D., Canter, G., & Terbeek, D. (1981). Disambiguation of ditropic sentences: Acoustic and phonetic cues. Journal of Speech and Hearing Research, 24, 330–335. Van Lancker-Sidtis, D. (2003). Auditory recognition of idioms by first and second language speakers of English. Applied Psycholinguistics, 24, 45–57. Van Lancker-Sidtis, D. & Postman, W. A. (2006). Formulaic expressions in spontaneous speech of left- and right-hemisphere damaged subjects. Aphasiology, 20, 411–426. Van Lier, L. (2004). The ecology and semiotics of language learning: A sociocultural perspective. Dordrecht: Kluwer Academic. Vinogradov, V. V. (1947). Ob osnovnuikh tipakh frazeologicheskikh edinits v russkom yazike. [About the basic types of phraseological units in Russian]. In A. A. Shakhmatov (Ed.), Sbornik statei i materialov [The collection of articles and materials] (pp. 339–364). Moscow: Nauka. Vinogradov, V. V. (1977). Ob osnovnuikh tipakh frazeologicheskikh edinits v russkom yazike. [About the basic types of phraseological units in Russian]. In V. V. Vinogradov (Ed.), Izbrannie trudi. Leksikologia i leksikografia [Selected works. Lexicology and lexicography] (pp. 140–161). Moscow: Nauka. Virtanen, T. (1998). Direct questions in argumentative student writing. In S. Granger (Ed.), Learner English on computer (pp. 94–106). New York, NY: Longman. Wajnryb, R. (1990). Grammar dictation. Oxford: Oxford University Press. Walker, I. & Utsumi, T. (2006). Memorizing dialogues: The case for “performative exercises.” In W. M. Chan, K. N. Chin & T. Suthiwan (Eds.), Foreign language teaching in Asia and beyond: Current perspectives and future directions (pp. 243–269). Singapore: Centre for Language Studies. 
Webb, S., Newton, J., & Chang, A. C. S. (2013). Incidental learning of collocation. Language Learning, 63(1), 91–120. Weinert, R. (1995). The role of formulaic language in second language acquisition: A review. Applied Linguistics, 16(2), 180–205. Weinreich, U. (1969). Problems in the analysis of idioms. In J. Puhvel (Ed.), Substance and structure of language (pp. 23–81). Berkeley: University of California Press. Wen, Z. (2011). Working memory and second language learning. Bristol, UK: Multilingual Matters. Williams, E. (1981). On the notions lexically related and head of a word. Linguistic Inquiry, 12, 245–274. Willis, D. (1990). The lexical syllabus: A new approach to language teaching. London: Harper Collins. Wong, M. L.-Y. (2010). Expressions of gratitude by Hong Kong speakers of English: Research from the International Corpus of English in Hong Kong (ICE-HK). Journal of Pragmatics, 42, 1243–1257. Wong-Fillmore, L. (1976). The second time around: Cognitive and social strategies in second language acquisition. Unpublished doctoral dissertation, Stanford University.

Wood, D. (1998). Making the grade: An interactive course in English for academic purposes. Toronto: Prentice Hall Allyn and Bacon.
Wood, D. (2001). In search of fluency: What is it and how can we teach it? Canadian Modern Language Review, 57(4), 573–589.
Wood, D. (2002). Formulaic language in acquisition and production: Implications for teaching. TESL Canada Journal, 20(1), 1–15.
Wood, D. (2006). Uses and functions of formulaic sequences in second language speech: An exploration of the foundations of fluency. Canadian Modern Language Review, 63(1), 13–33.
Wood, D. (2009a). Preparing ESP learners for workplace placement. ELT Journal, 63(4), 323–331.
Wood, D. (2009b). Effects of focused instruction of formulaic sequences on fluent expression in second language narratives: A case study. Canadian Journal of Applied Linguistics, 12(1), 39–57.
Wood, D. (2010a). Formulaic language and second language speech fluency: Background, evidence, and classroom applications. London/New York: Continuum.
Wood, D. (2010b). Lexical clusters in an EAP textbook corpus. In D. Wood (Ed.), Perspectives on formulaic language: Acquisition and communication (pp. 8–106). New York/London: Continuum.
Wood, D. & Appel, R. (2013). Lexical bundles in first year university business and engineering textbooks: A resource for EAP. In H. M. McGarrell & D. Wood (Eds.), Special research symposium issue of CONTACT. Refereed Proceedings of TESL Ontario Research Symposium, October 2012. Vol. 39, No. 2 (pp. 92–102).
Wood, D. C. & Appel, R. (2014). Multiword constructions in first year university textbooks and in EAP textbooks. Journal of English for Academic Purposes, 15, 1–13.
Wood, D. & Namba, K. (2013). Focused instruction of formulaic language: Use and awareness in a Japanese university class. The Asian Conference on Language Learning Official Conference Proceedings 2013, pp. 203–212.
Wood, M. M. (1981). A definition of idiom. Bloomington: University of Indiana Linguistics Club.
Wray, A. (1999). Formulaic sequences in second language teaching: Principles and practice. Applied Linguistics, 21(4), 463–489.
Wray, A. & Fitzpatrick, T. (2008). Why can’t you just leave it alone? Deviations from memorized language as a gauge of nativelike competence. In F. Meunier & S. Granger (Eds.), Phraseology in language learning and teaching (pp. 123–148). Amsterdam: John Benjamins.
Wray, A. & Perkins, M. R. (2000). The functions of formulaic language: An integrated model. Language and Communication, 20, 1–28.
Wray, A. (2002). Formulaic language and the lexicon. Cambridge: Cambridge University Press.
Wray, A. (2004). “Here’s one I prepared earlier”: Formulaic language learning on television. In N. Schmitt (Ed.), Formulaic sequences: Acquisition, processing and use (pp. 249–268). Amsterdam/Philadelphia, PA: John Benjamins.
Wray, A. (2008). Formulaic language: Pushing the boundaries. Oxford: Oxford University Press.

Wray, A. & Namba, K. (2003). Use of formulaic language by a Japanese-English bilingual child: A practical approach to data analysis. Japanese Journal for Multilingualism and Multiculturalism, 9(1), 24–51.
Wu, S., Witten, I. H., & Franken, M. (2010). Utilizing lexical data from a web-derived corpus to expand productive collocation knowledge. ReCALL, 22, 83–102.
Yeung, L. (2009). Use and misuse of ‘besides’: A corpus study comparing native speakers’ and learners’ English. System, 37, 330–342.
Yorio, C. (1980). Conventionalized language forms and the development of communicative competence. TESOL Quarterly, 16(4), 433–442.
Yorio, C. (1989). Idiomaticity as an indicator of second language proficiency. In K. Hyltenstam & L. K. Obler (Eds.), Bilingualism across the lifespan: Aspects of acquisition, maturity, and loss (pp. 55–71). Cambridge: Cambridge University Press.
Zhu, W. (2006). Understanding context for writing in university content classrooms. In P. K. Matsuda, C. Ortmeier-Hooper & X. You (Eds.), The politics of second language writing: In search of the promised land (pp. 129–146). West Lafayette, IN: Parlor Press.

Index

Academic Formulas List (AFL) 82, 110–12, 123 academic textbook language 131–2 Academic Word List (AWL) 82, 114 academic writing 16, 45, 103–8, 110, 117–19, 124, 131, 134–7, 164–5, 172 beneficial effects, lexical bundles 134–5 corpus-focused studies 108–17 historical perspectives 105–6 learner corpora 106–8 lists of formulaic sequences 108–17 nature 102–5 acquisition theory adult language 74–5 associative models 76 child language 67–8 developmental sequence 78–9 flooding the input 142, 144 Focus on Form (FonF) pedagogy 147 formulaic language, classroom lessons 69–70, 166 lexical bundles 124 power of memory in language 76, 146 pragmatic competence 95, 139 second language 25, 57–8, 65–6, 157, 162–3, 168 speed of speech 87 thematic context 145 Adel, A. 107 Adolphs, S. 21, 63, 92, 96 Al Hassan, L. 108 Allerton, D. J. 2, 43 Ambridge, B. 74 Amosova, N. N. 7, 39 Anderson, J. 56

Annual Review of Applied Linguistics 35 anthropologists 4–6 Appel, R. 21, 46, 107, 112–16, 124, 132–3, 136, 145, 165 Arnon, I. 62, 77, 101 Ashby, M. 92 Austin, J. L. 6 Baayen, H. 22, 62 Bacha, N. N. 102 Bachman, L. F. 93, 95 Baddeley, A. D. 56 Bahns, J. 70 Baker, P. 108, 124, 133, 135 Bannard, C. 72–3, 77 Barbieri, F. 106, 122, 133, 135 Bardovi-Harlig, K. 93–6 Barlow, M. 169 Barron, A. 95–6 Bassetti, B. 102 Bastos, M.-T. 96 Basturkmen, H. 147 Bauer, L. 42 Becker, J. D. 44 Ben Rejeb, R. 172 Benson, E. 144 Beréndi, M. 143 Biber, D. 16, 21, 45, 82, 106–7, 114, 122, 124, 126, 131–5 bilingual dictionary 142 Birner, B. 144 Bod, R. 62, 64, 77, 101 Boers, F. 81, 107, 137, 139–44, 146, 148, 171 Bolander, M. 15, 75 British National Corpus (BNC) 32–3, 46, 84–5, 109–10, 160 Broca, P. 6 Burmeister, H. 70

Butler, C. S. 10 Bybee, J. 29, 92 Byrd, P. 104, 114–16, 119, 134 Cadierno, T. 78 Cameron-Faulkner, T. 72 Canadian Academic English Language Assessment (CAEL) 107 Canter, G. 61 Chafe, W. L. 7 Chan, T.-P. 142 Chang, A. C. S. 142 Chen, L. 131, 144 Chen, Y. 108, 124, 133, 135 Chiu, C.-Y. 140 Cieslicka, A. 61 COBUILD 49, 143 Collentine, J. 76 collocation anomalous 43 Firth’s definition 38 frequency-based 38–9 lexicography 40 phraseological approaches 39–40 taxonomy 29 two-word 11, 45 collocation researchers 4–5 Columbus, G. 140 complexity, phonetic 25–6, 31 Conklin, K. 22, 61–2, 77, 140 Connor, U. 102, 106 Conrad, S. 16, 21, 45, 107, 122 Cook, V. 102 Corpus of Contemporary American English (COCA) 22, 32–3, 109, 160 Cortes, V. 16, 21, 45, 122, 124, 131, 133, 135 Coulmas, F. 8, 25, 29–30, 89 Cowie, A. P. 2, 39–40, 104 Coxhead, A. 82, 104, 105, 114–16, 119, 134 criteria checklists Coulmas 25 frequency statistics 23–4 gradience of formulaicity 26–7 judgment procedure 28–30 Peters 25–6, 30–2

themes and patterns 32–3 Wood 27–8, 30–2 Wray and Namba 26–7, 30–2 Csábi, S. 143 Culpeper, J. 94 current research acquisition 162–3 categorization 161 focus of academic writing 172 grammar construction 169–70 history 159–60 identification 160 language teaching 166–7 lexical bundles 165–6 Mel’čuk’s Meaning-Text Theory 167–8 mental processing 161–2 semantics and priming, lexicals 170–1 spoken language 163–4 teaching models 171–2 usage-based theories 168–9 written language 164–5 Cutler, A. 60–1 Dai, Z. 140, 147 Davis, J. 95 Davis, P. 151 DeCarrico, J. S. 2, 10, 29–31, 44–5, 89, 143 Dechert, H. W. 78, 87 De Jong, N. 146, 151 Demecheleer, H. 140 Demecheleer, M. 143 De Pablos-Ortega, C. 96 Derwing, B. 62, 77 Deschamps, A. 87 Ding, Y. 140, 147 discourse analysis activities (class room) 153–6 discourse organizing bundles 127 Ellis, N. 15, 19, 21, 24, 32–3, 46, 57–8, 75–8, 82–3, 101, 104, 106, 109–12, 116, 119, 123, 129–31, 133–4, 136, 140, 147, 165 Ellis, R. 145 English as a Second Language (ESL) 59, 95, 142–4

English for Academic Purposes (EAP) 103, 105, 108, 111–12, 115, 123–4, 129, 131–3, 143–5 English for specific purposes (ESP) 131, 143–4, 156 epic sung poetry 5 Erman, B. 11, 81, 92, 107 Eskildsen, S. W. 78 Eyckmans, J. 140–1, 143 Ferris, D. 106 Finegan, E. 21, 107 first language children’s use 68–74 double role, formulaic language 71–4 pragmatic competence 70–1 vocabulary acquisition 67–8 Firth, J. R. 4–5, 38 Fitzpatrick, T. 146 fluency workshop automatization stage 152 case study 153 free-talk stage 152 input stage 152 production stage 152 Focus on form (FonF), teaching method 147 folklorists 5–6 formulaic language. See also current research; specific activities children’s use 14–15, 25–6 (See also criteria checklist, Peters) classification 10–11 comprehension 11–14 definition 2–4 identification criteria 9–10, 160 oral formulaic genres 8 speech production 11–14, 163–4 writing process 164–5 formulaic language, pedagogical principle feedback 147 practice 145–7 preparation 145 Formulaic Language Research Network (FLaRN) 2, 35 formulaic sequences automatization stage 152

benefits 134–5 categorization 161 Coulmas’ criteria 25 communication strategy 15 in corpora 23 corpus analysis 4 discourse analysis, class room 153–6 evidence 153 fillable slots 9 fluency workshop 151–2 free talk stage 153 identification 20, 22–4, 27–8, 32 input stage 23 language models 10 length analysis 132–4 multiword 14 native speaker usage 24 pragmatic function criteria 10 production stage 152 semantic and syntactic irregularities 11 speech fluency 6, 11 as vocabulary 147–9 Forsberg, F. 139 Franken, M. 142 Fraser, B. 7, 41 Freed, B. F. 76, 87–8 frequency statistics corpora 20–4, 31–2 criteria checklists 23–4 native speaker judgment 23–5 phonological characteristics 23 psycholinguistic measures 22–3 Freudenthal, D. 73 Frey, E. 77 Fulcher, G. 94 Galpin, A. 22, 61 Gatbonton, E. 146 Gatenby, E. V. 7, 40 Gibbs, R. 61 Girsai, N. 142 Gobet, F. 73 Goffman, E. 6 Goldberg, A. 64, 169–70 Goldman-Eisler, F. 6 grammar construction 169–70 grammarians 7 Grandage, S. 21, 63

Granger, S. 2, 38, 105–6 Grant, I. E. 42 Greenbaum, S. 5, 38 Gries, S. T. 21 Haggo, D. 8 Hakuta, K. 68–9 Hamp-Lyons, L. 45, 131 Handl, S. 105 Hasselgren, A. 78 Haywood, S. 103 Hickey, T. 68, 75, 91 Hill, J. 144 Hilpert, M. 76 Hockett, C. F. 40–1 Hoey, M. 170–1 holistic storage, sequence 14, 21–2, 29, 53, 58, 60–1, 63–6, 92, 104, 161–3 Hooper, J. 69–70 Hornby, A. S. 7, 40 Housen, A. 141 Hsu, J.-Y. 140 Hulstijn, J. H. 57, 143 Hyland, K. 45, 104–6, 110, 114, 123, 128, 131, 134–5 Hymes, D. 6 ideational functions, language participant-oriented 128 research-oriented 128 text-oriented 128 idioms categorization 42–4 defining criteria 43–4 definition 40–1 Fraser’s definition 41 Hockett’s definition 40–1 identification 41 Moon’s definition 40, 42 morphemes in 41 polymorphemes in 41 transformational-generative grammar 41 Wood’s definition 42 Ilson, R. 144 Implementing the Lexical Approach (Lewis) 144 irreversible binomials 42

Jalkanen, I. 77 Jespersen, O. 7 Jiang, N. 140 Johansson, S. 21, 45, 107 Johnson, M. 148 Jones, J. 16, 45, 131 Jones, M. 103 Jones, S. 39 Kappel, J. 140 Katz, J. J. 41 Kecskes, I. 95 Kemmer, S. 169 Kempler, D. 63 Keshavarz, M. H. 140 Kirjavainen, M. 73 Kjellmer, G. 5, 38–9 Kleinedler, S. 144 Koprowski, M. 144 Kormos, J. 76 Kramsch, C. 172 Krashen, S. 8 Kress, G. 102 Kuiper, K. 8, 62 Kumaravadivelu, B. 172 Kövecses, Z. 143 labels collocation 37–8 concgrams 37, 49–50 lexical bundles 21 lexical phrases 10 n-grams 37 terminologies 35–7 Lakoff, G. 148 Laufer, B. 139, 142 learning psychologists 6–7 Leech, G. 21, 45, 107 Leki, I. 102 Lennon, P. 87–8 Levy, S. 106, 124 Lewis, M. 103, 143–4, 171 lexical bundles academic discipline 15–16 acquisition 124 class fragments 125–6 components 121–4 corpus analysis 21, 45–6, 165–6 frequency-based method 21, 62

functional characteristics 126–8 noun-phrase and prepositional fragments 126 research findings 131–5 structural characteristics 124–6 themes and patterns of research 136 verb phrase fragments 125 lexical phrases 10, 29, 31, 37, 44, 65, 89, 143, 162 lexical priming theory 171–2 lexical semantics 167, 170–1 lexicographers 4, 8, 40 lexicographists 40 lexicography 40 Li, J. 105 Libben, G. 62, 77 Lieven, E. 72–3 Lin, P. M. S. 23, 91–2 Lindstromberg, S. 139, 142, 144, 146, 148, 171 Liou, H.-C. 142 Liu, D. 21, 41, 42, 43, 46, 49, 82, 109, 110, 112, 116, 119, 133, 136, 147, 165 Llach, A. 102 L1 learners acquisition contexts 15 class boundaries 87–8 pragmatic goals 95 translated formulas 96 L2 learners natural learning environments 15 pragmatic competence 95–6 proficiency levels 107 purposes of teaching 46, 140 sequence of activities 149 speech production 87–8 syntactic rules 78 use of lexical bundles 124, 134 writing abilities 147 Loewen, S. 147 Lord, A. 5

MacFarlane, J. 62 Makkai, A. 41, 42 Malinowski, B. 6 Manes, J. 94 Martin, K. I. 76 Martinez, R. 85, 86, 140 Matthews, D. 72, 77 Maynard, C. 77 McCarthy, M. 143, 171 McClair, A. 107 McDonough, K. 77 Meaning-Text Theory, Mel’čuk’s 39, 167–8 Meddings, L. 172 Mel’čuk, I. 7, 39, 40, 167, 168 mental processing brain-damaged individuals 63 concepts of cognition 64–5 declarative knowledge 54–5 heteromorphic lexicon 59–60 idiom 60–2 long-term memory 56–7 nonidiomatics 62–3 other types of research 63–4 procedural knowledge 54–5 real-life language use 161–2 second language acquisition theory 57–8 short-term memory 56–7 spontaneous language 55–6 storage and retrieval, formulaic sequence 58–9 themes and patterns of research 65–6 Meunier, F. 2 Mitchell, R. 69, 70 Möhle, D. 87 monolingual dictionary 142 Moon, R. 9, 40, 42 Mouton de Gruyter 2 multiword constructions (MWC) 109–10, 112–14, 132–3 Murphy, V. A. 140 mutual expectancy 44, 89 mutual information (MI) statistics 21–2, 32–3, 77, 82, 110, 123, 140, 160 Myles, F. 69, 70, 78

Namba, K. 10, 22, 25, 26, 27, 30, 31, 77, 146 Nassaji, H. 147 Nation, I. S. P. 102 Nation, P. 145, 146, 150, 152

native speaker clause-chaining fluency 12 intuition 20, 22–4, 31 judgement 23–5, 27–8, 32 (See also criteria checklist, Wood) Nattinger, J. R. 2, 10, 29, 30, 31, 44, 45, 89, 143 Nekrasova, T. M. 140 Nesselhauf, N. 2, 5, 38, 39 neurologists 4, 6 Newton, J. 142 noncompositionality 39, 42, 44 nonlexical-bundles 129–31 O’Brien, I. 76 O’Dell, F. 143, 171 O’Donnell, M. B. 24 Opie, I. 6 Opie, P. 6 Palmer, H. E. 7, 40 Paltridge, B. 102 Paquot, M. 105 Paré, A. 102 Parry, M. 5 Pawley, A. 5, 8, 9, 10, 12, 13, 14, 81 Perfetti, C. A. 146, 151 Perkins, M. R. 11, 14, 29, 30, 35, 37, 89 Peters, A. M. 14, 15, 25, 30, 31, 68, 69, 74, 75, 89, 91 Peters, E. 141 philosophers 6 phonological short-term memory (PSTM) 76 phrasal compounds 42 phrasal verbs 9–10, 37, 42, 48–9, 81, 109, 143 Pine, J. M. 73, 74 Pinker, S. 64 Poos, D. 106 Postal, P. 41 Postman, W. A. 63 pragmatics 3, 8, 81, 93–6 pre-task planning 145 Raimes, A. 102 Rainey, I. 94 Raupach, M. 87

Rayson, P. 106 referential bundles 127–8 Rehbein, J. 7 Reiter, R. M. 94 research history, formulaic language early research 4–7 lexical bundles 15–16 since 1970s 7 source of information 2 themes and patterns 16–17 use of strange items 3–4 word sequences, examples 3 Ricard, E. 150 Riggenbach, H. 87, 153, 154 Rinvolucri, M. 151 Robinson, P. 57 Roever, C. 95 Romer, U. 24 Rowland, C. F. 74 Rumsey, A. 8 Sadoski, M. 143 Safar, A. 76 Salazar, D. 106 Salomo, D. 72 Scarcella, R. 8 Schauer, G. A. 96 Schloff, L. 150 Schmidt, R. W. 15, 74, 75 Schmitt, N. 2, 21, 22, 61, 62, 63, 77, 85, 86, 105, 140 Scott, M. 121 Searle, J. 6 second language developmental sequence 78–9 formulaic knowledge 76–8 themes and patterns, research 79–80 theoretical models, acquisition 75–6 vocabulary acquisition 74–5, 162–3 Segalowitz, N. 76, 146 semantic opacity 43–4 Sharifian, F. 96 Shei, C. C. 22 Shirai, Y. 78 Silva, T. 102

Simpson-Vlach, R. 21, 32, 46, 77, 82, 83, 106, 109, 110, 111, 112, 116, 119, 123, 129, 130, 131, 133, 134, 135, 136, 165 Sinclair, J. 2, 5, 8, 38, 39, 81 Siyanova-Chanturia, A. 61, 62 Skandera, P. 2, 43 Snider, N. 62, 77, 101 sociologists 6 Sosa, A. 62 Spears, R. A. 144 specific activities chain dictations 151 chat circles 151 mingle jigsaw 150 productive skills 97 receptive skills 97 shadowing 150 student dictations 151 4/3/2 technique 145, 150–1 spoken language lists of formulaic sequence 82–6 phonological characteristics 91–2 pragmatics competence, teaching 93–7 speech production 163–4 themes and patterns, research 98 word fluency 82–91 Staehr, L. S. 140 stance bundles 126–7 Staples, S. 107 Steinel, M. P. 143 Steinel, W. 143 Stengers, H. 140, 141 Stoller, F. 16, 45, 131 Stubbs, M. 39, 171 Sugaya, N. 78 Swain, M. 145 Swinney, D. 60, 61 Syder, F. H. 8, 12, 13, 14, 81 Taguchi, N. 77, 96 Tapper, M. 106 task-based language teaching (TBLT) 144 teaching English for Academic Purposes (EAP) 115, 131–2

Focus on form (FonF) approaches 147 materials 143–5 models 171 pedagogical intervention 141–3, 145–7, 166–7 practical applications 143–5 pragmatic competence 96–7 strategies 139–41 teaching models 171–2 ten Hacken, P. 47 Terbeek, D. 61 Terkourafi, M. 94 Theakston, A. L. 73, 74 Thornbury, S. 172 Tomasello, M. 64, 72, 73, 74 tournure 42 Towell, R. 87, 88 Traverso, V. 95 Tremblay, A. 22, 62, 77, 101 Trofimovich, P. 77 Tucker, G. 10 Tzanne, A. 95 Underwood, G. 22, 61 usage-based models of language 66, 109, 162, 167–70 Utsumi, T. 147 Van Lancker-Sidtis, D. 61, 63 Verdaguer, I. 106 Vinogradov, V. V. 7, 39 Virtanen, T. 106 vocabulary formulas macro strategies 148 sequence integration 148–9 Vogel, T. 70 Wajnryb, R. 152 Wakefield, H. 7, 40 Waldman, T. 139 Walker, I. 147 Warren, B. 11, 81 Webb, S. 142 Weinert, R. 58 Weinreich, U. 41 Wen, Z. 76 Westbury, C. 62, 77 Williams, E. 47

Willis, D. 143 within-task planning 145 Witten, I. H. 142 Wolfson, N. 94 Wong, M. L.-Y. 94 Wong-Fillmore, L. 68, 69 Wood, D. C. 2, 7, 11, 21, 22, 25, 27, 28, 29, 30, 31, 33, 46, 77, 88, 89, 90, 91, 99, 107, 108, 112, 113, 114, 115, 116, 119, 124, 132, 133, 136, 140, 144, 145, 146, 150, 152, 153, 154, 165, 171 Wood, M. M. 42

Wray, A. 2, 9, 10, 11, 14, 23, 24, 25, 26, 27, 29, 30, 31, 35, 37, 53, 59, 89, 91, 146 Wu, S. 142 Yearbook of Phraseology, The (Europhras) 2 Yeung, L. 106 Yorio, C. 14, 74 Yuan, F. 145 Yudkin, M. 150 Zhu, W. 102