32nd Annual Clinical Aphasiology Conference: A Special Issue of Aphasiology [1 ed.] 1841699551, 9781841699554, 9780203493151


English Pages 145 Year 2003


Table of contents:
Book Cover
Title
CONTENTS
Preface
Right hemisphere syndrome is in the eye of the beholder
  METHOD
  ANALYSES AND RESULTS
  CONCLUSIONS AND CLINICAL IMPLICATIONS
  APPENDIX A
  APPENDIX B
A comparison of the relative effects of phonologic and semantic cueing treatments
  Participant
  Treatments
  RESULTS
  CONCLUSIONS
  REFERENCES
  APPENDIX A EXPERIMENTAL STIMULI
  Phonologic cueing treatment (PCT)
Measures of lexical diversity in aphasia
  Participants
  Language elicitation and transcription
  Relationships among D, NDW, and TTR
  Lexical diversity in adults with fluent vs nonfluent aphasia
  DISCUSSION
  REFERENCES
Limb apraxia, pantomine, and lexical gesture in aphasic speakers: Preliminary findings
  PURPOSE OF THE STUDY
  Procedure
  Data analysis
  RESULTS
  DISCUSSION
  REFERENCES
  List of conversation questions
Teaching self-cues: A treatment approach for verbal naming
  Participant
  Oral language and speech production
  Written language comprehension
  Previous and concurrent treatments
  Experimental stimuli
  Treatment application
  Baseline phase
  Reliability
  Treatment
  Maintenance
  Generalisation
  DISCUSSION
  REFERENCES
  APPENDIX B MODIFIED CUEING HIERARCHY
  APPENDIX D STIMULI FOR GENERALIZATION PROBE
Functional measures of naming in aphasia: Word retrieval in confrontation naming versus connected speech
  Tasks
  Data analyses
  Relationships among speaking contexts and grammatical class
  % Corrected Errors
  DISCUSSION
  Clinical utility: Feasibility and reliability
  Relationships among contexts
  Supplementary measures
  Conclusion
  REFERENCES
  APPENDIX %WR SCORING PROTOCOL
Narrative and conversational discourse of adults with closed head injuries and non-brain-injured adults: A discriminant analysis
  Participants
  Analyses of story narratives
  Analyses of conversation
  Story narrative measures
  Conversation measures
  Story narrative and conversation measures
  DISCUSSION
  REFERENCES
Relationship between discourse and Western Aphasia Battery performance in African Americans with aphasia
  Discourse tasks
  Analysis
  RESULTS
  DISCUSSION
  REFERENCES
  Coherence
  Emplotment
The inter-rater reliability of the story retell procedure
  Participants
  Procedures
  RESULTS
  REFERENCES

APHASIOLOGY Volume 17 Number 5 May 2003
32nd Clinical Aphasiology Conference

Editor: Patrick J. Doyle

CONTENTS

Preface. Patrick J. Doyle (p. viii)

Papers
Right hemisphere syndrome is in the eye of the beholder. Margaret Lehman Blake, Joseph R. Duffy, Connie A. Tompkins, and Penelope S. Myers (p. 1)
A comparison of the relative effects of phonologic and semantic cueing treatments. Julie L. Wambaugh (p. 14)
Measures of lexical diversity in aphasia. Heather Harris Wright, Stacy W. Silverman, and Marilyn Newhoff (p. 26)
Limb apraxia, pantomine, and lexical gesture in aphasic speakers: Preliminary findings. Miranda Rose and Jacinta Douglas (p. 39)
Teaching self-cues: A treatment approach for verbal naming. Gayle DeDe, Diane Parris, and Gloria Waters (p. 55)
Functional measures of naming in aphasia: Word retrieval in confrontation naming versus connected speech. Jamie F. Mayer and Laura L. Murray (p. 77)
Narrative and conversational discourse of adults with closed head injuries and non-brain-injured adults: A discriminant analysis. Carl A. Coelho, Kathleen M. Youse, Karen N. Le, and Richard Feinn (p. 99)
Relationship between discourse and Western Aphasia Battery performance in African Americans with aphasia. Hanna K. Ulatowska, Gloria Streit Olness, Robert T. Wertz, Agnes M. Samson, Molly W. Keebler, and Karen E. Goins (p. 114)
The inter-rater reliability of the story retell procedure. William D. Hula, Malcolm R. McNeil, Patrick J. Doyle, Hillel J. Rubinsky, and Tepanta R. D. Fossett (p. 129)

32nd Clinical Aphasiology Conference, Ridgedale, Missouri, May 31st to June 4th, 2002

Editor
Patrick J. Doyle, Ph.D., Geriatric Research Education & Clinical Center, VA Pittsburgh Healthcare System, and Department of Communication Science and Disorders, University of Pittsburgh, Pittsburgh, PA, USA.

Associate Editors
Kirrie J. Ballard, Ph.D., Department of Speech Pathology and Audiology, University of Iowa, Iowa City, IA, USA.
Annette Baumgaertner, Ph.D., Neurologische Universitaetsklinik, University of Hamburg, Hamburg, Germany.
Mary Boyle, Ph.D., Department of Communication Sciences and Disorders, Montclair State University, Upper Montclair, NJ, USA.
Carol Frattali, Ph.D., National Institutes of Health, Bethesda, MD, USA.
Michael E. Groher, Ph.D., Department of Communicative Disorders, University of Florida Health Science Center, Gainesville, FL, USA.
Katherine Odell, Ph.D., Department of Communication, University of Wisconsin, Madison, WI, USA.
Grace H. Park, Ph.D., National Institutes of Health, NIDCD, Language Section, Bethesda, MD, USA.
Anastasia Raymer, Ph.D., Child Study Center, Old Dominion University, Norfolk, VA, USA.
Linda Schuster, Ph.D., University of West Virginia, Morgantown, WV, USA.
Nina Simmons-Mackie, Ph.D., Department of Communication Science & Disorders, Southeastern Louisiana University, Hammond, LA, USA.
Evy Visch-Brink, Ph.D., Department of Neuropsychology, Erasmus University Rotterdam, Rotterdam, The Netherlands.
Julie Wambaugh, Ph.D., Department of Communication Disorders, University of Utah and VA Salt Lake City Healthcare System, Salt Lake City, UT, USA.

APHASIOLOGY

SUBSCRIPTION INFORMATION Subscription rates to Volume 17, 2003 (12 issues) are as follows: To individuals: UK £361.00; Rest of World $596.00. To institutions: UK £857.00; Rest of World $1414.00. A subscription to the print edition includes free access for any number of concurrent users across a local area network to the online edition, ISSN 1464-5041. Print subscriptions are also available to individual members of the British Aphasiology Society (BAS), on application to the Society. For a complete and up-to-date guide to Taylor & Francis Group’s journals and books publishing programmes, visit the Taylor and Francis website: http://www.tandf.co.uk/ Aphasiology (USPS permit number 001413) is published monthly. The 2003 US Institutional subscription price is $1414.00. Periodicals postage paid at Champlain, NY, by US Mail Agent IMS of New York, 100 Walnut Street, Champlain, NY. US Postmaster: Please send address changes to pAPH, PO Box 1518, Champlain, NY 12919, USA. Dollar rates apply to subscribers in all countries except the UK and the Republic of Ireland, where the pound sterling price applies. All subscriptions are payable in advance and all rates include postage. Journals are sent by air to the USA, Canada, Mexico, India, Japan and Australasia. Subscriptions are entered on an annual basis, i.e. from January to December. Payment may be made by sterling cheque, dollar cheque, international money order, National Giro, or credit card (AMEX, VISA, Mastercard).


Orders originating in the following territories should be sent direct to the local distributor. India: Universal Subscription Agency Pvt. Ltd, 101–102 Community Centre, Malviya Nagar Extn, Post Bag No. 8, Saket, New Delhi 110017. Japan: Kinokuniya Company Ltd, Journal Department, PO Box 55, Chitose, Tokyo 156. USA, Canada and Mexico: Psychology Press, a member of the Taylor & Francis Group, 325 Chestnut St, Philadelphia, PA 19106, USA. UK and other territories: Taylor & Francis Ltd, Rankine Road, Basingstoke, Hampshire RG24 8PR.

The print edition of this journal is typeset by DP Photosetting, Aylesbury and printed by Hobbs the Printer, Totton, Hants. The online edition of this journal is hosted by Metapress at journalsonline.tandf.co.uk

Copyright © 2003 Psychology Press Limited. All rights reserved. No part of this publication may be reproduced, stored, transmitted or disseminated, in any form, or by any means, without prior written permission from Psychology Press Ltd, to whom all requests to reproduce copyright material should be directed, in writing. Psychology Press Ltd grants authorization for individuals to photocopy copyright material for private research use, on the sole basis that requests for such use are referred directly to the requestor’s local Reproduction Rights Organization (RRO). In order to contact your local RRO, please contact: International Federation of Reproduction Rights Organisations (IFRRO), rue de Prince Royal, 87, B–1050 Brussels, Belgium; Copyright Clearance Centre Inc., 222 Rosewood Drive, Danvers, MA 01923, USA; Copyright Licensing Agency, 90 Tottenham Court Road, London, W1P 0LP, UK. This authorization does not extend to any other kind of copying, by any means, in any form, and for any purpose other than private research use.

The Editors of Aphasiology and the Guest Editors of the CAC issues are grateful to the following people for reviewing papers for the CAC issues during 2002:

Larry Boles, Ph.D.; Caterina Breitenstein, Ph.D.; Ruby Drew, Ph.D.; Donald Freed, Ph.D.; Margaret Greenwald, Ph.D.; Katrina Haley, Ph.D.; Brooke Hallowell, Ph.D.; Jackie Hinckley, Ph.D.; Stefan Kemeny, M.D.; Margaret Lemme, Ph.D.; Jamie Mayer, Ph.D.; Robert M. Miller, Ph.D.; Charlotte Mitchum, M.S.; Penelope Myers, Ph.D.; Mary Oeschlager, Ph.D.; Philippe Paquier, Ph.D.; Janet Patterson, Ph.D.; Scott Rubin, Ph.D.; Barry Slansky, Ph.D.

Preface

The papers that appear in this special issue of Aphasiology were selected on the basis of their theoretical importance, clinical relevance, and scientific merit from among the many platform and poster presentations at the 32nd Annual Clinical Aphasiology Conference, held in Ridgedale, Missouri, in June 2002. Each paper was peer-reviewed by the Editorial Consultants and Associate Editors acknowledged herein, consistent with the standards of Aphasiology and the rigorous merit review expected of this indexed, archival journal.

Patrick J. Doyle, Ph.D.
VA Pittsburgh Healthcare System
Pittsburgh, PA, USA

© 2003 Psychology Press Ltd

http://www.tandf.co.uk/journals/pp/02687038.html DOI: 10.1080/02687030344000210

Right hemisphere syndrome is in the eye of the beholder

Margaret Lehman Blake, Syracuse University, NY, USA
Joseph R. Duffy, Mayo Clinic, MN, USA
Connie A. Tompkins, University of Pittsburgh, PA, USA
Penelope S. Myers, Mayo Clinic, MN, USA

Background: Specific information about the prevalence and patterns of deficits associated with right hemisphere brain damage (RHD) is incomplete. A recent large-scale study of inpatients in a United States rehabilitation centre (Lehman Blake, Duffy, Myers & Tompkins, 2002) provided initial estimates of deficit prevalence and co-occurrence. The data obtained were based on information from multiple medical disciplines, and may not adequately reflect the typical caseload seen by US speech-language pathologists (SLPs). Differences in how professionals view RHD may influence whether patients are appropriately referred for services.

Aims: The first aim was to evaluate whether prevalence and patterns of deficits differ when diagnoses are made by SLPs versus other disciplines. The second aim was to examine whether the presence of certain deficits is associated with referrals to SLP.

Methods and Procedures: A retrospective chart review was conducted examining medical records for 122 adults with RHD in an inpatient rehabilitation unit. Diagnoses made by speech-language pathology were compared with those made by a group of other medical professionals, including neurology/physiatry, neuropsychology, and occupational therapy. Frequencies and cluster analyses were computed for both groups of diagnosticians to examine differences between groups. Relationships between performance on a screening measure of mental status and cognitive/communicative diagnoses were examined to determine whether there were obvious connections between specific disorders and referrals to SLP.

Outcomes and Results: Diagnoses of pragmatic and communicative deficits were made more often by SLPs, while the other professionals more often diagnosed deficits in attention, visuoperception, and learning/memory. Moderate-to-strong correlations between the two groups' diagnoses were obtained only for deficits of attention, linguistics, and neglect. Referral to SLP was not related to performance on a general mental status screening test. Patients who presented with neglect, aprosodia, or deficits in interpersonal interactions were more likely to be referred to SLP than patients without these deficits.

Conclusions: This study raises the question of how to ensure appropriate referrals to SLP when referring professionals may not always identify the communicative disorders exhibited by individuals with RHD. A descriptive definition of right hemisphere syndrome and a consistent set of terminology would facilitate communication about right hemisphere deficits within and across disciplines. A broader scope of referrals to SLP would increase the number of patients who receive appropriate care for their cognitive and communicative deficits.

Address correspondence to: Margaret Lehman Blake PhD, University of Houston, Communication Disorders, 100 Clinical Research Center, Houston, TX 77204–6018, USA. Margaret Lehman Blake is now at the University of Houston, TX, USA. The data for this project were collected while the first author was a post-doctoral fellow in Speech Pathology at the Mayo Clinic in Rochester, Minnesota, under the direction of the second author. Thanks to Jacque Danielson for her assistance in retrieving the medical records.

DOI: 10.1080/02687030344000120

Despite the well-known conventional descriptions of deficits associated with right hemisphere brain damage (RHD), limited data are available regarding specific deficits and patterns of deficits caused by RHD (e.g., Joanette & Goulet, 1994; Myers, 1999; Tompkins, 1995). A previous study (Lehman Blake et al., 2002) evaluated the prevalence and patterns of co-occurrence of cognitive/communicative deficits in a large retrospective sample. This was the first large-scale exploration of “right hemisphere syndrome” as seen in a US inpatient rehabilitation unit. Deficit categories were created to classify the large number of diagnostic labels used in the medical charts (see Appendix A). Results from that
study indicated that the most commonly diagnosed deficits were in attention, neglect, visuoperception, and learning/memory. Additionally, the deficit categories of calculation, hyperaffectivity, and linguistics were not closely related to any of the other deficits evaluated. The current study examined the same group of patients to explore some questions that remained unanswered after the initial study. Deficit categories analysed in the original study were based on diagnoses made by four disciplines combined: neurology/physiatry, neuropsychology, occupational therapy (OT), and speech-language pathology (SLP). Because diagnoses from all four professions were pooled, the picture of right hemisphere syndrome described in the previous study (Lehman Blake et al., 2002) may not reflect the typical caseload of RHD patients seen by a US speech-language pathologist: SLPs and other professionals may not recognise or identify the same deficits. Thus, the first aim was to evaluate whether prevalence and deficit patterns differ when diagnoses are made by SLPs versus other disciplines. Differences between disciplines may provide insight into how cognitive/communicative deficits are perceived by various medical professionals. The previous study also indicated that while 94% of cases exhibited at least one cognitive/communicative deficit, only 44% were referred for an SLP evaluation. Thus, the second aim was to examine which deficits are likely to lead to a referral to SLP.

METHOD

Inpatient medical records were reviewed for patients with RHD consecutively admitted to a US inpatient rehabilitation unit over a 3-year period. Diagnoses of RHD were made by neurologists, and for 88% of the cases CT or MRI scans confirmed the diagnosis. The initial list contained 246 cases. Seven of these were excluded because the patients did not release their medical records for research purposes.
Another 117 cases were excluded due to incomplete charts, lesions restricted to the cerebellum or brain stem, other neurological disease (e.g., dementia, Parkinson’s disease), psychiatric disorder other than depression, and/or bilateral cerebral lesions (see Lehman Blake et al., 2002, for complete details of the chart review). This left a total of 122 cases available for group analyses. Demographic and clinical data are provided in Table 1.

TABLE 1 Demographic and clinical information for cases with lesions restricted to the right hemisphere

Demographic and clinical variables | All RHD cases (n=122) | Cases referred to SLP (n=54)
Sex | 71 male, 51 female | 26 male, 28 female
Age (years): mean (SD); range | 68.6 (12.4); 12–95 | 68.6 (12.6); 15–94
Education (years): mean (SD); range | 12.0 (3.0); 7–20 | 12.0 (3.3); 7–20
Handedness | 87% right, 5% left, 1% ambidextrous [7% missing] | 87% right, 9% left, 2% ambidextrous [2% missing]
Reason for hospital admission | 86% RH stroke, 14% other medical condition* | 87% RH stroke, 13% other medical condition*
Presence of previous stroke | 81% no previous stroke, 19% prior RH stroke | 83% no previous stroke, 17% prior RH stroke
Days between onset and admission to rehabilitation unit: mean (SD); range | 13.9 (26.6); 0–240 | 13.7 (21.5); 1–120

RHD = right hemisphere brain damaged; SLP = speech-language pathology; RH = right hemisphere. * These represent patients who were admitted to the hospital for a medical condition other than CVA, who either had a CVA while hospitalised (e.g., as a complication), or who then received inpatient therapy for deficits resulting from a CVA that occurred prior to the current admission.

Information about the presence or absence of selected disorders and deficits was obtained from inpatient neurology/physiatry, neuropsychology, OT, and SLP reports. As detailed in Lehman Blake et al. (2002), the long list of diagnostic labels obtained from the medical charts was reduced to 14 deficit categories based on broad traditional classifications (e.g., linguistics, attention, learning, and memory) and other behavioural characteristics (e.g., hyporesponsive, hyperresponsive). Two of the authors independently classified the labels; initial agreement was 83%, and disagreements were resolved by discussion. Appendix A contains descriptions and examples of these categories. For each patient, a deficit category was considered present if one or more of the labels within the category was reported by any one discipline. Aprosodia and neglect were not merged into a deficit category but were analysed as distinct disorders (in effect, each formed a category of its own). In this paper, deficit categories are indicated by italics, while the separate deficits (aprosodia and neglect) are printed in regular font.
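The coding rule above (a category counts as present if any discipline reported any label belonging to it) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' procedure. The label-to-category map `LABEL_TO_CATEGORY` and the example patient are hypothetical; the study's actual mapping is described in Appendix A and is not reproduced here. The `percent_agreement` helper assumes the 83% inter-rater figure was a simple proportion of labels that both raters placed in the same category.

```python
# Hypothetical label-to-category map for illustration only; the study's
# actual mapping (Appendix A) is not reproduced here.
LABEL_TO_CATEGORY = {
    "left neglect": "neglect",
    "distractibility": "attention",
    "poor sustained attention": "attention",
    "impaired recall": "learning/memory",
    "flat affect": "hypoaffective",
    "monotone speech": "aprosodia",
}


def categories_present(reports):
    """Return the deficit categories coded as present for one patient.

    `reports` maps each discipline to the diagnostic labels it recorded.
    A category is present if ANY discipline reported ANY label in it.
    Labels missing from the map are ignored.
    """
    present = set()
    for labels in reports.values():
        for label in labels:
            category = LABEL_TO_CATEGORY.get(label)
            if category is not None:
                present.add(category)
    return present


def percent_agreement(rater_a, rater_b):
    """Percentage of labels that two raters assigned to the same category."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must classify the same non-empty label list")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return 100.0 * matches / len(rater_a)


# Hypothetical patient: each category is flagged by a single discipline.
patient = {
    "neurology/physiatry": ["impaired recall"],
    "occupational therapy": ["left neglect"],
    "speech-language pathology": ["distractibility", "unlisted label"],
}
print(sorted(categories_present(patient)))
# ['attention', 'learning/memory', 'neglect']
```

Because the rule takes the union across disciplines, a single report from any one profession is enough to mark a category present. This is why prevalence in the pooled data can exceed what any one discipline reports on its own.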


ANALYSES AND RESULTS

The subset of individuals evaluated by SLP (n=54) was used to compare diagnoses made by SLPs with those made by the other three disciplines. Results of frequency analyses, provided in Table 2, indicate that for both groups of diagnosticians the most commonly identified disorder was neglect. After neglect, SLPs most commonly diagnosed other cognitive deficits and hyporesponsivity, whereas the other disciplines most often reported deficits in attention, visuoperception, and learning/memory. Further examination of the results illustrates how the focus of a discipline affects diagnosis. Speech-language pathologists, focusing on communication, diagnosed deficits in interpersonal interactions in nearly 30% of patients, while other disciplines identified such pragmatic deficits in only 13% of those same patients. Aprosodia also was diagnosed twice as often by SLPs (26%) as by the other professionals (12%).

TABLE 2 Frequency of occurrence of deficits and deficit categories present in 54 patients, diagnosed by SLPs or other medical professionals

Deficit or deficit category | Prevalence diagnosed by neurology/physiatry, neuropsychology, and OT | Prevalence diagnosed by SLP only
neglect | 66.4% | 53.7%
attention | 63.9% | 35.2%
visuoperception | 58.2% | 27.8%
learning/memory | 58.2% | 24.1%
reasoning & problem solving | 56.6% | 37.0%
other cognitive deficits | 45.1% | 42.6%
orientation | 40.2% | 27.8%
awareness | 38.5% | 27.8%
hyperresponsive | 36.1% | 18.5%
hyporesponsive | 30.3% | 38.9%
calculation | 28.7% | 5.6%
hypoaffective | 24.6% | 18.5%
linguistic | 21.3% | 24.1%
hyperaffective | 15.6% | 7.4%
aprosodia | 12.3% | 25.9%
interpersonal interactions | 7.4% | 29.6%

OT = occupational therapy; SLP = speech-language pathology. Deficit categories are indicated by italics.

In order to examine differences in patterns of co-occurrence related to disciplines, hierarchical cluster analyses (SPSS, 1999) were performed. A cluster analysis is an exploratory tool that identifies related groups or “clusters” within a
body of data (Aldenderfer & Blashfield, 1984). The two categories that co-occur most often are linked to form a cluster, and linking continues until all categories fall into a specified number of clusters. For the current purposes, clusters were based on how often deficit categories co-occurred across the sample of cases. Six clusters were specified based on findings from the previous study (Lehman Blake et al., 2002). Analyses were conducted first on the data from SLP diagnoses, then on diagnoses from the other disciplines combined. As shown in Table 3, the affective deficits (hypoaffective and hyperaffective) each separated into a cluster of their own, whether diagnosed by SLPs or by other professionals. This result indicates that these deficit categories are relatively dissimilar to all others, regardless of who makes the diagnosis. No other obvious patterns were identified. Phi correlation coefficients also were computed to evaluate similarities between the diagnoses made by SLPs and by other disciplines. Based on Cohen’s rule of thumb for evaluating correlation coefficients (Cohen, 1988), moderate to high correlations were obtained for diagnoses of linguistics (phi = .70), attention (phi = .46), and neglect (phi = .42). Small correlations were obtained for all other deficits and deficit categories (phi = .15 to .29), with the exception of learning/memory (phi = .09). To address the second aim, identifying which patients with RHD are referred to SLP, chi-square cross-tabulation analyses (SPSS, 1999) were performed to evaluate the association between referral to SLP and presence of deficits.

TABLE 3 Results of cluster analyses for diagnoses by speech-language pathologists (a) and other medical professionals (b)

Cluster 1 | Cluster 2 | Cluster 3 | Cluster 4 | Cluster 5 | Cluster 6

(a) Clusters of deficit categories based on diagnoses by speech-language pathologists: visuopercep linguistic learning hyperaffective hypoaffective attention orientation hyperrespo tion awareness interperson hyporespons nsive al reasoning ive aprosodia other neglect cognitive calculation

(b) Clusters of deficit categories based on diagnoses by neurologists/physiatrists, neuropsychologists, and occupational therapists: calculation hyperrespo linguistic hyperaffective hypoaffective attention nsive awareness interperson learning/ al memory visuopercept aprosodia ion hyporespons ive

Cluster 3 (continued): orientation reasoning other cognitive neglect

First, the relationship between presence of dysarthria and SLP referral was examined to determine whether a majority of cases were referred because of that diagnosis. If most cases were referred to SLP for a motor speech disorder, it would be difficult to determine what cognitive or other communicative deficits might influence the referral process. The entire group of 122 cases was included in these analyses, using diagnoses of dysarthria from neurology, neuropsychology, and OT. The data indicated that only 50% of the patients diagnosed with dysarthria were referred for an SLP evaluation (phi = .09, p > .05). Given that presence of dysarthria did not compel an SLP referral, cross-tabulation procedures were conducted to examine the relationship between the presence/absence of cognitive/communicative deficits and referral to SLP. The results suggest that patients with deficits in visuoperception (phi = .18, p = .05), interpersonal interactions (phi = .19, p = .04), neglect (phi = .29, p = .001), or aprosodia (phi = .22, p = .02) were referred to SLP more often than patients without these deficits. Although significant, all of these correlations are small. A second analysis was conducted to further explore the second aim of the study. Data from The Short Test of Mental Status (Kokmen, Naessens, & Offord, 1987) were available for 63 of the RHD patients. This screening tool is similar to the Mini Mental State Examination (Folstein, Folstein, & McHugh, 1975), and provides information about cognitive abilities such as orientation, memory, language, attention, calculation, and visuoperception. A copy of the screening is provided in Appendix B. A series of one-tailed independent t-tests was conducted to compare Kokmen mental status scores for patients who were and were not referred for SLP evaluation. There was no difference in mean total scores for these two groups (t = 0.26, p > .05).
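The two statistics used in these analyses can be computed from summary data with the short Python sketch below. The first is the phi coefficient, computed from a 2×2 cross-tabulation of referral by presence of a deficit. The second is the independent-samples t statistic used for the Kokmen comparisons. For a 2×2 table, phi is the Pearson correlation between two binary variables, and the uncorrected Pearson chi-square equals N × phi². The `pooled_t` function assumes the classic pooled-variance (Student) form, because the paper does not state which t-test variant was used. The counts in the demonstration are hypothetical and are not study data. All function names are illustrative.

```python
import math


def phi_2x2(a, b, c, d):
    """Phi coefficient for a 2x2 table of counts laid out as

                     referred   not referred
        present         a            b
        absent          c            d
    """
    denom = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    if denom == 0:
        raise ValueError("a row or column total is zero")
    return (a * d - b * c) / denom


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square without continuity correction; equals N * phi**2."""
    n = a + b + c + d
    return n * phi_2x2(a, b, c, d) ** 2


def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Independent-samples t statistic with pooled variance (Student's t).

    Works from summary statistics and returns (t, degrees of freedom);
    a p-value would then be read from a t distribution with that df.
    """
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / se, df


if __name__ == "__main__":
    # Hypothetical counts, for illustration only (not study data).
    print(round(phi_2x2(20, 10, 10, 20), 3))  # 0.333
    print(round(chi_square_2x2(20, 10, 10, 20), 2))  # 6.67
    # Orientation row of Table 4: deficit absent vs present.
    t, df = pooled_t(7.7, 0.72, 40, 6.8, 1.7, 23)
    print(round(t, 2), df)  # 2.93 61
```

Applied to the rounded orientation figures from Table 4, the sketch gives t ≈ 2.93 with 61 degrees of freedom, close to the reported 2.98. The small gap is plausibly due to rounding of the published means and SDs.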
Examination of each subtest indicated that patients who scored low on orientation were more likely to be referred to SLP than those with higher scores (t = 1.8, p < .05). No group differences were found for any other subtest score. This suggests that neurologists did not base referrals to SLP solely on patients’ performance on this screening. The small range of possible points per subtest (3–8 points) also may have contributed to the nonsignificant results. Another explanation for the nonsignificant results is that this screening tool is not a valid measure of cognitive/communicative deficits. To address this possibility, a second series of one-tailed independent t-tests was conducted to evaluate the relationship between this assessment and the deficit categories. Means on the Kokmen subtests were compared in patients who did or did not present with deficits in a corresponding category, using diagnoses from all four
disciplines. As shown in Table 4, relationships were found between the orientation (t = 2.98), recall (t = 2.39), and maths (t = 1.74) subtests and their corresponding deficit categories (all p < .05). However, Kokmen scores on the attention, construction, and abstraction subtests did not differ between patients who were and were not diagnosed with deficits in the comparable categories (all p > .05). These results suggest that, for the more complex, multifaceted abilities, the Kokmen screening and the deficit categories do not clearly capture the same behavioural characteristics.

TABLE 4 t-test results of scores on the Short Test of Mental Status (Kokmen et al., 1987) for patients with each deficit category diagnosed as present or absent

Kokmen subtest–deficit category | n (absent / present) | Mean (SD), absent | Mean (SD), present | t | p
orientation–orientation* | 40 / 23 | 7.7 (.72) | 6.8 (1.7) | 2.98 | .002
attention–attention | 22 / 41 | 6.0 (1.5) | 6.1 (1.0) | –0.16 | .88
recall–learning/memory* | 22 / 41 | 2.6 (1.4) | 1.7 (1.4) | 2.39 | .010
abstraction–reasoning/problem solving | 23 / 40 | 2.5 (.90) | 2.1 (2.1) | 1.35 | .09
construction–visuoperception | 9 / 20 | 2.4 (1.4) | 2.1 (1.2) | 0.78 | .22
maths–calculation* | 42 / 21 | 2.5 (1.3) | 1.9 (1.6) | 1.74 | .04

Present = deficit diagnosed as present by at least one of four disciplines; Absent = deficit not present. * Significantly different at p < .05.

CONCLUSIONS AND CLINICAL IMPLICATIONS

This study examined a large group of adults with RHD and provides initial data regarding how RHD syndrome is perceived by different medical professionals. The results must be interpreted with caution given the retrospective nature of the study and the imprecision of the deficit classification scheme. Additionally, the data were gathered
from only one rehabilitation unit, and thus are influenced by sampling biases present in that facility. Despite these limitations, broad clinical implications can be drawn. One implication is that the characteristics of right hemisphere syndrome may vary depending on who makes the diagnosis, as prevalence of deficits may be a reflection of the biases of the professional conducting the evaluation. There appears to be substantial overlap across disciplines in the conceptualisation and recognition of attention, neglect, and linguistic deficits, but much diversity across disciplines for other cognitive/communicative disorders. The important question is not “who is right?” about the deficits that occur after RHD; the data suggest that different professionals focus on different deficits, which is appropriate given the training and expertise that characterise various professions. The relevant question that arises from this study is “how can we ensure that patients with RHD are appropriately referred to SLP when they exhibit deficits that are not consistently recognised by those professionals who make such referrals?”. There is no obvious explanation for which patients are referred to SLP. The presence of some deficits (e.g., interpersonal interactions, aprosodia, neglect) was associated with SLP referrals. This suggests that when other professionals do identify communicative disorders (pragmatic deficits and aprosodia), they refer those patients to SLP. However, the frequency analyses indicated that neurologists, neuropsychologists, and OTs do not consistently identify communicative deficits, or may not be as stringent in judging aspects of communication, and thus many appropriate referrals are missed. Performance on a general mental status screening test was not meaningfully related to referrals or to higher-level deficit categories, and thus does not add much information about how referral decisions are made. 
Several factors not taken into account here include the experience of the referring neurologist/physiatrist and individual referral preferences. For example, some physicians are more likely than others to refer to SLPs because of their general approach to referral, without regard for patients’ specific deficits. As discussed in the initial study (Lehman Blake et al., 2002), one important weakness of current practices in the diagnosis and treatment of adults with RHD is the absence of a definition of right hemisphere syndrome. This study suggests that different disciplines have their own criteria or expectations regarding what deficits may occur after RHD, likely based on their professional expertise. While it is appropriate that different disciplines focus on different disorders, some patients may not receive proper referrals if deficits that can be treated by one discipline are not recognised by another. Related to this problem is the lack of consistent terminology, both within and across disciplines. A descriptive definition of the deficits associated with RHD would benefit our discipline and would be a step towards developing criteria for other disciplines to use when making decisions about referral for SLP evaluation and management. Of course, terminology or definitions alone cannot solve the problems associated with diagnosis and treatment of right hemisphere syndrome, and it may be impossible
to develop a standard set of terms that is used consistently across disciplines. Additionally, even with an “official” diagnostic label, referrals may not always be forthcoming. For example, in this study only 50% of individuals diagnosed with dysarthria were referred to SLP. Perhaps the best solution to the current referral problem is to urge that all patients admitted to a rehabilitation unit with a cerebral lesion be referred to SLP. This practice would certainly increase the rate of identifying cognitive and communicative deficits (not only those associated with RHD), although it would presumably also increase the number of evaluations in which no such disorders are identified. Open and active communication within the SLP community, and between SLP clinicians and other medical professionals, is needed to find an optimal solution that ensures patients receive the best possible care without being subjected to unnecessary examinations.

REFERENCES
Aldenderfer, M. S., & Blashfield, R. K. (1984). Cluster analysis (pp. 62–74). Beverly Hills, CA: Sage.
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Lawrence Erlbaum Associates Inc.
Folstein, M. F., Folstein, S. E., & McHugh, P. R. (1975). Mini Mental State: A practical method for grading the cognitive state of patients for the clinician. Journal of Psychiatric Research, 12, 189–198.
Joanette, Y., & Goulet, P. (1994). Right hemisphere and verbal communication: Conceptual, methodological, and clinical issues. Clinical Aphasiology, 22, 1–23.
Kokmen, E., Naessens, J. M., & Offord, K. P. (1987). A short test of mental status: Description and preliminary results. Mayo Clinic Proceedings, 62, 281–288.
Lehman Blake, M., Duffy, J. R., Myers, P. S., & Tompkins, C. A. (2002). Prevalence and patterns of right hemisphere cognitive/communicative deficits: Retrospective data from an inpatient rehabilitation unit. Aphasiology, 16, 537–547.
Myers, P. S. (1999). Right hemisphere damage: Disorders of communication and cognition. San Diego, CA: Singular.
SPSS. (1999). SPSS Base 10.0 user’s guide. Chicago: SPSS Inc.
Tompkins, C. A. (1995). Right hemisphere communication disorders: Theory and management. San Diego, CA: Singular.

APPENDIX A
Deficits and deficit categories defined by Lehman Blake et al. (2002)

Category | Description | Illustrative labels encompassed under category
Hyperaffective | heightened affective response | labile, pseudobulbar effect, hallucinations

RIGHT HEMISPHERE SYNDROME 11

Hypoaffective | dampened or restricted affective response | flat affect
Attention | ability to focus on stimuli; includes focused, sustained, and divided attention | attention, concentration, distractible
Awareness | awareness of, or insight into, deficits and consequences of the deficits | insight, awareness, refusal, denial
Learning/Memory | ability to learn and retain new information | learning, memory
Perception | visual and tactile perception and construction | visuoperception (includes perception & construction), agraphesthesia
Hyperresponsive | heightened responsivity to stimuli | impulsive, disinhibition, verbosity, talkative, tangential
Hyporesponsive | dampened or restricted responsivity to stimuli | paucity of speech, slow responses, poor initiation, unelaborated speech
Linguistic | basic expressive and receptive language functions | aphasia or other language deficits, auditory comprehension, anomia, paraphasias
Orientation | orientation to self, time, situation | orientation, confusion, confabulation, right/left orientation
Reasoning & Problem solving | cognitive skills associated with identifying problems, identifying relevant information and appropriate solutions, and goal achievement | problem solving, verbal reasoning, planning, executive function, mental flexibility, abstraction, inferencing, higher cognitive deficits, perseveration
Other Cognitive Deficits | cognitive skills associated with organising, sequencing, categorising, and integrating information | detail oriented, organisation, sorting, sequencing, integration, cognitive deficits, slow processing, vague speech, poor details
Calculation | mathematical skills | calculation, money handling
Interpersonal Interactions | behavioural aspects of interpersonal communication | eye contact, humour, inappropriate pragmatics, overpersonalisation
Aprosodia | - | aprosodia
Visuospatial neglect | - | visuospatial, hemispatial, or left-sided neglect

APPENDIX B
The Short Test of Mental Status (Kokmen et al., 1987)

Orientation (8 points): full name, day, date, month, year, address, city, building name (1 point per item)
Attention: forward digit span (7 points): repeat a string of numbers, starting with five digits and increasing to seven; the score is the number of digits correctly repeated
Learning (4 points): the patient repeats four words after all are presented (apple, Mr. Johnson, charity, tunnel); the examiner can repeat the words up to four times if needed for the patient to learn all of them; the score is the total number of words learned minus one point for each trial needed beyond the first (e.g., if the patient requires two trials, score = 3; if only one trial, score = 4)
Calculation (4 points): multiply 5 by 13; subtract 7 from 65; divide 58 by 2; add 11 and 29
Abstraction: similarities (3 points): orange/banana; horse/dog; table/bookcase
Information (4 points): current president; first president; number of weeks in a year; define the word “island”
Construction (4 points): draw the face of a clock showing the time 11:15; copy a 3D cube; points per picture: adequate drawing = 2, incomplete = 1, inability to perform task = 0
Recall (4 points): recall the four words presented earlier in the Learning task

Total score possible = 38 points
Mean for normally ageing adults (average age = 51.5) = 33.1 (SD = 3.0)
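The scoring rules above can be expressed as a small function, which also makes the Learning rule explicit. This is a hedged sketch, not a published scoring program: the names `MAX_POINTS`, `learning_score`, `construction_score`, and `stms_total` are hypothetical, and flooring the Learning score at 0 is an assumption that the appendix does not state.

```python
# Hypothetical helper for summing Short Test of Mental Status subscores.
# Maximum points per subtest follow Appendix B (they sum to 38).
MAX_POINTS = {
    "orientation": 8,
    "attention": 7,      # number of digits correctly repeated (spans of 5 to 7)
    "learning": 4,
    "calculation": 4,
    "abstraction": 3,
    "information": 4,
    "construction": 4,   # two drawings scored 0-2 each
    "recall": 4,
}


def learning_score(words_learned: int, trials: int) -> int:
    """Words learned minus one point for each trial beyond the first.

    Flooring at 0 is an assumption; Appendix B does not say.
    """
    if trials < 1:
        raise ValueError("at least one presentation is required")
    return max(0, words_learned - (trials - 1))


def construction_score(clock: int, cube: int) -> int:
    """Each drawing: 2 = adequate, 1 = incomplete, 0 = unable to perform."""
    for drawing in (clock, cube):
        if drawing not in (0, 1, 2):
            raise ValueError("each drawing is scored 0, 1, or 2")
    return clock + cube


def stms_total(subscores: dict) -> int:
    """Check each subscore against its maximum and return the total."""
    total = 0
    for name, maximum in MAX_POINTS.items():
        score = subscores[name]
        if not 0 <= score <= maximum:
            raise ValueError(f"{name} score {score} is outside 0..{maximum}")
        total += score
    return total


# Illustrative (made-up) subscores, not data from the study.
example = {
    "orientation": 8,
    "attention": 6,
    "learning": learning_score(words_learned=4, trials=2),  # 3, as in Appendix B
    "calculation": 4,
    "abstraction": 2,
    "information": 4,
    "construction": construction_score(clock=2, cube=1),   # 3
    "recall": 3,
}
print(stms_total(example))  # 33
```

Keeping the subtest maxima in one table makes it easy to confirm that they add up to the 38 points stated in the appendix and to reject out-of-range entries.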

A comparison of the relative effects of phonologic and semantic cueing treatments
Julie L. Wambaugh
VA Salt Lake City Healthcare System and University of Utah, USA

Background: Lexical retrieval problems are pervasive in aphasia and are often an important focus of treatment. Although many treatments have been demonstrated to positively impact lexical retrieval in aphasia, comparisons of such treatments have been relatively rare.
Aims: The purpose of this investigation was to compare the relative effects of two lexical retrieval cueing treatments when administered concurrently to a participant with chronic anomic aphasia. The cueing treatments, phonological cueing treatment (PCT) and semantic cueing treatment (SCT), were designed to target the lexical phonologic and lexical semantic levels of processing, respectively.
Methods & Procedures: The participant received both treatments concurrently in the context of an alternating treatments design and a multiple baseline design across behaviours. Separate lists of words were assigned to each treatment, and additional word lists were designated for generalisation assessment. Following achievement of criterion levels of performance, each treatment was applied to the additional lists in an attempt to replicate the treatment effects.
Outcomes & Results: The participant showed a positive response to both treatments. However, he achieved higher naming accuracy for items treated with SCT. This effect was observed in both phases of treatment application.
Conclusions: For this participant, SCT appeared to be the preferred treatment, at least in the context of concurrent administration of the treatments. This preferential response may be related to a pretreatment pattern of responding in which the

COMPARISON OF CUEING TREATMENTS 15

participant routinely used descriptions and semantically related sentence cues to attempt to retrieve words.

The development and evaluation of effective treatments for word-finding deficits continues to be an important issue in the remediation of aphasia. Recent trends have consistently reflected a movement towards model-based treatments designed to target specific levels of lexical retrieval processing. In general, process-oriented treatments have been shown to improve word-finding behaviours (Nickels & Best, 1996). However, evidence has rarely been presented to indicate that the participant(s) who received process-oriented treatment did not or would not respond positively to an alternative treatment. As with most aphasia treatments, there is little research comparing types of word-retrieval treatments within or across participants. Early and influential research by Howard, Patterson, Franklin, Orchard-Lisle, and Morton (1985) indicated that semantically oriented therapy may have more robust effects than phonologically oriented therapy. However, several subsequent noncomparative investigations have indicated that phonological approaches may produce more lasting effects than had previously been predicted (Davis & Pring, 1991; Miceli, Amitrano, Capasso, & Caramazza, 1996; Raymer, Thompson, Jacobs, & LeGrand, 1993). Findings from a few recent studies that have examined both lexical semantic and lexical phonological treatments suggest that each may produce positive effects. Visch-Brink, Doesborgh, van Harskamp, Bippel, Koudstaal, and van de Sandt-Loenderman (2002) compared the effects of lexical semantic and lexical phonological therapy across two groups of patients with aphasia, with 58 patients randomly assigned to the treatments. Therapists were provided with a variety of tasks that fell within each general category (i.e., semantic or phonological).
The investigators designated the type of therapy each patient received, but allowed the therapists to choose tasks within the specified category at their own discretion. Each patient received 40 to 60 hours of therapy, delivered between 3 and 12 months post-onset. Progress on the ANELT (Blomert, 1992) was measured

Address correspondence to: Julie L. Wambaugh, Department of Communication Sciences and Disorders, Rm. 1201, 390 South 1530 East, University of Utah, Salt Lake City, Utah 84112, USA. Thanks are extended to Aida Martinez, Michelene Kalinyak-Fliszar, and Michele Allegre for their assistance with this project. This research was supported by Rehabilitation Research and Development, Department of Veterans Affairs. © 2003 Psychology Press Ltd

http://www.tandf.co.uk/journals/pp/02687038.html DOI: 10.1080/02687030344000085


and the results revealed no significant differences between groups/therapy conditions. Wambaugh and colleagues (Wambaugh, Doyle, Martinez, & Kalinyak-Fliszar, 2002; Wambaugh, Linebaugh, Doyle, Martinez, Kalinyak-Fliszar, & Spencer, 2001; Wambaugh, Linebaugh, Doyle, Spencer, & Kalinyak-Fliszar, 1999) have evaluated the effects of two cueing treatments for word-finding deficits in aphasia using single-case experimental designs. The treatments, phonological cueing treatment (PCT) and semantic cueing treatment (SCT), were designed to target the lexical phonologic and lexical semantic levels of processing, respectively. Findings suggested that, for many patients with aphasia, it may not be critical whether treatment targets one or the other of these levels of lexical processing: most patients responded positively to both PCT and SCT. Unfortunately, the multiple baseline designs employed with PCT and SCT to date have not allowed direct comparison of the treatments. The purpose of the present investigation, therefore, was to further examine the effects of SCT and PCT by comparing the treatments within an individual speaker with aphasia.

METHOD
Participant
The participant in this investigation was a 44-year-old male speaker with chronic anomic aphasia who was 15 months post-onset of a single, left-hemisphere, thromboembolic stroke. He was premorbidly right-handed, had completed 14 years of formal education, and had worked as an electrician prior to his stroke. Pretreatment assessment results are shown in Table 1. The participant exhibited moderate word-finding difficulties that appeared to be predominantly semantic in nature, as indicated by lexical retrieval assessments and a word-retrieval error analysis. As seen in Table 1, the most frequent error type was the semantic paraphasia (37% of all errors), with an additional 13% of errors being mixed semantic-phonemic paraphasias.
He exhibited few nonmixed phonemic paraphasias (7% of all errors). The participant responded “I don’t know” for a relatively large percentage of items (24% of all error items). Comprehension testing for error items on the TAAWF (German, 1990) revealed accurate matching of spoken labels to pictures, indicating sufficient semantic information to make such judgements. Similarly, he performed accurately on category sorting and semantic association tasks. However, the participant had difficulty making auditory synonym judgements. In light of his performance on these tasks and the types of errors exhibited, the participant’s lexical retrieval difficulties seemed to be predominantly semantic in nature, with some likely co-existing phonologic-level disruptions.

TABLE 1
Pretreatment assessment results

Measure | Score or result
Western Aphasia Battery (Kertesz, 1982) (administered 2 months prior to study) |
Aphasia Quotient (100 possible) | 79
Aphasia classification | Anomic
Subtest (AQ total): Spontaneous Speech | 16.5
Subtest (AQ total): Comprehension | 9.5
Subtest (AQ total): Repetition | 6.8
Subtest (AQ total): Naming | 6.7
Test of Adolescent/Adult Word Finding (German, 1990) |
Total raw score (107 possible) | 19 (18%)
Subtest: Picture Naming: Nouns | 4
Subtest: Sentence Completion Naming | 4
Subtest: Description Naming | 1
Subtest: Picture Naming: Verbs | 2
Subtest: Category Naming | 8
Spoken Word–Picture Matching (Comprehension) (88 possible) | 88 (100%)
Category Sorting (informal assessment) |
Total items correct (7 categories, 49 possible) | 49 (100%)
PALPA subtests (Kay, Lesser, & Coltheart, 1992) |
Auditory synonym judgements (#49): high imageability (30 possible) | 28 (93%)
Auditory synonym judgements (#49): low imageability (30 possible) | 21 (70%)
Word semantic association (printed stimuli, #51): high imageability (15 possible) | 15 (100%)
Word semantic association (printed stimuli, #51): low imageability (15 possible) | 15 (100%)
Confrontation naming of objects: Snodgrass and Vanderwart (1980) items (1st administration) |
Total items correct (260 possible) | 189 (73%)
Error analysis (71 errors) |
Semantic paraphasias | 26 (37%)*
Phonemic paraphasias | 5 (7%)*
Mixed semantic & phonemic paraphasias | 9 (13%)*
Gestural response | 3 (4%)*
Unrelated real-word response | 5 (7%)*
Neologism | 5 (7%)*
Perseverative response | 1 (1%)*
No response | 17 (24%)*
* Calculated as percentage of total errors.

Experimental stimuli
The participant was asked to name a set of 260 line drawings depicting objects (Snodgrass & Vanderwart, 1980) on two separate occasions. Performance on the two administrations of the 260 items was used as the basis for selecting the experimental stimuli: only items that the participant had missed on both naming occasions were selected. Four sets of 12 items each were individually selected for the participant (see Appendix A). The sets were matched as closely as possible for frequency of occurrence and for relative difficulty during baseline testing. The two lists with the most similar and stable baseline naming performance were selected for initial application of treatment, and the remaining lists were designated for generalisation assessment and for secondary application of treatment.

Treatments
The treatments studied in this investigation, SCT and PCT, were hierarchical cueing treatments that were designed to be similar to each other in their general application. Each treatment comprised a prestimulation phase and a traditional cueing hierarchy (Patterson, 2001). SCT and PCT both began with a prestimulation phase in which the target item was presented with three picture foils and the participant was asked to point to the picture that corresponded to either a description (SCT) or a nonword rhyme (PCT) (see Appendix B for treatment descriptions). Following the prestimulation phases, the cueing treatments were applied.
Both cueing hierarchies were composed of five levels of cueing that were response-contingent (i.e., the cues were applied only upon an incorrect response). In both hierarchies, successive cues became increasingly powerful in eliciting the target response. Upon elicitation of a correct response, the cueing hierarchy was applied in reverse order, beginning with the level of cue that preceded the correct response. For both treatments, each of the pictures designated for treatment was presented individually, in random order. One presentation of each of the 12 pictures constituted a treatment trial. The participant completed three trials per treatment session.
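The response-contingent hierarchy and its reversal rule (also spelled out in Appendix B) can be sketched as a simple up/down walk over cue levels, together with the trial structure described above. This is a minimal sketch under stated assumptions, not the author's procedure: `run_hierarchy`, `session_order`, the `respond` callback (returning True for a correct naming response), the 50-step safety cap, and staying at level 5 after an error there are all hypothetical choices.

```python
import random
from typing import Callable, List, Optional, Sequence, Tuple


def run_hierarchy(respond: Callable[[int], bool], n_levels: int = 5,
                  max_steps: int = 50) -> List[Tuple[int, bool]]:
    """Apply a response-contingent cueing hierarchy to one picture.

    Level 1 is the picture alone; higher levels are more powerful cues.
    An incorrect response moves up one level (staying at the top level
    if already there); a correct response moves back down one level,
    which reverses the hierarchy from the level preceding the correct
    response. An error during the reversal sends the walk upward again.
    The item ends with a correct response at level 1.
    Returns the sequence of (level, correct) pairs.
    """
    log: List[Tuple[int, bool]] = []
    level = 1
    for _ in range(max_steps):  # hypothetical safety cap on cue presentations
        correct = respond(level)
        log.append((level, correct))
        if correct and level == 1:
            break                              # named from the picture alone
        if correct:
            level -= 1                         # reversal: step to a weaker cue
        else:
            level = min(level + 1, n_levels)   # step to a stronger cue
    return log


def session_order(items: Sequence[str], trials: int = 3,
                  rng: Optional[random.Random] = None) -> List[List[str]]:
    """One trial = each picture once, in random order; three trials per session."""
    rng = rng or random.Random()
    order = []
    for _ in range(trials):
        trial = list(items)
        rng.shuffle(trial)
        order.append(trial)
    return order


# Scripted (made-up) responses: errors at levels 1 and 2, then correct.
script = iter([False, False, True, True, True])
print(run_hierarchy(lambda level: next(script)))
# [(1, False), (2, False), (3, True), (2, True), (1, True)]
```

Treating "correct means step down, incorrect means step up" as the single rule covers both the forward pass and the re-reversal after an error during the reversal, so no separate direction flag is needed.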


Treatment was conducted until 100% naming accuracy was achieved on two of three consecutive probes for at least one of the treated lists, or until 20 treatment applications had been conducted for both treatments.

Experimental design
An alternating treatments design (ATD) was employed in combination with a multiple baseline design across behaviours. PCT and SCT were applied to two word lists while the remaining lists remained untreated. Following achievement of the probe performance criterion, the treatments were applied to the remaining two word lists. As indicated previously, the two word lists for which performance was the most similar and stable over three baseline probes were selected for initial application of treatment. PCT and SCT were then randomly assigned to those lists. Treatments were applied concurrently to the word lists. Specifically, on each day that the participant received treatment, one treatment was applied, a rest period of 10–20 minutes was provided, and then the other treatment was applied. Treatments were alternated across days in keeping with design constraints.

Baseline phase. During baseline probes, the experimental picture stimuli (i.e., all four sets of pictures) were presented in random order. The participant was instructed to name each picture to the best of his ability, and a 15-second response interval was provided. Each final response was scored according to a multidimensional scoring system; for the purposes of graphing, responses were scored in binary fashion (correct/incorrect).

Treatment phases. Treatment was conducted two to three times per week. Probes identical to those administered in baseline were conducted immediately prior to the start of each treatment session. Items in lists that were not currently undergoing treatment were probed every fourth session.

Maintenance phase and follow-up phases. Following completion of treatment with Lists 1 and 2, maintenance probing of items in those lists continued during treatment of Lists 3 and 4.
Follow-up probes with all items were conducted 2 and 6 weeks after the completion of treatment.

RESULTS
The percentage of items named correctly by the participant in probe sessions is depicted in Figure 1. The top graph shows responses to items in Sets 1 and 2, and the bottom graph shows responses to items in Sets 3 and 4. During baseline, the participant named only one item correctly per probe for both Sets 1 and 2, for an accuracy level of 8% for each. He correctly named two to three items for Sets 3 and 4 during baseline (accuracy levels ranging from 17% to 25%). Following application of PCT to Set 1 items and SCT to Set 2 items, correct responses increased for both sets. The participant achieved criterion for


SCT items after five treatment sessions, and accuracy for PCT items reached 83%. The participant’s responses to untrained items (Sets 3 and 4) remained relatively stable during the initial training phase. Specifically, one more item in the SCT set was named correctly than at the maximum baseline level (i.e., an increase from 25% to 33% correct). Because of this increase, additional probing was conducted to establish stability of responding. Two additional probes (see sessions 9 and 10 on the lower graph) indicated relative stability at 33% correct for Sets 3 and 4. After the termination of treatment with Sets 1 and 2 and the extended probing of Sets 3 and 4, treatment was extended to the untreated sets: PCT was applied to Set 3 and SCT was applied to Set 4. Correct responses increased for both sets. Criterion was reached for Set 4 (SCT) after nine treatment sessions. Correct naming of Set 3 (PCT) items reached 75%. Probes of Set 1 and 2 items during treatment of Sets 3 and 4 revealed initially strong maintenance (i.e., 100% and 83% accuracy for SCT and PCT items, respectively) followed by a reduction in accuracy (i.e., 67% and 58% accuracy for SCT and PCT, respectively). However, follow-up probing 2 and 6 weeks after cessation of all treatment indicated maintenance of trained behaviours at levels that approximated treatment probe performance for all sets: PCT #1 (Set 1) = 83% and 75%; SCT #1 (Set 2) = 83% and 92%; PCT #2 (Set 3) = 75% and 75%; SCT #2 (Set 4) = 92% and 92%.

CONCLUSIONS
The results of this investigation are in accord with findings by Visch-Brink et al. (2002) and Wambaugh et al. (2001) in that both treatments produced positive changes with this participant. Although the participant displayed superior performance with SCT, he did respond positively to PCT as well.
The participant achieved higher naming accuracy with SCT in both treatment comparisons (i.e., the ATD was replicated within the participant). In both comparisons, he correctly named SCT items at levels approximately 20 percentage points higher than PCT items, and this difference remained at 6 weeks post-treatment. His greater success with SCT may be related to word-retrieval behaviours observed before the start of treatment: he often spontaneously used semantically related sentence cues and descriptions to facilitate word retrieval. It is unknown whether this self-cueing strategy was self-initiated or was the result of previous therapy. In either case, the participant may have been predisposed to favour semantic cues. It is also possible that SCT was more effective than PCT in facilitating accurate lexical processing. It should be noted that the participant received a limited number of treatment sessions during both treatment phases (i.e., five applications of each treatment during the first treatment phase and nine applications during the second phase). Additional treatment sessions may have resulted in increased accuracy of responding to PCT items; that is, the maximal effects of PCT may not have been observed with this participant.

Figure 1. Percentage of items named correctly in probes.
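The percentages reported in the Results are counts out of the 12 pictures in each list, and treatment stopped under the probe criterion described in the Method. The sketch below makes both concrete. It is illustrative only: `percent_correct`, `reached_criterion`, and `stop_treatment` are hypothetical names, and the single application counter shared by the two concurrently applied treatments is an assumption.

```python
from typing import List, Sequence

LIST_SIZE = 12  # each experimental list contained 12 pictures


def percent_correct(n_correct: int, list_size: int = LIST_SIZE) -> int:
    """Probe accuracy as a whole-number percentage of the list."""
    return round(100 * n_correct / list_size)


def reached_criterion(probe_percents: Sequence[int]) -> bool:
    """True once 100% accuracy occurs on two of three consecutive probes.

    Windows near the end of the sequence may be shorter than three probes,
    so two consecutive 100% probes count as soon as they occur.
    """
    for start in range(len(probe_percents)):
        window = probe_percents[start:start + 3]
        if sum(p == 100 for p in window) >= 2:
            return True
    return False


def stop_treatment(probes_by_list: List[Sequence[int]], applications: int,
                   max_applications: int = 20) -> bool:
    """Stop when any treated list meets criterion or the application cap is hit.

    Assumes both concurrently applied treatments share one application count.
    """
    return applications >= max_applications or any(
        reached_criterion(probes) for probes in probes_by_list)


# Counts out of 12 behind the percentages reported in the Results.
print([percent_correct(n) for n in (1, 2, 3, 4, 7, 8, 9, 10, 11)])
# [8, 17, 25, 33, 58, 67, 75, 83, 92]
# End-of-phase SCT minus PCT difference, in percentage points.
print(100 - percent_correct(10), 100 - percent_correct(9))  # 17 25
```

The two end-of-phase differences (17 and 25 points) average about 21, which is consistent with the "approximately 20 percentage points" summary.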


The results of this investigation provide further support for the use of SCT and PCT, in that both appear likely to be beneficial in promoting increased naming accuracy for trained items. Clinicians may consider using a period of trial therapy in the form of an ATD to assist in treatment selection.

The use of an ATD to compare speech/language treatments is almost always complicated by the issue of possible generalisation effects. Although the measurement of additional, untreated behaviours is not a requisite in the application of an ATD (Barlow & Hersen, 1984), such measurements may help determine whether generalisation effects are present. However, when the treated behaviour may improve through repeated exposure to probe stimuli (as in the case of word retrieval), such improvements may be misinterpreted as generalisation. If untrained behaviours are measured repeatedly, the investigator may wish to measure other untreated behaviours only at pre- and post-treatment intervals (i.e., limited repeated measurement) to compare the effects of repeated exposure on untrained behaviours. If previous research has indicated that generalisation effects can be expected to be minimal, and the researcher’s interest is in the relative differences between treatments administered concurrently, the researcher may choose to forgo the repeated measurement of untreated behaviours and use a more traditional ATD. Regardless of design specifics, replication of the observed effects is recommended both within and across speakers to strengthen internal and external validity, respectively.

REFERENCES
Barlow, D. H., & Hersen, M. (1984). Single case experimental designs: Strategies for studying behavior change (2nd ed.). New York: Pergamon Press.
Blomert, L. (1992). The Amsterdam–Nijmegen Everyday Language Test (ANELT). In N. Steinbuchel & D. Y. von Cramon (Eds.), Neuropsychological rehabilitation (pp. 121–127). Berlin: Springer Verlag.
Davis, A., & Pring, T. (1991). Therapy for word-finding deficits: More on the effects of semantic and phonologic approaches to treatment with dysphasic patients. Neuropsychological Rehabilitation, 1, 135–145.
German, D. J. (1990). Test of adolescent/adult word finding. Austin, TX: Pro-Ed.
Howard, D., Patterson, K., Franklin, S., Orchard-Lisle, V., & Morton, J. (1985). Treatment of word retrieval deficits in aphasia: A comparison of two methods. Brain, 108, 817–829.
Kay, J., Lesser, R., & Coltheart, M. (1992). Psycholinguistic Assessment of Language Processes in Aphasia (PALPA). Hove, UK: Lawrence Erlbaum Associates Ltd.
Kertesz, A. (1982). The Western Aphasia Battery. New York: Grune & Stratton.
Miceli, G., Amitrano, A., Capasso, R., & Caramazza, A. (1996). The treatment of anomia resulting from output lexical damage: Analysis of two cases. Brain and Language, 52, 150–174.
Nickels, L., & Best, W. (1996). Therapy for naming disorders (Part I): Principles, puzzles, and progress. Aphasiology, 10,
Patterson, J. (2001). The effectiveness of cueing hierarchies as a treatment for word retrieval impairment. ASHA Special Interest Division–2 Newsletter, 11(2), 11–17.
Raymer, A. M., Thompson, C. K., Jacobs, B., & LeGrand, H. R. (1993). Phonological treatment of naming deficits in aphasia: Model-based generalization analysis. Aphasiology, 7, 27–53.
Snodgrass, J. G., & Vanderwart, M. (1980). A standardized set of 260 pictures: Norms for name agreement, image agreement, familiarity, and complexity. Journal of Experimental Psychology: Human Learning and Memory, 6, 174–215.
Visch-Brink, E., Doesborgh, S., van Harskamp, F., Bippel, D., Koudstaal, P., & van de Sandt-Loenderman, M. (2002, June). The efficacy of lexical semantic therapy in aphasia, a randomized controlled trial. Paper presented at the annual Clinical Aphasiology Conference, Branson, MO.
Wambaugh, J. L., Doyle, P. J., Martinez, A. L., & Kalinyak-Fliszar, M. (2002). Effects of two lexical retrieval cueing treatments on action naming in aphasia. Journal of Rehabilitation Research and Development, 39(4), 455–466.
Wambaugh, J. L., Linebaugh, C. W., Doyle, P. J., Martinez, A. L., Kalinyak-Fliszar, M. M., & Spencer, K. A. (2001). Effects of two cueing treatments on lexical retrieval in aphasic speakers with different levels of deficit. Aphasiology, 15(10/11), 933–950.
Wambaugh, J. L., Linebaugh, C. W., Doyle, P. J., Spencer, K. A., & Kalinyak-Fliszar, M. (1999). Effects of deficit-oriented treatments on lexical retrieval in a patient with semantic and phonologic deficits. Brain and Language, 73, 446–450.

APPENDIX A
EXPERIMENTAL STIMULI

List 1 (PCT #1): axe, cannon, eagle, frog, ironing board, kettle, necklace, peanut, sled, suitcase, toothbrush, wheel
List 2 (SCT #1): alligator, bowl, chisel, fence, lips, lobster, mitten, pumpkin, rollerskate, ruler, thread, toaster
List 3 (PCT #2): basket, cigarette, couch, crown, donkey, garbage can, grapes, mushroom, saw, skirt, traffic light, wagon
List 4 (SCT #2): bottle, bow, caterpillar, desk, doorknob, envelope, guitar, kangaroo, lettuce, pliers, tennis racket, windmill


APPENDIX B
DESCRIPTION OF TREATMENTS

Semantic Cueing Treatment (SCT)
Prestimulation. The target item was presented in picture form with three picture foils (two semantically related, one unrelated). The examiner provided a verbal phrase corresponding to the item and asked the participant to point to the correct picture.
Cueing hierarchy. The application of the steps of the hierarchy was response-contingent. The steps were applied sequentially until a correct naming response was elicited. Then the order of the steps was reversed to elicit correct responses at each of the preceding steps. If an incorrect response occurred during the hierarchy reversal, the order of the hierarchy steps was again reversed until a correct response was obtained.
(1) Picture of target item presented, naming response requested, verbal feedback provided for correct or incorrect responses (7–8-second response time allowed; same for following steps).
(2) Picture of target item presented along with a verbal description of target, naming response requested, verbal feedback provided for correct or incorrect responses (e.g., target = cow, “a farm animal that gives milk”).
(3) Picture of target item presented along with a semantically nonspecific sentence completion phrase, naming response requested, verbal feedback provided for correct or incorrect responses (e.g., “The farmer fed the…”).
(4) Picture of target item presented along with a semantically loaded sentence completion phrase, naming response requested, verbal feedback provided for correct or incorrect responses (e.g., “The farmer went to the barn to milk the…”).
(5) Picture of target item presented along with verbal model of target word, repetition of target word requested.

Phonologic Cueing Treatment (PCT)
Prestimulation. The target item was presented in picture form with three picture foils (two phonetically related, one unrelated). The examiner provided a verbal phrase corresponding to the item and asked the participant to point to the correct picture.
Cueing hierarchy. The application of the steps of the hierarchy was the same as above.
(1) Picture of target item presented, naming response requested, verbal feedback provided for correct or incorrect responses (7–8-second response time allowed; same for following steps).
(2) Picture of target item presented along with a verbal production of a non-real word that rhymed with the target (e.g., target = pig, “it rhymes with chig”).
(3) Picture of target item presented along with a verbal first sound cue (e.g., “it starts with /p/”).
(4) Picture of target item presented along with a sentence completion phrase that included the rhyme and the sound cue, naming response requested, verbal feedback provided for correct or incorrect responses (e.g., “The name of this picture rhymes with chig, it is a /p/…”).
(5) Picture of target item presented along with verbal model of target word, repetition of target word requested.

Measures of lexical diversity in aphasia
Heather Harris Wright
University of Kentucky, USA
Stacy W. Silverman
University of Missouri-Columbia, USA
Marilyn Newhoff
San Diego State University, USA

Background: Important to the assessment of aphasia is the analysis of discourse production and, in particular, the analysis of lexical diversity in the verbal production of adults with aphasia. Previous researchers have used the type-token ratio (TTR) to measure conversational vocabulary in adults with aphasia; however, this measure is known to be sensitive to sample size, so only samples of equivalent length can be compared. The number of different words (NDW) is another measure of lexical diversity, but it also requires language samples of equivalent length. An alternative to these measures, D, was developed (Malvern & Richards, 1997) to address this problem: D allows comparisons across samples of varying lengths.
Aims: The first objective of the current study was to examine the relationships among three measures of productive vocabulary in discourse for adults with aphasia: TTR, NDW, and D. The second objective was to determine in what ways, and to what degree, each of these measures can differentiate fluent and nonfluent aphasia.
Methods & Procedures: Eighteen adults with aphasia participated in this study (nine with nonfluent aphasia; nine with fluent aphasia). Participants completed the Western Aphasia Battery (WAB) and produced language samples consisting of conversation and picture description. The samples were then subjected to the three lexical diversity analyses.
Outcomes & Results: Results indicated that, although the measures generally correlated with each other, adults with fluent aphasia evidenced significantly higher D and NDW values than those

LEXICAL DIVERSITY IN APHASIA 27

with nonfluent aphasia when whole samples were subjected to analyses. Once samples were truncated to 100 and 200 words, groups differed significantly for all three measures. Conclusions: These findings add further support to the notion that, because TTR and, to a lesser extent, NDW are sensitive to sample size, length differences across samples tend to confound results. As an alternative to these measures, the use of D for measuring the conversational vocabulary of adults with aphasia enables the analysis of entire language samples, so that language sample data need not be discarded. In the present study, D values differed for fluent and nonfluent aphasia samples.

Adults with aphasia present with word retrieval deficits during discourse production. These deficits may present themselves in discourse through the person’s use of nonreferential terminology, pauses, filler terms, paraphasias, or neologisms. Typically, adults with a nonfluent type of aphasia use pauses and filler terms as they struggle with verbal output. By contrast, adults with a fluent type of aphasia have little difficulty with verbal output, although they do produce paraphasias and neologisms. Clearly, an important aspect of aphasia assessment is the analysis of discourse production, especially given that many of the above characteristics are primarily detectable through the analysis of discourse. Several researchers have assessed the percentage of information units provided by adults with aphasia when stimuli are controlled (e.g., McNeil, Doyle, Fossett, Park, & Goda, 2001; Nicholas & Brookshire, 1993). However, one important aspect of discourse that has not been readily assessed in adults with aphasia is the lexical diversity of their verbal production. Given that many of the error types observed in adults with aphasia appear to be, at least in part, lexical in nature, it seems particularly important to refine the tools we use to measure aspects of the lexical domain in discourse production. One measure of lexical diversity in conversation has enjoyed particular popularity in the child language literature for decades: type-token ratio (TTR). TTR is a measure of conversational vocabulary and is defined as the ratio of the total number of different words in a language sample to the total number of

Address correspondence to: Heather Harris Wright PhD, The University of Kentucky, Division of Communication Disorders, CHS Building, 900 S. Limestone, Lexington, KY 40536–0200, USA. Email: [email protected] © 2003 Psychology Press Ltd

http://www.tandf.co.uk/journals/pp/02687038.html DOI: 10.1080/02687030344000166

28 APHASIOLOGY

words in the sample (Miller, 1981; Templin, 1957). Ratios closer to 0 reflect less diversity of vocabulary, whereas values closer to 1.0 reflect greater diversity. As was identified early on, however, TTR measurements are sensitive to sample size variations; larger samples tend to yield lower TTR values than smaller samples (Fillenbaum, Jones, & Wepman, 1961; Hess, Sefton, & Landry, 1986). Wachal and Spreen (1973) used TTR and a variety of TTR-based alternatives designed to account for differences in sample size across participants (e.g., mean segmental TTR, Johnson, 1944; bilogarithmic TTR, Chotlos, 1944; Herdan, 1960) to measure vocabulary diversity in adults with aphasia and their non-brain-damaged counterparts. They found that adults with aphasia presented with less lexical diversity in conversation than adults with no brain damage, and that several of the measures, including the original TTR calculation, mean segmental TTR, bilogarithmic TTR, and root TTR (Guiraud, 1959), significantly differentiated the two groups. The concern about the sensitivity of TTR to sample size variation also applies, although perhaps to a lesser extent, to many of the transformations of TTR. For example, mean segmental TTR is the average TTR for several consecutive, equal-length segments of the sample. This allows comparisons across samples of different lengths, as long as the same segment size is used. However, because segment size must be controlled, this measure is still dependent on sample size. Other investigations using TTR to measure lexical diversity in the discourse of adults with aphasia (e.g., Prins, Snow, & Wagenaar, 1978; Spreen & Wachal, 1973) have produced similar findings about the productive vocabulary of adults with aphasia, but they also highlight the central weakness of TTR: its sensitivity to sample size variation.
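To make the definitions above concrete, the following is a minimal Python sketch of TTR, NDW, and the TTR transformations just mentioned (root TTR, bilogarithmic TTR, mean segmental TTR). It assumes each sample has already been tokenised into a list of words with excluded material removed; the function names are our own, and this is not the CLAN implementation used in the study.

```python
import math
from typing import List


def ttr(tokens: List[str]) -> float:
    """Type-token ratio: number of different words / total number of words."""
    return len(set(tokens)) / len(tokens)


def ndw(tokens: List[str]) -> int:
    """Number of different words (types) in the sample."""
    return len(set(tokens))


def root_ttr(tokens: List[str]) -> float:
    """Guiraud's root TTR: types / sqrt(tokens)."""
    return len(set(tokens)) / math.sqrt(len(tokens))


def bilog_ttr(tokens: List[str]) -> float:
    """Herdan's bilogarithmic TTR: log(types) / log(tokens); needs more than 1 token."""
    return math.log(len(set(tokens))) / math.log(len(tokens))


def mean_segmental_ttr(tokens: List[str], segment: int = 100) -> float:
    """Average TTR over consecutive, non-overlapping segments of `segment` words.

    Words left over after the last complete segment are ignored, so the
    result still depends on the segment size chosen.
    """
    n_segments = len(tokens) // segment
    if n_segments == 0:
        raise ValueError("sample is shorter than one segment")
    segments = [tokens[i * segment:(i + 1) * segment] for i in range(n_segments)]
    return sum(ttr(s) for s in segments) / n_segments


if __name__ == "__main__":
    # The same five-word vocabulary, repeated: NDW stays at 5 while TTR keeps falling.
    words = "the dog saw the cat and the cat saw the dog".split() * 5
    for n in (11, 22, 55):
        print(n, ndw(words[:n]), round(ttr(words[:n]), 3))
```

The toy sample simply repeats one sentence, so it isolates the denominator effect the authors describe: the number of types never grows, yet TTR shrinks as tokens accumulate.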
More recently, although within the area of child language research, some data suggest that TTR, when used in the diagnosis of specific language impairment, is not sufficiently sensitive to separate the lexical performance of children with and without language impairments (Watkins, Kelly, Harbers, & Hollis, 1995). In particular, Watkins and colleagues found that, even when samples were truncated to 50- and 100-utterance subsamples, TTR did not distinguish between the two groups. Only when samples were controlled for the number of words rather than utterances (i.e., calculating the number of different words occurring in subsamples of 100 and 200 total words) did the measure differentiate between the two groups. Thus, it appears critical to control sample size by truncating samples to a common length. Given the findings of Watkins and colleagues (1995), sample length should be determined by the number of words, rather than by the more common practice of counting utterances. Relatedly, the number of different words (NDW) has also been used to estimate the diversity of conversational vocabulary across clinical populations (e.g., Ratner & Silverman, 2000; Watkins et al., 1995). Importantly, however, although NDW has become the preferred measure in child language studies (e.g., Dollaghan et al., 1999; Goffman & Leonard, 2000), investigators who compute


TTR on a standard number of words across samples (as opposed to a standard number of utterances) are, in essence, computing NDW, because they are simply dividing the number of different words in each sample by a common denominator (e.g., 100 words). That is, TTR and NDW are perfectly correlated when samples contain the same number of total words. Although NDW is also somewhat sensitive to sample size, it is less so than TTR. As a sample grows in length, the probability that each consecutive word is a “new” word (as opposed to one that has already been produced in the sample) decreases; that is, as the sample becomes longer, NDW grows more slowly. TTR does not have this stability: as a sample grows, the numerator (NDW) becomes less likely to increase with each new word, but the denominator (total number of words) increases with every word. From a computational standpoint, then, NDW is less affected by sample size variation than TTR. Despite its greater stability, however, to date NDW has not been studied empirically as a measure of lexical diversity in the language samples of adults with aphasia. Recently, an alternative to these two measures, D, was developed by Malvern and Richards (1997) to address the problem of varying sample sizes. D is a mathematical algorithm applied to TTR. As McKee, Malvern, and Richards (2000, p. 323) noted: “The new measure is calculated by, first, randomly sampling words from the transcript to produce a curve of the TTR against Tokens for the empirical data. Then the software finds the best fit between this empirical curve and theoretical curves calculated from the model by adjusting the value of a parameter. The parameter, D, is shown to be a valid and reliable measure of vocabulary diversity without the problems of sample size found with previous methods.” D is computed as described above using the vocd utility of the CLAN language analysis program (MacWhinney, 2000).
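The procedure quoted above (random subsampling, an empirical TTR curve, then a best-fitting model curve) can be sketched in Python as follows. The sketch assumes the Malvern and Richards model curve TTR = (D/N)[(1 + 2N/D)^(1/2) - 1] and vocd-like settings (subsamples of 35 to 50 tokens, 100 random draws per size, three repetitions averaged); these settings, the golden-section fitting routine, and all function names are our assumptions. It is not CLAN's vocd code and will not reproduce its values exactly.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple


def model_ttr(n: int, d: float) -> float:
    """Expected TTR for n tokens under the assumed model curve with parameter D."""
    return (d / n) * (math.sqrt(1.0 + 2.0 * n / d) - 1.0)


def fit_d(points: Sequence[Tuple[int, float]], lo: float = 1.0,
          hi: float = 1000.0, tol: float = 1e-6) -> float:
    """Find the D minimising squared error between empirical and model TTRs.

    Uses golden-section search, which assumes the error is unimodal in D.
    """
    def sse(d: float) -> float:
        return sum((t - model_ttr(n, d)) ** 2 for n, t in points)

    r = (math.sqrt(5.0) - 1.0) / 2.0  # inverse golden ratio, ~0.618
    a, b = lo, hi
    c, e = b - r * (b - a), a + r * (b - a)
    while b - a > tol:
        if sse(c) < sse(e):
            b, e = e, c  # discard the right end of the bracket
            c = b - r * (b - a)
        else:
            a, c = c, e  # discard the left end of the bracket
            e = a + r * (b - a)
    return (a + b) / 2.0


def estimate_d(tokens: List[str], sizes: Sequence[int] = range(35, 51),
               draws: int = 100, trials: int = 3,
               seed: Optional[int] = 0) -> float:
    """vocd-style D: random subsamples -> empirical TTR curve -> best-fitting D."""
    if len(tokens) < max(sizes):
        raise ValueError("sample is shorter than the largest subsample size")
    rng = random.Random(seed)
    estimates = []
    for _ in range(trials):
        curve = []
        for n in sizes:
            # Mean TTR of `draws` random n-token subsamples, drawn without replacement.
            mean_ttr = sum(len(set(rng.sample(tokens, n))) / n
                           for _ in range(draws)) / draws
            curve.append((n, mean_ttr))
        estimates.append(fit_d(curve))
    return sum(estimates) / len(estimates)
```

Because D is read off a fitted curve rather than from a single ratio, the whole transcript can be supplied regardless of its length (above the largest subsample size). Results vary slightly with `seed`, since the empirical curve comes from random draws.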
D allows for the input of samples of any size (greater than 50 words), and preliminary results have suggested that the resulting D values are relatively stable despite size variations across samples. McKee and colleagues (2000) found that D values obtained with split-half samples were not significantly different from those obtained with whole samples of children with normal language. Similar results have been reported by Silverman and Ratner (in press) in the measurement of conversational vocabulary in children who stutter. Owen and Leonard (2002), in comparing diversity of conversational vocabulary in children with and without specific language impairment, concluded that, although sample-size effects could not be ruled out, it was clear that D was less susceptible to size variation than TTR or NDW. The validity of D has been evaluated using samples of normal (e.g., McKee et al., 2000) and disordered language production in children (Owen & Leonard, 2002) and samples of adults learning English as a second language (Malvern &


Richards, 1997), although to date there has been no known application to language samples of individuals with aphasia. Of additional interest, although measures of lexical diversity have been used to examine lexical differences between the discourse of adults with and without aphasia (Prins et al., 1978; Spreen & Wachal, 1973; Wachal & Spreen, 1973), these measures are not known to have been applied to the differentiation of individuals with fluent and nonfluent aphasia. Given the distinct verbal production characteristics of adults with fluent and nonfluent types of aphasia, and the greater ease of verbal output by adults with a fluent type of aphasia, it might be expected that vocabulary diversity would differ between groups, with adults with fluent aphasia showing greater vocabulary diversity. On the other hand, both groups are apt to present with word retrieval deficits that would limit the diversity of the vocabulary they demonstrate. Whether or not it is reasonable to expect adults with fluent aphasia to demonstrate greater vocabulary diversity in discourse than their nonfluent counterparts, one would expect the lengths of samples obtained from adults with fluent and nonfluent aphasia to differ when time and/or content are controlled. For this reason, a lexical measure that allows for such variation in sample length could be a beneficial tool for assessing lexical content across aphasia types. The purpose of this study, then, was to examine the relationships among three measures of productive vocabulary in discourse for adults with aphasia: TTR, NDW, and D. Moreover, we sought to determine whether, and how, these measures differentiate fluent from nonfluent aphasia.
Given that adults with nonfluent aphasia tend to produce smaller language samples, in number of words and number of utterances, than their counterparts with fluent aphasia, we were particularly interested in whether this difference affected results when whole language samples were analysed. We expected that the measures, when used as they were designed (i.e., using whole samples for D, restricting sample size for TTR and NDW), would differentiate the nonfluent and fluent groups. Additionally, we expected that these measures, each performed as originally intended, would be strongly correlated with each other.

METHOD

Participants

A total of 23 adults with unilateral left brain damage subsequent to cerebrovascular accident participated in the study. Once language samples were transcribed and sample length was determined, data from five participants were excluded because these participants did not produce the minimum of 200 words. Data from nine adults with nonfluent aphasia (NF) and nine adults with fluent aphasia (F) were included and subjected to analyses. Type and severity of


aphasia were confirmed by performance on the Western Aphasia Battery (WAB) (Kertesz, 1982). Aphasia quotients (AQ) were obtained for each participant. The mean AQ for participants with nonfluent aphasia was 77.2 (SD = 5.7; range 67.0–82.0) and the mean AQ for participants with fluent aphasia was 85.8 (SD = 7.7; range 71.0–94.0). Participants in the NF group scored 4 or lower on the fluency portion of the WAB and participants in the F group scored 5 or higher. Participants in the two aphasia groups were matched on their performance on the auditory comprehension subtests of the WAB (NF group: M = 9.2, SD = 0.8; F group: M = 9.3, SD = 0.7). Further, the aphasia groups did not differ significantly in age, t(8) = 1.13, p = .29, years of education, t(8) = 1.17, p = .28, WAB AQ, t(8) = 2.19, p = .06, or score on the auditory comprehension subtests, t(8) = 0.69, p = .51. Table 1 shows demographic and clinical data for the individual participants.

TABLE 1 Demographic and clinical description data for the aphasia participants

Participant  Age  Gender  Education (years)  M/p CVA¹  WAB AQ²  Aud comp³  Sample size
NF⁴1         86   Female  14                 25        80.6     9.00       421
NF2          59   Female  14                 41        81.0     10.00      489
NF3          57   Female  20                 72        80.6     9.80       273
NF4          85   Female  14                 25        72.3     9.45       587
NF5          47   Female  19                 48        82.0     9.10       277
NF6          53   Female  18                 261       79.5     9.75       409
NF7          38   Male    16                 10        81.9     9.35       375
NF8          52   Female  14                 12        70.1     8.75       208
NF9          35   Female  11                 204       67.0     7.20       240
F⁵1          67   Male    15                 15        71.0     8.70       392
F2           76   Male    12                 163       76.3     8.65       598
F3           54   Female  12                 6         85.6     9.70       396
F4           55   Female  13                 29        87.3     9.95       463
F5           76   Male    16                 12        87.6     9.80       438
F6           63   Male    16                 17        94.0     10.00      655
F7           76   Female  15                 6         85.4     9.30       427
F8           60   Female  13                 42        93.0     9.60       474
F9           83   Male    16                 8         92.0     8.20       458

¹ months since cerebrovascular accident; ² Western Aphasia Battery aphasia quotient; ³ score on auditory comprehension subtests of WAB; ⁴ nonfluent; ⁵ fluent.

Language elicitation and transcription

Spontaneous conversation, supplemented by elicited conversation in response to the WAB “Picnic Scene”, comprised the language sample for each participant.


The samples were audiorecorded, then transcribed and coded according to the conventions of the Child Language Data Exchange System (CHILDES; MacWhinney, 2000). The CHILDES system consists of a transcription protocol (CHAT) and a series of language analysis programs (CLAN). Samples were first transcribed verbatim and then coded for CLAN analysis. For inter-rater agreement, the first author reviewed 22% of the audiotaped samples for correspondence to the transcript; word-by-word agreement was 98.5%. Revisions, direct repetitions, and fillers were coded for exclusion, so that they would not be counted in calculations of vocabulary diversity. This decision was made because counting them would essentially have penalised participants who were more disfluent, regardless of the lexical content of their language. In addition, paraphasias that were recognisable English words were transcribed verbatim, rather than attempting to discern intended lexical targets; that is, the transcriptions reflect only the words and utterances actually produced. Unrecognisable words and neologisms were coded for exclusion from the transcripts. Each of the language samples obtained had at least 200 words; a minimum sample length of 50 words is required to compute D. Sample size ranged from 208 to 587 words for the NF group (M = 364.33; SD = 125.65) and from 392 to 655 words (M = 477.89; SD = 89.92) for the F group; the F group produced significantly more words than the NF group, t(8) = 2.63, p < .05.

Language analysis

Each sample was subjected to three measures of lexical diversity, each computed by CLAN (MacWhinney, 2000): D, NDW, and TTR. Because TTR is known to be sensitive to sample size variation, truncated samples consisting of the middle 100 and the middle 200 words were also obtained, and each was subjected to the three analyses. This procedure, similar to that of Watkins and colleagues (1995), allowed for a more equitable comparison of TTR, NDW, and D.
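The exclusion coding and middle-window truncation described above might look like the following Python sketch. It assumes a simplified, hypothetical marking scheme (a leading "&" on every token coded for exclusion) rather than full CHAT, which CLAN parses with its own, richer rules; the helper names are ours.

```python
from typing import List, Sequence, Tuple


def countable_words(tokens: Sequence[str], exclude_prefix: str = "&") -> List[str]:
    """Drop tokens coded for exclusion and lowercase the rest.

    Hypothetical convention: any token starting with `exclude_prefix` was coded
    as a filler, revision, direct repetition, or unrecognisable word/neologism.
    """
    return [t.lower() for t in tokens if not t.startswith(exclude_prefix)]


def middle_window(words: Sequence[str], n: int) -> List[str]:
    """Return the middle n words of a sample (the sample must have >= n words)."""
    if len(words) < n:
        raise ValueError(f"sample has {len(words)} words; at least {n} are needed")
    start = (len(words) - n) // 2
    return list(words[start:start + n])


def window_measures(words: Sequence[str], n: int) -> Tuple[int, float]:
    """NDW and TTR for the middle n words; with n fixed, TTR is simply NDW / n."""
    types = len(set(middle_window(words, n)))
    return types, types / n


if __name__ == "__main__":
    # '&uh' and '&um' (fillers) and '&is' (a direct repetition) are coded for exclusion.
    utterance = "&uh the boy &um is &is flying a kite".split()
    words = countable_words(utterance)
    print(words)
    print(window_measures(words, 3))
```

Raising an error for samples shorter than the window mirrors the practical problem raised in the Discussion: with a fixed truncation length, a short sample cannot be analysed at all.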
If these measures are each performed as originally intended (restricting sample size for TTR and NDW, and using whole samples for D), they should be strongly correlated with each other.

RESULTS

Relationships among D, NDW, and TTR

Pearson correlations were computed to determine the relationships among D, NDW, and TTR. Because of the number of correlations performed, a Bonferroni adjustment was applied to minimise the probability of Type I error; consequently, only results with p < .001 were considered significant. As shown in the correlation matrix in Table 2, when the samples were truncated to 100 and 200 words, the three measures correlated with each other at each sample length. However, when whole samples were used, none of the three


measures was significantly related to the others at the p < .001 level. In addition, as shown in the correlation matrix, although the three D sample sizes were all significantly correlated with each other, this was not the case for the other two measures: significant correlations between sample sizes were observed only once for TTR (between TTR-100 and TTR-200) and twice for NDW (between whole-sample NDW and NDW-200, and between NDW-100 and NDW-200). Finally, the relationships among the lexical diversity measures, when used as intended, were assessed. In particular, the relationships among D (whole samples), NDW (100- and 200-word samples), and TTR (100- and 200-word samples) were evaluated. Each of these correlations was significant at the p < .001 level.

TABLE 2 Pearson correlation matrix for whole samples, 100-word samples, and 200-word samples of NDW, TTR, and D

          NDW¹     NDW 100  NDW 200  TTR²     TTR 100  TTR 200  D        D 100    D 200
NDW¹      1.0
NDW 100   .66      1.0
NDW 200   .79*     .92*     1.0
TTR²      .06      .46      .46      1.0
TTR 100   .66      1.0*     .92*     .46      1.0
TTR 200   .79*     .91*     1.0*     .44      .91*     1.0
D         .69      .77*     .81*     .56      .77*     .80*     1.0
D 100     .63      .96*     .86*     .48      .96*     .85*     .79*     1.0
D 200     .78*     .88*     .95*     .40      .88*     .95*     .86*     .86*     1.0

¹ Number of different words; ² type-token ratio; * significant at p < .001.
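The Bonferroni adjustment described in the Results can be made concrete. The text does not say how many correlations the adjustment covered; assuming it was every unique pair of the nine variables in Table 2 (36 pairs) and a family-wise alpha of .05 (both our assumptions), the per-test threshold would be roughly .0014, so the authors' p < .001 criterion is slightly stricter. A minimal Python sketch, with names of our own choosing:

```python
import math
from itertools import combinations
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation of two equal-length samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def bonferroni_threshold(n_tests: int, family_alpha: float = 0.05) -> float:
    """Per-test alpha that keeps the family-wise Type I error at family_alpha."""
    return family_alpha / n_tests


# Hypothetical family of tests: every unique pair of the nine variables in Table 2.
MEASURES = ["NDW", "NDW 100", "NDW 200", "TTR", "TTR 100", "TTR 200",
            "D", "D 100", "D 200"]
PAIRS = list(combinations(MEASURES, 2))

if __name__ == "__main__":
    print(len(PAIRS), round(bonferroni_threshold(len(PAIRS)), 4))
```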

Lexical diversity in adults with fluent vs nonfluent aphasia

To determine the sensitivity of the measures to aphasia type, several paired-sample t-tests were performed. Groups differed significantly in the conversational vocabulary they produced when measured by D with whole samples, t(8) = 2.69, p < .05, 100-word samples, t(8) = 2.74, p < .05, and 200-word samples, t(8) = 3.55, p < .01. Groups also differed significantly for NDW with whole language samples, t(8) = 3.07, p < .05, and when samples were truncated to 100 words, t(8) = 2.44, p < .05, and 200 words, t(8) = 2.77, p < .05. However, groups did not differ significantly for TTR with whole samples, t(8) = 0.65, p > .05, but did once samples were truncated to 100 words, t(8) = 2.45, p < .05, and 200 words, t(8) = 2.71, p < .05. See Table 3 for group means and standard deviations for D, NDW, and TTR.
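The reported t(8) values are consistent with nine matched NF-F pairs, since a paired-samples test has (number of pairs - 1) degrees of freedom. The matching procedure is not described beyond auditory comprehension scores, so the Python sketch below (function name ours) only shows how such a statistic is formed; it does not reproduce the paper's values, and converting t into a p-value would additionally require the t distribution, for example from a statistics library.

```python
import math
from typing import Sequence, Tuple


def paired_t(x: Sequence[float], y: Sequence[float]) -> Tuple[float, int]:
    """Paired-samples t statistic and its degrees of freedom (pairs - 1).

    Assumes the pairwise differences are not all identical (non-zero SD).
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length samples with at least two pairs")
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    mean_d = sum(diffs) / n
    sd_d = math.sqrt(sum((d - mean_d) ** 2 for d in diffs) / (n - 1))
    return mean_d / (sd_d / math.sqrt(n)), n - 1
```

With nine pairs, as in this study, the second value returned is 8.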


DISCUSSION

The first objective of the investigation was to examine the relationships among D, TTR, and NDW across three sample lengths: whole sample, 100-word, and 200-word. Because D is a relatively new measure of lexical diversity, one that had not previously been used to assess the conversational vocabulary of adults with aphasia, it was appropriate to determine the extent of the relationships between this new measure and two other, well-established measures. Findings suggested that the measures were all significantly correlated when samples were truncated to 100 and 200 words, but that there were no significant relationships among any of the measures when whole samples were used. This finding is likely attributable to the fact that, in the present study, adults with fluent aphasia produced (a) greater vocabulary diversity, but also (b) significantly longer samples. Although longer samples should not affect D, they would be expected to lower TTR while raising NDW. Because sample length pushes TTR and NDW in opposite directions, and the two groups differed in sample length, it appears that the uncontrolled length of the samples produced a cancellation effect. It is clear from these data that, in general, correlations across analyses are relatively stronger for samples of the same length (e.g., greater for D-100 and NDW-100 than for whole-sample NDW and NDW-200). These stronger correlations would appear to result from the fact that the same subset of language sample data is used. Such findings would seem to suggest that, as long as samples are limited to a particular length, any of the three analyses might be used to arrive at similar conclusions about the conversational vocabulary of adults with aphasia.

TABLE 3 D, NDW, and TTR means (standard deviations) for whole language samples, 100-word samples, and 200-word samples for the fluent and nonfluent aphasia groups

Measure                 NF¹ group (N = 9)   F² group (N = 9)
D, whole sample         55.11 (19.35)       79.39 (12.14)
D, 100-word sample      45.17 (15.82)       70.43 (19.12)
D, 200-word sample      50.35 (20.04)       79.16 (11.03)
NDW³, whole sample      148.33 (43.93)      202.89 (29.86)
NDW, 100-word sample    58.44 (6.46)        66.22 (5.09)
NDW, 200-word sample    95.22 (13.06)       110.89 (7.98)
TTR⁴, whole sample      .41 (.05)           .43 (.04)
TTR, 100-word sample    .58 (.06)           .66 (.05)
TTR, 200-word sample    .48 (.08)           .56 (.04)

¹ nonfluent; ² fluent; ³ number of different words; ⁴ type-token ratio.


The finding that the D values for each sample length were significantly intercorrelated seems to provide evidence of the stability of this measure of vocabulary diversity. In contrast, TTR and NDW did not demonstrate this same consistency across sample sizes; rather, the significance of the correlations for these measures varied across sample sizes. As a whole, this finding seems to highlight the sample-size sensitivity of TTR and NDW, compared with D. As discussed above, although NDW is also somewhat sensitive to sample size, it is less so than TTR, whose value changes with every new word added to the sample. Given these sample-size concerns, comparing TTR and NDW to D using whole samples for each analysis is in some sense an unfair comparison, although such analyses seemed to lead to a more complete understanding of both the analyses and the lexical abilities of each group of adults with aphasia. Results of the present study suggest that if each analysis is used as intended, with equivalent samples for TTR and NDW and with whole samples for D, the measures are each significantly related to one another, highlighting the importance of using truncated sample data with TTR and NDW. There is an equally important issue, however, related to the ecological validity of using measures that require discarding language sample data. In collecting language samples, of course, the aim is to obtain as representative a sample as possible. When language sample data are discarded because of the constraints of a particular analysis, it is important to ask whether the sample is then less representative of the person’s language abilities. In addition, arbitrary decisions come into play in selecting the subset of words or utterances to be included in the sample.
In the present study, for example, we chose the middle 100 and 200 words for our truncated analyses; however, not all participants will be at their best in the middle of the samples. Due to fatigue and/or frustration, some might perform better earlier in the conversation. Still others, due to slow rise time, might actually perform better later. Finally, in some cases, adults with aphasia may not be capable of providing a sample of sufficient length to allow the examiner the option of selecting 100 or 200 words for analysis. An a priori intent to truncate samples to a specific predetermined length, then, could lead either to misrepresentation of conversational abilities in cases in which a substantial amount of language sample data is discarded, or to discarding language sample data altogether if a client is not able to produce a sample of the predetermined length. The second objective of our study was to determine if these measures of lexical diversity adequately differentiated adults with nonfluent and fluent aphasia. Again, we were most interested in the use of whole language samples for these analyses since it is our position that these will be most representative of the abilities of individuals with aphasia. When whole language samples were used, only D and NDW differentiated NF and F aphasia types. It is not surprising that TTR did not reveal between-group differences, given its sensitivity to sample size variation. The finding that NDW produced between-group differences on whole


samples, however, is initially somewhat surprising. In theory, it, too, is sample-size sensitive: if a person produces a 200-word language sample, there is a greater opportunity to produce more different words than if the person produces only a 100-word sample. Upon reflection, however, this finding might have been expected; the adults with fluent aphasia tended to produce longer samples, as well as to use more diverse vocabulary. The greater diversity of their vocabulary can be seen in all three analyses when sample length was controlled. With NDW, however, this between-group difference is magnified by the fact that the adults with fluent aphasia also produced more language, making their performance appear even more different from that of the adults with nonfluent aphasia than it actually was. Thus, our finding of group differences for whole samples with NDW in no way indicates that the measure is stable across sample-size variation. The question of sample-size sensitivity might also be raised with respect to D. If D were sample-size sensitive, it too should inflate the values for adults with fluent aphasia. To assess this possibility informally, D analyses were performed on split halves of each of five randomly selected samples, such that for each sample every other utterance was omitted from the analysis. The 10 half-sample D values (even utterances and odd utterances) were then compared to the five whole-sample D values. In theory, if D were sample-size sensitive, the D values for the half samples would each fall below their respective whole-sample D value, demonstrating that fewer words yield lower values than more words. In this informal analysis, however, six of the halves fell below their “whole” D value, and four fell above it.
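The informal split-half check can be expressed independently of the measure being tested. In the Python sketch below (names ours), `measure` is any function from a list of words to a score; supplying a D estimator would reproduce the spirit of the check, but no D implementation is included here. With NDW, each half necessarily scores at or below the whole sample, which is the pattern a sample-size-sensitive measure shows; the authors' D values instead split six below and four above.

```python
from typing import Callable, List, Sequence

Measure = Callable[[List[str]], float]


def split_half_check(utterances: Sequence[List[str]], measure: Measure) -> int:
    """Count how many of the two halves score below the whole sample.

    The halves are the even-numbered and odd-numbered utterances, so each half
    keeps every other utterance of the original sample.
    """
    whole = [w for u in utterances for w in u]
    even = [w for u in utterances[0::2] for w in u]
    odd = [w for u in utterances[1::2] for w in u]
    baseline = measure(whole)
    return sum(measure(half) < baseline for half in (even, odd))


def ndw(words: List[str]) -> float:
    """Number of different words, used here as an example measure."""
    return len(set(words))


if __name__ == "__main__":
    sample = [["the", "dog", "ran"], ["a", "cat", "sat"],
              ["the", "dog", "sat"], ["a", "bird", "flew"]]
    print(split_half_check(sample, ndw))  # -> 2: both halves fall below the whole
```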
These split-half results are consistent with those of McKee and colleagues (2000), and suggest that D is not, in fact, sample-size sensitive, at least not to the extent that TTR and NDW are. Taken as a whole, our results in relation to D are of interest because they suggest that this analysis is appropriate for quantifying the conversational vocabulary of adults with aphasia. Moreover, our results add to the growing literature regarding the utility of D as a measure of lexical diversity in clinical populations (e.g., Malvern & Richards, 1997; Owen & Leonard, 2002). With respect to NDW, our finding that truncated samples can be used to distinguish groups of differing language skills corroborates that of Watkins and colleagues (1995), although they found that, even using truncated samples, TTR did not distinguish children with language impairment from normal-language peers. The most likely reason for this difference is that Watkins and colleagues truncated samples by utterances rather than words. One limitation of the present study is that the language samples obtained for analyses are relatively small. Hess et al. (1986), for example, have made the case that a minimum of 350 words is needed for reliable computation of TTR, at least for analyses of preschool children with normal language. Despite this limitation, however, the findings and implications of the present study are important for at least two reasons. First, samples of the present study allow for comparison to other


work with similar samples (e.g., Watkins et al., 1995) and, arguably, are valuable in that insight can be gained from evaluating the extent to which smaller samples can be used with these analyses for this clinical population. Second, from our perspective it is important to propose measurement procedures that are both realistic and capable of implementation in a clinical setting. Although longer samples are desirable from a research perspective, conclusions based on longer samples are not as readily applied to clinical endeavours, given the nature of fluent and nonfluent aphasias and the inherent time constraints of clinical work.

Conclusion and clinical implications

It appears that D is a rather promising tool for the analysis of lexical diversity in the conversation of adults with aphasia. Its greatest strength is its ability to accommodate whole language samples while controlling for sample size in its output. In contrast, TTR and NDW both require that language sample data be discarded so that only samples of equivalent length in words are legitimately compared. We have raised concerns about the ecological validity of procedures that require the discarding of language sample data. D analysis also appears to separate fluent from nonfluent aphasia samples, suggesting its possible future use as an additional tool in the differential diagnosis of aphasia. Future studies might further investigate the validity and reliability of D as a measure of conversational vocabulary in adults with aphasia. In addition, conversational vocabulary diversity among other populations with acquired neurogenic disorders warrants exploration. Such work might increase our confidence in the clinical utility of this new measure, as well as enhance our understanding of the conversational lexical abilities of adults with a range of acquired neurogenic disorders.

REFERENCES

Chotlos, J. W. (1944). Studies in language behavior. IV. A statistical and comparative analysis of individual written language samples. Psychological Monographs, 56(2), 77–111.
Dollaghan, C. A., Campbell, T. F., Paradise, J. L., Feldman, H. M., Janosky, J. E., Pitcairn, D. N., et al. (1999). Maternal education and measures of early speech and language. Journal of Speech, Language, and Hearing Research, 42, 1432–1443.
Fillenbaum, S., Jones, L. V., & Wepman, J. M. (1961). Some linguistic features of speech from aphasic patients. Language and Speech, 4, 91–108.
Goffman, L., & Leonard, J. (2000). Growth of language skills in preschool children with specific language impairment: Implications for assessment and intervention. American Journal of Speech-Language Pathology, 9, 151–161.
Guiraud, P. (1959). Problèmes et méthodes de la statistique linguistique. Dordrecht: D. Reidel.
Herdan, G. (1960). Type-token mathematics: A textbook of mathematical linguistics. The Hague: Mouton.
Hess, C. K., Sefton, K. M., & Landry, R. G. (1986). Sample size and type-token ratios for oral language of preschool children. Journal of Speech and Hearing Research, 29, 129–134.
Johnson, W. (1944). Studies in language behavior. I. A program of research. Psychological Monographs, 56(2), 1–15.
Kertesz, A. (1982). Western aphasia battery. New York: Grune & Stratton.
MacWhinney, B. (2000). The CHILDES Project: Tools for Analyzing Talk (2nd ed.). Hillsdale, NJ: Erlbaum.
Malvern, D. D., & Richards, B. J. (1997). A new measure of lexical diversity. In A. Ryan & A. Wray (Eds.), Evolving models of language (pp. 58–71). Clevedon: Multilingual Matters.
McKee, G., Malvern, D., & Richards, B. (2000). Measuring vocabulary diversity using dedicated software. Literary and Linguistic Computing, 15(3), 323–338.
McNeil, M. R., Doyle, P. J., Fossett, T. R. D., Park, G. H., & Goda, A. J. (2001). Reliability and concurrent validity of the information unit scoring metric for the story retelling procedure. Aphasiology, 15, 991–1006.
Miller, J. (1981). Assessing language production in children. Baltimore: University Park Press.
Nicholas, L. E., & Brookshire, R. H. (1993). A system for quantifying the informativeness and efficiency of connected speech of adults with aphasia. Journal of Speech and Hearing Research, 36, 338–350.
Owen, A. J., & Leonard, L. B. (2002). Lexical diversity in the spontaneous speech of children with specific language impairment: Application of D. Journal of Speech, Language, and Hearing Research, 45, 927–937.
Prins, R. S., Snow, C. E., & Wagenaar, E. (1978). Recovery from aphasia: Spontaneous speech versus language comprehension. Brain and Language, 6, 192–211.
Ratner, N. B., & Silverman, S. (2000). Parental perceptions of children’s communicative development at stuttering onset. Journal of Speech, Language, and Hearing Research, 43, 1252–1263.
Silverman, S., & Ratner, N. B. (in press). Measuring lexical diversity in children who stutter: Application of D. Journal of Fluency Disorders.
Spreen, O., & Wachal, R. S. (1973). Psycholinguistic analysis of aphasic language: Theoretical formulations and procedures. Language and Speech, 16, 130–146.
Templin, M. C. (1957). Certain language skills in children, their development and interrelationships. Minneapolis: University of Minnesota Press.
Wachal, R. S., & Spreen, O. (1973). Some measures of lexical diversity in aphasic and normal language performance. Language and Speech, 16, 169–181.
Watkins, R., Kelly, D., Harbers, H., & Hollis, W. (1995). Measuring children’s lexical diversity: Differentiating typical and impaired language learners. Journal of Speech and Hearing Research, 38, 1349–1355.

Limb apraxia, pantomine, and lexical gesture in aphasic speakers: Preliminary findings

Miranda Rose and Jacinta Douglas

La Trobe University, Victoria, Australia

Background: Speech-language pathologists considering the use of gesture as a therapeutic modality for clients with aphasia must first evaluate the integrity of their clients’ gesture systems. Questions arise as to which behaviours to assess and how to assess them. There has been a long-held belief that tests of limb apraxia and pantomime provide valid information about candidacy for gesture-based interventions, yet the theoretical and empirical basis of this assumption is limited. Further, the relationship between conversational gesture skill and limb apraxia in individuals with co-occurring aphasia has been largely unexplored. It is possible that a client’s gesture performance in natural conversation provides more valid information about gesture treatment candidacy than do tests of limb apraxia.

Aims: This study aimed to investigate the relationship between the presence of limb apraxia and conversational gesture use in speakers with nonfluent aphasia. On the assumption that limb praxis and conversational gesture reflect different underlying processes, it was hypothesised that speakers with aphasia and limb apraxia would produce the full range of conversational gesture types in a conversational context. Further, it was hypothesised that speakers with demonstrated deficits on formal tests of pantomime would nevertheless produce pantomimes naturally in conversation. Thus, a dissociation would be demonstrated between the processing responsible for gesture production as measured in limb apraxia tests and that subserving the production of conversational gesture.


Methods & Procedure: Seven participants with nonfluent aphasia and ideomotor and conceptual limb apraxia took part in a semistructured conversation with the researcher. All arm and hand gestures produced by the participants were counted and rated according to guidelines provided by Herrmann, Reichle, and Lucius-Hoene (1988), and the time the participants spent in gestural versus spoken expression was compared. Correlations were calculated between limb apraxia scores and the proportions of meaning-laden gestures used in conversation.

Outcomes & Results: All seven participants produced a wide range of gesture types. Participants with limited verbal output produced large amounts of meaning-laden gesture. Importantly, even participants with severe limb apraxia produced high proportions of meaning-laden gestures (codes and pantomimes) in the natural setting. No significant relationships were found between scores on limb apraxia tests and natural gesture use.

Conclusions: Patients with nonfluent aphasia and limb apraxia may still use meaningful conversational gesture in naturalistic settings. Tests of limb apraxia may be poor predictors of lexical gesture use. Clinicians are therefore advised to sample lexical gesture use in spontaneous interactions.

Address correspondence to: Dr Miranda Rose, School of Human Communication Sciences, La Trobe University, Bundoora, 3086, Victoria, Australia. © 2003 Psychology Press Ltd
http://www.tandf.co.uk/journals/pp/02687038.html DOI: 10.1080/02687030344000157

Speech-language pathologists often consider the use of gesture as either an alternative to or a facilitator of verbal communication for individuals with aphasia (Christopoulou & Bonvillian, 1985; Helm-Estabrooks, Fitzpatrick, & Barresi, 1982; Rao, 1995; Skelly, Schinsky, Smith, & Fust, 1974; Wertz, LaPointe, & Rosenbek, 1984). Clients may be taught to use pantomime (a sequence of gestures demonstrating objects or actions without the need for speech) or AmerInd gestures (Skelly et al., 1974) to communicate thoughts and feelings when verbal communication is not possible, or gestures may be paired with verbal targets in order to facilitate verbal production (Hanlon, Brown, & Gerstmann, 1990; Kearns, Simmons, & Sisterhen, 1982; Rose & Douglas, 2001; Rose, Douglas, & Matyas, 2002). Thus, the integrity of a client’s gesture system needs to be determined when considering gesture-based speech-language pathology interventions for aphasia. However, in considering the gesture abilities of a client, both the method of assessment and the processes and skills requiring assessment need to be defined.

There have been three main approaches to assessing the gesture abilities of speakers with aphasia. The first has assessed pantomime skills through tests of limb apraxia (Duffy & Duffy, 1981; Goodglass & Kaplan, 1963; Wang & Goodglass, 1992); the second is best described as a “trial-and-error” method, in which gesture treatments are trialled and their success monitored (Wertz et al., 1984); and the third has measured the use of conversational gesture in natural settings (Behrmann & Penn, 1984; Herrmann et al., 1988; Le May, David, & Thomas, 1988). Confusion persists about the utility of these three approaches and about exactly what information speech-language pathologists require when considering candidacy for gesture-based treatments.

There has been a long-held notion that testing for the presence of limb apraxia in patients with aphasia provides information about candidacy for gesture-based interventions (Helm-Estabrooks et al., 1982), although evidence to support this assumption is limited. Limb apraxia, commonly defined as the inability to perform skilled, purposeful limb movements in the absence of elementary sensorimotor disorders, intellectual deterioration, or comprehension difficulties (Chainay & Humphreys, 2002), frequently co-occurs with aphasia (De Renzi, Faglioni, Lodesani, & Vecchi, 1983; Goodglass & Kaplan, 1963; Kertesz, Ferro, & Shewan, 1984). Limb apraxia is assessed by asking the individual to produce limb movements to verbal command and/or to imitation. Clients are asked to make socially regulated, intransitive gestures such as saluting, waving goodbye, and making an “OK” sign. Clients are also asked to make transitive gestures showing how objects are used, such as demonstrating how to cut bread, either with or without the actual object present.
In some apraxia batteries, meaningless movements are also tested to imitation (Kimura & Archibald, 1974). Three types of limb apraxia are currently recognised: ideomotor apraxia (a disorder of the temporal, sequential, and spatial organisation of action), ideational apraxia (an incapacity to mentally evoke the action associated with a sequence of objects), and conceptual apraxia (an incapacity to mentally evoke the action associated with a single object) (Ochipa, Rothi, & Heilman, 1992).

Several seminal texts and papers have suggested that the presence of limb apraxia in speakers with aphasia prevents them from learning a gesture system for communication and/or from responding to gestural facilitation treatments (Helm-Estabrooks et al., 1982; Rothi & Heilman, 1997). However, the empirical evidence to support this assumption is extremely limited. Helm, Kaplan, and Vercruysse (as cited in Helm-Estabrooks et al., 1982) found that patients with severe aphasia and limb apraxia did not produce verbal labels or representational gestures to picture stimuli, and suggested that limb apraxia may prevent patients from using representational gestures as a natural means of communication. However, natural gesture use was not assessed in the Helm et al. study; rather, the patients’ abilities to gesture to command or to picture stimuli were evaluated, and their natural gesture abilities were inferred from these tasks. Similarly, Borod, Fitzpatrick, Helm-Estabrooks, and Goodglass (1989) found significant correlations between the conversational gesture abilities of individuals with aphasia, as rated on their Nonvocal Communication Scale, and ratings of limb apraxia from the Boston Apraxia Test (Helm-Estabrooks, 1986, as cited in Borod et al., 1989). Borod et al. concluded that tests of limb apraxia could predict the probability of a patient’s competence in nonverbal social interaction. However, the adequacy of the gestural repertoire rated in the Nonvocal Communication Scale is questionable, in that it fails to rate many important natural gestural acts, such as yes/no gestures, pantomimes used for indicating affective states, and the strength of gesture movements indicating emphasis.

Limb praxis may not be the only behaviour requiring assessment when considering gesture-based interventions for aphasia. The integrity of a client’s lexical gesture (arm and hand gestures that spontaneously accompany verbalisation; Krauss, Chen, & Gottesman, 2000) may be a valid predictor of gesture-based treatment candidacy. While early studies inferred from formal tests of pantomime and limb praxis that speakers with aphasia made poor use of lexical gesture (Duffy & Duffy, 1981; Goodglass & Kaplan, 1963; Wang & Goodglass, 1992), later research that directly examined lexical gesture in naturalistic settings demonstrated that speakers with aphasia had considerable gesture skills (Cicone, Wapner, Foldi, Zurif, & Gardner, 1979; Behrmann & Penn, 1984; Feyereisen, Barter, Goosens, & Clerebaut, 1988; Glosser, Weiner, & Kaplan, 1986; Hadar & Krauss, 1999; Herrmann et al., 1988; Le May et al., 1988).
In Pedelty’s study of four participants with Broca’s aphasia and five participants with Wernicke’s aphasia, reported in McNeill, Levy, and Pedelty (1990), rates of gesture production were similar to those of normal speakers. Those with Broca’s aphasia produced brief, meaningful, interpretable gestures, while those with Wernicke’s aphasia predominantly produced fluent, vague, meaningless, and uninterpretable gestures. Similar findings were reported by Behrmann and Penn (1984), Cicone et al. (1979), and Feyereisen (1983), suggesting that the nature of the lexical gesture produced is closely related to aphasia type.

However, the impact of limb apraxia on lexical gesture skills in aphasia remains uncertain, and controversy is emerging in the current literature about this relationship. Several authors have argued for a direct relationship between limb praxis and lexical gesture (Feyereisen & de Lannoy, 1991; Hadar & Krauss, 1999). Glosser, Wiley, and Barnoski (1998) examined the lexical gesture produced by patients with Alzheimer’s disease during a 5-minute conversation, together with the degree of limb apraxia in each patient. They found that, in comparison with normal control participants, patients with Alzheimer’s disease produced proportionately more referentially unclear or semantically ambiguous gestures relative to content-bearing gestures. Further, significant correlations were found between imitation and production of meaningful pantomimic movements (conceptual apraxia) and production of ambiguous lexical gesture. This was not the case for nonmeaningful movements (ideomotor apraxia). Glosser et al. therefore concluded that tests of conceptual limb apraxia are predictive of referential gesture ability in spontaneous production. While one might expect an underlying semantic/conceptual disorder, such as is presumed in Alzheimer’s disease, to affect both representational and spontaneous gesture, the case may be quite different in nonfluent aphasic populations, where semantic representations are frequently intact.

Lausberg, Davis, and Rothenhausler (2000) presented a clear dissociation between limb apraxia and lexical gesture. They found extensive use of left-handed lexical gesture in a patient with severe left limb apraxia (in movements to command and to imitation) following a complete callosal infarction. Lausberg et al. argued that lexical gesture largely occurs without conscious control, while the movements made during limb apraxia testing are made consciously and in an abstract context. The dissociation in Lausberg et al.’s patient highlighted both the different performance demands of unconscious versus conscious gesture production and the impact of context on production.

Recent developments in cognitive neuropsychology and psycholinguistics have led to models of word production, limb praxis, and lexical gesture. These models help to explain the differences in the processing that underlies consciously produced tool-related gestures and unconsciously produced lexical gestures. Rothi and Heilman (1997) presented a cognitive neuropsychological model that portrays the relationship between tool knowledge and use on the one hand, and comprehension and production of tool names on the other (see Figure 1).
The model builds on a model of word processing described by Patterson and Shewell (1987) by adding a series of action lexicons in which movement representations are stored. The model is extremely useful in highlighting the complex nature of the processing involved in the comprehension and production of tool-related actions and words. It also helps to clarify some of the differences between the three currently recognised types of limb apraxia: for example, conceptual apraxia relates to deficits at the “semantics” level of processing, while ideomotor apraxia relates to deficits at the “action output lexicon”. In the field of psycholinguistics, researchers interested in conversational or lexical gesture have developed models describing the relationships between the processes of speech production and lexical gesture production (Hadar & Butterworth, 1997; Krauss et al., 2000) (see Figure 2). These later models are not restricted to tool names or tool actions; rather, they attempt to account for the processing associated with a wide range of nouns, verbs, and adjectives, and for the lexical gesture that arises from pre-communicative imagistic thought processes. Krauss and Hadar’s model postulates three phases of gesture production: spatial/dynamic feature selection, spatial/dynamic feature specification, and motor planning. The model posits direct communication between gesture and speech production, in an attempt to account for gestural facilitation effects.

It is unclear how the models of limb praxis and lexical gesture relate, and many questions remain unanswered. Do the movements evaluated in tests of limb apraxia and modelled by Rothi and Heilman share underlying processing components with those termed lexical gesture and modelled in Krauss et al.’s psycholinguistic representation? Does the presence of limb apraxia have any bearing on lexical gesture use?

Figure 1. A model of praxis processing and its relation to semantics, naming, and word and object recognition. From Apraxia: The neuropsychology of action, L. Rothi and K. Heilman (Eds.), (1997), p. 45. Hove, UK: Psychology Press. Copyright by Psychology Press. Reprinted with permission.

Figure 2. Krauss et al.’s (2000) model of cognitive architecture for the speech-gesture production process. From Language and gesture, D. McNeill (Ed.), (2000), p. 261. Cambridge: Cambridge University Press. Copyright by Cambridge University Press. Reprinted with permission.

PURPOSE OF THE STUDY

This paper reports on a study that aimed to investigate the relationship between the presence of ideomotor and conceptual limb apraxia and lexical gesture use in speakers with nonfluent aphasia. Following the theoretical models proposed by Krauss et al. (2000) and Rothi and Heilman (1997), it was reasoned that the gesture movements assessed in tests of limb apraxia and those observed in lexical gesture reflect different underlying processing. Therefore, it was hypothesised that in a conversational context, speakers with nonfluent aphasia and concomitant limb apraxia would produce the full range of lexical gesture types. Further, in speakers with nonfluent aphasia, pantomime deficits as measured on formal tests of pantomime were not expected to correlate with the amount of pantomime-type lexical gestures produced during spontaneous conversation. Thus, a dissociation would be demonstrated between the processing responsible for skilled action associated with tool use (impaired in the clinical construct of limb apraxia) and formal pantomime production on the one hand, and the production of spontaneous lexical gestures on the other.

METHOD

Participants

Seven participants with aphasia were recruited for this study. Each had sustained a single left-sided stroke at least 18 months previously. All seven met the following inclusion and exclusion criteria: English was their first and only language; they were right-handed premorbidly (Simplified Hand Preference Score = +1.0; Bryden, 1982); there was no history of drug or alcohol abuse; and gesture had not been targeted in any previous speech-language pathology intervention. Aphasia syndrome and severity were assigned on the basis of the language profiles obtained from performance on relevant subtests of either the Boston Diagnostic Aphasia Examination (BDAE) (Goodglass & Kaplan, 1983) or the Western Aphasia Battery Aphasia Quotient (WAB AQ) (Kertesz, 1982). Each participant was required to demonstrate ideomotor and conceptual limb apraxia, as defined by Rothi and Heilman (1997), through impaired performance on the limb and gestured pictures subtests of the Test of Oral and Limb Apraxia (TOLA) (Helm-Estabrooks, 1992). Participant demographic data and results of language assessments are provided in Table 1.

Procedure

All participants were tested on a range of formal measures and conversation tasks by the first-named investigator during a 2-week period. The TOLA (Helm-Estabrooks, 1992) was used to obtain standard scores for both ideomotor limb apraxia and pantomime to picture stimuli. In addition, the abbreviated form of Kimura and Archibald’s movement copying test described by Corina, Poizner, Bellugi, Feinberg, Dowde, and O’Grady-Batch (1992) was used as a measure of praxis on non-meaningful stimuli.
Impaired performance on the limb and gestured pictures subtests of the TOLA was defined as a total subtest score falling below the published mean score of a group of normal participants. The types of errors that participants made were recorded and scored in accordance with the TOLA scoring protocol. Errors in timing, spatial orientation, and precision are consistent with a diagnosis of ideomotor apraxia. Errors on transitive gesture items were analysed for possible underlying conceptual problems, for example, performing a sawing motion for a paintbrush, such content errors being consistent with a diagnosis of conceptual apraxia.

TABLE 1 Participant demographic and linguistic data

WAB AQ = Western Aphasia Battery Aphasia Quotient (Kertesz, 1982); BDAE Severity Rating = Boston Diagnostic Aphasia Examination severity rating, ranging from 0 (no usable speech) to 5 (minimal discernible handicap).

Finally, participants took part in a 20-minute conversation with the first investigator. To achieve some structural similarity between the seven conversations while maintaining a high degree of naturalness, the investigator used a list of topics to direct the interaction (see Appendix). The resulting semistructured conversations were videotaped for later analysis. The investigator aimed to have the participant take as many conversational turns as possible, and spoke only to encourage continuation of the topic, to show interest, or to ask the next question on the list of topics.

Data analysis

Conversation analysis was based on 6 minutes of conversation for each participant, starting after a 3-minute “warming-up” period had elapsed. Gesture behaviour was assessed by a system of observations focusing on head, arm, and body movements; isolated movements of the face were not analysed. The entire 6-minute conversations were transcribed in broad phonetic transcription, using the gesture transcription conventions suggested by McNeill (1992). Following definitions provided by Herrmann et al. (1988), each verbalisation and each gesture was recorded, together with the time and duration over which it occurred. Verbalisations were operationally defined as verbal utterances preceded and followed by pauses of 2 seconds or more. Gestures were operationally defined as any movement or sequence of movements of the head, arms, or body that had a perceptible beginning and end. Gestures were then rated as one of the four types described by Herrmann et al. (1988). Speech-focused gestures were defined as communicative actions that subserve spoken language and cannot be interpreted in isolation. Descriptive gestures were defined as actions that convey information independently of spoken language and can therefore be interpreted in isolation.
Codified gestures were defined as actions that are not restricted to the given situation or context, but which are generally used in connection with, or as a substitute for, verbal utterances (e.g., nodding as a sign of approval). Pantomimes were defined as actions of a complex, usually sequential nature, which substitute for a verbal utterance. Spearman rank-order correlation coefficients (Spearman, 1904, as cited in Pett, 1997) were calculated between pantomime scores from the TOLA, limb apraxia scores from the TOLA, scores on the non-meaningful movements test, and the proportion of codified and pantomime gestures produced in conversation.
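Two quantitative steps in this analysis, marking verbalisation boundaries at pauses of 2 seconds or more and correlating test scores with the proportion of codified and pantomime gestures, can be sketched in Python. This is a minimal illustration, not the authors’ procedure: it assumes word-level (start, end) timestamps in seconds taken from the transcript, and the function names and gesture-type labels are hypothetical. Spearman’s rho is computed in its standard form, as Pearson’s r on ranks, with tied values given their average rank.

```python
from typing import Sequence


def segment_verbalisations(word_times: Sequence[tuple[float, float]],
                           min_pause: float = 2.0) -> list[tuple[float, float]]:
    """Merge word (start, end) timestamps into verbalisations.

    A new verbalisation starts whenever the silence since the end of the
    previous word is at least `min_pause` seconds (2 s in the study).
    """
    segments: list[list[float]] = []
    for start, end in sorted(word_times):
        if segments and start - segments[-1][1] < min_pause:
            segments[-1][1] = max(segments[-1][1], end)  # same verbalisation
        else:
            segments.append([start, end])  # pause >= min_pause: new one
    return [(s, e) for s, e in segments]


def meaning_laden_proportion(counts: dict[str, int]) -> float:
    """Proportion of codified plus pantomime gestures among all coded gestures."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no gestures coded")
    return (counts.get("codified", 0) + counts.get("pantomime", 0)) / total


def average_ranks(values: Sequence[float]) -> list[float]:
    """Rank values 1..n; tied values share the mean of their rank positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over a run of values tied with values[order[i]]
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # positions i..j are ranks i+1..j+1
        i = j + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho as Pearson's r computed on average ranks (handles ties)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length samples of size >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    if sxx == 0 or syy == 0:
        raise ValueError("rho is undefined when one variable is constant")
    return sxy / (sxx * syy) ** 0.5
```

In use, one value per participant (for example, a TOLA percentile and that participant’s `meaning_laden_proportion`) would be passed to `spearman_rho`. The sketch returns only the coefficient; with seven participants, its significance would still have to be judged against critical values for n = 7.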

RESULTS

Two qualified speech-language pathologists acted as independent raters. Following a 1-hour training session provided by the first investigator, the two raters classified 50% of the corpus of each participant’s gestures using Herrmann et al.’s four categories. Point-to-point percentage agreement was 90%. Where discrepancies emerged, the raters re-coded the item in contention by consensus discussion. The remaining 50% of the corpus of gestures was rated by the first investigator. To calculate intra-rater agreement, all the gestures produced by participant JS were rated on two separate occasions by the first investigator. Point-to-point percentage intra-rater agreement was 86%. Where discrepancies in rating occurred, the first investigator and one of the independent raters re-coded the items in contention by consensus discussion.

The results of testing for each participant are presented in Table 2. All seven participants demonstrated ideomotor limb apraxia, ranging from severe (GC) to mild (SA). Pantomime deficits (and conceptual apraxia), as measured on a formal test of pantomime (TOLA subtest “gestured pictures”), were also present in all seven participants. Two participants, SA and BO, performed within normal limits on imitation of non-meaningful movements, while the remaining five participants were impaired.

Table 3 displays the absolute duration of all verbal and gesture elements used by each participant. Four participants (SA, GC, RG, JS) spent more time in gestural than in verbal expression, reflecting the severity of their verbal expression deficits. Table 4 details the percentages and actual numbers of each gesture type used by the participants. There was considerable variability in the types of gesture used. Of particular note was the high mean percentage use of descriptive, codified, and pantomimic gestures, which are known to carry a high meaning load (M = 73.4%, range 46–97%). By comparison, normal speakers in conversational settings produce a lower percentage of meaning-laden gestures (iconics; M = 56%), with the remaining 44% being non-meaning-laden gestures (beats, metaphorics, deictics) (McNeill, 1992). Participants GC and JS showed similar distributions of gesture type, with the greatest use of pantomime and codified gestures. Similarly, 42.4% of SA’s gestures were codified or pantomime, again reflecting severe verbal output deficits and attempts to enhance meaningful output through the gestural modality.

Spearman rank-order correlation coefficients were computed to examine the relationships between meaning-laden lexical gesture (codes and pantomimes) and scores on the tests of ideomotor and conceptual limb apraxia (Table 5). No significant relationships were found.

TABLE 2 Test results on standard measures

Participant   Limb Apraxia   Pantomime   Kimura Test
SA            84             50          22
BO            75             84          22
KC            37             63          15*
WS            63             63          16*
GC            16             37          12*
RG            25             50          11*
JS            63             75          14*

Limb Apraxia and Pantomime scores are from the Test of Oral and Limb Apraxia (Helm-Estabrooks, 1992), expressed as percentile ranks of performance relative to participants with brain damage. Kimura Test: Movement Copying Test (Kimura & Archibald, 1982, as cited in Corina et al., 1992), scored out of 24 points (* = 2 standard deviations below the mean for age-matched non-brain-damaged controls).

Note that for the sentence comprehension subtest, norms are available for the full test and not for specific sentence types. ** Unable to compare to age-matched norms because the test was not administered in full. Performance was severely impaired relative to the normative sample, who performed at > 95% on both writing tasks.

TABLE 2 Performance on language pretesting: BDAE subtests

BDAE subtest                          Date administered (months prior to study)   Score
1. Visual Confrontation Naming        13                                           36/114
   Items per category:
   a. Objects                                                                      1/6
   b. Letters                                                                      0/6
   c. Forms                                                                        1/2
   d. Actions                                                                      4/6
   e. Numbers                                                                      3/6
   f. Colours                                                                      1/6
   g. Body Parts                                                                   2/6
2. Responsive Naming                  13                                           18/30
3. Verbal Agility                     18                                           7/14
4. Word Repetition                    18                                           9/10
5. Repeating Phrases                  18                                           5/16
6. Written Confrontation Naming       13                                           4/10
7. Writing to Dictation               16                                           3/10

Written language production

LN’s ability to write single words was assessed using informal measures based on subtests of the PAL. Testing was limited to 10 items from each subtest to minimise LN’s frustration. He was unable to write words to dictation or complete written naming, but he was able to write the first graphemes of words with 50–60% accuracy.

Summary of deficits

Comprehension. Access to the auditory input lexicon was better than access to the visual input lexicon, as shown by better scores on the auditory than on the written version of the lexical decision task. The semantic system followed the same pattern, with better performance on comprehension tasks presented in the oral than in the written modality. The results from the verbal and written modalities suggest that LN’s semantic system is mildly impaired, with factors such as morphological complexity and imageability affecting performance.

Production. LN’s oral and written production are severely impaired at the word level. His ability to access partial information about lexical forms is stronger in the written than in the verbal modality: in verbal and written naming tasks, he identified and wrote the first letters of words with greater accuracy than he produced their first sounds verbally.

Previous and concurrent treatments

Previous treatment included written and verbal naming tasks, verbal apraxia drills, and training with augmentative devices such as communication books and computer programs (C-Speak Aphasia, Nicholas & Elliott, 1999). LN had demonstrated negligible gains in verbal naming of objects, although his overall naming score on the BDAE showed some improvement (0/114 when tested 24 months prior to the present study, compared to 36/114 when tested 13 months prior to the present study).

TEACHING SELF-CUES 63

During the course of the study, LN participated in 1 hour of group treatment each week in addition to the study-related treatment sessions. The emphasis of group treatment was on counselling and multimodal communication, for example, use of gestures and drawing. Special care was taken to avoid treatment of verbal or written naming in the group forum.

Treatment rationale

Verbal naming was the primary target of therapy because of LN’s strong motivation to pursue this mode of communication. The rationale for the treatment was based on the results of pretesting and on several informal observations made during previous testing and treatment sessions. Pretesting suggested that LN’s partial lexical access was better in the written than in the verbal modality. Although LN was able to identify or write the first letters of words, he was unable to read aloud what he had written; that is, graphemic representations were not sufficient to trigger phonemic associations. Informal observations indicated that he was able to benefit from tactile/placement cues to facilitate articulatory placement. Additionally, minimal phonemic cues, such as demonstration of placement without sound, were successful in eliciting targets. Treatment was designed to tap into LN’s partial access to the written form of lexical items so that he could use it, together with tactile cues, to generate the first phoneme of the word independently. Based on his performance in treatment and testing, we hypothesised that the combination of the first letter of the word and a tactile cue for the articulatory placement associated with that grapheme would be sufficient for LN to generate the first sound of the word independently. Because he was able to name items verbally with fairly minimal phonological information, we hoped that the first sound would provide enough information for him to name the item verbally.

Experimental stimuli

The target and control word lists comprised one- and two-syllable words judged to be relevant to LN by his primary clinician. Target words began with the initial phonemes /d, f, t, k/, while control items began with /b, p, s, g/. These phonemes were selected because they are considered to be in the mid-range of difficulty for apraxic speakers (Johns & Darley, 1970). All items beginning with the phoneme /k/ began with the grapheme “c”, and all items beginning with the phoneme /s/ began with the grapheme “s”. These choices were made to facilitate selection of items deemed relevant to LN. A total of 48 words were included, with three one-syllable and three two-syllable words in each phoneme class (24 target and 24 control words). Target and control words were matched for frequency (Francis & Kucera, 1982), t(46) = –.80, p = NS. A complete list of experimental stimuli is presented in Appendix A.
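The reported frequency match, t(46) = –.80, is consistent with an independent-samples Student’s t-test with pooled variance on 24 target and 24 control words, since that test has 24 + 24 − 2 = 46 degrees of freedom. The text does not say which t-test variant was used, so the pooled-variance form is an assumption here. A minimal Python sketch of the statistic follows; `pooled_t` is a hypothetical helper, and in practice its inputs would be the Francis and Kucera (1982) frequency counts for the two word lists.

```python
from statistics import mean, variance
from typing import Sequence


def pooled_t(a: Sequence[float], b: Sequence[float]) -> tuple[float, int]:
    """Student's independent-samples t statistic with pooled variance.

    Returns (t, df), where df = len(a) + len(b) - 2.
    """
    n1, n2 = len(a), len(b)
    if n1 < 2 or n2 < 2:
        raise ValueError("each group needs at least two values")
    df = n1 + n2 - 2
    # statistics.variance is the sample variance (n - 1 denominator)
    pooled_var = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / df
    se = (pooled_var * (1 / n1 + 1 / n2)) ** 0.5
    if se == 0:
        raise ValueError("t is undefined when both groups have zero variance")
    return (mean(a) - mean(b)) / se, df
```

Called on two 24-item frequency lists, this returns df = 46; a |t| well below the two-tailed critical value for 46 degrees of freedom is what supports describing the lists as matched.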


Picture stimuli were colour photographs, either downloaded from an internet picture gallery or supplied by LN’s family.

Treatment

Treatment consisted of a confrontation naming task using the modified cueing hierarchy presented in Appendix B and described below. LN was not required to make a verbal response until the last step of the hierarchy. If he named the item aloud spontaneously or after writing it, he was still required to go through each step of the hierarchy in order to reinforce the self-cueing strategies. At each step, verbal feedback was provided for both correct and incorrect responses. LN’s most frequent error in response to general prompts was a failure to respond, in which case the clinician moved on to the next step in the hierarchy.

(1) The first step in the modified cueing hierarchy targeted written naming. LN was first prompted to write the word with a general prompt such as, “Can you write it?”. If he was unable to write the name of the target independently, he was provided with a series of cues. These included a choice of first letters (from a field of three) and blanks (e.g., providing half of the letters in a word and requiring him to fill in the rest). If he was unable to write the word using these cues, the clinician wrote it and LN copied the word.

(2) The second step of the hierarchy targeted the use of tactile cues. LN was taught to associate particular graphemes with placement cues. For example, the letter “c” was associated with the index finger placed against the top of the neck. A complete description of the placement cues is given in Appendix C. If LN was unable to generate the cue independently, he was shown pictures of the clinician modelling the four cues and asked to select the correct picture. If he failed to select the correct picture, the clinician modelled the tactile cue.

(3) In the third step of the hierarchy, LN was provided with specific phonological cues, such as the first sound and syllable of the word. As a final measure, the clinician named the item and LN repeated it.

Treatment application

Each of the 24 treatment items was presented once per session in random order. Treatment sessions lasted 1 hour and took place once per week for 13 weeks. Treatment was conducted by a graduate student clinician under the supervision of a certified speech-language pathologist.
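The procedure above can be sketched as a small loop over the Appendix B hierarchy. Every step is administered in order, and support within a step escalates only after an incorrect or absent response. This is an illustrative model, not the clinicians' protocol. The prompt labels are abbreviated. The `succeeds` callback is hypothetical, and so is the rule that each step's final (model-and-copy) prompt always completes it.

```python
# Steps and prompts abbreviated from Appendix B, ordered from least to most support.
HIERARCHY = [
    ("written naming", ["general prompt", "first-letter choice", "fill in blanks", "clinician writes; copy"]),
    ("tactile cue", ["general prompt", "picture of cue", "clinician demonstrates"]),
    ("verbal naming", ["general prompt", "phonemic cue", "word provided"]),
]


def run_item(succeeds):
    """Administer every step in order, escalating prompts within a step.

    `succeeds(step, prompt)` is a hypothetical callback recording whether the
    response to that prompt was correct. The final prompt of each step is a
    model to copy or imitate, so it is assumed here to always complete the
    step. Returns the (step, prompt) at which each step was completed.
    """
    record = []
    for step, prompts in HIERARCHY:
        for i, prompt in enumerate(prompts):
            is_last = i == len(prompts) - 1
            if is_last or succeeds(step, prompt):
                record.append((step, prompt))
                break
    return record


# Hypothetical trial: the item is written after a first-letter choice, the
# tactile cue is produced after seeing its picture, and the item is named
# after a phonemic cue.
correct = {
    ("written naming", "first-letter choice"),
    ("tactile cue", "picture of cue"),
    ("verbal naming", "phonemic cue"),
}
for step, prompt in run_item(lambda s, p: (s, p) in correct):
    print(f"{step}: {prompt}")
```

Because every step is always administered, the record has one entry per step even when the item is named early. This mirrors the requirement that LN work through the whole hierarchy on every trial.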

TEACHING SELF-CUES 65

Home practice programme

A video version of the treatment programme was made to allow more intensive experience in using the steps of the hierarchy to generate self-cues (“practice”) than weekly treatment sessions permitted. In the video, LN’s clinician presented each step of the hierarchy in succession. LN was verbally prompted to attempt each step of the hierarchy. He was also given visual prompts, such as pictures and choices of first letters, in the same way as in treatment sessions. Pauses were left in the video to allow LN time to respond. All of the trained items were included in the home video given to LN during the treatment phase of the study. No family members were available to be trained to give LN feedback during home practice or to provide data on his performance at home. LN was not given specific instructions as to how often he should use the video. Each week, LN brought in the paper on which he had written the names of items during home practice, and practice frequency was calculated from this paperwork. On this basis, LN practised with his home video four to five times each week.

To assess the effectiveness of the home video in the absence of structured treatment sessions, LN practised 12 (50%) of the trained items with his home video for 6 weeks following termination of the study. This was accomplished by editing the home video so that it included only half of the target items. The items to be practised were chosen at random with respect to initial phoneme and to LN’s success in naming them during structured treatment. Frequency of practice was not monitored during this period, but LN reported that he continued to practise four to five times per week on average.

Experimental design

An AB design using multiple baselines was used to examine the effects of treatment. The behaviours of interest were measured repeatedly during a baseline phase, and treatment was then applied to one set of behaviours. A second set of behaviours remained untreated and was used to evaluate generalisation effects of treatment.

Baseline phase

Baseline verbal naming performance was documented over three sessions. LN was presented with each of the 48 to-be-trained and control pictures in random order and instructed to name each item. Pictures were presented until LN provided a verbal label or stated “I don’t know”. Responses were recorded verbatim, including errors and the presence of gestures. Final responses were coded as correct or incorrect for the purposes of calculating baseline measures of performance.


Treatment phase

There were two parts to each treatment session. In the treatment portion of the session, LN was presented with pictures of target items and asked to name the pictured item. During treatment, LN was prompted to use self-cueing strategies through application of the modified cueing hierarchy. The criterion for termination of treatment was set at 80% accuracy in naming the target items. In the end-of-session probes, LN was presented with pictures of target and control items and asked to name the pictured items. LN was not prompted to use any strategies, although he was permitted to do so. All responses, including errors and the presence of gestures, were recorded verbatim. Final responses were coded as correct or incorrect. No feedback regarding accuracy was provided during probes. In the interest of time, the items were split so that half of them were probed each week.

Follow-up and post-testing

Maintenance. Naming of all treated and control items was probed 6 weeks following termination of treatment.

Generalisation. Generalisation of treatment effects was assessed in three ways. First, naming of the untrained control items was assessed repeatedly across the course of the study. Second, a confrontation naming task using novel stimuli was employed at the cessation of treatment. Specifically, two groups of novel stimuli beginning with the same phonemes as the original target and control items were presented to LN to probe response generalisation effects of treatment. These items were matched for frequency of occurrence, t(38) = .34, NS, and are presented in Appendix D. Third, the verbal and written naming subtests of the PAL were re-administered to determine whether naming ability had changed as measured by standardised tests.

Reliability

To test reliability, 15% of all baseline and probe sessions were re-scored by an independent examiner. Verbal naming responses were scored as correct or incorrect. Average point-to-point agreement between observers was 98%, with a range of 96–100%. Reliability of the independent variable (i.e., the determination of the level of cueing required) was also calculated by an independent observer for 15% of the treatment sessions. Average point-to-point agreement was 92%, with a range of 88–96%.
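Point-to-point agreement, as used above, is the percentage of individual responses to which both scorers assigned the same code. The average and range are then taken across the re-scored sessions. Below is a minimal sketch, assuming responses are coded 1 (correct) and 0 (incorrect). The session data are hypothetical.

```python
from statistics import mean


def point_to_point_agreement(scorer_a, scorer_b):
    """Percentage of items given the same code by two independent scorers."""
    if len(scorer_a) != len(scorer_b) or not scorer_a:
        raise ValueError("need two non-empty score lists of equal length")
    agreements = sum(a == b for a, b in zip(scorer_a, scorer_b))
    return 100 * agreements / len(scorer_a)


def summarise(session_scores):
    """Average, minimum, and maximum of per-session agreement values."""
    return mean(session_scores), min(session_scores), max(session_scores)


# Hypothetical re-scored 24-item session (1 = correct, 0 = incorrect);
# the scorers disagree on exactly one item.
original = [1] * 10 + [0] * 14
rescored = [1] * 9 + [0] * 15
print(round(point_to_point_agreement(original, rescored), 1))  # 95.8
```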


RESULTS

Treatment

Figure 1 illustrates LN’s performance on control and target items over time. Table 3 summarises performance at baseline, immediately post-treatment, and 6 weeks post-treatment. At baseline, LN achieved an average of 4.2% correct for items beginning with control phonemes and 6.8% for items beginning with target phonemes. None of the items was named accurately more than once during the baseline phase. Post-treatment results were obtained by averaging accuracy over the final three treatment sessions. LN achieved 12.1% accuracy on control items and 55.5% accuracy on target items. By the final three sessions, LN was also able to write all of the target items with 100% accuracy. No written naming data were collected for control items.

Figure 1. Percent correct of control and trained items named accurately at baseline (B1, B2, B3), during end-of-session probes (1–13), and in post-treatment assessment of maintenance (M) and generalisation (G).

TABLE 3
Percent correct on target and control items (end-of-session probes)

Word group    Baseline    Post-treatment    6 weeks post-treatment
Target        6.8%        55.5%             25%
Control       4.2%        12.1%             8.3%

Data from the treatment portion of each session were analysed to determine how LN’s responses changed over time. Figure 2 shows how the level of cue LN required to name target items verbally changed over time. Data were not available from sessions 2 and 5. In the figure, spontaneous refers to situations in which LN named items independently, without use of any cueing strategies. Instances in which LN named the item following application of the written naming portion of the hierarchy (step 1) were coded as use of graphemic cues. If LN verbally named the item after being prompted to generate a tactile cue (step 2), the response was coded as use of tactile cues. Verbal cues refer to phonemic cues ranging from phonemes to word repetition (step 3). Over time, LN required fewer verbal prompts and provided more self-cues. He did so either through writing alone or by writing and using a tactile cue to generate a phonemic cue. Note that he named items (when prompted to use strategies) with an average of 72% accuracy over the last three sessions.

Figure 2. Percent correct of trained items accurately named spontaneously and when prompted to use graphemic, tactile, and verbal cues.

Maintenance

Post-testing following the 6-week break showed some loss of treatment gains. LN named 8.3% (2/24) of control and 25% (6/24) of target items. He wrote the names of 21% of control words and 58% of target words. Items practised using the home video were not more likely than non-practised items to be named verbally (1 of the 6 correctly named targets) or in writing (6 of the 14 correctly written targets). No spontaneous use of tactile cues was observed.

Generalisation

There was some evidence of generalisation to naming ability in general, as LN’s performance improved on the verbal and written naming subtests of the PAL (Table 4). On the verbal naming subtest, he achieved 11/32 correct at time 2, compared with 5/32 at time 1. On the written naming subtest, he achieved 5/32 correct, compared with 0/10 at pretesting, and correctly wrote the first letter of 3 additional items.

TABLE 4
Tasks administered at initial assessment and after 13 weeks of treatment (accuracy, percent correct)

Test                   Pretreatment     Post-treatment
PAL: Verbal Naming     16% (N = 32)     34% (N = 32)
PAL: Written Naming    0% (N = 10)      16% (N = 32)
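The coding scheme used for Figure 2 assigns each trained-item trial to a single category. The category depends on the point in the hierarchy at which LN named the item aloud. Below is a minimal sketch of that tally for one session, assuming one record per trial giving the last support used before naming. The trial labels (`"none"`, `"written"`, `"tactile"`, `"phonemic"`) and the session data are hypothetical.

```python
from collections import Counter

# Hypothetical labels for the last support used before LN named the item
# aloud, mapped onto the categories plotted in Figure 2.
CUE_LABELS = {
    "none": "spontaneous",    # named without any cueing strategy
    "written": "graphemic",   # named after step 1 (written naming)
    "tactile": "tactile",     # named after step 2 (tactile cue)
    "phonemic": "verbal",     # needed step 3 cues (phoneme up to repetition)
}


def cue_profile(trials):
    """Percentage of one session's trials coded in each cue category."""
    if not trials:
        raise ValueError("no trials to code")
    counts = Counter(CUE_LABELS[t] for t in trials)
    return {label: 100 * counts[label] / len(trials) for label in CUE_LABELS.values()}


session = ["written", "written", "tactile", "none"]
print(cue_profile(session))
# {'spontaneous': 25.0, 'graphemic': 50.0, 'tactile': 25.0, 'verbal': 0.0}
```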


There was not strong evidence of generalisation to untrained items beginning with target and control phonemes (i.e., novel items probed at post-treatment only). Naming of items beginning with trained phonemes was 25% accurate (5/20 items), while naming of items beginning with untrained phonemes was 15% accurate (3/20 items).

DISCUSSION

This single-case study provides qualified support for a treatment programme that targets self-generation of phonemic cues through the use of tactile cues and partial access to the written form of words. Treatment gains were observed in target compared with control words over the period of study. Although LN required prompts to use cueing strategies, he was increasingly able to benefit from self-generated cues across treatment sessions. Compared with LN’s baseline level of performance, effects of treatment were still observed 6 weeks after cessation of the structured treatment phase. However, there was some loss of treatment gains: verbal naming at the 6-week follow-up had declined from the level observed at the end of the treatment period.

Data regarding generalisation of the treatment effects were inconclusive. Response generalisation to untrained words beginning with trained phonemes was difficult to assess in the context of this design. However, post-treatment assessment of frequency-matched items (Francis & Kucera, 1982) revealed an improvement. LN named the untrained exemplars of /k, d, f, t/ more accurately (25%) than he had named the items used in treatment during baseline (6.8%). Response generalisation to untrained words beginning with untrained phonemes was evaluated throughout the course of treatment (i.e., control items) as well as in a post-treatment assessment. Little change was observed in the accuracy with which LN named /b, p, s, g/ items, indicating negligible generalisation. A comparison of pre- and post-treatment performance on the verbal naming subtest of the PAL revealed improved accuracy. These generalisation findings indicate that treatment may have had some effect on verbal naming in general.

There were two components of self-cueing in this treatment programme: writing the word and producing a tactile cue. LN benefited from tactile cues when prompted to use them. However, he did not use them independently, either when naming of target items was probed during treatment sessions or when maintenance was tested after the 6-week break. On the other hand, there was evidence of generalisation to written naming, as demonstrated by his improved performance on the PAL written naming subtest. He also wrote the names of items when maintenance of treatment effects was tested after the break, and he had greater success naming words he was able to write. In sum, the writing portion of the treatment programme appears to have been more critical to the observed gains in verbal naming than the tactile cueing portion.


LN’s reliance on writing rather than tactile cues suggests a possible explanation for the lack of generalisation to novel stimuli beginning with targeted phonemes. Hillis (1989) proposed that, for patients with phonological impairments, the effect of verbal naming treatments was to increase access to specific phonological representations, thus limiting generalisation to untreated items. It could be argued that Nickels (1992) overcame this limitation by teaching a strategy for verbally producing items that were not explicitly trained, namely grapheme-phoneme conversion to facilitate oral reading. In the present study, we attempted to teach LN to use tactile cues as a strategy to facilitate oral reading. His failure to use the tactile cues may have contributed to the lack of generalisation to items beginning with targeted phonemes. We had hoped that LN’s partial lexical access in the written modality, in combination with the tactile cues, would be sufficient to allow generalisation beyond the trained items. We did not, however, predict generalisation to written naming of untrained items beginning with target phonemes, because their graphemic representations would not necessarily be more accessible following treatment. This may have limited the extent to which the tactile cues could be generalised to novel items beginning with the targeted phonemes. Nonetheless, verbal naming of items whose graphemic representations were at least partially accessible to LN may have improved as a result of learning to use self-cueing strategies. This is consistent with the evidence of generalisation to verbal naming in general discussed above. Along the same lines, the lack of a semantic component in the cueing hierarchy may have limited the treatment programme’s ability to produce more robust generalisation (Nickels & Best, 1996). Although LN’s primary impairment appeared to be in the phonological output lexicon, he did show some mild deficits in the semantic system.
A treatment programme that also targeted semantics might have produced generalisation to a novel set of stimuli semantically related to the target items by improving access to phonological representations through the semantic system.

It is unclear why LN showed little spontaneous use of the tactile cues, even for trained items. Given their effectiveness in treatment sessions before and during the present study, it seems unlikely that LN found them unhelpful. LN’s resistance to alternative modes of communication may have been a factor. Although writing is not typically used to facilitate verbal language, it is a modality that most people use to communicate at least occasionally. In contrast, tactile cues are not used by non-brain-damaged individuals. For this reason, writing may have seemed more acceptable to LN than tactile cues. This raises the possibility that patients who are more receptive to using tactile cues may show greater treatment effects and generalisation.

Another aspect of the present study was the use of a home practice video to increase the intensity of exposure to the treatment materials. Previous work has suggested that trained family members can provide effective treatment, thereby increasing the intensity of treatment (Yampolsky & Waters, 2002). However, some patients, such as LN, do not have family members who are able to take an active role in treatment. The home video programme used in the present study gave LN more intensive exposure to the treatment materials and the self-cueing strategies. However, home practice was not effective in maintaining treatment effects in the absence of structured treatment sessions. Further research is needed to determine how frequent structured sessions must be to maintain treatment gains. A possible clinical application would be periodic treatment sessions with patients discharged from outpatient services to review compensatory strategies and provide materials for home practice.

Several limitations restrict the interpretation of the present study. First, the complete set of tactile cues was not explicitly taught to LN before the treatment programme began. Specifically, he was not taught the tactile cues associated with the control items. This may have limited his ability to generalise the self-cueing strategies to the control items. As a result, our ability to measure generalisation to untrained phoneme classes may have been limited. Another way to assess generalisation would have been to include untrained items beginning with target phonemes in the verbal naming probes. This would have allowed us to assess generalisation to untrained stimuli during the treatment period. Future research should address the issue of generalisation in these ways. A related issue is that, owing to time constraints on the present study, we did not extend treatment to the control items in an ABA design. Replication of treatment effects with the control stimuli would have provided a stronger demonstration of experimental control, strengthening the results of the study.
Another limitation is that probes were conducted at the end of treatment sessions, following exposure to the target items in the context of the modified cueing hierarchy. Other researchers (e.g., Raymer et al., 1993; Wambaugh et al., 2001) have conducted probes at the beginning of treatment sessions. This avoids inflating accuracy through recent exposure or deflating it through client fatigue. A further issue related to exposure is that LN was exposed to target stimuli more often than to control stimuli. Given these confounds, it is difficult to isolate the source of the observed treatment effects. Future research should address these issues in two ways: by probing performance on trained and control stimuli before the treatment session, and by controlling the number of exposures to trained and control stimuli.

In sum, the present study suggests that a programme focusing on self-generation of phonemic cues can be an effective treatment approach for anomia. Functionally, this approach may be most effective in building a repertoire of trained items, with some generalisation to verbal naming in general.


REFERENCES

Bashir, A. S., Grahjones, F., & Bostwick, R. Y. (1984). A touch cue method of therapy for developmental verbal apraxia. Seminars in Speech and Language, 5(2), 127–137.
Best, W., Hickin, J., Herbert, R., Howard, D., & Osborne, F. (2000). Phonological facilitation of aphasic naming and predicting the outcome of treatment for anomia. Brain and Language, 74(3), 435–438.
Bruce, C., & Howard, D. (1988). Why don’t Broca’s aphasics cue themselves? An investigation of phonemic cueing and tip of the tongue information. Neuropsychologia, 26(2), 253–264.
Caplan, D. (1992). Language: Structure, processing, and disorders (pp. 403–441). Cambridge, MA: MIT Press.
Dabul, B. L. (1979). Apraxia Battery for Adults. Austin, TX: Pro-Ed.
Davis, A., & Pring, T. (1991). Therapy for word-finding deficits: More on the effects of semantic and phonological approaches to treatments with dysphasic patients. Neuropsychological Rehabilitation, 1(2), 135–145.
Drew, R. L., & Thompson, C. K. (1999). Model-based semantic treatment for naming deficits in aphasia. Journal of Speech, Language, and Hearing Research, 42, 972–989.
Francis, W. N., & Kucera, H. (1982). Frequency Analysis of English Usage. Boston, MA: Houghton Mifflin.
Goodglass, H., & Kaplan, E. (1983). The assessment of aphasia and related disorders. Philadelphia, PA: Lea & Febiger.
Hillis, A. (1989). Efficacy and generalization of treatment for aphasic naming errors. Archives of Physical Medicine and Rehabilitation, 70, 632–636.
Hillis, A. (1998). Treatment of naming disorders: New issues regarding old therapies. Journal of the International Neuropsychological Society, 4, 648–660.
Johns, D. F., & Darley, F. L. (1970). Phonemic variability in apraxia of speech. Journal of Speech and Hearing Research, 13, 556.
Miceli, G., Amitrano, A., Capasso, R., & Caramazza, A. (1996). The treatment of anomia resulting from output lexical damage: Analysis of two cases. Brain and Language, 52, 150–174.
Nicholas, M., & Elliott, S. (1999). C-Speak aphasia: A communication system for adults with aphasia. Solana Beach, CA: Mayer-Johnson Co.
Nickels, L. (1992). The autocue? Self-generated phonemic cues in the treatment of a disorder of reading and naming. Cognitive Neuropsychology, 9(2), 155–182.
Nickels, L., & Best, W. (1996). Therapy for naming disorders (part I): Principles, puzzles, and progress. Aphasiology, 10(1), 21–47.
Raymer, A. M., Thompson, C. K., Jacobs, B., & Le Grand, H. R. (1993). Phonological treatment of naming deficits in aphasia: Model based generalization analysis. Aphasiology, 7(1), 27–53.
Wambaugh, J. L., Linebaugh, C. W., Doyle, P. J., Martinez, A. L., Kalinyak-Fliszar, M., & Spencer, K. A. (2001). Effects of two cueing treatments on lexical retrieval in aphasic speakers with different levels of deficit. Aphasiology, 15(10/11), 933–950.
Yampolsky, S., & Waters, G. (2002). Treatment of single word oral reading in an individual with deep dyslexia. Aphasiology, 16(2), 455–471.


APPENDIX A
TRAINING & CONTROL STIMULI IN TREATMENT PHASE

Target words: Camera, Coat, Coffee, College, Couch, Cup, Date, Desk, Dinner, Doctor, Dollar, Door, Family, Father, Finger, Fish, Food, Foot, Table, Teacher, Time, Tire, Tissue, Toast

Control words: Bank, Bill, Body, Book, Business, Butter, Garage, Garbage, Garden, Gas, Girl, Gum, Park, Pool, Pill, Paper, Police, Popcorn, Salt, Sister, Sock, Son, Subway, Summer

APPENDIX B
MODIFIED CUEING HIERARCHY

(1) Present picture & produce written form
    a. General prompt, e.g., “Can you write it?”
    b. Choose first letter from field of three
    c. Fill in the blanks provided
    d. Clinician writes word
(2) Generate tactile cue
    a. General prompt, e.g., “What is the cue for that and what sound does it make?”
    b. Picture of cue presented
    c. Clinician demonstrates cue
(3) Verbal naming
    a. General prompt, e.g., “What is it called?”
    b. Phonemic cue
    c. Word provided

APPENDIX C
DESCRIPTION OF TACTILE CUES

(1) /d/ Index finger bent and placed on upper lip. Thumb placed on the neck to indicate voicing (and to distinguish from /t/).
(2) /t/ Index finger bent and placed on upper lip.
(3) /k/ Index finger placed at top of throat.
(4) /f/ Index finger bent and placed below lower lip.

APPENDIX D
STIMULI FOR GENERALIZATION PROBE

Target phonemes: Cake, Candle, Card, Corn, Cow, Deer, Dentist, Diamond, Dog, Duck, Fan, Farm, Feather, Fork, Tail, Tape, Tent, Tie, Tulip

Control phonemes: Bat, Belt, Bird, Boat, Bottle, Gift, Goat, Golf, Guitar, Gun, Paint, Pie, Pillow, Police, Pot, Sink, Soap, Soldier, Suitcase

Functional measures of naming in aphasia: Word retrieval in confrontation naming versus connected speech

Jamie F. Mayer and Laura L. Murray
Indiana University, USA

Background: Word-finding difficulties are central to aphasia and as such have received a great deal of attention in aphasia research. Although treatment for lexical retrieval impairments can be effective, studies often use measurement of single-word performance (e.g., confrontation naming) to support such claims. In contrast, what matters most to patients with aphasia and their families is the ability to converse. Few aphasia studies, however, have addressed word retrieval in connected speech. Furthermore, one could debate whether generating names for single pictured stimuli bears resemblance to the online, multifaceted retrieval required during conversation.

Aims: The purpose of this study was to assess the adequacy of Percent Word Retrieval (%WR) as well as two supplementary analyses, Percent Substantive Verbs (%SV) and Percent Corrected Errors (%CERR). These measures were used to depict word retrieval in connected and conversational speech with respect to lexical class (noun vs verb) and aphasia severity (mild vs moderate). Specifically, we examined: (1) the relationship between lexical retrieval in confrontation naming, composite description, and conversational samples; and (2) the clinical utility and feasibility of %WR, %SV, and %CERR in quantifying such data.

Methods & Procedures: A total of 14 individuals with aphasia, divided into mild (n = 7) and moderate (n = 7) groups based on aphasia severity, participated. Word retrieval was tested in three different contexts: single-word confrontation naming, composite description, and conversational speech. Lexical retrieval was analysed in each context using the analyses described above (%WR, %SV, and %CERR). The effects of context, grammatical class, and measurement technique were explored using repeated measures ANOVA and correlational analyses.

Outcomes & Results: Statistical analyses revealed a significant effect of context for both %WR and %CERR, with superior lexical retrieval and self-correction of errors in connected speech versus single-word naming tasks. Moreover, %SV in conjunction with %WR was sensitive to possible verb retrieval deficits undetected by %WR alone, particularly for mild patients. Confrontation naming scores were strongly related to aphasia severity classification (mild vs moderate), but were not significantly correlated with naming abilities in connected speaking tasks.

Conclusions: These findings endorse the incorporation of discourse-level tasks into aphasia assessment and treatment protocols. Use of simple and easily quantifiable measures (e.g., %WR) may be an option to extend current methodology and reconcile issues of ecological validity and clinical feasibility.

The widespread prevalence of word-finding difficulties in aphasia is well known (Helm-Estabrooks, 1997; Larfeuil & Le Dorze, 1997). As such, the treatment of word retrieval disorders has received a great deal of research attention compared with remediation of other areas of language or communication. Although word retrieval treatments can be effective (e.g., Osborne, Hickin, Best, & Howard, 1998), studies often use measurement of single-word performance (e.g., confrontation naming) to support such claims. In contrast, what matters most to patients with aphasia and their families is the ability to converse (Boles, 1998; Edwards, 1998); likewise, it is in such situations that aphasia is most intrusive (Wilkinson et al., 1998). Furthermore, it could be contended that generating names for single pictured stimuli bears little resemblance to the online, multifaceted word retrieval required during conversation. Few aphasia studies, however, have either addressed word retrieval in connected or conversational speech (Doesborgh, van de Sandt-Koenderman, Dippel, van Harskamp, Koudstaal, & Visch-Brink, 2002; Jordan, Ward, & Cremona-Meteyard, 1997) or explored the relationship between confrontation naming and conversational word retrieval abilities.

Address correspondence to: Jamie F. Mayer, Dept. of Speech and Hearing Sciences, 200 South Jordan Avenue, Bloomington, IN 47405, USA. © 2003 Psychology Press Ltd
http://www.tandf.co.uk/journals/pp/02687038.html DOI: 10.1080/02687030344000148

WORD RETRIEVAL IN CONNECTED SPEECH 79

The limited research conducted thus far has produced conflicting results. For example, there are reports of patients with aphasia demonstrating vastly superior word retrieval during confrontation naming than during connected speech (Manning & Warrington, 1996; Schwartz & Hodgson, 2002; Wilshire & McCarthy, 2002). Conversely, other patients show the inverse profile of better word retrieval in discourse than during confrontation naming (Ingles, Mate-Kole, & Connolly, 1996; Pashek & Tompkins, 2002; Zingeser & Berndt, 1990). Some investigators have found only a nominal correlation, in general, between confrontation naming and conversational speech (e.g., Nicholas, Obler, Albert, & Helm-Estabrooks, 1985). Others have concluded that the relationship between confrontational and discourse-level word retrieval may vary as a function of aphasia classification (Williams & Canter, 1982, 1987) or error type (e.g., phonemic vs neologistic paraphasias; Vermeulen, Bastiaanse, & Van Wageningen, 1989). Findings from two studies support a substantial relationship between confrontation naming and connected speech (Brown & Cullinan, 1981; Hickin, Best, Herbert, Howard, & Osborne, 2001). However, methodological limitations restrict confident conclusions based solely on these results. These limitations include imprecise measures of conversational word retrieval and insufficient description of patient characteristics such as aphasia type, severity, and chronicity. Several theoretical perspectives have been posited for observed discrepancies between confrontation naming and discourse-level word retrieval.
These explanations typically highlight the differential context-dependent, nonlinguistic demands (e.g., attention, level of abstraction) and linguistic factors (e.g., syntax formulation, pragmatic factors) inherent within each naming context (Berndt, Mitchum, Haendiges, & Sandson, 1997a, 1997b; Edwards, 1998; Murray & Karcher, 2000; Penn, 2000; Williams & Canter, 1987). For example, Pashek and Tompkins (2002) suggested that semantic, phonologic, or syntactic priming, in addition to “probabilistic lexical co-occurrence of words” (p. 228), might facilitate word retrieval in connected speech tasks over and above confrontation naming. A number of single-case studies, in contrast, have implicated differential impairments to various processing mechanisms to explain observed contextual dissociations. Examples include damage to nominal versus propositional speech routes (Manning & Warrington, 1996), interference during multiple lexeme selection (Schwartz & Hodgson, 2002), and impaired “lexical control” (Wilshire & McCarthy, 2002). Given the disparate findings and theoretical rationales from previous research, the intent of the current study was to examine further the possible relationship between confrontation naming and word retrieval in the discourse of adults with aphasia. There are several well-accepted tools with which to assess single-word naming in aphasia (e.g., Boston Naming Test; Kaplan, Goodglass, & Weintraub, 1983). There is less agreement, however, regarding which measures are most suitable for quantifying and qualifying word retrieval during discourse.


Conversational analysis (CA; Perkins, 1995) has a number of potential strengths for qualifying conversational behaviour in that it imposes no structure on the data except that of the conversation itself, allowing for an emphasis on ecological validity (Osborne et al., 1998). It is this validity, however, that also underlies one of CA’s caveats: that is, “[it] does not marry easily with quantification” (Perkins, 1995, p. 372). Attempts have been made to quantify aspects of CA, such as the proportion of major conversational turns or the number and types of repairs (Crockford & Lesser, 1994; Osborne et al., 1998; Perkins, 1995). However, reliability data for such measures are unavailable. Furthermore, the time demands of such analyses may exceed standards of clinical practicality (Crockford & Lesser, 1994). Likewise, the reliability and clinical feasibility of Correct Information Units (CIUs; Nicholas & Brookshire, 1993a, 1993b) have been questioned (Oelschlaeger & Thorne, 1999), although CIUs are used fairly frequently, at least in aphasia research. Therefore, the challenge remains to develop a word retrieval measure that is clinically reliable, easily quantifiable, and meaningful to patients with aphasia and their families. One approach may be the direct measurement of the percentage of successful word retrieval in conversation. This would be quantified from the number of word-finding errors (e.g., paraphasias, circumlocutions, hesitations) relative to the total number of content words produced. A similar measure has been reported in a few previous studies, such as that of Hickin et al. (2001), who described their application of “lexical selection” as “a measure of the person’s ability to retrieve lexical items in conversation and…how often this process fails” (p. 16).
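Dividing the number of word-finding errors by the number of content words yields the proportion of failed retrievals. A percentage of successful retrieval is therefore its complement. The sketch below assumes that definition of %WR. The study's exact operational rules (e.g., which behaviours count as word-finding errors) belong to its Method and are not reproduced here, and the counts are hypothetical.

```python
def percent_word_retrieval(content_words, word_finding_errors):
    """%WR, assumed here to be 100 * (1 - errors / content words)."""
    if content_words <= 0:
        raise ValueError("content_words must be positive")
    if not 0 <= word_finding_errors <= content_words:
        raise ValueError("errors must lie between 0 and content_words")
    # errors / content_words is the failure rate; %WR is its complement
    return 100 * (1 - word_finding_errors / content_words)


# Hypothetical sample: 80 content words, 20 of them marked as word-finding errors
print(percent_word_retrieval(80, 20))  # 75.0
```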
Their data provided a good initial step towards the use of valid conversational word retrieval measures; however, many important factors (e.g., aphasia severity) were unspecified, and Hickin et al.’s primary focus on nouns, like that of the majority of word-finding studies (but for exceptions see Kemmerer & Tranel, 2000a, 2000b; Murray & Karcher, 2000), precludes application of their conclusions across lexical classes (e.g., to verbs). More recently, Pashek and Tompkins (2002) explored contextual influences on lexical retrieval in aphasia by quantifying noun and verb retrieval during confrontation naming and video narration tasks, and by carefully controlling several potentially influential variables (e.g., word class, frequency, and length; lexical stimulus matching; aphasia type/severity). Whereas this level of control is, admittedly, crucial in overcoming the methodological limitations inherent in studying discourse-level skills (Crockford & Lesser, 1994; Marshall & Pound, 1997), it may come at the expense of ecological validity. Therefore, whether Pashek and Tompkins’ (2002) results generalise across communication situations (e.g., unstructured conversation), data collection options (e.g., online scoring), or aphasia severity levels (e.g., moderate aphasia) remains to be examined. In summary, the clinical utility and feasibility of discourse-level word retrieval measures have yet to be verified. Furthermore, the relationship between naming abilities in single-word paradigms versus connected speech, and the theoretical

WORD RETRIEVAL IN CONNECTED SPEECH 81

basis of the association, remain unresolved. Accordingly, the purpose of this study was to assess the adequacy of Percent Word Retrieval (%WR), as well as two supplementary analyses, Percent Substantive Verbs (%SV; cf. Berndt et al., 1997a, 1997b; Breedin, Saffran, & Schwartz, 1998) and Percent Corrected Errors (%CERR; Larfeuil & Le Dorze, 1997), for depicting word retrieval in connected and conversational speech with respect to lexical class (noun vs verb) and aphasia severity (mild vs moderate). Specifically, we examined: (1) the relationship between lexical retrieval in confrontation naming, composite naming (i.e., picture description), and question-elicited conversational samples; and (2) the clinical utility and feasibility of %WR, %SV, and %CERR in quantifying such data.

METHOD

Subjects

A total of 14 right-handed individuals with aphasia secondary to unilateral, left-hemisphere damage participated (see Table 1). All subjects were native speakers of English and demonstrated hearing and visual skills adequate for the testing protocol. The Western Aphasia Battery (Kertesz, 1982) was used to determine aphasia type and severity (based on the Aphasia Quotient; AQ). Subjects were then divided into mild (MI; n = 7; mean AQ = 87.8) and moderate (MO; n = 7; mean AQ = 51.7) aphasia groups, using a classification system modified from Shewan and Bandur (1986). The two groups differed significantly with respect to AQ, t(12) = 7.18, p < .001, but not age, t(12) = .69, p = .50, or education, t(12) = .60, p = .56.

Tasks

Word retrieval was tested in three different contexts, with the order of administration randomised across participants. All subjects completed Sections 1 and 4 of the Test of Adolescent and Adult Word Finding (TAWF; German, 1990) to assess labelling of pictured nouns (n = 37) and verbs (n = 21), respectively. TAWF noun and verb stimuli were matched roughly for frequency of occurrence and word length (German, 1990). 
Composite naming samples were elicited through description of pictured scenes, each constructed to depict a series of events via a sequence of three sketches. Scene topics included restaurant, beach, shopping mall, and classroom scenarios, with multiple characters and activities depicted (e.g., the shopping mall scene characters included Santa Claus, children, parents, and policemen; events included a child pulling off Santa’s beard, policemen eating doughnuts, and a robbery). All sketches were drawn by the first author for a separate study, piloted with a group of normal adults, and found to elicit similar numbers of words and correct information units (Murray, unpublished data). Finally, subjects participated in brief conversations with the first author, who initiated similar topics (e.g., family, travel, occupation) across subjects to promote comparable transcripts. The number of scenes described and conversational topics initiated varied across subjects according to the amount of language elicited: one to three scenes (each a three-sketch sequence) and/or conversational topics were required to elicit a criterion number of words per context for each subject.

TABLE 1 Subject demographic data

TCM = transcortical motor; * = retired.

Data analyses

Sample collection. All connected speech samples were tape-recorded and transcribed in standard orthography, with neologisms and phonemic paraphasias transcribed phonetically. Following previously suggested guidelines (Larfeuil & Le Dorze, 1997; Nicholas & Brookshire, 1993a; Vermeulen et al., 1989), the first 300 words of each discourse sample (composite description and conversation) were retained for further analysis. Unintelligible words as well as real- and non-word fillers (e.g., “well”, “um”) were excluded from this count (Nicholas & Brookshire, 1993b). Because of moderate verbal output deficits, two subjects, MO2 and MO3, were unable to meet the 300-word criterion; for these speakers, samples of 150–200 words from each context were considered acceptable (Berndt et al., 1997b; Brown & Cullinan, 1981). A third subject (MO7) demonstrated moderate-to-severe initiation deficits in addition to his moderate aphasia; for this individual, it was therefore necessary to follow time-based guidelines (Boles, 1998; Crockford & Lesser, 1994), by which 10 minutes of connected speech (25 words and 43 words for composite description and conversation, respectively) were used for analyses. The resulting sample size, although small, was considered representative of this subject’s typical daily output.

Scoring procedures. A detailed list of scoring procedures is provided in the Appendix. To determine %WR for the composite and conversational contexts, the total number of words in each grammatical class (nouns and verbs) and the number of word-finding errors per class were tallied. Word-finding errors were defined broadly using criteria adapted from Crockford and Lesser (1994), Dollaghan and Campbell (1992), and Pashek and Tompkins (2002). Specifically, words were counted as errors when they were: (a) preceded immediately by a pause of 2 or more seconds, (b) preceded or accompanied by comments indicating difficulty, (c) self-corrected, or (d) obvious semantic, phonemic, or unrelated paraphasias. Following this initial, objective identification of errors, transcripts were re-read in full to identify more subtle instances of word retrieval difficulty, such as deletions (Kemmerer & Tranel, 2000b) or indefinite terms (Hickin et al., 2001; Nicholas et al., 1985). Such errors were identified via a careful analysis of contextual factors (e.g., if a subject produced an indefinite term such as “thing”, was a clear referent available within the transcript?), with subjects given the benefit of the doubt in ambiguous situations. The number of successful word retrieval attempts was divided by the total number of words in each class (i.e., correct attempts + errors) and multiplied by 100 to yield the %WR score. The TAWF (i.e., the confrontation naming context) was scored using percent correct (i.e., correct attempts ÷ total number of stimuli × 100) to promote comparable measures across elicitation contexts. 
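The sampling and scoring rules above can be expressed as a short program. The sketch below is an illustration under stated assumptions, not the published protocol: the `Token` record, its field names, the helper functions, and the filler list are all hypothetical conveniences, and the subtler errors (deletions, indefinite terms without a clear referent) still depend on the clinician’s contextual judgement, which the sketch can only represent as a flag set in advance.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical filler list: the article cites only "well" and "um" as examples.
FILLERS = {"well", "um", "uh", "er"}


@dataclass
class Token:
    """One transcribed word plus the (hypothetical) codes a scorer would assign."""
    word: str
    lexical_class: Optional[str] = None  # "noun", "verb", or None for other words
    pause_before_s: float = 0.0          # silent pause immediately before the word
    difficulty_comment: bool = False     # e.g. "what's it called..."
    self_corrected: bool = False
    paraphasia: bool = False             # semantic, phonemic, or unrelated
    contextual_error: bool = False       # deletion/indefinite term judged from context
    intelligible: bool = True


def sample_window(tokens: List[Token], limit: int = 300) -> List[Token]:
    """Keep the first `limit` countable words, skipping fillers and unintelligible words."""
    countable = [t for t in tokens
                 if t.intelligible and t.word.lower() not in FILLERS]
    return countable[:limit]


def is_word_finding_error(t: Token) -> bool:
    """Criteria (a)-(d) plus the contextual re-reading pass."""
    return (t.pause_before_s >= 2.0 or t.difficulty_comment or t.self_corrected
            or t.paraphasia or t.contextual_error)


def percent_word_retrieval(tokens: List[Token], lexical_class: str) -> Optional[float]:
    """%WR = successful retrievals / (successes + errors) x 100 for one class."""
    words = [t for t in tokens if t.lexical_class == lexical_class]
    if not words:
        return None  # undefined if the speaker produced no words of this class
    correct = sum(1 for t in words if not is_word_finding_error(t))
    return 100.0 * correct / len(words)


def percent_correct(n_correct: int, n_stimuli: int) -> float:
    """TAWF-style percent correct: correct attempts / total stimuli x 100."""
    return 100.0 * n_correct / n_stimuli


# Hypothetical fragment of a picture description (not data from the study).
sample = [
    Token("the"), Token("boy", "noun"), Token("um"),
    Token("pulls", "verb", pause_before_s=2.5),
    Token("beard", "noun", self_corrected=True),
    Token("off"), Token("santa", "noun"),
]
window = sample_window(sample)
print(len(window))                                       # 6 (the filler is skipped)
print(round(percent_word_retrieval(window, "noun"), 1))  # 66.7 (2 of 3 nouns)
print(percent_word_retrieval(window, "verb"))            # 0.0 (pause of 2+ s)
print(round(percent_correct(30, 37), 1))                 # 81.1 (hypothetical TAWF score)
```

Returning `None` for an absent class, rather than 0 or 100, keeps “no opportunities” distinct from “all errors”, which matters for very short samples such as MO7’s.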
Supplementary analyses. To provide supplementary information about word retrieval during composite description and in conversation, two additional analyses were undertaken: (1) the proportion of substantive versus light verbs (%SV) and (2) the proportion of corrected versus uncorrected errors (%CERR). The former (%SV) has previously been shown to address semantic complexity in verb retrieval (Breedin et al., 1998). Briefly, verbs such as do, make, have, or go may be conceived of as semantically simple, or primitive, verbs, and in the linguistic literature are referred to as “light” verbs. Other verbs, however, are classified as “heavy” or substantive because they contain additional and more specific semantic components (e.g., compare go with run), and are therefore more complex (Breedin et al., 1998). Both composite naming and conversational transcripts were analysed for the number of substantive verbs, which was divided by the total number of verbs produced (i.e., substantive + light verbs) and multiplied by 100 to yield %SV. The second supplementary measure, %CERR, has been described as a means of gauging efficiency in word retrieval (Larfeuil & Le Dorze, 1997). Any resolution of an episode of word-finding difficulty (e.g., after a 2+ second delay, revision) was noted for each subject with respect to
lexical class and naming context. Corrected errors were divided by the total number of errors (corrected + unresolved) and multiplied by 100 to yield %CERR.

RESULTS

Reliability and clinical feasibility

Approximately 20% of the data were randomly selected for reliability analyses. Similar to the procedures of Oelschlaeger and Thorne (1999), raters (one certified speech-language pathologist and two graduate students in speech and hearing sciences) were provided with written instructions regarding the application of %WR, %SV, and %CERR analyses to language samples. No formal discussions of rules or rule interpretations were undertaken, allowing raters to apply the scoring rules independently as written. Point-to-point interrater agreement was calculated for %WR, %SV, and %CERR in composite naming and conversational speech samples, and ranged from 80.5% to 100% for %WR, 88.2% to 90% for %SV, and 88.9% to 100% for %CERR. Intra-judge reliability was calculated for each measure on another randomly selected subset of the data 1 week after initial scoring, and ranged from 88.9% to 97.1% for %WR, 87.2% to 93% for %SV, and 82.0% to 100% for %CERR. Although no formal time limit was given for inter-judge analyses, raters reported requiring approximately 45 minutes to score each 300-word transcript. This estimate included the time spent learning and applying a set of pre-constructed printed scoring rules. The time required for intra-judge rescoring (i.e., given the first author’s familiarity with scoring standards), on the other hand, was approximately 15 minutes per transcript.

Relationships among speaking contexts and grammatical class

Results of a three-way repeated measures ANOVA yielded significant main effects of aphasia severity, F(1, 12) = 73.86, p < .001, and speaking context, F(2, 24) = 20.61, p < .001, on word retrieval scores, with no significant interactions between factors (see Figure 1). 
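The %CERR ratio defined at the end of the Method and the point-to-point agreement percentages reported under Reliability are both simple proportions. The sketch below is illustrative only: the function names are hypothetical, and it assumes each rater’s codes are listed in the same order for the same scored items, a detail the article does not specify. It returns `None` for %CERR when a speaker made no errors, mirroring the cases in which the measure could not be applied to subjects with 100% accurate retrieval.

```python
from typing import Optional, Sequence


def percent_corrected_errors(n_corrected: int, n_unresolved: int) -> Optional[float]:
    """%CERR = corrected errors / (corrected + unresolved errors) x 100."""
    total = n_corrected + n_unresolved
    if total == 0:
        return None  # no word-finding errors, so %CERR is undefined
    return 100.0 * n_corrected / total


def point_to_point_agreement(rater_a: Sequence[str], rater_b: Sequence[str]) -> float:
    """Agreements / (agreements + disagreements) x 100 over item-by-item codes."""
    if len(rater_a) != len(rater_b):
        raise ValueError("both raters must code the same items")
    if not rater_a:
        raise ValueError("no items to compare")
    agreements = sum(1 for a, b in zip(rater_a, rater_b) if a == b)
    return 100.0 * agreements / len(rater_a)


# Hypothetical codes for five nouns: "C" = correct retrieval, "E" = word-finding error.
first_rater = ["C", "E", "C", "C", "E"]
second_rater = ["C", "E", "E", "C", "E"]
print(point_to_point_agreement(first_rater, second_rater))  # 80.0
print(percent_corrected_errors(3, 1))                        # 75.0
print(percent_corrected_errors(0, 0))                        # None
```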
As expected, patients with mild aphasia outscored those with moderate aphasia across all measures. Both groups exhibited superior performance in composite naming and conversational speech contexts compared to confrontation naming. Post-hoc paired t-tests, with alpha set to .017 using the Bonferroni correction, confirmed this observation. That is, composite noun, t(13) = 3.33, p = .005, and verb scores, t(13) = 2.98, p = .011, were significantly higher than TAWF noun/verb subtest scores; conversational noun, t(13) = 3.44, p = .004, and verb scores, t(13) = 4.71, p < .001, followed a similar pattern. A comparison of word retrieval across the two connected speaking contexts, composite naming and conversational speech, revealed a significant difference
between verbs, t(13) = 2.83, p = .014, but not nouns, t(13) = .91, p = .38. Although visual inspection of subjects’ scores (see Figure 1) indicated a general trend toward more accurate verb than noun retrieval across severity groups and elicitation contexts, no main effect of lexical class was revealed, F(1, 12) = 3.39, p = .091. Despite the significant differences yielded by the ANOVA, all measures of word retrieval (TAWF scores and %WR) were highly and significantly correlated across elicitation contexts (see Table 2). That is, subjects with low TAWF scores were likely to perform poorly on word retrieval measures in composite description and conversational speech, and subjects who scored well on the TAWF demonstrated relatively higher word retrieval scores across contexts. When the mild and moderate groups were analysed separately, however, this effect disappeared: no correlation was significant for the mild group, and only one comparison (conversational nouns to composite nouns) was significant for the moderate group.

% Substantive Verbs

Statistical analyses indicated no main effect of connected speaking context (composite description vs conversation) on subjects’ generation of substantive verbs, F(1, 12) = 1.76, p = .21 (see Figure 2). There was, however, a significant effect of severity, F(1, 12) = 17.28, p = .001, and a significant interaction between context and severity, F(1, 12) = 5.92, p = .032, such that mild subjects produced significantly more substantive verbs in composite description than in conversation, whereas moderate subjects generated slightly more substantive verbs in the conversational than in the composite condition. 
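The paired post-hoc comparisons in the Results were judged against a Bonferroni-corrected alpha of .017, and Table 2 flags correlations at p < .007. As a rough consistency check, the sketch below recovers two-tailed p-values from the reported t statistics (df = 13) using only the Python standard library, by integrating the Student t density numerically with Simpson’s rule. It also converts a Pearson r to t with n − 2 degrees of freedom, the standard significance test for a correlation. Treating .017 as .05/3 is our assumption, and the integration step count is an arbitrary choice, not anything from the article.

```python
import math


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def two_tailed_p(t: float, df: int, steps: int = 2000) -> float:
    """Two-tailed p = 1 - 2 * P(0 < T < |t|), integrating by Simpson's rule."""
    b = abs(t)
    if b == 0.0:
        return 1.0
    if steps % 2:
        steps += 1  # Simpson's rule needs an even number of intervals
    h = b / steps
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return max(0.0, 1.0 - 2.0 * total * h / 3.0)


def r_to_t(r: float, n: int) -> float:
    """t statistic for a Pearson correlation r from n pairs (df = n - 2)."""
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)


ALPHA_POSTHOC = 0.05 / 3  # assumed basis of the reported .017

# Reported paired comparisons, all with df = 13.
posthoc = {
    "composite nouns vs TAWF": 3.33,
    "composite verbs vs TAWF": 2.98,
    "conversational nouns vs TAWF": 3.44,
    "conversational verbs vs TAWF": 4.71,
    "composite vs conversational verbs": 2.83,
    "composite vs conversational nouns": 0.91,
}
for label, t in posthoc.items():
    p = two_tailed_p(t, 13)
    print(f"{label}: p = {p:.3f}, significant = {p < ALPHA_POSTHOC}")

# Table 2 (n = 14): compare the p-values behind r = .68 (no **) and r = .69 (**).
for r in (0.68, 0.69):
    print(f"r = {r}: p = {two_tailed_p(r_to_t(r, 14), 12):.4f}")
```

With SciPy installed, `2 * scipy.stats.t.sf(abs(t), df)` gives the same two-tailed p; the standard-library version is used here only to keep the sketch self-contained.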
Accordingly, %SV scores for the composite naming condition correlated more strongly with other verb retrieval measures (TAWF and %WR) than did conversational %SV scores (see Table 2); moreover, composite %SV scores differentiated significantly between the two severity groups, F(1, 12) = 19.26, p = .001, whereas conversational %SV scores did not, F(1, 12) = 4.07, p = .067.

Figure 1. The percentage of correct word retrieval in composite description and conversational speech (%WR scores), in comparison with single-word (TAWF) naming scores.

TABLE 2 Correlations across groups (mild and moderate, n = 14) among TAWF scores, %WR (composite description and conversation), and %SV (composite description and conversation)

                   TAWF   TAWF   Comp   Comp   Comp   Conv   Conv
                   nouns  verbs  nouns  verbs  %SV    nouns  verbs
TAWF nouns
TAWF verbs         .95**
Comp nouns (%WR)   .78**  .87**
Comp verbs (%WR)   .80**  .83**  .77**
Comp verbs (%SV)   .84**  .84**  .71**  .85**
Conv nouns (%WR)   .84**  .89**  .97**  .79**  .71**
Conv verbs (%WR)   .80**  .88**  .84**  .97**  .83**  .84**
Conv verbs (%SV)   .68    .69**  .56    .49    .54    .61    .56

** Correlation is significant at p < .007 (Bonferroni correction).

% Corrected Errors

ANOVA results revealed a significant main effect of context, F(2, 14) = 4.73, p = .027, indicating that subjects were more likely to self-correct word-finding errors in discourse contexts than during confrontation naming (see Figure 3). Whereas no main effect of
grammatical class was noted, F(1, 7) = 0.47, p = .52, significant interactions were found between context and class, F(2, 14) = 4.97, p = .023, and among context, class, and severity, F(2, 14) = 11.33, p = .001. The nature of the interactions was such that mild subjects tended to self-correct verbs more often than nouns during composite naming and conversational speech, whereas moderate subjects tended to do the opposite (i.e., they self-corrected nouns more often than verbs). No main effect of severity was detected, F(1, 7) = 4.32, p = .08.

Figure 2. Percent Substantive Verbs (%SV) across severity groups and connected speaking contexts.

DISCUSSION

Whereas several measures have been proposed to analyse various aspects of connected speech, quantification of word-finding difficulties in this context has often been overlooked. Given the centrality of such difficulties to aphasia (Boles, 1998; Larfeuil & Le Dorze, 1997), the development of a measure to analyse lexical retrieval in natural contexts appears essential. This study examined several such measures (i.e., %WR, %SV, and %CERR) in an attempt to describe clinically useful patterns and bridge a likely gap between frequently used single-word measures of word retrieval and more complex, connected speaking paradigms. Findings from the current study demonstrated a significant effect of context, with superior word retrieval in connected speech compared to
confrontation naming, and a nonsignificant trend towards lexical class effects, with more accurate retrieval of verbs than nouns. Confrontation naming scores were strongly related to aphasia severity classification, but did not robustly predict composite or conversational word retrieval scores within each severity group.

Figure 3. Percent Corrected Errors (%CERR) across elicitation contexts. Note that this measure was inapplicable to the single-word (TAWF) verb scores of the mildly aphasic group because of ceiling effects on this subtest (i.e., few opportunities for corrected errors). The n from which each percentage was determined varied from subject to subject according to the number of word-finding errors elicited per context, ranging from 1 to 16 in the mild group and from 4 to 45 in the moderate group.

Clinical utility: Feasibility and reliability

Conversational data are difficult to quantify (Perkins, 1995), time-consuming to analyse (Crockford & Lesser, 1994; Togher, 2001), of questionable reliability (Brookshire & Nicholas, 1994; Oelschlaeger & Thorne, 1999; Osborne et al., 1998), and confounded by the complex interaction of extraneous factors (Doyle, Thompson, Oleyar, Wambaugh, & Jackson, 1994; Jordan et al., 1997). This study proposed a simple measure, %WR, to quantify one aspect of an extraordinarily complicated entity. Although the limited reliability measures described here do not establish the psychometric properties of %WR, they do support its clinical utility. Oelschlaeger and Thorne (1999), in their application of %CIUs to conversation, noted that previously published reliability standards for highly controlled clinical measures (e.g., standardised tests) may be unrealistic when applied to naturally occurring language phenomena. On the other hand, clinical decision making that affects individuals’ lives must, by definition, meet fairly high reliability standards (Crockford & Lesser, 1994; Oelschlaeger & Thorne, 1999). Although no firm guidelines with
respect to precise reliability standards have been established, the range of inter- and intra-rater reliability scores obtained in this study was similar to that considered acceptable by Oelschlaeger and Thorne (i.e., > 80%). Importantly, the fact that our raters were able to apply written scoring rules with at least 80% agreement, in the absence of any formal training in or discussion of the measure, supports the straightforward and intuitive nature of %WR. It is also noteworthy that, given ever-increasing pressure to improve clinical assessment efficiency, these data were obtained within feasible clinical time demands for both initial testing and subsequent analysis (Crockford & Lesser, 1994). Finally, the nature of %WR and the supplementary analyses lends itself to the possibility of online data measurement (Hickin et al., 2001), a much-needed step in the advancement of functional communication measures (Togher, 2001).

Relationships among contexts

Results of this study demonstrated a significant effect of context on word retrieval, with enhanced performance during connected speaking tasks compared to confrontation naming. Whereas initial analyses based on the entire subject sample (n = 14) demonstrated a high correlation between lexical retrieval in single-word and connected speaking tasks, subsequent separate correlational analyses of the mild and moderately aphasic groups’ data failed to detect such effects. That is, the high correlation obtained for the entire subject sample appeared to reflect broad score differences between the mild and moderate groups (i.e., floor/ceiling effects), rather than a strong predictive effect within each group. 
Because of the small n and relatively restricted range of scores included in the separate group analyses, however, these results should be interpreted with caution (i.e., the within-group analyses may have had lower statistical power to detect significant effects than the analysis of the full sample). Nonetheless, single-word TAWF scores, although highly predictive of aphasia severity (mild vs moderate), did not appear strongly related to lexical retrieval in composite naming and conversational speech within each aphasic group. The current identification of divergence between single-word confrontation naming and discourse-level word retrieval is consistent with previous data (Pashek & Tompkins, 2002; Williams & Canter, 1987), and has important implications for how researchers and clinicians should assess and treat their patients with aphasia. Contemporary cognitive neuropsychological approaches that promote specific, model-based treatments for hypothesised deficits (Hillis, 1998; Raymer & Gonzalez-Rothi, 2001), and that have become the focus of much recent research, primarily address language deficits at the single-word level. In fact, few cognitive neuropsychological model-based treatment studies have addressed discourse-level tasks, and those that have (e.g., McNeil, Doyle, Spencer, Jackson-Goda, Flores, & Small, 1997) reported minimal transfer of single-word therapy gains to discourse contexts. Other
aphasia treatment studies have similarly found that conversational gains may be more resistant to treatment than less natural communicative behaviours (e.g., Larfeuil & Le Dorze, 1997; Murray & Karcher, 2000). In contrast, work utilising conversational analysis frameworks has demonstrated that aphasia therapy targeted directly at conversational behaviours may create ecologically valid change in daily interactions with communication partners (Hopper, Holland, & Rewega, 2002; Lock et al., 2001; Wilkinson et al., 1998). Collectively, the current and previous findings endorse incorporating discourse-level tasks into aphasia assessment and treatment protocols. Further data are needed to determine which of the two connected speaking contexts used in this study best lends itself to accurate and valid assessment of naming. Each context entails a number of benefits and caveats; for example, composite description tasks have inherent limitations, such as practice effects and a lack of interactional opportunities (Shewan, 1988; Togher, 2001), but offer the practical advantages of consistency and a priori targets (Hickin et al., 2001; Shewan, 1988). Likewise, natural conversational interchange is theoretically the ideal setting for aphasia assessment and remediation; nonetheless, the difficulties of applying consistent analytical measures to the conversational speech of aphasic patients are well recognised (Crockford & Lesser, 1994; Edwards, 1998; Marshall & Pound, 1997). For example, conversation may encourage various tactics (e.g., indefinite terms, anticipating/avoiding difficult words) that “allow the patient to maintain a socially acceptable level of fluency in the face of severe difficulty in finding the right word” (Vermeulen et al., 1989, p. 262), thereby decreasing the likelihood of detecting word-finding errors. Yet developing similar compensatory strategies has been recommended as a desired and important therapeutic outcome (e.g., Holland, 1994). 
In such cases, %WR could nevertheless function as a critical outcome measure: albeit limited to the perspective of lexical-semantic output, %WR could quantify patients’ ability to acquire these strategies and thus to function appropriately and meaningfully in naturalistic communicative conditions. The type of discourse task (i.e., composite description vs conversation) appeared to affect specific aspects of lexical retrieval (i.e., accuracy, or %WR; semantic complexity of verbs, or %SV), a finding consistent with previous reports of variation in word retrieval according to numerous task-related variables (e.g., Cooper, 1990; Doyle et al., 1994). The current findings suggested a general trend towards more accurate verb retrieval in conversational contexts than in composite description, at the expense of semantic complexity, particularly for mild subjects. Thus, the choice of which or how many discourse-level tasks to incorporate into assessment and treatment (composite description and/or conversational samples) may be, in part, a function of ultimate treatment goals with respect to, for example, verb naming or sentence construction.


Effect of grammatical class

Previous research is inconsistent and inconclusive with respect to grammatical class effects in aphasic naming, with patterns of better noun than verb retrieval (Breedin et al., 1998; Edwards, 1998; Williams & Canter, 1987), as well as the inverse (Pashek & Tompkins, 2002; Zingeser & Berndt, 1990). The current findings tentatively support the data of Pashek and Tompkins (2002), in that a trend towards superior retrieval of verbs compared to nouns was observed across mild and moderate patients with aphasia. These results may reflect predominant characteristics of our sample (e.g., fluency, lesion site, aphasia type). That is, superiority of noun over verb retrieval has been associated with agrammatic aphasia and anterior left hemisphere lesions; conversely, the opposite pattern has typically been associated with anomic aphasia and middle/inferior left temporal lesions (Damasio & Tranel, 1993; Hillis, Tuffiash, Wityk, & Barker, 2002; Zingeser & Berndt, 1990). Of the 14 participants in this study, only 3 were judged to be agrammatic. Although subjective analysis of these subjects’ data demonstrated a mild discrepancy in patterns of noun/verb retrieval compared to the more fluent speakers (i.e., slightly higher noun than verb retrieval scores in the composite condition), the fact that this discrepancy did not extend to the single-word and conversational contexts argues against broad grammatical class differences among subjects based on fluency/aphasia type alone. Rather, the nonsignificant grammatical class trend in this study is more likely a function of a variety of stimulus factors, such as word length, frequency of occurrence, and semantic complexity (Berndt et al., 1997a; Breedin et al., 1998; Pashek & Tompkins, 2002). 
Because this study’s intent was to explore a clinically feasible and ecologically valid discourse measure, these factors were left to vary; therefore, sound conclusions regarding lexical organisation and processing cannot be drawn. A meaningful implication of these results, however, is the inadequacy of limited grammatical class inclusion (e.g., nouns only) when assessing naming in adults with aphasia.

Supplementary measures

The Percent Substantive Verbs measure (%SV) proved to be a relatively simple means of extracting important lexical retrieval information from the language samples. In fact, for several subjects, %SV was sensitive to possible deficits undetected by %WR alone. For example, although two subjects, MI2 and MI5, demonstrated 100% accurate verb retrieval in conversational speech, their %SV scores of just 33% and 34%, respectively, indicated that these individuals relied heavily on light verbs to communicate meaning (cf. Berndt et al., 1997a, 1997b; Breedin et al., 1998). Thus, a decision to use %SV or %WR when assessing word retrieval may be influenced by aphasia severity: %SV may be more useful than, or an important complement to, %WR for gauging verb retrieval in mild patients, given that many mild subjects were at ceiling on the latter measure.
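The MI2/MI5 pattern (100% verb %WR alongside a low %SV) follows from the fact that %SV looks only at which verbs were produced, not at whether they were retrieved without difficulty. A minimal sketch follows, assuming verbs have already been lemmatised (so that “goes” and “went” both count as go) and using only the four light verbs the article names as examples (do, make, have, go); a real analysis would need an explicit light-verb list, which this sketch does not attempt to supply. The function name and the verb sample are hypothetical.

```python
from typing import Iterable, Optional

# Only the light verbs named in the article; a complete list is an open assumption.
LIGHT_VERBS = frozenset({"do", "make", "have", "go"})


def percent_substantive_verbs(verb_lemmas: Iterable[str],
                              light_verbs: frozenset = LIGHT_VERBS) -> Optional[float]:
    """%SV = substantive verbs / (substantive + light verbs) x 100."""
    lemmas = [v.lower() for v in verb_lemmas]
    if not lemmas:
        return None  # no verbs produced: %SV is undefined
    substantive = sum(1 for v in lemmas if v not in light_verbs)
    return 100.0 * substantive / len(lemmas)


# Hypothetical conversational sample in which every verb was retrieved without
# word-finding difficulty (verb %WR = 100) but most verbs are light verbs.
verbs = ["go", "have", "run", "make", "do", "eat"]
print(round(percent_substantive_verbs(verbs), 1))  # 33.3: only "run" and "eat" count
```

Because the function never consults error codes, it complements rather than duplicates %WR: a speaker can score 100 on one and low on the other.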


That composite description elicited higher %SV than conversation for mild subjects (see Figure 2) also has assessment and treatment ramifications. For example, one strategy to encourage conversational substantive verb production may involve eliciting specific verbs during picture description tasks. Conversely, the goal with moderate patients may be to encourage word retrieval by whatever means necessary; in that case, %WR may provide more valuable information than %SV regarding initial status and measurement of treatment outcomes. It is noteworthy that normative %SV values for non-brain-damaged individuals have yet to be established. Although previous research has reported approximately 65–70% substantive verb production by control subjects (Berndt et al., 1997b), these data were collected via sentence-level tasks and may not apply to expository and conversational speech samples. Future research, therefore, should establish appropriate %SV norms by which to gauge the lexical-semantic complexity of verbs produced by patients with aphasia. The percentage of corrected word-finding episodes (%CERR) has previously been used to look beyond lexical retrieval accuracy and examine lexical retrieval efficiency (Larfeuil & Le Dorze, 1997). A caveat of the current study was that a few mildly aphasic subjects had 100% accurate word retrieval in some contexts; consequently, %CERR could not be applied. Statistical analyses were therefore necessarily performed on a smaller data set, and the results should be interpreted with caution. Nevertheless, the significant main effect of context (i.e., higher %CERR in connected speaking vs single-word contexts) dovetails with more accurate general lexical retrieval, as measured by %WR, in connected speaking than in confrontation naming tasks. 
That is, connected speech facilitated not only general lexical retrieval (Pashek & Tompkins, 2002), but also the efficiency of strategies to correct retrieval failures.

Conclusion

In summary, the use of confrontation naming procedures to assess aphasia severity and to demonstrate treatment progress is endorsed by a large number of effectiveness studies, and reflects common current clinical practice (Doesborgh et al., 2002; Jordan et al., 1997). Clearly, single-word naming tests assess the “impairment” level of functioning (i.e., structure/function limitations; World Health Organisation, 2001), and yet they are often used to make predictions about daily communication abilities in patients with aphasia. Healthcare system changes have fostered a growing awareness of our obligation to address directly the functional (i.e., activity limitations) and personal (i.e., participation restrictions) consequences of aphasia (World Health Organisation, 2001). The issue is further complicated by the numerous obstacles historically encountered in the search for quantifiable conversational measures. Use of a simple and easily quantifiable lexical retrieval measure, %WR, may be an option to extend current assessment methodology and reconcile issues of ecological validity and clinical feasibility. Although many questions regarding
the utility of this measure remain (e.g., feasibility of %WR for online measurement, its sensitivity to treatment outcomes, performance of non-brain-damaged individuals as measured by this system, effects of complex or abstract conversational topics), the theoretical implications and clinical ramifications of the current data provide a solid basis for further exploration of %WR and related measures (e.g., %SV, %CERR) in our continual efforts to gauge legitimately the strengths and needs of our patients with aphasia.

REFERENCES

Berndt, R. S., Mitchum, C. C., Haendiges, A. N., & Sandson, J. (1997a). Verb retrieval in aphasia, 1. Characterizing single word impairments. Brain and Language, 56, 68–106.
Berndt, R. S., Mitchum, C. C., Haendiges, A. N., & Sandson, J. (1997b). Verb retrieval in aphasia, 2. Relationship to sentence processing. Brain and Language, 56, 107–137.
Boles, L. (1998). Conversational discourse analysis as a method for evaluating progress in aphasia: A case report. Journal of Communication Disorders, 31, 261–274.
Breedin, S. D., Saffran, E. M., & Schwartz, M. F. (1998). Semantic factors in verb retrieval: An effect of complexity. Brain and Language, 63, 1–31.
Brookshire, R. H., & Nicholas, L. E. (1994). Test–retest stability of measures of connected speech in aphasia. Clinical Aphasiology, 22, 119–133.
Brown, C. S., & Cullinan, W. L. (1981). Word-retrieval difficulty and disfluent speech in adult anomic speakers. Journal of Speech and Hearing Research, 24, 358–365.
Cooper, P. V. (1990). Discourse production and normal aging: Performance on oral picture description tasks. Journal of Gerontology, 45, 210–214.
Crockford, C., & Lesser, R. (1994). Assessing functional communication in aphasia: Clinical utility and time demands of three methods. European Journal of Disorders of Communication, 29, 165–182.
Damasio, A. R., & Tranel, D. (1993). Nouns and verbs are retrieved with differently distributed neural systems. Proceedings of the National Academy of Sciences, 90, 4957–4960.
Doesborgh, S. J. C., van de Sandt-Koenderman, W. M. E., Dippel, D. W. J., van Harskamp, F., Koudstaal, P. J., & Visch-Brink, E. G. (2002). The impact of linguistic deficits on verbal communication. Aphasiology, 16(4/5/6), 413–423.
Dollaghan, C. A., & Campbell, T. F. (1992). A procedure for classifying disruptions in spontaneous language samples. Topics in Language Disorders, 12, 56–68.
Doyle, P. J., Thompson, C. K., Oleyar, K., Wambaugh, J., & Jackson, A. (1994). The effects of setting variables on conversational discourse in normal and aphasic adults. Clinical Aphasiology, 22, 135–143.
Edwards, S. (1998). Single words are not enough: Verbs, grammar and fluent aphasia. International Journal of Language and Communication Disorders, 33(Supplement), 190–195.
German, D. J. (1990). The Test of Adolescent and Adult Word-Finding. Austin, TX: ProEd.

WORD RETRIEVAL IN CONNECTED SPEECH 95

Helm-Estabrooks, N. (1997). Treatment of aphasic naming problems. In H. Goodglass & A. Wingfield (Eds.), Anomia (pp. 189–202). San Diego, CA: Academic Press.
Hickin, J., Best, W., Herbert, R., Howard, D., & Osborne, F. (2001). Treatment of word retrieval in aphasia: Generalisation to conversational speech. International Journal of Language and Communication Disorders, 36(Supplement), 3–8.
Hillis, A. E. (1998). What’s in a name? A model of the cognitive processes underlying object naming. In E. G. Visch-Brink & R. Bastiaanse (Eds.), Linguistic levels in aphasiology (pp. 35–48). San Diego, CA: Singular.
Hillis, A. E., Tuffiash, E., Wityk, R. J., & Barker, P. B. (2002). Regions of neural dysfunction associated with impaired naming of actions and objects in acute stroke. Cognitive Neuropsychology, 19(6), 523–534.
Holland, A. L. (1994). Cognitive neuropsychological theory and treatment for aphasia: Exploring the strengths and limitations. Clinical Aphasiology, 22, 275–282.
Hopper, T., Holland, A., & Rewega, M. (2002). Conversational coaching: Treatment outcomes and future directions. Aphasiology, 16(7), 745–761.
Ingles, J. L., Mate-Kole, C. C., & Connolly, J. F. (1996). Evidence for multiple routes of speech production in a case of fluent aphasia. Cortex, 32(2), 199–219.
Jordan, F., Ward, K., & Cremona-Meteyard, S. (1997). Word-finding in the conversational discourse of children with closed head injury. Aphasiology, 11(9), 877–888.
Kaplan, E., Goodglass, H., & Weintraub, S. (1983). The Boston naming test. Philadelphia: Lea & Febiger.
Kemmerer, D., & Tranel, D. (2000a). Verb retrieval in brain-damaged subjects: 1. Analysis of stimulus, lexical, and conceptual factors. Brain and Language, 73, 347–392.
Kemmerer, D., & Tranel, D. (2000b). Verb retrieval in brain-damaged subjects: 2. Analysis of errors. Brain and Language, 73, 393–420.
Kertesz, A. (1982). Western Aphasia Battery. New York: Grune & Stratton.
Larfeuil, C., & Le Dorze, G. (1997). An analysis of the word-finding difficulties and the content of the discourse of recent and chronic aphasic speakers. Aphasiology, 11(8), 783–811.
Lock, S., Wilkinson, R., Bryan, K., Maxim, J., Edmundson, A., Bruce, C., et al. (2001). Supporting partners of people with aphasia in relationships and conversation (SPPARC). International Journal of Language and Communication Disorders, 36(Supplement), 25–30.
MacWhinney, B. (1995). The CHILDES project: Tools for analyzing talk (pp. 41–45). Hillsdale, NJ: Lawrence Erlbaum Associates Inc.
Manning, L., & Warrington, E. K. (1996). Two routes to naming: A case study. Neuropsychologia, 34(8), 809–817.
Marshall, J., & Pound, C. (1997). Difficulties with discourse. Aphasiology, 11(6), 625–629.
McNeil, M. R., Doyle, P. J., Spencer, K. A., Jackson-Goda, A., Flores, D., & Small, S. L. (1997). A double-blind, placebo-controlled study of pharmacological and behavioral treatment of lexical-semantic deficits in aphasia. Aphasiology, 11(4/5), 385–400.
Murray, L. L., & Karcher, L. (2000). A treatment for written verb retrieval and sentence construction skills. Aphasiology, 14, 585–602.
Nicholas, L. E., & Brookshire, R. H. (1993a). A system for scoring main concepts in the discourse of non-brain-damaged and aphasic speakers. Clinical Aphasiology, 21, 87–99.

Nicholas, L. E., & Brookshire, R. H. (1993b). A system for quantifying the informativeness and efficiency of the connected speech of adults with aphasia. Journal of Speech and Hearing Research, 36, 338–350.
Nicholas, M., Obler, L. K., Albert, M. L., & Helm-Estabrooks, N. (1985). Empty speech in Alzheimer’s disease and fluent aphasia. Journal of Speech and Hearing Research, 28, 405–410.
Oelschlaeger, M. L., & Thorne, J. C. (1999). Application of the Correct Information Unit analysis to the naturally occurring conversation of a person with aphasia. Journal of Speech, Language, and Hearing Research, 42, 636–648.
Osborne, F., Hickin, J., Best, W., & Howard, D. (1998). Treating word-finding difficulties: Beyond picture naming. International Journal of Language and Communication Disorders, 33(Supplement), 208–213.
Pashek, G. V., & Tompkins, C. A. (2002). Context and word class influences on lexical retrieval in aphasia. Aphasiology, 16(3), 261–286.
Penn, C. (2000). Paying attention to conversation. Brain and Language, 71, 185–189.
Perkins, L. (1995). Applying conversation analysis to aphasia: Clinical implications and analytic issues. European Journal of Disorders of Communication, 30, 372–383.
Raymer, A. M., & Gonzalez-Rothi, L. J. (2001). Cognitive approaches to impairments of word comprehension and production. In R. Chapey (Ed.), Language intervention strategies in aphasia and related neurogenic communication disorders (pp. 524–550). Philadelphia, PA: Lippincott, Williams, & Wilkins.
Schwartz, M. F., & Hodgson, C. (2002). A new multiword naming deficit: Evidence and interpretation. Cognitive Neuropsychology, 19(3), 263–288.
Segalowitz, S. J., & Lane, K. C. (2000). Lexical access of function versus content words. Brain and Language, 75(3), 376–389.
Shewan, C. M. (1988). The Shewan Spontaneous Language Analysis (SSLA) system for aphasic adults: Description, reliability, and validity. Journal of Communication Disorders, 21, 103–138.
Shewan, C. M., & Bandur, D. L. (1986). Treatment of aphasia: A language-oriented approach (pp. 243–259). London: Taylor & Francis.
Snow, P., Douglas, J., & Ponsford, J. (1995). Discourse assessment following traumatic brain injury: A pilot study examining some demographic and methodological issues. Aphasiology, 9, 365–380.
Togher, L. (2001). Discourse sampling in the 21st century. Journal of Communication Disorders, 34, 131–150.
Vermeulen, J., Bastiaanse, R., & Van Wageningen, B. (1989). Spontaneous speech in aphasia: A correlational study. Brain and Language, 36, 252–274.
Wilkinson, R., Bryan, K., Lock, S., Bayley, K., Maxim, J., Bruce, C., et al. (1998). Therapy using conversation analysis: Helping couples adapt to aphasia in conversation. International Journal of Language and Communication Disorders, 33(Supplement), 144–149.
Williams, S. E., & Canter, G. J. (1982). The influence of situational context on naming performance in aphasic syndromes. Brain and Language, 17, 92–106.
Williams, S. E., & Canter, G. J. (1987). Action-naming performance in four syndromes of aphasia. Brain and Language, 32, 124–136.
Wilshire, C. E., & McCarthy, R. A. (2002). Evidence for a context-sensitive word retrieval disorder in a case of nonfluent aphasia. Cognitive Neuropsychology, 19(2), 165–186.

World Health Organisation. (2001). ICIDH–2: International Classification of Functioning, Disability and Health. Geneva, Switzerland: WHO.
Zingeser, L. B., & Berndt, R. S. (1990). Retrieval of nouns and verbs in agrammatism and anomia. Brain and Language, 39(1), 14–32.

APPENDIX

%WR SCORING PROTOCOL

● Count the first 300 words of the sample¹ (Larfeuil & Le Dorze, 1997; Vermeulen et al., 1989).
● Count the number of nouns and verbs within the 300-word corpus, with the following exceptions:
(a) Modalising speech² (cf. Larfeuil & Le Dorze, 1997) is excluded from further analysis (i.e., from the noun/verb counts). This prevents artificial inflation of a speaker’s noun/verb output.
(b) Nouns and verbs that are part of circumlocutions, self-corrections, or repetitions/stalling are counted only in their initial form (Larfeuil & Le Dorze, 1997; Vermeulen et al., 1989).
(c) If a word is repeated for emphasis (e.g., “shark, shark, shark!”) or to denote different items (e.g., “lifeguard…lifeguard” (pointing to one, then the other)), it is counted each time; for other instances of repetition (e.g., stalling), the word is counted just once (MacWhinney, 1995).
(d) The verb “to be” is not counted (Breedin et al., 1998).
(e) Pronouns and prepositions are not counted (Segalowitz & Lane, 2000).
(f) Numerals are not counted as nouns (Segalowitz & Lane, 2000).
● Noun and verb word-finding episodes/errors include:
(a) Words immediately preceded by a prolonged (filled or unfilled) hesitation (2+ seconds; Crockford & Lesser, 1994; Dollaghan & Campbell, 1992; Pashek & Tompkins, 2002); if the pause is utterance-initial, however, it is ignored (due to the possibility of sentence construction deficits).³
(b) Words preceded or accompanied by comments indicating difficulty.
(c) Self-corrections (Pashek & Tompkins, 2002).
(d) Paraphasias (semantic, phonemic, or unrelated).
(e) Deletions (Kemmerer & Tranel, 2000b), e.g., obvious deletion of a syntactic constituent (main verb, object noun) in a required context.⁴
(f) Overuse of indefinite terms (Nicholas et al., 1985).⁵
(g) Overuse of pronouns, or pronouns without antecedents (Hickin et al., 2001; Nicholas et al., 1985).

¹ If 300 words were not produced in the sample, the first 150–200 words were used (Berndt et al., 1997; Brown & Cullinan, 1981).
² Larfeuil and Le Dorze (1997) defined “modalising speech” as “all utterances in which the speaker includes himself/herself in the discourse…[e.g.], verbs and verbal phrases… whose function is not to express the speaker’s feelings but rather to predicate…e.g., ‘I think that’” (p. 788).
³ Pashek and Tompkins (2002) noted possible confounds of the 2+ second rule for indicating word-finding difficulty; they suggest that hesitations be analysed relative to the overall rate of speech. Therefore, in language samples from patients whose response times or speaking rates were judged clinically to be slow, relative hesitations (i.e., pauses more than 2 seconds longer than the average pause between words in fluent speech) were noted rather than 2+ second pauses per se.
⁴ With the exception of obvious deletions, other errors of grammaticality (e.g., “I had going…”) were not considered errors of word retrieval if the subject demonstrated evidence of having retrieved the correct lexical constituent.
⁵ The inclusion of indefinite terms [see (f)] is controversial in some respects, as normal speakers have been noted to use such terms nearly as often as brain-injured speakers in some cases (Snow, Douglas, & Ponsford, 1995). Therefore, the appropriateness of such terms in the context of the discourse (i.e., whether or not an intended referent could be inferred) was noted prior to definitive scoring. Subjects were given the benefit of the doubt in ambiguous situations.
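
Two mechanical steps of the protocol lend themselves to a sketch: selecting the analysis window (the first 300 words, with the 150–200-word fallback of footnote 1) and flagging prolonged hesitations under either the absolute 2+ second rule or, for speakers judged clinically slow, the relative rule of footnote 3. The `Token` structure and the function names are hypothetical, the reading of footnote 1 as "keep up to 200 words if at least 150 exist" is an assumption, and the grammatical tagging and the clinical judgement of slowness are taken as already made by a human scorer.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List


@dataclass
class Token:
    """One transcribed word (hypothetical structure)."""
    word: str
    pause_before: float      # seconds of filled or unfilled pause before the word
    utterance_initial: bool  # True if the word begins an utterance


def analysis_window(tokens: List[Token]) -> List[Token]:
    """First 300 words; for shorter samples, one reading of footnote 1 is to
    keep at most the first 200 words, provided at least 150 exist."""
    if len(tokens) >= 300:
        return tokens[:300]
    if len(tokens) >= 150:
        return tokens[:200]
    raise ValueError("sample has fewer than 150 words")


def prolonged_hesitations(tokens: List[Token], slow_speaker: bool = False) -> List[int]:
    """Indices of words preceded by a prolonged hesitation.

    Utterance-initial pauses are ignored. The default is the absolute rule
    (2 s or more). For speakers judged clinically slow, the relative rule of
    footnote 3 is used: a pause more than 2 s longer than the speaker's mean
    between-word pause. Averaging all non-initial pauses as that baseline is
    an assumption; the protocol refers to pauses in fluent speech.
    """
    internal = [t.pause_before for t in tokens if not t.utterance_initial]
    baseline = mean(internal) if internal else 0.0
    flagged = []
    for i, t in enumerate(tokens):
        if t.utterance_initial:
            continue
        if slow_speaker:
            prolonged = t.pause_before > baseline + 2.0
        else:
            prolonged = t.pause_before >= 2.0
        if prolonged:
            flagged.append(i)
    return flagged


# Hypothetical five-word fragment; the 4.0 s pause is utterance-initial.
sample = [
    Token("the", 0.0, True),
    Token("boy", 0.3, False),
    Token("is", 0.2, False),
    Token("fishing", 2.5, False),
    Token("then", 4.0, True),
]
print(prolonged_hesitations(sample))                     # [3]
print(prolonged_hesitations(sample, slow_speaker=True))  # []
```

The same 2.5 s pause counts as a word-finding episode under the absolute rule but not under the relative rule, which is the confound footnote 3 describes for slow speakers.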

Narrative and conversational discourse of adults with closed head injuries and non-brain-injured adults: A discriminant analysis

Carl A. Coelho
University of Connecticut, and Hospital for Special Care, New Britain, CT, USA

Kathleen M. Youse and Karen N. Le
University of Connecticut, USA

Richard Feinn
University of Connecticut Health Center, Farmington, USA

Background: Although there is general agreement regarding the clinical utility of discourse analyses for detecting the often subtle communicative impairments following closed head injuries (CHI), there is little consensus regarding discourse elicitation or analysis procedures. Consequently, it has been difficult to compare findings across studies.

Aims: In an effort to facilitate a movement towards the adoption of a more consistent methodology for the assessment of discourse abilities, the current study examined several commonly used measures of discourse performance and the accuracy with which these measures were able to distinguish individuals with CHI from non-brain-injured (NBI) controls. Previous studies have suggested that conversation is less demanding than narrative discourse because narratives require greater manipulation of extended units of language, while conversational discourse can be maintained with minimal responses (Chapman, 1997; Galski, Tompkins, & Johnston, 1998). On the basis of these reports, it was hypothesised that the measures of narrative story performance would discriminate the participant groups more accurately than conversational measures.

Methods & Procedures: Discourse samples were elicited from 32 adults with CHI and 43 NBI adults. Discourse samples included two story narratives (generation and retelling) and 15 minutes of conversation. A variety of discourse analyses were performed, including story narrative measures of grammatical complexity, cohesive adequacy, and story grammar. Measures of conversation included appropriateness and topic initiation. Discriminant function
analyses (DFA) were then employed to determine the accuracy with which the selected measures classified the participants into their respective groups.

Outcomes & Results: Results of the DFA with only the story narrative measures indicated that 70% of the cases, 64.5% of the CHI group, and 74.4% of the NBI group were accurately classified. This finding was not significant, suggesting that the story narrative measures did not reliably discriminate the CHI from the NBI participants. The DFA with the conversational measures correctly classified over 77% of the cases, 78.1% of the CHI participants, and 72.1% of the NBI group. This finding was significant, which suggests that the measures of conversational discourse were better able to discriminate the participant groups. A third DFA, with all of the story narrative and conversational discourse measures included, revealed that two conversational measures (comments and adequate plus responses) and one story narrative measure (T-units within episode structure in the generation task) made the greatest contributions to discriminating between the groups. Overall, this DFA correctly classified group membership in 81% of the cases, 84.4% of the CHI group, and 77.5% of the NBI participants. This finding was significant, suggesting that these three discourse measures discriminated the two participant groups with the highest degree of reliability.

Conclusions: These findings did not support the hypothesis that the narrative discourse measures would predict group membership of the CHI and NBI participants more accurately than the conversational measures. A variety of factors may account for these findings, including the interactive nature of conversation as well as social factors, which appear to make this genre more difficult for individuals with CHI and a more sensitive index of their cognitive-communicative impairments.
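
The classification percentages above are hit rates, and the Results section reports the narrative-only DFA as χ²(10) = 14.54, p = .15, "accounting for approximately 20% of the explained variance". The sketch below shows both calculations under stated assumptions: that the reported χ² is Bartlett's approximation for Wilks' lambda, χ² = −(n − 1 − (p + g)/2) ln Λ, with p = 10 predictors (five measures in each of two story tasks) and g = 2 groups, and that n = 75 (the number of cases actually entered is not stated here). For a two-group analysis there is a single discriminant function, and 1 − Λ equals its squared canonical correlation, which is one plausible reading of the 20% figure. The function names and the example labels are hypothetical.

```python
import math
from typing import Dict, Sequence


def hit_rates(true: Sequence[str], predicted: Sequence[str]) -> Dict[str, float]:
    """Percentage of cases correctly classified, overall and within each group."""
    if len(true) != len(predicted) or not true:
        raise ValueError("need two non-empty label sequences of equal length")
    correct = sum(t == p for t, p in zip(true, predicted))
    rates = {"overall": 100.0 * correct / len(true)}
    for group in sorted(set(true)):
        members = [p for t, p in zip(true, predicted) if t == group]
        rates[group] = 100.0 * sum(p == group for p in members) / len(members)
    return rates


def wilks_lambda_from_bartlett(chi2: float, n: int, p: int, g: int) -> float:
    """Invert Bartlett's approximation chi2 = -(n - 1 - (p + g) / 2) * ln(Lambda)."""
    return math.exp(-chi2 / (n - 1 - (p + g) / 2))


# Hypothetical labels, not the study's classifications.
true = ["CHI"] * 4 + ["NBI"] * 6
pred = ["CHI", "CHI", "CHI", "NBI"] + ["NBI"] * 5 + ["CHI"]
print(hit_rates(true, pred))  # overall 80.0, CHI 75.0, NBI about 83.3

# Narrative-only DFA under the assumptions stated above (n = 75 is assumed).
lam = wilks_lambda_from_bartlett(14.54, n=75, p=10, g=2)
print(round(lam, 3), round(1 - lam, 2))  # 0.807 0.19
```

Note that the overall hit rate is the case-weighted average of the group hit rates, not their simple mean, so it leans towards the larger group.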

Address correspondence to: Carl A. Coelho PhD, Communication Sciences Dept, University of Connecticut, Unit 1085, Storrs, Connecticut 06268–1085, USA.
This project was supported by grants from the University of Connecticut Research Foundation and the Hospital for Special Care.
© 2003 Psychology Press Ltd

http://www.tandf.co.uk/journals/pp/02687038.html
DOI: 10.1080/02687030344000111

NARRATIVE AND CONVERSATIONAL DISCOURSE 101

The clinical utility of including discourse analyses in the assessment of cognitive-communicative deficits secondary to closed head injury (CHI) in adults has been documented by a variety of recent investigations (e.g., Coelho, Liles, & Duffy, 1991, 1995; Hartley & Jensen, 1991; Mentis & Prutting, 1987; Snow, Douglas, & Ponsford, 1995, 1997). Although these studies generally agree on the sensitivity of discourse analyses for detecting the often subtle communicative impairments following CHI, there is little consensus regarding discourse elicitation or analysis procedures. Consequently, it has been difficult to compare findings across studies. For example, two recent studies compared the discourse performance of CHI and non-brain-injured (NBI) controls. In the first study (Coelho, 2002), narratives were elicited in two story tasks, retelling and generation, from two groups of adults, 55 CHI and 47 NBI. Narratives were analysed at the levels of sentence production, cohesive adequacy, and story grammar, and discourse performance was then compared across groups and tasks. Two measures distinguished the groups: the CHI participants produced significantly fewer words per T-unit and fewer T-units within episode structure than the NBI group. In addition, significant differences across the story tasks were noted for all five discourse measures (words per T-unit, subordinate clauses per T-unit, cohesive adequacy, number of complete episodes, and proportion of T-units within episode structure). All participants, CHI and NBI, produced longer and more grammatically complex T-units in the story generation task than in story retelling. However, cohesive adequacy and story grammar were better in the story retelling task than in the story generation task. 
In the second study (Coelho, Youse, & Le, 2002), samples of conversation were elicited from 32 individuals with CHI and 43 NBI adults and analysed for various dimensions of appropriateness and topic initiation. The CHI group produced significantly fewer comments (i.e., utterances for which no response was explicitly demanded) than the NBI participants. In addition, the CHI participants produced significantly more adequate plus responses (i.e., providing more information than was requested) than the NBI group. The findings of these studies illustrate some of the difficulties involved in the measurement of narrative ability, chief among them the pragmatic nature of the task. Narrative performance may be influenced by a variety of contextual parameters, such as listener characteristics, elicitation procedures, presentation medium, complexity of content, structural complexity, social function, and manner of textual coherence (Liles, Duffy, Merritt, & Purcell, 1995; Togher, 2001). In addition to contextual influences, the measurement of narrative performance is further complicated by the multiplicity of measures that have been applied. For example, narrative ability may be described in terms of the speaker’s social role, cognitive organisation, linguistic structure of the text, and sentence-level complexity (Liles et al., 1995; Togher, 2001). In an effort to facilitate a movement towards the adoption of a more consistent and clinically efficient methodology for the assessment of discourse abilities, the
current study examined several commonly used measures of discourse performance in story narratives and conversation. Specifically, we were interested in the accuracy with which these measures were able to distinguish individuals with CHI from NBI controls. The measures applied to story narratives included within-sentence analyses such as grammatical complexity and between-sentence analyses such as cohesive adequacy and story grammar. Conversational measures included dimensions of response appropriateness and topic initiation. Discriminant function analyses were then employed to determine which of the selected measures were most effective at classifying large groups of adults with CHI and NBI adults. Ylvisaker, Szekeres, and Feeney (2001) have suggested that discourse proficiency involves an interaction of cognitive and linguistic organisational processes. Story narratives were selected for study because they provide the opportunity to analyse two separate sources of information, related to the cognitive and linguistic levels of narrative organisation. The first level, macro-organisation, relates to story grammar. At this level “information is organised in terms of how intentions and events are logically related in time or through cause–effect relations reflected in general experience” (Liles et al., 1995, p. 423). Because the interpretation of content is facilitated by the speaker’s and listener’s access to general cognitive schemata, this level of narrative organisation is hypothesised to go beyond the content of a specific text (Mandler, 1982). The second organisational level, micro-organisation, involves the linguistic organisation of the text, both within and across sentence boundaries. At this level the text is processed as a closed unit (Liles et al., 1995). The complexities of conversational discourse have been well described in a number of studies (Doyle, Goda, & Spencer, 1995; Mackenzie, 2000; Togher, 2001; Togher, Hand, & Code, 1999; Wilkinson, 1999). 
Effective participation in a conversation depends on a variety of factors, such as topic maintenance, turn taking, appropriate referencing, sensitivity to the conversational partner, and general cognitive abilities such as attention, vigilance, and memory. Somewhat contrary to this view, other studies have suggested that conversation is less demanding than narrative discourse, because narratives require greater manipulation of extended units of language, while conversational discourse can be maintained with minimal responses (Chapman, 1997; Galski et al., 1998). In the present study, we hypothesised that the measures of narrative story performance would discriminate the participant groups more accurately than conversational measures. It was predicted that because proficiency in story narrative production is heavily dependent on cognitive and linguistic organisational skills, this discourse genre would be more sensitive to the cognitive-linguistic dysfunction associated with CHI than conversational discourse. It was also hypothesised that entering all of the narrative and conversational discourse measures into a single stepwise discriminant function analysis would increase the accuracy with which the participant groups could be discriminated.

METHOD

Participants

CHI. A total of 32 native speakers of English who had sustained a CHI were studied. Participants were selected because they had recovered a high level of functional language; that is, they had achieved fluent conversation and did not demonstrate any significant deficits on traditional clinical language tests. In addition, participants were recruited to represent a range of socioeconomic backgrounds (see below). All CHI participants met the following criteria: (a) no reported history of substance abuse or psychiatric illness; (b) visual acuity and visual perceptual abilities adequate to distinguish stimulus materials, as determined by screening procedures; (c) hearing acuity adequate to follow directions in each task, as determined by screening procedures; (d) an aphasia quotient (AQ) above 93 on the Western Aphasia Battery (Kertesz, 1982); (e) no significant motor speech disorder, as determined by an experienced speech-language pathologist; (f) a Rancho Los Amigos Level of Cognitive Functioning (Hagen, Malkmus, & Durham, 1980) of VII (automatic-appropriate) or above; (g) a Galveston Orientation and Amnesia Test (Levin, O’Donnell, & Grossman, 1979) score of 75 or above; and (h) a score of 120 or above on the Dementia Rating Scale (Mattis, 1976), a general screen of cognitive processing. The CHI group consisted of 8 females and 24 males ranging in age from 16–69 years (mean = 31.7 years). Four members of this group were African-American and the remainder Caucasian. Years of education for the CHI group ranged from 10–21 (mean = 13.2). The CHI participants were also assigned to one of three socioeconomic groups (Professional, Skilled Worker, or Unskilled Worker) on the basis of the Hollingshead rating (Hollingshead, 1972; see Coelho et al., 2002, for a description). The group consisted of 11 professionals, 10 skilled workers, and 11 unskilled workers. 
All of the CHI participants’ injuries were rated as either moderate (duration of coma less than 6 hours) or severe (duration of coma greater than 6 hours) on the basis of criteria established by Lezak (1995). Time post onset ranged from 1–99 months (mean = 12.8 months).

NBI. A total of 43 hospital employees, working in a variety of capacities, who were native speakers of English made up the NBI group. No individual in this group reported a history of neurologic or psychiatric disease, or substance abuse. NBI participants were also selected on the basis of socioeconomic level, and attempts were made to match these individuals as closely as possible with the CHI participants on the basis of age and gender. The group comprised 30 males and 13 females, aged 16–63 years (mean = 31.9 years). Two individuals in this group were African-American and 41 were Caucasian. Level of education ranged from 11–24 years (mean = 15.3). With regard to socioeconomic status, the NBI group consisted of 15 professionals, 10 skilled workers, and 18 unskilled workers.
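
The CHI inclusion criteria amount to a conjunction of thresholds, and the severity rating is a single cut-off on coma duration. The sketch below only encodes the cut-offs stated above; the `Candidate` fields and function names are hypothetical, the clinical judgements behind criteria such as (b), (c), and (e) are taken as given booleans, and because the text does not say how a coma of exactly 6 hours was rated, that case is returned as unspecified.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Screening results for one person with CHI (hypothetical field names)."""
    substance_or_psychiatric_history: bool  # criterion (a)
    vision_adequate: bool                   # criterion (b), from screening
    hearing_adequate: bool                  # criterion (c), from screening
    wab_aq: float                           # criterion (d), Western Aphasia Battery AQ
    motor_speech_disorder: bool             # criterion (e), clinician's judgement
    rancho_level: int                       # criterion (f), Roman level as integer (VII -> 7)
    goat_score: float                       # criterion (g), Galveston Orientation and Amnesia Test
    drs_score: float                        # criterion (h), Dementia Rating Scale


def meets_inclusion_criteria(c: Candidate) -> bool:
    """True only if all of criteria (a)-(h) are satisfied."""
    return (
        not c.substance_or_psychiatric_history
        and c.vision_adequate
        and c.hearing_adequate
        and c.wab_aq > 93
        and not c.motor_speech_disorder
        and c.rancho_level >= 7
        and c.goat_score >= 75
        and c.drs_score >= 120
    )


def injury_severity(coma_hours: float) -> str:
    """Moderate if coma lasted less than 6 h, severe if more than 6 h."""
    if coma_hours < 6:
        return "moderate"
    if coma_hours > 6:
        return "severe"
    return "unspecified"  # exactly 6 h is not covered by the stated rule


# Hypothetical candidate, not a study participant.
c = Candidate(False, True, True, 96.4, False, 7, 88, 134)
print(meets_inclusion_criteria(c))  # True
print(injury_severity(30))          # severe
```

Note that criterion (d) is strict ("above 93"), whereas (f), (g), and (h) include their boundary values.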

Discourse elicitation procedures

Two genres of discourse were elicited from all participants: stories, under two conditions (Retelling and Generation), and conversation.

Story retelling task. Subjects were presented with the picture story The Bear and the Fly (Winter, 1976) by filmstrip projector on a 23 cm × 30.5 cm screen. The picture story has 19 frames and no soundtrack. After viewing the filmstrip, the subjects were given the following instruction: “Tell me that story.”

Story generation task. Subjects were presented with a copy of the Norman Rockwell painting The Runaway and given the following instruction: “Tell me a story about what you think is happening in this picture.” The picture remained in view of the examiner and subject until the task was completed.

Conversation. Each participant, CHI and NBI, was brought individually into a quiet room by the examiner, who introduced himself and stated that he was interested in learning more about conversational behaviour. Each participant was then engaged in a 15-minute conversation. The examiner and co-interactor in each of the conversations was a 42-year-old Caucasian male speech-language pathologist with approximately 22 years of education. The examiner was essentially a stranger to all of the participants, CHI and NBI, prior to the conversations. Most conversations were initiated by the examiner with the question “Why are you here at the hospital today?” Each conversation was audiotaped and each recording transcribed verbatim, with each utterance assigned to one of the speakers (examiner or participant).

Data collection

Each story and conversation was audiotaped and later transcribed verbatim. Transcriptions of the stories were segmented into T-units (i.e., an independent clause plus any subordinate clauses associated with it) prior to analysis, following the conventions described by Liles (1985). 
For the conversations, each utterance was assigned to one of the speakers, examiner or participant. Any discourse samples judged to be inconsistent with the intended elicited genre (e.g., production of a narrative description instead of a story, or an extended monologue instead of conversation) were excluded from analysis.

Analyses of story narratives

The narrative discourse analysis procedures, including reliability measures, employed in the present study have been explained in detail elsewhere (see Coelho, 2002). Therefore the analyses are only briefly described below.

Within-sentence. Two measures of sentence production were examined and compared across tasks and groups:
(1) Number of Words per T-unit.
(2) Number of Subordinate Clauses per T-unit: the total number of subordinate clauses in each story was obtained and divided by the total number of T-units. The frequency of subordinate clause use may be considered a measure of the complexity of sentence-level grammar.

Between-sentence. Between-sentence measures included:
(1) Cohesive Adequacy: the measure of cohesive adequacy used in this study was Percent Complete Ties out of Total Ties. Cohesive ties pertain to how meaning is conjoined across sentences. A word is considered to be a cohesive tie if the listener must search outside the sentence for its completed meaning. Three categories of adequacy were used: complete, incomplete, and erroneous.
(2) Story Grammar: two measures of story grammar performance were employed in this study, (a) Number of Total Episodes, the number of complete and incomplete episodes, considered to be a measure of content organisation; and (b) Proportion of T-units Contained within Episode Structure (T-units in episode structure/total T-units). An episode consists of (a) an initiating event that prompts a character to formulate a goal, (b) an action, and (c) a direct consequence marking attainment or nonattainment of the goal.

Analyses of conversation

The procedures for the analysis of conversation, as well as reliability measures, have been discussed elsewhere (see Coelho et al., 2002) and are summarised below. The middle 6 minutes of each conversation were analysed. Two categories of analyses were applied to each transcribed conversation: Appropriateness (Blank & Franklin, 1980) and Topic Initiation (Brinton & Fujiki, 1989). The number of conversational turns was also tallied.

Appropriateness. Within the category of Appropriateness, each utterance was categorised as either a Speaker-Initiation or a Speaker-Response.

Speaker-initiations. These were classified as Obliges (utterances containing explicit requirements for a response from the listener) or Comments (utterances not containing an explicit demand for a response). The total numbers of Obliges and Comments produced by a subject or the examiner over the course of each conversation were tallied.

Speaker-responses. These were classified in terms of adequacy. An Adequate response was one that appropriately met the initiator’s verbalisation. An Adequate Plus response was relevant and elaborated the theme, providing more information than was requested. The total numbers of Adequate Plus and Adequate responses produced by each participant in each conversation were tallied.
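
Once a scorer has segmented a story into T-units and coded its ties and episodes, the five story narrative measures described above are simple ratios. The sketch below assumes that coding has already been done by hand; the class and field names are hypothetical, and ties are assumed to be coded exhaustively as complete, incomplete, or erroneous so that their sum is the total number of ties.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class TUnit:
    words: int
    subordinate_clauses: int
    in_episode: bool  # falls within an episode's structure


@dataclass
class StoryCoding:
    """Hand-coded story (hypothetical structure)."""
    t_units: List[TUnit]
    complete_ties: int
    incomplete_ties: int
    erroneous_ties: int
    complete_episodes: int
    incomplete_episodes: int


def narrative_measures(s: StoryCoding) -> Dict[str, Optional[float]]:
    n = len(s.t_units)
    if n == 0:
        raise ValueError("story contains no T-units")
    total_ties = s.complete_ties + s.incomplete_ties + s.erroneous_ties
    return {
        # Within-sentence measures
        "words_per_t_unit": sum(t.words for t in s.t_units) / n,
        "subordinate_clauses_per_t_unit": sum(t.subordinate_clauses for t in s.t_units) / n,
        # Between-sentence measures (cohesion and story grammar)
        "percent_complete_ties": 100.0 * s.complete_ties / total_ties if total_ties else None,
        "total_episodes": float(s.complete_episodes + s.incomplete_episodes),
        "proportion_t_units_in_episodes": sum(t.in_episode for t in s.t_units) / n,
    }


# Hypothetical four-T-unit story, not data from the study.
story = StoryCoding(
    t_units=[TUnit(8, 1, True), TUnit(6, 0, True), TUnit(10, 2, False), TUnit(4, 0, True)],
    complete_ties=9, incomplete_ties=2, erroneous_ties=1,
    complete_episodes=1, incomplete_episodes=1,
)
print(narrative_measures(story))
```

For this example the measures are 7.0 words and 0.75 subordinate clauses per T-unit, 75% complete ties, 2 episodes, and 0.75 of T-units within episode structure.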

106 APHASIOLOGY

Topic initiation. Either a participant or the examiner could introduce topics. Topics could be introduced or changed in one of three ways: (a) at the beginning of the conversation, or by ending discussion of one topic and initiating another, referred to as a Novel Introduction; (b) by means of a Smooth Shift, in which discussion of one topic is subtly switched to another; or (c) by means of a Disruptive Shift, in which discussion of one topic is abruptly or illogically switched to another topic. The total numbers of Novel Introductions and shifts (Smooth, Disruptive) produced by a participant over the course of each conversation were tallied. Turns. An utterance was defined as an oral statement or response.

Reliability of discourse measures

Inter- and intra-examiner reliability scores for all of the discourse measures described in the present paper have been reported elsewhere (Coelho, 2002; Coelho et al., 2002) and are therefore only summarised here. For the measures of story narrative ability, inter- and intra-examiner reliability scores ranged from 90% to 98%. Reliability scores for the conversation measures ranged from 80% to 99%.
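The five story narrative measures defined at the start of this section are simple ratios and sums of hand-coded counts. The Python sketch below shows the arithmetic for one story; the `StoryCounts` fields are hypothetical names, the counts in the example are invented for illustration, and the dictionary keys reuse the abbreviations from Table 2 (WDSTU, SUBT, COMTPC, EPTOT, TUEPTR). Total ties are taken as complete + incomplete + erroneous, following the three adequacy categories described above.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class StoryCounts:
    """Hand-coded counts for one story (field names are illustrative)."""
    words: int
    t_units: int
    subordinate_clauses: int
    complete_ties: int
    incomplete_ties: int
    erroneous_ties: int
    complete_episodes: int
    incomplete_episodes: int
    t_units_in_episodes: int


def narrative_measures(c: StoryCounts) -> Dict[str, float]:
    """Five story narrative measures, keyed by the abbreviations used in Table 2."""
    if c.t_units <= 0:
        raise ValueError("a story must contain at least one T-unit")
    total_ties = c.complete_ties + c.incomplete_ties + c.erroneous_ties
    return {
        # Sentence-level (micro-organisation) measures
        "WDSTU": c.words / c.t_units,
        "SUBT": c.subordinate_clauses / c.t_units,
        # Between-sentence (macro-organisation) measures; COMTPC is undefined without ties
        "COMTPC": 100 * c.complete_ties / total_ties if total_ties else float("nan"),
        "EPTOT": float(c.complete_episodes + c.incomplete_episodes),
        "TUEPTR": c.t_units_in_episodes / c.t_units,
    }


# Hypothetical story: 120 words in 15 T-units, 24 cohesive ties, 3 episodes.
example = StoryCounts(words=120, t_units=15, subordinate_clauses=3,
                      complete_ties=18, incomplete_ties=4, erroneous_ties=2,
                      complete_episodes=2, incomplete_episodes=1,
                      t_units_in_episodes=12)
print(narrative_measures(example))
```

For this invented story the sketch yields 8 words per T-unit, 0.2 subordinate clauses per T-unit, 75% complete ties, 3 episodes and 0.8 of T-units within episode structure.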

RESULTS

In the present study, data from 32 CHI and 43 NBI participants from the two previous investigations described in the introduction (Coelho, 2002; Coelho et al., 2002) were reanalysed using discriminant function analyses (DFA). The intent of the present study was to investigate the accuracy with which group membership (CHI versus NBI) could be predicted on the basis of discourse performance. The measures entered into the DFAs included measures of both story narrative and conversational discourse. Results from each of these DFAs are discussed below.

Story narrative measures

Five measures that sampled aspects of micro-organisation (i.e., words per T-unit and subordinate clauses per T-unit) and macro-organisation (i.e., percentage of complete cohesive ties to total cohesive ties, total episodes, and proportion of T-units within episode structure) in story retelling and story generation tasks were entered into the DFA for narrative discourse. The DFA correctly classified 70% of the cases (64.5% of the CHI group and 74.4% of the NBI group; see Table 1), χ²(10) = 14.54, p = .15. This finding was not significant; the function accounted for approximately 20% of the variance, suggesting that the story narrative measures did not reliably discriminate the CHI from the NBI participants. Of the story narrative measures, the proportion of T-units within episode structure and

NARRATIVE AND CONVERSATIONAL DISCOURSE 107

words per T-unit, both from the story generation task, had the highest correlations with the discriminant function, .54 and .49, respectively (see Table 2).

Conversation measures

Seven measures of conversational performance (i.e., numbers of obliges, comments, adequate and adequate plus responses, novel topic introductions, smooth topic shifts, and turns) were included in the DFA for conversation. Of these measures, the numbers of comments and adequate plus responses had the highest correlations with the discriminant function, –.91 and .67, respectively (see Table 3). This DFA correctly classified over 77% of the cases (87.5% of the CHI participants and 69.8% of the NBI group; see Table 4), χ²(7) = 25.04, p = .001. This finding was significant, accounting for approximately 30% of the variance, which suggests that the measures of conversational discourse were better able to discriminate the participant groups.

TABLE 1 Classification results from discriminant function analysis of story narrative measures

                  Predicted group membership
Actual group      CHI           NBI           Total
CHI               20 (64.5%)    11 (35.5%)    31 (100.0%)
NBI               10 (25.6%)    29 (74.4%)    39 (100.0%)

70.0% of original grouped cases correctly classified.

TABLE 2 Correlations between the story narrative measures and the discriminant function

Measure           Correlation
GENER-TUEPTR       .54
GENER-WDSTU        .49
RETELL-TUEPTR      .42
RETELL-SUBT        .39
GENER-COMTPC       .29
RETELL-COMTPC      .26
RETELL-WDSTU       .23
GENER-SUBT         .22
GENER-EPTOT       –.15
RETELL-EPTOT      –.03

The measures with the highest correlations contribute the most to discriminating between the groups. GENER = story generation task; RETELL = story retelling task; WDSTU = words per T-unit; SUBT = subordinate clauses per T-unit; COMTPC = percent complete ties out of total ties; EPTOT = number of total episodes; TUEPTR = proportion of T-units within episode structure.
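The percentages in the classification tables follow directly from their cell counts: each group's hit rate is its correctly classified cases divided by its row total, and the overall rate pools the diagonal over all cases. For example, 20 + 29 = 49 of the 70 cases in Table 1 fall on the diagonal, giving 70.0%. The short Python sketch below recomputes the group and overall rates from the counts in Tables 1 and 4; the nested-dictionary layout (actual group, then predicted group) is only an illustrative choice.

```python
from typing import Dict

# Rows are actual groups, inner keys are predicted groups (counts from Tables 1 and 4).
ConfusionTable = Dict[str, Dict[str, int]]

TABLE_1_NARRATIVE: ConfusionTable = {
    "CHI": {"CHI": 20, "NBI": 11},
    "NBI": {"CHI": 10, "NBI": 29},
}
TABLE_4_CONVERSATION: ConfusionTable = {
    "CHI": {"CHI": 28, "NBI": 4},
    "NBI": {"CHI": 13, "NBI": 30},
}


def group_hit_rates(table: ConfusionTable) -> Dict[str, float]:
    """Percentage of each actual group that was classified into that same group."""
    return {group: 100 * row[group] / sum(row.values()) for group, row in table.items()}


def overall_hit_rate(table: ConfusionTable) -> float:
    """Percentage of all cases lying on the diagonal of the classification table."""
    correct = sum(row[group] for group, row in table.items())
    total = sum(sum(row.values()) for row in table.values())
    return 100 * correct / total


for name, table in [("Table 1", TABLE_1_NARRATIVE), ("Table 4", TABLE_4_CONVERSATION)]:
    rates = {g: round(r, 1) for g, r in group_hit_rates(table).items()}
    print(name, rates, round(overall_hit_rate(table), 1))
```

Rounded to one decimal, this reproduces the 64.5%, 74.4% and 70.0% of Table 1 and the 87.5%, 69.8% and 77.3% of Table 4.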

Story narrative and conversation measures

In an effort to determine whether group classification could be improved by including all 17 measures of both the story narrative and conversational discourse, a stepwise DFA was performed. In this procedure the measure providing the best discrimination is entered first; then, from the remaining 16 measures, the measure that adds the most to discriminating between the groups is added to the first selected measure. This procedure continues until no remaining measure, when added, significantly increases the capacity to discriminate beyond the measures entered in previous steps. Results from the stepwise DFA revealed that the conversational measures comments and adequate plus responses and the story narrative measure T-units within episode structure in the generation task made the greatest contributions to discriminating between the groups (see Table 5). The combination of just these three measures discriminated the groups as well as any other combination of the 17 story narrative and conversational measures. Overall, group membership was correctly classified by the DFA in 81% of the cases (84.4% of the CHI group and 77.5% of the NBI participants; see Table 6), χ²(3) = 32.23, p < .001. This finding was significant and accounted for over 37% of the variance, suggesting that these three discourse measures discriminate the participant groups with the highest degree of reliability. However, it was the conversational measures (i.e., comments and adequate plus responses) that had the largest correlations with the discriminant function, .79 and –.55, versus the story narrative measure (i.e., T-units within episode structure in the generation task) with a correlation of .40.

TABLE 3 Correlations between the conversation measures and the discriminant function

Measure                      Correlation
COMMENTS                     –.91
ADEQUATE PLUS RESPONSES       .67
OBLIGES                      –.28
ADEQUATE RESPONSES            .23
NOVEL TOPIC INTRODUCTIONS    –.19
TURNS                        –.09
SMOOTH TOPIC SHIFTS          –.04

The measures with the highest correlations contribute the most to discriminating between the groups.

TABLE 4 Classification results from discriminant function analysis with conversation measures

                  Predicted group membership
Actual group      CHI           NBI           Total
CHI               28 (87.5%)    4 (12.5%)     32 (100.0%)
NBI               13 (30.2%)    30 (69.8%)    43 (100.0%)

77.3% of original grouped cases correctly classified.

DISCUSSION

Prior to discussing the findings of this study, it is important to acknowledge a limitation in the procedures employed for data analysis. If one estimates the discriminant functions that may best predict group membership from a given data set, one should not then use the same data set, as was done in this study, to judge the accuracy of the prediction. Validation of a predicted discriminant function requires testing the function on another data sample, thereby reducing the effect of chance on the predictive process. Replication of the present study is needed, and the results reported here should be interpreted cautiously. Results of the DFAs run with the discourse data from the CHI and NBI participants indicate that the conversational measures were more accurate in discriminating the groups. These findings did not support the first hypothesis, which predicted that the narrative discourse measures would predict group membership of the CHI and NBI participants more accurately than the conversational measures. A variety of explanations may account for these findings.

TABLE 5 Correlations between selected story narrative and conversation measures and the discriminant function

Measure                      Correlation
COMMENTS                      .79
ADEQUATE PLUS RESPONSES      –.55
GENER-TUEPTR                  .40

GENER = story generation task; TUEPTR = proportion of T-units within episode structure.
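The forward stepwise procedure described in the Results (enter the best single discriminator, then repeatedly add whichever remaining measure improves discrimination most, and stop when no addition helps significantly) can be sketched in standard-library Python. This is a hedged illustration, not the software or settings the authors used: it assumes Wilks' lambda (det of the within-groups SSCP matrix over det of the total SSCP matrix) as the selection statistic and a partial F-to-enter threshold of 3.84, neither of which the paper reports. The data are hypothetical; the first of three measures separates the two groups most strongly and is therefore entered first.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def determinant(m: Matrix) -> float:
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in m]
    n, det = len(a), 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det


def scatter(rows: Sequence[Sequence[float]], cols: List[int]) -> Matrix:
    """Sums of squares and cross-products about the mean of `rows`, for `cols` only."""
    k = len(cols)
    means = [sum(r[c] for r in rows) / len(rows) for c in cols]
    s = [[0.0] * k for _ in range(k)]
    for r in rows:
        d = [r[c] - means[i] for i, c in enumerate(cols)]
        for i in range(k):
            for j in range(k):
                s[i][j] += d[i] * d[j]
    return s


def wilks_lambda(X, y, cols: List[int]) -> float:
    """Wilks' lambda = det(within-groups SSCP) / det(total SSCP)."""
    k = len(cols)
    within = [[0.0] * k for _ in range(k)]
    for g in set(y):
        sg = scatter([x for x, label in zip(X, y) if label == g], cols)
        within = [[within[i][j] + sg[i][j] for j in range(k)] for i in range(k)]
    return determinant(within) / determinant(scatter(X, cols))


def forward_stepwise(X, y, f_to_enter: float = 3.84) -> Tuple[List[int], float]:
    """Enter, one at a time, the measure that most lowers Wilks' lambda."""
    n, g = len(X), len(set(y))
    selected: List[int] = []
    lam = 1.0
    remaining = list(range(len(X[0])))
    while remaining:
        p = len(selected)  # measures already in the function
        best = min(remaining, key=lambda c: wilks_lambda(X, y, selected + [c]))
        lam_new = wilks_lambda(X, y, selected + [best])
        if lam_new <= 0.0:  # perfect separation: nothing further can be gained
            selected.append(best)
            lam = 0.0
            break
        f = (n - g - p) / (g - 1) * (lam / lam_new - 1.0)  # partial F-to-enter
        if f < f_to_enter:
            break
        selected.append(best)
        remaining.remove(best)
        lam = lam_new
    return selected, lam


# Hypothetical scores: rows are participants, columns are three discourse measures.
X = [
    [1.0, 5.0, 2.0], [2.0, 3.0, 2.5], [1.5, 4.0, 1.5], [1.2, 6.0, 2.2],  # group 0
    [4.0, 5.5, 2.1], [5.0, 3.5, 1.9], [4.5, 4.5, 2.4], [4.8, 5.0, 1.6],  # group 1
]
y = [0, 0, 0, 0, 1, 1, 1, 1]
chosen, final_lambda = forward_stepwise(X, y)
print("entered in order:", chosen, "Wilks' lambda:", round(final_lambda, 4))
```

Because every step re-evaluates each candidate jointly with the measures already entered, a measure that is weak on its own can still enter later if it sharpens the combination, which is one reason stepwise results should be cross-validated on new data.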

TABLE 6 Classification results from discriminant function analysis with selected story narrative and conversation measures

                  Predicted group membership
Actual group      CHI           NBI           Total
CHI               27 (84.4%)    5 (15.6%)     32 (100.0%)
NBI               9 (22.5%)     31 (77.5%)    40 (100.0%)

80.6% of original grouped cases correctly classified.
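The Results give a variance percentage for each DFA without saying how it was computed. One reading, assumed here rather than stated in the paper, is that each χ² tests Wilks' Λ through Bartlett's approximation χ² = −[N − 1 − (p + g)/2] ln Λ with df = p(g − 1), where N is the number of classified cases (taken from the totals in Tables 1, 4 and 6), p the number of measures and g = 2 groups. For two groups 1 − Λ equals the squared canonical correlation. The Python sketch below inverts that formula; it gives 1 − Λ of roughly .21, .30 and .38, in line with the reported "approximately 20%", "approximately 30%" and "over 37%", and the reported degrees of freedom (10, 7 and 3) match p(g − 1).

```python
from math import exp


def wilks_from_bartlett(chi_square: float, n: int, p: int, g: int = 2) -> float:
    """Invert Bartlett's chi-square, -(n - 1 - (p + g)/2) * ln(lambda), for Wilks' lambda."""
    factor = n - 1 - (p + g) / 2
    return exp(-chi_square / factor)


# (chi-square, N classified, number of measures) as reported in the Results and tables.
REPORTED = {
    "story narrative DFA": (14.54, 70, 10),
    "conversation DFA": (25.04, 75, 7),
    "stepwise DFA": (32.23, 72, 3),
}

for name, (chi2, n, p) in REPORTED.items():
    lam = wilks_from_bartlett(chi2, n, p)
    # With two groups, 1 - lambda is the squared canonical correlation.
    print(f"{name}: df = {p * (2 - 1)}, lambda = {lam:.3f}, 1 - lambda = {1 - lam:.3f}")
```

This is only a consistency check on the published figures; it does not address the validation limitation discussed above.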

Galski and colleagues (1998) have commented that the success of an individual’s social, vocational, familial, and academic integration rests on the recovery of effective communication. Although previous research has demonstrated that individuals with CHI have difficulty with many narrative discourse tasks (see Coelho, 1995), it may be that, because of its interactive nature, conversation is a more difficult discourse genre for this population. Consistent with this explanation, it has been reported that individuals with CHI produced more discourse errors in conversation than in a structured referential communication task. This may be attributed to social aspects, such as the relationship between conversational partners (that is, familiarity, status, and role), as well as the face-saving strategies used for politeness when communication breakdowns occur (Prince, Haynes, & Haak, 2002). Such factors are extremely difficult to simulate in other, noninteractive types of discourse. A second explanation pertains to the stylistic variation that can exist among speakers within a specific genre. In other words, speakers may achieve the same text macrostructure through many different patterns of microstructure (Armstrong, 2002). Consequently, such variation in NBI speakers is important to note when making judgements regarding what is “normal” or what is “disordered” in the discourse of individuals with CHI. For example, in the present study over 25% of the NBI participants were classified as CHI on the story narrative tasks, and that figure rose to over 30% in conversation. An additional explanation pertains to the cognitive factors that have been suggested to be important for meaningful participation in conversation. For example, topic maintenance and appropriate referencing require both selective and sustained attention. Further, functional memory is required to recall what has been said by both the speaker and the listener (Mackenzie, 2000).
Similarly, comprehension of sarcasm and implicit language may also influence the effectiveness of a conversational participant. Individuals with CHI would be at risk for demonstrating difficulty with any or all of these factors. Finally, it is important to emphasise that although all of the CHI participants studied had suffered moderate to severe injuries, they were selected on the basis of having recovered fluent conversational speech. Therefore the present findings may not be applicable to all individuals with CHI, particularly those with limited discourse production capabilities. Although the DFAs reported in the present study involved a variety of discourse measures, discriminant functions derived
from different measures of narrative and conversational discourse may have yielded different results. A related issue pertains to the potential effects of the measures and their interactions with the targeted discourse genres. For example, the conversational measures are considered to be pragmatic in nature, while the story narrative measures involve various aspects of cognitive-linguistic organisation. A reasonable explanation for the present findings would be that pragmatic measures are more sensitive to the communicative dysfunction displayed by individuals with CHI than the more structurally focused narrative measures. The findings from the present study did support the second hypothesis, which stated that if all of the discourse measures were entered into a DFA, the CHI and NBI participants would be discriminated with a higher degree of accuracy than with the conversational or story narrative measures alone. Previous investigations of the discourse of individuals with CHI have documented an array of impairments across discourse genres analysed at varied levels. Given the broad array of cognitive, linguistic, and psychosocial sequelae that characterise CHI, a single measure is unlikely to delineate the nature of discourse impairment secondary to CHI. Therefore, it is not surprising that, as noted in the present study, a variety of discourse measures more accurately discriminated the CHI and NBI participant groups. The study of discourse following brain injury requires the use of multiple and varied elicitation tasks and measures. Regarding implications of these findings, it has been observed that discourse represents a critical point of intersection between cognition and language, and is therefore an important component in the management of individuals with CHI (Ylvisaker et al., 2001). The present findings suggest that conversation may be more sensitive than story narratives to the discourse impairments that characterise individuals with CHI.
Nonetheless, ongoing research is needed to develop discourse procedures that are not only sensitive to subtle impairments but also clinically efficient.

REFERENCES

Armstrong, E. (2002). Variation in the discourse of non-brain-damaged speakers on a clinical task. Aphasiology, 16, 647–658.
Blank, M., & Franklin, E. (1980). Dialogue with pre-schoolers: A cognitively-based system of assessment. Applied Psycholinguistics, 1, 127–150.
Brinton, B., & Fujiki, M. (1989). Conversational management with language-impaired children. Rockville, MD: Aspen.
Chapman, S. B. (1997). Cognitive-communication abilities in children with closed head injury. American Journal of Speech-Language Pathology, 6, 50–58.
Coelho, C. A. (2002). Story narratives of adults with closed head injury and non-brain-injured adults: Influence of socioeconomic status, elicitation task, and executive functioning. Journal of Speech, Language, and Hearing Research, 45, 1232–1248.
Coelho, C. A. (1995). Discourse production deficits following traumatic brain injury: A critical review of the recent literature. Aphasiology, 9, 409–429.
Coelho, C. A., Liles, B. Z., & Duffy, R. J. (1991). Discourse analyses with closed head injured adults: Evidence for differing patterns of deficits. Archives of Physical Medicine and Rehabilitation, 72, 465–468.
Coelho, C. A., Liles, B. Z., & Duffy, R. J. (1995). Impairments of discourse abilities and executive functions in traumatically brain-injured adults. Brain Injury, 9, 471–477.
Coelho, C. A., Youse, K. M., & Le, K. N. (2002). Conversational discourse in closed-head-injured and non-brain-injured adults. Aphasiology, 16, 659–672.
Doyle, P. J., Goda, A. J., & Spencer, K. A. (1995). The communicative informativeness and efficiency of connected discourse by adults with aphasia under structured and conversational sampling conditions. American Journal of Speech-Language Pathology, 4, 130–134.
Galski, T., Tompkins, C., & Johnston, M. V. (1998). Competence in discourse as a measure of social integration and quality of life in persons with traumatic brain injury. Brain Injury, 12, 769–782.
Hagen, C., Malkmus, D., & Durham, P. (1980). Levels of cognitive functioning. In Rehabilitation of the head injured adult: Comprehensive physical management. Downey, CA: Professional Staff Association of Rancho Los Amigos Hospital.
Hartley, L. L., & Jensen, P. (1991). Narrative and procedural discourse after closed head injury. Brain Injury, 5, 267–285.
Hollingshead, A. (1972). Four factor index of social status. Unpublished manuscript, Yale University, New Haven, CT.
Kertesz, A. (1982). Western Aphasia Battery. New York: Grune & Stratton.
Levin, H. S., O’Donnell, V. M., & Grossman, R. G. (1979). The Galveston orientation and amnesia test: A practical scale to assess cognition after head injury. Journal of Nervous and Mental Disease, 167, 675–684.
Lezak, M. (1995). Neuropsychological assessment (3rd ed.). New York: Oxford University Press.
Liles, B. Z. (1985). Narrative ability in normal and language disordered children. Journal of Speech and Hearing Research, 28, 123–133.
Liles, B. Z., Duffy, R. J., Merritt, D. D., & Purcell, S. L. (1995). Measurement of narrative discourse ability in children with language disorders. Journal of Speech, Language, and Hearing Research, 38(2), 415–425.
Mackenzie, C. (2000). Adult spoken discourse: The influences of age and education. International Journal of Language and Communication Disorders, 35(2), 269–285.
Mandler, J. M. (1982). An analysis of story grammars. In F. Klix, J. Hoffman, & E. van der Meer (Eds.), Cognitive Psychology, 9, 111–115.
Mattis, S. (1976). Mental status examination for organic mental syndrome in the elderly patient. In L. Bellak & T. B. Karasu (Eds.), Geriatric psychiatry. New York: Grune & Stratton.
Mentis, M., & Prutting, C. A. (1987). Cohesion in the discourse of normal and head-injured adults. Journal of Speech and Hearing Research, 30, 583–595.
Prince, S., Haynes, W. O., & Haak, N. J. (2002). Occurrence of contingent queries and discourse errors in referential communication and conversational tasks: A study of college students with closed head injury. Journal of Medical Speech-Language Pathology, 10, 19–39.
Snow, P., Douglas, J., & Ponsford, J. (1995). Discourse assessment following traumatic brain injury: A pilot study examining some demographic and methodological issues. Aphasiology, 9, 365–380.
Snow, P., Douglas, J., & Ponsford, J. (1997). Procedural discourse following traumatic brain injury. Aphasiology, 11, 947–967.
Togher, L. (2001). Discourse sampling in the 21st century. Journal of Communication Disorders, 34, 131–150.
Togher, L., Hand, L., & Code, C. (1999). Exchanges of information in the talk of people with traumatic brain injury. In S. McDonald, L. Togher, & C. Code (Eds.), Communication skills following traumatic brain injury (pp. 113–145). Hove, UK: Psychology Press.
Wilkinson, R. (1999). Sequentiality as a problem and resource for intersubjectivity in aphasic conversation: Analysis and implications for therapy. Aphasiology, 13, 327–343.
Winter, P. (1976). The bear and the fly. New York: Crown Publishers.
Ylvisaker, M., Szekeres, S. F., & Feeney, T. (2001). Communication disorders associated with traumatic brain injury. In R. Chapey (Ed.), Language intervention strategies in aphasia and related neurogenic communication disorders (pp. 745–800). Philadelphia: Lippincott, Williams & Wilkins.

Relationship between discourse and Western Aphasia Battery performance in African Americans with aphasia

Hanna K. Ulatowska and Gloria Streit Olness
University of Texas at Dallas, USA

Robert T. Wertz
Department of Veterans Affairs Tennessee Valley Healthcare System and Vanderbilt University School of Medicine, Tennessee, USA

Agnes M. Samson, Molly W. Keebler, and Karen E. Goins
University of Texas at Dallas, USA

Background: There is a need for discourse research with African Americans who have aphasia, highlighted by ethnic group differences in stroke prevalence and potential ethnic group differences in dialect. Identification of ethnic dialect is critical to differentiate communication changes associated with pathology from normal communicative differences associated with ethnicity. Also, preliminary research on adults with aphasia indicates an uncertain relationship between discourse performance and standardised test performance.

Aims: This study was designed to assess: (1) the relationship between performance on a standardised language measure and discourse performance, and (2) the use of ethnic dialect and discourse features, in the narrative productions of African-American adults with moderate aphasia on a variety of discourse tasks.

Methods & Procedures: We investigated the discourse of 12 African Americans with scores in the moderate severity range on the Western Aphasia Battery, Aphasia Quotient (WAB-AQ). Each subject produced a fable retell, a story derived from a picture sequence, two stories derived from single pictures, and a topic-elicited personal narrative of a frightening experience. Analysis consisted of ratings of discourse quality (coherence, reference, and emplotment); a measure of discourse quantity (number of propositions); and a tally of the presence or absence of ethnic dialect and discourse features.

Outcomes & Results: The correlation between WAB-AQ and discourse quality was statistically significant on the picture sequence

DISCOURSE AND THE WESTERN APHASIA BATTERY 115

task and one single-picture task, but not on the other discourse tasks. There was a significant relationship between WAB-AQ and overall quality ratings of coherence, reference, and emplotment. The correlation between WAB-AQ and discourse quantity was not significant for any task, and discourse quality was not significantly correlated with discourse quantity. Ethnic features appeared most often on one single-picture task and the personal narrative. No ethnic dialect features occurred on the fable retell.

Conclusions: These findings suggest the need to supplement standardised assessment of aphasia with assessment of discourse performance, using less structured discourse tasks such as a personal narrative task. Less structured discourse tasks may also be optimal for eliciting natural ethnic patterns of communication. The lack of relationship between narrative quantity and narrative quality may not generalise to individuals with severe or mild aphasia. This study contributes towards the development of a discourse assessment tool for culturally and linguistically diverse populations that may supplement information provided by standardised testing.

Several factors point to the need for discourse research with African Americans who have aphasia. The incidence of stroke, and hence the probability of aphasia, is higher in African Americans than in Caucasians (Kittner, White, Losonczy, Wolf, & Hebel, 1990). Moreover, many African Americans are speakers of a distinct ethnic dialect. While ethnicity does not determine ethnic dialect use, previous research has confirmed its presence in some African Americans with aphasia on certain tasks (Ulatowska & Olness, 2001). Identification of ethnic dialect is critical to differentiate communication change associated with pathology from normal communicative differences associated with ethnicity (Wolfram, 1992), especially when surface features of the ethnic dialect overlap

Address correspondence to: Hanna K. Ulatowska, UTD/Callier Center for Communication Disorders, 1966 Inwood Road, Dallas, TX, 75235, USA. Email: [email protected] Agnes M. Samson is now at Integrated Health Services (IHS) in Richardson, Texas. Molly W. Keebler is now at The Center for Brain Health, University of Texas at Dallas. This research was supported by the Department of Veterans Affairs Rehabilitation Research and Development Service, and Excellence in Education Funds from the Callier Center for Communication Disorders, University of Texas at Dallas. © 2003 Psychology Press Ltd

http://www.tandf.co.uk/journals/pp/02687038.html DOI:10.1080/02687030344000102

with features of aphasia. Unfortunately, clinical discourse research has traditionally excluded African Americans or, when they are included, has not differentiated subjects according to ethnicity (Ulatowska & Chapman, 1994; Wallace, 1996). Aphasia assessment for any ethnic group requires procedures that reflect an individual’s level of impairment as well as functional communication ability. Impairment measures, such as the Western Aphasia Battery (WAB; Kertesz, 1982), provide profiles of basic language skills. In contrast, discourse measures are thought to be a reflection of daily, functional language (Chapman, Highley, & Thompson, 1998; Holland, 1983; Ulatowska & Chapman, 1994). Because discourse tasks require skills close to those required in daily life, they often display ethnic styles of communication (Ulatowska & Olness, 2001; Ulatowska, Olness, Hill, Roberts, & Keebler, 2000). The relationship between discourse performance and performance on the lexico-syntactic skills addressed by impairment measures is not well understood (Ulatowska & Olness, 2000). This study assesses the extent to which various discourse tasks provide information that may either duplicate or supplement information gained from standardised aphasia tests.

The first purpose of this study was to examine the relationship between discourse performance and standardised test performance in African Americans with moderate aphasia. The second purpose was to determine whether ethnic features of dialect and discourse (i.e., reflections of natural, functional language) are present in the discourse of African Americans with moderate aphasia and whether their presence differs among discourse tasks. The standardised test used for this study was the Western Aphasia Battery (WAB; Kertesz, 1982), and the discourse tasks were part of a larger seven-task battery used to study narratives of African Americans and Caucasians with and without aphasia (Wertz et al., 2000).
Research on adults with aphasia indicates that discourse performance and standardised test performance are not significantly related for some discourse tasks but are significantly related for others. Two studies have examined the relationships between discourse performance and standardised test performance among African Americans with relatively mild aphasia. The first of these (Ulatowska et al., 2001a) found no significant correlation between performance on a personal narrative of a frightening experience and performance on the Western Aphasia Battery, Aphasia Quotient (WAB-AQ). Neither the quantity of language in the narrative task (measured in propositions) nor the quality of the narrative (measured on a narrative quality scale) was correlated with WAB-AQ scores. A second study (Ulatowska et al., 2001b) examined the relationship of WAB-AQ with (1) interpretations of “mini-narratives” (in this case, interpretations of proverbs); and (2) the ability to comprehend and express the lesson derived from didactic narratives (in this case, fables). WAB-AQ was found to correlate with the overall generalisation level, accuracy, and completeness of spontaneous interpretations of proverbs, and with the generalisation level of lessons derived from an auditory fable (Ulatowska et al., 2001b). The same study did not find significant
correlations between WAB-AQ and either (1) a lesson derived from a picture-sequence fable or (2) multiple-choice proverb interpretation. A recent case series study (Ulatowska, Olness, Hill, Samson, & Goins, 2002) suggests the importance of studying the discourse and standardised test performance of individuals with moderate aphasia, in contrast to the previous studies focusing on mild aphasia. Subjects for this study of individuals with moderate aphasia were drawn from the larger comparative study previously cited (Wertz et al., 2000). Individuals with aphasia in the moderate severity range, i.e., with similar aphasia severity as measured by the WAB, were found to vary widely in their ability to produce stories that were coherent, referentially clear, and contained all elements of the story or scenario. Individuals with moderate aphasia appear to be an informative group for study because they have the ability to produce discourse-length responses (unlike individuals with severe aphasia), yet display various patterns of discourse disruption (unlike many individuals with mild aphasia). This study was designed to answer the following questions: (1) What is the relationship between performance on a standardised language measure and discourse performance on a variety of discourse tasks in African-American adults with moderate aphasia? (2) Are ethnic dialect and discourse features present in the discourse of African-American adults with moderate aphasia, and if present, does their appearance differ among discourse tasks?

METHODS

Subjects

Twelve African-American adults with aphasia subsequent to a left-hemisphere stroke participated in the study. Participants were selected from a larger discourse investigation (Wertz et al., 2000), based on standardised test scores in the moderate aphasia severity range on the Western Aphasia Battery, Aphasia Quotient (WAB-AQ) (Kertesz, 1979, 1982). Table 1 provides participant demographics.
Socioeconomic status (SES) was rated on a 1–7 scale (adapted from Featherman & Stevens, 1980), where a higher number indicates lower SES. All participants were native speakers of English and were raised in the southern United States. Table 2 displays participants’ WAB scores.

Discourse tasks

Five discourse tasks that varied in the type of demand placed on the speaker, amount of information in the stimulus, and stimulus modality (visual or auditory) were presented. All the tasks were designed to elicit narrative discourse, through
selection of stimuli that could be expressed as a sequence of events containing some complicating action or situation. The five tasks, ordered by the degree to which each inherently specified the structure and content of the participants’ response, from most specification to least, were: retell of a fable (“Farmer and Sons”); “tell a story” based on a picture sequence (“Apple Theft”); “tell a story” based on a single picture (“Counting Money”); “tell a story” based on a single picture (“Easter Morning”); and a topic-elicited personal narrative of a frightening experience. Five of the participants were interviewed by an African-American clinician, and seven were interviewed by a Caucasian clinician.

Analysis

Three measures (see Appendix) were used to rate the quality of participants’ discourse: coherence, reference, and emplotment. All three of these dimensions reflect quality of information processing in the narrative and are not necessarily a direct reflection of word- or sentence-level skills. Two of them are basic properties of discourse in general, whereas emplotment is a quality specific to the narrative discourse genre. Coherence is a cognitive-linguistic property of the text, specifying how well the story makes sense as a whole. This core property of narratives differentiates a coherent story from an unrelated sequence of sentences. The coherence rating represented how well information was connected in the stories. Reference signals the elements talked about in the story, such as characters, locations, and time. The reference system appears to be more vulnerable than other discourse systems in aphasia and other language disorders (Chapman & Ulatowska, 1989). The reference rating represented how well information elements were unambiguously signalled in the story. Emplotment is the ability to express information about an event in a narrative structural form, including all elements of the story or scenario.
The emplotment rating represented how well the information in a story formed a complete story. Each aspect of narrative quality (coherence, reference, and emplotment) was rated on a 5-point (0–4) scale, so a total quality score of 12 was possible for each task. Rating systems were used because these fundamental dimensions of narrative quality are not directly related to features of sentential structure, and thus are difficult to associate empirically with any particular linguistic material contained in the discourse (Patry & Nespoulous, 1990). Discourse quantity for each task was measured in number of propositions (Mross, 1990). This measure represented the quantity of information contained in a production, as a complement to the rating scales, which assessed the quality of the information produced.

The presence or absence of ethnic dialect and discourse features (Mufwene, Rickford, Bailey, & Baugh, 1998; Ulatowska & Olness, 2001) on each task was recorded. To identify ethnic dialect features, we focused on the verb system of African-American Vernacular English (AAVE) for three reasons. First, the verb system is one of the more complex and pivotal of the grammatical systems of AAVE (Green, 1998; Wolfram & Fasold, 1974), and is distinctive from other dialects of American English. Second, the verb carries the temporal and aspectual information that forms the backbone of narratives, and narratives were elicited in this study. Third, morpho-syntactic features of the verb are highly prone to disruption in aphasia. The verb system was thus a natural choice of focus for its complexity, its distinctiveness, its importance in narratives, and its ability to reflect aphasic disruptions.

TABLE 1 Demographic and clinical data

DISCOURSE AND THE WESTERN APHASIA BATTERY 119

* Classification is based on performance on the Western Aphasia Battery (Kertesz, 1982).

120 APHASIOLOGY

Example verb forms from the sample included habitual aspect BE (“My father said, ‘Don’t be playing with those guns’.”) and perfective aspect DONE (“He said, ‘It kinda look like she done had a stroke’.”). Discourse features of repetition and direct speech were also identified. Repetition included both partial and full repetitions of previous portions of an utterance, and instances of direct speech were reproductions of the speech produced by characters in a narrative. These features are common in the oral storytelling styles of many African Americans (Mitchell-Kernan, 1972; Ulatowska et al., 2000) and act to highlight information and increase vividness in narratives. Examples are: “I was laying on the hospital, can’t walk, can’t talk, can’t move.… I couldn’t walk, talk, nothing”, and “They said, ‘Oooh, girl, they gon get you tonight, they gon get you’.”

Six raters, including two African-American clinicians, were trained to discriminate the points on the rating scales. The stories were then rated, and disagreements were resolved by group consensus. For each task, the relationship between WAB-AQ and the discourse measures of quality and quantity was determined by computing Spearman and Pearson correlations, respectively. Alpha was adjusted to .01 to control for familywise error.

Reliability

Interrater reliability of the ratings for coherence, reference, and emplotment was analysed by comparing the original group ratings with the ratings of an individual rater on complete data from six of the twelve subjects.
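Point-by-point agreement of this kind is conventionally the number of items on which two sets of ratings match, divided by the number of items compared (agreements plus disagreements), times 100. The sketch below assumes that each rating (for example, one 0–4 coherence rating for one response) counts as one item and that only exact matches count as agreements; the section does not say whether near-misses were credited. The function names are illustrative only.

```python
from itertools import combinations


def percent_agreement(ratings_a, ratings_b):
    """Point-by-point agreement: agreements / (agreements + disagreements) x 100.

    Each position in the two lists is one rated item; only exact matches
    count as agreements (an assumption, not stated in the study).
    """
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("ratings must be non-empty and of equal length")
    agreements = sum(a == b for a, b in zip(ratings_a, ratings_b))
    disagreements = len(ratings_a) - agreements
    return 100.0 * agreements / (agreements + disagreements)


def mean_pairwise_agreement(raters):
    """Average percent agreement over every pair of raters."""
    pairs = list(combinations(raters, 2))
    return sum(percent_agreement(a, b) for a, b in pairs) / len(pairs)


# Hypothetical 0-4 coherence ratings for four responses.
consensus = [4, 3, 2, 4]
single_rater = [4, 3, 3, 4]
print(percent_agreement(consensus, single_rater))  # 75.0
```

The pairwise helper generalises the same formula to several judges; with four judges, `combinations` yields six pairings, the number used for the point-to-point analysis of the story retell study later in this section.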
Point-by-point interrater agreement was 90% for the coherence rating, 70% for the reference rating, and 75% for the emplotment rating. The final rating assigned to each response was that of the original six raters, whose disagreements had been resolved by group consensus.

RESULTS

Quality scores are shown in Table 3. With alpha set at .01 to control familywise error, Spearman correlations between WAB-AQ and discourse quality were significant for the picture sequence, rs(10) = .82, p < .01, and for the single picture “Easter Morning”, rs(10) = .83, p < .01. Correlations between WAB-AQ and discourse quality were nonsignificant at an alpha level of .01 for the other tasks: fable retell, rs(10) = .76; single picture “Counting Money”, rs(10) = .74; and personal narrative, rs(10) = .15.
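Spearman’s rs is the Pearson correlation computed on ranks, with tied values given the average of the ranks they span. The WAB-AQ scores themselves are not reproduced in this section, but Table 3 lists participants in order of decreasing WAB-AQ; the sketch below assumes there are no ties in WAB-AQ and uses that order as the WAB-AQ ranks. Under that assumption the picture-sequence scores give rs of about .82, in line with the value reported above. The same shortcut does not reproduce the other tasks’ coefficients exactly, so treat it as an illustration of the computation rather than a re-analysis.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with this one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # 0-based span -> mean 1-based rank
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman(x, y):
    """Spearman's rs: Pearson correlation of tie-averaged ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Picture-sequence quality scores (Table 3), participants 01-12.
picture_sequence = [9, 10, 7, 10, 10, 8, 9, 6, 6, 4, 3, 4]
# Assumption: participant 01 has the highest WAB-AQ, 12 the lowest, no ties.
wab_aq_rank = list(range(12, 0, -1))

rs = spearman(wab_aq_rank, picture_sequence)
print(round(rs, 2))  # 0.82
```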


Table 4 shows participants’ combined quality ratings (coherence, reference, and emplotment) across the five discourse tasks. The maximum possible score in each cell of this table is 20 (a maximum of 4 points per rating, multiplied by 5 tasks). This score represents an individual’s overall ability to produce quality narratives, since coherence, reference, and emplotment are general qualities of narrative, irrespective of task. Spearman correlations revealed a significant relationship between WAB-AQ and each quality rating: coherence, rs(10) = .91, p < .01; reference, rs(10) = .79, p < .01; and emplotment, rs(10) = .90, p < .01.

Quantity of discourse (number of propositions) is shown in Table 5. With alpha at .01, Pearson correlations revealed no significant relationship between WAB-AQ and quantity of discourse on any task: fable retell, r(10) = –.21; picture sequence, r(10) = .001; single pictures “Counting Money”, r(10) = .11, and “Easter Morning”, r(10) = –.17; and personal narrative, r(10) = –.20. Moreover, a Spearman correlation between participants’ discourse quality ratings and their discourse quantity across tasks was not significant with alpha at .01, rs(58) = .30.

Table 6 shows the presence or absence of ethnic dialect or discourse features across discourse tasks. Ten of the twelve participants displayed at least one ethnic feature on at least one task. Five of these ten were interviewed by an African-American clinician, and five were interviewed by a Caucasian clinician. Presence of ethnic features in subjects’ responses did not differ by ethnicity of the interviewer. Ethnic features appeared most often on one single-picture task (“Easter Morning”) and on the personal narrative. No ethnic dialect features occurred for any participant on the fable retell.

TABLE 3 Quality scores

                                            Participant number*
Discourse task                                01  02  03  04  05  06  07  08  09  10  11  12      M   Range    SD
Farmer & Sons (retell)                        10   7  11   8   3   5   7   5   5   4   6   3   6.17    3–11  2.55
Boys & Apples (picture sequence)               9  10   7  10  10   8   9   6   6   4   3   4   7.17    3–10  2.55
Counting Money (single picture)               12   4   7  11  12   6   3   0   9   8   6   6   7.00    0–12  3.67
Easter Morning (single picture)               12   7   9   9  11   6   9   7  10   4   3   5   7.67    3–12  2.81
Frightening Experience (personal narrative)   11  10   8   9   8   0   5   4   9   8   8  11   7.58    0–11  3.18

Sum of the three quality response scores (Coherence, maximum 4 points; Reference, maximum 4 points; Emplotment, maximum 4 points) for the 12 participants’ responses on five discourse tasks.
* Participants are listed in order by decreasing WAB-AQ score.

TABLE 4 Combined quality ratings

                             Participant number*
Discourse quality measure      01  02  03  04  05  06  07  08  09  10  11  12       M   Range    SD
Coherence                      17  11  13  13  15   8   9   6  12   9   9   8   10.83    6–17  3.24
Reference                      19  16  16  15  14   9  13   9  15  10   8  11   12.92    8–19  3.48
Emplotment                     18  11  13  15  15   8  11   7  12   9   9  10   11.50    7–18  3.26

Sum of each of the discourse quality measures for responses of the 12 participants across five discourse tasks (retell, picture sequence, two single pictures, and personal narrative), with a maximum score of 20 points (4 points per measure across 5 tasks).
* Participants are listed in order by decreasing WAB-AQ score.

TABLE 5 Quantity scores

                                            Participant number*
Discourse task                                01  02  03  04  05  06  07  08  09  10  11  12      M   Range    SD
Farmer & Sons (retell)                         8   8   5   6   3   4   7   7  16   3   5  10   6.83    3–16  3.59
Boys & Apples (picture sequence)              10  15   5  10   7   5   9   4  26   6   5  11   9.42    4–26  6.14
Counting Money (single picture)                9   5   5   4   4   4   4   0   8   4   4   6   4.75     0–9  2.26
Easter Morning (single picture)                4   6   6  10   4   4   4   4   8   5   6   7   5.67    4–10  1.92
Frightening Experience (personal narrative)    5  13   6  29   4   0  13  27  19   6   6  19  12.25    0–29  9.43

Quantity (number of propositions) in discourse responses of 12 African-American adults with aphasia on five discourse tasks.
* Participants are listed in order by decreasing WAB-AQ score.

TABLE 6 Ethnic dialect and discourse features

                                            Participant number*
Discourse task                                  01    02    03    04    05    06    07    08    09    10    11    12  Total +s
Farmer & Sons (retell)                         (–)   (–)   (–)   (–)   (–)   (–)   (–)   (–)   (–)   (–)   (–)   (–)         0
Boys & Apples (picture sequence)               (–)   (–)   (–)   (–)   (–)   (–)   (+)   (+)   (–)   (–)   (+)   (+)         4
Counting Money (single picture)                (+)   (–)   (–)   (–)   (+)   (–)   (–) n.r.*   (–)   (–)   (–)   (+)         3
Easter Morning (single picture)                (+)   (+)   (–)   (–)   (+)   (+)   (+)   (+)   (–)   (–)   (+)   (+)         8
Frightening Experience (personal narrative)    (+)   (–)   (–)   (+)   (–) n.r.*   (–)   (–)   (+)   (–)   (+)   (+)         5

Presence (+) or absence (–) of ethnic dialect and discourse features in responses by 12 African-American adults with moderate aphasia on five discourse tasks.
* n.r. = no response.

DISCUSSION

The current study adds to our knowledge of what appears to be an inconsistent pattern of relationship between performance on language impairment measures and performance on discourse tasks (Ulatowska et al., 2001a, 2001b). Although the findings suggest that overall dimensions of narrative quality (coherence, reference, and emplotment) may be related to performance on the WAB-AQ, this relationship may be task-specific. In particular, the quality of personal narratives does not appear to be related to aphasia severity as measured by the WAB, at least among these individuals with moderate aphasia. Of the tasks in this discourse battery, the personal narrative most closely reflects functional communication, allowing subjects full latitude in task interpretation, story evaluation, and creativity. Overall, this group of findings suggests that less structured discourse tasks and the WAB-AQ assess different linguistic domains. However, confirming these possibilities is beyond the power of correlational analyses. Nevertheless, the absence of significant relationships may argue for supplementing standardised aphasia tests with functional discourse measures in aphasia assessment.

The findings also provide evidence that the more functional and open-ended the discourse task, the more frequently ethnic features of communication are produced. Almost all the participants in this study (10 of 12) displayed ethnic dialect or discourse features on one or more discourse tasks, irrespective of the ethnicity of the interviewer. Such features were most common on one single-picture task (“Easter Morning”) and the personal narrative task, and completely absent on the fable retell.
The tasks that most frequently elicited ethnic features did not require close replication of a stimulus provided by the experimenter. Subjects incorporated more interpretive material in their responses on these tasks, and their personal involvement in creating the response yielded more natural, ethnically marked language. For example, two seemingly equivalent tasks, the single-picture tasks “Counting Money” and “Easter Morning”, differed in how often they elicited ethnic features. This difference may be accounted for, in part, by differences in the pictures’ effectiveness in evoking personal involvement on the part of the responder: “Counting Money” depicts a dated scene of adult characters counting their savings in the form of hard currency, while “Easter Morning” depicts a more plausible scenario from personal life, namely family conflict over church attendance. In contrast, the task that most closely constrained the content of the subjects’ responses and was least likely to draw on the speaker’s personal involvement (i.e., fable retell) did not elicit any ethnic features. The fable used for the fable retell task was presented in Standard American English, and subjects rarely, if ever, incorporated interpretive information in their responses on this task.

In summary, it would appear that more functional tasks, such as a personal narrative task, may supplement language impairment measures, both in the cognitive-linguistic skills they require and in the degree to which they elicit natural ethnic patterns of communication. Because this article addresses the ways in which discourse testing may complement standardised testing, a logical extension of the analysis would be to examine responses to the WAB Picnic Picture for the presence or absence of ethnic features. This task is unlikely to evoke personal involvement on the part of the speaker, or to reflect natural language, because subjects are instructed only to tell the examiner what they see in the picture, i.e., to describe.
Descriptive discourse is less likely than narrative discourse to elicit narrative features (Olness, Ulatowska, Wertz, Thompson, & Auther, 2002).

Another finding with potential clinical implications is the lack of significant relationships between the WAB-AQ and discourse quantity (number of propositions). Thus, severity of aphasia, as indicated by the WAB-AQ, does not seem to predict the number of propositions a person with aphasia will produce on a discourse task, at least for the group of individuals with moderate aphasia considered here. In addition, discourse quality ratings were not significantly related to discourse quantity. One must remember, however, that all the subjects in this small sample had aphasia of moderate severity; the lack of relationship between narrative quantity and narrative quality cannot be generalised to individuals with either severe or mild aphasia. One might predict that individuals with severe aphasia produce discourse that is both short and relatively poor, while individuals with mild aphasia produce discourse that is relatively longer and higher in quality. Expanding the range of aphasia severity might therefore yield a relationship between discourse quality and discourse length.

In summary, this study provides performance information on those tasks that may elicit natural language, with ethnic features, and those that may not. It also expands our knowledge of performance profiles on a variety of discourse tasks, such as retell, personal narrative, and narrative elicited with pictures. Thus, it brings us one step closer to the development of a discourse assessment tool for culturally and linguistically diverse populations, which may serve to supplement information provided by standardised testing.

REFERENCES

Chapman, S. B., Highley, A. P., & Thompson, J. L. (1998). Discourse in fluent aphasia and Alzheimer’s disease: Linguistic and cognitive considerations. In M. Paradis (Ed.), Pragmatics in neurogenic communication disorders (pp. 55–78). New York: Elsevier.

Chapman, S. B., & Ulatowska, H. K. (1989). Discourse in aphasia: Integration deficits in processing reference. Brain and Language, 36, 651–668.

Featherman, D. L., & Stevens, G. A. (1980). A revised socioeconomic index of occupational status. Center for Demography and Ecology Working Paper 79–48. Madison, WI: University of Wisconsin.

Green, L. (1998). Aspect and predicate phrases in African-American Vernacular English. In S. S. Mufwene, J. R. Rickford, G. Bailey, & J. Baugh (Eds.), African-American English: Structure, history, and use (pp. 37–68). London: Routledge.

Holland, A. L. (1983). Non-biased assessment and treatment of adults who have neurologic speech and language problems. Topics in Language Disorders, 3, 67–75.

Kertesz, A. (1979). Aphasia and related disorders: Taxonomy, localization, and recovery. New York: Grune & Stratton.

Kertesz, A. (1982). Western Aphasia Battery. New York: Grune & Stratton.

Kittner, S. J., White, L. R., Losonczy, K. G., Wolf, P. A., & Rebel, J. R. (1990). Black-white differences in stroke incidence in a national sample: The contribution of hypertension and diabetes mellitus. Journal of the American Medical Association, 264, 1267–1270.

Mitchell-Kernan, C. (1972). Signifying, loud-talking, and marking. In T. Kochman (Ed.), Rappin’ and stylin’ out: Communication in urban black America (pp. 315–335). Urbana, IL: University of Illinois Press.

Mross, E. F. (1990). Text analysis: Macro- and microstructural aspects of discourse processing. In Y. Joanette & H. H. Brownell (Eds.), Discourse ability and brain damage: Theoretical and empirical perspectives (pp. 50–68). New York: Springer-Verlag.

Mufwene, S. S., Rickford, J. R., Bailey, G., & Baugh, J. (Eds.). (1998). African-American English: Structure, history and use. London: Routledge.

Olness, G. S., Ulatowska, H. K., Wertz, R. T., Thompson, J. L., & Auther, L. L. (2002). Discourse elicitation with pictorial stimuli in African Americans and Caucasians with and without aphasia. Aphasiology, 16, 623–633.

Patry, R., & Nespoulous, J. L. (1990). Discourse analysis in linguistics: Historical and theoretical background. In Y. Joanette & H. H. Brownell (Eds.), Discourse ability and brain damage: Theoretical and empirical perspectives (pp. 3–27). New York: Springer-Verlag.

Ulatowska, H. K., & Chapman, S. B. (1994). Discourse macrostructure in aphasia. In R. L. Bloom, L. K. Obler, S. DeSanti, & J. S. Ehrlich (Eds.), Discourse analyses and applications: Studies in adult clinical populations (pp. 29–46). Hillsdale, NJ: Lawrence Erlbaum Associates Inc.

Ulatowska, H. K., & Olness, G. S. (2000). Discourse revisited: Contributions of lexicosyntactic devices. Brain and Language, 71, 249–251.

Ulatowska, H. K., & Olness, G. S. (2001). Dialectal variants of verbs in narratives of African Americans with aphasia: Some methodological considerations. Journal of Neurolinguistics, 14, 93–110.

Ulatowska, H. K., Olness, G. S., Hill, C. L., Roberts, J., & Keebler, M. W. (2000). Repetition in narratives of African Americans: The effects of aphasia. Discourse Processes, 30, 265–283.

Ulatowska, H. K., Olness, G. S., Hill, C. L., Samson, A., & Goins, K. (2002, April). The influence of aphasia on narrative quality in African Americans. Paper presented at the meeting of the National Black Association for Speech-Language and Hearing (NBASLH), Raleigh, NC.

Ulatowska, H. K., Olness, G. S., Wertz, R. T., Thompson, J. L., Keebler, M. W., Hill, C. L., et al. (2001a). Comparison of language impairment, functional communication, and discourse measures in African-American aphasic and normal adults. Aphasiology, 15, 1007–1016.

Ulatowska, H. K., Wertz, R. T., Chapman, S. B., Hill, C. L., Thompson, J. L., Keebler, M. W., et al. (2001b). Interpretation of fables and proverbs by African Americans with and without aphasia. American Journal of Speech-Language Pathology, 10, 40–50.

Wallace, G. L. (1996). Management of aphasic individuals from culturally and linguistically diverse populations. In G. L. Wallace (Ed.), Adult aphasia rehabilitation. Newton, MA: Butterworth-Heinemann.

Wertz, R. T., Ulatowska, H. K., Wallace, G., Payne, J. C., Chapman, S., Auther-Steffen, L. L., et al. (2000, November). A comparison of aphasia in African Americans and Caucasians. Paper presented at the meeting of the American Speech and Hearing Association, Washington, DC.

Wolfram, W. (1992, September). The sociolinguistic model in speech and language pathology. Keynote address at the International Conference on Inter-Disciplinary Perspectives in Speech and Language Pathology, Dublin, Ireland. (ERIC Document Reproduction Service No. ED 359 789.)

Wolfram, W., & Fasold, R. W. (1974). The study of social dialect in American English. Englewood Cliffs, NJ: Prentice-Hall.

APPENDIX
QUALITY RATING SYSTEM

Coherence
4 – All portions of the response are interconnected and clear
3 – Most of the response is connected and clear, with some problems
2 – Some of the elements of the response are connected
1 – The discourse is not interpretable
0 – No response

Reference
4 – All referents and the relationship between them are clear
3 – Some reference errors
2 – Many reference errors
1 – None of the referents, nor their relationship, is interpretable
0 – No response

Emplotment
4 – Full scenario is produced
3 – Story or scenario is produced with some elements missing
2 – Story or scenario is produced with many elements missing
1 – Only brief mention of elements with no story or scenario observable
0 – No response
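The rating system above combines mechanically into the scores reported in Tables 3 and 4: a task’s quality score is the sum of its three 0–4 ratings (maximum 12), and a measure’s combined rating is its sum over the five tasks (maximum 20). The sketch below assumes each response is stored as a dictionary keyed by measure name; those names and the example ratings are hypothetical. Because both tables sum the same 15 ratings per participant, a participant’s total across Table 3 should equal that participant’s total across Table 4 (participant 01 totals 54 in both), which makes a quick consistency check.

```python
MEASURES = ("coherence", "reference", "emplotment")
MAX_RATING = 4  # each scale runs from 0 (no response) to 4


def task_quality(response):
    """Quality score for one task: sum of the three 0-4 ratings (0-12)."""
    total = 0
    for measure in MEASURES:
        rating = response[measure]
        if not 0 <= rating <= MAX_RATING:
            raise ValueError(f"{measure} rating {rating} is outside 0-{MAX_RATING}")
        total += rating
    return total


def combined_ratings(responses):
    """Per-measure sums across tasks (0-20 when there are five tasks)."""
    return {m: sum(r[m] for r in responses) for m in MEASURES}


# Hypothetical ratings for one participant on five tasks.
responses = [
    {"coherence": 4, "reference": 4, "emplotment": 3},
    {"coherence": 3, "reference": 4, "emplotment": 3},
    {"coherence": 2, "reference": 3, "emplotment": 2},
    {"coherence": 0, "reference": 0, "emplotment": 0},  # no response
    {"coherence": 4, "reference": 3, "emplotment": 4},
]

per_task = [task_quality(r) for r in responses]    # Table 3-style scores
per_measure = combined_ratings(responses)          # Table 4-style scores
assert sum(per_task) == sum(per_measure.values())  # same 15 ratings either way
print(per_task, per_measure)
```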

The inter-rater reliability of the story retell procedure

William D. Hula, Malcolm R. McNeil, and Patrick J. Doyle
VA Pittsburgh Healthcare System Geriatric Research Education & Clinical Center, and University of Pittsburgh, USA

Hillel J. Rubinsky
University of Pittsburgh, USA

Tepanta R. D. Fossett
VA Pittsburgh Healthcare System Geriatric Research Education & Clinical Center, and University of Pittsburgh, USA

Background: McNeil, Doyle, Fossett, Park, and Goda (2001) have presented the story retell procedure (SRP) as an efficient means of assessing discourse in adults with aphasia, in part because it provides reliable, valid, and sensitive indices of performance without the need for time-consuming transcription of language samples.

Aims: The purpose of this study was to demonstrate that the SRP, when scored without transcription by judges with minimal training, produces a reliable measure of information transfer.

Methods & Procedures: Four judges who had not used the SRP previously scored audio-recorded language samples, produced by four subjects with aphasia and eleven normal subjects, for percent information units per minute (%IU/Min).

Outcomes & Results: The results demonstrate that the SRP has high inter-rater reliability. Reliability coefficients ranged from .89 to .995, and the standard error of measurement associated with inter-rater scoring error ranged from .59 to 1.42 %IU/Min. Point-to-point reliability in scoring individual information units ranged from 85% to 95% and averaged 91% for both subject groups.

Conclusions: The SRP is a potentially useful tool for quantifying connected language behaviour, and may be particularly valuable in clinical and research settings where economy of assessment procedures is essential.

In a series of recent publications, McNeil, Doyle, and colleagues have presented information on a story retell procedure (SRP) used to elicit language samples from persons with and without aphasia (Doyle et al., 2000; Doyle, McNeil, Spencer, Goda, Cottrell, & Lustig, 1998; McNeil et al., 2001; McNeil, Doyle, Park, Fossett, & Brodsky, 2002). The SRP consists of auditory presentation of stories derived from Brookshire and Nicholas’s (1993) Discourse Comprehension Test to a subject or patient, followed by an immediate retell. The stories can be presented with or without picture support; likewise, picture support can be provided for the retells, or not, depending on the patient.

It has been argued that the SRP possesses some distinct advantages over other connected language sampling procedures described in the literature, including conversational observation (Oelschlaeger & Thorne, 1999), scripted interviews (Goodglass & Kaplan, 1983), on-line video narration (McNeil, Small, Masterson, & Fossett, 1995), fable generation and storytelling (Berndt, Wayland, Rochon, Saffran, & Schwartz, 2000; Ulatowska, Chapman, Highley, & Prince, 1998), picture description (Nicholas & Brookshire, 1993, 1995; Yorkston & Beukelman, 1980), and procedural description (Nicholas & Brookshire, 1993, 1995). From a language sampling perspective, it has been suggested that the constrained nature of the SRP enables it to provide a well-standardised and replicable sample of language formulation and production. Specifically, data have been presented to support the internal validity of the SRP (Doyle et al., 1998) and the linguistic equivalence of language samples generated by four alternate forms of the procedure (Doyle et al., 2000).

In addition, a scoring metric was developed to quantify the information content and communicative efficiency of the samples generated by the SRP. This metric, labelled the information unit (IU), was derived from Nicholas and Brookshire’s (1993, 1995) correct information unit, and was defined as “an identified word, phrase, or acceptable alternative from the stimulus story that is intelligible and informative and conveys accurate and relevant information about the story” (McNeil et al., 2001, p. 994). The primary virtue of the IU scoring metric used with the SRP is that all possible IUs are known a priori and can be printed on score sheets. This potentially allows scoring to be done directly from audio recordings, eliminating the need for time-consuming transcription of lengthy language samples. The IU scoring metric expressed as a percentage of total possible IUs (%IU) has been demonstrated to be reliable across forms of the SRP and to have good criterion validity (McNeil et al., 2001). Also, an efficiency measure obtained by dividing %IUs by the time taken to produce them (%IU/Min) has been demonstrated to be reliable across forms and to discriminate between normal and aphasic performance with reasonable accuracy (McNeil et al., 2002).

Address correspondence to: William D. Hula, MS, Doctoral Fellow, Audiology & Speech Pathology, VA Pittsburgh Healthcare System, 7180 Highland Drive, Pittsburgh, PA 15206, USA.

The authors gratefully acknowledge the assistance of Stephanie Nixon and Joyce Poydence. This research was supported by VA Rehabilitation Research and Development Project # C894–2RA.

© 2003 Psychology Press Ltd
http://www.tandf.co.uk/journals/pp/02687038.html DOI: 10.1080/02687030344000139

INTER-RATER RELIABILITY OF THE SRP 131

In addition to reporting on the validity and alternate form reliability of the %IU metric, McNeil and colleagues (2001) demonstrated that it has good interobserver reliability. However, these data were obtained from scoring of printed transcripts that had themselves already been subjected to reliability checks. Furthermore, all of the data presented thus far on the SRP have been generated by scorers who were themselves involved in the development of the IU measure. If the SRP and its associated IU metrics are to be used to their fullest advantage, particularly in a clinical setting, they must be demonstrated to have acceptable reliability when scored directly from audio recordings by observers who have received training comparable to what a practising clinician could be expected to receive.

One final shortcoming of prior work on the psychometric strength of the SRP concerns the distinction between IUs that were directly stated in the stimulus stories (direct IUs) and IUs retold as synonyms (alternate IUs) of words and phrases contained in the stimulus stories. In a study investigating the memory demands of the SRP (Brodsky, McNeil, Park, Fossett, Timm, & Doyle, 2000), a strong serial position effect was demonstrated for direct IUs, but not for alternate IUs. Thus far, no data have been presented to demonstrate that this distinction can be reliably scored.

The purpose of the current paper is to present additional information on the inter-rater reliability of the SRP using procedures and raters more representative of a clinical setting than have been used in the past.
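The two IU scores described above are simple ratios: %IU is the number of IUs credited divided by the number of possible IUs for the story, times 100, and %IU/Min divides %IU by the retell’s duration in minutes. According to the Procedures, story-level values are then averaged over the three stories of a form. The sketch assumes duration is recorded in seconds; the class and field names are hypothetical. How the direct and alternate breakdowns are normalised is not specified in this section, so only the total score is shown.

```python
from dataclasses import dataclass


@dataclass
class StoryRetell:
    """One retold story (hypothetical record layout)."""
    ius_credited: int   # IUs checked on the score sheet
    ius_possible: int   # all IUs listed a priori for this story
    duration_s: float   # length of the retell in seconds (assumed unit)


def pct_iu(story: StoryRetell) -> float:
    """Percentage of the story's possible IUs that were produced."""
    return 100.0 * story.ius_credited / story.ius_possible


def pct_iu_per_min(story: StoryRetell) -> float:
    """Efficiency: %IU divided by retell time in minutes."""
    return pct_iu(story) / (story.duration_s / 60.0)


def form_score(stories: list[StoryRetell]) -> float:
    """Mean %IU/Min across the stories of one form (three in the SRP)."""
    return sum(pct_iu_per_min(s) for s in stories) / len(stories)


# Hypothetical three-story form.
form = [
    StoryRetell(ius_credited=20, ius_possible=40, duration_s=60),   # 50% in 1 min
    StoryRetell(ius_credited=10, ius_possible=40, duration_s=30),   # 25% in 0.5 min
    StoryRetell(ius_credited=30, ius_possible=60, duration_s=120),  # 50% in 2 min
]
print(round(form_score(form), 2))  # 41.67
```

Averaging story-level ratios, rather than pooling IUs and time across the form, follows the Procedures’ description of calculating %IU/Min per story and then averaging.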
Inter-rater reliability coefficients and standard errors of measurement (SEM) will be reported for the %IU/Min score. Point-to-point reliability for identification of individual IUs will also be reported.

METHOD

Participants

Recordings of story retells by four persons with aphasia and eleven normal individuals were used. All recordings were randomly drawn from the sample of 15 subjects with aphasia and 31 normal subjects reported by McNeil et al. (2001). Descriptive statistics for the subjects with aphasia are presented in Table 1.

Judges were four individuals with varying amounts of experience with aphasia and language transcription: a licensed psychologist, a master’s student in speech-language pathology, and two doctoral students who are also certified speech-language pathologists. These judges were a convenience sample, as they were all new employees in the second author’s laboratory and required training in scoring the SRP for their work. The two doctoral students both had 2–3 years of work experience that involved transcription of language samples from clinical populations. The psychologist had approximately 13 years of experience with neuropsychological testing of rehabilitation patients, including patients with aphasia, but little experience with language transcription per se. The master’s student had no experience with language transcription for research or clinical purposes.
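Because these judges are a convenience sample, the Results treat judges (as well as subjects) as random factors and report absolute-agreement ICCs, so that reliability can generalise to other judges. The sketch below computes the two-way random-effects, absolute-agreement, single-rater ICC (ICC(2,1) in Shrout and Fleiss’s notation) from a subjects × judges table, plus an SEM using one common formula, SD × √(1 − ICC). Both choices are assumptions: the article does not say whether single-rater or average-measure ICCs were reported, or exactly how its SEM was derived. The example table is the six-subject, four-judge data set from Shrout and Fleiss (1979), for which ICC(2,1) is about .29.

```python
from math import sqrt
from statistics import stdev


def icc_2_1(table):
    """Two-way random-effects, absolute-agreement, single-rater ICC.

    `table` has one row per subject and one column per judge.
    """
    n, k = len(table), len(table[0])
    grand = sum(map(sum, table)) / (n * k)
    row_means = [sum(row) / k for row in table]
    col_means = [sum(row[j] for row in table) / n for j in range(k)]

    # Partition the total sum of squares into subjects, judges, and residual.
    ss_subjects = k * sum((m - grand) ** 2 for m in row_means)
    ss_judges = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in table for x in row)
    ss_error = ss_total - ss_subjects - ss_judges

    ms_subjects = ss_subjects / (n - 1)
    ms_judges = ss_judges / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    # Judge variance stays in the denominator, so systematic rater
    # differences lower the coefficient (absolute agreement).
    return (ms_subjects - ms_error) / (
        ms_subjects + (k - 1) * ms_error + k * (ms_judges - ms_error) / n
    )


def sem(sd, icc):
    """Standard error of measurement: SD x sqrt(1 - ICC)."""
    return sd * sqrt(1 - icc)


# Shrout & Fleiss (1979) example: 6 subjects rated by 4 judges.
ratings = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
icc = icc_2_1(ratings)
print(round(icc, 2))  # 0.29
all_scores = [x for row in ratings for x in row]
print(sem(stdev(all_scores), icc))
```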


TABLE 1 Biographical and descriptive subject information for subjects with aphasia (N = 15)

Subjects chosen for reliability analysis in this study are marked with an asterisk (*). MPO = months post onset; RTT = Revised Token Test (McNeil & Prescott, 1978), percentile compared to adults with left-hemisphere damage; ABCD ratio = Arizona Battery for Communication Disorders of Dementia (Bayles & Tomoeda, 1993) ratio, determined by number of delayed recall items/number of immediate recall items × 10; Raven’s = Raven’s Coloured Progressive Matrices (Raven, 1976), raw score out of a possible 36; PICA = Porch Index of Communicative Ability (Porch, 1981), percentile compared to adults with left-hemisphere damage, OA = overall percentile and VRB = verbal percentile.

Procedures

Prior to scoring any of the story retells, each of the four judges read the IU definition and examples published by McNeil et al. (2001), and practised scoring IUs on six to eight stories from printed transcripts. These language samples were drawn from the samples collected by Doyle et al. (2000) and McNeil et al. (2001). After training, each judge scored the same SRP form for each of the four persons with aphasia and eleven normal subjects. Each form consisted of three separate stories, as reported by Doyle et al. (2000). All scoring was done from audio files using score sheets containing all possible direct and alternate IUs. Judges listened to each story as many times as they wished and placed a check on the score sheet wherever an IU was observed. Wherever an alternate IU (as opposed to a direct IU) was observed, they made an additional mark to denote which of the predetermined synonyms was produced. The %IU/Min for each story was calculated and averaged across the appropriate three-story form to give a total %IU/Min score for each subject. The total %IU/Min score was also broken down into %direct IU/Min and %alternate IU/Min to allow assessment of inter-rater reliability on these more specific measures. The judges all reported that it generally took them 15–30 minutes to score a single form (three stories) of the SRP for a single subject. Data on the time spent scoring retells were kept for the least trained and experienced judge. Her average time to complete a single form was 23 minutes (range = 12–29; SD = 4).

RESULTS

Inter-rater reliability coefficients were calculated separately for subjects with aphasia and normal subjects using the %total, %direct, and %alternate IU/Min scores generated by each of the four judges for each of the subjects. To obtain a reliability coefficient that would allow generalisation to judges beyond those in this study, absolute-agreement intraclass correlation coefficients (ICCs) were calculated with both subjects and judges as random factors. The ICC has been argued to be a more conservative measure of reliability than the Pearson product-moment correlation (Denegar & Ball, 1993). The ICCs are presented in Table 2. They ranged from .94 to .995 for the subjects with aphasia and from .89 to .99 for the normal subjects. The SEM associated with inter-judge scoring error was also calculated for each metric. These results are presented in Table 3; they ranged from .59 to .95 %IU/Min for the subjects with aphasia and from .99 to 1.42 %IU/Min for the normal subjects.

TABLE 2 Inter-rater reliability (intraclass) correlation coefficients for total, direct and alternate %IU/Min

Subjects          Total    Direct   Alternate
Aphasic (n = 4)   0.995    0.986    0.944
Normal (n = 11)   0.993    0.979    0.885

All significant at p < .001.

TABLE 3 Inter-rater standard errors of measurement (SEM) for total, direct and alternate %IU/Min

Subjects          Total    Direct   Alternate
Aphasic (n = 4)   0.69     0.95     0.59
Normal (n = 11)   0.99     1.42     1.04

Point-to-point reliability between all six possible pairings of judges was calculated separately for the four subjects with aphasia and for four of the normal subjects. The
formula used was [agreements/(agreements + disagreements)] × 100. Point-to-point reliability averaged 91% (range = 85–95%) for both subject groups.

DISCUSSION

Inter-rater reliability for the %IU/Min metric was high when judges with little or no prior training scored samples directly from audio recordings. Reliability differed only slightly among judges with different levels of professional experience. The SEMs were much lower than those reported by McNeil et al. (2002) for the four alternate story forms, both for subjects with aphasia (range = 4.8–5.6) and for normal subjects (range = 3.2–4.7). These low SEMs suggest that measurement error due to differences between raters is small relative to the score variance due to the story forms themselves. The present data also included the distinction between direct and alternate IUs in scoring. Point-to-point reliability for these data was high and comparable to previously reported values obtained from printed transcripts. Finally, the preliminary data on the time needed to score language samples elicited by the SRP suggest that the procedure may be useful in clinical settings where efficient assessment is essential.

REFERENCES

Bayles, K. A., & Tomoeda, C. K. (1993). Arizona Battery for Communication Disorders of Dementia. Tucson, AZ: Canyonlands Publishing, Inc.
Berndt, R. S., Wayland, S., Rochon, E., Saffran, E., & Schwartz, M. (2000). Quantitative Production Analysis (QPA). Philadelphia: Psychology Press.
Brodsky, M., McNeil, M., Park, G., Fossett, T., Timm, N., & Doyle, P. (2000). Auditory memory for story retelling in normal male, female, young, and old adult subjects in persons with aphasia. Poster presented to the Academy of Aphasia Conference, Montreal, CA.
Brookshire, R. H., & Nicholas, L. H. (1993). Discourse Comprehension Test. Tucson, AZ: Communication Skill Builders.
Denegar, C. R., & Ball, D. W. (1993). Assessing the reliability and precision of measurement: An introduction to the intraclass correlation and standard error of measurement. Journal of Sport Rehabilitation, 2, 35–42.
Doyle, P. J., McNeil, M. R., Park, G., Goda, A., Rubenstein, E., Spencer, K., et al. (2000). Linguistic validation of four parallel forms of a story retelling procedure. Aphasiology, 14, 537–549.
Doyle, P. J., McNeil, M. R., Spencer, K. A., Goda, A. J., Cottrell, K., & Lustig, A. P. (1998). The effects of concurrent picture presentations on retelling of orally presented stories by adults with aphasia. Aphasiology, 12, 561–574.
Goodglass, H., & Kaplan, E. (1983). The assessment of aphasia and related disorders. Philadelphia: Lea & Febiger.
McNeil, M. R., Doyle, P., Fossett, T., Park, G., & Goda, A. (2001). Reliability and concurrent validity of an information unit scoring metric for the retelling procedure. Aphasiology, 15, 991–1007.
McNeil, M. R., Doyle, P., Park, G., Fossett, T., & Brodsky, M. (2002). Increasing the sensitivity of the Story Retell Procedure for the discrimination of normal elderly subjects from persons with aphasia. Aphasiology, 16, 815–822.
McNeil, M. R., & Prescott, T. E. (1978). The Revised Token Test. Austin, TX: Pro-Ed.
McNeil, M. R., Small, S. L., Masterson, R. J., & Fossett, T. R. D. (1995). Behavioral and pharmacological treatment of lexical-semantic deficits in a single patient with primary progressive aphasia. American Journal of Speech-Language Pathology, 4, 76–87.
Nicholas, L. E., & Brookshire, R. H. (1993). A system for quantifying the informativeness and efficiency of the connected speech of adults with aphasia. Journal of Speech and Hearing Research, 36, 338–350.
Nicholas, L. E., & Brookshire, R. H. (1995). Presence, completeness, and accuracy of main concepts in the connected speech of non-brain-damaged adults and adults with aphasia. Journal of Speech and Hearing Research, 38, 145–156.
Oelschlager, M. L., & Thorne, J. C. (1999). Application of the correct information unit analysis to the naturally occurring conversation of a person with aphasia. Journal of Speech, Language, and Hearing Research, 42, 636–648.
Porch, B. E. (1981). Porch Index of Communicative Ability. Palo Alto, CA: Consulting Psychologists Press.
Raven, J. C. (1976). Coloured Progressive Matrices. Oxford: Oxford Psychologists Press, Ltd.
Ulatowska, H. K., Chapman, S. B., Highley, A. P., & Prince, J. (1998). Discourse in healthy old-elderly adults: A longitudinal study. Aphasiology, 12, 619–633.
Yorkston, K. M., & Beukelman, D. R. (1980). An analysis of connected speech samples of aphasic and normal speakers. Journal of Speech and Hearing Disorders, 45, 27–36.
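The sketch below illustrates the two reliability quantities discussed in this paper. It computes point-to-point agreement using the formula given in the text: agreements divided by (agreements + disagreements), times 100. It also computes a standard error of measurement (SEM). The SEM function assumes the classical test theory definition SEM = SD × √(1 − r), where r is a reliability coefficient such as an intraclass correlation. The text does not state exactly how the reported SEMs were computed, so this part is an assumption. The function names and input values are hypothetical and are not data from this study.

```python
import math


def point_to_point_agreement(agreements: int, disagreements: int) -> float:
    """Percent agreement = agreements / (agreements + disagreements) x 100."""
    total = agreements + disagreements
    if total == 0:
        raise ValueError("at least one scored unit is required")
    # Multiply before dividing so whole-number percentages come out exact.
    return 100.0 * agreements / total


def standard_error_of_measurement(sd: float, reliability: float) -> float:
    """Assumed classical definition: SEM = SD * sqrt(1 - reliability)."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    return sd * math.sqrt(1.0 - reliability)


# Hypothetical inputs for illustration only (not values from this study).
print(point_to_point_agreement(91, 9))           # 91.0
print(standard_error_of_measurement(4.0, 0.75))  # 2.0
```

A higher reliability coefficient shrinks the SEM toward zero for a fixed score SD. For this reason, low SEMs are read as evidence that scores are affected little by measurement error.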