English Pages 1033 Year 2010
Handbook of Behavioral Medicine
Andrew Steptoe Editor
Handbook of Behavioral Medicine
Methods and Applications
Editor
Andrew Steptoe, Department of Epidemiology and Public Health, University College London, London, UK
Associate Editors
Kenneth E. Freedland, Department of Psychiatry, Washington University School of Medicine, St. Louis, MO, USA
J. Richard Jennings, Department of Psychiatry, University of Pittsburgh, Pittsburgh, PA, USA
Maria M. Llabre, Department of Psychology, University of Miami, Miami, FL, USA
Stephen B. Manuck, Department of Psychology, University of Pittsburgh, Pittsburgh, PA, USA
Elizabeth J. Susman, Department of Biobehavioral Health, Pennsylvania State University, University Park, PA, USA
Assistant Editor
Lydia Poole, Department of Epidemiology and Public Health, University College London, London, UK
In association with the Academy of Behavioral Medicine Research
ISBN 978-0-387-09487-8
e-ISBN 978-0-387-09488-5
DOI 10.1007/978-0-387-09488-5
Springer New York Dordrecht Heidelberg London
Library of Congress Control Number: 2010933789
© Springer Science+Business Media, LLC 2010
All rights reserved. This work may not be translated or copied in whole or in part without the written permission of the publisher (Springer Science+Business Media, LLC, 233 Spring Street, New York, NY 10013, USA), except for brief excerpts in connection with reviews or scholarly analysis. Use in connection with any form of information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed is forbidden. The use in this publication of trade names, trademarks, service marks, and similar terms, even if they are not identified as such, is not to be taken as an expression of opinion as to whether or not they are subject to proprietary rights.
Printed on acid-free paper
Springer is part of Springer Science+Business Media (www.springer.com)
Preface
Behavioral medicine emerged in the 1970s as the interdisciplinary field concerned with the integration of behavioral, psychosocial, and biomedical science knowledge relevant to the understanding of health and illness, and the application of this knowledge to prevention, diagnosis, treatment, and rehabilitation. The Academy of Behavioral Medicine Research was founded in 1978 as a forum for established behavioral medicine researchers to exchange ideas in an informal atmosphere. The discipline has subsequently grown and evolved substantially. Recent years have witnessed an enormous diversification of behavioral medicine, with new sciences (e.g., genetics, life course epidemiology) and new technologies (e.g., neuroimaging) coming into play. New health problems have emerged, notably obesity and metabolic disorders, that present fresh challenges to the integration of behavioral sciences with public health. Traditional areas of behavioral medicine research such as the influence of psychological factors on physiological responses have been transformed with measures of intracellular processes, cell signaling molecules, cardiac morphology, and gene expression. Cardiovascular behavioral medicine and psychoneuroimmunology, the disciplines which underpin much of the pathophysiological research in behavioral medicine, have converged in the shared exploration of biobehavioral processes across a range of medical conditions. The field of psychological assessment has benefited from new techniques such as ecological momentary assessment and item response theory, while objective methods are being increasingly used in behavioral assessment. Interventional behavioral medicine has had a new lease on life with large clinical trials, the use of the Internet and other information technologies, and the introduction of the public health perspective into the individual-level behavioral change tradition. 
These developments have obliged practitioners to embrace new statistical and analytic approaches. Theoretical understanding has developed considerably, with concepts such as allostatic load, illness representations, and epigenetics enriching the diverse domains of behavioral medicine. The discipline has also become international, with learned societies in more than 20 countries, and high-quality research laboratories spread throughout the world. There is a need to bring together these new developments in a compendium of methods and applications. This handbook aims to fill this need by providing an up-to-date survey of methods and applications drawn from the broad range of behavioral medicine research and practice. The handbook is divided into 10 sections that address key fields in behavioral medicine, ranging from basic biobehavioral processes, through individual developmental and socioemotional factors, to public health and clinical trials. Each section begins with one or two methodological or conceptual chapters, followed by contributions that address substantive topics within that field. There are very few disease-orientated chapters; rather, major health problems such as cardiovascular disease, cancer, HIV/AIDS, and obesity are explored from multiple perspectives. Our aim is to present behavioral medicine as an integrative discipline, involving diverse methodologies and research paradigms that converge on health and well-being.
As an editor, I should like to express my gratitude to the five associate editors who provided great expertise and support throughout the preparation of this book, to the assistant editor Lydia Poole for her unstinting work, and to the many contributors who have enabled the handbook to be completed in a timely fashion. The editorial team have also greatly benefited from the wisdom of an advisory group of distinguished members of the Academy of Behavioral Medicine Research, namely Ronald Glaser (Ohio State University), Kenneth E. Freedland (Washington University School of Medicine), Kathleen C. Light (University of Utah), Philip M. McCabe (University of Miami), and Andrew Baum (University of Texas, Arlington). Our thanks also go to the editorial and production groups at Springer for their efficiency and helpfulness during the production process.
London, UK
January 2010
Andrew Steptoe
Contents
Part I
Health Behaviors: Processes and Measures
1 Social and Environmental Determinants of Health Behaviors
Verity J. Cleland, Kylie Ball, and David Crawford
2 Cognitive Determinants of Health Behavior
Mark Conner
3 Assessment of Physical Activity in Research and Clinical Practice
Lephuong Ong and James A. Blumenthal
4 Dietary Assessment in Behavioral Medicine
Marian L. Neuhouser
5 Assessment of Sexual Behavior
Lori A.J. Scott-Sheldon, Seth C. Kalichman, and Michael P. Carey
6 By Force of Habit
Bas Verplanken
7 Adherence to Medical Advice: Processes and Measurement
Jacqueline Dunbar-Jacob, Martin P. Houze, Cameron Kramer, Faith Luyster, and Maura McCall
Part II
Psychological Processes and Measures
8 Ecological Validity for Patient Reported Outcomes
Arthur A. Stone and Saul S. Shiffman
9 Item Response Theory and Its Application to Measurement in Behavioral Medicine
Mee-Ae Kim-O and Susan E. Embretson
10 Applications of Neurocognitive Assessment in Behavioral Medicine
Shari R. Waldstein, Carrington Rice Wendell, and Megan M. Hosey
11 Lay Representations of Illness and Treatment: A Framework for Action
Howard Leventhal, Jessica Y. Breland, Pablo A. Mora, and Elaine A. Leventhal
12 Conceptualization, Measurement, and Analysis of Negative Affective Risk Factors
Timothy W. Smith
13 Hostility and Health
John C. Barefoot and Redford B. Williams
14 Positive Well-Being and Health
Andrew Steptoe
15 Coping and Health
Charles S. Carver and Sara Vargas
Part III
Social and Interpersonal Processes
16 Experimental Approaches to Social Interaction for the Behavioral Medicine Toolbox
Jerry Suls and M. Bryant Howren
17 Social Support and Physical Health: Links and Mechanisms
Tara L. Gruenewald and Teresa E. Seeman
18 Social Networks and Health
Ai Ikeda and Ichiro Kawachi
19 Social Norms and Health Behavior
Allecia E. Reid, Robert B. Cialdini, and Leona S. Aiken
20 Social Marketing: A Tale of Beer, Marriage, and Public Health
Gerard Hastings and Ray Lowry
Part IV
Epidemiological and Population Perspectives
21 Assessment of Psychosocial Factors in Population Studies
Susan A. Everson-Rose and Cari J. Clark
22 Socio-economic Position and Health
Tarani Chandola and Michael G. Marmot
23 Race, Ethnicity, and Health in a Global Context
Shawn D. Boykin and David R. Williams
24 Neighborhood Factors in Health
Mahasin S. Mujahid and Ana V. Diez Roux
25 Health Literacy: A Brief Introduction
Michael S. Wolf, Stacy Cooper Bailey, and Kirsten J. McCaffery
26 Screening and Early Detection of Cancer: A Population Perspective
Laura A.V. Marlow, Jo Waller, and Jane Wardle
27 The Impact of Behavioral Interventions in Public Health
Noreen M. Clark, Melissa A. Valerio, and Christy R. Houle
Part V
Genetic Process in Behavioral Medicine
28 Quantitative Genetics in Behavioral Medicine
Eco de Geus
29 Candidate Gene and Genome-Wide Association Studies in Behavioral Medicine
Ilja M. Nolte, Jeanne M. McCaffery, and Harold Snieder
30 Functional Genomic Approaches in Behavioral Medicine Research
Gregory E. Miller and Steve W. Cole
31 Genetics of Stress: Gene–Stress Correlation and Interaction
Stephen B. Manuck and Jeanne M. McCaffery
32 Nicotine Dependence and Pharmacogenetics
Riju Ray, Robert Schnoll, and Caryn Lerman
33 Genetics of Obesity and Diabetes
Karani S. Vimaleswaran and Ruth J.F. Loos
Part VI
Development and the Life Course
34 A Life Course Approach to Health Behaviors: Theory and Methods
Gita D. Mishra, Yoav Ben-Shlomo, and Diana Kuh
35 Prenatal Origins of Development Health
Christopher L. Coe
36 The Impact of Early Adversity on Health
Shelley E. Taylor
37 Health Disparities in Adolescence
Hannah M.C. Schreier and Edith Chen
38 Reproductive Hormones and Stages of Life in Women: Moderators of Mood and Cardiovascular Health
Susan S. Girdler and Kathleen C. Light
39 Aging and Behavioral Medicine
Brenda W.J.H. Penninx and Nicole Vogelzangs
Part VII
Biological Measures and Biomarkers
40 Use of Biological Measures in Behavioral Medicine
Andrew Steptoe and Lydia Poole
41 Laboratory Stress Testing Methodology
William Gerin
42 Stress and Allostasis
Ilia N. Karatsoreos and Bruce S. McEwen
43 Neuroendocrine Measures in Behavioral Medicine
Petra Puetz, Silja Bellingrath, Andrea Gierens, and Dirk H. Hellhammer
44 Immune Measures in Behavioral Medicine Research: Procedures and Implications
Michael T. Bailey and Ronald Glaser
45 Circulating Biomarkers of Inflammation, Adhesion, and Hemostasis in Behavioral Medicine
Paul J. Mills and Roland von Känel
46 The Metabolic Syndrome, Obesity, and Insulin Resistance
Armando J. Mendez, Ronald B. Goldberg, and Philip M. McCabe
47 The Non-invasive Assessment of Autonomic Influences on the Heart Using Impedance Cardiography and Heart Rate Variability
Julian F. Thayer, Anita L. Hansen, and Bjorn Helge Johnsen
48 Cardiac Measures
Gina T. Eubanks, Mustafa Hassan, and David S. Sheps
49 Behavioral Medicine and Sleep: Concepts, Measures, and Methods
Martica H. Hall
Part VIII
Brain Function and Neuroimaging
50 Neuroimaging Methods in Behavioral Medicine
Peter J. Gianaros, Marcus A. Gray, Ikechukwu Onyewuenyi, and Hugo D. Critchley
51 Applications of Neuroimaging in Behavioral Medicine
Marcus A. Gray, Peter J. Gianaros, and Hugo D. Critchley
52 Neuroimaging of Depression and Other Emotional States
Scott C. Matthews and Richard D. Lane
53 The Electric Brain and Behavioral Medicine
J. Richard Jennings, Ydwine Zanstra, and Victoria Egizio
Part IX
Statistical Methods
54 Reporting Results in Behavioral Medicine
Michael A. Babyak
55 Moderators and Mediators: The MacArthur Updated View
Helena Chmura Kraemer
56 Multilevel Modeling
S.V. Subramanian
57 Structural Equation Modeling in Behavioral Medicine Research
Maria Magdalena Llabre
58 Meta-analysis
Larry V. Hedges and Elizabeth Tipton
Part X
Behavioral and Psychosocial Interventions
59 Trial Design in Behavioral Medicine
Kenneth E. Freedland, Robert M. Carney, and Patrick J. Lustman
60 Methodological Issues in Randomized Controlled Trials for the Treatment of Psychiatric Comorbidity in Medical Illness
David C. Mohr, Sarah W. Kinsinger, and Jenna Duffecy
61 Quality of Life in Light of Appraisal and Response Shift
Sara Ahmed and Carolyn Schwartz
62 Behavioral Interventions for Prevention and Management of Chronic Disease
Brian Oldenburg, Pilvikki Absetz, and Carina K.Y. Chan
63 Psychosocial–Behavioral Interventions and Chronic Disease
Neil Schneiderman, Michael H. Antoni, Frank J. Penedo, and Gail H. Ironson
64 The Role of Interactive Communication Technologies in Behavioral Medicine
Victor J. Strecher
65 Behavioral Medicine, Prevention, and Health Reform: Linking Evidence-Based Clinical and Public Health Strategies for Population Health Behavior Change
Judith K. Ockene and C. Tracy Orleans
Subject Index
Contributors
Pilvikki Absetz, Adjunct Professor of Health Promotion, University of Tampere, School of Public Health and National Institute for Health and Welfare, P.O. Box 30, FI-00271 Helsinki, Finland
Sara Ahmed, Assistant Professor, Faculty of Medicine, School of Physical and Occupational Therapy, McGill University, 3654 Prom Sir-William-Osler, Montréal, QC H3G 1Y5, Canada
Leona S. Aiken, Professor of Psychology, Department of Psychology, Arizona State University, 950 S. McAllister Ave., P.O. Box 871104, Tempe, AZ 85287-1104, USA
Michael H. Antoni, Professor of Psychology, Department of Psychology, University of Miami, P.O. Box 248185, Coral Gables, FL 33124-0751, USA
Michael A. Babyak, Professor of Medical Psychology, Department of Psychiatry and Behavioral Sciences, Duke University Medical Center, Box 3119 DUMC, Durham, NC 27707, USA
Stacy Cooper Bailey, Clinical Research Associate and Program Director, Health Literacy and Learning Program, Center for Communication in Healthcare, Division of General Internal Medicine, and Institute for Healthcare Studies, Feinberg School of Medicine at Northwestern University, 750 N. Lake Shore Drive, 10th Floor, Chicago, IL 60611, USA
Michael T. Bailey, Assistant Professor, Institute for Behavioral Medicine Research, The Ohio State University, 257 IBMR Building, 460 Medical Center Drive, Columbus, OH 43210, USA
Kylie Ball, Associate Professor, School of Exercise and Nutrition Sciences, Deakin University, 221 Burwood Highway, 3125 VIC, Australia
John C. Barefoot, Research Professor, Department of Psychiatry and Behavioral Science, Duke University Medical Center, Box 2969, Durham, NC 27710, USA
Silja Bellingrath, Postdoctoral Fellow in Health Psychology, Jacobs Center on Lifelong Learning and Institutional Development, Jacobs University Bremen, Campus Ring 1, 28759 Bremen, Germany
Yoav Ben-Shlomo, Professor of Clinical Epidemiology, Department of Social Medicine, University of Bristol, Canynge Hall, 39 Whatley Road, Bristol BS8 2PS, UK
James A. Blumenthal, Professor of Medical Psychology, Department of Psychiatry and Behavioral Sciences, Duke University Medical Center, Box 3119, Durham, NC 27710, USA
Shawn D. Boykin, Research Fellow, Department of Epidemiology, Center for Integrative Approaches to Health Disparities, University of Michigan School of Public Health, 109 South Observatory St, Ann Arbor, MI 48109, USA
Jessica Y. Breland, Teaching Assistant, Department of Psychology, Institute for Health, Health Care Policy and Aging Research, Rutgers, The State University of New Jersey, 30 College Ave., New Brunswick, NJ 08901-1293, USA
Michael P. Carey, Director, Center for Health and Behavior, Syracuse University, 415-B Huntington Hall, Syracuse, NY 13244-2340, USA
Robert M. Carney, Professor of Psychiatry, Behavioral Medicine Center, Washington University School of Medicine, 4320 Forest Park Avenue, Suite 301, St. Louis, MO 63108, USA
Charles S. Carver, Professor of Psychology, Department of Psychology, University of Miami, 5665 Ponce de Leon Blvd., Coral Gables, FL 33124-0751, USA
Carina K.Y. Chan, Lecturer, Medicine and Health Sciences, Monash University (Sunway Campus), Building 3, Jalan Lagoon Selatan, Bandar Sunway, 46150 Selangor Darul Ehsan, Malaysia
Tarani Chandola, Professor in Medical Sociology, CCSR, School of Social Sciences, Kantorovich Building, Humanities Bridgeford Street, University of Manchester, Manchester, M13 9PL, UK
Edith Chen, Canada Research Chair in Health & Society, Associate Professor, Department of Psychology, University of British Columbia, 2136 West Mall, Vancouver, BC V6T 1Z4, Canada
Robert B. Cialdini, Professor of Psychology and Marketing, Department of Psychology, Arizona State University, 950 S. McAllister Ave., P.O. Box 871104, Tempe, AZ 85287-1104, USA
Cari J. Clark, Research Associate, Department of Medicine, Program in Health Disparities Research, University of Minnesota Medical School, 717 Delaware Street SE, Suite 166, Minneapolis, MN 55414, USA
Noreen M. Clark, Myron E. Wegman Distinguished University Professor, Director, Center for Managing Chronic Disease, University of Michigan, 1415 Washington Heights, Ann Arbor, MI 48109, USA
Verity J. Cleland, Research Fellow, Centre for Physical Activity and Nutrition Research, Deakin University, 221 Burwood Highway, Burwood, VIC 3125, Australia
Christopher L. Coe, Professor of Psychology, Department of Psychology, Harlow Center for Biological Psychology, University of Wisconsin, 22 N. Charter Street, Madison, WI 53715, USA
Steve W. Cole, Associate Professor, Department of Medicine, Division of Hematology-Oncology, UCLA School of Medicine, 11-934 Factor Building, Los Angeles, CA 90095-1678, USA
Mark Conner, Professor of Applied Social Psychology, Institute of Psychological Sciences, University of Leeds, Leeds LS2 9JT, UK
David Crawford, Director, Centre for Physical Activity and Nutrition Research, Deakin University, 221 Burwood Highway, Burwood, VIC 3125, Australia
Hugo D. Critchley, Professor of Psychiatry, Clinical Imaging Sciences Centre, Brighton and Sussex Medical School, University of Sussex, Falmer, Brighton BN1 9RR, UK
Eco de Geus, Professor of Psychology, Department of Biological Psychology, VU University, Van der Boechorststraat 1, 1081 BT, Amsterdam, The Netherlands
Ana V. Diez Roux, Professor of Epidemiology, Department of Epidemiology, Center for Social Epidemiology and Population Health, University of Michigan School of Public Health, 109 Observatory St, Ann Arbor, MI 48109-2029, USA
Jenna Duffecy, Assistant Professor, Department of Preventive Medicine, Northwestern University, Feinberg School of Medicine, 680 N. Lakeshore Drive, Suite 1220, Chicago, IL 60611, USA
Jacqueline Dunbar-Jacob, Professor and Dean of Nursing, University of Pittsburgh, 350 Victoria Building, 3500 Victoria St, Pittsburgh, PA 15261, USA
Victoria Egizio, Graduate Student, Department of Psychiatry, University of Pittsburgh, Western Psychiatric Institute and Clinic, 3811 O’Hara St, Pittsburgh, PA 15213, USA
Susan E. Embretson, Professor of Psychology, School of Psychology, Georgia Institute of Technology, 654 Cherry St, Atlanta, GA 30332-0170, USA
Gina T. Eubanks, Supervisor, Research Project Coordinator, Division of Cardiovascular Medicine, Emory University, Atlanta, GA, USA; University of South Florida, 1717 W Hills Ave Unit 3, Tampa, FL 33606, USA
Susan A. Everson-Rose, Associate Professor, Department of Medicine, Program in Health Disparities Research, University of Minnesota Medical School, 717 Delaware Street SE, Suite 166, Minneapolis, MN 55414, USA
Kenneth E. Freedland, Professor of Psychiatry, Behavioral Medicine Center, Department of Psychiatry, Washington University School of Medicine, 4320 Forest Park Avenue, Suite 301, St. Louis, MO 63108, USA
William Gerin, Professor of Behavioral Health, Department of Biobehavioral Health, College of Health and Human Development, The Pennsylvania State University, 315 Health and Human Development East, University Park, PA 16802, USA
Peter J. Gianaros, Assistant Professor of Psychiatry and Psychology, Department of Psychiatry, University of Pittsburgh, Western Psychiatric Institute and Clinic, 3811 O’Hara Street, Pittsburgh, PA 15213, USA
Andrea Gierens, Director of the Biochemical Laboratories, Division of Clinical and Physiological Psychology, University of Trier, Johanniterufer 15, D-54290 Trier, Germany, cortlab@uni_trier.de
Susan S. Girdler, Professor and Director of the Stress and Health Research Program, Department of Psychiatry, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599-7175, USA
Ronald Glaser, Professor of Molecular Virology, Immunology and Medical Genetics, Director, Institute for Behavioral Medicine Research, The Ohio State University, 120 IBMR Building, 460 Medical Center Drive, Columbus, OH 43210, USA
Ronald B. Goldberg, Professor of Medicine, Division of Endocrinology, Diabetes and Metabolism and Diabetes Research Institute, University of Miami Miller School of Medicine, P.O. Box 016960, Miami, FL 33101-6960, USA
Marcus A. Gray, Lecturer in Psychiatry and Neuroimaging, Trafford Centre/Clinical Imaging Sciences Centre, University of Sussex, Falmer, Brighton, BN1 9PX, UK
Tara L. Gruenewald, Assistant Professor, Department of Medicine, Division of Geriatrics, UCLA School of Medicine, 10945 Le Conte Avenue, Suite 2339, Los Angeles, CA 90095-1687, USA
Martica H. Hall, Associate Professor of Psychiatry and Psychology, Department of Psychiatry, University of Pittsburgh, Western Psychiatric Institute and Clinic, 3811 O’Hara St, Pittsburgh, PA 15213, USA
Anita L. Hansen, Leader, Operational Psychology Research Group, Department of Psychosocial Sciences, Faculty of Psychology, University of Bergen, Christiesgate 12, N-5015 Bergen, Norway
Mustafa Hassan, Cardiology Fellow, Division of Cardiovascular Medicine, University of Florida, 1600 SW Archer Rd., PO Box 100277, Gainesville, FL 32610, USA
Gerard Hastings, Director, Institute for Social Marketing, University of Stirling, Stirling FK9 4LA, Scotland, UK
Larry V. Hedges, Professor of Statistics and Policy Research, Professor in the School of Education and Social Policy, Department of Statistics, Northwestern University, 2046 Sheridan Road, Evanston, IL 60208, USA
Dirk H. Hellhammer, Professor of Clinical and Physiological Psychology, Division of Clinical and Physiological Psychology, University of Trier, Johanniterufer 15, Trier, D-54290, Germany
Megan M. Hosey, Graduate Student, Department of Psychology, University of Maryland, Baltimore County, 1000 Hilltop Circle, Baltimore, MD 21250, USA
Christy R. Houle, Postdoctoral Scholar, Center for Managing Chronic Disease, University of Michigan, 1415 Washington Heights, Ann Arbor, MI 48109, USA
Martin P. Houze, Graduate Student, Center for Research and Evaluation, School of Nursing, University of Pittsburgh, 350 Victoria Building, 3500 Victoria St, Pittsburgh, PA 15261, USA
M. Bryant Howren, Postdoctoral Fellow, VA Iowa City Healthcare System, 601 Hwy 6 West, Iowa City, IA 52240, USA, matthew.howren.va.gov
Ai Ikeda, Research Fellow, Department of Society, Human Development and Health, Harvard School of Public Health, 677 Huntington Avenue, Kresge Building 7th Floor, Boston, MA 02115, USA
Gail H. Ironson, Professor of Psychology, Department of Psychology, University of Miami, P.O. Box 248185, Coral Gables, FL 33124-0751, USA
J. Richard Jennings, Professor of Psychiatry and Psychology, Department of Psychiatry, University of Pittsburgh, Western Psychiatric Institute and Clinic, 3811 O’Hara St, Pittsburgh, PA 15213, USA
Bjorn Helge Johnsen, Professor of Personality Psychology, Faculty of Psychology, University of Bergen, Christiesgate 12, N-5015, Bergen, Norway
Seth C. Kalichman, Professor of Psychology, Department of Psychology, Center for Health, Intervention, and Prevention, University of Connecticut, 406 Babbidge Road, Storrs, CT 06269, USA
Ilia N. Karatsoreos, Postdoctoral Fellow, Harold and Margaret Milliken Hatch Laboratory of Neuroendocrinology, The Rockefeller University, 1230 York Ave, New York, NY 10021, USA
Ichiro Kawachi, Professor of Social Epidemiology and Chair, Department of Society, Human Development and Health, Harvard School of Public Health, 677 Huntington Avenue, Kresge Building 7th Floor, Boston, MA 02115, USA
Mee-Ae Kim-O, Graduate Student, School of Psychology, Georgia Institute of Technology, 654 Cherry St, Atlanta, GA 30332-0170, USA
Sarah W. Kinsinger, Assistant Professor, Department of Medicine and Psychiatry, Division of Gastroenterology, Northwestern University, Feinberg School of Medicine, 680 N. St. Clair Street, Suite 1400, Chicago, IL 60611, USA
Helena Chmura Kraemer, Professor of Biostatistics in Psychiatry (Emerita), Department of Psychiatry and Behavioral Sciences, Stanford University, 1116 Forest Avenue, Palo Alto, CA 94301, USA
Cameron Kramer, Graduate Student, Department of Health and Community Systems, School of Nursing, University of Pittsburgh, 350 Victoria Building, 3500 Victoria St, Pittsburgh, PA 15261, USA
Diana Kuh, Professor of Life Course Epidemiology, Director, MRC Unit for Lifelong Health and Ageing, Department of Epidemiology and Public Health, University College London, 33 Bedford Place, London WC1B 5JU, UK
Richard D. Lane, Professor of Psychiatry, Psychology, and Neuroscience, Department of Psychiatry, University of Arizona, 1501 N. Campbell Ave., Tucson, AZ 85724-5002, USA
Caryn Lerman, Mary W. Calkins Professor and Director, Tobacco Use Research Center, Department of Psychiatry, University of Pennsylvania, 3535 Market Street, Suite 4100, Philadelphia, PA 19104, USA
Elaine A. Leventhal Professor of Medicine, Department of Medicine, University of Medicine and Dentistry of New Jersey, UMDNJ-RWJ Medical School 125 Paterson Street - CAB 2310, New Brunswick, NJ 08903, USA, [email protected] Howard Leventhal Professor of Health Psychology, Department of Psychology, Institute for Health, Health Care Policy and Aging Research Rutgers, The State University of New Jersey, 30 College Ave., New Brunswick, NJ 08901-1293, USA, [email protected] Kathleen C. Light Research Professor, Department of Anesthesiology, University of Utah, 615 Arapeen Drive, Suite 200, Salt Lake City, UT 84108, USA, [email protected] Maria Magdalena Llabre Professor of Psychology, Department of Psychology, University of Miami, P.O. Box 24-8185, Coral Gables, FL 33124, USA, [email protected] Ruth J.F. Loos Group Leader, Medical Research Council (MRC) Epidemiology Unit, Institute of Metabolic Science, Addenbrooke’s Hospital – Box 285, Hills Road, Cambridge CB2 0QQ, UK, [email protected] Ray Lowry Senior Lecturer, Child Dental Health School of Dental Sciences, Newcastle University, Newcastle on Tyne, NE2 4BW, UK, [email protected] Patrick J. Lustman Professor of Psychiatry, Department of Psychiatry, Washington University School of Medicine, 660 S. Euclid, Campus Box 8134, St. Louis, MI 63108, USA, [email protected] Faith Luyster Postdoctoral Scholar in Psychiatry, Department of Psychiatry, University of Pittsburgh, 350 Victoria Building, 3500 Victoria St., Pittsburgh, PA 15261, USA, [email protected] Stephen B. Manuck Distinguished University Professor of Health Psychology and Behavioral Medicine, Behavioral Physiology Laboratory, Department of Psychology, University of Pittsburgh, 506 OEH, 4015 O’Hara Street, Pittsburgh, PA 15260, USA, [email protected] Laura A.V. Marlow Research Associate, Department of Epidemiology & Public Health, Health Behaviour Research Centre, University College London, Gower Street, London WC1E 6BT, UK, [email protected] Sir Michael G. 
Marmot Professor of Epidemiology, Department of Epidemiology and Public Health, University College London, London WC1E 6BT, UK, [email protected] Scott C. Matthews Assistant Professor of Psychiatry, University of California San Diego, 3350 La Jolla Village Drive (Mail Code 116-A), San Diego, CA 92161, USA, [email protected]
Philip M. McCabe Professor, Associate Chairman, Department of Psychology, University of Miami, P.O. Box 248185, Coral Gables, FL 33124, USA, [email protected] Jeanne M. McCaffery Assistant Professor of Psychiatry and Human Behavior, Weight Control and Diabetes Research Center, Brown Medical School and The Miriam Hospital, 196 Richmond Street, Providence, RI 02903, USA, [email protected] Kirsten J. McCaffery Senior Research Fellow, School of Public Health and Centre for Medical Psychology and Evidence-Based Decision Making, Edward Ford Building (A27), The University of Sydney, Sydney, NSW 2006, Australia, [email protected] Maura McCall Graduate Student, Department of Health and Community Systems, School of Nursing, University of Pittsburgh, 360 Victoria Building, 3500 Victoria St, Pittsburgh, PA 15261, USA, [email protected] Bruce S. McEwen Alfred E. Mirsky Professor, Head, Harold and Margaret Milliken Hatch Laboratory of Neuroendocrinology, The Rockefeller University, 1230 York Ave, New York, NY 10021, USA, [email protected] Armando J. Mendez Assistant Professor of Medicine, Department of Medicine, Division of Endocrinology, Diabetes and Metabolism and Diabetes Research Institute, University of Miami Miller School of Medicine, 1450 N.W. 10th Avenue, Miami, FL 33136, USA, [email protected] Gregory E. Miller Associate Professor, Department of Psychology, University of British Columbia, 2136 West Mall, Vancouver BC, Canada V6T 1Z4, [email protected] Paul J. Mills Professor in Residence, Department of Psychiatry, Behavioral Medicine Program, University of California at San Diego, 9500 Gilman Drive, La Jolla, CA 92093-0804, USA, [email protected] Gita D. Mishra Programme leader, MRC Unit for Lifelong Health and Ageing, Department of Epidemiology and Public Health, University College London, 33 Bedford Place, London WC1B 5JU, UK, [email protected] David C. 
Mohr Professor of Preventive Medicine, Department of Preventive Medicine, Northwestern University, Feinberg School of Medicine, 680 N. Lakeshore Drive, Suite 1220, Chicago, IL 60611, USA, [email protected] Pablo A. Mora Assistant Professor of Psychology, Psychology Department, University of Texas at Arlington, 501 S. Nedderman, Arlington, TX 76019, USA, [email protected] Mahasin S. Mujahid Assistant Professor of Epidemiology, Division of Epidemiology, University of California Berkeley, School of Public Health,
50 University Hall, #7360, Berkeley, CA 94720-7360, USA, [email protected] Marian L. Neuhouser Associate Member, Cancer Prevention Program, Fred Hutchinson Cancer Research Center, 1100 Fairview Avenue North, M4-B402, Seattle, WA 98109-1024, USA, [email protected] Ilja M. Nolte Statistical Geneticist, Unit of Genetic Epidemiology & Bioinformatics, Department of Epidemiology, University Medical Center Groningen, University of Groningen, Hanzeplein 1, PO Box 30.001, 9700 RB Groningen, The Netherlands, [email protected] Judith K. Ockene Professor of Medicine, Division of Preventive and Behavioral Medicine, University of Massachusetts Medical School, 55, Lake Avenue North, Worcester, MA 01655-0214, USA, [email protected] Brian Oldenburg Professor of International Public Health, Department of Epidemiology and Preventive Medicine, Monash University, 89 Commercial Rd, Melbourne, VIC 3004, Australia, [email protected] Lephuong Ong Clinical Associate, Department of Psychiatry and Behavioral Sciences, Duke University Medical Center, Box 3119, Durham, NC 27710, USA, [email protected] Ikechukwu Onyewuenyi Graduate Student, Department of Psychiatry, University of Pittsburgh, Western Psychiatric Institute and Clinic, 3811 O’Hara Street, Pittsburgh, PA 15213, USA, [email protected] C. Tracy Orleans Distinguished Fellow and Senior Scientist, Robert Wood Johnson Foundation, Route 1 and College Road East, P.O. Box 2316, Princeton, NJ 08543, USA, [email protected] Frank J. Penedo Associate Professor of Psychology, Department of Psychology & Psychiatry & Behavioral Sciences, University of Miami, P.O. Box 248185, Coral Gables, FL 33124-0751, USA; Behavioral Medicine Research Center, University of Miami, P.O. Box 248185, Coral Gables, FL 33124-0751, USA, [email protected] Brenda W.J.H. 
Penninx Professor of Psychiatric Epidemiology, Department of Psychiatry, VU University Medical Center, AJ Ernststraat 887, 1081 HL, Amsterdam, The Netherlands, [email protected] Lydia Poole Graduate Student, Psychobiology Group, Department of Epidemiology and Public Health, University College London, 1-19 Torrington Place, London WC1E 6BT, UK, [email protected] Petra Puetz Research Associate, Department of Clinical and Physiological Psychology, University of Trier, Johanniterufer 15, Trier, D-54290, Germany, [email protected]
Riju Ray Research Associate in Psychiatry, Department of Psychiatry, University of Pennsylvania, 3535 Market Street, Suite 4100, Philadelphia, PA 19104, USA, [email protected] Allecia E. Reid Graduate Research Associate, Department of Psychology, Arizona State University, 950 S. McAllister Ave., P.O. Box 871104, Tempe, AZ 85287-1104, USA, [email protected] Neil Schneiderman Professor of Psychology, Department of Psychology, University of Miami, P.O. Box 248185, Coral Gables, FL 33124-0751, USA, [email protected] Robert Schnoll Associate Professor, Department of Psychiatry, University of Pennsylvania, 3535 Market Street, Suite 4100, Philadelphia, PA 19104, USA, [email protected] Hannah M.C. Schreier Graduate Student, Department of Psychology, University of British Columbia, 2136 West Mall, Vancouver, BC V6T 1Z4, BC, Canada, [email protected] Carolyn Schwartz President and Chief Scientist, DeltaQuest Foundation Inc, 31 Mitchell Road, Concord, MA 01742, USA; Research Professor of Medicine and Orthopaedic Surgery, Tufts University School of Medicine, Boston, MA, USA, [email protected] Lori A.J. Scott-Sheldon Research Assistant Professor of Psychology, Center for Health and Behavior, Syracuse University, 430 Huntington Hall, Syracuse, NY 13244-2340, USA, [email protected] Teresa E. Seeman Professor of Medicine and Epidemiology, Department of Medicine, Division of Geriatrics, UCLA School of Medicine, 10945 Le Conte Avenue, Suite 2339, Los Angeles, CA 90095-1687, USA, [email protected] David S. Sheps Professor of Medicine, Division of Cardiovascular Medicine, Emory University, EPICORE, 1256 Briarcliff Rd. NE, Building A, Suite 1N, Atlanta, GA 30306, USA, [email protected] Saul S. Shiffman Professor of Psychology, Department of Psychology, University of Pittsburgh, Sennott Square, 210 S. Bouquet Street, Pittsburgh, PA 15260, USA, [email protected] Timothy W. 
Smith Professor of Psychology, Department of Psychology, University of Utah, 380 South 1530 East (Room 502), Salt Lake City, UT 84112-0251, USA, [email protected] Harold Snieder Professor, Unit of Genetic Epidemiology & Bioinformatics, Department of Epidemiology, University Medical Center Groningen, University of Groningen, Hanzeplein 1, PO Box 30.001, 9700 RB Groningen, The Netherlands, [email protected] Andrew Steptoe British Heart Foundation Professor of Psychology, Department of Epidemiology and Public Health, University College
London, 1-19 Torrington Place, London WC1E 6BT, UK, [email protected] Arthur A. Stone Distinguished Professor and Vice Chairman, Department of Psychiatry and Behavioral Science, Stony Brook University, Stony Brook, NY 11994-8790, USA, [email protected] Victor J. Strecher Professor and Director, Center for Health Communications Research, Department of Health Behavior and Health Education, Center for Health Communications Research, School of Public Health, University of Michigan, 300 N. Ingalls – Room 5D-04 (0471), Ann Arbor, MI 48109-0471, USA, [email protected] S.V. Subramanian Associate Professor, Department of Society, Human Development and Health, Harvard School of Public Health, 677 Huntington Avenue, Kresge Building, 7th Floor, Boston MA 02115, USA, [email protected] Jerry Suls Professor of Psychology, Department of Psychology, Spence Laboratories, University of Iowa, Iowa City, IA 52242, USA, [email protected] Shelley E. Taylor Distinguished Professor of Psychology, Department of Psychology, University of California, 1282A Franz Hall, Los Angeles, CA 90095, USA, [email protected] Julian F. Thayer The Ohio Eminent Scholar Professor in Health Psychology, Department of Psychology, The Ohio State University, 1835 Neil Avenue, Columbus, OH 43210, USA, [email protected] Elizabeth Tipton Graduate Student, Department of Statistics, Northwestern University, 2046 Sheridan Road, Evanston, IL 60208, USA, [email protected] Melissa A. Valerio Assistant Professor, Health Behavior and Health Education, School of Public Health, University of Michigan, 1415 Washington Heights Street, Ann Arbor, MI 48109, USA, [email protected] Sara Vargas Graduate Student, Department of Psychology and Sylvester Comprehensive Cancer Center, University of Miami, 5665 Ponce de Leon Blvd., Coral Gables, FL 33124-0751, USA, [email protected] Bas Verplanken Professor of Social Psychology, Department of Psychology, University of Bath, Claverton Down, Bath, BA2 7AY, UK, [email protected] Karani S. 
Vimaleswaran Career Development Fellow, Medical Research Council (MRC) Epidemiology Unit, Institute of Metabolic Science, Addenbrooke’s Hospital – Box 285, Hills Road, Cambridge, CB2 0QQ, UK, [email protected]
Roland von Känel Professor of Medicine, Head, Psychosomatic Division, Department of General Internal Medicine, University Hospital/Inselspital, CH-3010 Bern, Switzerland, [email protected] Nicole Vogelzangs Postdoctoral Researcher, Department of Psychiatry and EMGO Institute for Health and Care Research, VU University Medical Center, AJ Ernststraat 887, 1081 HL, Amsterdam, The Netherlands, [email protected] Shari R. Waldstein Professor of Psychology, Department of Psychology; University of Maryland, Baltimore County, 1000 Hilltop Circle, Baltimore, MD 21250, USA, [email protected] Jo Waller Senior Research Associate, Health Behaviour Research Centre, Department of Epidemiology & Public Health, University College London, Gower Street, London WC1E 6BT, UK, [email protected] Jane Wardle Professor of Clinical Psychology, Director, Health Behaviour Research Centre, Department of Epidemiology & Public Health, University College London, Gower Street, London WC1E 6BT, UK, [email protected] Carrington Rice Wendell Graduate Student, Department of Psychology, University of Maryland, Baltimore County, 1000 Hilltop Circle, Baltimore, MD 21250, USA, [email protected] David R. Williams Professor of African and African American Studies and of Sociology, Department of Society, Human Development and Health, Harvard School of Public Health, 677 Huntington Ave, 6th Floor, Boston, MA 02115, USA, [email protected] Redford B. Williams Professor of Psychiatry & Behavioral Sciences, Director, Behavioral Medicine Research Center, Department of Psychiatry and Behavioral Sciences, Duke University Medical Center, Box 3926, Durham, NC 27710, USA, [email protected] Michael S. Wolf Associate Professor of Medicine and Learning Sciences, Division of General Internal Medicine, Feinberg School of Medicine, Northwestern University, 750 N. 
Lake Shore Drive, 10th Floor, Chicago, IL 60611, USA, [email protected] Ydwine Zanstra Postdoctoral Fellow, Department of Psychiatry, University of Pittsburgh, Western Psychiatric Institute and Clinic, 3811 O’Hara St, Pittsburgh, PA 15213, USA, [email protected]
Part I
Health Behaviors: Processes and Measures
Chapter 1
Social and Environmental Determinants of Health Behaviors Verity J. Cleland, Kylie Ball, and David Crawford
V.J. Cleland, Centre for Physical Activity and Nutrition Research, Deakin University, 221 Burwood Highway, Burwood, VIC 3125, Australia; e-mail: [email protected]
A. Steptoe (ed.), Handbook of Behavioral Medicine, DOI 10.1007/978-0-387-09488-5_1, © Springer Science+Business Media, LLC 2010
1 Introduction
Physical activity and healthy eating behaviors have an important role to play in the prevention of a range of adverse health outcomes. An extensive body of epidemiological evidence from large prospective cohort studies demonstrates that, compared with those who are less physically active, those who are more active are at lower risk of all-cause mortality, cardiovascular diseases, stroke, type 2 diabetes, obesity, certain cancers (mainly breast and colon), musculoskeletal conditions, and poor mental health (US Department of Health and Human Services, 1996). Similarly, healthy eating behaviors have consistently been found to have health benefits: high fruit and vegetable consumption assists in the prevention of ischemic heart disease, obesity, certain cancers, and, to a lesser extent, stroke; fish and fish oil consumption is protective against coronary heart disease; and diets high in fiber protect against obesity and type 2 diabetes (World Cancer Research Fund and American Institute for Cancer Research, 2007; World Health Organization, 2002). Despite these well-documented health benefits, a large proportion of the population living in developed nations fails to meet physical activity and healthy eating recommendations.
Given the importance of physical activity and healthy eating behaviors for health, a number of countries have developed guidelines aimed at educating the public about optimal levels of physical activity and healthy eating patterns. Physical activity and healthy eating guidelines tend to be similar across countries and regions such as the United States (US), Canada, Europe, the United Kingdom (UK), and Australia. Physical activity guidelines for adults generally recommend achieving at least 150 min per week of moderate-intensity activity and note that this activity can be accumulated in 10-min bouts. The 2008 Physical Activity Guidelines for Americans suggest that the recommended amount can alternatively be accumulated through 75 min a week of vigorous-intensity aerobic physical activity, or an equivalent combination of moderate- and vigorous-intensity aerobic activity (US Department of Health and Human Services, 2008). The 2005 Dietary Guidelines for Americans suggest consuming a variety of nutrient-dense foods and beverages within and among the basic food groups, while choosing foods that limit the intake of saturated and trans fats, cholesterol, added sugars, salt, and alcohol (US Department of Health and Human Services, 2005). Dietary Guidelines for Australian Adults recommend enjoying a wide variety of nutritious foods (including plenty of vegetables, legumes, and fruits; wholegrain cereals; lean meat, fish, and poultry; reduced-fat milks, yoghurts, and cheeses; and drinking plenty of water) and taking care to limit saturated fat, moderate total fat, choose low-salt foods, limit alcohol, and consume only moderate amounts of sugars and foods containing added sugars (National Health and Medical Research Council, 2003).
Despite these guidelines, in many developed countries a significant proportion of the population eats poorly and is not physically active at levels recommended for good health. It is important to understand why so many people fail to meet physical activity and healthy eating recommendations, in order to inform the development of effective preventive strategies.
A broad range of determinants of physical activity and healthy eating behaviors have been identified. Historically, much research examining determinants of health behavior, including physical activity and eating behaviors, has focused on individual and cognitive factors such as knowledge, motivation, and self-efficacy (described in Section 2). Although selected individual factors have consistently been shown to be important in predicting physical activity and/or eating behaviors, researchers have more recently begun to examine the broader social and environmental contexts in which these behaviors occur. Research of this nature is new in its application to physical activity and eating behaviors, but it is not new in its application to other public health issues. The classic example, in which John Snow removed the handle of the local public water pump on Broad Street, London, in 1854 to end a cholera epidemic, highlights the importance of structural changes in influencing public health. A focus on understanding “upstream” determinants of physical activity and eating behaviors, such as social and environmental factors, may offer important opportunities for intervention.
However, there are many challenges involved in the definition, conceptualization, and measurement of environments, which must be considered when attempting to understand the role of the environment as a determinant of health behavior. While the challenges inherent in investigating environmental influences on health behavior have been discussed elsewhere (Ball et al, 2006c), their significance warrants mention here. Defining environments is difficult because
people live and function in multiple contexts or settings (e.g., family, home, and work environments) and in multiple geographic areas (e.g., streets, neighborhoods, cities). Furthermore, there are different types of environmental influences, including factors within the built and natural environment, the social environment, the cultural environment, and the policy environment. Even defining a “neighborhood” environment, which has been the unit of study in much of the research on environmental influences on health behavior, poses unique challenges. For instance, administrative definitions, such as postal (zip) codes or census block areas, may conflict with community perceptions of what constitutes a neighborhood. While defining neighborhoods with specificity to individuals (e.g., a 1 km radius of the home) may improve the ability to detect associations, studying environments at such a specific level can be time- and labor-intensive, and there is not yet agreement on appropriate geographical boundaries. Studies have used radii including 400 m, 800 m, 1 km, 1 mile, and 5 km.
Another key issue is identifying which aspects of the environment to measure from thousands of potential exposure variables. The choice must be clearly justified on theoretical grounds and tied to thoughtful hypotheses, and it should take into account the outcome being measured and the target group under investigation.
For the purposes of this chapter, social determinants are defined as the subjective social norms, support, and other social influences on physical activity and eating behaviors (Brug et al, 2008). Environments are defined here as the neighborhoods within which individuals, families, and communities exist; research on neighborhoods in the health behavior literature has typically focused on aspects of the built environment.
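To make the radius-based neighborhood definitions mentioned above (e.g., a 1 km radius of the home) more concrete, the following minimal sketch counts the facilities that fall within several candidate buffer sizes around a home. It is an illustration only: the coordinates, the `facilities` list, and the function names are hypothetical, and straight-line (great-circle) distance is assumed rather than street-network distance, which some studies may prefer.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def count_within(home, facilities, radius_m):
    """Number of facilities whose straight-line distance from home is <= radius_m."""
    return sum(
        1 for lat, lon in facilities
        if haversine_m(home[0], home[1], lat, lon) <= radius_m
    )


# Hypothetical home location and facility locations (latitude, longitude).
home = (-37.8500, 145.1150)
facilities = [
    (-37.8520, 145.1160),  # roughly 240 m away
    (-37.8570, 145.1150),  # roughly 780 m away
    (-37.8900, 145.1150),  # roughly 4.4 km away
]

# A few illustrative buffer sizes; the "right" radius is not agreed upon.
BUFFERS_M = {"400 m": 400, "800 m": 800, "1 km": 1000, "5 km": 5000}
counts = {label: count_within(home, facilities, r) for label, r in BUFFERS_M.items()}
```

In this toy example the same home has one, two, two, or three accessible facilities depending on the buffer chosen, which is one way the choice of boundary can change the measured exposure.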
This chapter will focus primarily on the social and environmental determinants of physical activity and eating behaviors using evidence from systematic and narrative reviews and original research studies. It is acknowledged that other social and environmental influences are likely to be important in influencing physical activity and
eating behaviors, but this chapter will focus on those determinants that have been most comprehensively examined in the scientific literature. Furthermore, because the social and environmental determinants of physical activity and eating behaviors are likely to be dramatically different in developing countries, this chapter is limited to research conducted in developed nations.
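As a brief aside on the physical activity guidelines summarized earlier in this introduction, the moderate/vigorous equivalence can be expressed as a simple check. The sketch below assumes the rule of thumb from the 2008 Physical Activity Guidelines for Americans that one minute of vigorous-intensity activity counts roughly as two minutes of moderate-intensity activity. The function and parameter names are hypothetical, and the 10-min bout condition is not modeled.

```python
MODERATE_TARGET_MIN = 150  # minutes per week of moderate-intensity activity
VIGOROUS_WEIGHT = 2        # assumption: 1 vigorous minute counts as 2 moderate minutes


def moderate_equivalent_minutes(moderate_min, vigorous_min):
    """Weekly activity expressed in moderate-intensity-equivalent minutes."""
    return moderate_min + VIGOROUS_WEIGHT * vigorous_min


def meets_guideline(moderate_min, vigorous_min):
    """True if the weekly total reaches the 150-min moderate-equivalent target."""
    return moderate_equivalent_minutes(moderate_min, vigorous_min) >= MODERATE_TARGET_MIN
```

Under these assumptions, 150 min of moderate activity, 75 min of vigorous activity, or a mix such as 90 min moderate plus 30 min vigorous would each reach the target, whereas 60 min moderate plus 30 min vigorous would not.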
[Fig. 1.1 labels: Structural/environmental factors; Social/interpersonal factors; Individual characteristics]
2 Theoretical Frameworks
In attempting to understand the determinants of physical activity and eating behaviors, theoretical frameworks offer a useful starting point for conceptualizing the multitude of potential determinants. Many different theories have been developed in an attempt to explain behavior, and these can be broadly classified as intrapersonal or interpersonal theories. Intrapersonal theories, such as the health belief model (Becker and Maiman, 1975) and the theory of planned behavior (Ajzen, 1985), are primarily concerned with psychological factors and are based on the premise that behavior is largely driven by individual choice (see Chapter 2). In contrast, interpersonal theories, such as social cognitive theory (Bandura, 1986) and ecological models (Sallis and Owen, 2002; Stokols, 1992), posit that there are multiple layers of influence on behavior and emphasize the role of the broader environment in enabling or hindering individuals in their efforts to make healthy choices. To date, much research on the determinants of physical activity and eating behavior has been atheoretical or largely driven by intrapersonal theories (Baranowski et al, 1999; Cliska et al, 2000). This chapter is based on social–ecological models because these give consideration to the broader social and environmental contexts in which physical activity and eating behaviors occur. Social–ecological models posit that there are multiple levels of influence, including individual, social, and environmental factors, and that these interact with each other to predict behavior (Fig. 1.1).
Fig. 1.1 Diagrammatic representation of the social–ecological model of influences on physical activity and eating behaviors
3 Social and Environmental Determinants of Physical Activity
Physical activity comprises a complex set of behaviors and as a result is difficult to measure. Physical activity assessment is discussed in detail in Chapter 3 and is described only briefly here. Physical activity can be classified by its type (e.g., swimming, walking, skiing, tennis, and basketball), intensity (e.g., light, moderate, vigorous), frequency (how many times per day/week/month/year), duration (how long per session), and the domain in which it occurs (e.g., leisure, transport, occupation, domestic). Self-reported (e.g., surveys and interviews) and objective (e.g., pedometers and accelerometers) measures of physical activity each have strengths and limitations, and a combination of both has been recommended. When considering the influence of social and environmental determinants of physical activity, it is important to measure context-specific physical activity behaviors (Brug et al, 2008; Giles-Corti et al, 2005). For instance, when trying to understand whether the presence of a walking trail influences physical activity, it may be more important to assess walking behaviors undertaken during leisure time, as opposed to a global measure of physical activity, since the latter may have been
accumulated in other domains such as at work or in the home and is hence less likely to be related to the local presence of a walking trail.
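The distinction between a global and a context-specific measure can be illustrated with a small data structure. This is a minimal sketch: the `ActivityBout` record, its field names, and the example week are hypothetical and simply mirror the dimensions listed above (type, intensity, duration, and domain), with frequency implied by the number of records.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ActivityBout:
    activity_type: str  # e.g., "walking", "swimming"
    intensity: str      # "light", "moderate", or "vigorous"
    duration_min: int   # length of the session in minutes
    domain: str         # "leisure", "transport", "occupation", or "domestic"


def total_minutes(bouts, activity_type=None, domain=None):
    """Sum minutes, optionally restricted to one activity type and/or one domain."""
    return sum(
        b.duration_min
        for b in bouts
        if (activity_type is None or b.activity_type == activity_type)
        and (domain is None or b.domain == domain)
    )


# Hypothetical week of recorded activity for one person.
week = [
    ActivityBout("walking", "moderate", 30, "leisure"),
    ActivityBout("walking", "light", 45, "occupation"),
    ActivityBout("cleaning", "light", 60, "domestic"),
    ActivityBout("walking", "moderate", 20, "leisure"),
]

global_minutes = total_minutes(week)  # all activity, any domain
leisure_walking = total_minutes(week, activity_type="walking", domain="leisure")
```

Here the global total is 155 min, but only 50 min is leisure-time walking, the quantity most plausibly related to a local walking trail.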
3.1 Social Determinants of Physical Activity
There are a large number of potential social determinants of physical activity. This section focuses on four key social influences commonly examined in the literature: socioeconomic position, social support, personal safety/crime, and social capital/participation.
3.1.1 Socioeconomic Position
While there is some contention over the most appropriate indicator of socioeconomic position, there is relatively consistent evidence of a socioeconomic gradient in physical activity, whereby those experiencing the greatest socioeconomic disadvantage are least likely to report participating in physical activity during their leisure time. These findings tend to be independent of the measure of socioeconomic position used. A review of 57 studies examining relationships between socioeconomic position and physical activity found a socioeconomic gradient in physical activity in 90% of studies that used social class as the indicator of socioeconomic position (n = 10), 61% of those that used income (n = 18), 71% of those that used education (n = 24), 50% of those that used an asset-based indicator (n = 2), and 100% of those that used an area-based indicator (n = 3) (Gidlow et al, 2006). In the United Kingdom, where social classification by employment grade is commonly used as an indicator of socioeconomic position, an examination of over 10,000 adults involved in the Whitehall II study found that men and women of low employment grade had significantly greater odds of no or low exercise compared with those
of high employment grade, independently of spousal social class (Bartley et al, 2004). There is also some evidence of differences in the barriers to participation in physical activity according to socioeconomic position. For instance, a qualitative study conducted in Australia found that negative early life/family physical activity experiences and lack of time due to work commitments were consistent themes among women of lower socioeconomic position, but not among those of higher socioeconomic position (Ball et al, 2006b). A study in the United Kingdom of over 6000 adults found barriers such as lack of motivation, lack of time, lack of money, and lack of transport to be differentially distributed across different indicators of socioeconomic position (which included education, housing tenure, employment status, household social class, car ownership, and household income), with a higher proportion of adults of lower socioeconomic position identifying barriers to activity than those of higher socioeconomic position (Chinn et al, 1999).
3.1.2 Social Support
Social support is one of the strongest and most consistent predictors of physical activity behavior (Sallis and Owen, 1999; Trost et al, 2002). In their systematic review of articles published between 1998 and 2000, Trost and colleagues reported a significant positive relationship between social support and physical activity in each of the nine reviewed studies that included a measure of social support. Another review of studies published between 1980 and 2004 concluded that there was convincing evidence for a positive relationship between social support and general physical activity, vigorous physical activity/sports, moderate-to-vigorous physical activity, and walking (Wendel-Vos et al, 2007). Most evidence comes from cross-sectional studies. For example, an Australian study of 1803 adults aged 18–59 years found that perceptions of high social support for walking in the neighborhood were associated with an 80% increase in the odds of
walking for recreation and a 50% increase in the odds of walking six times per week for at least 30 min each session (Giles-Corti and Donovan, 2002). Little evidence from prospective cohort studies is available. However, one Danish study examined changes in physical activity over 6 years among nearly 3000 adults aged 16 years and older and found in multivariable analyses that the only significant predictor of moving from the inactive category at baseline to the active category at follow-up was regularity of meeting with family, which may be an indirect indicator of social support (Zimmermann et al, 2008).
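Because findings such as those of Giles-Corti and Donovan above are reported as changes in odds rather than in probabilities, a short conversion may help with interpretation. The sketch below treats an "80% increase in the odds" as an odds ratio of 1.8 and applies it to a baseline prevalence. The baseline of 30% is purely hypothetical and is not taken from the cited study, and the function names are illustrative.

```python
def prob_to_odds(p):
    """Convert a probability (0 <= p < 1) to odds."""
    return p / (1 - p)


def odds_to_prob(odds):
    """Convert odds back to a probability."""
    return odds / (1 + odds)


def apply_odds_ratio(baseline_prob, odds_ratio):
    """Probability implied by multiplying the baseline odds by an odds ratio."""
    return odds_to_prob(prob_to_odds(baseline_prob) * odds_ratio)


baseline = 0.30  # hypothetical prevalence of recreational walking without high support
with_support = apply_odds_ratio(baseline, 1.8)  # odds ratio of 1.8 = 80% higher odds
```

With this hypothetical baseline, the implied prevalence is about 44% rather than 54%: an 80% increase in odds is not an 80% increase in probability, and the gap between the two widens as the behavior becomes more common.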
3.1.3 Personal Safety and Crime
The evidence surrounding the associations between personal safety, crime, and physical activity is equivocal, with inconsistencies in findings likely due to substantial differences in definitions, measures (perceived or objective), sampling, and the unit of analysis (individual, neighborhood, or state level) across studies. A lack of prospective and intervention studies also limits firm conclusions. A study of an ethnically diverse sample of 2338 urban and rural older women found no evidence of a relationship between perceived high levels of crime or lack of a safe place and participation in regular physical activity (Wilcox et al, 2000), while a smaller study of 291 adult women of low socioeconomic position identified no relationship between perceived neighborhood safety and meeting leisure time physical activity recommendations (Cleland et al, 2010). In contrast, a study of 1659 adults aged 18 years and over found that lower perceived neighborhood crime was associated with leisure time physical activity, particularly activity conducted outdoors (McGinn et al, 2008). In a sub-sample of 303 participants from the same study, objective measures of low total crime and low criminal offences, but not incivilities or traffic offences, were associated with higher odds of meeting
leisure time physical activity recommendations, particularly outdoor physical activity.
3.1.4 Social Capital
Social capital has been defined as those features of social relationships, such as interpersonal trust, social participation, group membership, and norms of reciprocity, that facilitate collective action and cooperation for mutual benefit (Kawachi, 1999). While there is debate over whether social capital should be operationalized at the individual or community level (Putnam, 2000; Rose, 2000; Veenstra, 2000), it has been argued that a multilevel analytical approach is most appropriate because social capital may influence health at both levels (Kawachi et al, 2004). Although a number of studies have assessed relations between social capital and health outcomes, fewer have examined the association between social capital and physical activity. Despite difficulties in conceptualizing and measuring social capital, the studies that have examined relations with physical activity tend to suggest a positive association. For instance, a study of 11,837 Swedish adults found that those reporting lower levels of social participation had significantly higher odds of low leisure time physical activity, and social participation explained most of the association observed between socioeconomic position and leisure time physical activity (Lindstrom et al, 2001). A multilevel analysis of data from another Swedish survey found that an individual-level indicator of social capital (social participation), but not a neighborhood-level indicator of social capital (out-migration), was positively associated with leisure time physical activity (Lindstrom et al, 2003). A state- and county-level analysis of social capital and physical activity among 167,000 adults in 48 states in the United States identified positive associations between social capital and physical activity in multilevel, multivariable analyses (Kim et al, 2006).
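The multilevel approach referred to above can be written, in generic form, as a two-level random-intercept logistic model. This is a textbook sketch rather than the specification used in any of the cited studies, and all symbols are introduced here for illustration only.

```latex
\[
\operatorname{logit}\,\Pr(y_{ij} = 1)
  = \beta_0 + \beta_1 x_{ij} + \beta_2 z_j + u_j,
\qquad u_j \sim \mathcal{N}(0, \sigma_u^2)
\]
```

Here $y_{ij}$ indicates whether individual $i$ in area $j$ is physically active, $x_{ij}$ is an individual-level indicator of social capital (e.g., social participation), $z_j$ is an area-level indicator, and $u_j$ is an area-specific random intercept. Separate coefficients $\beta_1$ and $\beta_2$ allow social capital to be associated with physical activity at both levels at once.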
3.2 Environmental Determinants of Physical Activity
There are a large number of potential determinants of physical activity in the physical environment, although research examining these is still relatively new. As discussed earlier, issues around the definition, measurement, and conceptualization of the environment, together with the infancy of this field, make it difficult to draw firm conclusions about associations with physical activity. For instance, a recent review has highlighted an extensive range of issues associated with measuring the built environment for physical activity and provides a useful summary of the many measurement tools currently available (Brownson et al, 2009). This section will focus on four key physical environment influences that have commonly been examined in the literature: availability and accessibility; aesthetics; infrastructure; and road safety.
3.2.1 Availability and Accessibility
Evidence from studies of the influence of the physical environment suggests that physical activity is positively associated with the availability of and access to facilities such as recreation centers, cycle paths, footpaths, and swimming pools. While most studies examining this association have been cross-sectional in design, findings have been relatively consistent. For example, a population-based study of 1796 adults in the United States found that those who reported access to places to be physically active had more than twice the odds of doing any activity and of doing recommended amounts of activity, after adjusting for sociodemographic and other environmental factors (Huston et al, 2003). The same study also found that those reporting access to neighborhood trails had significantly higher odds of achieving recommended levels of leisure time physical activity, independent of other sociodemographic and environmental factors. A number of studies have also found positive associations
between physical activity and access to local parks (Booth et al, 2000; Foster et al, 2004; Nagel et al, 2008), residing in coastal areas (Ball et al, 2007; Bauman et al, 1999), and convenience of physical activity facilities (De Bourdeaudhuij et al, 2003; Duncan et al, 2009; Humpel et al, 2004b), as well as negative associations between physical activity and distance to cycle paths (Troped et al, 2001). A recent study of adults from 11 countries found the odds of being physically active were significantly higher among those who had access to low-cost recreational facilities, bicycle facilities, and sidewalks on most local streets (Sallis et al, 2009). Furthermore, the odds of being active improved with an increasing number of favorable environmental characteristics, suggesting that “clusters” of activity-friendly environmental features may be important for promoting physical activity.
3.2.2 Aesthetics Consistent positive associations have been documented between aesthetic features of neighborhoods and participation in different types of physical activity (Humpel et al, 2002). Aesthetic features are often assessed through self-reported perceptions of the attractiveness of the environment, the amount of greenery or trees, the pleasantness of housing or the neighborhood, or the presence of enjoyable scenery. Cross-sectional evidence of a relationship between aesthetics and physical activity comes from a study of 3392 Australian adults which found those who reported less aesthetically pleasing environments had 28–39% lower odds of walking for exercise or recreation in the previous 2 weeks, compared with those reporting more aesthetically pleasing environments (Ball et al, 2001). Further longitudinal evidence of an association is provided by a 10-week prospective study of 512 Australian adults which found that men who reported positive changes in perceived aesthetics had twice the odds of increasing walking, although no relationship was observed among women (Humpel et al, 2004b), who are possibly more influenced by factors such as safety or accessibility.
3.2.3 Neighborhood Infrastructure The evidence regarding the importance of neighborhood infrastructure in influencing physical activity is equivocal. However, this may be related to a lack of specificity in the assessment of physical activity. For example, Wilcox and colleagues found no relationship between the presence of sidewalks and total leisure time physical activity among urban and rural women (Wilcox et al, 2000), but the measure of physical activity used was overall leisure activity, which may include many different activity types, rather than walking per se. Plausibly, some of this leisure time physical activity could have been accumulated in recreational facilities or other places where the presence of sidewalks would not be expected to have an influence. Associations may have been observed if instead walking for leisure or walking for transport had been assessed, because these physical activity behaviors are more likely to be influenced by the presence of sidewalks. This was evident in an Australian study of mothers, where the presence of sidewalks and good street lighting at night were positively associated with walking for transport (Cleland et al, 2008). That study also found that limited public transport was inversely associated and having many alternative routes for getting from place to place was positively associated with both walking for leisure and walking for transport. Similarly, a study of Belgian adults found that a greater ease of the walk to a public transportation stop was associated with higher levels of walking, but only among women (De Bourdeaudhuij et al, 2003).
3.2.4 Road Safety Road safety elements of the physical environment have been assessed objectively (for example, with a geographic information system) and subjectively (for example, self-reported perceptions), with contrasting findings observed. For instance, a North American study found that perceiving a busy street as a barrier was inversely associated with usage of a bikeway, but objective
measurement of this same variable was not associated with bikeway usage among adults (Troped et al, 2001). Another North American study found perceptions of high-speed traffic were not associated with physical activity, but objectively measured low traffic speeds were positively associated with meeting leisure time physical activity guidelines among adults (McGinn et al, 2007). In contrast, a study in two North American cities found no relationship between self-reported perceptions of safety from traffic while riding or walking, or an objective audit of street safety and physical activity for transportation or for recreation (Hoehner et al, 2005). In one of the few longitudinal studies to examine the influence of road safety on physical activity, no relationship was observed between perceived road safety and walking for leisure, but participant reports of satisfaction with pedestrian crossings, the presence of traffic-slowing devices, and slow local traffic speed were positively associated with walking for transportation over 2 years (Cleland et al, 2008). The findings from this study further highlight the importance of examining physical activity behaviors specific to the environmental features being examined.
4 Social and Environmental Determinants of Eating Behaviors Like physical activity, healthy eating comprises a complex set of behaviors that are challenging to measure. A detailed discussion of eating behavior assessment is provided in Chapter 4, but is described briefly here. A key consideration in dietary assessment is that there are many different elements of eating behavior that can be measured, including overall diet, patterns of food intake, consumption of specific foods, dietary habits, and nutrient intakes. It is therefore essential that a clear research question with a well-defined focus is established to assist in the selection of an appropriate assessment tool. Assessments of eating behaviors are generally conducted via self-report and involve
either recording of intake (e.g., weekly food diary) or recalling intake, retrospectively (e.g., 24-h food recalls or recall of intake via food frequency questionnaires). Both methods have strengths and limitations, and there is currently no “gold standard” assessment tool.
4.1 Social Determinants of Healthy Eating Behaviors There are many potential social determinants of healthy eating behaviors. This section will focus on three key social influences that have commonly been examined in the literature: socioeconomic position, social support, and family and household composition.
4.1.1 Socioeconomic Position In general, those of lower socioeconomic position tend to consume poorer diets than those of higher socioeconomic position (Diez-Roux et al, 1999). For example, cross-sectional data from the Netherlands demonstrated that men and women in the lower socioeconomic groups (defined according to education, occupation, and occupational position) tended to have dietary patterns less conducive to good health, including greater intakes of sugars and sweets (Hulshof et al, 2003). Similarly, the Australian National Nutrition Survey found that men and women of higher socioeconomic status (defined according to occupation) more frequently consumed foods promotive of good health such as breakfast cereals and wholemeal bread (Mishra et al, 2002). A Swedish study found many differences in associations between dietary intake and socioeconomic position across two different measures of socioeconomic position: educational attainment and occupational status (Galobardes et al, 2001). For instance, in that study fiber intake was significantly lower in men and women of lower socioeconomic position defined according to occupation, but
no significant differences were observed across educational categories. Similarly, meat intake was significantly higher among women of lower occupational status, but no significant difference across educational categories was evident. These findings highlight the importance of giving careful consideration to the measures of socioeconomic position and eating behavior employed.
4.1.2 Social Support Social support from family and friends has demonstrated consistent positive associations with fruit and vegetable consumption in diverse populations (Kamphuis et al, 2006; Shaikh et al, 2008). A study of 271 adults from a low-income population found that increases over 12 months in fruit and vegetable intake associated with a brief behavioral intervention were predicted by baseline social support for dietary change (Steptoe et al, 1997). A study of 658 African-American adults found that social support was associated with overall fruit and vegetable intake, and with fruit intake among women (Watters et al, 2007), and an Australian study found a positive relationship between social support from family and friends and fruit and vegetable intake among women of varying socioeconomic position (Ball et al, 2006a). While studies assessing relationships with fat intake are less common, one study of 441 overweight and obese men found social support was significantly inversely associated with percentage of energy from fat intake after adjusting for demographic and other psychosocial factors (Hagler et al, 2007). An intervention study among older adults found correlations between a social support score and changes in fruit and vegetable consumption, but not changes in fat intake, over 1 year (Murphy et al, 2001).
4.1.3 Family and Household Composition Given that many people spend much of their time at home and that behavior is likely influenced by those with whom they live,
household composition is likely to influence eating behavior. The available evidence suggests that being married is positively associated with fruit and vegetable intake (Kamphuis et al, 2006), but also positively associated with energy and total fat intake and inversely associated with saturated fat intake (Giskes et al, 2007a). Fewer studies have focused on associations with fruit and vegetable consumption than energy or fat intake, and those existing studies have tended to focus on women, limiting the ability to draw conclusions related to marital status and fruit and vegetable consumption among men. For instance, a UK study of more than 35,000 women found that married participants had 62% higher odds of having a high fruit and vegetable consumption compared with their single counterparts (Pollard et al, 2001). A Canadian study of older adults found that a significantly greater proportion of those who were married consumed fruit and vegetables at least five times per day compared with those who were single (Riediger and Moghadasian, 2008). Despite a larger number of studies having examined relationships between marital status and energy or fat intakes, evidence remains inconclusive. For instance, an Irish study of over 6500 adults found married men and women on average consumed more energy per day than single adults, and married women consumed more fat per day than single women, but these differences were not statistically significant (Friel et al, 2003). Other features of the household that may impact on eating behavior are the presence and number of children. Women with children under the age of 16 years have been found to consume significantly more servings of fruit but significantly fewer servings of vegetables than women without children (Pollard et al, 2001). However, a Norwegian study found that those with children consumed fruits less often than participants without children (Wandel, 1995). 
One study found that, among white adults, those who had a young child, regardless of whether they were married or single, consumed significantly more fruit than did those who were married and had no children (Devine et al, 1999). Limited research has examined whether the presence and number
of children in the household is associated with energy or fat intake, making firm conclusions difficult to draw, and highlighting the need for further research in this area.
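Several of the findings above are reported as differences in odds (for example, 62% higher odds of high fruit and vegetable consumption among married women), which are easily misread as differences in probability. The following minimal sketch shows how an odds ratio translates into a probability once a baseline is fixed. The 30% baseline prevalence and the function name are illustrative assumptions, not figures or methods from Pollard et al (2001).

```python
def probability_from_odds_ratio(baseline_probability: float, odds_ratio: float) -> float:
    """Return the comparison-group probability implied by an odds ratio.

    baseline_probability is the probability of the outcome in the reference
    group; odds_ratio compares the comparison group with the reference group.
    """
    if not 0.0 < baseline_probability < 1.0:
        raise ValueError("baseline_probability must be strictly between 0 and 1")
    if odds_ratio <= 0.0:
        raise ValueError("odds_ratio must be positive")
    baseline_odds = baseline_probability / (1.0 - baseline_probability)
    comparison_odds = baseline_odds * odds_ratio
    return comparison_odds / (1.0 + comparison_odds)


# ASSUMPTION: 30% of single women have high fruit and vegetable intake.
# This baseline is hypothetical and chosen only for illustration.
ASSUMED_BASELINE = 0.30
REPORTED_ODDS_RATIO = 1.62  # "62% higher odds", as quoted in the text

married = probability_from_odds_ratio(ASSUMED_BASELINE, REPORTED_ODDS_RATIO)
relative_increase = married / ASSUMED_BASELINE - 1.0
print(f"implied probability: {married:.3f}; relative increase: {relative_increase:.1%}")
```

Under this assumed baseline, the implied probability for married women is about 41%, a relative increase in probability that is noticeably smaller than 62%. The gap between the two measures widens as the outcome becomes more common in the reference group.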
4.2 Environmental Determinants of Healthy Eating Behaviors There are many potential environmental determinants of healthy eating behaviors. Much of the existing research on environmental influences on eating has focused on features of the built environment, in particular, the accessibility and availability of food outlets. However, a growing body of evidence has investigated the affordability of foods, and this research will also be summarized here.
4.2.1 Availability and Access While a number of studies have investigated availability of different food stores, or of food items within food stores, across different neighborhoods, very few have linked these data with data on eating behaviors at the individual level. Consequently, evidence from empirical studies examining the relationship between availability of food and eating behaviors is relatively limited and remains equivocal. For instance, the presence of local grocery stores and the shelf space occupied by healthy foods in stores have been found to be negatively associated with fat intakes (Cheadle et al, 1991; Morland et al, 2002), but the presence of supermarkets, full-service restaurants, or fast-food restaurants has not (Morland et al, 2002). Two recent reviews, one of the relationship between the environment and fat and energy intake (Giskes et al, 2007a) and one of the relationship with fruit and vegetable intake (Kamphuis et al, 2006), found that the limited available evidence made firm conclusions difficult; however, there was some evidence to suggest that fruit and vegetable consumption is likely to be highest among those who
have good local availability and accessibility of fruit and vegetables. Associations between access, availability, and eating behaviors may also differ among population groups. In one US study, the presence of a supermarket was positively associated with fruit and vegetable consumption among black but not white residents (Morland et al, 2002), and in another, proximity to a supermarket was associated with fruit consumption among low-income residents (Rose and Richards, 2004). Associations between access, availability, and eating behaviors may also be place-dependent. For example, in contrast to some US evidence, studies in Australia have found no relationship between the “objective” availability of recommended foods and food purchasing behavior (Giskes et al, 2007b), or between the density of supermarkets and fruit and vegetable stores in local neighborhoods and fruit and vegetable consumption among women (Ball et al, 2006a). These null findings may be partly attributable to ceiling effects in access to healthy foods, such that at least in urban areas of many developed countries, residents all have good access to food stores, and hence there may be insufficient variation in the availability of healthy foods to distinguish those with more healthy eating behaviors (Brug et al, 2008). It has also been suggested that stronger or more consistent associations between access to healthy foods and eating behavior have been observed in countries such as the US, where there may be greater spatial segregation in availability of healthy food options (Brug et al, 2008).
4.2.2 Affordability The perceived high cost or low affordability of healthy foods is one of the most frequently cited barriers to healthy eating, particularly among low-income individuals (Glanz et al, 1998; Inglis et al, 2005). A recent review of environmental correlates of fruit and vegetable consumption (Kamphuis et al, 2006) demonstrated that living in low-income households or neighborhoods, or being food insecure, was associated
with lower fruit and vegetable consumption. However, findings of studies examining economic factors and eating behaviors are equivocal. For example, another review found that low household income and neighborhood disadvantage were not strongly associated with energy or fat intakes (Giskes et al, 2007a). Similarly, while some studies have reported that healthy diets are more expensive than less healthy diets (Andrieu et al, 2006; Drewnowski et al, 2004; Jetter and Cassady, 2006), others have not – two studies in the US, for instance, showed that nutrient-dense diets are not more expensive than lower quality diets and may even cost less (Burney and Haughton, 2002; Raynor et al, 2002). It is possible that perceived costs may represent a greater barrier to healthy eating than actual costs, an argument supported by results of a recent Australian study which showed that perceived availability and price of healthy foods were more important than objective measures in predicting diet and mediating socioeconomic variations in diet (Giskes et al, 2007b). Perhaps the strongest evidence for the impact of affordability on eating behaviors comes from experimental or intervention studies, and several of these have suggested that cost is an important determinant of food consumption. For example, two community-based intervention studies demonstrated that price reduction strategies promote the choice of targeted foods. The Changing Individuals’ Purchase of Snacks (CHIPS) study, based in 12 high schools and 12 worksites, found that lowering the prices of lower fat snacks increased the purchase of these snacks, with increases in direct proportion to the price reductions. Compared with usual price conditions, price reductions of 10, 25, and 50% on lower fat snacks resulted in an increase in sales of 9, 39, and 93%, respectively (French et al, 2001).
The second study showed a fourfold increase in fresh fruit sales and a twofold increase in baby carrot sales resulting from a 50% price reduction in the costs of these items in two secondary school cafeterias (French et al, 1997). While confirmation of these findings in settings other than schools and worksites is required, evidence from experimental studies
such as those reviewed here suggests that cost appears to be a potent, modifiable intervention lever for strategies designed to promote healthy eating behaviors. This is particularly timely and important given that fiscal approaches to modifying eating behaviors (for instance, as a potential strategy to counter rising rates of obesity worldwide) are currently the topic of intense debate and increasing attention in research, public health, and policy circles internationally (e.g., McColl, 2009).
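The CHIPS figures quoted above can be turned into a simple dose-response check. The sketch below divides each reported increase in sales by the corresponding price reduction, reading the reductions of 10, 25, and 50 as percentages. The ratio is a crude illustrative summary introduced here, not a statistic reported by French et al (2001), and the function name is hypothetical.

```python
# Figures as quoted in the text for the CHIPS study (French et al, 2001).
PRICE_REDUCTION_PCT = (10, 25, 50)
SALES_INCREASE_PCT = (9, 39, 93)


def response_ratios(reductions, increases):
    """Percentage increase in sales per one percentage point of price reduction.

    This is a descriptive ratio used only for illustration; it is not a
    formally estimated price elasticity.
    """
    if len(reductions) != len(increases):
        raise ValueError("reductions and increases must have the same length")
    return [increase / reduction for reduction, increase in zip(reductions, increases)]


ratios = response_ratios(PRICE_REDUCTION_PCT, SALES_INCREASE_PCT)
for reduction, increase, ratio in zip(PRICE_REDUCTION_PCT, SALES_INCREASE_PCT, ratios):
    print(f"{reduction}% price cut -> {increase}% more sales (ratio {ratio:.2f})")
```

On this reading the ratio rises with the size of the discount, so the response looks somewhat more than proportional at larger price reductions. That is consistent with the text's point that cost is a potent lever, but it is only a back-of-envelope summary of three data points.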
5 Conclusions Physical activity and healthy eating are complex behaviors that are important predictors of a range of health outcomes and indicators. Despite recommendations that aim to promote these behaviors, a large proportion of the population fails to meet physical activity and eating guidelines. Given the importance of physical activity and healthy eating for good health, it is important to understand influences on these behaviors in order to develop effective interventions and strategies aimed at promoting health. While much past research has examined individual-level influences on physical activity and healthy eating, less is known about the influence of the broader social and physical environment on these behaviors. Social and physical environment influences offer significant public health prospects because of the opportunity to intervene “upstream”. Despite the potential offered by these factors, there are many methodological issues that need to be resolved in order to advance our understanding of how social and environmental factors are related to physical activity and healthy eating. Current evidence suggests that socioeconomic position and social support are consistently positively associated with physical activity, highlighting the importance of targeting those at risk of inactivity such as those facing socioeconomic disadvantage or who are socially isolated. Preliminary evidence also suggests that social capital, operationalized at either
the individual or community level, is likely to have a positive relationship with physical activity. Evidence concerning the influence of personal safety and crime on physical activity is equivocal, and further research is required to better current understandings of these relationships. Access to and the availability and aesthetics of the physical environment appear to be positively associated with physical activity, while the evidence surrounding relationships between neighborhood infrastructure and road safety and physical activity remains inconclusive. Evidence suggests a consistent positive association between socioeconomic position and social support and healthy eating, highlighting the importance of targeting those most at risk of social disadvantage or lack of support. While marital status and the presence and number of children in the household appear to be important influences on healthy eating, the lack of empirical studies makes strong conclusions and generalizations difficult to affirm. The evidence regarding access to and availability of healthy and unhealthy food choices is contentious, which is possibly related to conceptual and methodological issues. Limited intervention evidence suggests that food costs appear a potentially modifiable and effective intervention lever, although perceptions of affordability may also need to be addressed. Despite the current evidence, further work is required in relation to the conceptualization, measurement, and definition of the social and particularly the environmental influences on physical activity and eating behaviors. Investigations based on theoretically driven research questions with consistent and comparable measures across studies are needed and specific research in at-risk populations such as those of lower socioeconomic position and minority groups will enhance understanding of how social and environmental factors relate to physical activity and eating behavior. 
The measurement of physical activity and eating behavior remains challenging despite technological advances, and different measures employed across studies make comparisons difficult. Consistent definitions and measures in future investigations
will play a crucial role in helping to answer important research questions around determinants of physical activity and healthy eating behaviors. Understanding the relative contribution of individual, social, and environmental factors and how these factors interact to influence physical activity and healthy eating behaviors within different population groups requires further investigation. Few studies have attempted to examine the independent contributions of these factors in multivariable models to determine which are the most important for physical activity and healthy eating behaviors. Disentangling these relationships will assist in the prioritization of factors to target in strategies aimed at promoting physical activity and healthy eating. Most of the current evidence surrounding social and environmental influences on physical activity and healthy eating behaviors is cross-sectional. While cross-sectional study designs provide important information about the existence of associations, they do not allow for insights into the temporal nature of relationships, or about the influence of manipulations of, for example, features of the physical environment on behavior. Longitudinal and intervention studies are needed to help establish the sequential and ultimately causal nature of relationships between social and environmental factors and physical activity and healthy eating. Until these studies are conducted, it will be unknown whether those who are physically active and eat well are attracted to social and physical environments that are supportive of these behaviors or if supportive social and physical environments encourage physical activity and healthy eating behaviors. Although there are gaps in the current scientific literature, the available evidence suggests that social and environmental factors have a role to play in influencing physical activity and healthy eating behaviors.
Further development in the conceptualization, definition, and measurement of social and environmental determinants, as well as the examination of these determinants in longitudinal and intervention studies, will assist in our understanding of how these
factors relate to physical activity and healthy eating behaviors. A better comprehension of these relations will enable the development of tailored programs and strategies to effectively promote physical activity and healthy eating behaviors, and consequently improve the health of the community.
References Ajzen, I. (1985). From intentions to actions: a theory of planned behavior. In J. Kuhl & J. Beckman (Eds.), Action-Control: From Cognition to Behavior (pp. 11–39). Heidelberg: Springer. Andrieu, E., Darmon, N., and Drewnowski, A. (2006). Low-cost diets: more energy, fewer nutrients. Eur J Clin Nutr, 60, 434–436. Ball, K., Bauman, A., Leslie, E., and Owen, N. (2001). Perceived environmental aesthetics and convenience and company are associated with walking for exercise among Australian adults. Prev Med, 33, 434–440. Ball, K., Crawford, D., and Mishra, G. (2006a). Socioeconomic inequalities in women’s fruit and vegetable intakes: a multilevel study of individual, social and environmental mediators. Public Health Nutr, 9, 623–630. Ball, K., Salmon, J., Giles-Corti, B., and Crawford, D. (2006b). How can socio-economic differences in physical activity among women be explained? A qualitative study. Women Health, 43, 93–113. Ball, K., Timperio, A., Salmon, J., Giles-Corti, B., Roberts, R. et al (2007). Personal, social and environmental determinants of educational inequalities in walking: a multilevel study. J Epidemiol Community Health, 61, 108–114. Ball, K., Timperio, A. F., and Crawford, D. A. (2006c). Understanding environmental influences on nutrition and physical activity behaviors: where should we look and what should we count? Int J Behav Nutr Phys Act, 3, 33. Bandura, A. (1986). Social Foundations of Thought and Action: A Social Cognitive Theory. Englewood Cliffs, NJ: Prentice Hall. Baranowski, T., Cullen, K. W., and Baranowski, J. (1999). Psychosocial correlates of dietary intake: advancing dietary intervention. Annu Rev Nutr, 19, 17–40. Bartley, M., Martikainen, P., Shipley, M., and Marmot, M. (2004). Gender differences in the relationship of partner’s social class to behavioural risk factors and social support in the Whitehall II study. Soc Sci Med, 59, 1925–1936. Bauman, A., Smith, B., Stoker, L., Bellew, B., and Booth, M. (1999). 
Geographical influences upon physical activity participation: evidence of a ‘coastal effect’. Aust N Z J Public Health, 23, 322–324.
Becker, M. H., and Maiman, L. A. (1975). Sociobehavioral determinants of compliance with health and medical care recommendations. Med Care, 13, 10–24. Booth, M. L., Owen, N., Bauman, A., Clavisi, O., and Leslie, E. (2000). Social-cognitive and perceived environment influences associated with physical activity in older Australians. Prev Med, 31, 15–22. Brownson, R. C., Hoehner, C. M., Day, K., Forsyth, A., and Sallis, J. F. (2009). Measuring the built environment for physical activity: state of the science. Am J Prev Med, 36, S99–123 e12. Brug, J., Kremers, S. P., Lenthe, F., Ball, K., and Crawford, D. (2008). Environmental determinants of healthy eating: in need of theory and evidence. Proc Nutr Soc, 67, 307–316. Burney, J., and Haughton, B. (2002). EFNEP: a nutrition education program that demonstrates cost-benefit. J Am Diet Assoc, 102, 39–45. Cheadle, A., Psaty, B. M., Curry, S., Wagner, E., Diehr, P. et al (1991). Community-level comparisons between the grocery store environment and individual dietary practices. Prev Med, 20, 250–261. Chinn, D. J., White, M., Harland, J., Drinkwater, C., and Raybould, S. (1999). Barriers to physical activity and socioeconomic position: implications for health promotion. J Epidemiol Community Health, 53, 191–192. Cleland, V. J., Ball, K., Salmon, J., Timperio, A. F., and Crawford, D. A. (2010). Personal, social and environmental correlates of resilience to physical inactivity among women from socio-economically disadvantaged backgrounds. Health Educ Res, 25(2), 268–281 (Epub ahead of print, Oct 29, DOI:10.1093/her/cyn054). Cleland, V. J., Timperio, A., and Crawford, D. (2008). Are perceptions of the physical and social environment associated with mothers’ walking for leisure and for transport? A longitudinal study. Prev Med, 47, 188–193. Ciliska, D., Miles, E., O’Brien, M. A., Turl, C., Tomasik, H. H.
et al (2000). Effectiveness of community-based interventions to increase fruit and vegetable consumption. J Nutr Educ, 32, 241–252. De Bourdeaudhuij, I., Sallis, J. F., and Saelens, B. E. (2003). Environmental correlates of physical activity in a sample of Belgian adults. Am J Health Promot, 18, 83–92. Devine, C. M., Wolfe, W. S., Frongillo, E. A., Jr., and Bisogni, C. A. (1999). Life-course events and experiences: association with fruit and vegetable consumption in 3 ethnic groups. J Am Diet Assoc, 99, 309–314. Diez-Roux, A. V., Nieto, F. J., Caulfield, L., Tyroler, H. A., Watson, R. L. et al (1999). Neighbourhood differences in diet: the Atherosclerosis Risk in Communities (ARIC) Study. J Epidemiol Community Health, 53, 55–63.
Chapter 2
Cognitive Determinants of Health Behavior Mark Conner
M. Conner, Institute of Psychological Sciences, University of Leeds, Leeds LS2 9JT, UK; e-mail: [email protected]

1 Introduction

The prevalence of health behaviors varies across social groups. For example, in the Western world smoking is generally more prevalent among those from economically disadvantaged backgrounds. This might suggest targeting such socio-demographic factors in interventions to change health behaviors. However, such factors are frequently impossible to change or require political intervention at national or international levels (e.g., change in income distribution). This is one reason why a considerable body of research has focused on more modifiable factors assumed to mediate the relationship between socio-demographic factors and health-related behaviors. One important set of such factors is the thoughts and feelings the individual associates with the particular health-related behavior. These are often referred to as health cognitions and are the focus of this chapter. Although research does examine the role of individual health cognitions (e.g., outcome expectancies), most of the research in this area uses models that include sets of health cognitions that are assumed to combine in different ways to determine behavior. These models are collectively known as social cognition models (SCMs; Conner and Norman, 2005). They prominently include the Health Belief Model (HBM; e.g., Abraham and Sheeran, 2005; Janz and Becker, 1984), Protection Motivation Theory (PMT; e.g., Maddux and Rogers, 1983; Norman et al, 2005), Theory of Reasoned Action/Theory of Planned Behavior (TRA/TPB; e.g., Ajzen, 1991; Conner and Sparks, 2005), and Social Cognitive Theory (SCT; e.g., Bandura, 2000; Luszczynska and Schwarzer, 2005). Stage models represent a different form of SCM which does not assume behavior change to be linear, but rather to occur in discrete stages. Prochaska and DiClemente’s (1984) Transtheoretical Model of Change (TTM) is the most commonly applied stage model. Below, these SCMs are described and research using each is reviewed. There is considerable overlap between the models and the key health cognitions they identify. Building on this overlap, some work has attempted to integrate SCMs into a unified theory of the determinants of health behaviors (Fishbein et al, 2001). This integrated model will also be described. Finally, this chapter gives an overview of recent work in this area on intention stability as an important mediator of cognitive effects, affective expectancies as a highly predictive yet insufficiently considered variable, and implementation intentions as an important volitional technique to promote action.
2 Social Cognition Models

Social cognition models (SCMs) detail the important cognitions that distinguish between
A. Steptoe (ed.), Handbook of Behavioral Medicine, DOI 10.1007/978-0-387-09488-5_2, © Springer Science+Business Media, LLC 2010
those performing and not performing behaviors. The focus is on the cognitions or thought processes that intervene between observable stimuli and behavior in real-world situations (Fiske and Taylor, 1991). This approach is founded on the assumption that behavior is best understood as a function of people’s perceptions of reality, rather than objective characterizations of the stimulus environment. SCMs can be seen as one part of self-regulation research. Self-regulation processes are defined as those “... mental and behavioral processes by which people enact their self-conceptions, revise their behavior, or alter the environment so as to bring about outcomes in it in line with their self-perceptions and personal goals” (Fiske and Taylor, 1991, p. 181). Self-regulation research has emerged from a clinical tradition in psychology which views the individual as striving to eliminate dysfunctional patterns of thinking or behavior and engage in adaptive patterns of thinking or behavior (Bandura, 1982; Turk and Salovey, 1986). Self-regulation involves cognitive re-evaluation of beliefs, goal setting, and ongoing monitoring and evaluating of goal-directed behavior. Two phases of self-regulation activities have been defined: motivational and volitional (Gollwitzer, 1990). In the motivational phase costs and benefits are considered in order to choose between goals and behaviors. This phase is assumed to conclude with a decision (or intention) concerning which goals and actions to pursue at a particular time. In the subsequent volitional phase, planning and action directed toward achieving the set goal predominate. The majority of SCMs focus on the motivational phase, although work with implementation intentions focuses on the volitional phase of action.
2.1 The Health Belief Model

The Health Belief Model (HBM) is the oldest and most widely used SCM (see Abraham and Sheeran, 2005, for a recent review). In one of the earliest studies, Hochbaum (1958) reported that perceived susceptibility to tuberculosis and the belief that people with the disease could be asymptomatic (so that screening would be beneficial) distinguished between attendees and nonattendees for chest x-rays. Haefner and Kirscht (1970) extended this research by demonstrating that an intervention designed to increase participants’ perceived susceptibility, perceived severity, and anticipated benefits resulted in a greater number of checkup visits to the doctor over an 8-month period compared to a control condition. The HBM posits that health behavior is determined by two cognitions: perceptions of illness threat and evaluation of behaviors to counteract this threat (see Fig. 2.1). Threat perceptions are based on two beliefs: the perceived susceptibility of the individual to the illness (“How likely am I to get ill?”) and the perceived severity of the consequences of the illness for the individual (“How serious would the illness be?”). Similarly, evaluation of possible responses involves consideration of both the potential benefits of and barriers/costs to action. Together these four beliefs are believed to determine the likelihood of the individual performing a health behavior. The specific action taken is determined by the evaluation of the available alternatives, focusing on the benefits or efficacy of the health behavior and the perceived costs or barriers of performing the behavior. Hence individuals are most likely to follow a particular health action if they believe themselves to be susceptible to a particular condition which they also consider to be serious and believe that the benefits outweigh the costs of the action taken to counteract the health threat.
Fig. 2.1 The Health Belief Model
Two further cognitions usually included in the model are cues to action and health motivation. Cues to action are assumed to include a diverse range of triggers to the individual taking action which may be internal (e.g., physical symptom) or external (e.g., mass media campaign, advice from others) to the individual (Janz and Becker, 1984). An individual’s perception of the presence of cues to action would be expected to prompt adoption of the health behavior if he/she already holds other key beliefs favoring action. Health motivation refers to more stable differences between individuals in the value they attach to their health and their propensity to be motivated to look after their health. Individuals with a high motivation to look after their health should be more likely to adopt relevant health behaviors. The HBM has provided a useful framework for investigating health behaviors and has been widely used. It has been found to successfully predict a range of behaviors. For example, Janz and Becker (1984) found that across 18 prospective studies, the 4 core beliefs were nearly always significant predictors of health behavior (82, 65, 81, and 100% of studies report significant effects for susceptibility, severity, benefits, and barriers, respectively). Harrison et al (1992), in a review with more stringent inclusion criteria, reported that susceptibility and barriers were the strongest predictors of behavior. Some studies have found that these health beliefs mediate the effects of demographic
Fig. 2.2 Protection Motivation Theory
correlates of health behavior. For example, Orbell et al (1995) reported perceived susceptibility and barriers to entirely mediate the effects of social class upon uptake of cervical screening. The HBM has also inspired a range of successful behavior change interventions (e.g., Jones et al, 1987). The main strength of the HBM is the common-sense operationalization it uses including key beliefs related to decisions about health behaviors. However, further research has identified other cognitions that are stronger predictors of health behavior than those identified by the HBM, suggesting that the model is incomplete. This prompted a proposal to add self-efficacy and intention to the model to produce an “extended health belief model” (Rosenstock et al, 1988) which has generally improved the predictive power of the model (e.g., Hay et al, 2003).
2.2 Protection Motivation Theory

Protection Motivation Theory (PMT; Maddux and Rogers, 1983; see Norman et al, 2005 for a review) is a revision and extension of the HBM which incorporates various appraisal processes identified by research into coping with stress. In PMT, the primary determinant of performing a health behavior is protection motivation or intention to perform a health behavior (see Fig. 2.2). Protection motivation is determined
by two appraisal processes: threat appraisal and coping appraisal. Threat appraisal is based on a consideration of perceptions of susceptibility to the illness and severity of the health threat in a very similar way to the HBM. Coping appraisal involves the process of assessing the behavioral alternatives which might diminish the threat. This coping process is itself assumed to be based on two components: the individual’s expectancy that carrying out a behavior can remove the threat (action-outcome efficacy) and a belief in one’s capability to successfully execute the recommended courses of action (self-efficacy). Together these two appraisal processes result in either adaptive or maladaptive responses. Adaptive responses are those in which the individual engages in behaviors likely to reduce the risk (e.g., adopting a health behavior) whereas maladaptive responses are those that do not directly tackle the threat (e.g., denial of the health threat). Adaptive responses are held to be more likely if the individual perceives himself or herself to be facing a health threat to which he/she is susceptible and which is perceived to be severe and where the individual perceives such responses to be effective in reducing the threat and believes that he/she can successfully perform the adaptive response. The PMT has been successfully applied to the prediction of a number of health behaviors (for a recent review see Norman et al, 2005). Meta-analytic reviews of PMT (Floyd et al, 2000; Milne et al, 2000) indicate protection motivation (i.e., intentions) and self-efficacy to be the most powerful
Fig. 2.3 Theory of Planned Behavior
predictors of behavior, while self-efficacy and response costs were most strongly associated with intentions.
2.3 Theory of Planned Behavior

The Theory of Planned Behavior (TPB; Ajzen, 1991) was developed by social psychologists and has been widely applied to understanding health behaviors (see Conner and Sparks, 2005, for a review). It specifies the factors that determine an individual’s decision to perform a particular behavior (see Fig. 2.3). Importantly, this theory added “perceived behavioral control” to the earlier Theory of Reasoned Action (TRA; Ajzen and Fishbein, 1980). The TPB proposes that the key determinants of behavior are intention to engage in that behavior and perceived behavioral control over that behavior. As in the PMT, intentions in the TPB represent a person’s motivation or conscious plan or decision to exert effort to perform the behavior. Perceived behavioral control (PBC) is a person’s expectancy that performance of the behavior is within his/her control and confidence that he/she can perform the behavior; it is similar to Bandura’s (1982) concept of self-efficacy. In the TPB, intention is assumed to be determined by three factors: attitudes, subjective norms, and PBC. Attitudes are the overall evaluations of the behavior by the individual as positive or negative. Subjective norms are a person’s
beliefs about whether significant others think he/she should engage in the behavior. PBC is assumed to influence both intentions and behavior because we rarely intend to do things we know we cannot and because believing that we can succeed enhances effort and persistence and so makes successful performance more likely. Attitudes are based on behavioral beliefs (or outcome expectancies), that is, beliefs about the perceived outcomes of a behavior. In particular, they are a function of the likelihood of the outcome occurring as a result of performing the behavior (e.g., “How likely is this outcome?”) and the evaluation of that outcome (e.g., “How good or bad will this outcome be for me?”). It is assumed that an individual will have a limited number of consequences in mind when considering a behavior. This expectancy-value framework is based on Fishbein’s (1967) earlier summative model of attitudes. Subjective norm is based on beliefs about salient others’ approval or disapproval of whether one should engage in a behavior (e.g., “Would my best friend want me to do this?”) weighted by the motivation to comply with each salient other on this issue (e.g., “Do I want to do what my best friend wants me to do?”). Again it is assumed that an individual will only have a limited number of referents in mind when considering a behavior. PBC is based on control beliefs concerning whether one has access to the necessary resources and opportunities to perform the behavior successfully (e.g., “How often does this facilitator/inhibitor occur?”), weighted by the perceived power, or importance, of each factor to facilitate or inhibit the action (e.g., “How much does this facilitator/inhibitor make it easier or more difficult to perform this behavior?”). These factors include both internal control factors (information, personal deficiencies, skills, abilities, emotions) and external control factors (opportunities, dependence on others, barriers). 
As for the other types of beliefs it is assumed that an individual will only consider a limited number of control factors when considering a behavior. The TPB has been widely tested and successfully applied to the understanding of a
variety of behaviors (for reviews see Ajzen, 1991; Conner and Sparks, 2005). For example, in a meta-analysis of the TPB, Armitage and Conner (2001) reported that across 154 applications, attitude, subjective norms, and PBC accounted for 39% of the variance in intention, while intentions and PBC accounted for 27% of the variance in behavior across 63 applications. Intentions emerged as the strongest predictors of behavior, while attitudes were the strongest predictors of intentions. The TPB has also informed a number of interventions designed to change behavior. For example, Hill et al (2007) employed a randomized controlled trial to test the effectiveness of a TPB-based leaflet compared to a control condition in promoting physical exercise in a sample of school children. Compared to the control condition, the leaflet condition significantly increased not only reported exercise but also intentions, attitudes, subjective norms, and PBC. Additional analyses indicated that the impact on exercise was mediated by the increases the leaflet had produced (compared to the control group) in intentions and PBC.
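The belief-based structure of the TPB described in this section can be illustrated with a short sketch. Attitude, subjective norm, and PBC are each treated as a sum of products: likelihood times evaluation, approval times motivation to comply, and occurrence times perceived power. The function name, the rating scales, and every number below are hypothetical and chosen only for illustration; the theory itself does not prescribe particular scales or scoring code.

```python
from typing import Iterable, Tuple


def expectancy_value(pairs: Iterable[Tuple[float, float]]) -> float:
    """Sum of (belief strength x weight) products over the salient beliefs.

    This is the summative, expectancy-value form described in the text;
    the function itself is an illustrative assumption, not a published scoring rule.
    """
    return sum(strength * weight for strength, weight in pairs)


# Hypothetical ratings for one person considering regular exercise.
# Behavioral beliefs: (likelihood of outcome 1..7, evaluation of outcome -3..+3)
behavioral_beliefs = [(6, 3), (4, -2), (5, 1)]
# Normative beliefs: (referent's approval -3..+3, motivation to comply 1..7)
normative_beliefs = [(3, 5), (-1, 2)]
# Control beliefs: (how often the factor occurs 1..7, power to help or hinder -3..+3)
control_beliefs = [(5, 2), (6, -3)]

attitude = expectancy_value(behavioral_beliefs)
subjective_norm = expectancy_value(normative_beliefs)
perceived_control = expectancy_value(control_beliefs)

print(attitude, subjective_norm, perceived_control)
```

In applied work such composite scores would typically be entered as predictors of intention in a regression model; the sketch stops at the composites because the weights linking them to intention are estimated from data rather than fixed by the theory.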
2.4 Social Cognitive Theory

In Social Cognitive Theory (SCT; Bandura, 1982) behavior is held to be determined by three factors: goals, outcome expectancies, and self-efficacy (see Fig. 2.4). Goals are plans to act and can be conceived of as intentions to perform the behavior (see Luszczynska and Schwarzer, 2005). Outcome expectancies are similar to behavioral beliefs in the TPB but here are split into physical, social, and self-evaluative depending on the nature of the outcomes considered. Self-efficacy is the belief that a behavior is or is not within an individual’s control and is usually assessed as the degree of confidence the individual has that he/she could still perform the behavior in the face of various obstacles (and is similar to PBC in the TPB). Bandura (2000) recently added socio-structural factors to
Fig. 2.4 Social Cognitive Theory
his theory. These are factors assumed to facilitate or inhibit the performance of a behavior and affect behavior via changing goals. Socio-structural factors refer to the impediments or opportunities associated with particular living conditions, health systems, political, economic, or environmental systems. They are assumed to inform goal setting and be influenced by self-efficacy. The latter relationship arises because self-efficacy influences the degree to which individuals pay attention to opportunities or impediments in their life circumstances. This component of the model incorporates perceptions of the environment as an important influence on health behaviors. SCT has been successfully applied to predicting and changing various health behaviors. However, unlike a number of the other models considered above, many of the applications of SCT only assess one or two components of the model (usually self-efficacy) rather than all components. Self-efficacy and action-outcome expectancies along with intentions have been found to be the most important predictors of a range of health behaviors in a diverse range of studies (for reviews see Bandura, 2000; Luszczynska and Schwarzer, 2005).
2.5 Stage Models of Health Behavior

The SCMs considered above assume that the cognitive determinants of health behaviors act in a similar way during initiation (e.g., quitting smoking for the first time) and maintenance of action (e.g., trying to stay quit). In contrast, in stage models psychological determinants may change across such stages of behavior change (see Sutton, 2005, for a review). An important implication of the stages view is that different cognitions may be important determinants at different stages in promoting health behavior. The most widely used stage model is Prochaska and DiClemente’s (1984) Transtheoretical Model of Change (TTM). Their model has been widely applied to analyze the process of change in alcoholism treatment and smoking cessation. DiClemente et al (1991) identify five stages of change: pre-contemplation (not thinking about change), contemplation (aware of the need to change), preparation (intending to change in the near future and taking action in preparation for change), action (acting to change), and maintenance (of the new behavior). Individuals are seen to progress sequentially from one stage to the next, with maintenance the end stage of successful change. For example, in the case of
smoking cessation, it is argued that in the pre-contemplation stage the smoker is unaware that his/her behavior constitutes a problem and has no intention to quit. In the contemplation stage, the smoker starts to think about changing his/her behavior, but is not committed to try to quit. In the preparation stage, the smoker has an intention to quit and starts to make plans to quit. The action stage is characterized by active attempts to quit, and after 6 months of successful abstinence the individual moves into the maintenance stage. This stage is characterized by attempts to prevent relapse and to consolidate the newly acquired non-smoking status. Although widely applied, the evidence in support of stage models and different stages is modest (see Sutton, 2000, 2005). Sutton (2000) concludes that the distinctions between TTM stages are “logically flawed” and based on “arbitrary time periods.” The sequential movement through stages has not generally been supported (Sutton, 2005). In addition, it has proved difficult to support the key prediction that there are different determinants of behavior change in different stages. Evidence from stage-matched versus stage-mismatched intervention studies does not generally provide support for the TTM (see Littell and Girvin, 2002, for a systematic review of the effectiveness of interventions applying the TTM to health-related behaviors). Thus, at present, research findings do not support the added complexity and increased cost of stage-tailored interventions compared to the linear approach advocated in other SCMs. West (2005) in reviewing stage models in relation to smoking has recently suggested that work on the TTM should be abandoned.
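To make the staging rule for smoking cessation concrete, the following sketch assigns a smoker to one of the five TTM stages using the criteria given in this section. The function name, argument names, and yes/no coding are hypothetical simplifications introduced here; published staging algorithms use their own questionnaire items. The fixed 6-month cutoff is the kind of time-based boundary that Sutton (2000) criticizes as arbitrary.

```python
from enum import IntEnum


class Stage(IntEnum):
    """The five stages of change, ordered as the TTM assumes they are traversed."""
    PRECONTEMPLATION = 1
    CONTEMPLATION = 2
    PREPARATION = 3
    ACTION = 4
    MAINTENANCE = 5


def classify_smoker(thinking_about_quitting: bool,
                    intends_to_quit: bool,
                    actively_quitting: bool,
                    months_abstinent: float = 0.0) -> Stage:
    """Assign a stage from hypothetical yes/no answers (an illustrative rule only)."""
    if actively_quitting:
        # After 6 months of successful abstinence the person is in maintenance.
        return Stage.MAINTENANCE if months_abstinent >= 6 else Stage.ACTION
    if intends_to_quit:
        return Stage.PREPARATION
    if thinking_about_quitting:
        return Stage.CONTEMPLATION
    return Stage.PRECONTEMPLATION


print(classify_smoker(True, False, False).name)
```

Because the stages are coded as ordered integers, a comparison such as `Stage.ACTION < Stage.MAINTENANCE` expresses the assumed sequence; the code describes the model and says nothing about whether people actually move through the stages in this order.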
3 Integration of Social Cognition Models

The overlap between SCMs has prompted attempts to integrate them. This may be valuable, especially since they include some of the same cognitive determinants. For example, intention, self-efficacy, and outcome
expectancies appear in several models. One important attempt to integrate these models was that by Bandura (SCT), Becker (HBM), Fishbein (TRA), Kanfer (self-regulation), and Triandis (Theory of Interpersonal Behavior) as part of a workshop organized by the US National Institute of Mental Health in response to the need to promote HIV-preventive behaviors. The workshop sought to “identify a finite set of variables to be considered in any behavioral analysis” (Fishbein et al, 2001, p. 3). They identified eight variables which, they argued, should account for most of the variance in any (deliberative) behavior. These were organized into two groups. The first group comprised those variables viewed as necessary and sufficient determinants of behavior. Thus, for behavior to occur an individual must (i) have a strong intention, (ii) have the necessary skills to perform the behavior, and (iii) experience an absence of environmental constraints that could prevent behavior. The second group comprised those variables seen primarily to influence intention (although a direct effect on behavior was noted as possible). Thus, a strong intention is likely to occur when an individual (i) perceives the advantages (or benefits) of performing the behavior to outweigh the perceived disadvantages (or costs, i.e., outcome expectancies), (ii) perceives the social (normative) pressure to perform the behavior to be greater than that not to perform the behavior, (iii) believes that the behavior is consistent with his/her self-image, (iv) anticipates the emotional reaction to performing the behavior to be more positive than negative, and (v) has high levels of self-efficacy. Figure 2.5 illustrates this integrated model. This approach has been further developed by Fishbein (2008) in his integrative model (IM) of behavioral prediction although this has not, as yet, been widely tested.
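The two-group structure of the integrated model can be sketched as follows. The first group is coded as a conjunction of three conditions, reflecting the claim that they are necessary and sufficient for behavior; the second group is kept as a checklist, because the text does not say how the five conditions combine to produce a strong intention. All class, function, and label names are hypothetical, and the yes/no coding is a simplification introduced here.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class BehavioralAnalysis:
    """Hypothetical yes/no coding of the three proximal determinants of behavior."""
    strong_intention: bool
    has_necessary_skills: bool
    environmental_constraints_present: bool


def behavior_expected(a: BehavioralAnalysis) -> bool:
    """All three conditions must hold for the behavior to be expected."""
    return (a.strong_intention
            and a.has_necessary_skills
            and not a.environmental_constraints_present)


# Paraphrased labels for the five conditions said to favor a strong intention.
INTENTION_CONDITIONS = (
    "advantages outweigh disadvantages",
    "social pressure favors performing the behavior",
    "behavior is consistent with self-image",
    "anticipated emotional reaction is more positive than negative",
    "high self-efficacy",
)


def unmet_intention_conditions(met: Dict[str, bool]) -> List[str]:
    """Return the conditions not yet satisfied, as possible intervention targets."""
    return [c for c in INTENTION_CONDITIONS if not met.get(c, False)]


person = BehavioralAnalysis(strong_intention=True,
                            has_necessary_skills=True,
                            environmental_constraints_present=True)
print(behavior_expected(person))
```

Read this way, the model points to different intervention targets for different people: removing constraints or teaching skills when intention is already strong, and addressing the unmet checklist items when it is not.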
4 Current Directions

A clear contribution of work with SCMs has been their ability to identify key correlates of health behavior that can be targeted
M. Conner
Fig. 2.5 The “major theorists” integrated social cognition model (diagram labels: Self-discrepancy, Advantages/Disadvantages, Social Pressure, Self-efficacy, Emotional Reaction, Environmental Constraints, Skills, Intention, Behavior)
in interventions to change behavior. Across studies, the strongest relationships with behavior emerge for intentions, self-efficacy, and outcome expectancies (Conner and Norman, 2005). However, by focusing on correlates of health behavior rather than examining causal relationships, research may have overestimated the size of these relationships. For example, while correlational research indicates that intentions have a strong effect on behavior (Armitage and Conner, 2001), studies that manipulate intentions indicate that a medium-to-large change in intentions is associated with only a small-to-medium change in behavior (Webb and Sheeran, 2006). A further important limitation of much work on SCMs is that while the models usefully identify cognitions to target for change, they commonly do not specify the best means of changing such cognitions (work on self-efficacy is an exception to this trend; Bandura, 2000). Recent work on classifying behavior change interventions (e.g., Abraham and Michie, 2008) and the more widespread assessment of mediating cognitions in intervention studies may provide the basis for further insights into how best to change cognitions and how to assess their causal impact on health behavior change. In the remainder of this section three directions of current research on cognitive determinants of health behavior are briefly reviewed.
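The contrast between a "medium-to-large" change in intentions and a "small-to-medium" change in behavior can be made concrete with Cohen's conventional benchmarks for a standardized mean difference (0.2 small, 0.5 medium, 0.8 large). The sketch below is illustrative only: the function name is hypothetical, and the two d values are those given in the abstract of Webb and Sheeran (2006), not figures stated in this chapter.

```python
def cohen_band(d: float) -> str:
    """Place a standardized mean difference on Cohen's conventional scale."""
    size = abs(d)
    if size < 0.2:
        return "below small"
    if size < 0.5:
        return "small to medium"
    if size < 0.8:
        return "medium to large"
    return "large"


# Assumed inputs: effect sizes as reported by Webb and Sheeran (2006).
intention_change = 0.66
behavior_change = 0.36

print(cohen_band(intention_change))                  # medium to large
print(cohen_band(behavior_change))                   # small to medium
print(round(behavior_change / intention_change, 2))  # 0.55
```

On these figures, roughly half of the standardized change in intention carries through to behavior, which is the sense in which correlational estimates may overstate the causal effect.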
4.1 Intention Stability

In the vast majority of high-quality applications of SCMs to predicting health behavior, a prospective design is employed in which the predictors of behavior are measured by questionnaire (at time 1) and behavior is then measured at a second time point (in stronger designs behavior change is the focus of interest). An important
2 Cognitive Determinants of Health Behavior
assumption of such a design is that the measured cognitions (e.g., attitudes) remain unchanged between their measurement and the opportunity to act. So, for example, the assumption is that intentions do not change between completion of the (time 1) questionnaire and the time points at which the respondent has the opportunity to act. This is an explicit limiting condition of the TRA/TPB (Ajzen and Fishbein, 1980). However, cognitions, including intentions, may indeed change during this period, and such change places an important limit on their power to predict behavior. Several studies have now demonstrated the power of intention stability to moderate the intention-behavior relationship (see Conner and Godin, 2007, for a review). For example, Conner et al (2002) found that intentions were strong predictors of healthy eating up to 6 years later, but only among those whose intentions had remained stable over an initial period of 6 months. A number of other factors have also been found to influence the intention-behavior relationship. For example, anticipating feeling regret if one does not perform a behavior and perceiving a strong moral norm to act have both been found to significantly strengthen the intention-behavior relationship (see Cooke and Sheeran, 2004, for a review). Sheeran and Abraham (2003) showed that intention stability moderated the intention-behavior relationship for exercising and that intention stability mediated the effect of other moderators of the intention-behavior relationship (e.g., anticipated regret, certainty). This suggests that a number of these other moderators may have their effect on the intention-behavior relationship by changing the temporal stability of intentions. Hence, factors that make an individual's intentions more stable over time would be expected to increase the impact that these intentions have on behavior and so strengthen the intention-behavior relationship. Thus intention stability might be a useful focus of attention as a key mediating variable in intervention studies attempting to change health behavior.
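The moderating role of intention stability can be illustrated with a small simulation. This is a sketch under stated assumptions rather than a reanalysis of any study cited here: the data are synthetic, the function names and the drift and noise parameters are hypothetical, and behavior is assumed to follow the intention held at the moment of action, which may have drifted from the intention recorded at time 1.

```python
import random


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def simulate_group(n, drift_sd, rng):
    """Return (time-1 intentions, later behavior) for n simulated people.

    drift_sd controls how much intentions change between the time-1
    questionnaire and the opportunity to act (small = stable intentions).
    """
    intentions_t1, behavior = [], []
    for _ in range(n):
        intention_t1 = rng.gauss(0, 1)
        intention_at_action = intention_t1 + rng.gauss(0, drift_sd)
        intentions_t1.append(intention_t1)
        # Behavior follows the intention held when acting, plus noise.
        behavior.append(intention_at_action + rng.gauss(0, 0.5))
    return intentions_t1, behavior


rng = random.Random(42)
r_stable = pearson_r(*simulate_group(2000, drift_sd=0.2, rng=rng))
r_unstable = pearson_r(*simulate_group(2000, drift_sd=1.5, rng=rng))

print(f"stable intentions:   r = {r_stable:.2f}")
print(f"unstable intentions: r = {r_unstable:.2f}")
```

Under these assumptions the time-1 intention predicts behavior much better in the stable group, which is the pattern a moderation analysis would detect as an intention-by-stability interaction.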
4.2 Affective Influences

One criticism of work with SCMs has been the failure to explicitly consider affective influences on behavior (Conner and Armitage, 1998). Outcome expectancies included in PMT, TPB, and SCT do not preclude consideration of affective outcomes, although the outcomes typically considered do not focus on affective states. Over the last few years a number of studies have examined the impact of expectations of affect associated with performance of a behavior. For example, studies have examined anticipated regret as a determinant of behavior within the context of the TPB (see Sandberg and Conner, 2008, for a review). Regret is a negative affective state that can be anticipated pre-behaviorally and so influence subsequent behavior. Studies generally report that such anticipated affective states add significant variance to predictions of intentions but not behavior and may be particularly important in relation to certain affective behaviors (e.g., condom use; Glasman and Albarracin, 2006). Other studies have shown affective outcomes to be better predictors of behavior than more instrumental outcomes (e.g., Lawton et al, 2007). Work has also examined the affect expected to accompany performance of the behavior (sometimes referred to as anticipatory affect or affective attitudes; Loewenstein, 1996) rather than the affect expected to follow it. Such affective attitudes have been explicitly added to the TPB (Conner and Sparks, 2005) and have been reported to be stronger predictors of intentions and behavior than instrumental attitudes (Ajzen, 2001; Lawton et al, 2009). In addition, some studies indicate that affective attitudes directly predict behavior independent of intentions (e.g., Lawton et al, 2009). Affective expectations and their influence on health behavior would appear to be an important and growing focus for research in this area.
4.3 Implementation Intentions

The majority of research reviewed thus far has focused on motivational influences of cognitive variables on behavior (i.e., impacting on intention formation). However, other research has begun to focus on the volitional phase of action (Bagozzi, 1993). One volitional variable that has been widely tested in relation to health behavior is implementation intentions. Gollwitzer (1993) makes the distinction between goal intentions and implementation intentions. While the former are concerned with intentions to perform a behavior or achieve a goal (i.e., “I intend to do x”), the latter are concerned with if-then plans which specify an environmental prompt or context that will determine when the action should be taken (i.e., “I intend to initiate the goal-directed behavior x when situation y is encountered”). Importantly, the if-then plan in an implementation intention commits the individual to a specific course of action when certain environmental conditions are met. Sheeran et al (2005) note that “to form an implementation intention, the person must first identify a response that will lead to goal attainment and, second, anticipate a suitable occasion to initiate that response. For example, the person might specify the behavior ‘go jogging for 20 minutes’ and specify a suitable opportunity ‘tomorrow morning before work’” (p. 280). Gollwitzer (1993) argues that by forming implementation intentions individuals pass control of intention enactment to the environment. The specified environmental cue prompts the action so that the person does not have to remember the goal intention or decide when to act. Sheeran et al (2005) provide an in-depth review of both basic and applied research with implementation intentions. For example, Milne et al (2002) found that an intervention using persuasive text based on PMT prompted positive pro-exercise cognition change but did not increase exercise.
However, when this intervention was combined with encouragement to form implementation intentions, significant behavior change was observed (see Gollwitzer and Sheeran, 2006, for a meta-analysis of such studies). Thus implementation intention formation moderates the intention-behavior relationship, demonstrating that two people with equally strong goal intentions may differ in their volitional readiness depending on whether they have taken the additional step of forming an implementation intention. Implementation intention formation has been shown to increase the performance of a range of health behaviors with, on average, a medium effect size. Implementation intentions appear to be particularly effective for those with strong goal intentions and in overcoming forgetting, which appears to be a common problem in enacting intentions. Provided effective cues are identified in the implementation intention (i.e., ones that will be commonly encountered and are sufficiently distinctive), forgetting appears to be much less likely.
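The if-then structure of an implementation intention lends itself to a simple cue-to-response representation. The sketch below is a loose analogy only, with hypothetical class and function names: it shows how a pre-specified cue, once encountered, retrieves the planned response without any further decision, using the jogging example quoted from Sheeran et al (2005).

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ImplementationIntention:
    """An if-then plan: when the cue is encountered, perform the response."""
    cue: str       # the "if" part: a situation or occasion
    response: str  # the "then" part: the goal-directed behavior


def responses_for(situation: str,
                  plans: List[ImplementationIntention]) -> List[str]:
    # The environment supplies the situation; any plan whose cue matches
    # yields its response, so no deliberation is modeled at this step.
    return [plan.response for plan in plans if plan.cue == situation]


plans = [
    ImplementationIntention(cue="tomorrow morning before work",
                            response="go jogging for 20 minutes"),
]

print(responses_for("tomorrow morning before work", plans))
print(responses_for("lunch break", plans))
```

A goal intention alone would correspond to knowing the response without any stored cue, which is where forgetting and missed opportunities can enter.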
5 Conclusions

A number of social cognition models have been developed to describe the key cognitive determinants and their relationship to behavior. These key cognitions include intentions, self-efficacy, and outcome expectancies. Recent research has sought to integrate such models (Fishbein et al, 2001). Current research has focused on intention stability as an important mediating variable explaining the impact of health cognitions on behavior. Other work is examining affective influences on health behaviors and how the formation of implementation intentions promotes the performance of behavior.
References

Abraham, C., and Michie, S. (2008). A taxonomy of behavior change techniques used in interventions. Health Psychol, 27, 379–387.
Abraham, C., and Sheeran, P. (2005). The health belief model. In M. Conner & P. Norman (Eds.), Predicting Health Behaviour: Research and Practice with Social Cognition Models, 2nd Ed (pp. 28–80). Maidenhead: Open University Press.
Ajzen, I. (1991). The theory of planned behavior. Organiz Behav Hum Dec Proc, 50, 179–211.
Ajzen, I. (2001). Nature and operation of attitudes. Ann Rev Psychol, 52, 27–58.
Ajzen, I., and Fishbein, M. (1980). Understanding Attitudes and Predicting Social Behavior. Englewood Cliffs, NJ: Prentice Hall.
Armitage, C. J., and Conner, M. (2001). Efficacy of the theory of planned behaviour: a meta-analytic review. Br J Soc Psychol, 40, 471–499.
Bagozzi, R. P. (1993). On the neglect of volition in consumer research: a critique and proposal. Psychol Marketing, 10, 215–237.
Bandura, A. (1982). Self-efficacy mechanism in human agency. Am Psychol, 37, 122–147.
Bandura, A. (2000). Health promotion from the perspective of social cognitive theory. In P. Norman, C. Abraham, & M. Conner (Eds.), Understanding and Changing Health Behaviour: From Health Beliefs to Self-Regulation (pp. 229–242). Switzerland: Harwood Academic.
Conner, M., and Armitage, C. J. (1998). Extending the theory of planned behavior: a review and avenues for further research. J Appl Soc Psychol, 28, 1430–1464.
Conner, M., and Godin, G. (2007). Temporal stability of behavioural intention as a moderator of intention-health behaviour relationships. Psychol Health, 22, 875–896.
Conner, M., and Norman, P. (Eds.) (2005). Predicting Health Behaviour: Research and Practice with Social Cognition Models, 2nd Ed. Maidenhead: Open University Press.
Conner, M., Norman, P., and Bell, R. (2002). The theory of planned behavior and healthy eating. Health Psychol, 21, 194–201.
Conner, M., and Sparks, P. (2005). The theory of planned behaviour and health behaviours. In M. Conner & P. Norman (Eds.), Predicting Health Behaviour: Research and Practice with Social Cognition Models, 2nd Ed (pp. 170–222). Maidenhead: Open University Press.
Cooke, R., and Sheeran, P. (2004). Moderation of cognition-intention and cognition-behaviour relations: a meta-analysis of properties of variables from the theory of planned behaviour. Br J Soc Psychol, 43, 159–186.
DiClemente, C. C., Prochaska, J. O., Fairhurst, S. K., Velicer, W. F., Velasquez, M. M., and Rossi, J. S. (1991). The process of smoking cessation: an analysis of precontemplation, contemplation, and preparation stages of change. J Consult Clin Psychol, 59, 295–304.
Fishbein, M. (1967). Attitude and the prediction of behavior. In M. Fishbein (Ed.), Readings in Attitude Theory and Measurement (pp. 477–492). New York: Wiley.
Fishbein, M. (2008). A reasoned action approach to health promotion. Med Dec Making, 28, 834–844.
Fishbein, M., Triandis, H. C., Kanfer, F. H., Becker, M., Middlestadt, S. E., and Eichler, A. (2001). Factors influencing behavior and behavior change. In A. Baum, T. A. Revenson, & J. E. Singer (Eds.), Handbook of Health Psychology (pp. 3–17). Mahwah, NJ: Lawrence Erlbaum Associates.
Fiske, S. T., and Taylor, S. E. (1991). Social Cognition, 2nd Ed. New York: McGraw-Hill.
Floyd, D. L., Prentice-Dunn, S., and Rogers, R. W. (2000). A meta-analysis of protection motivation theory. J Appl Soc Psychol, 30, 407–429.
Glasman, L. R., and Albarracin, D. (2006). Forming attitudes that predict future behavior: a meta-analysis of the attitude-behavior relation. Psychol Bull, 132, 778–822.
Gollwitzer, P. M. (1990). Action phases and mind-sets. In E. T. Higgins & R. M. Sorrentino (Eds.), Handbook of Motivation and Cognition: Foundations of Social Behavior, Vol. 2 (pp. 53–92). New York: Guilford Press.
Gollwitzer, P. M. (1993). Goal achievement: the role of intentions. Eur Rev Soc Psychol, 4, 142–185.
Gollwitzer, P., and Sheeran, P. (2006). Implementation intentions and goal achievement: a meta-analysis of effects and processes. Adv Exp Soc Psychol, 38, 69–119.
Haefner, D. P., and Kirscht, J. P. (1970). Motivational and behavioural effects of modifying health beliefs. Public Health Rep, 85, 478–484.
Harrison, J. A., Mullen, P. D., and Green, L. W. (1992). A meta-analysis of studies of the health belief model with adults. Health Educ Res, 7, 107–116.
Hay, J. L., Ford, J. S., Klein, D., Primavera, L. H., Buckley, T. R., Stein, T. R., Shike, M., and Ostroff, J. S. (2003). Adherence to colorectal cancer screening in mammography-adherent older women. J Behav Med, 26, 553–576.
Hill, C., Abraham, C., and Wright, D. (2007). Can theory-based messages in combination with cognitive prompts promote exercise in classroom settings? Soc Sci Med, 65, 1049–1058.
Hochbaum, G. M. (1958). Public Participation in Medical Screening Programs: A Socio-psychological Study. Public Health Service Publication No 572. Washington, DC: United States Government Printing Office.
Janz, N. K., and Becker, M. H. (1984). The health belief model: a decade later. Health Educ Q, 11, 1–47.
Jones, P. K., Jones, S. L., and Katz, J. (1987). Improving compliance for asthmatic patients visiting the emergency department using a health belief model intervention. J Asthma, 24, 199–206.
Lawton, R., Conner, M., and McEachan, R. (2009). Desire or reason: predicting health behaviors from affective and cognitive attitudes. Health Psychol, 28, 56–65.
Lawton, R., Conner, M., and Parker, D. (2007). Beyond cognition: predicting health risk behaviors from instrumental and affective beliefs. Health Psychol, 26, 259–267.
Loewenstein, G. (1996). Out of control: visceral influences on behavior. Organiz Behav Hum Dec Proc, 65, 272–292.
Littell, J. H., and Girvin, H. (2002). Stages of change. A critique. Behav Modif, 26, 223–273.
Luszczynska, A., and Schwarzer, R. (2005). Social cognitive theory. In M. Conner & P. Norman (Eds.), Predicting Health Behaviour: Research and Practice with Social Cognition Models, 2nd Ed (pp. 127–169). Maidenhead: Open University Press.
Maddux, J. E., and Rogers, R. W. (1983). Protection motivation and self-efficacy: a revised theory of fear appeals and attitude change. J Exp Social Psychol, 19, 469–479.
Milne, S., Sheeran, P., and Orbell, S. (2000). Prediction and intervention in health-related behavior: a meta-analytic review of protection motivation theory. J Appl Soc Psychol, 30, 106–143.
Milne, S., Orbell, S., and Sheeran, P. (2002). Combining motivational and volitional interventions to promote exercise participation: protection motivation theory and implementation intentions. Br J Health Psychol, 7, 163–184.
Norman, P., Boer, H., and Seydel, E. R. (2005). Protection motivation theory. In M. Conner & P. Norman (Eds.), Predicting Health Behaviour: Research and Practice with Social Cognition Models, 2nd Ed (pp. 81–126). Maidenhead: Open University Press.
Orbell, S., Crombie, I., and Johnston, G. (1995). Social cognition and social structure in the prediction of cervical screening uptake. Br J Health Psychol, 1, 35–50.
Prochaska, J. O., and DiClemente, C. C. (1984). The Transtheoretical Approach: Crossing Traditional Boundaries of Therapy. Homewood, IL: Dow Jones Irwin.
Rosenstock, I. M., Strecher, V. J., and Becker, M. H. (1988). Social learning theory and the health belief model. Health Educ Q, 15, 175–183.
Sandberg, T., and Conner, M. (2008). Anticipated regret as an additional predictor in the theory of planned behaviour: a meta-analysis. Br J Soc Psychol, 47, 589–606.
Sheeran, P., and Abraham, C. (2003). Mediator of moderators: temporal stability of intention and the intention-behavior relationship. Pers Soc Psychol Bull, 29, 205–215.
Sheeran, P., Milne, S., Webb, T. L., and Gollwitzer, P. M. (2005). Implementation intentions and health behaviours. In M. Conner & P. Norman (Eds.), Predicting Health Behaviour: Research and Practice with Social Cognition Models, 2nd Ed (pp. 276–323). Maidenhead: Open University Press.
Sutton, S. (2000). A critical review of the transtheoretical model applied to smoking cessation. In P. Norman, C. Abraham, & M. Conner (Eds.), Understanding and Changing Health Behaviour: From Health Beliefs to Self-Regulation (pp. 207–225). Reading, England: Harwood Academic Press.
Sutton, S. (2005). Stage models of health behaviour. In M. Conner & P. Norman (Eds.), Predicting Health Behaviour: Research and Practice with Social Cognition Models, 2nd Ed (pp. 223–275). Maidenhead: Open University Press.
Turk, D. C., and Salovey, P. (1986). Clinical information processing: bias inoculation. In R. E. Ingham (Ed.), Information Processing Approaches to Clinical Psychology (pp. 305–323). New York: Academic Press.
Webb, T. L., and Sheeran, P. (2006). Does changing behavioral intentions engender behavior change? A meta-analysis of the experimental evidence. Psychol Bull, 132, 249–268.
West, R. (2005). Time for a change: putting the transtheoretical (stages of change) model to rest. Addiction, 100, 1036–1039.
Chapter 3
Assessment of Physical Activity in Research and Clinical Practice
Lephuong Ong and James A. Blumenthal
1 Introduction

It is well established that physical activity is associated with significant physical and mental health benefits including increased longevity (Camacho et al, 1991; Leon et al, 1987; Paffenbarger et al, 1986; Powell et al, 1987). Physical inactivity, on the other hand, is associated with adverse health consequences and has been identified as a modifiable behavioral risk factor for mortality and diseases of lifestyle, such as cardiovascular disease, cancer, and diabetes mellitus (see Lee, 2003; Warburton et al, 2006). These data have prompted an increased interest in promoting physical activity, which requires accurate and objective quantification of activity. Because the validity of these associations rests upon the utilization of valid and reliable assessments of physical activity, precise measurements of physical activity are required to improve our understanding of the impact of physical activity on health outcomes and to provide a metric to evaluate the efficacy of clinical interventions designed to promote health and physical activity.
J.A. Blumenthal, Department of Psychiatry and Behavioral Sciences, Duke University Medical Center, Box 3119, Durham, NC 27710, USA
2 Physical Activity and Health Outcomes

2.1 All-Cause and CHD-Related Mortality

Epidemiologic studies have consistently identified an association between physical inactivity and a variety of poor health outcomes, ranging from cancer, heart disease, and osteoarthritis to all-cause mortality. In one of the earliest studies, Morris and colleagues (Morris and Heady, 1953; Morris et al, 1953) examined mortality data from the London Transport Executive between 1949 and 1952 and reported a lower total incidence of initial coronary episodes and cardiac-related deaths among middle-aged males engaged in more physically active occupations (e.g., postmen and bus conductors) compared to those in less active occupations (e.g., telephone operators and bus drivers; Morris et al, 1953). When cardiac-related mortality was examined for other occupations, a similar pattern of findings emerged, such that males performing “heavy” work (e.g., coal workers, laborers) had lower mortality rates relative to males performing “light” work (e.g., hairdressers, textile workers) (Morris et al, 1953). A trend for increased mortality due to lung cancers, appendicitis, prostate disease, duodenal ulcers, diabetes, and liver cirrhosis in middle-aged males performing light work as compared to heavy work was also found (Morris and Heady, 1953). This relationship between poorer health outcomes and lower
A. Steptoe (ed.), Handbook of Behavioral Medicine, DOI 10.1007/978-0-387-09488-5_3, © Springer Science+Business Media, LLC 2010
L. Ong and J.A. Blumenthal
physical activity was prospectively related to decreased risk of mortality in men who engaged in vigorous physical activities (i.e., ≥ 6 times the resting metabolic rate [MET]; Lee et al, 1995). In a follow-up study, Lee and Paffenbarger (2000) reported that vigorous physical activity conferred the greatest benefit in terms of reduced mortality, moderate physical activity was found to be somewhat beneficial, and light physical activity conferred no benefit. Although there have been fewer studies in females, available data suggest a similar pattern. For example, in the Nurses’ Health Study, in which 116,564 initially healthy, middle-aged women were followed for 24 years, physical inactivity ( 0.90), and satisfactory test–retest reliabilities of 0.71 over 1 week for unhealthy snacking (e.g., Verplanken, 2006) and 0.87 over 1 month for exercising (Verplanken and Melkevik, 2008) have been obtained. Importantly, Conner et al (2007) showed that the SRHI moderated the relationships between implicit measures of attitude and behavior, while no moderation was found in the relationship between explicit measures and behavior. These results validate the relationship between the SRHI and automaticity.
5.7 Conclusions

Which is the best measure? First, the availability of a set of different habit measures should be celebrated as an important step forward (Ajzen and Fishbein, 2005). The conceptual problem of the one-item self-reported past behavioral frequency measure (i.e., the fact that frequency is a necessary but not sufficient feature of habit), together with potential reliability problems, renders it an inadequate measure of habit. The combined one-item self-reported frequency and self-reported habit measure should not be used because it is double-barrelled. The habit-as-reason measure awaits further testing and validation. As for the other measures, each seems to capture some unique aspect of habit. Selecting the best alternative measure depends on the researcher’s goal and the type of behavior under study. Different measures may also be used in conjunction with each other. The context-focused habit measure captures an important situational aspect of habitual behavior, i.e., context stability, in addition to past behavioral frequency. The response frequency measure (if properly applied) focuses on habits that are executed in multiple-choice contexts. The SRHI captures the experience of both frequency and automaticity and seems the most solid measure in terms of reliability and validity. In addition, this measure is generic and thus needs no adaptation or pilot testing for each new domain and can easily be used in questionnaires.
B. Verplanken
6 General Conclusions

Since the decline of behaviorism, habit has long been a forgotten concept in the social and behavioral sciences. This is so even though many unhealthy behaviors are strongly habitual and we would like to see healthy behaviors become habitual. The focus on deliberative thinking and motivated behavior, as represented by the prevalent socio-cognitive models, may now be supplemented by the notion that these factors may wear off over time and be replaced by the more automatic and context-driven powers of habit (Dawes, 1998). The habit concept has much to offer to those who want to understand why people behave unhealthily, or why it remains such a challenge to establish healthier lifestyles. Researchers now have a choice of instruments at their disposal for measuring and monitoring habit strength. In all, habit theory seems a valuable contribution to the behavioral medicine field.
References

Aarts, H. (1996). Habit and Decision Making: The Case of Travel Mode Choice. Unpublished doctoral dissertation, University of Nijmegen, The Netherlands.
Aarts, H., and Dijksterhuis, A. (2000). Habits as knowledge structures: automaticity in goal-directed behavior. J Pers Soc Psychol, 78, 53–63.
Aarts, H., Verplanken, B., and van Knippenberg, A. (1997). Habit and information use in travel mode choices. Acta Psychol, 96, 1–14.
Aarts, H., Verplanken, B., and van Knippenberg, A. (1998). Predicting behavior from actions in the past: repeated decision-making or a matter of habit? J Appl Soc Psychol, 28, 1355–1374.
Ajzen, I. (1991). The theory of planned behavior. Organ Behav Hum Decis Process, 50, 179–211.
Ajzen, I. (2002). Residual effects of past on later behavior: habituation and reasoned action perspectives. Pers Soc Psychol Rev, 6, 107–122.
Ajzen, I., and Fishbein, M. (2005). The influence of attitudes on behavior. In D. Albarracín, B.T. Johnson, & M.P. Zanna (Eds.), The Handbook of Attitudes (pp. 173–221). Mahwah, NJ: Erlbaum.
Albarracín, D., Johnson, B. T., Fishbein, M., and Muellerleile, P. A. (2001). Theories of reasoned action and planned behavior as models of condom use: a meta-analysis. Psychol Bull, 127, 142–161.
6
By Force of Habit
Bamberg, S. (2006). Is a residential relocation a good opportunity to change people’s travel behavior? Results from a theory-driven intervention study. Environ Behav, 38, 820–840.
Bargh, J. A. (1994). The four horsemen of automaticity: awareness, intention, efficiency, and control in social cognition. In: R. S. Wyer & T. K. Srull (Eds.), Handbook of Social Cognition, vol. 1 (pp. 1–40). Hillsdale, NJ: Erlbaum.
Brug, J., de Vet, E., Wind, M., de Nooijer, J., and Verplanken, B. (2006). Predicting fruit consumption: cognitions, intention, and habits. J Nutr Educ Behav, 38, 73–81.
Chatzisarantis, N. L., and Hagger, M. S. (2007). Mindfulness and the intention-behavior relationship within the theory of planned behavior. Pers Soc Psychol Bull, 33, 663–676.
Conner, M. T., Perugini, M., O’Gorman, R., Ayres, K., and Prestwich, A. (2007). Relations between implicit and explicit measures of attitude and behavior: evidence of moderation by individual difference variables. Pers Soc Psychol Bull, 33, 1727–1740.
Danner, U., Aarts, H., and de Vries, N. K. (2008). Habit vs. intention in the prediction of future behaviour: the role of frequency, context stability and mental accessibility of past behaviour. Br J Soc Psychol, 47, 245–265.
Dawes, R. M. (1998). Behavioral decision making and judgment. In: D. T. Gilbert, S. T. Fiske, & G. Lindzey (Eds.), The Handbook of Social Psychology, 4th Ed (pp. 497–548). Boston: McGraw-Hill.
de Bruijn, G.-J., Kremers, S., de Vet, E., de Nooijer, J., van Mechelen, W., and Brug, J. (2007). Does habit strength moderate the intention-behaviour relationship in the Theory of Planned Behaviour? The case of fruit consumption. Psychol Health, 22, 899–916.
Eagly, A. H., and Chaiken, S. (1993). The Psychology of Attitudes. Fort Worth, TX: Harcourt Brace Jovanovich.
Fazio, R. H., Ledbetter, J. E., and Towles-Schwen, T. (2000). On the costs of accessible attitudes: detecting that the attitude object has changed. J Pers Soc Psychol, 78, 197–210.
Ferguson, E., and Bibby, P. A. (2002). Predicting future blood donor returns: past behavior, intentions, and observer effects. Health Psychol, 21, 513–518.
Hinsz, V. B., Nickell, G. S., and Park, E. S. (2007). The role of work habits in the motivation of food safety behaviors. J Exp Psychol, 13, 105–114.
Honkanen, P., Olsen, S. O., and Verplanken, B. (2005). Intention to consume seafood: the importance of habit strength. Appetite, 45, 161–168.
Hull, C. L. (1943). Principles of Behaviour: An Introduction to Behaviour Theory. New York: Appleton-Century-Crofts.
Janz, N. K., and Becker, M. H. (1984). The health belief model: a decade later. Health Educ Q, 11, 1–47.
Ji, M. F., and Wood, W. (2007). Purchase and consumption habits: not necessarily what you intend. J Consum Psychol, 17, 261–276.
Knussen, C., and Yule, F. (2008). “I’m not in the habit of recycling”: the role of habitual behavior in the disposal of household waste. Environ Behav, 40, 683–702.
Knussen, C., Yule, F., Mackenzie, J., and Wells, M. (2004). An analysis of intentions to recycle household waste: the roles of past behaviour, perceived habit, and perceived lack of facilities. J Environ Psychol, 24, 237–246.
Kremers, S. P., van der Horst, K., and Brug, J. (2007). Adolescent screen-viewing behaviour is associated with consumption of sugar-sweetened beverages: the role of habit strength and perceived parental norms. Appetite, 48, 345–350.
Lally, P. J. (2007). Habitual Behavior and Weight Control. Unpublished doctoral dissertation. University College London.
Lally, P., van Jaarsveld, C.H.M., Potts, H.W.W., and Wardle, J. (2010). How are habits formed: modelling habit formation in the real world. Eur J Soc Psychol (in press).
Lintvedt, O. K., Sørensen, K., Østvik, A. R., Verplanken, B., and Wang, C. E. (2008). The need for web-based cognitive behaviour therapy among university students. J Tech Hum Serv, 26, 239–258.
Mittal, B. (1988). Achieving higher seat belt usage: the role of habit in bridging the attitude-behavior gap. J Appl Soc Psychol, 18, 993–1016.
Ouellette, J. A., and Wood, W. (1998). Habit and intention in everyday life: the multiple processes by which past behavior predicts future behavior. Psychol Bull, 124, 54–74.
Reckwitz, A. (2002). Toward a theory of social practices. A development in culturalist theorizing. Eur J Soc Theor, 5, 243–263.
Rogers, R. W., and Mewborn, C. R. (1976). Fear appeals and attitude change: effects of anxiousness, probability of occurrence, and the efficiency of coping responses. J Pers Soc Psychol, 34, 54–61.
Ronis, D. L., Yates, J. F., and Kirscht, J. P. (1989). Attitudes, decisions, and habits as determinants of repeated behavior. In: A. R. Pratkanis, S. J. Breckler, & A. G. Greenwald (Eds.), Attitude Structure and Function (pp. 213–239). Hillsdale, NJ: Erlbaum.
Sheeran, P., Aarts, H., Custers, L., Rivis, A., Webb, T. L., and Cooke, R. (2005). The goal-dependent automaticity of drinking habits. Br J Soc Psychol, 44, 47–63.
Thompson, J. K., and Smolak, L. (Eds.) (2001). Body Image, Eating Disorders, and Obesity in Youth: Assessment, Prevention, and Treatment. Washington, DC: American Psychological Association.
Triandis, H. C. (1980). Values, attitudes, and interpersonal behavior. In: H. E. Howe, Jr. & M. M. Page (Eds.), Nebraska Symposium on Motivation, 1979 (pp. 195–259). Lincoln, NE: University of Nebraska Press.
Vallacher, R. R., and Wegner, D. M. (1987). What do people think they’re doing? Action identification and human behavior. Psychol Rev, 94, 3–15.
Verplanken, B. (2004). Value congruence and job satisfaction among nurses: a human relations perspective. Int J Nurs Stud, 599–605.
Verplanken, B. (2006). Beyond frequency: habit as mental construct. Br J Soc Psychol, 45, 639–656.
Verplanken, B., and Aarts, H. (1999). Habit, attitude, and planned behaviour: is habit an empty construct or an interesting case of automaticity? Eur Rev Soc Psychol, 10, 101–134.
Verplanken, B., Aarts, H., and van Knippenberg, A. (1997). Habit, information acquisition, and the process of making travel mode choices. Eur J Soc Psychol, 27, 539–560.
Verplanken, B., Aarts, H., van Knippenberg, A., and Moonen, A. (1998). Habit versus planned behaviour: a field experiment. Br J Soc Psychol, 37, 111–128.
Verplanken, B., Aarts, H., van Knippenberg, A., and van Knippenberg, C. (1994). Attitude versus general habit: antecedents of travel mode choice. J Appl Soc Psychol, 24, 285–300.
Verplanken, B., Friborg, O., Wang, C. E., Trafimow, D., and Woolf, K. (2007). Mental habits: metacognitive reflection on negative self-thinking. J Pers Soc Psychol, 92, 526–541.
Verplanken, B., Herabadi, A. G., Perry, J. A., and Silvera, D. H. (2005). Consumer style and health: the role of impulsive buying in unhealthy eating. Psychol Health, 20, 429–441.
Verplanken, B., and Melkevik, O. (2008). Predicting habit: the case of physical exercise. Psychol Sport Exerc, 9, 15–26.
Verplanken, B., and Orbell, S. (2003). Reflections on past behavior: a self-report index of habit strength. J Appl Soc Psychol, 33, 1313–1330.
Verplanken, B., and Tangelder, Y. (2010). No body is perfect: The significance of habitual negative thinking about appearance for body dissatisfaction, eating disorder propensity, self-esteem, and snacking. Psychology and Health, in press.
Verplanken, B., and Velsvik, R. (2008). Habitual negative body image thinking as psychological risk factor in adolescents. Body Image, 5, 133–140.
Verplanken, B., Walker, I., Davis, A., and Jurasek, M. (2008). Context change and travel mode choice: combining the habit discontinuity and self-activation hypotheses. J Environ Psychol, 9, 15–26.
Verplanken, B., and Wood, W. (2006). Interventions to break and create consumer habits. J Publ Pol Market, 25, 90–103.
Watkins, E. R. (2008). Constructive and unconstructive repetitive thought. Psychol Bull, 134, 163–206.
Wittenbraker, J., Gibbs, B. L., and Kahle, L. R. (1983). Seat belt attitudes, habits, and behaviors: an adaptive amendment to the Fishbein model. J Appl Soc Psychol, 13, 406–421.
Wood, W., and Neal, D. T. (2007). A new look at habits and the habit-goal interface. Psychol Rev, 114, 843–863.
Wood, W., Quinn, J. M., and Kashy, D. A. (2002). Habits in everyday life: thought, emotion, and action. J Pers Soc Psychol, 83, 1281–1297.
Wood, W., Tam, L., and Guerrero Witt, M. (2005). Changing circumstances, disrupting habits. J Pers Soc Psychol, 88, 918–933.
Chapter 7
Adherence to Medical Advice: Processes and Measurement

Jacqueline Dunbar-Jacob, Martin P. Houze, Cameron Kramer, Faith Luyster, and Maura McCall
J. Dunbar-Jacob, University of Pittsburgh, 350 Victoria Building, 3500 Victoria St, Pittsburgh, PA 15261, USA; e-mail: [email protected]

1 Introduction

Traditionally, adherence has referred to the percentage of a prescribed or recommended regimen that is carried out, historically by patients and more recently by providers (Haynes, 1979). The definition is nonjudgmental and does not imply responsibility. The value of knowing adherence rates lies in the ability to assess the effectiveness of treatment, whether in the evaluation of new treatments or in the development of effective treatment for the individual.

A review of the research over the past 35 years suggests that adherence has been viewed in a global manner, with an emphasis on identifying patient characteristics that may influence treatment behavior. Data have historically shown that adherence rates across regimens hover around 50% for both patients and providers (Baumhakel et al, 2009; Claxton et al, 2001; Dunbar-Jacob et al, 2000; Thier et al, 2008). Prediction has been difficult, as the same characteristics examined in different studies show varying degrees of influence on the level of adherence, and many studies have focused on a limited number of characteristics (Baiardini et al, 2009; Stilley et al, 2004). Further confusing the picture, different studies both measure and define adherence in different ways. Thus, the behavioral processes underlying adherence and the related measurement strategies become important considerations for advancing the understanding of adherence and, ultimately, the prevention and remediation of poor adherence.

Any examination of adherence needs to consider the multiple steps from prescription to action and to consider these steps in refining the definition of adherence. First, of course, is the clarity and completeness of the prescription and related instruction. Second is the capability of the patient to carry out the instruction. Third is the availability of the resources needed to carry out the instruction. Fourth is the motivation to adhere to the prescription in part or in whole. Fifth and last is the system to support continued adherence, e.g., cues, self-monitoring, and feedback. Most commonly, adherence studies have focused upon motivational factors with little attention to these other key elements.

Any examination of adherence also needs to consider the patient's decision making (Bieber et al, 2006; Loh et al, 2007). First, the patient must decide whether to accept the recommended treatment. If treatment is accepted, then the patient must decide whether to initiate the treatment. If the patient decides to initiate the treatment, then she/he must determine whether the value of the treatment offsets any negative consequences of following it. If the patient decides to pursue the treatment, then the decision is whether to make it an integral part of daily habits. And finally, the patient must decide whether to persist when problems occur.
A. Steptoe (ed.), Handbook of Behavioral Medicine, DOI 10.1007/978-0-387-09488-5_7, © Springer Science+Business Media, LLC 2010
A further consideration is whether patients know the state of their own adherence. Considerable research suggests that there is a poor relationship between patient self-report of adherence and adherence assessed through more direct mechanisms (Dunbar-Jacob et al, 2000; Wagner and Rabkin, 2000). While a portion of this may reflect a reluctance to report poor adherence to the provider (Sankar et al, 2007), it is likely that memory is a major factor in this discrepancy. To report adherence accurately, the patient must be able to recall and summarize their behavior over the period between provider visits, often as long as 6 months to a year. For the patient who has persisted with the regimen at some level over time, the regimen becomes habitual but not necessarily accurate. Such habitual behaviors become less salient and become part of a more general memory, making discrete events more difficult to recall (Barnhofer et al, 2005; McPherson, 2001; Warnecke et al, 1997). Cramer and colleagues (1990) showed that adherence was higher in the 5 days before (88%) and after (86%) contact with the provider than at 1 month postvisit (67%). Hence it is reasonable to assume that many patients are recalling their most recent behavior rather than summarizing across time. Thus, measures may differ in accuracy depending upon the variability of behavior and the length of time over which the patient is assessed.

Also of importance in the processes surrounding adherence is the quality of the communication that occurs within relationships (van Dulmen et al, 2008). Communication between the patient and the providers, communication between providers, communication within the interdisciplinary treatment team, and communication between inpatient and outpatient teams are all important to subsequent patient adherence. Problems in communication may further erode trust in the advice offered by the providers (Kerse et al, 2004; Thom et al, 2001).
An assessment of conflicting recommendations or instruction may be important to the determination of whether adherence behavior represents poor adherence or selection among conflicting or suspect advice. This also appears in the adherence of providers to guideline recommendations when multiple guidelines from different agencies are not consistent (Lewiecki, 2005).
2 Classification of Adherence

The multiple steps that the client or patient takes and the process through which the regimen is recommended lead to multiple points at which the patient may encounter errors or need to make decisions. At each of these points adherence may become a problem. Each point may suggest a different definition or method of assessment.
2.1 Acceptance of the Regimen

The first step is the period in which the regimen is initially presented, and the patient makes a decision about whether to follow it. One area of consideration is readiness to change. Studies examining readiness to adopt a regimen have varying results in predicting subsequent adherence (Aloia et al, 2005). Many factors may go into a patient's willingness to accept a regimen, such as the patient's preferences for type of treatment, the trust that the patient has in the provider, the level of burden imposed, the patient's beliefs about the illness or the treatment, the satisfaction with care, the consistency of the advice with previous advice or knowledge, and a host of other factors. It is at this step that negotiating a mutually satisfactory treatment may influence whether the patient adheres to the recommendation or not.
2.2 Adoption of the Regimen

Patients may agree to the regimen, or at least not object to it, but fail to initiate treatment. For example, persons with hypertension who had at least two clinical encounters filled between 66 and 84% of new antihypertensive prescriptions (Shah et al, 2009). In the same practice, 85% of new diabetes prescriptions were filled (Shah et al, 2008). For patients recently discharged from hospital after a myocardial infarction, 77% of discharge prescriptions were filled within 7 days (Jackevicius et al, 2006). In this situation, closed health-care systems may detect failure to fill through close monitoring of pharmacy fills. But in open systems, where patients may use any number of pharmacies, failure to adopt the regimen is unlikely to be detected until the next health-care visit, perhaps as long as 6–12 months after the prescription is written. It is unknown how many persons take the first step in behavioral interventions. Many factors may influence the patient's adoption of treatment, including those noted above combined with a reluctance to question or challenge the provider. Other factors may include barriers to obtaining the prescription such as cost, accessibility, and availability.
2.3 Initiation of the Regimen

Even though the patient acquires the treatment or its resources, the regimen may not be initiated at all or may be discontinued after a brief exposure. Indeed, the first 6 months on treatment show significant withdrawal from treatment (Perreault et al, 2005; Donnelly et al, 2008; Chapman et al, 2005). Data show that 50% or more of patients may terminate treatment by this point (Chapman et al, 2005; Rutledge et al, 1999; Newman et al, 2004). The factors that predict early termination of treatment are not clear. Hypotheses are directed toward the impact of side effects, financial concerns, or difficulty in carrying out the regimen.

2.4 Treatment Continuation

For those patients who continue treatment beyond the 6-month period, several adherence patterns emerge. These patients may constitute as many as 50% of those who begin treatment. The series of figures below displays the variable patterns of adherence found in patients on medication for chronic disease who were monitored with the AARDEX Medication Event Monitoring System. Each of these patients had been on treatment for 1 year or longer before monitoring was initiated. For a portion of persons, adherence remains high and stable, though not necessarily perfect, over time (see Fig. 7.1).

Fig. 7.1 Good adherer to once-a-day regimen
Other patients may demonstrate a persistently low adherence or a decline over time, as can be seen in Fig. 7.2. The majority of the patients in this group, however, demonstrate variable levels of adherence over time, showing a combination of missed doses (or episodes), double doses, and mistimed doses. These variable patterns are difficult to detect with the majority of measures of adherence. See Fig. 7.3 for a view of a variable pattern of adherence for a twice-a-day medication.

Fig. 7.2 Poor adherer to once-a-day regimen

Fig. 7.3 Variable adherence to twice-a-day regimen

Thus, adherence can be classified at several points, depending upon the outcome of interest: acceptance of the regimen (yes/no), initiation of the regimen (yes/no), and continuation or persistence with the regimen at varying levels of adherence.
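The classification points described in this section can be sketched as a simple decision sequence. This is only an illustration: the field names (`accepted`, `filled`, `started`, `still_on_treatment`) and the stage labels are hypothetical and do not come from any instrument cited in the chapter.

```python
from dataclasses import dataclass
from enum import Enum


class Stage(Enum):
    NOT_ACCEPTED = "regimen not accepted"
    NOT_ADOPTED = "accepted, but treatment not obtained"
    NOT_INITIATED = "obtained, but never started"
    DISCONTINUED = "started, then stopped"
    CONTINUING = "continuing treatment"


@dataclass
class PatientRecord:
    # Hypothetical yes/no indicators for each step of the process.
    accepted: bool
    filled: bool = False
    started: bool = False
    still_on_treatment: bool = False


def classify(record: PatientRecord) -> Stage:
    """Return the first point in the process at which adherence breaks down."""
    if not record.accepted:
        return Stage.NOT_ACCEPTED
    if not record.filled:
        return Stage.NOT_ADOPTED
    if not record.started:
        return Stage.NOT_INITIATED
    if not record.still_on_treatment:
        return Stage.DISCONTINUED
    return Stage.CONTINUING


print(classify(PatientRecord(accepted=True, filled=True)).value)
```

Only patients classified as continuing would then be described by a level or pattern of adherence; the earlier stages are yes/no outcomes.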
3 Defining Adherence

Before measuring adherence it is important to define clearly what adherence is and which step in the process is of interest. Ideally, adherence would be defined as the proportion of the prescription or regimen required to create the desired clinical outcome. Haynes (1979) did so in the first randomized controlled trial (RCT) of adherence improvement. He examined adherence to antihypertensives and identified the average adherence (by home pill count) needed to obtain a diastolic blood pressure level of less than 90 mmHg (Haynes et al, 1976). Adherence was determined to be pill counts greater than 80%. Similarly, in the Lipid Research Clinics Coronary Primary Prevention Trial (LRC-CPPT), research subjects were prescribed six packets of cholestyramine per day, designed to achieve a 20% reduction in low-density lipoprotein cholesterol (LDL). At the end of the trial, it was found that 70% adherence led to a >20% reduction in LDL (Schaefer et al, 1994). Blagden and Chipperfield (2007) examined LDL cholesterol level changes with atorvastatin vs. atorvastatin plus ezetimibe. With 70% or greater adherence in their sample, monitored via pill counts, the LDL levels decreased by 36.5 and 50.5%, respectively. In contrast, such levels of adherence are not effective in reducing viral load in HIV patients. Recommendations for effective treatment of HIV are to achieve adherence levels close to 100% (www.apha.org/ppp/hiv).

Unfortunately, data are not readily available on other drugs to establish an optimal adherence level. Even less is known about optimal adherence for nonpharmacological interventions. For patients on multiple treatments, the common pattern for those with chronic disease, the picture of optimal adherence is even more confusing. Further confounding the picture is the problem of medication which may have positive effects on one clinical parameter but negative effects on another. For example, Ames (1986) studied and reviewed studies of diuretics used in the treatment of hypertension to reduce high blood pressure levels and found that the hypertension treatment may raise various serum cholesterol measures by 4–56%; however, Ott and colleagues (2003) conducted an RCT in elderly patients that did not find such differences. Similarly, anti-hyperglycemic agents may lower blood glucose levels and glycated hemoglobin (HbA1c), but may lead to an increase in serum cholesterol (Gershberg et al, 1968). Thus, attempts to establish an optimal adherence to a cholesterol-lowering regimen may potentially be confounded by adherence to the concurrent hypertension or diabetes regimen.

The answer to this dilemma has been to adopt a behavioral definition that takes a proportion of the regimen taken as the standard, typically about 80%. Alternatives to this have been to use unique definitions or qualitative definitions (good vs. poor with no numeric referent) or to fail to provide any definition at all. These variations in defining adherence impair the ability to perform adequate meta-analyses or systematic reviews of the magnitude of the problem or to evaluate the effectiveness of adherence interventions. At a minimum, the provision of numeric definitions of adherence or cut points for classification is more informative. While it may not be clinically useful to set one behavioral standard across regimens or conditions, it is useful for comparison and summarization of adherence across populations.
4 Measurement of Adherence

4.1 Numeric Assessment of Adherence

The definition of adherence initially posed by Haynes and colleagues (1979), the percent or portion of the regimen carried out, suggests a numeric definition of adherence and the ability to count doses prescribed and taken. Four methods of assessment permit this, each with advantages and disadvantages. These include electronic monitoring, pill count, daily diary, and patient recall.
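A numeric definition of this kind reduces to a ratio and, when a classification is wanted, a cut point. The sketch below assumes the commonly used 80% cut point as a default; the function names are illustrative, and the 95% value in the usage lines is a hypothetical stand-in for a regimen that requires adherence close to 100%.

```python
def percent_adherence(doses_taken: int, doses_prescribed: int) -> float:
    """Percent of the prescribed regimen carried out."""
    if doses_prescribed <= 0:
        raise ValueError("doses_prescribed must be positive")
    return 100.0 * doses_taken / doses_prescribed


def is_adherent(percent: float, cut_point: float = 80.0) -> bool:
    """Classify against a numeric cut point (assumed inclusive here)."""
    return percent >= cut_point


rate = percent_adherence(doses_taken=27, doses_prescribed=30)
print(rate)                               # 90.0
print(is_adherent(rate))                  # True at the 80% cut point
print(is_adherent(rate, cut_point=95.0))  # False at a stricter, hypothetical cut point
```

Reporting both the percentage and the cut point used makes results comparable across studies even when the clinically appropriate threshold differs by regimen.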
4.1.1 Electronic Monitoring

Electronic event monitoring (EEM) has been used increasingly over the past two decades for adherence to medication and to exercise regimens. In these cases the monitor itself is connected with the regimen and requires only passive participation on the part of the patient. The most commonly used EEM for medication consists of a microprocessor inserted in a medication bottle cap, which is activated by opening or closing of the cap (AARDEX MEMS). The date and time of opening (and subsequently closing) the cap are recorded on the microprocessor. Thus, it is possible to monitor the number of doses accessed, as well as the timing of doses. The interval between doses may be important for drug efficacy. Errors in timing (or intervals between doses) have been found to be the error of greatest magnitude in medication adherence (Claxton et al, 2001). Data may be monitored for short or long intervals.

Electronic technologies are also used in the assessment of adherence to lifestyle interventions, exercise, dietary behavior, and treatment of sleep apnea. Pedometers, accelerometers, and heart rate monitors allow freedom for the person exercising and are often used in research to measure physical activity (see Chapter 3). Studies have shown the reliability of these devices and often describe them as relatively inexpensive and simple to operate (Baker and Mutrie, 2005; Evangelista et al, 2005; Wilbur et al, 2001). Pedometers sense body motion and count footsteps. They are usually considered accurate if worn correctly and stride distance is predetermined. Pedometer readouts should be checked against known measurements like distance and time. Accelerometers are motion sensors that can detect changes in acceleration. Heart rate monitors usually consist of a monitoring box with an electrode strapped to the chest, which transmits the data to a watch-like receiver on the wrist that records heart rate over time. Dietary behavior also can be monitored through electronic diaries.
Although current technologies such as personal digital assistants (PDAs) can provide dietary data in real time, all methods for collecting dietary data have inherent problems for monitoring adherence to diet recommendations (Glanz and Murphy, 2007). Similarly, in the management of sleep disorders, continuous positive airway pressure (CPAP) devices use smart cards, modems, or web-based methods to convey data regarding the nightly duration of therapy at effective pressure and patterns of use. CPAP adherence typically is defined as ≥4 h of use for 70% of days. However, a standard definition of CPAP adherence has not been established.

With electronic monitoring, adherence is calculated as the number of presumptive events divided by the number of dosing events prescribed within the monitored time interval. The determination of what constitutes "good" adherence is left to the investigator or clinician. When gaps appear in dosing or a cessation of recorded events occurs, a concurrent interview is necessary to determine whether the patient utilized the monitor or whether the patient was hospitalized or otherwise had a change in either prescription or circumstances. The ultimate calculation of adherence permits the determination of the percent of doses or events, the percent of days on which the patient was adherent, as well as the percent of doses occurring within the scheduled interval. For exercise, intensity and duration can also be observed, and duration of CPAP use monitored. Additionally, information on "drug holidays," periods of time off, as well as patterns of adherence may be viewed.

There is evidence that these monitors can themselves stimulate behavior change (Baker and Mutrie, 2005; Deschamps et al, 2006). The use of pedometers alone can increase reported motivation to exercise as well as self-reported physical activity and actual physical activity as recorded by the pedometer, at least in the short term (Baker and Mutrie, 2005).
In this particular study of intervention groups using the transtheoretical model to increase step count, significant reported increases in motivation and activity were only seen in the group using the pedometer.
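Returning to the CPAP criterion mentioned in this section (≥4 h of use on 70% of days), a device download of nightly hours can be checked against it as sketched below. The function names and the sample data are hypothetical, and since no standard definition has been established, both thresholds are exposed as parameters rather than fixed.

```python
def fraction_of_nights_used(nightly_hours, min_hours=4.0):
    """Fraction of monitored nights with at least min_hours of use."""
    if not nightly_hours:
        raise ValueError("at least one monitored night is required")
    qualifying = sum(1 for hours in nightly_hours if hours >= min_hours)
    return qualifying / len(nightly_hours)


def cpap_adherent(nightly_hours, min_hours=4.0, min_fraction=0.70):
    """Apply the commonly used (but not standardized) CPAP criterion."""
    return fraction_of_nights_used(nightly_hours, min_hours) >= min_fraction


# Ten hypothetical nights of use at effective pressure, in hours.
nights = [6.0, 5.5, 0.0, 4.5, 7.0, 3.0, 6.0, 4.0, 5.0, 6.5]
print(fraction_of_nights_used(nights))  # 0.8
print(cpap_adherent(nights))            # True
```

A night with no recorded use counts as a non-qualifying night here; whether such nights reflect non-use or a monitoring gap would still need to be established by interview.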
4.1.2 Pill Counts

Pill counts, one of the common measures of adherence in pharmacological clinical studies, also permit a numeric estimate of adherence. Adherence is calculated as the number of pills taken divided by the number prescribed over the interval of interest, typically between periods of dispensing. It is important to note that the pill count identifies neither patterns of adherence nor interdose intervals. A patient who misses a dose of medication and then compensates by taking an extra dose the next day, a pattern found with electronic monitoring, is not identifiable; nor is it possible to discriminate early cessation of treatment from low but relatively stable dosing, both of which result in low adherence estimates.

The pill count has been found to have a low but statistically significant correlation with EEM measures. For example, Hamilton (2003) reported correlations of 0.29–0.39, p < 0.01, for hypertensive patients. For AIDS patients, Bangsberg et al (2001) noted a correlation of 0.7, p < 0.001, between unadjusted EEM and pill counts. Pill counts typically estimate a higher adherence than EEM (Bangsberg et al, 2001; Hamilton, 2003). Therefore the choice of adherence measure depends upon what the clinician or investigator is interested in detecting. If one is interested in early changes in patterns of adherence, poor timing of medication, or information for the development of early intervention strategies, the EEM will most likely be useful. If overall adherence is of interest, then the pill count may give a reasonable estimate.
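As a minimal sketch of the pill count calculation, the number taken is presumed to be the number dispensed minus the number returned. The parameter names and the example figures are hypothetical.

```python
def pill_count_adherence(dispensed: int, returned: int,
                         doses_per_day: int, days_elapsed: int) -> float:
    """Percent adherence from a pill count between two dispensing visits."""
    if doses_per_day <= 0 or days_elapsed <= 0:
        raise ValueError("doses_per_day and days_elapsed must be positive")
    presumed_taken = dispensed - returned   # pills not returned are presumed taken
    prescribed = doses_per_day * days_elapsed
    return 100.0 * presumed_taken / prescribed


rate = pill_count_adherence(dispensed=60, returned=12,
                            doses_per_day=2, days_elapsed=28)
print(f"{rate:.1f}%")  # 85.7%
```

The single figure says nothing about when the 48 presumed doses were taken, which is why a missed dose followed by a double dose, or early cessation versus steady low dosing, cannot be distinguished.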
4.1.3 Pharmacy Refills

In a closed health system, where the provider of care and the dispensing pharmacy are fixed within the system, pharmacy refills may be used to estimate adherence. As with pill counts, the daily patterns of medication taking are not available. However, a percent adherence can be calculated by dividing the amount of medication dispensed by the number of tablets that should have been taken between refills. As long as the patient remains in the system, it is possible to detect withdrawal from treatment. Multiple methods of extracting data and estimating adherence from pharmacy fill rates are in use, and they result in several measures. Hess and colleagues (2006) identified 11 measures in an examination of pharmacy administration databases. These included "Continuous Measure of Medication Acquisition (CMA); Continuous Multiple Interval Measure of Oversupply (CMOS); Medication Possession Ratio (MPR); Medication Refill Adherence (MRA); Continuous Measure of Medication Gaps (CMG); Continuous, Single Interval Measure of Medication Acquisition (CSA); Proportion of Days Covered (PDC); Refill Compliance Rate (RCR); Medication Possession Ratio, modified (MPRm); Dates Between Fills Adherence Rate (DBR); and Compliance Rate (CR)" (Hess et al, 2006, p. 280). Calculating rates of medication adherence by each measure for participants in a weight loss trial showed adherence rates ranging from 63 to 109.7%, depending on the method of calculation. Thus, it is important to consider the procedure for calculating adherence over time from databases.
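To illustrate how the method of calculation can change the estimate, the sketch below computes two of the listed measures from the same refill record. The formulas are simplified, commonly used formulations (MPR as total days' supply over days in the period; PDC as distinct days covered over days in the period) and are not necessarily the exact definitions used by Hess and colleagues; the fill dates are hypothetical.

```python
from datetime import date, timedelta


def medication_possession_ratio(fills, start, end):
    """Total days' supply dispensed in the period / days in the period (can exceed 100%)."""
    period_days = (end - start).days + 1
    supplied = sum(supply for fill_date, supply in fills
                   if start <= fill_date <= end)
    return 100.0 * supplied / period_days


def proportion_of_days_covered(fills, start, end):
    """Distinct days with medication on hand / days in the period (never exceeds 100%).

    Simplification: overlapping supply from an early refill is not carried forward.
    """
    covered = set()
    for fill_date, supply in fills:
        for offset in range(supply):
            day = fill_date + timedelta(days=offset)
            if start <= day <= end:
                covered.add(day)
    period_days = (end - start).days + 1
    return 100.0 * len(covered) / period_days


# Hypothetical record: (fill date, days' supply) over a 90-day period.
fills = [(date(2023, 1, 1), 30), (date(2023, 1, 25), 30), (date(2023, 3, 10), 30)]
start, end = date(2023, 1, 1), date(2023, 3, 31)
print(f"MPR: {medication_possession_ratio(fills, start, end):.1f}%")  # MPR: 100.0%
print(f"PDC: {proportion_of_days_covered(fills, start, end):.1f}%")   # PDC: 84.4%
```

The early second fill and the late third fill cancel out in the possession ratio but leave a visible gap in days covered, so the same patient appears fully adherent by one measure and not by the other.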
4.1.4 Daily Diaries

Daily diaries form a third method of evaluating event data. Diaries have been used for patient reporting of treatment-related behavior for several decades. Patients or research participants are instructed to record events near to the time of occurrence to minimize forgetting. Further detail around the events may be recorded either qualitatively or as a component of the structured diary. Thus, information can be learned about the circumstances that surround errors in regimen management or successful performance. Diaries have been particularly useful in monitoring food intake and exercise. However, there have been some examples of use in medication management.
Unfortunately, studies of the accuracy of self-report indicate that the data may be problematic. In a sample of women with sedentary lifestyles participating in a home-based walking program, self-report logs indicated that the women performed 64% of the prescribed walking exercises, while the heart rate monitor data revealed that the women on average met 60% of the goal. This indicates a greater than 90% agreement (Wilbur et al, 2001). In a study comparing instrumented paper diaries with electronic diaries, however, 90% of events were reported on time, but electronic assessment indicated that actual adherence was just 11%; on 32% of days with events entered, the diary had not been opened. Thus, false reporting was high (Stone et al, 2003). This also happens in the case of dietary diary entries. Patients may neglect to complete the diary as instructed and instead complete it just before the clinic visit. The diary further depends upon the individual's recall of the foods and beverages consumed and, in some instances, the amount consumed and the nutritional and caloric content. Furthermore, the act of recording food consumption may influence the person's eating behavior, resulting in an inaccurate representation of the patient's dietary intake. Additionally, patients may censor the report of food consumption in order to be in accordance with known dietary recommendations.
4.1.5 Daily Recall

Recalls over a specific number of days may also be used to estimate percent adherence. If the patient can recall and is willing to report accurately, event data and timing can be assessed. Chesney and colleagues (2000) reported the utility of 3-day recalls in identifying HIV patients with raised viral load. Studies have shown that physical activity recall questionnaires can provide a relatively accurate account of physical activity when compared with accelerometers, with as much as 90% agreement. However, correlations between the subjective self-report data and the electronic data vary with gender, the intensity of the activity, and the weight status of the individual (Timperio et al, 2003). Lu and colleagues (2008) reported that 1-month estimates were better than 3- or 7-day recalls when compared with EEM data. However, our own research has suggested that patients with rheumatoid arthritis may have difficulty in remembering the detail of medication taking beyond 3 days. In a 7-day recall of medication taking it was common for patients to begin to report "the same" beyond the third day (unpublished data). Lee et al (2007) further reported that 24-h recalls were unrelated to pill counts and insensitive to temporal change. Thus, brief recalls may or may not correlate with concurrent clinical data. The question arises as to how much data can be reliably collected to build a picture of adherence over time.
5 Global Assessment of Adherence

Many adherence studies have used assessment strategies which lack a numeric estimate of the portion of the regimen carried out. Examples include a variety of self-report questionnaires, interviews, and clinician estimates. An examination of one measure illustrates the issues that arise when self-reported questionnaire assessment is used. The most commonly used generic adherence questionnaire is the Morisky Medication Adherence Questionnaire (MMAQ), a four-item (or, in another version, eight-item) self-report inventory used to screen for poor adherence (Morisky et al, 1986). An adherence percent is not obtained. The questionnaire yields a score of 0–4, with 0 reflecting good adherence. Studies reflect varying levels of sensitivity and usefulness. For example, Ruslami and colleagues (2008) reported that a combination of the self-report and clinical estimate detected all cases of nonadherence reported by the Medication Event Monitoring System. Yet Shalansky et al (2004) noted a considerable difference in the detection of nonadherence between the MEMS (13%) and the MMAQ (3%). It has been noted that questionnaire data for adherence may not correlate with clinical data (Södergård et al, 2006) and that the questionnaire's utility may vary across settings (van de Steeg et al, 2009). As with patient recalls, the data rely upon the accuracy of the patient's memory. And, as noted, the questionnaire yields information on neither the level of adherence over time nor the pattern of adherence.

To be meaningful in assessing adherence, the scoring and establishment of cut points would need to be considered carefully in conjunction with either more direct measures of adherence or established clinical cut points. It is also likely that the global measures will be most useful for recent periods when memory is most accurate. Analysis would only permit an estimation of the proportion of persons who recall and report problems related to adherence. Such a measure is unlikely to be useful in the assessment of adherence interventions, as the sensitivity to change is unknown and unlikely to be sufficient to detect the modest changes seen in intervention studies (Arbuthnott and Sharpe, 2009; Conn et al, 2009; Kripalani et al, 2007).
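The scoring of a four-item screening questionnaire of the kind described in this section can be sketched as follows. The sketch assumes that each item is answered yes or no and that each answer indicating a problem adds one point, which matches the 0–4 range with 0 reflecting good adherence; the item wording is not reproduced, and the screening cut point is a hypothetical parameter that would need to be set against more direct measures.

```python
def questionnaire_score(problem_reported):
    """Sum of items on which the patient reports an adherence problem (0-4)."""
    if len(problem_reported) != 4:
        raise ValueError("expected answers to exactly four items")
    return sum(1 for answer in problem_reported if answer)


def screens_positive(problem_reported, cut_point=1):
    """Flag possible poor adherence; cut_point is an assumption, not a published value."""
    return questionnaire_score(problem_reported) >= cut_point


answers = [False, True, False, False]  # hypothetical patient: one problem reported
print(questionnaire_score(answers))    # 1
print(screens_positive(answers))       # True
```

The result is a category rather than a percentage, which is why such a score cannot be converted into the proportion of the regimen carried out.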
6 Issues in Analysis of Adherence Data

Analysis is influenced by the method of measurement chosen within a study. With electronic monitoring, where the most detailed information is collected, several issues arise. The first is the length of time over which data are collected and summarized. Current technology permits the capture of data for 1 day up to 1–2 years. Thus it is important to examine the length of time over which data need to be collated to reach a stable estimate of adherence (Houze, Sereika, Dunbar-Jacob, unpublished). Deschamps and colleagues (2006) suggest that in HIV and in kidney transplant patients, an intervention effect of electronic monitoring can be found, which decreased and stabilized over 35–50 days. Data can then be summarized over the relevant time period.

The next issue with electronic monitoring is the determination of what view of adherence is important. For example, a simple count of adherence events can be determined, much like a pill count. This provides actual events as a percentage of prescribed events. The outcome can be influenced by over-adherent events, yielding rates greater than 100% or masking the extent of poor adherence. Summarizing across patients can inflate the level of group adherence if there are over-adherers within the group. An alternative view is the proportion of days on which the events were accurate. This overcomes the problem of adherence above 100%. Individuals who miss a dose in the evening and make it up the next day will appear as adherent when the count of doses is performed but will have 2 days of poor adherence when the proportion of days adherent is calculated. A third view considers doses taken at the advised time, within a range. Adherence is likely to be lowest with this estimate. In cases where the timing of medication is important, this assessment provides very useful data. Thus, estimations of individual adherence and of group adherence need to consider the view of adherence that is important. Similar considerations can be given to daily diary adherence, although the data gathered are less likely to be reliable. These are the only two methods of assessment which require a decision of this nature before adherence can be estimated and ultimately analyzed.

Regardless of assessment method, the data for adherence over a group tend to be J-shaped (Dunbar-Jacob et al, 1998; see Fig. 7.4). Multiple strategies to transform the data have failed. Therefore non-parametric analyses are most useful. Newer strategies for analyzing J-shaped data are being examined and may yield more sensitive and accurate analytic strategies (Rohay, 2009). Unfortunately, the level of detail just noted is often missing from studies of adherence.
Further, parametric analyses are often presented, typically in the absence of information about the nature of the distribution of the data. Attention to the nature of measurement, the definition and view of adherence, and the use of appropriate analytic strategies is important to moving the field forward. Similarly, meta-analyses need to consider not only intervention strategies but also definitions and assessment strategies. Thus, the full picture of adherence, the phase of adopting/managing the regimen, an a priori definition of adherence, the measurement method, and an appropriate analytic strategy are crucial as we continue to develop an understanding of patient adherence.

Fig. 7.4 Distribution of days adherent by EEM
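The three views of adherence described above (dose count, proportion of days adherent, and doses taken within a timing window) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the dosing record, the twice-daily 08:00/20:00 schedule, the 2-hour window, and all function names are invented for the example and do not come from any study cited in this chapter.

```python
from collections import Counter
from datetime import datetime, timedelta


def count_adherence(events, n_days, doses_per_day):
    """Percent of prescribed doses recorded; extra doses can push this past 100."""
    return 100.0 * len(events) / (n_days * doses_per_day)


def days_adherence(events, start, n_days, doses_per_day):
    """Percent of days with exactly the prescribed number of doses."""
    per_day = Counter((e - start).days for e in events)
    good_days = sum(1 for d in range(n_days) if per_day[d] == doses_per_day)
    return 100.0 * good_days / n_days


def timing_adherence(events, start, n_days, scheduled_hours, window_hours=2.0):
    """Percent of prescribed doses taken within +/- window_hours of schedule."""
    unused = sorted(events)
    window = timedelta(hours=window_hours)
    on_time = 0
    for d in range(n_days):
        for h in scheduled_hours:
            due = start + timedelta(days=d, hours=h)
            match = next((e for e in unused if abs(e - due) <= window), None)
            if match is not None:
                unused.remove(match)  # each recorded event is credited once
                on_time += 1
    return 100.0 * on_time / (n_days * len(scheduled_hours))


start = datetime(2010, 1, 4)  # midnight of day 1 (arbitrary, hypothetical date)


def clock(day, hour, minute=0):
    """Helper: timestamp on a given study day (0-based) at hour:minute."""
    return start + timedelta(days=day, hours=hour, minutes=minute)


# Hypothetical electronic-monitor record for a twice-daily regimen.
events = [
    clock(0, 8), clock(0, 20),                 # day 1: as prescribed
    clock(1, 8),                               # day 2: evening dose missed
    clock(2, 8), clock(2, 12), clock(2, 20),   # day 3: extra make-up dose
    clock(3, 11, 30), clock(3, 23, 30),        # day 4: both doses late
    clock(4, 11), clock(4, 23),                # day 5: both doses late
]

by_count = count_adherence(events, n_days=5, doses_per_day=2)
by_days = days_adherence(events, start, n_days=5, doses_per_day=2)
by_timing = timing_adherence(events, start, n_days=5, scheduled_hours=(8, 20))
print(by_count, by_days, by_timing)
```

For this invented record the dose count gives 100%, the proportion of days adherent gives 60%, and the timing view gives 50%, so the same patient looks fully adherent, moderately adherent, or poorly adherent depending on the view chosen.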
7 Implications for Understanding Adherence

Numerous studies have been undertaken in an effort to understand who is likely to have adherence difficulties. The results of these studies have shown inconsistent relationships between predictors and adherence (Dunbar-Jacob et al, 2009). Few predictors have been found to be very robust within studies. It is not unreasonable to find inconsistency in the prediction of adherence when we note the variability in the phase of adhering to a treatment and the inconsistency in classification of a person’s adherence given
the varying methods of defining and assessing adherence. More careful description of the population and its stage of treatment (agreement with treatment, initiation of treatment, adjustment to new treatment, continuation of treatment) as well as clearer descriptions of the definition and assessment methodology will be required before we can begin to understand the predictors of adherence. Similarly, numerous studies have examined strategies to improve adherence. A meta-analysis by Peterson et al (2003) showed that interventions increased adherence by 4–11%, a very small amount. Kripalani and colleagues (2007) reported that just 54% of studies reviewed reported improvements in adherence while just 30% showed clinical improvements, not always related to adherence. Looking within hypertension care, Schroeder et al (2004) found that 78% of adherence studies which simplified the regimen, 44% of those using complex interventions, and 42% of those using motivational strategies reported improvements in adherence. Adherence improvements ranged from 5 to 41%. However, the heterogeneity in measurement of adherence and methods of study prevented conduct
of a pooled analysis. Thus, our knowledge of both intervention strategies and of predictors of adherence is hampered by the variability with which adherence is treated in studies.
8 Summary and Recommendations

As we examine the processes and measurement of patient adherence, we find considerable heterogeneity between studies in terms of definitions, measurement, analytic strategies, and the patient’s phase of adopting and maintaining a new treatment regimen. This has resulted in difficulty in evaluating strategies for improving adherence as well as in identifying factors robustly and consistently associated with adherence. The processes required at the different phases of regimen behaviors are likely to be associated with different predictor variables and likely to be responsive to different strategies. However, future research is needed to evaluate more precisely the factors that impact the patient’s behaviors during the various processes of accepting, initiating, implementing, and sustaining adherence to a new treatment. Similarly, future research needs to examine intervention strategies designed for each phase. Both measurement and analysis strategies can influence the outcomes of studies. Measurement strategies should be chosen with care, with attention to their sensitivity to adherence itself and to change. Similarly, analysis strategies need to be appropriate for the measurement strategy and the nature of the adherence distribution. While much has been learned about adherence over recent decades, our future understanding can be deepened with greater attention to processes and measures of adherence.
References

APHA (2004). Adherence to HIV treatment regimens: recommendations for best practices. www.apha.org/ppp/hiv June 2004. Accessed January 6, 2010.
Aloia, M. S., Arnedt, J. T., Stepnowsky, C., Hecht, J., and Borelli, B. (2005). Predicting treatment adherence in obstructive sleep apnea using principles of behavior change. J Clin Sleep Med, 1, 354–356.
Ames, R. P. (1986). The effects of antihypertensive drugs on serum lipids, I. diuretics. Drugs, 32, 260–278.
Arbuthnott, A., and Sharpe, D. (2009). The effect of physician-patient collaboration on patient adherence in non-psychiatric medicine. Pat Educ Couns, 77, 60–67.
Baiardini, I., Braido, F., Bonini, M., Compalati, E., and Canonica, G. W. (2009). Why do doctors and patients not follow guidelines? Curr Opin Allergy CI, 9, 228–233.
Baker, G., and Mutrie, N. (2005). Are pedometers useful motivational tools for increasing walking in sedentary adults? Paper presented at Walk21-VI, 6th international conference on walking in the 21st century, Zurich, Switzerland.
Bangsberg, D. R., Hecht, F. M., Charlebois, E. D., Chesney, M., and Moss, A. (2001). Comparing objective measures of adherence to HIV antiretroviral therapy: electronic medication monitors and unannounced pill counts. AIDS Behav, 5, 275–281.
Barnhofer, T., Kuehn, E., and de Jong-Meyer, R. (2005). Specificity of autobiographical memory and basal cortisol levels in patients with major depression. Psychoneuroendocrinology, 30, 403–411.
Baumhakel, M., Muller, U., and Bohm, M. (2009). Influence of gender of physicians and patients on guideline-recommended treatment of chronic heart failure in a cross-sectional study. Eur J Heart Fail, 11, 299–303.
Bieber, C., Muller, K. G., Blumenstiel, K., Schneider, A., Richter, A. et al (2006). Long-term effects of a shared decision-making intervention on physician-patient interaction and outcome in fibromyalgia: a qualitative and quantitative 1 year follow-up of a randomized controlled trial. Pat Educ Couns, 63(3), 357–366.
Blagden, M. D., and Chipperfield, R. (2007). Efficacy and safety of ezetimibe co-administered with atorvastatin in untreated patients with primary hypercholesterolaemia and coronary heart disease. Curr Med Res Opin, 23, 767–775.
Chapman, R. H., Benner, J. S., Petrilla, A. A., Tierce, J. C., Collins, S. R. et al (2005). Predictors of adherence with antihypertensive and lipid-lowering therapy. Arch Intern Med, 165, 1147–1152.
Chesney, M. A., Ickovics, J. R., Chambers, D. B., Gifford, A. L., Neidig, J. et al (2000). Self-reported adherence to antiretroviral medications among participants in HIV clinical trials: The AACTG Adherence Instruments. AIDS Care, 12, 255–266.
Claxton, A. J., Cramer, J., and Pierce, C. (2001). A systematic review of the associations between dose regimens and medication compliance. Clin Ther, 23, 1296–1310.
Conn, V. S., Hafdahl, A. R., Cooper, P. S., Ruppar, T. M., Mehr, D. R. et al (2009). Interventions to improve medication adherence among older adults: meta-analysis of adherence outcomes among randomized controlled trials. Gerontologist, 49, 447–462.
Cramer, J. A., Scheyer, R. D., and Mattson, R. H. (1990). Compliance declines between clinic visits. Arch Intern Med, 150, 1509–1510.
Deschamps, A. E., van Wijngaerden, E., Denhaerynck, K., De Geest, S., and Vandamme, A. M. (2006). Use of electronic monitoring induces a 40-day intervention effect in HIV patients. J Acq Immun Def Synd, 43, 247–248.
Donnelly, L. A., Doney, A. S. F., Morris, A. D., Palmer, C. N. A., and Donnan, P. T. (2008). Long-term adherence to statin treatment in diabetes. Diabetes Med, 25, 850–855.
Dunbar-Jacob, J., Erlen, J. A., Schlenk, E. A., Ryan, C. M., Sereika, S. M. et al (2000). Adherence in chronic disease. Annu Rev Nurs Res, 18, 48–90.
Dunbar-Jacob, J., Gemmell, L. A., and Schlenk, E. A. (2009). Predictors of patient adherence: patient characteristics. In J. K. Ockene & K. A. Riekert (Eds.), The Handbook of Health Behavior Change, 3rd Ed (pp. 397–410). New York: Springer.
Dunbar-Jacob, J., Sereika, S., Rohay, J., and Burke, L. (1998). Electronic methods in assessing adherence to medical regimens. In D. Krantz & A. Baum (Eds.), Technology and Methods in Behavioral Medicine (pp. 95–113). Mahwah, NJ: Lawrence Erlbaum Associates.
Evangelista, L. S., Dracup, K., Erickson, V., McCarthy, W. J., Hamilton, M. A. et al (2005). Validity of pedometers for measuring exercise adherence in heart failure patients. J Cardiac Fail, 11, 366–371.
Gershberg, H., Javier, Z., Hulse, M., and Hecht, A. (1968). Influence of hypoglycemic agents on blood lipids and body weight in ketoacidosis-resistant diabetics. Ann New York Acad Sci, 148, 914–924.
Glanz, K., and Murphy, S. (2007). Dietary assessment and monitoring in real time. In A. Stone, S. Shiffman, A. Atienza, & L. Nebeling (Eds.), The Science of Real Time Data Capture: Self-Reports in Health Research (pp. 151–168). New York: Oxford University Press.
Hamilton, G. (2003). Measuring adherence in a hypertension clinical trial. Eur J Cardiovasc Nurs, 2, 219–228.
Haynes, R. B. (1979). Introduction. In R. B. Haynes, D. W. Taylor, & D. L. Sackett (Eds.), Compliance in Health Care (pp. i–xv). Baltimore: Johns Hopkins University Press.
Haynes, R. B., Sackett, D. L., Gibson, E. S., Taylor, D. W., Hackett, B. C. et al (1976). Improvement of medication compliance in uncontrolled hypertension. Lancet, 1, 1265–1268.
Hess, L. M., Raebel, M. A., Conner, D. A., and Malone, D. C. (2006). Measurement of adherence in pharmacy administrative databases: a proposal for standard definitions and preferred measures. Ann Pharmacother, 40, 1280–1288.
Houze, M., Sereika, S., and Dunbar-Jacob, J. (2009). Medication adherence: time to stability. Unpublished manuscript.
Jackevicius, C. A., Paterson, J. M., and Naglie, G. (2006). Concordance between discharge prescriptions and insurance claims in post-myocardial infarction patients. Pharmacoepidem Dr S, 16, 207–215.
Kerse, N., Buetow, S., Mainous, A. G., Young, G., Coster, G. et al (2004). Physician-patient relationship and medication compliance: a primary care investigation. Ann Fam Med, 2, 455–461.
Kripalani, S., Yao, X., and Haynes, R. B. (2007). Interventions to enhance medication adherence in chronic medical conditions: a systematic review. Arch Intern Med, 167, 540–549.
Lee, J. K., Grace, K. A., Foster, T. G., Crawley, M. J., Erowele, G. I. (2007). How should we measure medication adherence in clinical trials and practice? Ther Clin Risk Manag, 3, 685–690.
Lewiecki, E. M. (2005). Review of guidelines for bone mineral density testing and treatment of osteoporosis. Curr Osteo Rep, 3, 75–83.
Loh, A., Simon, D., Wills, C. E., Kriston, L., Niebling, W. et al (2007). The effects of a shared decision-making intervention in primary care of depression: a cluster-randomized control trial. Pat Educ Couns, 67, 324–332.
Lu, M., Safren, S. A., Skolnik, P. R., Rogers, W. H., Coady, W. et al (2008). Optimal recall period and response task for self-reported HIV medication adherence. AIDS Behav, 12, 86–94.
McPherson, F. (2001). Autobiographical memory. http://www.memory-key.com/EverydayMemory/autobiographical.htm.
Morisky, D. E., Green, L. W., and Levine, D. M. (1986). Concurrent and predictive validity of a self-reported measure of medication adherence. Med Care, 24, 67–74.
Newman, S., Steed, L., and Mulligan, K. (2004). Self-management interventions for chronic illness. Lancet, 364, 1523–1537.
Ott, S. M., LaCroix, A. Z., Ichikawa, L. E., Scholes, D., and Barlow, W. E. (2003). Effect of low-dose thiazide diuretics on plasma lipids: results from a double-blind, randomized clinical trial in older men and women. J Am Ger Soc, 5, 340–347.
Perreault, S., Lamarre, D., Blais, L., Dragomir, A., Berbiche, D. et al (2005). Persistence with treatment in newly treated middle-aged patients with essential hypertension. Ann Pharmacother, 39, 1401–1408.
Peterson, A. M., Takiya, L., and Finley, R. (2003). Meta-analysis of trials of interventions to improve medication adherence. Am J Health-Syst Pharm, 60, 657–665.
Rohay, J. (2009). Statistical assessment of medication adherence data: a technique to analyze the J-shaped curve. Doctoral Thesis, University of Pittsburgh.
Ruslami, R., Crevel, R. v., de, B. E. v., Alisjahbana, B., and Aarnouste, R. E. (2008). A step-wise approach to find a valid and feasible method to detect nonadherence to tuberculosis drugs. SE Asian J Trop Med, 39, 1083–1087.
Rutledge, J. C., Hyson, D. A., Garduno, D., Cort, D. A. et al (1999). Lifestyle modification program in management of patients with coronary artery disease: the clinical experience in a tertiary care hospital. J Cardiopul Rehab Prev, 19, 226–234.
Sankar, A. P., Nevendal, D. C., Neufeld, S., and Luborsky, M. R. (2007). What is a missed dose? Implications for construct validity and patient adherence. AIDS Care, 19, 775–780.
Schaefer, E. J., Lamon-Fava, S., Jenner, J. L., McNamara, J. R., Ordovas, J. M. et al (1994). Lipoprotein(a) levels and risk of coronary heart disease in men: The Lipid Research Clinics Coronary Primary Prevention Trial. JAMA, 271, 999–1003.
Schroeder, K., Fahey, T., and Ebrahim, S. (2004). How can we improve adherence to blood pressure-lowering medication in ambulatory care? Arch Intern Med, 164, 722–732.
Shah, N. R., Hirsch, A. G., Zacker, C., Taylor, S., Wood, G. C. et al (2008). Factors associated with first-fill adherence rates for diabetic medications: a cohort study. J Gen Intern Med, 24, 233–237.
Shah, N. R., Hirsch, A. G., Zacker, C., Wood, G. C., Schoenthaler, A. et al (2009). Predictors of first-fill adherence for patients with hypertension. Am J Hypertension, 22, 392–396.
Shalansky, S. J., Levy, A. R., and Ignaszewski, A. P. (2004). Self-reported Morisky score for identifying nonadherence with cardiovascular medications. Ann Pharmacother, 38, 1363–1368.
Södergård, B., Halvarsson, M., Brannstrom, J., Sonnerborg, A., and Tully, M. P. (2006). A comparison between AACTG adherence questionnaire and the 9-item Morisky medication adherence scale in HIV-patients. Int Cong Drug Therapy HIV, 8: Abstract No. P174.
Stilley, C. S., Sereika, S., Muldoon, M. F., Ryan, C. M., and Dunbar-Jacob, J. (2004). Psychological and cognitive function: predictors of adherence with cholesterol lowering treatment. Ann Behav Med, 27, 117–124.
Stone, A. A., Shiffman, S., Schwartz, J. E., Broderick, J. E., and Hufford, M. R. (2003). Compliance with paper and electronic diaries. Comp Clin Trials, 24, 182–199.
Thier, S. L., Yu-Eisenberg, K. S., Leas, B. F., Cantrell, R., DeBussey, S., Goldfarb, N. I., and Nash, D. B. (2008). In chronic disease, nationwide data show poor adherence by patients to medication and by physicians to guidelines. Managed Care, 17, 48–52, 55–47.
Thom, D. H., and the Stanford Trust Study Physicians. (2001). Physician behaviors that predict patient trust. J Fam Pract, 50, 323–328.
Timperio, A., Salmon, J., and Crawford, D. (2003). Validity and reliability of a physical activity recall instrument among overweight and non-overweight men and women. J Sci Med Sport, 6, 477–491.
Van Dulmen, S., Sluijs, E., van Dijk, L., de Ridder, D., Heerdink, R. et al (2008). Furthering patient adherence: a position paper of the international expert forum on patient adherence based on an internet forum discussion. BMC Health Serv Res, 8, 1–8.
Van de Steeg, N., Sielk, Pentzek, M., Bakx, C., and Altiner, A. (2009). Drug-adherence questionnaires not valid for patients taking blood-pressure-lowering drugs in a primary health care setting. J Eval Clin Practice, 15, 468–472.
Wagner, G., and Rabkin, J. G. (2000). Measuring medication adherence: are missing doses reported more accurately than perfect adherence? AIDS Care, 12, 405–408.
Warnecke, R. B., Sudman, S., Johnson, T. P., O’Rourke, D., Davis, A. M. et al (1997). Cognitive aspects of recalling and reporting health-related events: Papanicolaou smears, clinical breast examinations, and mammograms. Am J Epidemiol, 148, 11, 982–992.
Wilbur, J., Chandler, P., and Miller, A. M. (2001). Measuring adherence to a women’s walking program. West J Nurs Res, 23, 8–24.
Part II
Psychological Processes and Measures
Chapter 8
Ecological Validity for Patient Reported Outcomes
Arthur A. Stone and Saul S. Shiffman
Asking people about their health, symptoms, attitudes, opinions, and behaviors is ubiquitous in the behavioral, social, and medical sciences (Stone et al, 2000). For many areas of inquiry in these fields, it is impossible to contemplate research programs without self-reports. Self-reports often serve as primary outcome measures, for instance, in assessing pain, fatigue, opinions, or attitudes; self-reports are the accepted standard for these constructs and “objective” alternative measures usually are not available. Even when objective measures are possible in principle, we often rely on self-report data (e.g., smoking behavior, asthma attacks, social interactions), because the costs of objective data collection (via behavioral observations, for example) are prohibitive. Patient Reported Outcome (PRO) is a new term used to describe self-reports when they are used as outcome measures in trials (FDA Docket No. 2006D-0044; Rock et al, 2007).

The importance of PROs to the behavioral and medical research enterprise has been highlighted recently. The US Food and Drug Administration is in the process of setting standards for PROs used in clinical trials submitted in support of drug or device approvals and claims (FDA Docket No. 2006D-0044). The National Institutes of Health (NIH) has also devoted one of its Roadmap Projects, which are large-scale, high priority initiatives intended to advance health research, to the development of psychometrically sophisticated PROs for use with chronically ill individuals participating in clinical trials (www.nihpromis.org). There is also no doubt that PROs are essential for the delivery of medical care, where they provide essential information about patient functioning and satisfaction with services.

An important feature of PRO assessments, as they have traditionally been implemented, is that they have generally been obtained in relatively artificial or unusual settings, such as clinics and research laboratories, and by having participants recollect and/or reflect on their past experiences. The purpose of this article is to discuss the potential value of collecting PRO data in participants’ natural environments and with minimal recourse to recall by systematically and repeatedly sampling self-reports in people’s daily environments, offering the possibility of truly representative sampling. In the first section of the article, we review the concept of sampling everyday life, its implications for ecological validity, and how it could affect self-report information and PROs. Studies from cognitive science, autobiographical memory, and survey design inform this discussion. In the second section, we review methodologies and technologies that enable collection of self-reports in people’s typical environments, enhancing the representativeness of the resulting data.

A.A. Stone, Department of Psychiatry and Behavioral Science, Stony Brook University, Stony Brook, NY 11794-8790, USA. AAS is the associate chair of the Scientific Advisory Board of invivodata, inc., a company that supplies electronic data capture services for clinical research, and is a senior scientist at the Gallup Organization. SS is a founder of invivodata, inc. and the chair of its Scientific Advisory Board.

A. Steptoe (ed.), Handbook of Behavioral Medicine, DOI 10.1007/978-0-387-09488-5_8, © Springer Science+Business Media, LLC 2010
1 Ecological Validity and Self-Reports

Today, when we think of the degree to which behavior observed in a research setting such as a research laboratory is generalizable to real-world behavior, we call this “ecological validity” (Hammond and Stewart, 2001). Over 60 years ago, the term ecological validity was first used in Brunswik’s paper examining the perceptual phenomenon known as size constancy (Brunswik, 1944) – the ability of people to correctly judge the size of objects despite the fact that the projection of objects on the retina varies with viewing distance. Brunswik’s interest was in how naturally occurring cues associated with objects, such as distance from the object, were used by the individual to estimate size. In one study that presages the methods described later in this chapter, he recorded, over several weeks, randomly selected moments from a subject’s daily routine and noted the retinal projection (via a photograph of the object), object size, and the subject’s estimation of size. The innovative feature of his design was the evaluation of the natural, ecological association of objects and their associated cues, in contrast to possibly artificial associations based on laboratory investigations, where the constellation of stimulus qualities bore little resemblance to those encountered by people in everyday life. “Representative design” was the term Brunswik coined to refer to the degree that a laboratory experiment corresponded with a particular set of environmental circumstances to which the results of the experiment were to be generalized – what we now call ecological validity. In keeping with contemporary parlance, we use the term ecological validity in its modern meaning, while acknowledging its historical evolution.
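The idea of recording randomly selected moments from daily life can be illustrated with a short scheduling sketch. The source does not describe how Brunswik chose his moments, so everything here is an assumption made for illustration: the waking window of 08:00 to 22:00, three prompts per day, minute-level resolution, and the function name `random_moments` are all hypothetical.

```python
import random
from datetime import datetime, timedelta


def random_moments(first_day, n_days, prompts_per_day,
                   wake_hour=8, sleep_hour=22, seed=None):
    """Draw distinct random prompt times within each day's waking window."""
    rng = random.Random(seed)
    minutes_awake = (sleep_hour - wake_hour) * 60
    schedule = []
    for d in range(n_days):
        day_start = first_day + timedelta(days=d, hours=wake_hour)
        # Sample without replacement so no two prompts share a minute.
        offsets = sorted(rng.sample(range(minutes_awake), prompts_per_day))
        schedule.extend(day_start + timedelta(minutes=m) for m in offsets)
    return schedule


# One hypothetical week of sampling, three prompts per day.
schedule = random_moments(datetime(2010, 1, 4), n_days=7, prompts_per_day=3, seed=1)
for moment in schedule[:3]:
    print(moment.strftime("%a %H:%M"))
```

Because every waking minute has the same chance of being selected, the sampled moments are not tied to any particular setting or time of day, which is the property that makes the resulting observations representative of the daily routine.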
2 Momentary, Retrospective, and Global Self-Report

For this discussion, we describe three types of self-reports defined by the cognitive tasks inherent in making the reports. We shall refer to these as momentary states, retrospective summaries, and global reports. Momentary state questions ask people to describe some aspect of their immediate state, for example, their current mood, symptoms, and circumstances. A question about immediate pain intensity could use, for example, the following wording: “Please indicate your current pain intensity.”

Most assessment in medicine and behavioral science, however, does not focus on momentary assessment, but for practical reasons typically asks for a summary of experience over a period of time or about a past experience at a particular time. These are called retrospective self-reports, and the time frame for these questions can range from the last day to one’s entire life. The important idea is that the intention of the question is to capture information outside of immediate experience, which is presumed to be available in memory. Examples of typical recall questions include “Please indicate your average pain intensity over the past month,” which asks the respondent not only to recall but then to summarize (average) the retrieved results, and “When was the last time you stayed overnight in a hospital?” which asks for a specific fact relating to a particular occasion.

The third type of self-report, global report, does not have any time frame at all, but rather asks the respondent to generalize globally or universally. “Generally speaking, how happy a person are you?” and “Are you prone to anxiety?” are examples of global questions. These questions seek information about a person in general. They might be equivalent to retrospective summaries over a lifetime, but that is not clear.
3 Does Ecological Validity Matter for Self-Report?

Our focus is on the relevance of ecological validity in the three kinds of self-reports, and we believe this depends on two things. The first is whether or not the phenomenon to be captured by self-report varies over time and situation, and the second is whether or not individuals can accurately recall and summarize it without distortion. In brief, we believe that special procedures are needed to assure ecological validity when a phenomenon varies over time and when respondents are not able to accurately recall and/or summarize it. Under these conditions, asking respondents for their impression of experience over some finite time period will yield results that may not accurately reflect real-world experience.
3.1 Variability over Time and Situation

When the variable under study does not vary with time and circumstances (e.g., the respondent’s gender), any method of self-report (your current state, your state yesterday, your state in general) will yield the same answer, making issues of recall and ecological validity moot. However, most of the phenomena we study do vary over time for several reasons. They may vary due to the impact of the immediate context (physical setting [work/home, outside/inside, and other physical qualities] or social setting [whom with, type of activity, and other interpersonal qualities]); due to maturation of the individual and associated change; or due to temporal effects such as time-of-day, day-of-week, season, etc. These factors create true variation, not just variation due to measurement error (noise), and investigators have an interest in that true variation. When such variation exists, and individuals are not capable of producing an unbiased summary of the variable experience (discussed in the next section), consideration of ecological validity is essential.

As an example of how environmental variability demands consideration in the design of studies to ensure ecological validity, consider an investigator who is trying to characterize participants’ emotional state over a period of time. Affect is known to vary depending on the circumstances and setting. To achieve an accurate assessment of “average mood,” which might serve as an investigator’s outcome variable in a trial, one would need to consider the full range of settings that the individual encountered – their mood may have been relaxed at home, but tense at work, or relaxed at work on Tuesday, but tense at work on Wednesday. In this case, it would be misleading to assess mood at work only or on Tuesdays only. The full range (or a representative range) of experiences and contexts would need to be taken into account, and properly weighted, to achieve an unbiased assessment of mood over the period. If one believes that individuals are capable of retrieving this information and weighting it appropriately, then recall summaries would be considered valid. If one concludes that we are not consistently capable of such cognitive feats, then one may need to actually sample and assess experiences across a range of time and settings (methods for doing so are discussed later).
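The point about taking the full range of settings into account, "properly weighted," can be made concrete with a small calculation. The sketch below is illustrative only: the tension ratings (on an assumed 0 to 10 scale), the hours spent in each setting, and the function name `time_weighted_mean` are invented for the example, not taken from any study.

```python
from statistics import mean


def time_weighted_mean(ratings_by_setting, hours_by_setting):
    """Average of per-setting mean ratings, weighted by time spent in each setting."""
    total_hours = sum(hours_by_setting.values())
    weighted_sum = sum(
        mean(ratings_by_setting[setting]) * hours
        for setting, hours in hours_by_setting.items()
    )
    return weighted_sum / total_hours


# Hypothetical momentary tension ratings collected in two settings.
ratings = {"home": [2, 3, 1], "work": [7, 6, 8]}
# Hypothetical waking hours spent in each setting over the reporting period.
hours = {"home": 60, "work": 40}

work_only = mean(ratings["work"])               # assessing mood at work only
representative = time_weighted_mean(ratings, hours)
print(work_only, representative)
```

With these made-up numbers the work-only estimate is 7, while the time-weighted estimate across both settings is 4.0, which shows how sampling a single setting can misrepresent average experience over the period.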
3.2 Accuracy of Recall and Summary Processes

We have suggested that conclusions about respondents’ ability to accurately recall and summarize the past are vital to determining how one collects data. Key to appreciating the limits of autobiographical memory is understanding the process of recording, retrieval, and summary of information about past experience. How, then, do we recall and summarize our past states? Research indicates that the process of generating such “memories” is more accurately characterized as reconstruction (Menon and Yorkston, 2000; Schwarz and Strack, 1991) than simple retrieval.¹ Memories can be reconstructed using a variety of heuristic strategies to build plausible responses that usually serve adequately for memory’s everyday adaptive uses. The use of heuristics is a critical point, because heuristic strategies can introduce significant bias.

Ironically, for environmentally sensitive variables, the subject’s state at the time of the recall, which is itself subject to the effects of the recall setting, can influence recall and summary processes. For example, several studies have shown that the pain levels experienced at the time of assessment bias the recall of past pain, such that respondents in current high levels of pain recall more pain (Eich et al, 1985; Linton and Melin, 1982; Smith and Safer, 1993).² In another example, Schwarz has shown that very small pleasures (finding a dime) just prior to assessment can have large impacts on responses to global well-being questions (Schwarz, 1996). Similarly, bringing to mind events that pertain, at least in part, to the broad question has the effect of biasing responses toward the recently recalled experiences (Schwarz, 1996). Current states skew both what information we retrieve about the past (e.g., mood congruent recall; Clark and Teasdale, 1982) and how we interpret that information. In other words, our summaries of past experience are not built from objective, statistical summaries of the past, but are influenced by our present condition.

In a similar way, participants’ recall of experience is overly influenced by the most intense and the most recent experiences during the target reporting period; this has been called the “peak-end” effect (Fredrickson, 2000). Both the undue influence of our current state and that of recent and intense experiences are attributable to the influence of what is most “memorable” or salient, and the consequent under-weighting of routine experience – the fabric of everyday life – often resulting in systematic bias in recall (Kahneman et al, 1999). It should be noted that these heuristics operate rapidly and out of consciousness, as demonstrated by their impact in laboratory studies examining short-term recall (e.g., Redelmeier’s colonoscopy studies, Redelmeier and Kahneman, 1996). So, research participants, who are usually doing their best to provide accurate recall, are not aware that their recall reports are biased and have no ready way to avoid the bias. Not only do heuristics produce bias (that is, systematic errors) in contrast to merely injecting “noise” (random error) into recall, but the use of particular heuristics may vary between persons and across contexts, making it difficult to devise strategies that correct for heuristic bias and, more broadly, making the interpretation of recall reports exceedingly challenging.

Recall is also influenced by semantic memory, that is, generalized knowledge or belief (e.g., about myself, about work; Robinson and Clore, 2002; Ross, 1989). This may be especially prevalent when memories of an event, which may or may not be accurate, do not spring into mind. Memories constructed in this way are often “adjusted” to make them conform to logical scripts about events based on broader beliefs about behavior (in general or one’s own) – they represent “what should have happened” or “what must have happened.” Ross (1989) has shown that participants distort their recall to conform to their “personal theories” about behavior, for example, ideas about how stable or changeable their behavior is, beliefs about the influence of events on behavior, or their beliefs or ideals about themselves. These biases are particularly troubling, because they can generate “recall” that is psychologically coherent and consistent with theory (and thus easily accepted by scientists), but not based on fact. For example, participants who believe they have painful menstrual periods tend to “recall” such pain (and investigators may accept such reports), even when their own real-time reports showed they did not experience it (McFarland et al, 1989; see also Shiffman et al, 1997).

Thus, cognitive science tells us that autobiographical memory is subject to substantial biases that can significantly distort self-reports. We next examine the implications of recall and summary processes for the different types of self-reports.

¹ In fact, retrieval is not a simple process in that what is retrieved may be influenced by the individual’s psychological state at the time of retrieval. For example, unpleasant memories are more accessible when an individual is in a negative affective state than when in a positive affective state (Kihlstrom et al, 2000).

² A respondent’s affective or pain state at the time of retrospection also influences the accessibility of certain memories and the heuristic processes used to summarize retrieved memories.
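The peak-end effect described in this section can be illustrated numerically. One common simplification, assumed here purely for illustration and not given as a formula in this chapter, models the remembered rating as the average of the most intense and the final momentary ratings. The pain ratings and the function name `peak_end_summary` are hypothetical.

```python
from statistics import mean


def peak_end_summary(ratings):
    """Illustrative peak-end heuristic: average of the peak and the last rating."""
    return (max(ratings) + ratings[-1]) / 2


# Hypothetical momentary pain ratings (0-10) collected across a reporting period.
momentary_pain = [2, 3, 2, 8, 3, 2, 6]

actual_average = mean(momentary_pain)            # what a statistical summary would give
remembered = peak_end_summary(momentary_pain)    # what the heuristic would "recall"
print(round(actual_average, 2), remembered)
```

For these made-up ratings the momentary average is about 3.71, whereas the peak-end summary is 7.0. The gap is in a consistent direction whenever the peak or the final moment is atypical, which is why the text describes this as systematic bias rather than random noise.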
3.3 Implications for Global Reports

Evaluating the impact of recall and summary processes on global reports is difficult because it is not clear exactly how global assessment should line up with actual experience. If one assumes that global questions are meant to reflect experience, or are interpreted as doing so – perhaps not an unreasonable assumption in many cases – then all of the troublesome processes associated with recall reports are applicable. Furthermore, there is evidence that ambiguity about what information is sought by a question and/or the inability to access that information from memory disposes respondents to answer on the basis of semantic memory (Robinson and Clore, 2002). Particularly when it is not clear what memories are relevant over what period, global questions will tend to pull for answers based on beliefs and attitudes. Although semantic memory has a connection to experience, that connection can be a loose one because other factors, such as beliefs, personality, and contextual cues, also shape it. If it is actual experience that one seeks, then answers based on semantic memory are not ideal. On the other hand, if one is interested in beliefs or opinions – and not actual experience – then global reports may be optimal. Beliefs and opinions can shape current and future behavior, so they are of practical value and worthy of study in their own right, but care must be taken to distinguish between these beliefs and actual past behavior and experience, which may not be accurately reflected.
3.4 Implications for Retrospective Reports
The validity of retrospective self-reports depends on reporters’ ability to accurately recall experience. As discussed above, cognitive research suggests that much of the information we seek about past behavior or experiences is not available in memory; we simply do not store such detailed and comprehensive information (Bradburn et al, 1987; Robinson and Clore, 2002; Schwarz and Oyserman, 2001; Schwarz et al, 1994; Thompson et al, 1996). The accuracy of recall and of summary processes is, then, a major concern for interpreting recall reports. The extent of the concern, however, should be moderated by the nature of the recall content, as certain material (e.g., major, “unforgettable” events) may be less susceptible to memory failures, although it may still be subject to the vagaries of summary processes. It is also important to recognize that even “incorrect” or distorted recall can have substantial predictive validity. Some studies have shown that one’s distorted memory or characterization of events can be a better predictor of future behavior than the actual experience. After all, it is this stored summary, however biased, that we later retrieve as a reference for informing future attitudes or directing our behavior (e.g., recalling how painful a previous colonoscopy was in order to decide whether to get another one; Redelmeier et al, 2003). Thus, there is value to the information held in retrospective reports, even when it does not faithfully reflect experience, but care must be taken not to interpret it as a true account of past events.
3.5 Implications for Momentary Reports
Assessments of current experience are not subject to recall bias, so the heuristics associated with memory processes are not much of a problem for these assessments. In contrast to the
difficulty recovering accurate information about the past, Robinson and Clore (2002) and others have argued that we have good access to our current or very recent states; that is, questions about immediate state are answered by retrieval of experience and not by reference to beliefs. However, to say that momentary reports are immune to recall biases is not to say that such reports are entirely accurate and reliable, because self-reports are susceptible to other distortions that can influence the assessment (for example, the desire to present oneself in a favorable light; Schwarz, 2007), but at least the biases introduced by memory processes are minimized. In summary, recall and global questions are prone to bias due to the limitations of memory capacity and to the ways that people reconstruct and summarize experiences over time. These biases threaten the validity of the resulting reports when those reports are meant to represent the actual experiences the individual had over the specified period recalled. Immediate reports can escape biases due to recall processes, but raise new challenges for achieving ecological validity.
4 Rationale for Taking Self-Report into Everyday Life
Despite the potential problems identified for recall and global questions, these types of questions have dominated the field of self-report assessment, for several reasons. First, recall is subjectively compelling: We trust our own memories unquestioningly most of the time, so it seems natural to trust our participants’ memory as well, particularly when they don’t seem to have a motive for dissembling. Yet research has shown that confidence in a particular memory is often unrelated to its accuracy (Busey et al, 2000; Wells and Bradfield, 1998). Additionally, the nature of memory and its tendency to bias is a relatively recent discovery and has not yet penetrated deeply into thinking about research methods. Recall methods are also used because they are enormously convenient and efficient: In a relatively brief period, the researcher or clinician
A.A. Stone and S.S. Shiffman
is able to gather information on long periods of time, often up to years in duration, and on a wide variety of environments. If recall and global methods were capable of providing accurate information over such periods, there would be very little reason to consider alternatives. But, as the prior section of this chapter has shown, recall and global self-reports may not be up to the task of providing truly accurate information about experience, at least some of the time. If memory cannot be relied upon, then momentary assessments become essential. However, momentary assessments are limited by their very immediacy and narrow focus to what is happening now, at the moment of assessment, which is not often the investigator’s focus. We earlier stated that many phenomena of interest vary across time and environmental context. It follows that momentary reports of those phenomena will vary by context. Thus, if momentary reports are to represent the person’s overall experience, they would have to be collected in those contexts. No one momentary report could represent the subject’s experience – there would have to be many. And, to achieve ecological validity, they would have to be collected in a wide range of real-world contexts, representatively sampling participants’ momentary states across the range of settings they encounter. These elements – real-time data collection about momentary states, repeated assessments, and sampling of real-world settings – form the core of the approach we have called Ecological Momentary Assessment (EMA; Stone et al, 1994, 2007; Shiffman et al, 2008). Modern EMA methods have made use of innovations in data-collection technology, but EMA is not primarily a technological development. It more fundamentally addresses the design of data-collection protocols in relation to study objectives. 
Bolger and colleagues (2003) have enumerated three broad functions of EMA data collection: characterizing persons and individual differences (e.g., level of depressive symptoms); estimating within-person variability (e.g., standard deviation of pain intensity levels over a 1-month period); and estimating
within-person associations among two or more variables (e.g., association between changes in sleep and gastrointestinal symptoms the following day or between time-of-day and fatigue levels). The reader is referred to Stone et al (2006) and Shiffman et al (2009) for examples demonstrating these uses of EMA data. Aside from addressing issues of recall bias, EMA methods conceptually address other issues discussed in the psychological assessment literature. The first concerns the arbitrary nature of measurement often associated with psychological assessments, a topic recently reviewed by Blanton and Jaccard (2006). In essence, it is difficult to understand the meaning of scale scores on many instruments, because they are not linked to other referents. So, when an individual moves from an affect score of 50 to 60 on a 100-point scale, it is impossible to know exactly how their affect has changed. Because EMA protocols can representatively sample over time, it is possible to express the observations by estimating the proportion of time an individual has experienced some state (e.g., is angry, by some definition) or is in a particular environment (e.g., at work). Such “prevalence” metrics offer the advantage of being easily interpretable and, further, they possess ratio-level measurement qualities. The clear labeling afforded by such measures enhances the opportunity to develop strong theories and interventions, which is not the case when there is less certainty about a measure’s meaning (Blanton and Jaccard, 2006). Also consistent with recommendations of Blanton and Jaccard is the emphasis on the assessment of real-world occurrences. A second conceptual issue concerns the place of EMA data in an assessment model, which is pertinent to considerations of its usefulness in theory development. Here we refer to the framework developed by McFall (2005) in an article on theory and utility in evidence-based assessment. 
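The three uses of EMA data enumerated above, together with the “prevalence” metric, can be illustrated with a minimal Python sketch. The ratings, the function names, and the threshold of 50 are all hypothetical and chosen only for illustration; they do not come from any of the cited studies, and the association shown is a simple same-moment correlation for one participant rather than the lagged or multilevel models a real analysis would likely use.

```python
from statistics import mean, stdev

# Hypothetical momentary ratings (0-100 scale) for one participant, in time order.
pain = [20, 35, 50, 30, 65, 40, 25, 55]
fatigue = [30, 40, 60, 35, 70, 50, 30, 60]


def person_level(ratings):
    """Use 1: characterize the person by the mean of the momentary reports."""
    return mean(ratings)


def within_person_variability(ratings):
    """Use 2: within-person variability as the sample standard deviation."""
    return stdev(ratings)


def within_person_association(x, y):
    """Use 3: Pearson correlation between two momentary series of equal length."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def prevalence(ratings, threshold):
    """Proportion of sampled moments with a rating at or above the threshold."""
    return sum(r >= threshold for r in ratings) / len(ratings)


print(person_level(pain))
print(round(within_person_variability(pain), 2))
print(round(within_person_association(pain, fatigue), 2))
print(prevalence(pain, 50))
```

The prevalence value can be read as an estimated proportion of time only to the extent that the sampled moments are representative of the period, which is the design issue taken up later in the chapter.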
In our view, self-report EMA data can be considered an instance of a “sample” approach versus the alternative “sign” approach to assessment. This is because EMA measures often directly assess the target experience or behavior, rather than some other construct that is simply
associated with the target. Signs, on the other hand, are indirect measures that simply have predictive utility (as in an actuarial prediction where any variable statistically associated with an outcome can be used to improve prediction, even if it has no conceptual overlap). Importantly, because there can be recall and summary problems with self-report data that can invalidate an assessment, EMA data may have unique value in providing proximal sample data for assessment. For example, one method for measuring coping with difficult events is based upon 1-month recall of the problem and the thoughts and behaviors used to cope with the problem. We compared real-time reports of these thoughts and behaviors with the recalled ones and found major discrepancies (Stone et al, 1998). Similarly, we compared global reports of smoking patterns to detailed real-time self-monitoring and found little correspondence (Shiffman, 1993); only the real-time data predicted subsequent relapse (Shiffman et al, 2007). In both cases, the real-time data might be considered a preferable sample. The next issue concerns the isomorphism between recall measures of an outcome and EMA measures of an outcome. As mentioned above, using EMA to characterize overall levels of a self-report variable over a defined period of time is one of its primary uses. Little would be gained by using EMA methods if the resulting data were identical to those obtained by recall methods. Although there is a surfeit of information about potential reasons for achieving different results with the two methods, there is a paucity of empirical data documenting differences. In directly comparing data produced by the methods, two types of comparison emerge: (1) differences in levels (assuming the same measurement metric was used for both methods) and (2) differences in correspondence between rank-orderings of individuals by the methods (e.g., the correlation between the scores) (Stone et al, 2004; Shiffman et al, 2008). 
Our own work on the assessment of pain intensity in patients with chronic pain disorders has partially addressed this question. Regarding differences in level of reporting, retrospective
assessments produce higher levels of pain when compared to the average of momentary reports for the same period of time, and the discrepancy between the methods increases as the reporting period increases (Broderick et al, 2008). One possible explanation for these results is that the peak heuristic, which posits a particular focus on high levels of past pain, leads to an overemphasis of bouts of pain in the recalled reports (Stone et al, 2004). Others have also observed the higher level of reporting with recall measures (Linton and Gotestam, 1983; van den Brink et al, 2001). On correspondence between the two methods, the situation is less clear: although there is a substantial correlation between the pain reports from the two methods (about 50% of the variance is shared), there is also a substantial proportion of variance that is unique to each method. This general finding led earlier researchers to call it a “half-empty or half-full” situation, depending upon one’s perspective (Salovey et al, 1993). Whether or not the magnitude of the association seems acceptable, there is evidence that recalled reports can be distorted in undesirable ways. For example, we found that how much pain a respondent experienced at the time of reporting their retrospective weekly level influenced the magnitude of the retrospective report (Broderick et al, 2006). We have also reported that the degree of variability in EMA pain reports over a week is associated with recall of pain over the same week (Stone et al, 2005). The degree and direction of differences between recalled and actual immediate experience, and how these are affected by study conditions, need further empirical exploration.
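The two comparisons described above, differences in level and correspondence between methods, can be made concrete with a small Python sketch. The numbers are invented for illustration and are not data from the cited studies; the function names are likewise hypothetical. The last line simply notes the arithmetic that 50% shared variance corresponds to a correlation of about 0.71.

```python
from statistics import mean

# Hypothetical data, one value per participant (0-100 pain scale).
# ema_mean: average of that participant's momentary ratings for the week.
# recall:   the same participant's end-of-week recalled average pain.
ema_mean = [32.0, 45.0, 51.0, 28.0, 60.0, 39.0]
recall = [40.0, 50.0, 66.0, 30.0, 71.0, 55.0]


def level_difference(recalled, momentary):
    """Comparison 1: mean of (recall - EMA); positive means recall is higher."""
    return mean(r - m for r, m in zip(recalled, momentary))


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def shared_variance(x, y):
    """Comparison 2: squared correlation, the proportion of variance shared."""
    return pearson_r(x, y) ** 2


print(level_difference(recall, ema_mean))
print(round(shared_variance(recall, ema_mean), 2))

# A shared variance of 0.50 implies a correlation of sqrt(0.50).
r_for_half_shared = 0.5 ** 0.5
print(round(r_for_half_shared, 2))
```

Note that the two comparisons are independent: recall could overestimate every participant's pain by a constant amount and still correlate perfectly with the EMA averages.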
5 Conducting EMA Studies
Our purpose in this section of the chapter is to provide the reader with an overview of the many issues that confront the researcher endeavoring to collect self-reports from everyday life.
The presentation focuses on design considerations relating to ecological validity; for fuller coverage, the reader is referred to several excellent comprehensive reviews (Affleck et al, 1999; Bolger et al, 2003; Delespaul, 1995; Shiffman et al, 2008; Stone et al, 2006). EMA comprises a variety of sampling designs that can be used singly or can be combined to meet the needs of investigators (Shiffman, 2006). A variety of schemes for scheduling assessments to ensure a representative sample of moments have been described (Delespaul, 1995; Shiffman, 2006). The most commonly used schedules sample participants’ experience through time-sampling; that is, they select a random sample of moments for assessment. The classic examples, from Experience Sampling Methodology (Csikszentmihalyi and Larsen, 1987; DeVries, 1987), are studies where participants are “beeped” at random times and prompted to complete an assessment of their momentary state. Random sampling of moments is seen as the key to representativeness, much as random sampling of individuals is seen as important for characterizing populations. As with sampling of individuals, any given sample of moments from a period of time will not yield a perfectly representative picture of a self-report variable; there will be an associated sampling error, just as there is when sampling people. Greater numbers of samples yield estimates with smaller sampling error. Random time-sampling is not the only assessment schedule used in the EMA literature. An alternative is to schedule assessments at particular times of day, for example, every 2 h after 10 am, as a way to capture the day’s experience. The limitations of this approach are discussed in Shiffman et al (2008). Another alternative scheduling scheme is not based on time at all, but instead focuses on assessing particular events of interest. Thus, participants might be asked to complete an assessment every time they smoke a cigarette or engage in a social interaction. 
These event-based methods, which evolved from behavioral self-monitoring (Korotitsch and Nelson-Grey, 1999), are best suited to contexts
where the phenomenon of interest is a discrete event (e.g., an asthma attack) or can be construed into episodes (e.g., exacerbations of pain). A few examples can help characterize EMA methodology: In one study, patients with rheumatologic disorders rated their pain and mood up to 12 times a day when prompted at random times by a computer to complete an assessment (Stone et al, 2004). In another study, problem drinkers tracked each episode of drinking and recorded their level of intoxication and how they felt about their drinking (Muraven et al, 2005). A third study assessed the symptoms of people complaining of multiple chemical sensitivity several times per day, while simultaneously sampling the surrounding air for analysis of chemical exposures (Saito et al, 2005). In a study illustrating a combination of time-based and event-based sampling, Shiffman and Waters (2004) used time-sampled data to examine trends in affect in the days and hours preceding a focal event (smoking relapse). While the subject populations, assessments, and content focus differed, these EMA studies and others (Stone et al, 2006) share an approach involving multiple momentary assessments, collected near the time of experience, across a broad range of real-world settings the participants inhabit, and with attention to sampling of experience (e.g., random time-sampling). These are the core elements of EMA. In another parallel with sampling of research participants, EMA researchers have been concerned about the loss of observations from the planned sample and accordingly have emphasized the importance of compliance with scheduled assessments and inclusion of all relevant moments in the sampling frame as key to representativeness (Hufford and Shields, 2002; Stone et al, 2002; Shiffman et al, 2008). Just as attrition from a sample of participants threatens the representativeness of the sample, so noncompliance with assessment prompts threatens the representativeness of the sample of moments. 
A variety of EMA sampling schemes, paralleling the variety of sampling designs for individuals in populations, have been described and
used (stratified sampling, over-sampling, etc.) (Shiffman, 2006).
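As a rough illustration of the time-sampling schemes discussed in this section, the following Python sketch generates one day of prompt times in two ways: simple random sampling of moments, and a stratified variant that places one random prompt in each equal-length block of the day. The waking window of 08:00 to 22:00, the choice of six prompts, the seed, and the function names are all assumptions made for the example, not features of any published protocol.

```python
import random


def random_schedule(n_prompts, start_min, end_min, rng):
    """Simple random time-sampling: n distinct minutes drawn uniformly from the window."""
    return sorted(rng.sample(range(start_min, end_min), n_prompts))


def stratified_schedule(n_blocks, start_min, end_min, rng):
    """Stratified time-sampling: one random minute within each equal-length block."""
    block = (end_min - start_min) // n_blocks
    return [
        rng.randrange(start_min + i * block, start_min + (i + 1) * block)
        for i in range(n_blocks)
    ]


def as_clock(minute_of_day):
    """Format minutes since midnight as HH:MM."""
    return f"{minute_of_day // 60:02d}:{minute_of_day % 60:02d}"


rng = random.Random(2024)  # fixed seed so the example is reproducible
waking_start, waking_end = 8 * 60, 22 * 60  # assumed waking window, in minutes

simple = random_schedule(6, waking_start, waking_end, rng)
stratified = stratified_schedule(6, waking_start, waking_end, rng)

print([as_clock(m) for m in simple])
print([as_clock(m) for m in stratified])
```

Simple random sampling can leave long stretches of the day unsampled on a given day, whereas the stratified variant guarantees coverage of every block while keeping the exact prompt time unpredictable to the participant. In either case, if moments are sampled independently, the standard error of a mean shrinks roughly in proportion to the square root of the number of assessments, which is one way to read the statement that greater numbers of samples yield smaller sampling error.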
5.1 Implementation of EMA and Application of Technology
Advances in technology have enabled the conduct of efficient and imaginative EMA studies. Early diary studies had no reliable way of scheduling assessments or prompting participants to complete them, so assessments were often linked to standard events in participants’ lives, such as meals or bedtime. However, these are hardly random moments in a person’s day. An innovation was introduced by the developers of the Experience Sampling Method, who provided participants with electronic pagers and arranged to “beep” them to prompt them to complete a diary card (Csikszentmihalyi and Larsen, 1987). By providing a means of signaling the subject, beepers gave the investigator control over the intended schedule of assessments, which were typically recorded on traditional paper diary cards. The use of electronic data capture for EMA has become increasingly common. Besides scheduling and issuing prompts, a palmtop computer can also collect and store the assessment data, while recording the exact time the assessment was completed. This is regarded as an important advantage, because of concerns about backfilling of data – that is, the completion of assessments after-the-fact, with falsification of the completion date and time, which negates the advantages of real-time data collection. There has been controversy about how often backfilling occurs, how it might be minimized, and what effects backfilling has on the resulting data (Green et al, 2006). Nevertheless, several studies, with diverse populations and methods, have demonstrated that participants do backfill paper diary entries, even when they are electronically prompted for completion, and sometimes even when they are
aware their entries are subject to verification (Hufford, 2007). This can be a serious concern, because participants who complete their diaries in retrospect reintroduce all of the problems of retrospective recall that the method was designed to avoid. Moreover, when participants choose when they complete the diaries, even if it’s not long after the scheduled time, they can introduce additional bias because participants’ choice of occasions can be biased (e.g., waiting until a symptom-free time to complete a diary or completing it when symptoms occur and serve as a reminder to do the diary). In essence, the sample of moments becomes like a convenience sample of volunteers, rather than like a random population sample. Accordingly, the ability of electronic data-collection methods to accurately record the time of diary completion is regarded by many investigators as an advantage over paper-and-pencil diaries. Another advantage of many electronic data-collection systems is that they allow flexibility in the administration of questions; for example, item presentation can be contingent upon responses to prior items (e.g., skip patterns), greatly enhancing efficiency and reducing subject burden. Moreover, such electronic systems can also modify the sampling schedule based on algorithms applied to subject input, for instance, increasing the density of assessment when an event of interest has occurred or scheduling a series of assessments to follow up on a trigger event. The most commonly used electronic devices for collecting self-report EMA data are palmtop computers and interactive voice response systems (IVRS). An advantage of palmtop computers is that they function independently and thus are not dependent on communication with a central server. They are also capable of presenting a variety of response options (Likert scales, Visual Analog Scale [VAS], Numeric Rating Scale, body diagrams) that are typically used in assessments. 
Since they present assessment content as text, the assessments resemble their paper ancestors, which probably accounts for the finding that such electronic assessments
are psychometrically equivalent to parallel paper forms (Gwaltney et al, 2008). In IVRS, assessment content is played to participants via recorded voice, and participants record their responses using the keypad (“press ‘1’ if you are suicidal. . .”). While IVRS is most often used as a passive system requiring participants to call in, it can also be used to call participants on a schedule, enabling time-sampling designs. With the advent of cell phones, the phone system can be used to reach participants in a wide variety of settings. An advantage is that IVRS uses the telephone – a technology familiar to participants. A disadvantage is that aural presentation of assessment and response options can limit the assessment (e.g., memory capacity limits the number of response options) and might change how participants respond. As cell phones become more sophisticated, “smart phones” are increasingly able to function much like palmtop computers, displaying text-based assessments and sending assessment data to a central server. Desktop computer systems (web-based or otherwise), while not portable and thus not amenable to assessment in the full range of participants’ settings, can be used to administer end-of-day or periodic assessments. While these approaches are used to collect self-report data, a variety of specialized hardware can be used at the same time to assess participants’ objective physiological states in a momentary way (e.g., ambulatory blood pressure, blood glucose, pulmonary function) (Kamarck et al, 1998). Other devices can objectively capture subject behavior (e.g., instrumented pill bottles, motion detectors, audio or video recordings; Byerly et al, 2005) or environmental conditions (e.g., noise, temperature, presence of chemical pollutants; Saito et al, 2005). Collection of such objective data is often enriched by collecting concomitant self-report data, allowing these objective assessments to be linked to subjective states. 
Thus, technology has enabled a new age for collection of real-world data in real time (Kamarck et al, 1998).
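The value of device-recorded completion times, discussed earlier in this section in connection with backfilling, can be sketched in a few lines of Python. The sketch labels each scheduled prompt by comparing it with the time the device recorded for the completed assessment. The 15-minute response window, the example timestamps, the label names, and the function names are assumptions made for illustration; actual studies define their own windows and compliance rules.

```python
from datetime import datetime, timedelta


def classify_entries(prompts, completions, window=timedelta(minutes=15)):
    """Label each prompt as 'compliant', 'out_of_window', or 'missed'.

    completions[i] is the device-recorded completion time for prompts[i],
    or None if no assessment was recorded for that prompt.
    """
    labels = []
    for prompt, done in zip(prompts, completions):
        if done is None:
            labels.append("missed")
        elif timedelta(0) <= done - prompt <= window:
            labels.append("compliant")
        else:
            labels.append("out_of_window")
    return labels


def compliance_rate(labels):
    """Proportion of scheduled prompts answered within the window."""
    return labels.count("compliant") / len(labels)


day = datetime(2010, 1, 4)  # arbitrary example date
prompts = [
    day.replace(hour=9, minute=0),
    day.replace(hour=12, minute=30),
    day.replace(hour=16, minute=10),
    day.replace(hour=20, minute=45),
]
completions = [
    day.replace(hour=9, minute=4),    # answered 4 minutes after the prompt
    day.replace(hour=13, minute=40),  # answered 70 minutes after the prompt
    None,                             # never answered
    day.replace(hour=20, minute=50),  # answered 5 minutes after the prompt
]

labels = classify_entries(prompts, completions)
print(labels)
print(compliance_rate(labels))
```

A check of this kind is only possible when the completion time is recorded by the device rather than written in by the participant, which is the advantage over paper diaries noted above.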
5.2 Concerns About EMA
Nevertheless, there are issues that threaten the validity of these new methodologies. The frequency of EMA measurement and the fact that it takes place in participants’ natural environments have raised concerns about reactivity – that is, the possibility that the act of measurement itself affects the phenomenon being measured. Evidence to date suggests that reactivity is minimal. One study randomized pain patients to be assessed 3, 6, or 12 times daily and found no systematic change in their pain ratings (Stone et al, 2004), consistent with findings from an earlier study (Cruise et al, 1996). Other studies have found no effect of monitoring on behaviors such as drinking or smoking (Hufford et al, 2002). Empirical investigations have, then, reduced concern about reactivity, but further study may turn up contexts in which reactivity is a problem. EMA studies can be demanding, often requiring participants to complete many assessments each day. This raises concerns about participants’ ability or willingness to comply. Yet, across studies with diverse protocols and populations, a high degree of compliance is often achieved (Hufford and Shields, 2002). Some EMA studies make particularly high demands on participants, but what is striking is the degree of compliance observed even when the study demands might seem unrealistic at first blush. In the study in which pain patients were randomized to complete 3, 6, or 12 assessments per day, compliance was excellent (averaging 94%) and was unaffected by the frequency of assessment (Stone et al, 2004). Even protocols with more than 20 prompts per day have achieved high compliance rates (Kamarck et al, 2007). Further, Freedman and colleagues (2006) showed that even homeless crack cocaine addicts were able to complete an EMA study with multiple daily assessments with reasonable compliance. Thus, with proper management, participants seem able to bear the burden of intensive EMA sampling. 
A related concern is whether the demands of EMA studies lead to bias in subject samples.
We are not aware of any formal data on this, but some participants may not be willing or able to engage in these demanding protocols. In our experience, the demands of a subject’s work are a common source of conflict; for example, neither surgeons nor waitresses can afford to be interrupted by unscheduled prompts. Such participant sampling bias should be evaluated and weighed in interpreting EMA data. Sometimes concerns are raised about whether older participants might have difficulty with technology such as palmtop computers. Analysis of compliance by age has demonstrated that older participants can be trained to operate the palmtops and actually demonstrate better compliance than younger participants. There are, though, issues that may limit participation. Deficits in eyesight (to see questions), hearing (to hear the phone or “beeps”), or manual dexterity (to manipulate a stylus or keypad) could certainly make some participants incapable of performing in an EMA study, though some of these deficits would also make traditional assessment difficult. More data on how EMA methods influence study participation and representativeness of subject samples would be useful.
6 Conclusion
We have argued that ecological validity is a critical component of self-report assessment for retrospective and global methods, one that is necessary for the validity of many content domains. Brunswik (1949) was correct in his assessment of the “formidable” nature of implementing representative designs to achieve what we now call “ecological validity,” although he was not specifically referring to self-report data at that time. Recent developments in technology have made representative sampling of self-reports practical for most researchers, through the advent of sophisticated electronic diaries and interactive voice response systems. There is no longer a need to personally shadow research participants as Brunswik did in order to collect self-reports in a representative manner to achieve ecological validity. It is our hope that knowledge of these developments will hasten the adoption of methods for collecting real-time real-world data from research participants and overcome at least some aspects of the task envisioned by Brunswik over 50 years ago.
References
Affleck, G., Tennen, H., Keefe, F. J., Lefebvre, J. C., Kashikar-Zuck, S. et al (1999). Everyday life with osteoarthritis or rheumatoid arthritis: independent effects of disease and gender on daily pain, mood and coping. Pain, 83, 601–609.
Blanton, H., and Jaccard, J. (2006). Arbitrary metric in psychology. Am Psychol, 61, 27–41.
Bolger, N., Davis, A., and Rafaeli, E. (2003). Diary methods: capturing life as it is lived. Ann Rev Psychol, 54, 579–616.
Bradburn, N., Rips, L., and Shevell, S. (1987). Answering autobiographical questions: the impact of memory and inference on surveys. Science, 236, 151–167.
Broderick, J., Schwartz, J., and Stone, A. (2006, 3–6 May). Context (pain and affect) influences recall pain ratings [Poster presented at the Annual Meeting of the American Pain Society]. San Antonio, TX.
Broderick, J., Schwartz, J., Vikingstad, G., Pribbernow, M., Grossman, S., and Stone, A. (2008). The accuracy of pain and fatigue items across different reporting periods. Pain, 139, 146–157.
Brunswik, E. (1944). Distal focussing of perception: size constancy in a representative sample of situations. Psychol Monogr, 56, 1–49.
Brunswik, E. (1949). Systematic and Representative Design of Psychological Experiments. Berkeley and Los Angeles: University of California Press.
Busey, T., Tunnicliff, J., Loftus, G., and Loftus, E. (2000). Accounts of the confidence-accuracy relation in recognition memory. Psychon Bull Rev, 7, 26–48.
Byerly, M., Fisher, R., Whatley, K., Holland, R., Varghese, F. et al (2005). A comparison of electronic monitoring vs clinician rating of antipsychotic adherence in outpatients with schizophrenia. Psychiat Res, 133, 129–133.
Clark, D., and Teasdale, J. (1982). Diurnal variation in clinical depression and accessibility of memories of positive and negative experiences. J Abnorm Psychol, 91, 87–95.
Cruise, C., Porter, L., Broderick, J., Kaell, A., and Stone, A. (1996). Reactive effects of diary self-assessment in chronic pain patients. Pain, 67, 253–258.
Csikszentmihalyi, M., and Larsen, R. E. (1987). Validity and reliability of the experience sampling method. J Nerv Ment Dis, 175, 526–536.
Delespaul, P. (1995). Assessing Schizophrenia in Daily Life -- The Experience Sampling Method. Maastricht: Maastricht University Press.
DeVries, M. (1987). Investigating mental disorders in their natural settings: introduction to the special issue. J Nerv Ment Dis, 175, 509–513.
Eich, E., Reeves, J., Jaeger, B., and Graff-Radford, S. (1985). Memory for pain: relation between past and present pain intensity. Pain, 23, 375–379.
Fredrickson, B. (2000). Extracting meaning from past affective experiences: the importance of peaks, ends, and specific emotions. Cogn Emot, 14, 577–606.
Freedman, M., Lester, K., McNamara, C., Milby, J., and Schumacher, J. (2006). Cell phones for Ecological Momentary Assessment with cocaine-addicted homeless patients in treatment. J Subst Abuse Treat, 30, 105–111.
Green, A., Rafaeli, E., Bolger, N., Shrout, P., and Reis, H. (2006). Paper or plastic? Data equivalence in paper and electronic diaries. Psychol Methods, 11, 87–105.
Gwaltney, C., Shields, A., and Shiffman, S. (2008). Equivalence of electronic and paper-and-pencil administration of patient reported outcome measures. Val Health, 11, 322–333.
Hammond, K., and Stewart, T. (2001). The Essential Brunswik: Beginnings, Explications, Applications. New York, NY: Oxford University Press.
Hufford, M. (2007). Special methodological challenges and opportunities in Ecological Momentary Assessment. In A. Stone, S. Shiffman, A. Atienza, & L. Nebling (Eds.), The Science of Real-Time Data Capture: Self-Reports in Health Research (pp. 54–75). New York, NY: Oxford University Press.
Hufford, M., and Shields, A. (2002). Electronic diaries: an examination of applications and what works in the field. Appl Clin Trials, 11, 46–56.
Hufford, M., Shields, A., Shiffman, S., Paty, J., and Balabanis, M. (2002). Reactivity to ecological momentary assessment: an example using undergraduate problem drinkers. Psychol Addict Behav, 16, 205–211.
Kahneman, D., Diener, E., and Schwarz, N. (1999). Well-Being: The Foundations of Hedonic Psychology. New York: Russell Sage Foundation.
Kamarck, T., Shiffman, S., Smithline, L., Goodie, J., Paty, J. et al (1998). The effects of task strain, social conflict, and emotional activation on ambulatory cardiovascular activity: daily life consequences of “recurring stress” in a multiethnic sample. Health Psychol, 17, 17–29.
Kamarck, T., Shiffman, S., Muldoon, M., Sutton-Tyrell, K., Gwaltney, C. et al (2007). Ecological Momentary Assessment as a resource for social epidemiology. In A. Stone, S. Shiffman, A. Atienza, & L. Nebling (Eds.), The Science of Real-Time Data Capture: Self-Reports in Health Research (pp. 268–285). New York: Oxford University Press.
Kihlstrom, J., Eich, E., Sandbrand, D., and Tobias, B. (2000). Emotion and memory: implications for self-report. In A. Stone, J. Turkkan, C. Bachrach, J. Jobe, H. Kurtzman, & V. Cain (Eds.), The Science of Self-Report: Implications for Research and Practice (pp. 81–99). Mahwah, NJ: Erlbaum.
Korotitsch, W., and Nelson-Grey, R. (1999). An overview of self-monitoring research assessment and treatment. Psychol Assess, 2, 415–425.
Linton, S., and Gotestam, K. (1983). A clinical comparison of two pain scales: correlation, remembering chronic pain, and a measure of compliance. Pain, 17, 53–65.
Linton, S., and Melin, L. (1982). The accuracy of remembering chronic pain. Pain, 13, 281–285.
McFall, R. (2005). Theory and utility -- key themes in evidence-based assessment: comment on special section. Am Psychol, 17, 312–323.
McFarland, C., Ross, M., and DeCourville, N. (1989). Women’s theories of menstruation and biases in recall of menstrual symptoms. J Pers Soc Psychol, 57, 522–531.
Menon, G., and Yorkston, E. (2000). The use of memory and contextual cues in the formation of behavioral frequency judgements. In A. Stone, J. Turkkan, C. Bachrach, J. Jobe, H. Kurtzman, & V. Cain (Eds.), The Science of Self-Report: Implications for Research and Practice (pp. 63–79). Mahwah, NJ: Lawrence Erlbaum Associates.
Muraven, M., Collins, R., Shiffman, S., and Paty, J. (2005). Daily fluctuations in self-control demands and alcohol intake. Psychol Addict Behav, 19, 140–147.
Redelmeier, D., and Kahneman, D. (1996). Patients’ memories of painful medical treatments: real-time and retrospective evaluations of two minimally invasive procedures. Pain, 66, 3–8.
Redelmeier, D., Katz, J., and Kahneman, D. (2003). Memories of colonoscopy: a randomized trial. Pain, 104, 187–194.
Robinson, M., and Clore, G. (2002). Belief and feeling: evidence for an accessibility model of emotional self-report. Psychol Bull, 128, 934–960.
Rock, E., Scott, J., Kennedy, D., Sridhara, R., Pazdur, R., and Burke, L. (2007). 
Challenges to use of health-related quality of life for Food and Drug Administration Approval of anticancer products. J Natl Cancer Inst Monogr, 25, 27–30. Ross, M. (1989). Relation of implicit theories to the construction of personal histories. Psychol Rev, 96, 341–357. Saito, M., Kumano, H., Yoshiuchi, K., Kokubo, N., Ohashi, K., Yamamoto, Y. et al (2005). Symptom profile of multiple chemical sensitivity in actual life. Psychosom Med, 67, 318–325. Salovey, P., Sieber, W., Jobe, J., and Willis, G. (1993). The recall of physical pain. In N. Schwarz & S. Sudman (Eds.), Autobiographical Memory and the Validity of Retrospective Reports (pp. 89–106). New York: Springer-Verlag.
111 Schwarz, N. (1996). Cognition and Communication: Judgmental Biases, Research Methods, and the Logic of Conversation. Hillsdale, NJ: Erlbaum. Schwarz, N. (2007). Retrospective and concurrent selfreport: the rationale for real-time data capture. In A. Stone, S. Shiffman, A. Atienza, & L. Nebling (Eds.), The Science of Real-Time Data Capture: Self-Reports in Health Research (pp. 11–26). New York: Oxford University Press. Schwarz, N., and Oyserman, D. (2001). Asking questions about behavior: cognition, communication and questionnaire construction. Am J Eval, 22, 127–160. Schwarz, N., Wanke, M., and Bless, H. (1994). Subjective assessments and evaluations of change: some lessons learned from social cognitive research. Eeuro Rev Soc Psychol, 5, 181–210. Schwarz, N., and Strack, F. (1991). Evaluating one’s life: a judgment model of subjective well-being. In F. Strack, M. Argyle, & N. Schwarz (Eds.), Subjective Well-Being: An Interdisciplinary Approach (pp. 27– 47). Oxford: Pergamon Press. Shiffman, S. (2006). Designing protocols for Ecological Momentary Assessment. In A. Stone, S. Shiffman, A. Atienza, & L. Nebling (Eds.), The Science of RealTime Data Capture: Self-Reports in Health Research. New York: Oxford University Press. Shiffman, S. (1993). Assessing smoking patterns and motives. J Consult Clin Psychol, 61, 732–742. Shiffman, S., Balabanis, M., Gwaltney, C., Paty, J., Gnys, M. et al (2007). Prediction of lapse from associations between smoking and situational antecedents assessed by ecological momentary assessment. Drug Alc Depend, 91, 159–168. Shiffman, S., Hufford, M., Hickcox, M., Paty, J. A., Gnys, M., and Kassel, J. D. (1997). Remember that? A comparison of real-time vs. retrospective recall of smoking lapses. J Consult Clin Psychol, 65, 292–300. Shiffman, S., Hufford, M., and Stone, A. (2008). Ecological momentary assessment in clinical psychology. Annu Rev Clin Psychol, 4, 1–32. Shiffman, S., and Waters, A. (2004). 
Negative affect and smoking lapses: a prospective analysis. J Consult Clin Psychol, 72, 1192–201. Smith, W., and Safer, M. (1993). Effects of present pain level on recall of chronic pain and medication use. Pain, 55, 355–361. Stone, A., Schwartz, J., Broderick, J., and Shiffman, S. (2005). Variability of momentary pain predicts recall of weekly pain: a consequence of the peak (or salience) memory heuristic. Person Soc Psychol Bull 31, 1340–1346. Stone, A., Schwartz, J., Neale, J., Shiffman, S., Marco, C., Hickcox, M. et al (1998). How accurate are current coping assessments? A comparison of momentary versus end-of-day reports of coping efforts. J Person Soc Psychol, 74, 1670–1680.
112 Stone, A., Shiffman, S., Atienza, A., and Nebling, L. (2007). The Science of Real-Time Data Capture: Self-Reports in Health Research. New York: Oxford University. Stone, A., Shiffman, S., Schwartz, J., Broderick, J., and Hufford, M. (2002). Patient non-compliance with paper diaries. Br Med J, 324, 1193–1194. Stone, A., Turkkan, J., Jobe, J., Bachrach, C., Kurtzman, H., and Cain, V. (2000). The science of self report. Mahwah, NJ: Erlbaum. Stone, A., Broderick, J., Shiffman, S., and Schwartz, J. (2004). Understanding recall of weekly pain from a momentary assessment perspective: absolute accuracy, between- and within-person consistency, and judged change in weekly pain. Pain, 107, 61–69.
A.A. Stone and S.S. Shiffman Stone, A. A., and Shiffman, S. (1994). Ecological Momentary Assessment (EMA) in behavioral medicine. Annals of Behavioral Medicine, 16, 199–202. Thompson, C., Skowronski, J., Larsen, S., and Betz, A. (1996). Autobiographical Memory: Remembering What and Remembering When. Mahwah, NJ: Erlbaum. vandenBrink, M., Bandell-Hoekstra, F., and Abu-Saad, H. (2001). The occurrence of recall bias in pediatric headache: a comparison of questionnaire and diary data. Headache, 41, 11–20. Wells, G., and Bradfield, A. (1998). “Good, you identified the suspect”: feedback to eyewitnesses distorts their reports of the witnessing experience. J Apply Psychol, 83, 360–376.
Chapter 9
Item Response Theory and Its Application to Measurement in Behavioral Medicine
Mee-Ae Kim-O and Susan E. Embretson
M.-A. Kim-O, School of Psychology, Georgia Institute of Technology, 654 Cherry St, Atlanta, GA 30332-0170, USA. e-mail: [email protected]

1 Introduction

Item response theory (IRT) has become a mainstream approach to developing psychological measures and standardized educational tests in the 21st century, and it is now the principal method for measuring cognitive abilities and achievement. In behavioral medicine, IRT models can be applied to personality traits (Reise and Waller, 1990), attitude measurements and behavioral ratings (Engelhard and Wilson, 1996), and clinical testing issues (Santor et al, 1994), as well as to measures of psychopathology, moods, behavioral dispositions, and situational evaluations. Applications of IRT models and associated methods can solve many practical problems in behavioral medicine. In the USA, the Patient-Reported Outcomes Measurement Information System (PROMIS) has been funded by the National Institutes of Health to provide publicly available computerized tests for many patient-reported outcomes of disease, such as depression, fatigue, and pain. IRT has been a major method for scaling and equating these tests. For example, subsets of standardized self-report measures are often administered to reduce testing time in clinical studies, and IRT can be applied to equate item subsets to the original test. Further, IRT is the primary basis for adaptive testing, which permits more reliable measurement at all levels of performance. This feature is particularly important in measuring change over time and in response to treatment. Simulation studies have shown that treatment effects may not be adequately estimated if the test does not provide reliable measurement at the initial and later stages (Embretson, 1996).

This chapter provides an overview of IRT and its models, together with an example that illustrates applications. Sections 2.1 and 2.2 review the limitations of classical test theory and describe how IRT overcomes them. Sections 3.1, 3.2, and 3.3 review some fundamental IRT models in two categories (binary IRT models and polytomous IRT models) and the evaluation of item quality. Sections 4.1 and 4.2 then describe an application of IRT to questionnaires in behavioral medicine.
A. Steptoe (ed.), Handbook of Behavioral Medicine, DOI 10.1007/978-0-387-09488-5_9, © Springer Science+Business Media, LLC 2010

2 Item Response Theory Versus Classical Test Theory

2.1 Limitations of Classical Test Theory

Psychometric theory can be divided into two categories: classical test theory (CTT) and item response theory (IRT). CTT was pioneered by Spearman (1904, 1907, 1913) and has defined the standard for test development since the 1930s (see Embretson and Reise, 2000). Allen and Yen (1979) characterize CTT as a simple model that describes how errors of measurement can influence observed scores. The CTT model can be expressed as

$$X_{ip} = T_{ip} + E_{ip}, \tag{9.1}$$

where $X_{ip}$ is the observed score for test $i$ and person $p$, $T_{ip}$ is the true score for test $i$ and person $p$, and $E_{ip}$ is the error score for test $i$ and person $p$. In the CTT model, estimates of examinees' true test scores are typically linear transformations of the raw test score, and the transformation relates the scores to relevant normative populations. Alternative test forms can be used to estimate true scores if the forms are parallel tests, that is, tests with the same expected true scores and error distributions. Psychometric indices for items in CTT are related to the properties of the test scores, particularly reliability and variance. Specifically, item difficulty is the proportion of persons passing or endorsing an item, and item discrimination is the correlation of the item with the total test score.

However, CTT has three obvious limitations. First, an examinee's true score depends on the difficulty level of the test (test dependent), so scores are not comparable between easy and hard tests. Second, the item characteristics depend on the ability of the examinees (sample dependent). Item difficulty, for example, will vary substantially if the true score distributions vary between populations. Third, the parallel test assumption, namely that true scores and error variances are identical across the two tests, is never fully met in practice. Therefore, it is difficult to compare examinees who take different tests and to contrast items whose characteristic indices are computed using different groups of examinees.

Because of these fundamental limitations of CTT, an alternative theory and model of mental measurement are desirable. Hambleton et al (1991) asserted that the desirable features of an alternative test theory would include (a) item characteristics that are not sample dependent, (b) examinees' true scores that are not test dependent, (c) a model that is expressed at the item level rather than at the test level, (d) a model that does not require the strictly parallel tests assumption, and (e) a model that provides a measure of precision for an individual's ability level.
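As a minimal sketch of the CTT item indices described above, the following Python snippet computes item difficulty (proportion passing or endorsing) and item discrimination (item-total correlation) for a small response matrix. The data, the function names, and the choice of a higher-scoring subgroup are hypothetical and serve only as an illustration; they do not come from the chapter. Recomputing the indices in the subgroup illustrates the sample dependence of CTT item statistics.

```python
from statistics import mean, pstdev


def pearson(x, y):
    """Pearson correlation; returns NaN when either variable has no variance."""
    sx, sy = pstdev(x), pstdev(y)
    if sx == 0 or sy == 0:
        return float("nan")
    mx, my = mean(x), mean(y)
    cov = mean([(a - mx) * (b - my) for a, b in zip(x, y)])
    return cov / (sx * sy)


def ctt_item_stats(responses):
    """Return (difficulty, discrimination) per item.

    responses: list of persons, each a list of 0/1 item scores.
    Discrimination is the uncorrected item-total correlation
    (the item itself is included in the total score).
    """
    totals = [sum(person) for person in responses]
    stats = []
    for i in range(len(responses[0])):
        item = [person[i] for person in responses]
        stats.append((mean(item), pearson(item, totals)))
    return stats


# Hypothetical data: 6 persons by 3 dichotomous items.
data = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 0],
    [0, 0, 0],
    [1, 0, 1],
]

full = ctt_item_stats(data)
# Same items, but only persons with a total score of 2 or more.
high = ctt_item_stats([p for p in data if sum(p) >= 2])

print("item 2 difficulty, full sample:  ", full[1][0])
print("item 2 difficulty, high scorers: ", high[1][0])
```

In this toy example the second item has a difficulty of 0.50 in the full sample but 0.75 among the higher scorers, so the same item looks easier or harder depending on who answers it.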
2.2 Item Response Theory as Ideal Model

During the 1950s and 1960s, a revolutionary test theory, now known as IRT, was developed (Birnbaum, 1968; Lord, 1952; Lord and Novick, 1968; Rasch, 1960). IRT has the desirable features of an alternative test theory described above. That is, unlike in CTT, the examinee's true score is not test dependent, the item parameters are not sample dependent, and the parallel test assumption is not necessary. In other words, if a given IRT model fits the test data of interest, ability estimates obtained from different sets of items will be comparable. Furthermore, item parameter estimates are comparable regardless of the group of examinees (Hambleton et al, 1991). IRT also includes indices to discern the strengths and weaknesses of each item in a test, whereas CTT analyses focus on the scale at the test level. For example, in IRT we can distinguish good and bad items in terms of how accurately an item measures examinees at different trait levels (i.e., item information). IRT has also provided solutions for many practical testing problems such as equating different test forms and examining measurement bias (Embretson and Reise, 2000).

IRT models make two basic assumptions about the data to which they are applied: appropriate dimensionality and local independence. The first assumption means that the number of latent traits measured by the items corresponds to the number of trait parameters in the IRT model. For example, if test items depend on two or more latent traits, then IRT models with a single person trait parameter will not be appropriate. Factor analysis, among other methods, can be used to test the assumption. Models that assume more than one trait underlies examinees' test scores are referred to as multidimensional models (Hambleton et al, 1991). Several multidimensional IRT (MIRT) models allow more than one trait (θ) to be estimated, even though the most widely applied IRT models assume a unidimensional construct for which one θ estimate is sufficient to explain item responses (Reckase, 1997). The unidimensionality assumption is closely related to the assumption of local independence. The local independence assumption means that when the abilities that influence test performance are held constant, examinees' responses to any pair of items are statistically independent. Equivalently, within a given trait level, the probability of getting one item correct is independent of the probability of getting other items correct.
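The local independence assumption can be written compactly. The display below is a sketch in notation that is partly assumed here rather than taken from this passage: $X_{is}$ denotes the response of person $s$ to item $i$, $x_i$ the observed value of that response, and $n$ the number of items on the test.

```latex
% Local independence: conditional on the trait level \theta_s,
% the joint probability of a response pattern factors into
% the product of the individual item response probabilities.
P(X_{1s} = x_1, \ldots, X_{ns} = x_n \mid \theta_s)
  = \prod_{i=1}^{n} P(X_{is} = x_i \mid \theta_s)
```

Under this factorization, any association among item responses in the population is attributed entirely to differences in $\theta_s$ between persons.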
3 IRT Models

IRT models can be classified into two basic categories depending on how the items to be analyzed are scored: binary models and polytomous models. The binary IRT models are used for items with dichotomously scored responses (e.g., yes/no or right/wrong), whereas the polytomous models handle multiple-category formats such as rating scales. IRT models were originally developed to handle binary response data. The polytomous IRT models were introduced later as generalized forms of the binary IRT models (e.g., Samejima, 1969). The polytomous IRT models can in turn be divided into two categories depending on whether the test items have ordered response categories (e.g., Likert scale) or unordered response categories (e.g., unordered multiple choices). The models in each category are described in detail below.
3.1 Binary IRT Models

Binary response data may come from ability tests (Right or Wrong), personality self-reports (True or Not True), attitude endorsements (Agree or Disagree), and behavioral rating scales (Yes or No). Two separate lines of development led to the currently available IRT models. Rasch (1960) developed a family of IRT models that fully meet the requirements of specific objectivity; according to Rasch (1960), specific objectivity is met when person scores that are invariant across items and item indices that are invariant across persons can be obtained. These models assume that the items are equally discriminating for the latent trait. In contrast, in the United States, families of models were developed that included item discrimination and other parameters. The normal ogive models (Lord, 1952) use the cumulative normal curve to model the relationship of item response probabilities to the latent trait. Birnbaum (1968) developed logistic models with multiple item parameters because they are mathematically and computationally simpler than the normal ogive models. Three logistic models are widely applied: the one-parameter logistic (1PL) model, the two-parameter logistic (2PL) model, and the three-parameter logistic (3PL) model. The 1PL model may be written as follows:

$$P(X_{is} = 1 \mid \theta_s, \beta_i) = \frac{\exp[\alpha(\theta_s - \beta_i)]}{1 + \exp[\alpha(\theta_s - \beta_i)]}, \tag{9.2}$$

where $X_{is}$ is the response of person $s$ to item $i$ (0 or 1), $\theta_s$ is the trait level for person $s$, $\alpha$ is a constant item discrimination, and $\beta_i$ is the difficulty of item $i$. The 1PL model is identical to the Rasch model if the constant item discrimination is fixed to 1. The 2PL model adds an item discrimination parameter to the Rasch model as follows:

$$P(X_{is} = 1 \mid \theta_s, \beta_i, \alpha_i) = \frac{\exp[\alpha_i(\theta_s - \beta_i)]}{1 + \exp[\alpha_i(\theta_s - \beta_i)]}, \tag{9.3}$$

where $X_{is}$, $\theta_s$, and $\beta_i$ are defined as above and $\alpha_i$ is the discrimination for item $i$.
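The logistic form in Eqs. 9.2 and 9.3 can be evaluated directly. The short Python sketch below is only an illustration: the function name `p_2pl` and the parameter values are assumptions made here, not part of the chapter. With `alpha` left at 1.0 for every item, the function corresponds to the Rasch model.

```python
import math


def p_2pl(theta, beta, alpha=1.0):
    """Probability of a keyed (1) response under the 2PL model (Eq. 9.3).

    exp(z) / (1 + exp(z)) is written as 1 / (1 + exp(-z)),
    which is algebraically identical.
    """
    z = alpha * (theta - beta)
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical item with difficulty 0.0, evaluated at several trait levels.
for theta in (-2.0, 0.0, 2.0):
    rasch = p_2pl(theta, beta=0.0)             # alpha fixed at 1
    steep = p_2pl(theta, beta=0.0, alpha=2.0)  # more discriminating item
    print(f"theta={theta:+.1f}  Rasch={rasch:.3f}  alpha=2: {steep:.3f}")
```

When the trait level equals the item difficulty, the probability is 0.5 whatever the discrimination; a larger discrimination only makes the curve rise more steeply around that point.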
The 3PL model adds a lower-asymptote parameter to accommodate the possibility of guessing, as follows:

$$P(X_{is} = 1 \mid \theta_s, \beta_i, \alpha_i, \gamma_i) = \gamma_i + (1 - \gamma_i)\,\frac{\exp[\alpha_i(\theta_s - \beta_i)]}{1 + \exp[\alpha_i(\theta_s - \beta_i)]}, \tag{9.4}$$

where $X_{is}$, $\theta_s$, $\beta_i$, and $\alpha_i$ are defined as above, and $\gamma_i$ is the lower asymptote (guessing) for item $i$. The 1PL model is a special case of the 2PL model in which all items share a constant value of the discrimination parameter $\alpha$, and the 2PL model is a special case of the 3PL model in which the lower asymptote is 0.

Which model is best for test development in behavioral medicine? The choice of a particular model depends on several considerations: (a) the weights of items for scoring (equal vs. unequal), (b) the desired scale properties for the measure, (c) fit to the data, and (d) the purpose for estimating the parameters. If items are to be equally weighted and the strongest justification for scale properties is desired, then the 1PL or Rasch model is favored. For many tests, better fit is often obtained with the 2PL or 3PL models. However, a disadvantage of models with item discrimination parameters is that persons with the same total score may have different estimates of the latent trait, depending on which items they answer in the keyed direction.
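The last point can be illustrated with a small sketch. The Python code below implements Eq. 9.4 and finds a maximum likelihood trait estimate by a simple grid search, which stands in here for the estimation routines of real IRT software. The item parameters, response patterns, and function names are hypothetical and were chosen only for illustration.

```python
import math


def p_3pl(theta, beta, alpha=1.0, gamma=0.0):
    """Eq. 9.4; gamma=0 gives the 2PL, and a common alpha gives the 1PL."""
    logistic = 1.0 / (1.0 + math.exp(-alpha * (theta - beta)))
    return gamma + (1.0 - gamma) * logistic


def log_likelihood(theta, responses, items):
    """Log-likelihood of a 0/1 response pattern, assuming local independence."""
    ll = 0.0
    for x, (beta, alpha, gamma) in zip(responses, items):
        p = p_3pl(theta, beta, alpha, gamma)
        ll += math.log(p) if x == 1 else math.log(1.0 - p)
    return ll


def ml_theta(responses, items, lo=-4.0, hi=4.0, steps=801):
    """Grid-search maximum likelihood estimate of theta (grid step 0.01)."""
    grid = [lo + (hi - lo) * k / (steps - 1) for k in range(steps)]
    return max(grid, key=lambda t: log_likelihood(t, responses, items))


# Hypothetical items as (beta, alpha, gamma) triples.
items_1pl = [(-1.0, 1.0, 0.0), (0.0, 1.0, 0.0), (1.0, 1.0, 0.0)]
items_2pl = [(-1.0, 0.5, 0.0), (0.0, 1.0, 0.0), (1.0, 2.0, 0.0)]

# Two response patterns with the same total score of 2.
pattern_a = [1, 1, 0]
pattern_b = [0, 1, 1]

for label, items in (("1PL", items_1pl), ("2PL", items_2pl)):
    est_a = ml_theta(pattern_a, items)
    est_b = ml_theta(pattern_b, items)
    print(f"{label}: pattern A -> {est_a:.2f}, pattern B -> {est_b:.2f}")
```

With equal discriminations the two patterns lead to the same estimate, because only the total score matters. With unequal discriminations, the pattern that includes the highly discriminating third item yields the higher estimate.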
3.2 Polytomous IRT Models

Several polytomous IRT models have been developed, and they can be divided into two types: the indirect (or difference) models and the direct (or divided-by-total) models. The indirect (difference) models include the graded response model (GRM; Samejima, 1969) and the modified graded response model (M-GRM; Muraki, 1990). The direct (divided-by-total) models include the partial credit model (PCM; Masters, 1982), the generalized partial credit model (G-PCM; Muraki, 1992), the rating scale model (RSM; Andrich, 1978a,b), and the nominal response model (NRM; Bock, 1972). The GRM, M-GRM, PCM, and G-PCM are appropriate for responses with ordered multiple categories. In particular, the GRM and G-PCM are appropriate for analyzing attitude or personality scale responses in which subjects rate their beliefs or respond to statements on a multi-point scale (Embretson and Reise, 2000). The mathematical models and application of the GRM, G-PCM, and RSM are introduced below. The NRM can be used when the response categories are not necessarily ordered along the persons' trait continuum.

The GRM was designed for Likert-style survey questionnaires that are scored using more than two ordered categories such as "Strongly disagree (1)," "Disagree (2)," "Neutral (3)," "Agree (4)," and "Strongly agree (5)." The GRM is an extension of the 2PL model for dichotomous data to polytomous data. Given the individual's trait level $\theta$, the probability that an individual responds in category $x$ or higher is given as follows:

$$P^*_{ix}(\theta) = \frac{\exp[\alpha_i(\theta - \beta_{ij})]}{1 + \exp[\alpha_i(\theta - \beta_{ij})]}, \tag{9.5}$$

where $x$ is the response category score ($x = j$, with $j = 0, 1, \ldots, M$), $\alpha_i$ is the item slope parameter, which is common to all categories of item $i$, and $\beta_{ij}$ is a category threshold parameter. The number of threshold parameters $\beta_{ij}$ for each item is the number of response categories minus 1. Thus, for an item with category scores 0 through $M$, the operating characteristic curves (OCCs) of the GRM comprise $M$ separate curves, each representing the probability of responding in a given category or higher rather than in a lower category. The category response curve (CRC) of the GRM, which gives the probability of responding in a specific category, can be expressed as

$$P_{ix}(\theta) = P^*_{ix}(\theta) - P^*_{i,x+1}(\theta), \tag{9.6}$$

where $P^*_{i,x+1}(\theta)$ is the cumulative probability of selecting a category score of $x + 1$ or higher on item $i$ given $\theta$. Assuming five categories, the probability of responding in each of the five categories is as follows:

$$\begin{aligned}
P_{i0}(\theta) &= 1.0 - P^*_{i1}(\theta) \\
P_{i1}(\theta) &= P^*_{i1}(\theta) - P^*_{i2}(\theta) \\
P_{i2}(\theta) &= P^*_{i2}(\theta) - P^*_{i3}(\theta) \\
P_{i3}(\theta) &= P^*_{i3}(\theta) - P^*_{i4}(\theta) \\
P_{i4}(\theta) &= P^*_{i4}(\theta) - 0.
\end{aligned}$$
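The differencing in Eq. 9.6 translates directly into code. The following Python sketch computes GRM category probabilities for one item; the function name, the slope, and the four threshold values are hypothetical choices made for illustration, and the thresholds are assumed to be in increasing order.

```python
import math


def grm_category_probs(theta, alpha, thresholds):
    """Category probabilities for one GRM item (Eqs. 9.5 and 9.6).

    thresholds: increasing list of beta_ij values, one fewer than the
    number of response categories.
    """
    # Cumulative probabilities P*_ix of responding in category x or higher.
    cumulative = [1.0 / (1.0 + math.exp(-alpha * (theta - b))) for b in thresholds]
    # By definition, P* = 1 for the lowest category and 0 beyond the highest.
    bounds = [1.0] + cumulative + [0.0]
    return [bounds[x] - bounds[x + 1] for x in range(len(thresholds) + 1)]


# Hypothetical five-category item: slope 1.5, four thresholds.
probs = grm_category_probs(theta=0.0, alpha=1.5, thresholds=[-2.0, -1.0, 1.0, 2.0])
for score, p in enumerate(probs):
    print(f"category {score}: {p:.3f}")
```

Because the thresholds in this example are symmetric around the chosen trait level, the middle category is the most probable and the two extreme categories are equally unlikely.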
The PCM is an extension of the Rasch model using only item location parameters (b). The G-PCM extends the PCM by substituting the 2PL model for the Rasch model (Dodd et al, 1995). Among the polytomous IRT models, the G-PCM is widely used for ordered response categories. The G-PCM is expressed as follows:

$$P_{ix}(\theta) = \frac{\exp\left[\sum_{j=0}^{x} \alpha_i(\theta - \delta_{ij})\right]}{\sum_{r=0}^{m_i} \exp\left[\sum_{j=0}^{r} \alpha_i(\theta - \delta_{ij})\right]}, \tag{9.7}$$

where $P_{ix}(\theta)$ is the probability of selecting a category score of $x$ on item $i$ given an individual's trait level $\theta$, $\sum_{j=0}^{0}(\theta - \delta_{ij}) \equiv 0$, $j$ is the category score for item $i$ ($j = 0, 1, \ldots, m_i$), $\alpha_i$ is the discrimination parameter of item $i$, and $\delta_{ij}$ is the step difficulty parameter of item $i$ for category score $j$.

Of the parameters above, the item discrimination parameter $\alpha_i$ (also called the slope) indicates how well an item uncovers the examinees' ability or trait. Item information, described earlier, depends partially on the item's discriminating power (Hambleton et al, 1991). Therefore, these two indices, the item discrimination parameter and item information, provide an important basis for distinguishing good and bad items in IRT. The PCM can be regarded as a special case of the G-PCM in which item discriminations have a common value (usually constrained to 1.0).

The generalized rating scale model (G-RSM) can be expressed as

$$P_{ix}(\theta) = \frac{\exp\left\{\sum_{j=0}^{x} \alpha_i[\theta - (\lambda_i + \delta_j)]\right\}}{\sum_{r=0}^{m_i} \exp\left\{\sum_{j=0}^{r} \alpha_i[\theta - (\lambda_i + \delta_j)]\right\}}, \tag{9.8}$$

where the step difficulty of Eq. 9.7 is decomposed as $\delta_{ij} = \lambda_i + \delta_j$, so that again $\sum_{j=0}^{0}(\theta - \delta_{ij}) \equiv 0$, $\alpha_i$ is the item slope, $\lambda_i$ is the item location, and the $\delta_j$ are the category intersection parameters. In the G-RSM the items vary in location and discrimination, but the relative distances between thresholds are uniform across items. Thus, the model often provides adequate fit to rating scale data in which all items share the same numerical format.

3.3 Evaluating Item Quality

Several indices are available to evaluate item quality: the item discrimination index, item information, and item fit. The item discrimination index is proportional to the slope of the item characteristic curve (ICC) at the point $\beta_i$ on the ability scale. Items with steeper slopes are more useful for separating examinees into different ability levels than are items with less steep slopes. The item information function, $I_i(\theta)$, is a powerful tool for describing and selecting items because it displays the accuracy with which each item measures examinees' abilities or traits. The standard error of measurement at each $\theta$ level is the reciprocal of the square root of the sum of the $I_i(\theta)$ values across items. Thus, $I_i(\theta)$ determines the contribution of an item to reducing measurement error, and if it is not sufficiently high, the item can be deleted from the test. Finally, item fit indices are available. One widely used index is a goodness-of-fit test that compares the item response probabilities predicted by the model with the observed proportions of people giving each response. All three indices are available in many computer programs, such as PARSCALE (Muraki and Bock, 1997).
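A brief sketch may help to connect item information with measurement error. The chapter does not give a formula for $I_i(\theta)$; the Python code below assumes the standard result for the 2PL model, in which item information equals the squared discrimination times $P(1 - P)$, and takes the standard error as the reciprocal of the square root of the summed information. The item parameters and function names are hypothetical.

```python
import math


def p_2pl(theta, beta, alpha):
    """2PL response probability (Eq. 9.3)."""
    return 1.0 / (1.0 + math.exp(-alpha * (theta - beta)))


def item_information(theta, beta, alpha):
    """Standard 2PL item information: alpha^2 * P * (1 - P)."""
    p = p_2pl(theta, beta, alpha)
    return alpha ** 2 * p * (1.0 - p)


def standard_error(theta, items):
    """Standard error of measurement at theta for items given as (beta, alpha)."""
    test_information = sum(item_information(theta, b, a) for b, a in items)
    return 1.0 / math.sqrt(test_information)


# Hypothetical four-item scale as (beta, alpha) pairs.
items = [(-1.0, 0.8), (0.0, 1.5), (0.5, 2.0), (1.5, 0.6)]

for theta in (-2.0, 0.0, 2.0):
    infos = [item_information(theta, b, a) for b, a in items]
    print(f"theta={theta:+.1f}  item information={[round(v, 3) for v in infos]}"
          f"  SE={standard_error(theta, items):.3f}")
```

Under this formula an item is most informative where the trait level equals its difficulty, and items with low discrimination add little information anywhere on the scale, which is why they are candidates for removal.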
4 Applying IRT to Questionnaires in Behavioral Medicine

A sample of 220 patients who were being treated for rheumatoid arthritis was surveyed with the Center for Epidemiologic Studies Depression Scale (CES-D). The mean age was 54.54 (SD = 8.795), with a range between 38 and 70. Of the 220 subjects, 182 were females (82.7%) and 38 were males (17.3%). The sample comprised 187 (85%) Caucasians, 21 (9.5%) African Americans, 9 (4.1%) Hispanics, and 3 (1.4%) Native Americans or Alaskan Natives.
4.1 Questionnaire and Analysis with Polytomous IRT

The CES-D is a 20-item questionnaire that asks how subjects have felt and behaved during the last week in order to determine their level of depression. A rating scale format with four categories is used in the scale, such as 0 = rarely ...
Pharmacogenetic investigation of smoking cessation treatment. Pharmacogenetics, 12, 627–634. Lerman, C., Tyndale, R., Patterson, F., Wileyto, E. P., Shields, P. G. et al (2006b). Nicotine metabolite ratio predicts efficacy of transdermal nicotine for smoking cessation. Clin Pharmacol Ther, 79, 600–608. Lerman, C., Wileyto, E. P., Patterson, F., Rukstalis, M., Audrain-McGovern, J. et al (2004). The functional mu opioid receptor (OPRM1) Asn40Asp variant predicts short-term response to nicotine replacement therapy in a clinical trial. Pharmacogenomics J, 4, 184–192. Levine, R., and Kendler, M. (2004). Millions Saved: Proven Success in Global Health. Washington, DC: Center for Global Development.
495 Li, M. D. (2003). The genetics of smoking related behavior: a brief review. Am J Med Sci, 326, 168–173. Li, M. D. (2008). Identifying susceptibility loci for nicotine dependence: 2008 update based on recent genome-wide linkage analyses. Hum Genet, 123, 119–131. Li, M. D., Beuten, J., Ma, J. Z., Payne, T. J., Lou, X. Y. et al (2005). Ethnic- and gender-specific association of the nicotinic acetylcholine receptor alpha4 subunit gene (CHRNA4) with nicotine dependence. Hum Mol Genet, 14, 1211–1219. Li, M. D., Cheng, R., Ma, J. Z., and Swan GE. (2003). A meta-analysis of estimated genetic and environmental effects on smoking behavior in male and female adult twins. Addiction, 98, 23–31. Li, M. D., Sun, D., Lou, X. Y., Beuten, J., Payne, T. J. et al (2007). Linkage and association studies in African- and Caucasian-American populations demonstrate that SHC3 is a novel susceptibility locus for nicotine dependence. Mol Psychiatry, 12, 462–473. Lou, X. Y., Ma, J. Z., Payne, T. J., Beuten, J., Crew, K. M. et al (2006). Gene-based analysis suggests association of the nicotinic acetylcholine receptor beta1 subunit (CHRNB1) and M1 muscarinic acetylcholine receptor (CHRM1) with vulnerability for nicotine dependence. Hum Genet, 120, 381–389. Lou, X. Y., Ma, J. Z., Sun, D., Payne, T. J., and Li, M. D. (2007). Fine mapping of a linkage region on chromosome 17p13 reveals that GABARAP and DLG4 are associated with vulnerability to nicotine dependence in European-Americans. Hum Mol Genet, 16, 142–153. Loughead, J., Wileyto, E. P., Valdez, J. N., Sanborn, P., Tang, K. et al (2009). Effect of abstinence challenge on brain function and cognition in smokers differs by COMT genotype. Mol Psychiatry, 14, 820–826. Ma, J. Z., Beuten, J., Payne, T. J., Dupont, R. T., Elston, R. C. et al (2005). Haplotype analysis indicates an association between the DOPA decarboxylase (DDC) gene and nicotine dependence. Hum Mol Genet, 14, 1691–1698. Malaiyandi, V., Lerman, C., Benowitz, N. L., Jepson, C. 
Patterson, F. et al (2006). Impact of CYP2A6 genotype on pretreatment smoking behaviour and nicotine levels from and usage of nicotine replacement therapy. Mol Psychiatry, 11, 400–409. Malaiyandi, V., Sellers, E. M., and Tyndale, R. F. (2005). Implications of CYP2A6 genetic variation for smoking behaviors and nicotine dependence. Clin Pharmacol Ther, 77, 145–158. Mansvelder, H. D., De Rover, M., McGehee, D. S., and Brussaard, A. B. (2003). Cholinergic modulation of dopaminergic reward areas: upstream and downstream targets of nicotine addiction. Eur J Pharmacol, 480, 117–123. Mansvelder, H. D., Keath, J. R., and McGehee, D. S. (2002). Synaptic mechanisms underlie
496 nicotine-induced excitability of brain reward areas. Neuron, 33, 905–919. McClernon, F. J., Hutchison, K. E., Rose, J. E., and Kozink, R. V. (2007). DRD4 VNTR polymorphism is associated with transient fMRI-BOLD responses to smoking cues. Psychopharmacology (Berl), 194, 433–441. McKinney, E. F., Walton, R. T., Yudkin, P., Fuller, A., Haldar, N. A. et al (2000). Association between polymorphisms in dopamine metabolic enzymes and tobacco consumption in smokers. Pharmacogenetics, 10, 483–491. Mifsud, J. C., Hernandez, L., and Hoebel, B. G. (1989). Nicotine infused into the nucleus accumbens increases synaptic dopamine as measured by in vivo microdialysis. Brain Res, 478, 365–367. Munafo, M., Clark, T., Johnstone, E., Murphy, M., and Walton, R. (2004). The genetic basis for smoking behavior: a systematic review and meta-analysis. Nicotine Tob Res, 6, 583–597. Munafo, M. R., Elliot, K. M., Murphy, M. F., Walton, R. T., and Johnstone, E. C. (2007). Association of the mu-opioid receptor gene with smoking cessation. Pharmacogenomics J, 7, 353–361. Munafo, M. R., Johnstone, E. C., Guo, B., Murphy, M. F., and Aveyard, P. (2008). Association of COMT Val108/158Met genotype with smoking cessation. Pharmacogenet Genomics, 18, 121–128. Munafo, M. R., Johnstone, E. C., Murphy, M. F., and Aveyard, P. (2009a). Lack of association of DRD2 rs1800497 (Taq1A) polymorphism with smoking cessation in a nicotine replacement therapy randomized trial. Nicotine Tob Res. Munafo, M. R., Johnstone, E. C., Wileyto, E. P., Shields, P. G., Elliot, K. M. et al (2006a). Lack of association of 5-HTTLPR genotype with smoking cessation in a nicotine replacement therapy randomized trial. Cancer Epidemiol Biomarkers Prev, 15, 398–400. Munafo, M. R., Murphy, M. F., and Johnstone, E. C. (2006b). Smoking cessation, weight gain, and DRD4 -521 genotype. Am J Med Genet B Neuropsychiatr Genet, 141B, 398–402. Munafo, M. R., Timpson, N. J., David, S. P., Ebrahim, S., and Lawlor, D. A. (2009b). 
Association of the DRD2 gene Taq1A polymorphism and smoking behavior: a meta-analysis and new data. Nicotine Tob Res, 11, 64–76. Nestler, E. J. (2005). Is there a common molecular pathway for addiction? Nat Neurosci, 8, 1445–1449. Nisell, M., Nomikos, G. G., and Svensson, T. H. (1994). Systemic nicotine-induced dopamine release in the rat nucleus accumbens is regulated by nicotinic receptors in the ventral tegmental area. Synapse, 16, 36–44. Nussbaum, J., Xu, Q., Payne, T. J., Ma, J. Z., Huang, W. et al (2008). Significant association of the neurexin-1 gene (NRXN1) with nicotine dependence in European- and African-American smokers. Hum Mol Genet, 17, 1569–1577.
R. Ray et al. O’Gara, C., Stapleton, J., Sutherland, G., Guindalini, C., Neale, B. et al (2007). Dopamine transporter polymorphisms are associated with short-term response to smoking cessation treatment. Pharmacogenet Genomics, 17, 61–67. O’Loughlin, J., Paradis, G., Kim, W., DiFranza, J., Meshefedjian, G. et al (2004). Genetically decreased CYP2A6 and the risk of tobacco dependence: a prospective study of novice smokers. Tob Control, 13, 422–428. Olsson, C., Anney, R., Forrest, S., Patton, G., Coffey, C. et al (2004). Association between dependent smoking and a polymorphism in the tyrosine hydroxylase gene in a prospective population-based study of adolescent health. Behav Genet, 34, 85–91. Osler, M., Holst, C., Prescott, E., and Sorensen, T. I. (2001). Influence of genes and family environment on adult smoking behavior assessed in an adoption study. Genet Epidemiol, 21, 193–200. Patterson, F., Schnoll, R., Wileyto, E., Pinto, A., Epstein, L. et al (2008). Toward personalized therapy for smoking cessation: a randomized placebo-controlled trial of bupropion. Clin Pharmacol Ther, 84, 320–325. Perkins, K. A., Lerman, C., Coddington, S., Jetton, C., Karelitz, J. L. et al (2008a). Gene and gene by sex associations with initial sensitivity to nicotine in nonsmokers. Behav Pharmacol, 19, 630–640. Perkins, K. A., Lerman, C., Grottenthaler, A., Ciccocioppo, M. M., Milanak, M. et al (2008b). Dopamine and opioid gene variants are associated with increased smoking reward and reinforcement owing to negative mood. Behav Pharmacol, 19, 641–649. Pianezza, M. L., Sellers, E. M., and Tyndale, R. F. (1998). Nicotine metabolism defect reduces smoking. Nature, 393, 750. Radwan, G. N., El-Setouhy, M., Mohamed, M. K., Hamid, M. A., Azem, S. A. et al (2007). DRD2/ANKK1 TaqI polymorphism and smoking behavior of Egyptian male cigarette smokers. Nicotine Tob Res, 9, 1325–1329. Ramirez-Latorre, J., Yu, C. R., Qu, X., Perin, F., Karlin, A. et al (1996). 
Functional contributions of alpha5 subunit to neuronal acetylcholine receptor channels. Nature, 380, 347–351. Rao, Y., Hoffmann, E., Zia, M., Bodin, L., Zeman, M. et al (2000). Duplications and defects in the CYP2A6 gene: identification, genotyping, and in vivo effects on smoking. Mol Pharmacol, 58, 747–755. Rasmussen, H., Bagger, Y., Tanko, L. B., Christiansen, C., and Werge, T. (2008). Lack of association of the serotonin transporter gene promoter region polymorphism, 5-HTTLPR, including rs25531 with cigarette smoking and alcohol consumption. Am J Med Genet B Neuropsychiatr Genet. Ray, R., Jepson, C., Patterson, F., Strasser, A., Rukstalis, M. et al (2006). Association of OPRM1 A118G variant with the relative reinforcing value of nicotine. Psychopharmacology (Berl), 188, 355–363.
32
Nicotine Dependence and Pharmacogenetics
Ray, R., Jepson, C., Wileyto, E. P., Dahl, J. P., Patterson, F. et al (2007a). Genetic variation in mu-opioidreceptor-interacting proteins and smoking cessation in a nicotine replacement therapy trial. Nicotine Tob Res, 9, 1237–1241. Ray, R., Jepson, C., Wileyto, P., Patterson, F., Strasser, A. A. et al (2007b). CREB1 haplotypes and the relative reinforcing value of nicotine. Mol Psychiatry, 12, 615–617. Redden, D. T., Shields, P. G., Epstein, L., Wileyto, E. P., Zakharkin, S. O. et al (2005). Catechol-O-methyltransferase functional polymorphism and nicotine dependence: an evaluation of nonreplicated results. Cancer Epidemiol Biomarkers Prev, 14, 1384–1389. Reuter, M., Hennig, J., Amelang, M., Montag, C., Korkut, T. et al (2007). The role of the TPH1 and TPH2 genes for nicotine dependence: a genetic association study in two different age cohorts. Neuropsychobiology, 56, 47–54. Saccone, N. L., Saccone, S. F., Hinrichs, A. L., Stitzel, J. A., Duan, W. et al (2009). Multiple distinct risk loci for nicotine dependence identified by dense coverage of the complete family of nicotinic receptor subunit (CHRN) genes. Am J Med Genet B Neuropsychiatry Genet, 150B, 453–466. Saccone, S. F., Hinrichs, A. L., Saccone, N. L., Chase, G. A., Konvicka, K. et al (2007). Cholinergic nicotinic receptor genes implicated in a nicotine dependence association study targeting 348 candidate genes with 3713 SNPs. Hum Mol Genet, 16, 36–49. Schinka, J. A., Town, T., Abdullah, L., Crawford, F. C., Ordorica, P. I. et al (2002). A functional polymorphism within the mu-opioid receptor gene and risk for abuse of alcohol and other substances. Mol Psychiatry, 7, 224–228. Schlaepfer, I. R., Hoft, N. R., Collins, A. C., Corley, R. P., Hewitt, J. K. et al (2008). The CHRNA5/A3/B4 gene cluster variability as an important determinant of early alcohol and tobacco initiation in young adults. Biol Psychiatry, 63, 1039–1046. Schnoll, R. A., and Lerman, C. (2006). 
Current and emerging pharmacotherapies for treating tobacco dependence. Expert Opin Emerg Drugs, 11, 429–444. Schnoll, R. A., Patterson, F., Wileyto, E. P., Tyndale, R. F., Benowitz, N. et al (2009). Nicotine metabolic rate predicts successful smoking cessation with transdermal nicotine: a validation study. Pharmacol Biochem Behav, 92, 6–11. Schoedel, K. A., Hoffmann, E. B., Rao, Y., Sellers, E. M., and Tyndale, R. F. (2004). Ethnic variation in CYP2A6 and association of genetically slow nicotine metabolism and smoking in adult Caucasians. Pharmacogenetics, 14, 615–626. Sellers, E. M., Kaplan, H. L., and Tyndale, R. F. (2000). Inhibition of cytochrome P450 2A6 increases nicotine’s oral bioavailability and decreases smoking. Clin Pharmacol Ther, 68, 35–43. Sellers, E. M., Tyndale, R. F., and Fernandes, L. C. (2003). Decreasing smoking behaviour and risk
497 through CYP2A6 inhibition. Drug Discov Today, 8, 487–493. Sesack, S. R., and Pickel, V. M. (1992). Prefrontal cortical efferents in the rat synapse on unlabeled neuronal targets of catecholamine terminals in the nucleus accumbens septi and on dopamine neurons in the ventral tegmental area. J Comp Neurol, 320, 145–160. Sherva, R., Wilhelmsen, K., Pomerleau, C. S., Chasse, S. A., Rice, J. P. et al (2008). Association of a single nucleotide polymorphism in neuronal acetylcholine receptor subunit alpha 5 (CHRNA5) with smoking status and with ’pleasurable buzz’ during early experimentation with smoking. Addiction, 103, 1544–1552 Shields, P. G., Lerman, C., Audrain, J., Bowman, E. D., Main, D. et al (1998). Dopamine D4 receptors and the risk of cigarette smoking in African-Americans and Caucasians. Cancer Epidemiol Biomarkers Prev, 7, 453–458. Skowronek, M. H., Laucht, M., Hohm, E., Becker, K., and Schmidt, M. H. (2006). Interaction between the dopamine D4 receptor and the serotonin transporter promoter polymorphisms in alcohol and tobacco use among 15-year-olds. Neurogenetics, 7, 239–246. Spitz, M. R., Amos, C. I., Dong, Q., Lin, J., and Wu, X. (2008). The CHRNA5-A3 region on chromosome 15q24-25.1 is a risk factor both for nicotine dependence and for lung cancer. J Natl Cancer Inst, 100, 1552–1556. Stapleton, J. A., Sutherland, G., and O’Gara, C. (2007). Association between dopamine transporter genotypes and smoking cessation: a meta-analysis. Addict Biol, 12, 221–226. Stevens, V. L., Bierut, L. J., Talbot, J. T., Wang, J. C., Sun, J. et al (2008). Nicotinic receptor gene variants influence susceptibility to heavy smoking. Cancer Epidemiol Biomarkers Prev, 17, 3517–3525. Strasser, A. A., Malaiyandi, V., Hoffmann, E., Tyndale, R. F., and Lerman, C. (2007). An association of CYP2A6 genotype and smoking topography. Nicotine Tob Res, 9, 511–518. Sullivan, P. F., and Kendler, K. S. (1999). The genetic epidemiology of smoking. 
Nicotine Tob Res, 1(Suppl 2), S51-S57; discussion S69-S70. Sun, D., Ma, J. Z., Payne, T. J., and Li MD. (2008). Beta-arrestins 1 and 2 are associated with nicotine dependence in European American smokers. Mol Psychiatry, 13, 398–406. Swan, G. E., Jack, L. M., Valdes, A. M., Ring, H. Z., Ton, C. C. et al (2007). Joint effect of dopaminergic genes on likelihood of smoking following treatment with bupropion SR. Health Psychol, 26, 361–368. Swan, G. E., Valdes, A. M., Ring, H. Z., Khroyan, T. V., Jack, L. M. et al (2005). Dopamine receptor DRD2 genotype and smoking cessation outcome following treatment with bupropion SR. Pharmacogenomics J, 5, 21–29. Thorgeirsson, T. E., Geller, F., Sulem, P., Rafnar, T., Wiste, A. et al (2008). A variant associated with
498 nicotine dependence, lung cancer and peripheral arterial disease. Nature, 452, 638–642. Timberlake, D. S., Haberstick, B. C., Lessem, J. M., Smolen, A., Ehringer, M. et al (2006). An association between the DAT1 polymorphism and smoking behavior in young adults from the National Longitudinal Study of Adolescent Health. Health Psychol, 25, 190–197. True, W. R., Xian, H., Scherrer, J. F., Madden, P. A., Bucholz, K. K. et al (1999). Common genetic vulnerability for nicotine and alcohol dependence in men. Arch Gen Psychiatry, 56, 655–661. Trummer, O., Koppel, H., Wascher, T. C., Grunbacher, G., Gutjahr, M. et al (2006). The serotonin transporter gene polymorphism is not associated with smoking behavior. Pharmacogenomics J, 6, 397–400. Uhl, G. R., Liu, Q. R., Drgon, T., Johnson, C., Walther, D. et al (2008). Molecular genetics of successful smoking cessation: convergent genome-wide association study results. Arch Gen Psychiatry, 65, 683–693. Vandenbergh, D. J., Bennett, C. J., Grant, M. D., Strasser, A. A., O’Connor, R. et al (2002). Smoking status and the human dopamine transporter variable number of tandem repeats (VNTR) polymorphism: failure to replicate and finding that never-smokers may be different. Nicotine Tob Res, 4, 333–340. Vandenbergh, D. J., O’Connor, R. J., Grant, M. D., Jefferson, A. L., Vogler, G. P. et al (2007). Dopamine receptor genes (DRD2, DRD3 and DRD4) and genegene interactions associated with smoking-related behaviors. Addict Biol, 12, 106–116. Vink, J. M., Willemsen, G., and Boomsma, D. I. (2005). Heritability of smoking initiation and nicotine dependence. Behav Genet, 35, 397–406. Walaas, I., and Fonnum, F. (1980). Biochemical evidence for gamma-aminobutyrate containing fibres from the nucleus accumbens to the substantia nigra and ventral tegmental area in the rat. Neuroscience, 5, 63–72. Wang, J. C., Grucza, R., Cruchaga, C., Hinrichs, A. L., Bertelsen, S. et al (2008a). 
Genetic variation in the CHRNA5 gene affects mRNA levels and is associated with risk for alcohol dependence. Mol Psychiatry, 14, 501–510.
R. Ray et al. Wang, Z., Ray, R., Faith, M., Tang, K., Wileyto, E. P. et al (2008b). Nicotine abstinence-induced cerebral blood flow changes by genotype. Neurosci Lett, 438, 275–280. Weiss, R. B., Baker, T. B., Cannon, D. S., von Niederhausern, A., Dunn, D. M. et al (2008). A candidate gene approach identifies the CHRNA5-A3-B4 region as a risk factor for age-dependent nicotine addiction. PLoS Genet, 4, e1000125. WHO. (2008). WHO Report on the Global Tobacco Epidemic, 2008: The MPOWER package. Geneva: World Health Organization. Wiesbeck, G. A., Wodarz, N., Weijers, H. G., DurstelerMacFarland, K. M., Wurst, F. M. et al (2006). A functional polymorphism in the promoter region of the monoamine oxidase A gene is associated with the cigarette smoking quantity in alcohol-dependent heavy smokers. Neuropsychobiology, 53, 181–185. Xian, H., Scherrer, J. F., Madden, P. A., Lyons, M. J., Tsuang, M. et al (2003). The heritability of failed smoking cessation and nicotine withdrawal in twins who smoked and attempted to quit. Nicotine Tob Res, 5, 245–254. Yu, Y., Panhuysen, C., Kranzler, H. R., Hesselbrock, V., Rounsaville, B. et al (2006). Intronic variants in the dopa decarboxylase (DDC) gene are associated with smoking behavior in European-Americans and African-Americans. Hum Mol Genet, 15, 2192–2199. Yudkin, P., Munafo, M., Hey, K., Roberts, S., Welch, S. et al (2004). Effectiveness of nicotine patches in relation to genotype in women versus men: randomised controlled trial. BMJ, 328, 989–990. Zeiger, J. S., Haberstick, B. C., Schlaepfer, I., Collins, A. C., Corley, R. P. et al (2008). The neuronal nicotinic receptor subunit genes (CHRNA6 and CHRNB3) are associated with subjective responses to tobacco. Hum Mol Genet, 17, 724–734. Zhang, H., Ye, Y., Wang, X., Gelernter, J., Ma, J. Z. et al (2006a). DOPA decarboxylase gene is associated with nicotine dependence. Pharmacogenomics, 7, 1159–1166. Zhang, L., Kendler, K. S., and Chen, X. (2006b). 
The mu-opioid receptor gene and smoking initiation and nicotine dependence. Behav Brain Funct, 2, 28.
Chapter 33
Genetics of Obesity and Diabetes
Karani S. Vimaleswaran and Ruth J.F. Loos
Abbreviations

ADAMTS9: ADAM metallopeptidase with thrombospondin type I motif 9
ADIPOQ: Adiponectin, C1Q and collagen domain containing
ADRB2: β-adrenergic receptor 2
ADRB3: β-adrenergic receptor 3
BAT2: HLA-B-associated transcript 2
BCDIN3D: BCDIN3 domain containing
BDNF: Brain-derived neurotrophic factor
CAMK1D: Calcium/calmodulin-dependent protein kinase 1D
CDKAL1: CDK5 regulatory subunit associated protein 1-like 1
CDKN2A: Cyclin-dependent kinase inhibitor 2A
CNR1: Endocannabinoid receptor 1
CTNNBL1: Catenin (cadherin-associated protein), β-like 1
DGI: Diabetes genetics initiative
DGKG: Diacylglycerol kinase
DIAGRAM: Diabetes genetics replication and meta-analysis
ENPP1: Ectonucleotide pyrophosphatase/phosphodiesterase 1
ETV5: Ets variant gene 5
FAIM2: Fas apoptotic inhibitory molecule 2
FTO: Fat mass and obesity associated
FUSION: Finland-United States Investigation of NIDDM Genetics
GIANT: Genomic investigation of anthropometric traits
GNB3: Guanine nucleotide binding protein (G protein), beta polypeptide 3
GNPDA2: Glucosamine-6-phosphate deaminase 2
HHEX: Hematopoietically expressed homeobox
HTR2C: 5-Hydroxytryptamine (serotonin) receptor 2C
IDE: Insulin-degrading enzyme
IL6: Interleukin 6
JAZF1: Juxtaposed with another zinc finger gene 1
KCTD15: Potassium channel tetramerisation domain containing 15
KIF11: Kinesin family member 11
LEP: Leptin
LEPR: Leptin receptor
LGR5: Leucine-rich repeat-containing G-protein coupled receptor 5
MC4R: Melanocortin 4 receptor
MTCH2: Mitochondrial carrier homolog 2
MTNR1B: Melatonin receptor 1B
NEGR1: Neuronal growth regulator 1
NOTCH2: Notch homologue 2, Drosophila
NPC1: Niemann-Pick disease, type C1
NR3C1: Nuclear receptor subfamily 3, group C, member 1
PFKP: Phosphofructokinase
PPARG: Peroxisome proliferator-activated receptor gamma
PTER: Phosphotriesterase related
QTL: Quantitative trait loci
ROC: Receiver operating characteristics
SEC16B: SEC16 homolog B
SH2B1: SH2B adaptor protein 1
SLC30A8: Solute carrier family 30 (zinc transporter), member 8
SNP: Single nucleotide polymorphism
THADA: Thyroid adenoma associated
TMEM18: Transmembrane protein 18
TCF7L2: Transcription factor 7-like 2
TSPAN8: Tetraspanin 8
UCP: Uncoupling protein
WFS1: Wolfram syndrome 1
WTCCC: Wellcome trust case control consortium

K.S. Vimaleswaran, Medical Research Council (MRC) Epidemiology Unit, Institute of Metabolic Science, Addenbrooke’s Hospital – Box 285, Hills Road, Cambridge, CB2 0QQ, UK e-mail: [email protected]

A. Steptoe (ed.), Handbook of Behavioral Medicine, DOI 10.1007/978-0-387-09488-5_33, © Springer Science+Business Media, LLC 2010
1 Introduction

The continuing rise in the prevalence of obesity and diabetes is an increasingly important clinical and public health challenge throughout the world (see Chapter 46). Obesity has reached epidemic proportions and is the major cause of the vast increase in the prevalence of type 2 diabetes. By current estimates, nearly 70% of adults in the USA and more than 60% in the UK are overweight; half of these are obese (International Association for the Study of Obesity). Changes in diet and physical activity habits are likely the main drivers of the rise in obesity and diabetes prevalence during the last three decades (Hill et al, 2003). However, the contribution of hereditary influences cannot be ignored, especially at a time when we are beginning to understand the molecular pathways involved in the control of energy homeostasis and how variation in genes encoding proteins in these pathways can influence common obesity and type 2 diabetes.

The genetic contribution to obesity and diabetes has been established through family, twin and adoption studies (Maes et al, 1997; Stunkard et al, 1986; Permutt et al, 2005). Results from twin studies have suggested that genetic factors explain 40–80% of the variance in body mass index (BMI) and in risk of obesity (Allison et al, 1996; Herskind et al, 1996), while family studies have typically reported lower heritabilities of 20–50% (Luke et al, 2001; Rice et al, 1999). Data from adoption studies confirm the importance of a genetic contribution (20–60%) to obesity, as evidenced by stronger correlations in BMI between adoptees and biological parents than between adoptees and adoptive parents (Stunkard et al, 1986). The considerable range in heritability estimates is likely due not only to differences in study design but also to sample size, characteristics of the population (such as age) and the environment the population lives in, such as dietary and physical activity habits (Maes et al, 1997).

There is also ample evidence that diabetes has a substantial genetic component. The concordance of type 2 diabetes in monozygotic twins ranges between 50 and 70%, compared to 20–37% in dizygotic twins (Kaprio et al, 1992; Newman et al, 1987; Poulsen et al, 1999). Further evidence comes from studies that compare the risk in offspring with a family history of type 2 diabetes with that in offspring without such a family history. While the lifetime risk of developing type 2 diabetes is 7% in the general population, the risk is four- to sixfold higher (30–40%) in offspring with one parent with type 2 diabetes and almost 10-fold higher (70%) when both parents had diabetes (Köbberling and Tillil, 1982).

Despite serious efforts over the past two decades to identify genetic variants that contribute to the predisposition to obesity and type 2 diabetes using traditional genetic epidemiological approaches, such as the candidate gene approach and linkage studies, progress until recently has been slow and success limited. The availability of genome-wide association studies, made possible by the International HapMap Project, the Human Genome Project and progress in high-throughput genotyping, has accelerated the potential to uncover genetic variants influencing common traits and diseases (Manolio et al, 2008).

In this chapter, we review the main findings of candidate gene studies, genome-wide linkage and genome-wide association studies for common obesity and type 2 diabetes. We then discuss how lifestyle factors such as diet and physical activity can influence the genetic susceptibility to obesity and diabetes. Finally, we discuss the impact of validated obesity and diabetes loci on public health and conclude by speculating about the discoveries that the future might bring.
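The familial risk figures quoted in the Introduction can be checked with simple arithmetic. The following minimal Python sketch divides the lifetime risks reported for offspring of affected parents by the 7% general-population risk; the function name `relative_risk` is an illustrative choice and is not taken from the cited studies.

```python
# Lifetime risks of type 2 diabetes as quoted in the text
# (Köbberling and Tillil, 1982).
GENERAL_POPULATION_RISK = 0.07


def relative_risk(risk_in_group: float, baseline_risk: float) -> float:
    """Return the ratio of the risk in a group to the baseline risk."""
    return risk_in_group / baseline_risk


# One affected parent: lifetime risk of 30-40%.
one_parent_low = relative_risk(0.30, GENERAL_POPULATION_RISK)
one_parent_high = relative_risk(0.40, GENERAL_POPULATION_RISK)
# Both parents affected: lifetime risk of about 70%.
both_parents = relative_risk(0.70, GENERAL_POPULATION_RISK)

print(f"One parent affected:   {one_parent_low:.1f}- to {one_parent_high:.1f}-fold")
print(f"Both parents affected: {both_parents:.1f}-fold")
```

The ratios fall within the "four- to sixfold" and "almost 10-fold" ranges stated above, which suggests that the fold increases are expressed relative to the general-population risk.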
2 Obesity

2.1 Candidate Gene Studies

The number of candidate gene studies (Box 33.1) for common obesity has grown steadily over the past 15 years. The latest update of the Human Obesity Gene Map, which covers the literature available at the end of October 2005, reports 127 candidate genes associated with obesity-related traits (Rankinen et al, 2006). Among those, findings for 12 genes (ADIPOQ, ADRB2, ADRB3, GNB3, HTR2C, NR3C1, LEP, LEPR, PPARG, UCP1, UCP2 and UCP3) were replicated in 10 or more studies. Despite this number of replications, many other studies have shown no association or even an opposite association, and thus the overall evidence for most of these genes remains inconclusive (Rankinen et al, 2006).

Box 33.1 – Candidate Gene Approach

• The candidate gene approach, which has been used since the early 1990s, is a hypothesis-driven approach that relies on current understanding of the biology and pathophysiology of the disease. Genes that are thought to be involved in the pathogenesis of the disease based on animal models, cellular systems or extreme/monogenic cases are considered candidates. The candidacy of a gene is based on the following sources:
(i) Linkage and positional cloning studies using extreme cases (e.g. morbidly obese/severe insulin resistance) and their families have provided evidence that several genes are implicated in monogenic forms of a disease.
(ii) Animal models using gene knockout and transgenic approaches have identified the functional aspects of genes in relation to disease.
(iii) Cellular model systems are used to identify biological networks and provide insight into molecular and regulatory aspects of genes responsible for phenotypes of interest, such as obesity and diabetes.
• Genetic variations in these candidate genes are then studied for association with the disease (obesity/diabetes) in the general population. For detecting the expected small effects of genetic variants involved in common traits and diseases, candidate gene studies need to be large scale (such as meta-analysis) and well powered.

The major problem that has plagued the candidate gene approach is that many studies are small and thus often underpowered (see Chapter 29). Obesity is a heterogeneous condition and it is expected that many common genetic variants contribute to BMI and obesity, each conferring only modest risk. Thus, large sample sizes are required to identify such variants. This can be achieved by combining previously published studies or by conducting large-scale studies.
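To illustrate why modest effects demand large samples, the following Python sketch estimates the number of cases (with an equal number of controls) needed to detect a difference in carrier frequency between cases and controls. It uses the standard normal-approximation formula for comparing two proportions. The carrier frequency of 20%, the odds ratios, the 5% significance level and the 80% power are illustrative assumptions, not values from the studies discussed here, and the function names are hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist


def carrier_freq_in_cases(freq_controls: float, odds_ratio: float) -> float:
    """Carrier frequency in cases implied by the control frequency and the OR."""
    odds_cases = odds_ratio * freq_controls / (1 - freq_controls)
    return odds_cases / (1 + odds_cases)


def cases_needed(freq_controls: float, odds_ratio: float,
                 alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate cases required (1:1 case-control design, two-sided test)."""
    p0 = freq_controls
    p1 = carrier_freq_in_cases(p0, odds_ratio)
    p_bar = (p0 + p1) / 2
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    numerator = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
                 + z_power * sqrt(p0 * (1 - p0) + p1 * (1 - p1))) ** 2
    return ceil(numerator / (p1 - p0) ** 2)


# Assumed carrier frequency of 20% in controls (illustrative only).
for odds_ratio in (2.0, 1.5, 1.2, 1.1):
    print(f"OR {odds_ratio}: about {cases_needed(0.20, odds_ratio)} cases")
```

Under these assumptions, the required sample size grows steeply as the odds ratio approaches 1, which is consistent with the argument that small individual studies are poorly suited to detecting variants of modest effect.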
Large-scale analyses were instrumental in more firmly establishing the role of genetic variation in the MC4R (melanocortin 4 receptor), ADRB3 (beta 3 adrenergic receptor), PCSK1 (prohormone convertase 1/3), BDNF (Brainderived neurotrophic factor) and CNR1 (endocannabinoid receptor 1) genes in common obesity. MC4R encodes a seven-transmembrane, G protein-linked receptor that is widely expressed in the central nervous system and plays a key role in the regulation of food intake and energy homeostasis (Huszar et al, 1997). Mutations in MC4R are the most common monogenic cause of obesity, with approximately 5% of severely obese children carrying pathogenic mutations in the MC4R gene (Farooqi et al, 2003). Its role in common forms of obesity remained unexplained until recently. The two most frequently studied and most common non-synonymous MC4R variants are the V103I and I251L, which have been shown to have potential functional implications (Xiang et al, 2006). Both variants have been studied frequently for association with BMI and obesity. So far, only one sizeable populationbased study (7937 individuals) on the V103I variant has reported a significantly reduced risk of obesity in 103I allele carriers [odds ratio (OR): 0.69, 95% confidence interval (CI) 0.50– 0.96; p = 0.03] (Heid et al, 2005), whereas other smaller studies found no evidence of such association. However, a meta-analysis including 29,563 individuals from 25 populations (Young et al, 2007) confirmed the protective effect of the 103I allele (OR: 0.82, 95% CI 0.70–0.96; p = 0.015) on obesity risk, which was further established with the latest meta-analyses (Stutzmann et al, 2007), including a total of 39,879 individuals (OR: 0.80, 95% CI: 0.70–0.92; p = 0.002). In addition, strong evidence for a protective effect of the I251L MC4R variant on BMI and risk for adult and childhood obesity was obtained in eight out of nine populations examined (Stutzmann et al, 2007). 
The meta-analysis of all case–control studies found a nearly 50% reduced risk for obesity among carriers of the 251L allele (OR: 0.52, 95% CI 0.38–0.71; p = 3.6 × 10⁻⁵).
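The odds ratios, confidence intervals and p-values quoted for the MC4R variants can be checked against one another. The sketch below back-calculates a two-sided p-value from an odds ratio and its 95% CI, assuming the interval is symmetric on the log scale and the test statistic is approximately normal. The function name `p_from_or_ci` is illustrative, and the published analyses may have used other methods, so only approximate agreement should be expected.

```python
import math


def p_from_or_ci(odds_ratio, lower, upper, z_crit=1.959964):
    """Recover the standard error, z-score and two-sided p-value from an OR and 95% CI."""
    # The CI spans 2 * z_crit standard errors on the log-odds scale.
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = math.log(odds_ratio) / se
    # Two-sided tail probability of a standard normal variable.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return se, z, p_value


# Values as quoted in the text: (OR, lower CI bound, upper CI bound, reported p).
reported = {
    "V103I, Heid et al 2005": (0.69, 0.50, 0.96, 0.03),
    "V103I, Young et al 2007": (0.82, 0.70, 0.96, 0.015),
    "V103I, Stutzmann et al 2007": (0.80, 0.70, 0.92, 0.002),
    "I251L, Stutzmann et al 2007": (0.52, 0.38, 0.71, 3.6e-5),
}

for label, (odds_ratio, lower, upper, p_reported) in reported.items():
    se, z, p_value = p_from_or_ci(odds_ratio, lower, upper)
    print(f"{label}: z = {z:.2f}, back-calculated p = {p_value:.2g}, reported p = {p_reported:.2g}")
```

The same calculation shows why the larger meta-analyses give smaller p-values for similar odds ratios: the confidence intervals, and hence the standard errors, narrow as the sample size grows.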
K.S. Vimaleswaran and R.J.F. Loos
The Trp64Arg ADRB3 variant is one of the first genetic variants for which association with obesity was reported (Clement et al, 1995; Widen et al, 1995; Walston et al, 1995). ADRB3 is an obvious candidate gene given its involvement in the regulation of lipolysis and thermogenesis. A meta-analysis of 97 studies (n = 44,833) examining the association between the ADRB3 Trp64Arg variant and BMI showed a significant association in East Asians, with Arg64 allele carriers having a 0.31 kg/m² (p = 0.001) higher BMI than non-carriers, but not in populations of white European origin (0.08 kg/m², p = 0.36) (Kurokawa et al, 2008). In vitro experiments in rodent and human cell lines showed that cell lines carrying the Arg64 variant had a reduced ability to stimulate adenylyl cyclase activity compared with cell lines carrying the Trp64 variant (Pietri-Rouxel et al, 1997). Also, lipolysis in human adipocytes was lower in cells with the Arg64 variant than in cells with the Trp64 variant (Umekawa et al, 1999).
The PCSK1 gene is another candidate for obesity as it encodes the prohormone convertase 1/3 enzyme that converts prohormones into functional hormones involved in energy metabolism regulation. Rare mutations in the PCSK1 gene have been found to cause monogenic obesity (Jackson et al, 1997). In a comprehensive large-scale study, the role of common variants in the PCSK1 gene was studied in relation to the risk of obesity (Benzinou et al, 2008b). Two non-synonymous variants, N221D (located in the catalytic domain of prohormone convertase 1/3) and the Q665E-S690T pair, were consistently associated with obesity in adults and children. Each additional minor allele (frequency: 4–7%) of the N221D variant increased the risk of obesity 1.34-fold, while each additional minor allele (frequency: 25–30%) of the Q665E-S690T pair increased the risk 1.22-fold.
Functional characterization of these variants showed a significant impairment of the catalytic activity of the N221D mutant PC1/3 protein, but no significant functional role for the Q665E-S690T amino acid substitutions (Benzinou et al, 2008).
33
Genetics of Obesity and Diabetes
Rodent studies have shown that BDNF is involved in eating behaviour, body weight regulation and hyperactivity (Rios et al, 2001). A rare mutation in BDNF likely causes severe obesity and hyperphagia (Gray et al, 2006). A large-scale study, including 10,109 women, found that individuals homozygous for the Met allele (frequency: 4.5%) of the Val66Met variant have a significantly lower BMI (−0.76 kg/m²) than Val66 allele carriers (Shugart et al, 2009). Genome-wide association studies have further confirmed the association of BDNF variants with BMI (Thorleifsson et al, 2009).
Because of its physiological role in the regulation of energy metabolism and food intake, CNR1 has been considered a biological candidate for human obesity. A study of 5750 individuals showed that CNR1 variations increase the risk of obesity and modulate BMI in European children (OR: 1.52, p = 3 × 10⁻⁵) and adults (OR: 1.85, p = 1.1 × 10⁻⁶) (Benzinou et al, 2008).
Large-scale studies have also been sufficiently powered to show convincingly that an association is absent. In this regard, the role of ENPP1 (ectoenzyme nucleotide pyrophosphate phosphodiesterase) in the development of obesity was shown to be likely limited. Four studies, each with more than 5000 participants and with a combined sample size of 27,781 individuals, found no association between the Lys121Gln variant and obesity-related traits (Meyre et al, 2007; Grarup et al, 2006; Lyon et al, 2006; Weedon et al, 2006). Also, the association between the –174G→C IL6 (interleukin-6) variant and obesity was challenged by a large meta-analysis (Qi et al, 2007) combining data from 26,944 individuals from 25 populations, which showed no association between this variant and obesity risk. In a large meta-analysis (Jalba et al, 2008), the Glu27Gln and the Arg16Gly polymorphisms of the beta 2-adrenergic receptor (ADRB2) gene were examined for their association with obesity in 10,404 and 4328 individuals, respectively.
The presence of the Glu27 allele in the ADRB2 gene was found to be a significant risk factor for obesity in Asians, Pacific Islanders and American Indians, but not
in Europeans. However, the Arg16 allele was not associated with obesity. Although it might be premature to fully discount the involvement of ENPP1, IL6 and ADRB2 in obesity development, it would take more large-scale studies and meta-analyses to reverse current observations. For other genes tested in large-scale studies or in meta-analyses and for most of the candidate genes reported in the Human Obesity Gene Map (Rankinen et al, 2006), further studies will be required to prove or refute their role in obesity susceptibility. In summary, by means of large-scale studies and meta-analyses, at least five variants in four candidate genes have been found to be robustly associated with obesity-related traits (Loos, 2009). The candidate gene approach will continue to contribute to our understanding of obesity susceptibility, as it is useful for determining the association of a genetic variant with obesity and for identifying genes of modest effect.
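The meta-analyses referred to throughout this section typically combine per-study estimates by weighting each one by its precision. The following is a minimal sketch of a fixed-effect, inverse-variance meta-analysis on the log odds ratio scale. The three study results are hypothetical and serve only to show the mechanics; the published meta-analyses may have used random-effects or other models.

```python
import math


def fixed_effect_meta(estimates):
    """Pool (log odds ratio, standard error) pairs with inverse-variance weights."""
    weights = [1.0 / se ** 2 for _, se in estimates]
    pooled = sum(w * beta for w, (beta, _) in zip(weights, estimates)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    z = pooled / pooled_se
    # Two-sided p-value under a normal approximation.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return pooled, pooled_se, p_value


# Hypothetical per-study results: (log odds ratio, standard error).
studies = [
    (math.log(0.75), 0.20),
    (math.log(0.90), 0.15),
    (math.log(0.80), 0.10),
]

pooled, pooled_se, p_value = fixed_effect_meta(studies)
print(f"pooled OR = {math.exp(pooled):.2f}, SE = {pooled_se:.3f}, p = {p_value:.3g}")
```

Because the pooled standard error is smaller than that of any single study, a modest effect that is non-significant in each individual study can become clearly detectable once the studies are combined.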
2.2 Genome-Wide Studies
2.2.1 Genome-Wide Linkage Studies
Although genome-wide linkage studies (Box 33.2) have proven to be successful for monogenic disorders with large genetic effects (Dean, 2003), their success in common diseases and continuous traits such as obesity and BMI has been limited (Saunders et al, 2007). Results of the first genome-wide linkage scan on body fat percentage were published in 1997 (Norman et al, 1997) and, similar to the candidate gene approach, the number of studies and quantitative trait loci (QTLs) has grown exponentially over the past 10 years. The latest Human Obesity Gene Map (Rankinen et al, 2006) reported more than 250 genetic regions, distributed across all chromosomes (except the Y chromosome), from more than 60 genome-wide linkage scans, of which 15 loci have been replicated in at least three studies. Thus far, however, none of these loci could be narrowed down sufficiently
to pinpoint the genes or variants that underlie the linkage to the obesity-related traits. This is likely due to a lack of power and resolution to identify the genetic variants of small effect that we expect for common obesity. A meta-analysis of 37 genome-wide linkage studies containing data on over 31,000 individuals from more than 10,000 families showed only nominal evidence for linkage at chromosomes 13q13.2-q33.1, 12q23-q24.3, 11q13.3-22.3 and 16q12.2, the last of which harbours the FTO (fat mass and obesity-associated) locus (Saunders et al, 2007).
Box 33.2 – The Genome-Wide Approaches
• Genome-wide approaches are hypothesis generating and aim to identify new, unanticipated genetic variants associated with traits or diseases through screening of the whole genome.
Genome-Wide Linkage Approach
• The genome-wide linkage approach, used since the mid-1990s, tests whether certain chromosomal regions across the genome co-segregate with a trait or disease of interest from one generation to the next.
• The approach requires populations of related individuals, such as siblings, nuclear families or extended pedigrees, hence limiting the likelihood of achieving large sample sizes.
• Genome-wide linkage can only identify broad chromosomal regions that harbour hundreds of genes and it is often impossible to pinpoint which variant is causing the linkage with the disease.
Genome-Wide Association Approach
• The genome-wide association approach, which was first published in 2005, also examines the entire genome with no prior assumptions and aims to identify previously unsuspected genetic loci associated with a disease or trait of interest. It does not rely on familial relatedness and can therefore achieve larger sample sizes than typical family-based linkage studies. This approach screens the whole genome at higher resolution levels than genome-wide linkage studies and, thus, is able to narrow down the associated locus more accurately.
• Two major advances have set the stage for genome-wide association studies. First are the recent advancements in the International HapMap Project (International HapMap Consortium et al, 2007) and the completion of the human genome project, and second is the substantial progress in high-throughput genotyping, which has made it possible to genotype more than 1 million genetic variants in a single analysis. Together, these breakthroughs have enabled production of single nucleotide polymorphism (SNP) chips that can capture more than 80% of the common genetic variation reported in the HapMap (Magi et al, 2007).
• Study design: Genome-wide association studies typically comprise two stages: a discovery stage, followed by at least one replication stage. The discovery stage involves high-density genotyping of hundreds of thousands of genetic variants across the genome. Each variant is tested for association with a trait or disease of interest. Studies with large sample sizes at this stage tend to be more successful, in particular for common traits such as obesity and diabetes, as they are better powered to identify associations of small effect size. Associations that meet the genome-wide significance threshold (p < 5.0 × 10⁻⁸) are taken forward for replication to validate the initial observation. Only variants or loci for which the association observed at the discovery stage is confirmed at the replication stage are considered “true hits”.
• Genome-wide linkage has now largely been replaced by genome-wide association as the hypothesis-generating approach, at least for common diseases and traits. This is because the latter has not only become more affordable to the general scientific community but also has much greater resolution and does not require recruitment of related individuals, which is often a tedious task that limits sample size and thus power.
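The genome-wide significance threshold quoted in Box 33.2 is commonly motivated as a Bonferroni correction of a 5% family-wise error rate for roughly one million independent common-variant tests. That rationale is an assumption added here for illustration and is not stated in the box itself. The sketch below shows the arithmetic and applies the threshold to a few hypothetical discovery-stage p-values.

```python
def bonferroni_threshold(family_alpha, n_tests):
    """Per-test significance level that keeps the family-wise error at family_alpha."""
    return family_alpha / n_tests


# Assumption: about one million independent tests across the genome.
GENOME_WIDE = bonferroni_threshold(0.05, 1_000_000)

# Hypothetical discovery-stage p-values, for illustration only.
discovery = {"variant_a": 3.0e-12, "variant_b": 4.0e-7, "variant_c": 0.03}

# Only associations below the threshold would be taken forward for replication.
taken_forward = [name for name, p in discovery.items() if p < GENOME_WIDE]
print(f"threshold = {GENOME_WIDE:.1e}, taken forward: {taken_forward}")
```

In this toy example a p-value of 4 × 10⁻⁷, which would be striking in a candidate gene study, does not meet the genome-wide threshold, which is one reason why discovery stages need very large samples.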
2.2.2 Genome-Wide Association Studies
Genome-wide association is the latest gene-finding tool in genetic epidemiology (Box 33.2) and has already resulted in an unprecedented chain of discoveries in the genomics of complex diseases. Since the introduction of the genome-wide association approach, three waves of discoveries have emerged from large-scale high-density genome-wide association studies for obesity-related traits.
The first wave, in 2007, comprised two high-density genome-wide association studies that each confirmed FTO as the first gene incontrovertibly associated with common obesity and related traits. Interestingly, the first study (Frayling et al, 2007) identified FTO through a genome-wide association study for type 2 diabetes in which a cluster of common variants in the
first intron of the FTO gene showed a highly significant association with type 2 diabetes mediated through BMI. Subsequently, the association with BMI and obesity was unequivocally replicated in 13 cohorts comprising more than 38,000 individuals. The second study (Scuteri et al, 2007) was the first large-scale high-density genome-wide association study of BMI, conducted in more than 4000 Sardinians. In the initial analyses, variants in the FTO and PFKP (platelet-type phosphofructokinase) genes showed the strongest association, but only those in FTO were significantly replicated in European Americans and Hispanic Americans. Each risk allele increased BMI by 0.10–0.13 standard deviations (equivalent to about 0.40–0.66 kg/m²) and the risks for overweight and obesity by 1.18-fold and 1.32-fold, respectively. Taken together, homozygotes for the risk allele weighed about 3 kg more and had a 1.67-fold increased risk for obesity compared with those who did not inherit a risk allele (Frayling et al, 2007; Scuteri et al, 2007). The frequency of the FTO risk alleles is high in populations of European descent; 63% carry at least one risk allele and 16% are homozygous. Although the population attributable risk for obesity (∼20%) and overweight (∼13%) was rather high, the FTO variants explained only 1% of the variation in BMI (Frayling et al, 2007).
In the second wave of discoveries, collaborative efforts were initiated to combine individual genome-wide association studies, thereby increasing sample size and power to identify more common variants with small effects. The GIANT (Genomic Investigation of Anthropometric Traits) consortium is an international collaborative initiative that brings together research groups specifically focussing on anthropometric traits from across Europe and the USA. In their first meta-analysis, data of 7 genome-wide association scans for BMI including 16,876 individuals were combined (Loos et al, 2008).
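The FTO figures quoted for the first wave (16% homozygotes, 63% carriers, a population attributable risk for obesity of about 20%) can be tied together with two textbook formulas. The sketch below assumes Hardy–Weinberg proportions and treats the reported odds ratios (1.32 for one risk allele, 1.67 for homozygotes) as approximate relative risks; both are simplifying assumptions, and the function names are illustrative.

```python
import math


def genotype_frequencies(risk_allele_freq):
    """Genotype proportions under Hardy-Weinberg equilibrium."""
    p = risk_allele_freq
    q = 1 - p
    return {"non_carrier": q * q, "heterozygote": 2 * p * q, "homozygote": p * p}


def attributable_fraction(genotype_freq, relative_risk):
    """Population attributable fraction for several exposure levels."""
    excess = sum(
        genotype_freq[g] * (relative_risk[g] - 1) for g in genotype_freq
    )
    return excess / (1 + excess)


# A homozygote frequency of 16% implies a risk-allele frequency of 0.4.
allele_freq = math.sqrt(0.16)
freqs = genotype_frequencies(allele_freq)
carriers = 1 - freqs["non_carrier"]

# Odds ratios from the text, used here as approximate relative risks.
obesity_rr = {"non_carrier": 1.0, "heterozygote": 1.32, "homozygote": 1.67}
par = attributable_fraction(freqs, obesity_rr)

print(f"carriers = {carriers:.0%}, attributable fraction for obesity = {par:.0%}")
```

Under these assumptions the carrier proportion and the attributable fraction come out close to the quoted 63% and ∼20%. The attributable risk is sizeable because the risk allele is so common, even though each copy shifts BMI only slightly.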
Despite a fourfold increase in sample size compared to the first wave of genome-wide association studies, only FTO and one new locus – out of 10 loci that were taken forward for replication – were unequivocally confirmed. The newly
identified locus mapped at 188 kb downstream of MC4R (near-MC4R). The same locus was also identified by a genome-wide association study in 2684 Indian Asians and confirmed in 11,955 individuals of Indian Asian and European ancestry (Chambers et al, 2008). While the effect size is the same in both ethnic groups, the frequency of the risk allele in Indian Asians (36%) is higher than in white Europeans (27%). This explains in part why this locus could be identified with a relatively small sample of Indian Asians in the discovery stage.
For the third wave of discoveries, the GIANT consortium increased the sample size to 32,387 adults of European ancestry from 15 cohorts (Willer et al, 2009). Of the 35 loci that were taken forward for follow-up in an independent series of 59,082 individuals, eight loci were firmly replicated. These include the previously established FTO and near-MC4R loci and six new loci, i.e. near NEGR1, near TMEM18, in SH2B1, near KCTD15, near GNPDA2 and in MTCH2. In parallel with the analyses of the GIANT consortium, deCODE genetics performed a meta-analysis of four genome-wide association studies for BMI, including 30,232 Europeans and 1,160 African Americans (Thorleifsson et al, 2009). A total of 43 single nucleotide polymorphisms (SNPs) in 19 chromosomal regions were taken forward for replication genotyping in 5,586 Danish individuals and for confirmation in discovery stage data of the GIANT consortium. Besides the FTO and near-MC4R loci, eight additional loci reached genome-wide significance. Of these, four loci (near NEGR1, near TMEM18, in SH2B1, near KCTD15) had also been identified by the GIANT consortium, whereas four loci were novel, i.e. in SEC16B, between ETV5 and DGKG, in BDNF and between BCDIN3D and FAIM2. Variation in the BAT2 gene was consistently associated with weight, but not BMI, suggesting that this locus might contribute to overall size rather than adiposity.
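The remark that a higher risk-allele frequency helped the near-MC4R locus surface in a smaller Indian Asian sample can be illustrated with a rough calculation. For a quantitative trait under an additive model, the variance explained by a variant is proportional to 2p(1 − p)β², so for the same per-allele effect β the required sample size scales approximately with 1/[2p(1 − p)]. The sketch below applies this approximation to the two frequencies quoted above; the function name is illustrative and the approximation ignores differences in trait variance between populations.

```python
def relative_sample_size(freq_a, freq_b):
    """Sample size needed in population A relative to B for the same per-allele effect."""
    # 2p(1 - p) is the genotype variance under an additive model.
    information_a = 2 * freq_a * (1 - freq_a)
    information_b = 2 * freq_b * (1 - freq_b)
    return information_b / information_a


# Risk-allele frequencies quoted in the text: 36% versus 27%.
ratio = relative_sample_size(freq_a=0.36, freq_b=0.27)
print(f"relative sample size needed at 36% versus 27%: {ratio:.2f}")
```

The saving under this approximation is real but modest, which fits the wording that allele frequency explains the finding only in part.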
While the studies by the GIANT consortium and deCODE genetics focused on BMI as the main outcome, a third genome-wide association study examined association with the risk of early-onset and morbid adult
obesity in 1380 cases and 1416 controls (Meyre et al, 2009). A total of 38 highly significant markers were taken forward for genotyping in 14,186 adults and children to test for replication with BMI and obesity risk. In addition to FTO and near-MC4R, three new markers were identified: in NPC1, near MAF and near PTER (Table 33.1).
The discovery of these novel loci has already started to provide valuable insights into pathophysiological mechanisms and pathways that underlie obesity development, in particular for the first discovered FTO gene. Two studies (Gerken et al, 2007; Sanchez-Pulido and Andrade-Navarro, 2007) pointed out that FTO is a member of the non-heme dioxygenase superfamily, encodes a 2-oxoglutarate-dependent nucleic acid demethylase and localizes to the nucleus. Studies in rodents indicated that Fto mRNA is most abundant in the brain, particularly in the hypothalamic nuclei governing energy balance (Gerken et al, 2007). Another study (Fischer et al, 2009) has shown that loss of Fto in mice leads to a significant reduction in adipose tissue and lean body mass, which was found to develop as a consequence of increased energy expenditure despite decreased spontaneous locomotor activity and relative hyperphagia. A peripheral role for FTO was proposed by a study in healthy women showing that FTO mRNA levels in adipose tissue increase with BMI, and carriers of the risk allele had reduced lipolytic activity, independent of BMI (Wahlen et al, 2008). For the other loci, except SH2B1 (Ren et al, 2007), BDNF (Nakagawa et al, 2003) and MC4R (Farooqi et al, 2003), the physiological role in relation to obesity risk is poorly understood or unknown.
Taken together, the three waves of high-density multistage genome-wide association scans over the past 2 years have identified 15 new loci convincingly associated with obesity traits, proving this approach more productive than any other gene-discovery method previously applied to common traits.
To date, of all identified loci, genetic variation in FTO still has the largest effect on obesity susceptibility.
Table 33.1 Obesity-susceptibility loci identified through genome-wide association studies
Gene symbol | Chr. location | Gene name | Potential function | Effect size (kg/m²) | Risk allele frequency (%) | Reference
NEGR1 | 1p31.1 | Neuronal growth regulator 1 | Regulation of neurite outgrowth in the developing brain | 0.10–0.13 | 64 | Willer et al. (2009) and Thorleifsson et al. (2009)
SEC16B, RASAL2 | 1q25.2 | SEC16 homolog B (S. cerevisiae); RAS protein activator like 2 | – | ∼0.11 | 25 | Thorleifsson et al. (2009)
TMEM18 | 2p25.3 | Transmembrane protein 18 | Neural development | 0.19–0.26 | 85 | Willer et al. (2009) and Thorleifsson et al. (2009)
ETV5 | 3q27 | Ets variant gene 5 | Spermatogonial stem cell self-renewal | ∼0.19 | 80 | Thorleifsson et al. (2009)
GNPDA2 | 4p13 | Glucosamine-6-phosphate deaminase 2 | – | 0.19 | 45 | Willer et al. (2009)
PTER | 10p12 | Phosphotriesterase related protein | – | ∼0.07 | 91 | Meyre et al. (2009)
BDNF | 11p13 | Brain-derived neurotrophic factor | BDNF expression is regulated by nutritional state and MC4R signalling | ∼0.19 | 77 | Thorleifsson et al. (2009)
MTCH2 | 11p11.2 | Mitochondrial carrier homolog 2 (C. elegans) | A putative mitochondrial carrier protein – cellular apoptosis | 0.07 | 36 | Willer et al. (2009)
BCDIN3D, FAIM2 | 12q13 | BCDIN3 domain containing; Fas apoptotic inhibitory molecule 2 | Adipocyte apoptosis | ∼0.09 | 35 | Thorleifsson et al. (2009)
SH2B1, ATP2A1 | 16p11.2 | SH2B adaptor protein 1; ATPase, Ca++ transporting, cardiac muscle, fast twitch 1 | Neuronal role in energy homeostasis | 0.15 | 38 | Willer et al. (2009) and Thorleifsson et al. (2009)
MAF | 16q22–q23 | v-maf musculoaponeurotic fibrosarcoma oncogene homolog | Transcription factor involved in adipogenesis and insulin-glucagon regulation | ∼0.07 | 91 | Meyre et al. (2009)
FTO | 16q12.2 | Fat mass- and obesity-associated gene | Neuronal function + control of appetite | 0.26–0.66 | 46–48 | Frayling et al. (2007), Scuteri et al. (2007), Loos et al. (2008), Willer et al. (2009), Thorleifsson et al. (2009), and Meyre et al. (2009)
NPC1 | 18q11–q12 | Niemann-Pick disease, type C1 | Intracellular lipid transport | ∼0.14 | 53 | Meyre et al. (2009)
MC4R | 18q22 | Melanocortin 4 receptor | Hypothalamic signalling | 0.19–0.32 | 27–28 | Loos et al. (2008), Chambers et al. (2008), Willer et al. (2009), Thorleifsson et al. (2009), and Meyre et al. (2009)
KCTD15 | 19q13.11 | Potassium channel tetramerisation domain containing 15 | – | 0.06–0.17 | 68–69 | Willer et al. (2009) and Thorleifsson et al. (2009)
2.3 Obesity Susceptibility Genes, Food Intake and Energy Expenditure
The identification of the novel obesity susceptibility loci has instigated new studies exploring through which arm of the energy balance, i.e. food intake or energy expenditure, these loci lead to obesity. In particular for the FTO locus, which was discovered as the first obesity susceptibility locus, new insights have begun to accumulate. Some studies have provided evidence for a role of the FTO locus in food intake. For example, two studies in a total of >8000 British children consistently showed that the BMI-increasing allele of the FTO locus was associated with increased energy intake, independent of body size (Cecil et al, 2008; Timpson et al, 2008). A third study in 3337 children showed that homozygotes for the FTO risk allele had a significantly reduced satiety responsiveness score (Wardle et al, 2008). This observation was confirmed in a smaller study of 131 children with careful registration of the children’s consumption of palatable food presented after having eaten a meal (Wardle et al, 2009). Homozygotes for the BMI-lowering FTO allele ate significantly less than heterozygotes or homozygotes for the BMI-increasing allele, suggesting that the BMI-lowering allele protects against overeating by promoting responsiveness to internal satiety signals (Wardle et al, 2009). However, not all studies have been able to support a role for the FTO locus in energy intake (Bauer et al, 2009; Hakanen et al, 2009; Johnson et al, 2009). While data from Fto-deficient mice have suggested that Fto might induce obesity through an effect on energy expenditure (Fischer et al, 2009), there is no evidence to support such a role in humans (Berentzen et al, 2008; Cecil et al, 2008; Goossens et al, 2009; Hakanen et al, 2009; Haupt et al, 2009; Rampersaud et al, 2008; Speakman et al, 2008; Wardle et al, 2008).
For the locus identified in the second wave of genome-wide association studies (Chambers et al, 2008; Loos et al, 2008), the MC4R gene
is the nearest and most obvious candidate gene. Mutations in the MC4R gene are known to result in extreme obesity through hyperphagia (Farooqi et al, 2003). However, it is still unclear whether the near-MC4R locus indeed reflects the functions of MC4R. A few studies, performed even before the genome-wide association era, have examined the potential role of genetic variation near the MC4R gene in physical activity energy expenditure, based on evidence from studies of Mc4r knockout mice (Butler et al, 2001; Ste Marie et al, 2000). A study in 669 individuals showed that homozygotes for a variant located downstream of the MC4R gene had the lowest moderate-to-strenuous activity scores (p = 0.005) and the highest inactivity scores (Loos et al, 2005). The same locus was found to be linked to physical activity levels in a genome-wide linkage study in 1030 siblings from 319 Hispanic families (Cai et al, 2006).
Little is known about the more recently discovered loci and genes. The neuron-specific over-expression of SH2B1 has been shown to be protective against high-fat diet-induced obesity in mice (Ren et al, 2007), whereas the BDNF variant has been shown to be associated with eating behaviour in humans (Bauer et al, 2009; Shugart et al, 2009). A recent study in 1700 Dutch women (Bauer et al, 2009) that examined the majority of the newly discovered obesity loci found that the SH2B1, KCTD15, MTCH2, NEGR1 and BDNF loci were associated with dietary macronutrient intake. Although the above-mentioned studies provide some first evidence of association with energy intake and expenditure, replication of these observations in larger cohorts will be required to confirm the reported findings.
3 Type 2 Diabetes
3.1 Candidate Gene Studies
Candidate genes for type 2 diabetes are selected based on their involvement in pancreatic β-cell
function, insulin action/glucose metabolism, or other metabolic conditions that increase type 2 diabetes risk (e.g. energy intake/expenditure and lipid metabolism) (Barroso et al, 2003). To date, more than 50 candidate genes for type 2 diabetes have been studied in various populations worldwide. However, most candidate gene studies for diabetes tested a limited number of genetic variants, often in only small samples or in cases and controls that were poorly matched or diagnosed, frequently resulting in lack of replication of the weak associations detected (Moore and Florez, 2008). In this section, we focus on a few of the most promising candidate genes (PPARG, KCNJ11 and WFS1) for which the results are most convincing and the sample sizes are large.
The PPARG (peroxisome proliferator-activated receptor-γ) gene has been widely studied because of its importance in adipocyte and lipid metabolism (Tontonoz and Spiegelman, 2008). In addition, it is a target for hypoglycemic drugs known as thiazolidinediones. A proline-to-alanine (Pro12Ala) change in codon 12 of PPARG was the first genetic variant to be definitively implicated in the common form of type 2 diabetes. The minor Ala allele is present in ∼15% of white Europeans and was shown to be associated with increased insulin sensitivity. A study that combined data on a Finnish and a second-generation Japanese cohort (Deeb et al, 1998) found that Pro allele homozygotes had a 4.35 times higher risk of developing type 2 diabetes than those who do not carry the allele (p = 0.028). Although a number of subsequent small studies were not able to replicate this initial finding, a meta-analysis combining the results from 16 studies published before 2000 confirmed the association with type 2 diabetes (Altshuler et al, 2000).
In addition, a meta-analysis of data from 57 studies comprising approximately 32,000 non-diabetic individuals further established the association of the Pro12Ala variant with greater insulin sensitivity (standardized effect size 0.227, p = 0.0067) (Tönjes et al, 2006).
The beta-cell adenosine triphosphate-sensitive potassium (KATP) channel plays a
critical role in insulin secretion. The channel is composed of two subunits: the sulfonylurea receptor-1 (SUR1) and an inward rectifying potassium channel (Kir6.2) that are encoded on chromosome 11p15.1 by the genes ABCC8 and KCNJ11, respectively. SNP E23K of KCNJ11 has been shown to be associated with type 2 diabetes. Although initial smaller studies failed to replicate the association of the E23K polymorphism with type 2 diabetes, large-scale studies and meta-analyses have consistently associated the lysine variant with type 2 diabetes, showing a 1.15 times higher risk of developing type 2 diabetes compared to those who do not carry this variant (Florez et al, 2004; Gloyn et al, 2003; Van Dam et al, 2005). Genome-wide association studies have further confirmed the association of PPARG and KCNJ11 variants with type 2 diabetes (Diabetes Genetics Initiative of Broad Institute of Harvard and MIT et al, 2007; Scott et al, 2007; Zeggini et al, 2007).
The WFS1 gene encodes wolframin, a protein that is defective in individuals with Wolfram syndrome. This syndrome is characterized by diabetes insipidus, juvenile diabetes, optic atrophy and deafness. Disruption of Wfs1 in mice causes overt diabetes or impaired glucose tolerance, depending on genetic background (Ishihara et al, 2004; Riggs et al, 2005). Both humans and mice deficient in wolframin show pancreatic β-cell loss, possibly as a result of an enhanced endoplasmic reticulum stress response leading to increased β-cell apoptosis (Riggs et al, 2005; Yamada et al, 2006). Hence, WFS1 is critical for survival and function of insulin-producing pancreatic beta cells. The first evidence that variation in the WFS1 gene influences susceptibility to type 2 diabetes came from a family-based association study (Minton et al, 2002). A study on 1,536 SNPs in 84 candidate genes using a gene-centric approach showed that only the WFS1 gene was associated with type 2 diabetes (Sandhu et al, 2007).
This finding was further replicated in 9533 cases and 11,389 controls. Following this study, a meta-analysis of 11 studies (Franks et al, 2008), comprising up to 12,979 cases and 14,937 controls, further
confirmed the association between the WFS1 gene variant rs10010131 and type 2 diabetes (OR: 0.89, 95% CI 0.86–0.92; p = 4.9 × 10⁻¹¹). In summary, large-scale studies and meta-analyses have identified only three candidate genes to be robustly associated with type 2 diabetes traits.
3.2 Genome-Wide Studies
3.2.1 Genome-Wide Linkage Scans
So far, more than 20 genome-wide linkage studies have been carried out to localize type 2 diabetes predisposing variants (Huang et al, 2006; Guan et al, 2008). Although these genome-wide linkage scans have suggested that type 2 diabetes susceptibility loci reside across the whole genome, only one study has so far successfully pinpointed the gene underlying linkage with type 2 diabetes, with the discovery of the TCF7L2 gene (Grant et al, 2006). The susceptibility effect was identified through a search for microsatellite associations across a large region of chromosome 10 that showed suggestive evidence of linkage with type 2 diabetes (Reynisdottir et al, 2003). Subsequent fine mapping of this region localized the variants associated with increased risk of type 2 diabetes to an intron in the TCF7L2 gene. These findings were further replicated in two independent populations from the USA and Denmark (Grant et al, 2006). Overall, the effect was considerable, with each additional risk allele increasing the odds of type 2 diabetes 1.5-fold (p = 10⁻¹⁸) (Grant et al, 2006). TCF7L2, also known as TCF-4, is a transcription factor and forms part of the WNT signalling pathway, acting as a nuclear receptor for CTNNB1 (β-catenin) (Florez, 2007; Jin and Liu, 2008). The evidence implicating variants within TCF7L2 in type 2 diabetes susceptibility has instigated efforts to understand the mechanisms involved. It has been shown that the alteration of TCF7L2 expression or function disrupts pancreatic islet function, possibly through dysregulation of proglucagon gene expression, leading to reduced insulin secretion and enhanced risk of type 2 diabetes (Lyssenko et al, 2007). To date, genetic variation in the TCF7L2 gene still has the largest effect on type 2 diabetes susceptibility.
3.2.2 Genome-Wide Association Studies
The genome-wide association approach has led to the identification of at least 15 novel type 2 diabetes susceptibility loci (Frayling, 2007; Doria et al, 2008). Similar to obesity, the field of type 2 diabetes genetics has witnessed three waves of large-scale high-density genome-wide association studies so far. The three waves comprise six genome-wide association scans performed in European populations (Diabetes Genetics Initiative of Broad Institute of Harvard and MIT et al, 2007; Sladek et al, 2007; Steinthorsdottir et al, 2007; Scott et al, 2007; Wellcome Trust Case Control Consortium, 2007; Zeggini et al, 2007) and one in East Asians (Unoki et al, 2008).
The first wave of discoveries was based on a relatively small genome-wide association study (Sladek et al, 2007) including 661 cases and 614 controls from France, which identified three novel loci: a non-synonymous polymorphism (rs13266634) in the zinc transporter SLC30A8, which is expressed exclusively in insulin-producing beta-cells, and two loci that contain genes potentially involved in beta-cell development or function (IDE–KIF11–HHEX and EXT2–ALX4). The second wave of discoveries involved three further scans performed by the WTCCC, DGI and FUSION (Diabetes Genetics Initiative of Broad Institute of Harvard and MIT et al, 2007; Scott et al, 2007; Wellcome Trust Case Control Consortium, 2007; Zeggini et al, 2007) that identified the CDKAL1 locus. These three studies collaborated by sharing data and coordinating replication studies and further identified CDKN2A/2B, FTO and IGF2BP2, in addition to other previously reported loci such as PPARG, KCNJ11 and TCF7L2. Another
genome-wide association study, which was conducted in 1399 cases and 5275 controls from Iceland, reported an intronic variant (rs7756992) in the CDKAL1 gene as a novel type 2 diabetes locus (Steinthorsdottir et al, 2007). Furthermore, this study showed that the insulin response for homozygotes was approximately 20% lower than for heterozygotes or non-carriers, suggesting that this variant confers risk of type 2 diabetes through reduced insulin secretion. At the same time, a genome-wide association study for prostate cancer in 1501 cases and 11,290 controls identified two variants on chromosome 17. One of these was in the first intron of the TCF2 (HNF1β) gene, in which mutations are known to cause maturity-onset diabetes of the young type 5. As a follow-up, the TCF2 variant, rs7501939, was examined in eight case–control groups comprising 9936 type 2 diabetic cases and 23,087 controls, and the variant showed a significant protection against the development of type 2 diabetes (OR: 0.91, p = 9.2 × 10⁻⁷) (Gudmundsson et al, 2007). In the third wave, a large-scale collaborative meta-analysis of genome-wide association scans for type 2 diabetes was performed as a three-stage study design by the Diabetes Genetics Replication and Meta-analysis (DIAGRAM) consortium (Zeggini et al, 2008). The DIAGRAM consortium combined data from the WTCCC, DGI and FUSION scans including 4549 cases and 5579 controls. A total of 69 genetic variants, showing the strongest associations in the genome-wide association meta-analysis, were taken forward for replication in a set of 22,426 individuals, of which 11 variants were further replicated in an additional ∼57,000 individuals.
Eventually, six variants [Notch homologue 2, Drosophila (NOTCH2), ADAM metallopeptidase with thrombospondin type I motif 9 (ADAMTS9), calcium/calmodulin-dependent protein kinase 1D (CAMK1D), juxtaposed with another zinc finger gene 1 (JAZF1), tetraspanin 8 (TSPAN8)/leucine-rich repeat-containing G protein coupled (LGR5) and thyroid adenoma associated (THADA)] showed consistent association with the risk of type 2 diabetes. The putative functional mechanisms by
which they may affect type 2 diabetes risk are listed in Table 33.2. As part of the third wave, a study in 1561 cases and 2824 controls from Japan (Unoki et al, 2008) that genotyped over 200,000 tagSNPs identified KCNQ1 as a novel type 2 diabetes susceptibility locus and also confirmed the association of the CDKAL1 and IGF2BP2 loci. The KCNQ1 locus was also identified by a second, smaller scan, again performed in a Japanese population (Yasuda et al, 2008). Another study (Wu et al, 2008) replicated the association of 17 common variants in the genes identified from previous genome-wide scans in 3210 unrelated Han Chinese. This study showed that the common variants in the CDKAL1, CDKN2A/2B, IGF2BP2 and SLC30A8 loci independently or additively contribute to type 2 diabetes risk. The risk alleles of the CDKAL1 and CDKN2A/2B variants increased diabetes risk by ∼1.4- and ∼1.3-fold, respectively, which is higher than that observed in Europeans (Wellcome Trust Case Control Consortium, 2007; Zeggini et al, 2007). The risk allele frequencies of these variants were also higher in Han Chinese compared to Europeans (Wu et al, 2008). Besides the genome-wide scans for type 2 diabetes, scans for traits related to type 2 diabetes have also been performed. Two genome-wide association studies independently reported previously unknown genetic loci in association with fasting glucose concentrations. The first study (Prokopenko et al, 2009) showed that the variants in the gene encoding melatonin receptor 1B (MTNR1B) were consistently associated with fasting glucose across all the 10 genome-wide association studies (n = 36,610). The risk allele of the MTNR1B locus was associated with an increase of 0.07 (95% CI: 0.06–0.08) mmol/l in fasting glucose levels (p = 3.2 × 10⁻⁵⁰) and with reduced beta-cell function as measured by homeostasis model assessment (HOMA-B, p = 1.1 × 10⁻¹⁵).
The same allele was also associated with an increased risk of type 2 diabetes (odds ratio = 1.09 [1.05–1.12] per G allele, p = 3.3 × 10⁻⁷) in a meta-analysis of 13 case–control studies (18,236 cases and 64,453 controls). This study also confirmed the previous associations of fasting glucose with variants
at the G6PC2 (rs560887, p = 1.1 × 10⁻⁵⁷) and GCK (rs4607517, p = 1.0 × 10⁻²⁵) loci. Another study (Bouatia-Naji et al, 2009) identified rs1387153, near MTNR1B, as a modulator of fasting plasma glucose (p = 1.3 × 10⁻⁷) in genome-wide association data from 2151 nondiabetic French individuals and also showed an association with the risk of developing type 2 diabetes (OR: 1.15, 95% CI = 1.08–1.22, p = 6.3 × 10⁻⁵). In addition, this study observed cumulative effects of the MTNR1B locus and the three previously identified genetic determinants of fasting plasma glucose (G6PC2, GCK and GCKR). Those carrying six or more high FPG alleles showed a mean 0.36 mmol/l increase in fasting plasma glucose compared to individuals with zero or one high fasting plasma glucose allele. So far, 18 loci for type 2 diabetes (Table 33.2) and four loci for fasting glucose have been identified through genome-wide association scans. In comparison to the candidate gene and genome-wide linkage approaches, the genome-wide association studies have been extremely successful for both type 2 diabetes and BMI.
Table 33.2 Type 2 diabetes susceptibility loci identified through candidate gene, genome-wide linkage and genome-wide association studies
| Chr. location | Gene symbol | Gene name | Potential function | Effect size | Risk allele frequency | Reference |
|---|---|---|---|---|---|---|
| 1p12 | NOTCH2 | Notch, Drosophila, homolog of, 2 | Transmembrane receptor implicated in pancreatic organogenesis | 1.13 | 0.10 | Zeggini et al. (2008) |
| 2p21 | THADA | Thyroid adenoma associated gene | Thyroid adenoma; associates with PPARG | 1.15 | 0.90 | Zeggini et al. (2008) |
| 3p14 | ADAMTS9 | Disintegrin-like and metalloproteinase with thrombospondin type 1 motif | Proteolytic enzyme regulating extracellular matrix | 1.09 | 0.76 | Zeggini et al. (2008) |
| 3q25 | PPARG^a | Peroxisome proliferator activating receptor gamma | Transcription factor involved in adipocyte development | 1.19 | 0.87 | Altshuler et al. (2000) |
| 3q28 | IGF2BP2 | Insulin-like growth factor 2 mRNA binding protein 2 | Growth factor binding protein; pancreatic development | 1.14 | 0.32 | Zeggini et al. (2007), Diabetes Genetics Initiative of Broad Institute of Harvard and MIT et al. (2007), and Scott et al. (2007) |
| 6p22.2 | CDKAL1 | CDK5 regulatory subunit associated protein 1-like 1 | Presumed regulator of cyclin kinase; islet glucotoxicity sensor | 1.14 | 0.32 | Zeggini et al. (2007), Diabetes Genetics Initiative of Broad Institute of Harvard and MIT et al. (2007), and Scott et al. (2007) |
| 7p15 | JAZF1 | Juxtaposed with another zinc finger gene 1 | Transcriptional repressor; associated with prostate cancer | 1.10 | 0.50 | Zeggini et al. (2008) |
| 8q24.11 | SLC30A8 | Solute carrier family 30 (zinc transporter), member 8 | Beta-cell zinc transporter ZnT8; insulin storage and secretion | 1.15 | 0.69 | Sladek et al. (2007) |
| 9p21 | CDKN2A/CDKN2B | Cyclin-dependent kinase inhibitor 2a/2b | Cyclin-dependent kinase inhibitors 2A/2B and p15 tumour suppressor; islet development | 1.20 | 0.83 | Zeggini et al. (2007), Diabetes Genetics Initiative of Broad Institute of Harvard and MIT et al. (2007), and Scott et al. (2007) |
| 10p13-p14 | CDC123/CAMK1D | Cell division cycle protein 123 homolog/Calcium/calmodulin dependent protein kinase I-delta | CDC123: required for S phase entry of the cell cycle; CAMK1D: mediator of chemokine signal transduction in granulocytes | 1.11 | 0.18 | Zeggini et al. (2008) |
| 10q23-q25 | IDE/HHEX/KIF11 | Insulin degrading enzyme/Hematopoietically expressed homeobox/Kinesin family member 11 | IDE: neutral metallopeptidase that can degrade peptides; HHEX: transcription factor involved in pancreatic development; KIF11: kinesin related motor in microtubule and spindle function | 1.13 | 0.65 | Sladek et al. (2007) |
| 10q25.3 | TCF7L2^b | Transcription factor 7-like 2 | Transcription factor that transactivates proglucagon and insulin genes | 1.37 | 0.31 | Grant et al. (2006) |
| 11p15.1 | KCNJ11^a | Potassium channel, inwardly rectifying, subfamily J, member 11 | Kir 6.2 potassium channel; risk allele impairs insulin secretion | 1.14 | 0.35 | Florez et al. (2004) |
| 11p15.5 | KCNQ1 | Potassium channel, voltage-gated, KQT-like subfamily, member 1 | Beta-cell dysfunction | 1.29 | 0.93 | Unoki et al. (2008) |
| 12q21 | TSPAN8/LGR5 | Tetraspanin 8/Leucine-rich repeat containing G protein coupled | Cell surface glycoprotein implicated in gastrointestinal cancers | 1.09 | 0.27 | Zeggini et al. (2008) |
| 4p16.1 | WFS1^a | Wolfram syndrome 1 (wolframin) | Critical for survival and function of insulin-producing pancreatic beta cells | 1.12 | 0.60 | Sandhu et al. (2007) |
| 17q12 | TCF2^a | Transcription factor 2 | Transcription factor implicated in pancreatic islet development and function | 0.91 | 0.55 | Gudmundsson et al. (2007) |
| 16q12.2 | FTO | Fat mass and obesity associated | Altered BMI | 1.17 | 0.40 | Frayling et al. (2007) |
Type 2 diabetes susceptibility loci marked with ^a were identified through the candidate gene approach, those marked with ^b were identified through genome-wide linkage, and all others were identified through genome-wide association studies.
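The per-allele odds ratios quoted in this section (for example, 1.5 per TCF7L2 risk allele and 1.09 per MTNR1B G allele) can be turned into odds multipliers for a given genotype. The following is a minimal sketch, assuming a multiplicative (log-additive) per-allele model and independent effects across loci. Both are simplifying assumptions made here for illustration, and the function names are hypothetical.

```python
import math


def odds_multiplier(per_allele_or, n_risk_alleles):
    """Odds multiplier for carrying n risk alleles at one locus,
    assuming each allele multiplies the odds by the same factor."""
    return per_allele_or ** n_risk_alleles


def combined_odds_multiplier(genotypes):
    """Combine several loci by summing log odds ratios.

    genotypes: iterable of (per_allele_or, n_risk_alleles) pairs.
    Assumes the loci act independently on the log-odds scale.
    """
    log_odds = sum(n * math.log(or_value) for or_value, n in genotypes)
    return math.exp(log_odds)


# Two copies of a risk allele with per-allele OR 1.5 (value quoted for TCF7L2).
tcf7l2_homozygote = odds_multiplier(1.5, 2)

# One TCF7L2 risk allele plus two MTNR1B G alleles (per-allele OR 1.09).
two_locus = combined_odds_multiplier([(1.5, 1), (1.09, 2)])
```

An odds ratio below 1, such as the 0.91 reported for the TCF2 variant, is handled in the same way and lowers the combined multiplier. These are multipliers of odds, not of absolute risk.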
4 Genetic Prediction of Obesity and Diabetes
There is growing interest in the potential for the increasing numbers of genetic susceptibility variants to contribute towards individualized medical care. However, at this stage, the prospects for individual prediction seem limited. In a recent study (Willer et al, 2009), the predictive value of eight validated obesity susceptibility loci (TMEM18, KCTD15, SH2B1, MTCH2, NEGR1, GNPDA2, FTO and MC4R) on obesity risk was examined in 14,409 men and women of the population-based EPIC-Norfolk cohort. A genetic predisposition score for each individual, summing the number of BMI-increasing alleles, was calculated. The average BMI of individuals carrying 13 or more risk alleles ( [...]
[...] 0. If we then evaluated whether X2 mediated the effect of X1 on X3 using the Sobel test, α1 = ρ and β2 = ρ/(1 + ρ) and thus, since ρ > 0, α1β2 > 0. A large enough sample size with a valid test would document that fact. However, the very same proof shows that X1 mediates the effect of X2 on X3, or that X3 mediates the effect of X1 on X2, or any other of the six permutations of these three variables to the roles of T, M, and O. In fact, with any three correlated variables, even when the three pair-wise correlations are not equal, one can often use the Sobel test to document that any one mediates the effect of any other on the remaining third. It is to prevent this kind of ambiguity that the MacArthur approach explicitly requires that, before considering the possibility of mediation, it be clear from the design that T precedes M, which in turn precedes O, in time. However, this is probably a minor problem. The Baron and Kenny model is often described using a diagram in which T is connected to M, which is connected to O, using single-pointed arrows in that order, strongly suggesting that the time precedence required explicitly in the MacArthur model is tacitly assumed in the Baron and Kenny model.
The problem is only with users who have applied the moderator and mediator analyses using the B&K model with cross-sectional data or other types of data in which temporal precedence is indeterminate.
But now suppose that the temporal order condition is satisfied. Then the Sobel mediation test and other related tests (MacKinnon et al, 2002) require that one perform a linear regression analysis relating M to T, resulting in an estimate a of α1 and its standard error s(a), then a linear regression analysis relating O to M and T, resulting in an estimate b of β2 and its standard error s(b). The test then computes a z-statistic; for example, the Sobel equation is
$$z = \frac{ab}{\sqrt{b^{2}\,s(a)^{2} + a^{2}\,s(b)^{2}}}.$$
The statistical problem is that s(a) computed from a standard linear regression is not the true standard error of a but the standard error conditional on the T sample, as s(b) is the standard error of b conditional on the (T,M) sample. Their true standard errors are much larger and determined by the unknown joint distribution of (T,M). Conditional on the (T,M) sample, the distributions of a and b are normal, but the unconditional distributions are mixtures of normal distributions, not themselves normal distributions. The distribution of ab, the product of two variables with unknown unconditional distributions, is unlikely to be normal. All these problems have been realized in the poor performance of the Sobel test, resulting in the current recommendations to use bootstrap methods to obtain a confidence interval for α1β2 and to reject the null hypothesis of absence of mediation if the value zero is not within that confidence interval.
But then the problem is whether α1β2 is an unambiguous indicator of mediation in the MacArthur Model. Figure 55.2 is an example, with binary T (T1 versus T2), ordinal O (with means O1 and O2 in the two groups), ordinal M (with means M1 and M2 in the two groups), in which all the βs equal 5 and α1 equals 6. (Such results are general, but negative distances are hard to visualize.) Then β0 is the value of the intercept of the T1 line at M = 0. β1 is the distance from that intercept to the intercept of the T2 line at M = 0. Then α1 is the distance between M1 and M2.
Three auxiliary lines are drawn: two horizontal lines, one through the lower intercept at (0, β0) and one through the point showing the response to T1 at (M1, O1), and a line through the upper intercept at (0, β0 + β1) parallel to the lower line (showing what the response would have been if the two lines were parallel). The components of the difference between O2 and O1 can then be seen:
[Figure 55.2: plot of O (vertical axis) against M (horizontal axis) with regression lines for groups T1 and T2; M1, M2, O1, and O2 are marked, together with the segments β0, β1, β2M1, β2α1, and β3M2.]
Fig. 55.2 Graphical display of the mediator effect. The two heavy lines represent the regression of O on M in groups T1 and T2. The means of M for T1 and T2 are M1 and M2; the difference between them is α1. The means of O in T1 and T2 are O1 and O2, and the difference is graphically dissected into its three components: the main effect β1, the Sobel mediation effect α1β2, and the portion due to an interaction effect (β3M2)
• β1, the main effect of treatment, here equal to the difference in response had no posttreatment change or event occurred;
• α1β2, the effect size used in the Sobel and other related tests;
• β3M2, which in this illustration is the major portion of that difference, is due to the interaction effect that is assumed to be zero in the Baron and Kenny model.
A remarkable fact now evident is that the Sobel mediator effect size can be determined without knowing where the response line for T2 is located, a consequence of assuming that the response lines are parallel. That is not true for the MacArthur approach. If an interaction exists in the population but is assumed zero in the model, the two lines shown in Fig. 55.2 would be fitted by two different lines still going through the points (M1, O1) and (M2, O2), but with a common slope that is a weighted average of the two different actual slopes. Which weighted average depends on the size of the population interactions, the relative sizes of the T1 and T2 groups, and the size of the correlation between M and T. Then, because the Sobel mediator effect is determined by that slope, what it actually means is uninterpretable.
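The three components of O2 − O1 can be checked algebraically. The derivation below is a sketch that assumes T is coded 0 for T1 and 1 for T2 and that O is linear in M within each group, with slope β2 in T1 and β2 + β3 in T2. This coding is an assumption made here to match the description of the intercepts in Fig. 55.2.

```latex
\begin{align*}
O &= \beta_0 + \beta_1 T + \beta_2 M + \beta_3 T M, \qquad T \in \{0, 1\},\\
O_1 &= \beta_0 + \beta_2 M_1,\\
O_2 &= \beta_0 + \beta_1 + (\beta_2 + \beta_3) M_2,\\
O_2 - O_1 &= \beta_1 + \beta_2 (M_2 - M_1) + \beta_3 M_2
           = \beta_1 + \alpha_1 \beta_2 + \beta_3 M_2 .
\end{align*}
```

With all βs equal to 5 and α1 = 6, and taking M1 = 3 and M2 = 9 (illustrative values chosen to be consistent with α1 = 6), this gives O1 = 20, O2 = 100, and O2 − O1 = 5 + 30 + 45 = 80, so the interaction term β3M2 = 45 is indeed the largest of the three components.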
In summary, the Sobel test, using bootstrap estimation, is a clear indicator of mediation if and only if the linear model holds and there is no interaction in the population, in which case the MacArthur approach will give the same answer. However, the MacArthur approach gives the correct answer under the linear model whether or not the lines are parallel and can be used when the linear model is not appropriate. A similar comment probably applies to other tests and inferences derived under the B&K model: sometimes the two approaches will concur and sometimes not.
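The two procedures discussed above, the Sobel z-statistic and a bootstrap confidence interval for α1β2, can be sketched in a few lines. This is a minimal illustration using NumPy on simulated data with no T-by-M interaction. The simulated coefficients, the sample size, the percentile form of the bootstrap, and all function names are assumptions made here, not part of the chapter.

```python
import numpy as np


def ols(X, y):
    """Least-squares coefficients and conventional standard errors.
    X must already contain a column of ones for the intercept."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (X.shape[0] - X.shape[1])
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    return beta, se


def sobel(T, M, O):
    """Return a, b and the Sobel z for T -> M -> O (no interaction term)."""
    ones = np.ones(len(T))
    coef_m, se_m = ols(np.column_stack([ones, T]), M)     # M regressed on T
    coef_o, se_o = ols(np.column_stack([ones, T, M]), O)  # O regressed on T and M
    a, s_a = coef_m[1], se_m[1]
    b, s_b = coef_o[2], se_o[2]
    z = a * b / np.sqrt(b**2 * s_a**2 + a**2 * s_b**2)
    return a, b, z


def bootstrap_ci(T, M, O, n_boot=1000, level=0.95, seed=0):
    """Percentile bootstrap interval for the product a*b."""
    rng = np.random.default_rng(seed)
    n = len(T)
    products = np.empty(n_boot)
    for i in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample subjects with replacement
        a, b, _ = sobel(T[idx], M[idx], O[idx])
        products[i] = a * b
    tail = (1.0 - level) / 2.0
    lower, upper = np.quantile(products, [tail, 1.0 - tail])
    return lower, upper


# Simulated example: alpha1 = 6, beta1 = beta2 = 5, no interaction.
rng = np.random.default_rng(1)
n = 400
T = np.repeat([0.0, 1.0], n // 2)
M = 3.0 + 6.0 * T + rng.normal(0.0, 1.0, n)
O = 5.0 + 5.0 * T + 5.0 * M + rng.normal(0.0, 2.0, n)

a, b, z = sobel(T, M, O)
ci_low, ci_high = bootstrap_ci(T, M, O)
mediation_detected = not (ci_low <= 0.0 <= ci_high)
```

Because the simulated lines are parallel, this is the one situation in which the text says the Sobel effect size is interpretable. Adding a T-by-M term to the simulation would leave the code running but change what a·b means.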
7 Conclusions
The MacArthur approach is more than a redefinition of moderators and mediators, although that is essentially its core; it represents a way of thinking essential to understanding complex biobehavioral processes. As such, it remains very much a “work in progress.”
References
Altman, D. G., Schulz, K. F., Moher, D., Egger, M., Davidoff, F. et al (2001). The revised CONSORT statement for reporting randomized trials: explanation and elaboration. Ann Intern Med, 134, 663–694.
Baron, R. M., and Kenny, D. A. (1986). The moderator-mediator variable distinction in social psychological research: conceptual, strategic, and statistical considerations. J Pers Soc Psychol, 51, 1173–1182.
Begg, C., Cho, M., Eastwood, S., Horton, R., Moher, D. et al (1999). Improving the quality of reporting of randomized controlled trials: the CONSORT statement. JAMA, 276, 637–639.
Essex, M. J., Kraemer, H. C., Armstrong, J. M., Boyce, W. T., Goldsmith, H. H. et al (2006). Exploring risk factors for the emergence of children’s mental health problems. Arch Gen Psychiatry, 63, 1246–1256.
Finney, D. J. (1994). On biometric language and its abuses. Biometric Bull, 11, 2–4.
Jacobi, C., Hayward, C., deZwaan, M., Kraemer, H. C., and Agras, W. S. (2004). Coming to terms with risk factors for eating disorders: application of risk terminology and suggestions for a general taxonomy. Psychol Bull, 130, 19–65.
Jaffee, S. R., Caspi, A., Moffitt, T. E., Dodge, K. A., Rutter, M. et al (2005). Nature X nurture: genetic vulnerabilities interact with physical maltreatment to promote conduct problems. Devel Psychopathol, 17, 67–84.
Jemal, A., Ward, E., and Thun, M. J. (2007). Recent trends in breast cancer incidence rates by age and tumor characteristics among U.S. women. Breast Cancer Res, 9, R28.
King, A. C., Ahn, D. F., Atienza, A. A., and Kraemer, H. C. (2008). Exploring refinements in targeted behavioral medical intervention to advance public health. Ann Behav Med, 35, 251–260.
Kraemer, H. C. (2008). Toward non-parametric and clinically meaningful moderators and mediators. Stat Med, 27, 1679–1692.
Kraemer, H. C., and Blasey, C. (2004). Centring in regression analysis: a strategy to prevent errors in statistical inference. Int J Method Psychiat Res, 13, 141–151.
Kraemer, H. C., Frank, E., and Kupfer, D. J. (2006). Moderators of treatment outcomes: clinical, research, and policy importance. JAMA, 296, 1286–1289.
Kraemer, H. C., Kazdin, A. E., Offord, D. R., Kessler, R. C., Jensen, P. S. et al (1999). Measuring the potency of a risk factor for clinical or policy significance. Psychol Methods, 4, 257–271.
Kraemer, H. C., Kiernan, M., Essex, M. J., and Kupfer, D. J. (2008). How and why criteria defining moderators and mediators differ between the Baron and Kenny and MacArthur Approaches. Health Psychol, 27, S101–S108.
Kraemer, H. C., and Kupfer, D. J. (2006). Size of treatment effects and their importance to clinical research and practice. Biol Psychiatry, 59, 990–996.
Kraemer, H. C., Lowe, K. K., and Kupfer, D. J. (2005). To Your Health: How to Understand What Research Tells us About Risk. Oxford: Oxford University Press.
Kraemer, H. C., Stice, E., Kazdin, A., and Kupfer, D. (2001). How do risk factors work together to produce an outcome? Mediators, moderators, independent, overlapping and proxy risk factors. Am J Psychiatry, 158, 848–856.
Kraemer, H. C., Wilson, G. T., Fairburn, C. G., and Agras, W. S. (2002). Mediators and moderators of treatment effects in randomized clinical trials. Arch Gen Psychiatry, 59, 877–883.
MacKinnon, D. P., Lockwood, C. M., Hoffman, J. M., West, S. G., and Sheets, V. (2002). A comparison of methods to test mediation and other intervening variable effects. Psychol Methods, 7, 83–104.
Murphy, G. M. J., Hollander, S. B., Rodrigues, Kremer, C., and Schatzberg, A. F. (2004). Effects of the serotonin transporter gene promoter polymorphism on Mirtazapine and Paroxetine efficacy and adverse events in geriatric major depression. Arch Gen Psychiatry, 61, 1163–1169.
Piaggio, G., Elbourne, D. R., Altman, D. G., Pocock, S. J., and Evans, S. J. W. (2006). Reporting of noninferiority and equivalence randomized trials: an extension of the CONSORT statement. JAMA, 295, 1152–1160.
American Psychological Association (2001). Publication Manual of the American Psychological Association, 5th Ed. Washington, DC: American Psychological Association.
Rennie, D. (1996). How to report randomized controlled trials: The CONSORT Statement. JAMA, 276, 649.
Sobel, M. E. (Ed.). (1982). Asymptotic Intervals for Indirect Effects in Structural Equations Models. San Francisco, CA: Jossey-Bass.
Stampfer, M. J., Willett, W. C., Colditz, G. A., Rosner, B., Speizer, F. E. et al (1985). A prospective study of postmenopausal estrogen therapy and coronary heart disease. New Engl J Med, 313, 1044–1049.
The MTA Cooperative Group. (1999). Moderators and mediators of treatment response for children with attention-deficit/hyperactivity disorder. Arch Gen Psychiatry, 56, 1088–1096.
Writing Group for the Women’s Health Initiative Investigators. (2002). Principal results from the Women’s Health Initiative randomized controlled trial. JAMA, 288, 321–333.
Chapter 56
Multilevel Modeling
S.V. Subramanian
Department of Society, Human Development and Health, Harvard School of Public Health, 677 Huntington Avenue, Kresge Building, 7th Floor, Boston MA 02115, USA; e-mail: [email protected]
1 Introduction
The term “multilevel” refers to the distinct levels or units of analysis, which usually, but not always, consist of individuals (at lower level) who are nested within contextual/aggregate units (at higher level). Indeed, individuals are organized within a nearly infinite number of levels of organization, from the individual up (for example, families, neighborhoods, counties, and states), from the individual down (for example, body organs, cellular matrices, and DNA), and for overlapping units (for example, area of residence and work environment). It is, therefore, necessary that links should be made between these different levels of analysis. Multilevel methods consist of statistical procedures that are pertinent when (i) the observations that are being analyzed are correlated or clustered or (ii) the causal processes are thought to operate simultaneously at more than one level and/or (iii) there is an intrinsic interest in describing the variability and heterogeneity in the phenomenon, over and above the focus on average (Diez Roux, 2002; Subramanian, 2004; Subramanian et al, 2003). Multilevel methods are specifically geared toward the statistical analysis of data that have a nested structure. The nesting, typically, but not always, is hierarchical. For instance, a two-level structure would have many level-1 units nested within a smaller number of level-2 units. In educational research, the field that provided the impetus for multilevel methods, level-1 usually consists of pupils who are nested within schools at level-2. Such structures arise routinely in health and social sciences, such that level-1 and level-2 units could be workers in organizations, patients in hospitals, and individuals in neighborhoods, respectively. In this chapter, for exemplification, we will consider the structure of individuals nested within neighborhoods (used to reflect one practical realization of place) (see Chapter 24). The existence of nested data structures is neither random nor ignorable; for instance, individuals differ but so do the neighborhoods. Differences among neighborhoods could either be directly due to the differences among individuals who live in them or groupings based on neighborhoods may arise for reasons less strongly associated with the characteristics of the individuals who live in them. Regardless, once such groupings are established, even if their establishment is random, they will tend to become differentiated. This would imply that the group (for example, neighborhoods) and its members (for example, individual residents) can exert influence on each other, suggesting different sources of variation (for example, individual induced and neighborhood induced) in the outcome of interest and thus compelling analysts to consider covariates at the individual and at the neighborhood level. Ignoring this multilevel
structure of variations not only risks overlooking the importance of neighborhood effects, but also has implications for statistical validity (Goldstein, 2003; Raudenbush and Bryk, 2002). Clustered data also arise as a result of sampling strategies. For instance, while planning large-scale survey data collection, for reasons of cost and efficiency, it is usual to adopt a multistage sampling design. A national population survey, for example, might involve a three-stage design, with regions sampled first, then neighborhoods, and then individuals. A design of this kind generates a three-level hierarchically clustered structure of individuals at level-1 nested within neighborhoods at level-2, which in turn are nested in regions at level-3. Individuals living in the same neighborhood can be expected to be more alike than they would be if the sample were truly random. Similar correlation can be expected for neighborhoods within a region. Much documentation exists on measuring this “design effect” and correcting for it. Indeed, clustered designs (for example, individuals at level-1 nested in neighborhoods at level-2 nested in regions at level-3) are often a nuisance in traditional analysis. However, individuals, neighborhoods, and regions can be seen as distinct structures that exist in the population that should be measured and modeled.
A. Steptoe (ed.), Handbook of Behavioral Medicine, DOI 10.1007/978-0-387-09488-5_56, © Springer Science+Business Media, LLC 2010
2 Multilevel Framework: A Necessity for Understanding Ecologic Effects
Figure 56.1 identifies a typology of designs for data collection and analyses (Blakely and Woodward, 2000; Kawachi and Subramanian, 2006; Subramanian et al, 2007; Subramanian et al, 2009), where the rows indicate the level or the unit at which the outcome variable is being measured (that is, at the individual level (y) or at the ecological level (Y)) and the columns indicate whether the exposure is being measured at the individual level (x) or at the ecological level (X). The ecological level, in this illustration, relates to the neighborhood level.
| Outcome | Exposure: Individual (x) (measured at individual level) | Exposure: Ecologic (X) (measured at ecological level) |
|---|---|---|
| Individual (y) | (y,x) Traditional risk factor study | (y,X) Multilevel study |
| Ecologic (Y) | (Y,x)(A) | (Y,X) Ecological study |
Fig. 56.1 Typology of studies (Subramanian et al, 2007). Note: (A) This type of study is impossible to specify as it stands. Practically speaking, it will either take the form of (Y,X), that is, ecological study, where X will now simply be central tendency of x. Or, if dis-aggregation of Y is possible, so that we can observe y, then it will be equivalent to (y,x)
Study type (y,x) is most commonly encountered when the researcher aims to link exposure to outcomes, with both being measured at the individual level. Study type (y,x) typically ignores ecological effects (either implicitly or explicitly). Conversely, study type (Y,X), referred to as an “ecological study,” may seem intuitively appropriate for research where higher levels (for instance, neighborhoods, regions, states, and schools) are the targets of interest. However, study type (Y,X) conflates the genuinely ecological and the aggregate or “compositional” (Moon et al, 2005) and precludes the possibility of testing heterogeneous contextual effects on different types of individuals. Ecological effects reflect predictors and associated mechanisms operating primarily at the contextual level. The search for such measures and their scientific validation and assessment is an area of active research (Raudenbush, 2003). Aggregate effects, in contrast, equate the effect of a neighborhood with the sum of the individual effects associated with the people living within the neighborhood. In this situation the interpretative question becomes particularly relevant. If common membership of
a neighborhood by a set of individuals brings about an effect that is over and above those resulting from individual characteristics, then there may indeed be an ecological effect. Study type (y,X) provides a multilevel approach, that is, in which an ecological exposure is linked to an individual outcome. A more complete representation would be type (y,x,X) whereby we have an individual outcome (y), individual confounders (x), and neighborhood exposure (X) reflecting a multilevel structure of individuals nested within neighborhoods. A fundamental motivation for study type (y,x,X) is to distinguish “neighborhood differences” from “the difference a neighborhood makes” (Moon et al, 2005). Stated differently, ecological effects on the individual outcome should be ascertained after individual factors that reflect the composition of the places (and may be potential confounders) have been controlled. Indeed, compositional explanations for ecological variations in health are common. It nonetheless makes intuitive sense to test for the possibility of ecological effects. Besides anticipating their impact on individual outcomes, compositional factors may vary by context. Thus, unless contextual variables are considered, their direct effects and any indirect mediation through compositional variables remain unidentified. Moreover, composition itself has an intrinsic ecologic dimension; the very fact that individual (compositional) factors may “explain” ecologic variations serves as a reminder that the real understanding of ecologic effects is likely to be complex. The multilevel framework with its simultaneous examination of the characteristics of the individuals at one level and the context or ecologies in which they are located at another level accordingly offers a comprehensive framework for understanding the ways in which places can affect people (contextual) and/or people can affect places (composition). 
It likewise allows for a more precise distinction between aggregative fallacy (which is the invalid transfer of results observed at an aggregate level to the individual level) and ecologic effects (which is the effect of aggregate ecologies on individual outcomes) (Subramanian et al, 2009).
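Study type (y,x,X) is often written as a two-level random-intercept regression. The form below is a sketch under that assumption. The indices (i for individuals, j for neighborhoods) and the symbols for the coefficients and variances are introduced here for illustration and are not taken from the chapter.

```latex
\begin{align*}
y_{ij} &= \beta_0 + \beta_1 x_{ij} + \gamma X_j + u_j + e_{ij},\\
u_j &\sim N(0, \sigma_u^2), \qquad e_{ij} \sim N(0, \sigma_e^2),\\
\rho &= \frac{\sigma_u^2}{\sigma_u^2 + \sigma_e^2}.
\end{align*}
```

In this notation β1 captures the individual (compositional) association, γ the neighborhood (contextual) exposure effect after adjusting for x, and σu² the between-neighborhood variation that remains unexplained. The ratio ρ is the intraclass correlation, that is, the share of residual variance attributable to neighborhoods.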
3 A Typology of Multilevel Data Structures
The idea of multilevel structure can be recast, with great advantage, to address a range of circumstances where one may anticipate clustering. Outcomes as well as their causal mechanisms are rarely stable and invariant over time, producing data structures that involve repeated measures, which can be considered a special case of multilevel clustered data structures. Consider the “repeated cross-sectional design” that can be structured in multilevel terms with neighborhoods at level-3, year/time at level-2, and individuals at level-1. In this example, level-2 represents repeated measurements on the neighborhoods (level-3) over time. Such a structure can be used to investigate what sorts of individuals and what sorts of neighborhoods have changed with respect to the outcome. Alternatively, there is the classic “longitudinal or panel design” in which the level-1 is the measurement occasion, level-2 is the individual, and level-3 is the neighborhood. This time, the individuals are repeatedly measured at different time intervals so that it becomes possible to model changing individual behaviors within a contextual setting of, say, neighborhoods. When different responses/outcomes are correlated, it could be seen as generating a “multivariate” multilevel data structure in which level-1 are sets of response variables measured on individuals at level-2 nested in neighborhoods at level-3. The “multivariate responses” could be, for instance, different aspects of, say, health behavior (for example, smoking and drinking). In addition, such responses could be a mixture of “quality” (do you smoke/do you drink) and “quantity” (how many/how much) producing “mixed multivariate responses.” The substantive benefit of this approach is that it is possible to assess whether different types of behavior and whether the qualitative and quantitative aspects of each behavior are related to individual characteristics in the same or different ways.
Additionally, we can also ascertain whether neighborhoods that are high for one
behavior are also high for another and whether neighborhoods with high prevalence of smoking, for instance, are also high in terms of the number of cigarettes smoked. While the previous examples are strictly hierarchical, in that all level-1 units that form a level-2 grouping are always in the same group at any higher level, data structures could be non-hierarchical. For example, a model of health behavior (for instance, smoking) could be formulated with individuals at level-1 and both residential neighborhoods and workplaces at level-2, not nested but crossed; such structures are called “cross-classified structures.” Individuals are then seen as occupying more than one set of contexts, each of which may have an important influence. For instance, individuals in a particular workplace may come from different neighborhoods and individuals in a neighborhood may go to several worksites. A related structure occurs when an individual can be considered to belong simultaneously to several neighborhoods with the contributions of each neighborhood being weighted in relation to its distance (if the interest is spatial) from the individual. This generates what is referred to as a “multiple membership” data structure, which is also non-hierarchical in design. In summary, through some combination of hierarchical structures, cross-classified nesting, and multiple memberships, a great deal of the complexity that is imprinted either explicitly or implicitly in data can be incorporated via multilevel models.
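The difference between nested and crossed classifications can be checked directly on a data set. The following is a minimal sketch, assuming hypothetical records of the form (individual, neighborhood, workplace); the function name `is_nested` and the identifiers are illustrative and do not come from the chapter.

```python
from collections import defaultdict

def is_nested(lower_to_upper_pairs):
    """Return True if every lower-level unit maps to exactly one upper-level unit."""
    parents = defaultdict(set)
    for lower, upper in lower_to_upper_pairs:
        parents[lower].add(upper)
    return all(len(p) == 1 for p in parents.values())

# Hypothetical records: (individual, neighborhood, workplace)
records = [
    ("p1", "n1", "w1"),
    ("p2", "n1", "w2"),
    ("p3", "n2", "w1"),
    ("p4", "n2", "w2"),
]

# Individuals are nested in neighborhoods: each person has one neighborhood.
people_in_neighborhoods = is_nested((p, n) for p, n, _ in records)

# Workplaces are not nested in neighborhoods: w1 draws people from n1 and n2,
# so the two level-2 classifications are crossed rather than hierarchical.
workplaces_in_neighborhoods = is_nested((w, n) for _, n, w in records)

print(people_in_neighborhoods)      # True
print(workplaces_in_neighborhoods)  # False
```

A check of this kind only describes the observed sample; whether a classification should be modeled as crossed remains a substantive decision.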
S.V. Subramanian
4 The Distinction Between Levels and Variables
Each of the levels that were discussed in the previous section (for example, neighborhoods) can be considered as variables in a regression equation with an indicator variable specified for each neighborhood. Conversely, why are many categorical variables such as gender, ethnicity/race, and social class not a level? Critical to treating neighborhoods, for example, as a level is that neighborhoods are treated as a population of units from which we have observed one random sample. This enables us to draw generalizations for a particular level (for example, neighborhoods) based on an observed sample of neighborhoods. Further, it is more efficient to model neighborhoods as a random variable given the (likely) large number of neighborhoods. On the other hand, gender, for instance, is not a level because it is not a sample out of all possible gender categories. Rather, it is an attribute of individuals. Thus, male or female in our gender example is a “fixed” discrete category of a variable with the specific categories only contributing to their respective means. They are not a random sample of gender categories from a population of gender groupings. Further, we would usually wish to ascribe a fixed effect to each gender, but not each neighborhood. Rather, we wish to model an ecologic attribute at the neighborhood level. It is possible to consider “levels” as “variables”. Thus, when neighborhoods are considered as a variable, they are typically reflective of a fixed classification. While this may be useful in certain circumstances, doing so robs the researcher of the ability to generalize to all neighborhoods, and inferences are only possible for the specific neighborhoods observed in the sample.
5 Multilevel Analysis There are three constitutive components of multilevel analysis, which are now discussed.
5.1 Evaluating Sources of Variation: Compositional and/or Contextual
A fundamental application of multilevel methods is disentangling the different sources of variations in the outcome. Evidence for variations in poor health, for instance, between different neighborhoods can be due to factors that are
56
Multilevel Modeling
intrinsic to, and are measured at, the neighborhood level. In other words, the variation is due to what can be described as contextual or neighborhood effects. Alternatively, variations between neighborhoods may be compositional, that is, certain types of people who are more likely to be in poor health due to their individual characteristics happen to be clustered in certain neighborhoods. The issue, therefore, is not whether variations between different neighborhoods exist (they usually do), but what is the primary source of these variations. Put simply, are there significant contextual differences in health between neighborhoods, after taking into account the individual compositional characteristics of the neighborhood? The notions of contextual and compositional sources of variation have general relevance and they are applicable whether the context is administrative (for example, political boundaries), temporal (for example, different time periods), or institutional (for example, schools or hospitals).
5.2 Describing Contextual Heterogeneity Contextual differences may be complex such that they may not be the same for all types of people. Describing such contextual heterogeneity is another aspect of multilevel analysis and can have two interpretative dimensions. First, there may be a different amount of neighborhood variation, such that, for example, for high social class individuals it may not matter in which neighborhoods they live (thus a lower between-neighborhood variation) but it matters a great deal for the low social class and as such show a large between-neighborhood variation. Second, there may be a differential ordering: neighborhoods that are high for one group are low for the other and vice versa. Stated simply, the multilevel analytical question is whether the contextual neighborhood differences in poor health, after taking into account the individual
composition of the neighborhood, are different for different types of population groups.
5.3 Characterizing and Explaining the Contextual Variations Contextual differences, in addition to people’s characteristics, may also be influenced by the different characteristics of neighborhoods. Stated differently, individual differences may interact with context, and ascertaining the relative importance of individual and neighborhood covariates is another key aspect of a multilevel analysis. For example, over and above social class (individual characteristic), health may depend upon the poverty levels of the neighborhoods (neighborhood characteristic). The contextual effect of poverty can either be the same for both the high and the low social class suggesting that while neighborhood poverty explains the prevalence of poor health, it does not influence the social class inequalities in health. On the other hand, the contextual effects of poverty may be different for different groups, such that neighborhood poverty adversely affects the low social class, but does the opposite for the high social class. Thus, neighborhood level poverty may not only be related to average health achievements but also shapes social inequalities in health. The analytical question of interest is whether the effect of neighborhood level socioeconomic characteristics on health is different for different types of people? Presence of a multilevel data structure along with an interest in understanding contextual effects provides substantive as well as technical motivation to use multilevel statistical models (Goldstein, 2003; Raudenbush and Bryk, 2002). We shall not review the basic principles of multilevel modeling here as they have been described elsewhere in the context of health research (Blakely and Subramanian, 2006; Moon et al, 2005; Subramanian et al, 2003), but instead provide a brief overview of the type of models invoked for identifying ecological effects.
6 Specifying Multilevel Models Like all statistical regression equations, multilevel models have the same underlying function, which can be expressed as Response = Fixed/Average Parameters +(Random/Variance Parameters).
While in a conventional regression model the random part of the model is usually restricted to a single term (called the “error term” or “residual”), in the multilevel regression model the focus or innovation is on expanding the random part of a statistical model. In order to exemplify multilevel models we consider the following example. Suppose we are interested in studying the variation in body mass index (BMI) as a function of certain individual and neighborhood predictors. Let us assume that the researcher collected data on a sample of 50 neighborhoods and, for each of these neighborhoods, a random sample of individuals. We then have a two-level structure where the outcome is BMI, $y_{ij}$, for individual i in neighborhood j. We will restrict this exemplification to one individual-level predictor, poverty, $x_{1ij}$, coded as 0 if not poor and 1 if poor, for every individual i in neighborhood j, and one neighborhood predictor, $w_{1j}$, a socioeconomic deprivation index in neighborhood j.
7 Variance Component or Random Intercepts Model Multilevel models operate by developing regression equations at each level of analysis. In the illustration considered here, models would have to be specified at two levels, level-1 and level-2. The model at level-1 can be formally expressed as
$$y_{ij} = \beta_{0j} + \beta_1 x_{1ij} + e_{0ij}. \tag{56.1}$$
In this level-1 model, $\beta_{0j}$ (associated with a constant, $x_{0ij}$, which is a set of 1s, and therefore, not written) is the mean BMI for the jth neighborhood for the non-poor group; $\beta_1$ is the average differential in BMI associated with individual poverty status ($x_{1ij}$) across all neighborhoods. Meanwhile, $e_{0ij}$ is the individual or the level-1 residual term. To make this a genuine two-level model we let $\beta_{0j}$ become a random variable as
$$\beta_{0j} = \beta_0 + u_{0j}, \tag{56.2}$$
where $u_{0j}$ is the random neighborhood-specific displacement associated with mean BMI ($\beta_0$) for the non-poor group. Since we do not allow, at this stage, the average BMI differential for the poor group ($\beta_1$) to vary across neighborhoods, $u_{0j}$ is assumed to be the same for both groups. Eq. (56.2) is then the level-2 between-neighborhood model. It is worth emphasizing that the “neighborhood effect”, $u_{0j}$, can be treated in one of two ways. One can estimate each neighborhood separately as a fixed effect (that is, treat them as a variable; with 50 neighborhoods there will be 49 additional parameters to be estimated). Such a strategy may be appropriate if the interest is in making inferences about just those sampled neighborhoods. On the other hand, if neighborhoods are treated as a (random) sample from a population of neighborhoods (which might include neighborhoods in future studies if one has complete population data), the target of inference is the variation between neighborhoods in general. Adopting this multilevel statistical approach makes $u_{0j}$ a random variable at level-2 in a two-level statistical model. Substituting the level-2 model (Eq. 56.2) into the level-1 model (Eq. 56.1) and grouping them into fixed and random part components (the latter shown in brackets) yields the following combined model, also referred to as the random intercepts or variance components model:
$$y_{ij} = \beta_0 + \beta_1 x_{1ij} + (u_{0j} + e_{0ij}). \tag{56.3}$$
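To make the structure of Eq. (56.3) concrete, data can be generated from it. The following is a minimal simulation sketch, not an estimation procedure; the parameter values, the 30% poverty rate, and the function name `simulate_random_intercepts` are assumptions chosen only for illustration.

```python
import random

def simulate_random_intercepts(n_neighborhoods=50, n_per_neighborhood=20,
                               beta0=25.0, beta1=1.5,
                               sd_u0=1.0, sd_e0=3.0, seed=1):
    """Draw y_ij = beta0 + beta1 * x1_ij + (u0_j + e0_ij) for every individual."""
    rng = random.Random(seed)
    rows = []
    for j in range(n_neighborhoods):
        # Level-2 residual: drawn once and shared by everyone in neighborhood j.
        u0j = rng.gauss(0.0, sd_u0)
        for i in range(n_per_neighborhood):
            # Individual poverty indicator (assumed 30% poor, purely illustrative).
            x1 = 1 if rng.random() < 0.3 else 0
            # Level-1 residual: drawn separately for each individual.
            e0ij = rng.gauss(0.0, sd_e0)
            y = beta0 + beta1 * x1 + (u0j + e0ij)
            rows.append({"neighborhood": j, "individual": i,
                         "x1": x1, "u0": u0j, "e0": e0ij, "y": y})
    return rows

data = simulate_random_intercepts()
print(len(data))  # 50 neighborhoods x 20 individuals = 1000 rows
```

Because `u0` is drawn once per neighborhood, any two individuals in the same neighborhood share it, which is what induces the within-neighborhood correlation.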
We have now expressed the response $y_{ij}$ as the sum of a fixed part and a random part. Assuming a normal distribution with a 0 mean, we can estimate a variance at level-1 ($\sigma^2_{e0}$: the between-individual within-neighborhood variation) and level-2 ($\sigma^2_{u0}$: the between-neighborhood variation), both conditional on fixed poverty differences in BMI. It is the presence of more than one residual term (or the structure of the random part more generally) that distinguishes the multilevel model from the standard linear regression models or analysis of variance type analysis. The underlying random structure (variance–covariance) of the model specified in Eq. (56.3) is $\mathrm{Var}[u_{0j}] \sim N(0, \sigma^2_{u0})$; $\mathrm{Var}[e_{0ij}] \sim N(0, \sigma^2_{e0})$; and $\mathrm{Cov}[u_{0j}, e_{0ij}] = 0$. It is this aspect of the regression model that requires special estimation procedures in order to obtain satisfactory parameter estimates (Goldstein, 2003). The model specified in Eq. (56.3) with the above random structure is typically used to partition variation according to the different levels, with the variance in $y_{ij}$ being the sum of $\sigma^2_{u0}$ and $\sigma^2_{e0}$. This leads to a statistic known as intraclass correlation, or intra-unit correlation, or more generally variance partitioning coefficient (Goldstein et al, 2002), representing the degree of similarity between two randomly chosen individuals within a neighborhood. This can be expressed as
$$\rho = \frac{\sigma^2_{u0}}{\sigma^2_{u0} + \sigma^2_{e0}}. \tag{56.4}$$
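Equation (56.4) is a simple ratio of variance components. As a small sketch, assuming the two variances have already been estimated by some multilevel routine (the values below are made up for illustration), it could be computed as:

```python
def intraclass_correlation(var_u0, var_e0):
    """Share of the total variance lying between neighborhoods, as in Eq. (56.4)."""
    if var_u0 < 0 or var_e0 < 0 or var_u0 + var_e0 == 0:
        raise ValueError("variances must be non-negative and not both zero")
    return var_u0 / (var_u0 + var_e0)

# Illustrative values: between-neighborhood variance 1.0, within-neighborhood variance 3.0.
rho = intraclass_correlation(var_u0=1.0, var_e0=3.0)
print(rho)  # 0.25
```

With these illustrative values, a quarter of the variation in the outcome that remains after the fixed part is attributable to differences between neighborhoods.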
Note that Eq. (56.3) estimates a variance based on the observed sample of neighborhoods. While this is important to establish the overall importance of neighborhoods as a unit or level, another quantity of interest may pertain to estimating whether living in neighborhood $j_1$, as compared to neighborhood $j_3$, for example, predicts a different BMI conditional on compositional influences of covariates. Given Eq. (56.3), we can estimate for each level-2 unit:
$$\hat{u}_{0j} = E(u_{0j} \mid Y, \hat{\beta}, \hat{\Omega}). \tag{56.5}$$
The quantity $\hat{u}_{0j}$ is referred to as “estimated” or “predicted” residuals, or using Bayesian terminology, as “posterior” residual estimates, and is calculated as
$$\hat{u}_{0j} = r_j \times \frac{\sigma^2_{u0}}{\sigma^2_{u0} + \sigma^2_{e0}/n_j}, \tag{56.6}$$
where $\sigma^2_{u0}$ and $\sigma^2_{e0}$ are as defined above, $r_j$ is the mean of the individual-level raw residuals for neighborhood j, and $n_j$ is the number of individuals within each neighborhood j. This formula for $\hat{u}_{0j}$ uses the level-1 and level-2 variances and the number of people observed in neighborhood j to scale the observed level-2 residual ($r_j$). As the level-1 variance declines or the sample size increases, the scale factor approaches 1 and thus the estimated $\hat{u}_{0j}$ approaches $r_j$. These neighborhood-level residuals are random variables that are seen to be coming from a distribution and whose parameter values quantify the variation among the higher level or neighborhood units (Goldstein, 2003). Another interpretation is that each $\hat{u}_{0j}$ estimates neighborhood j’s departure from the expected mean outcome. This interpretation is premised on the assumption that each neighborhood belongs to a population of neighborhoods, and the distribution of the population provides information about plausible values for neighborhood j (Goldstein, 2003). For a neighborhood with only a few individuals, we can obtain more precise estimates by combining the population and neighborhood-specific observations than if we were to ignore the population membership assumption and use only the information from that neighborhood. When the estimated residuals at higher level units are of interest in their own right, we need to provide standard errors, interval estimates, and significance tests as well as point estimates for them (Goldstein, 2003).
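The scaling in Eq. (56.6) can be traced numerically. The sketch below assumes the variance components are known and uses made-up values (a raw mean residual of 2.0, level-2 variance 1.0, level-1 variance 4.0); the function name `shrunken_residual` is illustrative.

```python
def shrunken_residual(raw_mean_residual, n_j, var_u0, var_e0):
    """Predicted level-2 residual: the raw mean residual r_j times the scale factor in Eq. (56.6)."""
    scale = var_u0 / (var_u0 + var_e0 / n_j)
    return raw_mean_residual * scale

# The same raw residual is pulled toward zero less and less as n_j grows.
for n_j in (2, 20, 200):
    print(n_j, round(shrunken_residual(2.0, n_j, var_u0=1.0, var_e0=4.0), 3))
# 2 0.667
# 20 1.667
# 200 1.961
```

The output illustrates the statement in the text: with only two individuals the raw residual of 2.0 is scaled down to about 0.67, whereas with 200 individuals the estimate is close to the raw value.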
8 Modeling Places: Fixed or Random? It is worth drawing parallels between the multilevel or a random-effects model Eq. (56.3) and
the conventional ordinary least squares or fixed-effects regression model. Consider the fixed-effects model, whereby the neighborhood effect is estimated by including a dummy for each neighborhood, as shown below:
$$y_{ij} = \beta_0 + \beta x_{ij} + \beta N_j + (e_{0ij}), \tag{56.7}$$
where $N_j$ is a vector of dummy variables for N − 1 neighborhoods. The key conceptual difference between the fixed and the random-effects approach to modeling neighborhoods is that while the fixed part coefficients are estimated separately, the random part differentials ($u_{0j}$) are conceptualized as coming from a distribution (Goldstein, 2003). This conceptualization results in three practical benefits (Jones and Bullen, 1994): (i) pooling information between neighborhoods, with all the information in the data being used in the combined estimation of the fixed and random part; in particular, the overall regression terms are based on the information for all neighborhoods; (ii) borrowing strength, whereby neighborhood-specific relations that are imprecisely estimated benefit from the information for other neighborhoods; and (iii) precision-weighted estimation, whereby unreliable neighborhood-specific fixed estimates are differentially down-weighted or shrunk toward the overall city-wide estimate. A reliably estimated within-neighborhood relation will be largely immune to this shrinkage. The random-effects and the fixed-effects estimates for each neighborhood, meanwhile, are related (Jones and Bullen, 1994). The neighborhood-specific random intercept ($\beta_{0j}$) in a multilevel model is a weighted combination of the specific neighborhood coefficient in a fixed-effects model ($\beta^*_{0j}$) and the overall multilevel intercept ($\beta_0$), in the following way:
$$\beta_{0j} = w_j \beta^*_{0j} + (1 - w_j)\beta_0, \tag{56.8}$$
with the overall multilevel intercept being a weighted average of all the fixed intercepts:
$$\beta_0 = \frac{\sum_j w_j \beta^*_{0j}}{\sum_j w_j}. \tag{56.9}$$
Each neighborhood weight is the ratio of the true between-neighborhood parameter variance to the total variance, which additionally includes sampling variance resulting from observing a sample from the neighborhood. Consequently, the weights represent the reliability or precision of the fixed terms:
$$w_j = \frac{\sigma^2_{u0}}{\upsilon^2_j + \sigma^2_{u0}}, \tag{56.10}$$
where the random sampling variance of the fixed parameter is
$$\upsilon^2_j = \frac{\sigma^2_{e0}}{n_j}, \tag{56.11}$$
with $n_j$ being the number of observations within each neighborhood. When there are genuine differences between the neighborhoods and the sample sizes within a neighborhood are large, the sampling variance will be small in comparison to the total variance. As a result, the associated weight will be close to 1, with the fixed neighborhood effect being reliably estimated, and the random effect neighborhood estimate will be close to the fixed neighborhood effect. As the sampling variance increases, however, the weight will be less than 1 and the multilevel estimate will increasingly be influenced by the overall intercept based on pooling across neighborhoods. Shrinkage estimates allow the data to determine an appropriate compromise between specific estimates for different neighborhoods and the overall fixed estimate that pools information across places over the entire sample (Jones and Bullen, 1994). Importantly, the fixed-effects approach to modeling neighborhood differences using cross-sectional data is not a choice for a typical multilevel research question, where there is an intrinsic interest in an exposure measured at the level of neighborhood such as the one specified in Eq. (56.3); in such instances, a multilevel modeling approach is a necessity. This is because the dummy variables associated with the neighborhoods (measuring the fixed effects of each neighborhood) and the neighborhood exposure are perfectly confounded and, as such, the latter is not identifiable (Fielding, 2004). Thus, the fixed-effects specification to understand neighborhood differences is unsuitable for the sort of complex questions which multilevel modeling can address.
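The confounding described above can be seen without fitting any model. Fitting one dummy per neighborhood is equivalent to removing neighborhood means from every variable, so only within-neighborhood variation remains. The sketch below uses a small hypothetical data set; the helper name `within_group_residuals` and all values are assumptions for illustration.

```python
def within_group_residuals(values, groups):
    """Residuals of `values` after removing group means (what a full set of group dummies absorbs)."""
    totals, counts = {}, {}
    for v, g in zip(values, groups):
        totals[g] = totals.get(g, 0.0) + v
        counts[g] = counts.get(g, 0) + 1
    return [v - totals[g] / counts[g] for v, g in zip(values, groups)]

# Hypothetical data: 3 neighborhoods, 2 individuals each.
neighborhood = ["a", "a", "b", "b", "c", "c"]
deprivation = [0.5, 0.5, 1.5, 1.5, 3.0, 3.0]  # neighborhood exposure: constant within neighborhood
poverty = [0, 1, 0, 0, 1, 1]                  # individual predictor: can vary within neighborhood

dep_resid = within_group_residuals(deprivation, neighborhood)
pov_resid = within_group_residuals(poverty, neighborhood)

print(dep_resid)  # [0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
print(pov_resid)  # [-0.5, 0.5, 0.0, 0.0, 0.0, 0.0]
```

The neighborhood exposure has no variation left once the dummies are accounted for, so its coefficient cannot be estimated alongside them, whereas the individual-level predictor retains some within-neighborhood variation.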
9 Random Coefficient or Random Slopes Model We can expand the random structure in Eq. (56.3) by allowing the fixed effect of individual poverty ($\beta_1$) to randomly vary across neighborhoods in the following manner:
$$y_{ij} = \beta_{0j} + \beta_{1j} x_{1ij} + e_{0ij}. \tag{56.12}$$
At level-2, there will now be two models:
$$\beta_{0j} = \beta_0 + u_{0j}, \tag{56.13}$$
$$\beta_{1j} = \beta_1 + u_{1j}. \tag{56.14}$$
Substituting the level-2 models in Eqs. (56.13) and (56.14) into the level-1 model in Eq. (56.12) gives
$$y_{ij} = \beta_0 + \beta_1 x_{1ij} + (u_{0j} + u_{1j} x_{1ij} + e_{0ij}). \tag{56.15}$$
Across neighborhoods, the mean BMI for non-poor is $\beta_0$, the mean BMI for the poor is $\beta_0 + \beta_1$, and the mean “poverty-differential” is $\beta_1$. The poverty differential is no longer constant across neighborhoods, but varies by the amount $u_{1j}$ around the mean $\beta_1$. Such models are also referred to as random slopes or random coefficient models. These models have a much more complex variance–covariance structure than before:
$$\mathrm{Var}\begin{bmatrix} u_{0j} \\ u_{1j} \end{bmatrix} \sim N\left(0, \begin{bmatrix} \sigma^2_{u0} & \\ \sigma_{u0u1} & \sigma^2_{u1} \end{bmatrix}\right) \tag{56.16}$$
and
$$\mathrm{Var}[e_{0ij}] \sim N(0, \sigma^2_{e0}). \tag{56.17}$$
With this formulation, it is no longer straightforward to think in terms of a summary intraclass correlation statistic $\rho$ as the level-2 variation is now a function of an individual predictor variable $x_{1ij}$. In our exemplification when $x_{1ij}$ is a dummy variable, we will have two variances estimated at level-2: one for non-poor which is
$$\sigma^2_{u0} \tag{56.18}$$
and one for poor which is
$$\sigma^2_{u0} + 2\sigma_{u0u1} x_{1ij} + \sigma^2_{u1} x^2_{1ij}. \tag{56.19}$$
That is, level-2 variation will be a “quadratic” function of the individual predictor variable when $x_{1ij}$ is a continuous predictor. Thus the notion of “random intercepts and slopes”, while intuitive, is not entirely appropriate. Rather, what these models are really doing is modeling variance as some function (constant, quadratic, or linear) of a predictor variable (Subramanian et al, 2003). Building on the above perspective of modeling the variance–covariance function (as opposed to “random intercepts and slopes”), we can extend the concept to modeling variance function at level-1. It is extremely common to assume that the variance is “homoskedastic” in the random part at level-1 ($\sigma^2_{e0}$; Eq. (56.15)), and indeed, researchers seldom report whether this assumption was tested or not. One strategy would be to model the different variances for poor and non-poor of the following form:
$$y_{ij} = \beta_0 + \beta_1 x_{1ij} + (u_{0j} + u_{1j} x_{1ij} + e_{1ij} x_{1ij} + e_{2ij} x_{2ij}), \tag{56.20}$$
where $x_{1ij}$ = 0 for non-poor, 1 for poor, and the new variable $x_{2ij}$ = 1 for non-poor, 0 for poor, with $\sigma^2_{e1}$ giving the variance for poor, $\sigma^2_{e2}$ giving the variance for non-poor, and $\mathrm{Cov}[e_{1ij}, e_{2ij}] = 0$. There are other parsimonious ways to model level-1 variation in the presence of a number of predictor variables (Goldstein, 2003; Subramanian et al, 2003). With this specification, we do not have an interpretation of the random level-1 coefficients as “random slopes” as we did at level-2. The level-1 parameters, $\sigma^2_{e1}$ and $\sigma^2_{e2}$, describe the complexity of level-1 variation, which is no longer homoskedastic (Goldstein, 2003). Anticipating and modeling heteroskedasticity or heterogeneity at the individual level may be important in multilevel analysis as there may be cross-level confounding: what may appear to be neighborhood heterogeneity (level-2) to be explained by some ecological variable could be due to a failure to take account of the between-individual (within-neighborhood) heterogeneity (level-1).
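The variance functions implied by Eqs. (56.19) and (56.20) can be written out explicitly. The sketch below assumes the variance and covariance parameters are given (the numbers are invented for illustration) and that the two level-1 residuals are uncorrelated, as stated above; the function names are hypothetical.

```python
def level2_variance(x1, var_u0, cov_u0u1, var_u1):
    """Between-neighborhood variance as a quadratic function of x1: Var(u0 + u1 * x1)."""
    return var_u0 + 2.0 * cov_u0u1 * x1 + var_u1 * x1 ** 2

def level1_variance(x1, var_e1, var_e2):
    """Within-neighborhood variance Var(e1 * x1 + e2 * x2), with x2 = 1 - x1 and Cov[e1, e2] = 0."""
    x2 = 1 - x1
    return var_e1 * x1 ** 2 + var_e2 * x2 ** 2

# Illustrative parameter values (assumptions, not estimates from any study).
for label, x1 in (("non-poor", 0), ("poor", 1)):
    between = level2_variance(x1, var_u0=1.0, cov_u0u1=0.25, var_u1=0.5)
    within = level1_variance(x1, var_e1=4.0, var_e2=3.0)
    print(label, between, within)
# non-poor 1.0 3.0
# poor 2.0 4.0
```

For the non-poor group the level-2 variance reduces to the intercept variance alone, as in Eq. (56.18), while for the poor group the covariance and slope variance terms also contribute.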
10 Modeling the Fixed Effect of a Neighborhood Predictor An attractive feature of multilevel models, and the one that is perhaps most commonly used in health and social science research, is their utility in modeling neighborhood and individual characteristics, and any interaction between them, simultaneously. We will consider the underlying level-2 model related to Eq. (56.20), which is exactly the same as specified in Eqs. (56.13) and (56.14), but now including a level-2 predictor, $w_{1j}$, the deprivation index for neighborhood j:
$$\beta_{0j} = \beta_0 + \alpha_1 w_{1j} + u_{0j}, \tag{56.21}$$
$$\beta_{1j} = \beta_1 + \alpha_2 w_{1j} + u_{1j}. \tag{56.22}$$
Note that the separate specification of micro and macro models correctly recognizes that the contextual variables ($w_{1j}$) are predictors of between-neighborhood differences. The extension of Eq. (56.20) will now be
$$y_{ij} = \beta_0 + \beta_1 x_{1ij} + \alpha_1 w_{1j} + \alpha_2 w_{1j} x_{1ij} + (u_{0j} + u_{1j} x_{1ij} + e_{1ij} x_{1ij} + e_{2ij} x_{2ij}). \tag{56.23}$$
The combined formulation in Eq. (56.23) highlights an important feature, the presence of an interaction between a level-2 and level-1 predictor ($w_{1j} \cdot x_{1ij}$), represented by the fixed parameter $\alpha_2$. Now, $\alpha_1$ estimates the marginal change in BMI for a unit change in the neighborhood deprivation index for the non-poor and $\alpha_2$ estimates the extent to which the marginal change in BMI for unit change in the neighborhood deprivation index is different for the poor. This multilevel statistical formulation allows cross-level effect modification or interaction between individual and neighborhood characteristics to be robustly specified and estimated. In summary, multilevel models are concerned with modeling both the average and the variation around the average, at different levels. To accomplish this they consist of two sets of parameters: those summarizing the average relationship(s) and those summarizing the variation around the average at both the level of individuals and neighborhoods. Models presented in the preceding section can be easily adapted to other structures with nesting of level-1 units within level-2 units. Additionally, these models can be extended to three or more levels. While the preceding discussion considered a single normally distributed response variable for illustration, multilevel models are capable of handling a wide range of responses. These include binary outcomes, proportions (as logit, log–log, and probit models), multiple categories (as ordered and unordered multinomial models), and counts (as Poisson and negative binomial distribution models). In essence, these models work by assuming a specific, “non-Gaussian” distribution for the random part at level-1, while maintaining the normality assumptions for random parts at higher levels.
Consequently, the discussion presented in this entry focusing at the neighborhood level would continue to hold
regardless of the nature of the response variable, with some exceptions. For instance, determining intraclass correlation or partitioning variances across individual and neighborhood levels in complex non-linear multilevel logistic models is not straightforward (see elsewhere for details: (Browne et al, 2005; Goldstein et al, 2002)).
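The interpretation of the cross-level interaction in Eq. (56.23) can be checked with a short calculation on the fixed part of the model. The coefficient values below are invented for illustration, and the function names `fixed_part` and `deprivation_slope` are hypothetical.

```python
def fixed_part(x1, w1, beta0, beta1, alpha1, alpha2):
    """Fixed part of Eq. (56.23): beta0 + beta1*x1 + alpha1*w1 + alpha2*w1*x1."""
    return beta0 + beta1 * x1 + alpha1 * w1 + alpha2 * w1 * x1

def deprivation_slope(x1, alpha1, alpha2):
    """Change in expected BMI per unit of neighborhood deprivation, for a given poverty status."""
    return alpha1 + alpha2 * x1

# Illustrative coefficients (assumptions only).
coef = dict(beta0=25.0, beta1=1.0, alpha1=0.5, alpha2=0.25)

print(deprivation_slope(0, coef["alpha1"], coef["alpha2"]))  # 0.5  (non-poor)
print(deprivation_slope(1, coef["alpha1"], coef["alpha2"]))  # 0.75 (poor)

# The same slope recovered as a difference in predictions one deprivation unit apart.
print(fixed_part(1, 2.0, **coef) - fixed_part(1, 1.0, **coef))  # 0.75
```

With these values the deprivation gradient is steeper for the poor than for the non-poor by exactly the interaction coefficient, which is the sense in which neighborhood deprivation modifies the individual poverty differential.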
11 Exploiting the Flexibility of Multilevel Models to Incorporate “Realistic” Complexity Current implementations of multilevel models have generally failed to exploit the full capabilities of the analytical framework (Leyland, 2005; Moon et al, 2005; Subramanian, 2004). Much, if not all, of the current research linking neighborhoods and health is cross-sectional and assumes a hierarchical structure of individuals nested within neighborhoods. This simplistic scenario ignores, for instance, the possibility that an individual might move several times and as such reflect neighborhood effects drawn from several contexts or that other competing contexts (for example, schools, workplaces, and hospital settings) may simultaneously contribute to contextual effects. Figure 56.2 provides a visual illustration of one complex, but realistic multilevel structure for neighborhoods and health research, where time measurements (level-1) are nested within individuals (level-2) who are in turn nested within neighborhoods (level-3). Importantly, individuals are assigned different weights for the time spent in each neighborhood. For example, individual 25 moved from neighborhood 1 to neighborhood 25 during the time period t1–t2, spending 20% of her time in neighborhood 1 and 80% in her new neighborhood. This multiple membership design would allow control of changing context as well as changing composition. Such designs could be extended to incorporate memberships to additional contexts, such as workplaces or schools. It can also be extended to enable consideration of weighted effects of proximate contexts (Langford et al, 1998). So, for example, the geographic distribution of disease can be seen not only as a matter of composition and the immediate context in which an outcome occurs but also as a consequence of the impact of nearby contexts with nearer areas being more influential than more distant ones. This is also called spatial autocorrelation and forms an important area of spatial statistical research (Lawson, 2001). While such analyses require high-quality longitudinal and context-referenced data, models that incorporate such “realistic complexity” (Best et al, 1996) are likely to improve our understanding of true neighborhood effects. While the foregoing discussion provides a sound rationale to adopt a multilevel analytic approach for modeling ecologic effects, it obviously does not overcome the limitations intrinsic to any observational study design, single-level or multilevel.
Fig. 56.2 Multilevel structure of repeated measurements of individuals over time across neighborhoods with individuals having multiple membership to different neighborhoods across the time span. Source: Subramanian (2004)
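One common way to express a multiple membership structure is to let an individual's contextual effect be a weighted sum of the effects of the neighborhoods she belongs to, with weights that sum to one. The sketch below applies that idea to the 20%/80% example in the text; the neighborhood effect values and the function name `multiple_membership_effect` are assumptions for illustration.

```python
def multiple_membership_effect(weights, neighborhood_effects):
    """Weighted sum of neighborhood effects; `weights` maps neighborhood id to time share."""
    total = sum(weights.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError("membership weights should sum to 1")
    return sum(w * neighborhood_effects[n] for n, w in weights.items())

# Hypothetical neighborhood effects (u_j) for neighborhoods 1 and 25.
effects = {1: -0.5, 25: 1.0}

# Individual 25 spent 20% of the period in neighborhood 1 and 80% in neighborhood 25.
weights_individual_25 = {1: 0.2, 25: 0.8}

combined = multiple_membership_effect(weights_individual_25, effects)
print(round(combined, 3))  # 0.7
```

An individual who never moved would simply carry a weight of 1 on a single neighborhood, so the strictly hierarchical structure is a special case of this weighting.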
12 Summary The multilevel statistical approach – an approach that explicitly models the correlated nature of the data arising either due to sampling design or because populations are clustered – has a number of substantive and technical advantages. From a substantive perspective, it circumvents the problems associated with ecological fallacy (the invalid transfer of results observed at the ecological level to the individual level); individualistic fallacy (occurs by failing to take into account the ecology or context within which individual relationships happen); and atomistic fallacy (arises when associations between individual variables are used to make inferences on the association between the analogous variables at the group/ecological level). The issue common to the above fallacies is the failure to recognize the existence of unique relationships being observable at multiple levels and each being important in its own right. Specifically, one can think of an individual relationship (for example, individuals who are poor are more likely to have poor health); an ecological–contextual relationship (for example, places with a high proportion of poor individuals are more likely to have higher rates of poor health); and an individual–contextual relationship (for example, the greatest likelihood of being in poor health is found for poor individuals in places with a high proportion of poor people). Multilevel models explicitly recognize the level-contingent nature of relationships. From a technical perspective, the multilevel approach enables researchers to obtain statistically efficient estimates of fixed regression coefficients. Specifically, using the clustering information, multilevel models provide correct standard errors and thereby robust confidence intervals and significance tests. These generally will be more conservative than the traditional ones that are obtained simply by ignoring the presence of clustering. 
More broadly, multilevel models allow a more appropriate and realistic specification of complex variance structures at each level. Multilevel models are also precision-weighted and capitalize on the advantages that accrue as a result of “pooling” information from all the neighborhoods to make inferences about specific neighborhoods. While the advances in statistical research and computing have shown the potential of multilevel methods for health and social behavioral research, there are issues to be considered while developing and interpreting multilevel applications. First, it is important to clearly motivate and conceptualize the choice of higher levels in a multilevel analysis. Second, establishing the relative importance of context and composition is probably more apparent than real, and necessary caution must be exercised while conceptualizing and interpreting the compositional and contextual sources of variation. Third, it is important that the sample of neighborhoods belongs to a well-defined population of neighborhoods such that the sample shares exchangeable properties that are essential for robust inferences. Fourth, it is important to ensure adequate sample size at all levels of analysis. In general, if the research focus is essentially on neighborhoods then clearly the analysis requires more neighborhoods (as compared to more individuals within a neighborhood). Lastly, like all quantitative procedures, the ability of multilevel models to make causal inferences is limited, and innovative strategies including randomized neighborhood-level research designs (via trials or natural experiments) in combination with a multilevel analytical strategy may be required to convincingly demonstrate causal effects of social contexts such as neighborhoods. Acknowledgment S. V. Subramanian is supported by the National Institutes of Health Career Development Award (NHLBI 1 K25 HL081275).
Chapter 57
Structural Equation Modeling in Behavioral Medicine Research Maria Magdalena Llabre
1 Introduction Structural equation modeling (SEM) is a broad data analytic framework that can subsume most of the analyses performed within the behavioral medicine research field. Over the past 30 years SEM has gone from a novel methodology to mainstream statistical analysis, expanding beyond general linear models to address nonlinear models, categorical outcomes, and multilevel models, to name a few. While SEM was initially considered to be most useful for testing causal models, the heuristic aspects of the framework go beyond causal hypothesis testing. In fact, the specification of the models, a necessary step in SEM, is a valuable tool in itself, helping researchers better understand their research questions and their data. Models are specified either through a system of structural equations or through a path diagram. The path diagram is a valuable way to visually describe how variables are expected to relate to one another within a specified research context, forcing the researcher to think critically about every variable relevant to the phenomenon under study. This chapter is designed to provide a brief overview of principles and current topics in SEM. The chapter strives for breadth rather than depth, but will give readable references
for those interested in delving deeper into any one topic. I begin with an introduction to the framework and provide a general overview of relevant aspects of SEM. I present some hypothetical examples and also provide a few references to published examples. Readers interested in a more technical introduction may want to read Bollen (1989) or the more recent work by Kaplan (2000). Those who want a very readable conceptual introduction are referred to Kline (2010). New developments have been described in many places, including some edited books such as Marcoulides and Schumacker (2001) and Hancock and Mueller (2006).
M.M. Llabre, Department of Psychology, University of Miami, P.O. Box 24-8185, Coral Gables, FL 33124, USA
2 Model Specification SEM is most useful when the data analysis is based on theory. That is partly because a necessary first step in any SEM analysis is the specification of the model to be tested. A model includes the observed variables to be analyzed, the constructs to be inferred, the unobserved but ever present errors or disturbances, and the ways in which these observed and unobserved variables are related to one another. All statistical analyses are based on some model. For example, when conducting an independent t-test we imply a simple linear model, which we do not bother to specify but which is Y = α + βX + e,
A. Steptoe (ed.), Handbook of Behavioral Medicine, DOI 10.1007/978-0-387-09488-5_57, © Springer Science+Business Media, LLC 2010
where X is a dummy-coded variable taking the values 0 and 1. The parameters in this model include α and β, the structural parameters, as well as the variances of X and e. α is the mean of the group coded 0 and β is the difference between the means of the two groups. The t-test is the test of β, obtained by dividing its sample estimate by its standard error. Multiple regression analysis is based on a more general linear model, which we sometimes present. As we move toward more complex models, there is a greater need to describe them. Because of the multivariate nature of SEM, the models can have multiple equations, and each must be specified.
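The equivalence between the independent t-test and the dummy-coded linear model can be checked numerically. The following is a minimal sketch using only the Python standard library; the function names and the two small groups of scores are hypothetical and serve only as illustration.

```python
import math
from statistics import mean


def dummy_regression(x, y):
    """Ordinary least squares for Y = alpha + beta*X + e with a single predictor.

    Returns (alpha, beta, t), where t is beta divided by its standard error.
    """
    n = len(y)
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    beta = sxy / sxx
    alpha = my - beta * mx
    residuals = [yi - (alpha + beta * xi) for xi, yi in zip(x, y)]
    resid_var = sum(r ** 2 for r in residuals) / (n - 2)
    se_beta = math.sqrt(resid_var / sxx)
    return alpha, beta, beta / se_beta


def pooled_t(group0, group1):
    """Classical independent-samples t statistic with pooled variance."""
    n0, n1 = len(group0), len(group1)
    m0, m1 = mean(group0), mean(group1)
    ss = sum((v - m0) ** 2 for v in group0) + sum((v - m1) ** 2 for v in group1)
    pooled_var = ss / (n0 + n1 - 2)
    return (m1 - m0) / math.sqrt(pooled_var * (1 / n0 + 1 / n1))


# Hypothetical scores for the group coded 0 and the group coded 1.
group0 = [4.0, 5.0, 6.0, 5.0]
group1 = [7.0, 8.0, 6.0, 9.0]
x = [0.0] * len(group0) + [1.0] * len(group1)
y = group0 + group1

alpha, beta, t_regression = dummy_regression(x, y)
t_classical = pooled_t(group0, group1)
print(alpha, beta, t_regression, t_classical)
```

In this sketch the estimated α equals the mean of the group coded 0, the estimated β equals the difference between the two group means, and the two t statistics agree.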
2.1 Notation While this chapter will keep formulas and notation to a minimum, some basic definitions are necessary. We will use X or Y to indicate observed variables, also called indicators. X will be used for exogenous variables, those variables whose causes are not part of our model, and Y for endogenous variables, the variables whose causes are posited by our model. These are the variables we measure, the operational definitions of our constructs. In SEM we can go beyond observed variables and work with latent variables, the constructs of interest to us. A latent variable, like a construct, cannot be observed directly, but is inferred from the covariances among its indicators. For example, we cannot observe depression directly, but diagnose it from responses to a structured interview that cover affective, cognitive, behavioral, and somatic reported symptoms (APA, 1994). We will use F to denote a latent variable, also referred to as a factor. Errors will be identified with the symbols E or D. The distinction lies in the use of D, a disturbance, to indicate the residuals associated with unmeasured variables in a prediction equation not part of a measurement model, while E will be restricted to those equations that relate latent variables to their indicators in a measurement model.
Many find it easier to specify the model with a path diagram rather than with a system of equations. But for many computer programs (AMOS is an exception; Arbuckle, 2003), the diagram must be converted to model equations. We will first consider model specification with a diagram which we will then convert to a set of equations.
2.2 Path Diagram Let us consider an example of a hypothetical, overly simplistic model in behavioral medicine research. Many of us believe that stress influences health because stress activates the sympathetic nervous system (SNS) as well as the immune system, two important pathways linking the mind and the body. Suppose we have collected a measure of perceived stress from a sample of participants on three separate occasions during a 1-month period, as well as physiological measures of stress reactivity in the form of blood pressure reactivity and C-reactive protein (CRP) levels assayed from blood. In this population the relevant indicators of health might be a rating of fatigue, the number of visits to the doctor, and a general rating of health, all in reference to the previous month. At this point we will focus on a few key features in the example to describe the elements of a path diagram. These elements are ovals, rectangles, and arrows. The arrows are either single headed or double headed. Single-headed arrows indicate direct effects called path coefficients (β’s). Double-headed arrows indicate a covariance which, when both heads point to the same variable, is a variance. In Fig. 57.1 we see that stress and health are shown in ovals because they are latent variables, not directly observed. The observed variables are shown in rectangles. Errors and disturbances are also shown in ovals because they, too, cannot be directly observed. The direct effects of errors and disturbances on the variables are always fixed to equal 1. This is necessary to be able to identify the models. Stress is the underlying latent variable manifested in the observed measures of perceived stress. This relationship is depicted by
[Figure 57.1 (diagram not reproduced). Ovals: latent variables Stress and Health; errors E1–E6; disturbances D1–D3. Rectangles: Perceived stress1, Perceived stress2, Perceived stress3, BP Reactivity, CRP, Fatigue, Health rating, Number of visits. The path from each error and disturbance is fixed to 1.]
Fig. 57.1 Path diagram of mediators of stress and health with latent variables
an arrow stemming from the stress latent variable and pointing to each of the three observed measures of perceived stress. Similarly, health is an underlying construct that determines the observed measures of fatigue, number of doctor’s visits, and general rating of health. These relationships between the latent variable and its indicators represent the measurement model aspect of the SEM. Each of the indicators is measured with some error; they are not perfectly valid and/or reliable. The measurement error in each indicator is captured by the ovals with the E labels in each. These measurement errors contain other influences on the measures of either stress or health that may be specific to how or when the measures were taken. The path diagram also indicates the relation between stress and health is not direct, but rather mediated (see Chapter 55) by two different pathways: blood pressure reactivity and CRP. This part of the model represents the structural aspect of SEM. The mediator variables are measured with a single indicator each, which is a weakness in this model. The assumption is that they have been measured with perfect reliability (an assumption we generally make in all analyses that do not incorporate a measurement model). Reactivity, CRP, and the latent variable of health are all endogenous. In other words, the model specifies its predictors, but this prediction
is not perfect; thus a disturbance term is specified for each. The arrows in the path diagram are model parameters to be estimated. We can now translate our diagram into a set of structural equations. We begin with the equations for the measurement model with one equation for each indicator as shown below. For simplicity, we initially work with centered variables and the equations will have intercepts of zero:
$$
\begin{aligned}
X_1 &= \lambda_1 F_1 + E_1, \\
X_2 &= \lambda_2 F_1 + E_2, \\
X_3 &= \lambda_3 F_1 + E_3, \\
X_4 &= \lambda_4 F_2 + E_4, \\
X_5 &= \lambda_5 F_2 + E_5, \\
X_6 &= \lambda_6 F_2 + E_6.
\end{aligned}
\tag{57.1}
$$
In terms of the structural aspect of the model we have three equations corresponding to the three endogenous variables:
$$
\begin{aligned}
Y_1 &= \beta_1 F_1 + D_1, \\
Y_2 &= \beta_2 F_1 + D_2, \\
F_2 &= \beta_3 Y_1 + \beta_4 Y_2 + D_3.
\end{aligned}
$$
Note that the path diagram does not specify double-headed curved arrows among residuals or between residuals and other predictors, thus assuming independence of errors.
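One way to see what the measurement and structural equations assert is to treat them as a recipe for generating data. The sketch below does this with the Python standard library. All numerical values (loadings, path coefficients, standard deviations) are hypothetical choices made for illustration, not estimates from any study, and normally distributed, mutually independent errors and disturbances are assumed.

```python
import random


def simulate_model(n, seed=0):
    """Generate n cases of (X1..X6, Y1, Y2) from the model equations.

    All parameter values below are hypothetical and chosen only for illustration.
    """
    rng = random.Random(seed)
    loadings = [1.0, 0.9, 0.8, 1.0, 0.7, 0.6]   # lambda1..lambda6
    b1, b2, b3, b4 = 0.5, 0.4, -0.3, -0.2       # beta1..beta4
    error_sd = 0.5                              # same SD for E1..E6 (an assumption)
    rows = []
    for _ in range(n):
        f1 = rng.gauss(0.0, 1.0)                      # exogenous latent variable F1
        y1 = b1 * f1 + rng.gauss(0.0, 1.0)            # Y1 = beta1*F1 + D1
        y2 = b2 * f1 + rng.gauss(0.0, 1.0)            # Y2 = beta2*F1 + D2
        f2 = b3 * y1 + b4 * y2 + rng.gauss(0.0, 1.0)  # F2 = beta3*Y1 + beta4*Y2 + D3
        # X1..X3 load on F1, X4..X6 load on F2; each indicator has its own error.
        xs = [
            loadings[i] * (f1 if i < 3 else f2) + rng.gauss(0.0, error_sd)
            for i in range(6)
        ]
        rows.append(xs + [y1, y2])
    return rows


data = simulate_model(1000, seed=42)
print(len(data), len(data[0]))
```

Because every error and disturbance is drawn independently, the simulated data satisfy the independence-of-errors assumption stated above. Only the eight observed variables are returned; F1 and F2 are discarded, mirroring the fact that latent variables are never observed.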
The purpose of the analysis of this model will be twofold. First, we wish to test whether the specified model fits the data well. And simultaneously, we want to estimate the parameters and test them for significance. In particular we wish to estimate the direct effects from stress to the indicators of SNS and CRP and, in turn, the extent to which those indicators predict health. Once we have estimated direct effects, we will also want to quantify the indirect effect for a test of mediation.
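For a linear model with the structure described here, the indirect effect of stress on health through a given mediator is usually quantified as the product of the two path coefficients along that route, and the total indirect effect is the sum across routes. The sketch below illustrates this arithmetic; the function name and coefficient values are hypothetical, and no standard errors are computed.

```python
def indirect_effects(b1, b2, b3, b4):
    """Product-of-coefficients indirect effects for a two-mediator model.

    b1: F1 -> Y1, b2: F1 -> Y2, b3: Y1 -> F2, b4: Y2 -> F2.
    Returns (effect via Y1, effect via Y2, total indirect effect).
    """
    via_y1 = b1 * b3
    via_y2 = b2 * b4
    return via_y1, via_y2, via_y1 + via_y2


# Hypothetical path coefficients, for illustration only.
via_bp, via_crp, total = indirect_effects(0.5, 0.4, -0.3, -0.2)
print(via_bp, via_crp, total)
```

Under complete mediation, the total indirect effect multiplied by the variance of the stress latent variable gives the model-implied covariance between stress and health.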
3 Parameter Estimation and Model Fit The popularity of SEM results from advances in methods of estimation of the structural equation parameters. Karl Jöreskog (see Cudeck et al, 2001, for references to his work and new areas of development) and the LISREL program (Jöreskog and Sörbom, 1996) made this framework accessible to applied researchers. In addition to LISREL other software programs currently available to conduct these analyses include EQS (Bentler, 1995), AMOS (Arbuckle, 2003), Mx (Neale et al, 2002), and Mplus (Muthén and Muthén, 1998–2004). For a description of these and other programs, see Kline (2010). The most common method of estimation used in these programs is maximum likelihood (ML), performed iteratively to arrive at an admissible solution. The idea is as follows: Once a model is specified, such as that in Fig. 57.1, one can make initial guesses at the values of the parameter estimates. In fact, if we had superpowers and knew the population values for those parameters, given our model, we could work backward and tell the values of the variances and covariances of our variables. These would form the model-implied variance–covariance matrix which we will label Σ(θ), where θ denotes the vector of model parameters. For our data with eight indicators it would be an 8 × 8 variance–covariance matrix with variances along the diagonal and covariances in the off-diagonals. For illustration, let us take the third equation in (57.1): $X_3 = \lambda_3 F_1 + E_3$. If we know the values of $\lambda_3$, the variance of $F_1$, and the variance of $E_3$, we could calculate the variance of $X_3$, as implied by the equation because
$$\mathrm{Var}(X_3) = \lambda_3^2 \, \mathrm{Var}(F_1) + \mathrm{Var}(E_3),$$
using covariance algebra. This will hold for other variances and covariances as well. We can then compare the model-implied variance–covariance matrix to the one generated from the data, which we will call Σ, to test the hypothesis that the model fits the data:
$$H_0 : \Sigma = \Sigma(\theta).$$
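The covariance algebra behind a model-implied matrix can be sketched for the simplest case: a variable written as a coefficient times a latent variable plus an independent residual. The helper names and numerical values below are hypothetical; the sketch computes two implied elements and the residual obtained when an implied value is subtracted from a sample value.

```python
def implied_variance(coefficient, var_factor, var_residual):
    """Variance implied by V = coefficient * F + residual, with F and residual independent."""
    return coefficient ** 2 * var_factor + var_residual


def implied_covariance(coefficient_a, coefficient_b, var_factor):
    """Covariance implied for two variables that depend on the same F
    and have mutually independent residuals."""
    return coefficient_a * coefficient_b * var_factor


def residual(sample_value, implied_value):
    """Element of the residual matrix: data-based value minus model-implied value."""
    return sample_value - implied_value


# Hypothetical parameter values, for illustration only.
var_f1 = 1.0
implied_var = implied_variance(0.8, var_f1, 0.25)
implied_cov = implied_covariance(1.0, 0.9, var_f1)
print(implied_var, implied_cov, residual(0.95, implied_var))
```

Repeating these calculations for every pair of observed variables fills in the full model-implied matrix. Estimation programs then adjust the parameter values so that the residuals become as small as the chosen fit function allows.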
One minor problem is that we do not know the values of the population parameters, but have to estimate them from the data. The algorithms used by the computer programs perform the parameter estimation and test of model fit simultaneously and iteratively. They begin with starting values, guesses about the values of these parameters, which are used to generate a model-implied matrix. The model-implied matrix is compared to the data-based matrix to calculate a residual matrix. If the residuals are large, model parameters are modified in an attempt to minimize the residuals. In ML estimation a fit function is used such that the parameter values estimated have the greatest likelihood of having given rise to the sample values obtained, assuming a multivariate normal distribution. For more detailed explanations of this and other methods of estimation, see Bollen (1989). The typical output from a computer analysis will have indices of model fit, as well as the parameter estimates, their standard errors, and z-values used to test them for statistical significance. The primary statistic used in testing the hypothesis of perfect fit is a χ² obtained by multiplying N − 1 by the minimum value of the ML fit function. A nonsignificant χ² is indicative of good model fit, but may be difficult to obtain with large samples because of its direct dependence on sample size. With large samples, even small differences between the two matrices may be picked up as indicative of lack of fit. Several other indices have been developed and
proposed as either alternatives or companions to the χ². The ones that have been recommended and are included in current versions of computer programs are the comparative fit index (CFI; Bentler, 1990), the root mean squared error of approximation (RMSEA; Steiger, 1990), and the standardized root mean square residual (SRMR). As the name indicates, CFI compares the fit of the specified model to a null model which posits no relationships among the variables. Models with values of CFI greater than 0.95 are desirable (Hu and Bentler, 1999). The RMSEA is based on a non-central χ² distribution, the distribution of the test statistic under the alternative hypothesis, and measures the degree of lack of fit of the model per degree of freedom. Values less than 0.06 are considered indicative of models with close fit to the data. A value of 0.08 for any given standardized residual is considered acceptable and so when the average value across all residuals (SRMR) is less than 0.10, the model is considered acceptable. Beyond overall measures of fit, it is important to make sure that parameter estimates make sense in relation to the problem being investigated. For example, variances should all be positive and the signs of effects should be in the expected direction. Examples of SEM are not as common in the behavioral medicine literature as they are in the social sciences. One recent example is work by Bleil et al (2008), who tested a model of cardiac autonomic function, measured by high frequency heart rate variability, predicted by negative emotions. Another example is the work by Weaver et al (2005), who tested a stress and coping model of medication adherence and its relation to viral load in HIV-positive individuals, including a test of an alternative model. Testing of alternative models is uncommon but necessary to strengthen the causal inferences often associated with SEM.
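The CFI and RMSEA described above can both be computed from χ² values and degrees of freedom. The sketch below uses commonly cited textbook formulas; SEM programs differ in details (for example, whether N or N − 1 appears in the RMSEA denominator), so the functions are illustrative assumptions rather than a reproduction of any particular program's output. The input values are hypothetical.

```python
import math


def cfi(chi2_model, df_model, chi2_null, df_null):
    """Comparative fit index from the specified model and the null (independence) model."""
    numerator = max(chi2_model - df_model, 0.0)
    denominator = max(chi2_model - df_model, chi2_null - df_null, 0.0)
    if denominator == 0.0:
        return 1.0  # neither model shows misfit beyond its degrees of freedom
    return 1.0 - numerator / denominator


def rmsea(chi2_model, df_model, n):
    """Root mean squared error of approximation (this version uses N - 1)."""
    return math.sqrt(max(chi2_model - df_model, 0.0) / (df_model * (n - 1)))


# Hypothetical values: model chi-square 12 on 2 df, null chi-square 102 on 6 df, N = 201.
print(cfi(12.0, 2, 102.0, 6), rmsea(12.0, 2, 201))
```

Both indices are built on the excess of χ² over its degrees of freedom, which is why a model whose χ² does not exceed its df obtains a CFI of 1 and an RMSEA of 0 under these formulas.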
Sometimes researchers improperly assume that models that fit the data represent reality, without recognizing there are always multiple alternative models that fit just as well. Models can be rejected but not proven. This is not to suggest that causal inference can never be entertained, but rather to remind readers
of the importance of considering design features such as the use of longitudinal data, instrumental variables, randomization, experimentation, or inclusion of other variables that help dissect confounded relations. Alternative models that are nested can be compared statistically. Nested models are models that have the same variables but where the parameters estimated in one model represent a subset of the parameters estimated in the more general model. When models are nested, the difference between their χ² values is also distributed as χ² and, therefore, the difference can be tested for statistical significance, with degrees of freedom equal to the difference in degrees of freedom between the two models. When the result of the difference test is statistically significant, the more general model with more parameters estimated should be retained. On the other hand, when the result is not significant, the more parsimonious model should be retained. Often, alternative models are not nested and the χ² difference test cannot be used. In that case, there are alternative descriptive indices, such as the Akaike Information Criterion (Akaike, 1974) or the Bayesian Information Criterion (Schwarz, 1978), that take both fit and parsimony into consideration. For both indices, lower values are associated with preferred models.
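The χ² difference test for nested models can be sketched as follows. Because only the standard library is used, the upper-tail χ² probability is computed with a textbook series and continued-fraction evaluation of the incomplete gamma function; all function names are hypothetical. The AIC and BIC helpers use the generic definitions based on the log-likelihood, and SEM programs may report rescaled variants.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with df degrees of freedom."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    log_prefactor = -z + a * math.log(z) - math.lgamma(a)
    if z < a + 1.0:
        # Series for the regularized lower incomplete gamma function.
        term = total = 1.0 / a
        k = a
        for _ in range(1000):
            k += 1.0
            term *= z / k
            total += term
            if abs(term) < abs(total) * 1e-15:
                break
        return 1.0 - total * math.exp(log_prefactor)
    # Continued fraction (modified Lentz method) for the upper incomplete gamma function.
    tiny = 1e-300
    b = z + 1.0 - a
    c = 1.0 / tiny
    d = 1.0 / b
    h = d
    for i in range(1, 1000):
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        d = tiny if abs(d) < tiny else d
        c = b + an / c
        c = tiny if abs(c) < tiny else c
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < 1e-15:
            break
    return h * math.exp(log_prefactor)


def chi2_difference_test(chi2_restricted, df_restricted, chi2_general, df_general):
    """Compare a restricted (more parsimonious) model with the general model it is nested in."""
    diff = chi2_restricted - chi2_general
    df_diff = df_restricted - df_general
    return diff, df_diff, chi2_sf(diff, df_diff)


def aic(log_likelihood, n_parameters):
    return -2.0 * log_likelihood + 2.0 * n_parameters


def bic(log_likelihood, n_parameters, n):
    return -2.0 * log_likelihood + n_parameters * math.log(n)


# Hypothetical chi-square values for two nested models.
diff, df_diff, p_value = chi2_difference_test(15.0, 4, 5.0, 2)
print(diff, df_diff, p_value)
```

A small p-value favors the more general model, whereas a large one favors the more parsimonious model, matching the decision rule given above.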
3.1 Path Analysis If instead of latent variables for measuring stress and health in the path diagram of Fig. 57.1 we had a single indicator for each one, the resulting model would be a path analysis model shown in Fig. 57.2. Path analysis is a special case of SEM where all variables are observed (except for the errors) and assumed to be perfectly reliable. To the extent the reliability assumption is true, the path coefficients from the path model versus the structural model should be the same. But when reliability is not perfect, and the indicators contain measurement error, the path coefficients from the path model will be biased, typically underestimated.
[Figure 57.2 (diagram not reproduced). Rectangles: Perceived stress2, BP Reactivity, Cortisol, Health rating. Disturbances D1, D2, and D3, each with its path fixed to 1.]
Fig. 57.2 Path diagram of mediators of stress and health with single indicators
Path analysis models have the direction of effects going one way only. These are called recursive models. Recursive models are all identified, meaning solutions may be obtained for all of the parameters in the model. When effects go both ways, say X to Y and also Y to X, the models are no longer path analysis models. These models are called nonrecursive. Under certain conditions nonrecursive models can be identified and analyzed in SEM. While the issue of identification will not be covered in this chapter, it is worth pointing out that a necessary condition for identification of any model is that the number of parameters to be estimated be less than or equal to the number of variances and covariances in the data. This will always be true for path analysis models. If our model has, for example, p = 4 variables, there will be p × (p + 1)/2 or 4 × (4 + 1)/2 = 10 unique variances and covariances. This is the information used for model estimation. In our model in Fig. 57.2 there are q = 8 parameters to be estimated. The difference between the number of unique variances and covariances, p × (p + 1)/2, and q is the degrees of freedom (df). When df is 0, the model is just identified and cannot be tested for model fit because it fits the data perfectly, meaning the model-implied variance–covariance matrix perfectly matches the one obtained from the data. These models are also called saturated models. In this case the focus is not on model fit but
rather on estimation of model parameters, their test of significance, and the explanatory power of the model. When df is greater than 0, as is the case in our example, the model is overidentified and model fit can be tested. The 2 df (10 – 8) imply that there are two different ways in which our model could be incorrect. In our example, they come from the fact that we specified complete mediation between stress and health, i.e., there is no direct effect linking those two variables (1 df). Also, the two mediators are not specified to be correlated in this model beyond what results from sharing the common predictor of stress (1 df). There is neither a single-headed nor a double-headed arrow linking those two. If either one of those conditions is true our model will be rejected.
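The counting rule for degrees of freedom can be written out directly. This is a minimal sketch with hypothetical function names; it only reproduces the arithmetic for the path model with four observed variables and eight estimated parameters.

```python
def unique_moments(p):
    """Number of unique variances and covariances among p observed variables."""
    return p * (p + 1) // 2


def degrees_of_freedom(p, q):
    """df = unique variances and covariances minus the q parameters to be estimated."""
    return unique_moments(p) - q


def identification_status(p, q):
    df = degrees_of_freedom(p, q)
    if df < 0:
        return "fails the necessary condition for identification"
    if df == 0:
        return "just identified (saturated)"
    return "overidentified"


# Path model of Fig. 57.2: p = 4 observed variables, q = 8 parameters.
print(unique_moments(4), degrees_of_freedom(4, 8), identification_status(4, 8))
```

Note that the counting rule is only a necessary condition: a model can pass it and still fail to be identified for other reasons.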
3.2 Model Parameters It is worth taking time to consider model parameters in the structural aspect of SEM. The parameters of primary interest are the path coefficients, the direct effects among the variables. Also counted as parameters are the variances of the disturbances and any covariances they may share. These covariances represent variance shared among endogenous variables that arises from sources external to the model. If converted to correlations, these would be partial correlations, controlling for the explanatory variables in the model. The variances and covariances among the exogenous variables are also counted as model parameters.
4 Measurement Model One of the key features of SEM that distinguishes it from other general linear models is the ability to incorporate a measurement model for the constructs of interest while simultaneously analyzing their interrelations. Often when we work with observed variables we forget they are frequently measured with some amount of error. Classical test theory (Crocker and Algina,
1986) reminds us that an observed score is made up of two components, a true score and an error score as depicted below: X = T + E. When we assess the reliability of a measurement procedure, we are estimating the proportion of variance in an observed score that is true score or that is not error. This can be accomplished in several ways, most of which assume parallel measurement. Parallel measures are measures of the same construct with the same metric (units of measurement) and equal error variances. I purposely chose to include the same measure of perceived stress taken at three different times to assess the stress construct in Fig. 57.1 to make this point. These three measures are parallel if the path coefficients (λ, factor loadings) from the stress latent variable to the three indicators all equal 1, and if the three error variances are equal to each other. In that case the retest reliability of the stress measure may be estimated by taking the variance of the stress latent variable and dividing it by the variance of the stress latent variable plus the error variance. This proportion of variance in the total that is explained by the latent variable is reported in most SEM programs: Reliability = Var(F1)/[Var(F1) + Var(E)]. The stress latent variable is defined by the shared variance among the three parallel measures and does not contain random measurement error. The error variance is separate because under the assumption of independence, it does not contribute to the shared variance. Thus, when we estimate the effect of stress on, say blood pressure reactivity, the stress latent variable does not introduce random noise to that estimation. It follows that anytime a researcher can work with a latent variable based on multiple indicators, she/he has the advantage of eliminating a source of error and bias. One way to improve upon our model in Fig. 57.1 would be to add multiple measures of blood pressure reactivity as well as multiple measures of CRP.
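The reliability formula, and the bias that measurement error introduces when an error-laden observed score is used as a predictor in place of the latent variable, can be illustrated with a short calculation. The sketch assumes a single observed predictor X = F + E (loading fixed to 1) with error independent of everything else; the function names and numerical values are hypothetical.

```python
def reliability(var_true, var_error):
    """Proportion of observed-score variance that is true-score variance."""
    return var_true / (var_true + var_error)


def attenuated_slope(true_slope, var_true, var_error):
    """Slope obtained when Y is regressed on X = F + E instead of on F.

    With E independent of F and of the outcome's disturbance,
    Cov(X, Y) / Var(X) = true_slope * Var(F) / (Var(F) + Var(E)).
    """
    return true_slope * reliability(var_true, var_error)


# Hypothetical values: latent variance 4, error variance 1, true effect 0.5.
print(reliability(4.0, 1.0), attenuated_slope(0.5, 4.0, 1.0))
```

With a reliability of 0.8, the path coefficient from the observed-variable analysis is 80% of the coefficient defined on the latent variable, which is the underestimation that a measurement model is meant to avoid.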
In SEM this measurement model is not restricted to multiple parallel measures. The SEM measurement model is more general. It is called a congeneric model. In a congeneric model the indicators are assumed to reflect a unidimensional latent variable. But they do not need to have the same metric nor equal error variances. The measurement model of health represents this type of congeneric model. Its three indicators reflect different aspects of health (or lack thereof) quantified in different ways with varying amounts of measurement error. The health latent variable still represents the shared variance among the three indicators. Because this latent variable is not an observed quantity it does not have a metric of its own. Unlike the situation with parallel measurement where the metric is constant across the indicators and is applied to the latent variable, in a congeneric model the researcher decides which of the indicators will contribute its metric in order to identify the latent variable. In the absence of either a gold standard or a most reliable indicator, this decision is arbitrary. But although it will influence the parameter estimates that are metric based, in most situations it does not influence model fit. This metric assignment is done by fixing the loading for the selected indicator to a value of 1, instead of estimating it. As is the interpretation of a regression coefficient, the value 1 implies a change of 1 unit in X4 for every change of 1 unit in F2, a one-to-one correspondence:
X4 = 1F2 + E4. The loadings for the other indicators are freely estimated. An alternative way to assign a metric to a latent variable is to standardize it by giving the latent variable a variance of 1. SEM measurement models are also more common in psychology than in behavioral medicine. Shen and colleagues (2006) used a second-order model to examine the structure of the metabolic syndrome. My colleagues and I (Llabre et al, 2006) also used a second-order model to separate estimates of reliability and validity in measures of medication adherence.
These examples show that measurement models are relevant to behavioral medicine.
4.1 Measurement Model Parameters
In strictly measurement models, the parameters to be estimated are the factor loadings (except for the indicator which sets the metric), the measurement error variances, the variances of the latent variables, and the covariances among latent variables when there is more than one. Sometimes, some error covariances are also estimated. This happens when some of the measures used to reflect a latent variable share method variance.
4.2 Formative Indicators
The measurement model previously described assumes that the latent variable is responsible for the indicators; thus the arrows flow from the latent variable to its indicators. In this case the indicators are “reflective” because they represent a reflection of the underlying latent variable. In this type of model one expects the correlations among the indicators to be moderate to high, as they share an underlying cause. What is important is their commonality and not their uniqueness. In fact, their uniqueness is separated into the error term. The loss of any one indicator, when there are many, is not catastrophic, as they are considered interchangeable. This type of measurement model does not fit all situations encountered in behavioral medicine research. For example, there has been emerging interest in quantifying childhood socioeconomic status (SES), as it has been shown to find its way “under the skin” (Miller et al, 2009). SES is commonly defined in terms of measures of parent education, income, and occupation, but each of these is considered a critical piece of the construct. They in combination define SES. Similarly, checklists of life events are used to define what is considered life stress. These items are not interchangeable, and they are not always correlated. Individuals who experienced the death of a parent are not necessarily those who recently changed jobs or whose spouse was incarcerated. In this case there is no underlying latent variable that generates the indicators; instead, the indicators define the latent variable. This type of measurement model is depicted by arrows pointing from the indicators to the latent variable and is called formative measurement. Formative measurement models are more difficult to work with because they can only be estimated when the latent variable is used to predict some subsequent outcome, and different outcomes can alter the meaning of the latent variable (see Bollen and Lennox, 1991; Howell et al, 2007).
5 Mean Structures
So far we have been concerned with models that focus on relationships among variables and have made the assumption that our variables were centered, meaning we have subtracted the mean from the original variable to simplify our equations and eliminate the intercept. This is appropriate when analyzing covariances or correlations which are invariant when adding or subtracting constants. However, centering can be limiting because there are many important research questions in behavioral medicine that require retaining information about the means. For instance, in randomized clinical trials the focus is often a comparison of two group means. Questions related to health disparities often require testing hypotheses about group means. Longitudinal studies of health outcomes also often focus on changes in mean levels over time. Therefore, if the SEM latent variable framework is to be useful in those situations, we need to consider means. When we work with mean structures, our equations include intercepts. You may recall when learning about regression that the intercept was associated with a vector of 1’s in the data. This is because when we regress a variable, say Y, on the constant 1, the regression coefficient
57
Structural Equation Modeling in Behavioral Medicine Research
is the mean of Y. Also, if we regress a variable Y on a constant 1 and another predictor, say X, the regression coefficient for the constant is the intercept. These concepts apply in SEM. In a path diagram we indicate the inclusion of means and/or intercepts by specifying a triangle with the number 1 in it. This triangle represents a constant which will be useful in estimating means and intercepts. The arrows going from the triangle to a variable represent either a mean or an intercept, depending on whether there are other predictors also going to the variable. As I mentioned earlier, mean structures are useful when comparing multiple groups or examining change in a group over time. In a subsequent section, I will introduce latent growth models, which are relevant for studying change over time.
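The link between the constant 1 and means or intercepts can be checked numerically. The following is a minimal sketch in plain Python with made-up data; the function names are illustrative assumptions, not part of any SEM package.

```python
# Regressing Y on a constant recovers the mean of Y; adding a predictor X
# turns the coefficient on the constant into the intercept. Data are made up.

def ols_constant_only(y):
    """Least-squares coefficient when y is regressed on a column of 1s."""
    ones = [1.0] * len(y)
    # b = (1'1)^-1 1'y, which reduces to sum(y) / n
    return sum(o * v for o, v in zip(ones, y)) / sum(o * o for o in ones)

def ols_constant_and_x(x, y):
    """Return (intercept, slope) from regressing y on a constant and x."""
    n = len(y)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

y = [12.0, 15.0, 9.0, 20.0, 14.0]
x = [1.0, 2.0, 0.0, 4.0, 3.0]

mean_y = ols_constant_only(y)               # coefficient on the constant = mean
intercept, slope = ols_constant_and_x(x, y)  # coefficient on the constant = intercept
print(mean_y, intercept, slope)
```

In a path diagram, the first case corresponds to an arrow from the triangle to a variable with no other predictors, and the second to an arrow from the triangle to a variable that also receives an arrow from X.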
6 Multiple Groups

SEM can be applied to a single group or to multiple groups. When analyzing multiple groups, all parameters in a model (i.e., means, variances, and path coefficients) may be compared between groups. The general approach is to compare two models: one in which the parameter or parameters of interest are specified to be equal between groups (we say they are "constrained equal"), and another that allows the parameters to vary between groups. Each model is associated with a χ² test of model fit. Importantly, these models are nested: the model with constrained parameters is nested within the more general model, so the difference between the two χ² values can itself be tested. In this fashion, groups can be compared on many different dimensions without restrictive assumptions. For instance, recall that in the analysis of variance, when we compare group means, we assume their variances are equal. With multiple group SEM we can compare means in the presence of unequal variances.
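Because the constrained and unconstrained models are nested, they are commonly compared with a χ² difference test: the difference in χ² values is referred to a χ² distribution whose degrees of freedom equal the difference in model degrees of freedom. The sketch below uses only the Python standard library; the fit statistics are invented for illustration, and the helper `chi2_sf` is a small hand-written routine for integer degrees of freedom, not a library function.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variate with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) times a finite sum of (x/2)^k / k!
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    # Odd df: start from the df = 1 tail area and add correction terms
    tail = math.erfc(math.sqrt(half))
    term = math.sqrt(2.0 * x / math.pi) * math.exp(-half)
    total = 0.0
    for j in range((df - 1) // 2):
        total += term
        term *= x / (2 * j + 3)
    return tail + total

def chi_square_difference(chi2_constrained, df_constrained, chi2_free, df_free):
    """Compare a constrained model with the more general model it is nested in."""
    diff = chi2_constrained - chi2_free
    ddf = df_constrained - df_free
    return diff, ddf, chi2_sf(diff, ddf)

# Hypothetical fit results: two parameters constrained equal across groups
diff, ddf, p_value = chi_square_difference(58.3, 26, 49.1, 24)
print(round(diff, 2), ddf, round(p_value, 4))
```

A small p-value suggests that the equality constraints worsen fit, that is, that the groups differ on at least one of the constrained parameters.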
7 Latent Growth Model

Latent growth models (LGMs) are special cases of SEM applied to longitudinal data where the interest is in the estimation of parameters of change over time (Duncan et al, 1999). They are closely related to mixed models or multilevel models of longitudinal data (see Chou et al, 1998; MacCallum et al, 1996, for a comparison of approaches). The idea is that there is an underlying latent process of change responsible for the data, and the goal is to capture the parameters of the change process. Requirements for LGM are data from at least three time points with interval scale measurement using the same metric at all time points. The researcher specifies a general functional form for the growth over time. This function could be linear or nonlinear, but the form is limited by the number of time points available. With three time points we are restricted to a linear function with an intercept and a slope, a common form in many studies. Parameters of interest are the average intercept and slope, as well as the variability in individuals’ intercepts and slopes.

LGM has been applied to the investigation of cardiovascular reactivity and recovery from stress (Llabre et al, 2001), where a piecewise function was used to model reactivity and recovery separately but simultaneously. Reactivity was modeled with a linear function, while the recovery curve was quadratic. In another application (Llabre et al, 2004), we illustrated how LGM could be used to compare cardiovascular recovery across stressors and across groups.

For a brief example, let us examine a path diagram of an LGM from a hypothetical model of change in Beck Depression Inventory scores over a 6-month interval in cancer patients, starting right after a diagnosis (Time 0) and repeated 3 (Time 3) and 6 months (Time 6) later, as shown in Fig. 57.3 with solid lines. The latent variables in this model are the characteristics of the hypothesized linear change process for the given time interval. Any line is characterized by an intercept, labeled Baseline, and a slope, labeled Change.
What makes this measurement model an LGM is that the loadings linking the latent variables to the indicators are not estimated, but rather are used to specify the time structure of the data in months. This is also conveyed in the equations shown below the figure.

Fig. 57.3 Latent growth model path diagram of linear change. [Path diagram: the observed variables BDI@0, BDI@3, and BDI@6, each with an error term (e1, e2, e3), are indicators of the latent variables Baseline (loadings fixed at 1) and Change (loadings fixed at 0, 3, and 6, the time structure). A constant 1 points to both latent variables, and a Dummy variable points to Change with dashed lines.]

$$
\begin{aligned}
\mathrm{BDI@0} &= 1\,\mathrm{Baseline} + t_0\,\mathrm{Change} + e_1\\
\mathrm{BDI@3} &= 1\,\mathrm{Baseline} + t_3\,\mathrm{Change} + e_2\\
\mathrm{BDI@6} &= 1\,\mathrm{Baseline} + t_6\,\mathrm{Change} + e_3
\end{aligned}
$$

Here the loadings $t_0 = 0$, $t_3 = 3$, and $t_6 = 6$ give the time of each measurement in months, as in the figure.

The parameters of interest are the means of the Baseline and Change latent variables, indicated by the paths from the constant 1; the variances of the latent variables, quantifying individual differences in the trajectory of change; the covariance between Baseline and Change; and the error variances, which are often assumed to be equal across time. If we wanted to compare the change in depression between participants randomized to an intervention designed to reduce symptoms of depression and control participants, we could add a dummy coded (0, 1) indicator for this group classification with an arrow pointing to the Change latent variable (see dashed lines). The estimate of this added parameter represents the difference between the means of the two groups on the Change latent variable. In this revised model, the path from the constant to the Change latent variable is now the mean slope for the control group.

LGM is very flexible and can be embedded in more complex SEM models. With a little imagination, you can envision that LGM variables can be used not only as outcome variables but also as predictors of other outcomes. So, for example, one can investigate whether the change in depression might be associated with changes in inflammation or with other markers of disease. These models are ideally suited to test the types of mechanism hypotheses generated in many behavioral medicine laboratories.
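To make the role of the fixed loadings concrete, the following minimal sketch computes the means and variances of the BDI scores implied by a linear LGM with time structure 0, 3, and 6 months. All parameter values are invented for illustration, the variable names are assumptions, and the model assumes (as in the figure) that the dummy variable affects only Change.

```python
# Hypothetical parameter values for the linear LGM of BDI scores.
times = [0, 3, 6]              # fixed loadings on Change (months)
mean_baseline = 18.0           # mean of the Baseline latent variable
mean_change_control = -0.5     # mean slope per month when dummy = 0
dummy_effect = -1.0            # difference in mean slope, dummy = 1 vs dummy = 0

# Hypothetical (co)variances, taken here as within-group values
var_baseline, var_change, cov_bc, var_error = 25.0, 0.36, -1.2, 9.0

def implied_mean(t, group):
    """Model-implied mean BDI at month t for group 0 (control) or 1 (intervention)."""
    slope = mean_change_control + dummy_effect * group
    return 1 * mean_baseline + t * slope

def implied_variance(t):
    """Model-implied variance of BDI at month t (error variance equal over time)."""
    return var_baseline + t ** 2 * var_change + 2 * t * cov_bc + var_error

for t in times:
    print(t, implied_mean(t, 0), implied_mean(t, 1), round(implied_variance(t), 2))
```

The loadings of 1 on Baseline carry the intercept into every occasion, while the loadings 0, 3, and 6 on Change scale the slope by elapsed time, which is why these loadings are fixed rather than estimated.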
7.1 Latent Difference Scores

A related set of models for examining change that is free of measurement error comes from work by McArdle and colleagues (Hamagami and McArdle, 2000; McArdle and Hamagami, 2000) and is generally referred to as latent difference scores. In their framework, the change process is segmented into change scores, which take advantage of multiple repeated measurements in order to separate change from error. These latent difference scores can be influenced by an overall change process (Constant change), as assumed in LGM, and also by the preceding level of the variable (Proportional change). These models are particularly useful for studying reciprocal influences in multiple change processes. King and colleagues (2006) have made this methodology quite accessible to readers in their application to trauma recovery.
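A minimal numerical sketch of the constant and proportional change components, assuming a simplified, error-free version in which each latent difference equals a constant change plus a proportion of the preceding level. The function name, parameter names, and values are illustrative assumptions and are not taken from the cited papers.

```python
def latent_difference_trajectory(initial_level, constant_change, proportion, n_waves):
    """Build a trajectory from latent difference scores.

    Each difference is: constant_change + proportion * previous level.
    """
    levels = [initial_level]
    differences = []
    for _ in range(n_waves - 1):
        delta = constant_change + proportion * levels[-1]
        differences.append(delta)
        levels.append(levels[-1] + delta)
    return levels, differences

# Hypothetical values: symptoms start at 20, a constant upward push of 2 per
# wave is offset by a proportional decline of 20% of the preceding level.
levels, differences = latent_difference_trajectory(20.0, 2.0, -0.2, 4)
print([round(v, 2) for v in levels])
print([round(v, 2) for v in differences])
```

With a negative proportional coefficient the changes shrink as the level falls, so the trajectory bends toward a plateau instead of following a straight line, which is one way this framework departs from a linear LGM.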
8 Missing Data

When working in the SEM framework with ML it is possible to take advantage of its full information capabilities to include all of the available data. Often referred to as full information maximum likelihood (FIML), this approach has been shown to yield unbiased estimates of group parameters when missingness (whether data are missing or not) is related to variables that are accessible for analysis (Little and Rubin, 2002; Schafer and Graham, 2002). This condition or assumption, sometimes confusing because it is called missing at random (MAR), implies that once the variables that predict missingness are controlled in the analysis, the remaining mechanism responsible for the missingness is a random process. This approach is superior to older deletion or imputation approaches such as listwise or pairwise deletion, or mean, regression, or hot-deck imputation, and comparable to multiple imputation (Collins et al, 2001). The older methods are less powerful and produce biased parameter estimates unless the missingness is the result of a completely random process, referred to as missing completely at random (MCAR). MCAR is a stricter assumption than MAR, particularly in longitudinal studies, where attrition can often be predicted from variables collected at baseline, such as disease severity. For a clear explanation of this and other modern methods, see Enders (2006).
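The difference between deleting incomplete cases and using the variables that predict missingness can be illustrated with a toy example. The sketch below is not FIML itself; it assumes a two-variable dropout pattern in which baseline severity is observed for everyone, and it uses the regression-adjusted mean, which is the form a normal-theory likelihood analysis takes in this simple case. All data are invented.

```python
# Hypothetical data: baseline severity x is observed for everyone; the
# follow-up score y is missing for the most severe patients (x >= 6).
x_all = [1, 2, 3, 4, 5, 6, 7, 8]
y_all = [12.5, 13.5, 16.5, 17.5, 20.5, None, None, None]

pairs = [(x, y) for x, y in zip(x_all, y_all) if y is not None]
n_cc = len(pairs)
mean_x_cc = sum(x for x, _ in pairs) / n_cc
mean_y_cc = sum(y for _, y in pairs) / n_cc   # listwise-deletion estimate

# Regression of y on x among the complete cases
sxy = sum((x - mean_x_cc) * (y - mean_y_cc) for x, y in pairs)
sxx = sum((x - mean_x_cc) ** 2 for x, _ in pairs)
slope = sxy / sxx

# Adjust the completers' mean using everyone's baseline values
mean_x_all = sum(x_all) / len(x_all)
mean_y_adjusted = mean_y_cc + slope * (mean_x_all - mean_x_cc)

print(round(mean_y_cc, 2), round(mean_y_adjusted, 2))
```

Because the patients who drop out are the more severe ones, the complete-case mean understates the follow-up mean, whereas the adjusted estimate borrows information from baseline severity, the variable that predicts missingness.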
9 Sample Size and Power

The appropriate sample size in an SEM analysis must be considered keeping in mind several issues, including model complexity, estimation method, and statistical power. With samples of fewer than 100 participants, the models must be simple and the variables normally distributed; otherwise the researcher will likely find problems with convergence. As a general rule, more complex models or non-normal data will require more participants because more parameters will have to be estimated. Kline (2010) provides some rules for classifying studies into small (n < 100), medium (n between 100 and 200), and large (n > 200). But given all of the factors that bear on the question of sample size, these general rules may not be relevant for a given study.

There are two power considerations in SEM: the power associated with the test of the overall model and the power associated with the test of each parameter or set of parameters in the model. Power analyses can be performed in the design phase of a study to determine the appropriate sample size, or after the analyses to determine whether the study was sufficiently powered for a given effect size. MacCallum et al (1996) provided a useful approach to power determination for the overall model based on the RMSEA. Hancock (2006) shows how to calculate power for individual parameters or sets of parameters. Muthén and Muthén (2002) illustrate the use of Monte Carlo simulation in power estimation in SEM.
10 Categorical Outcomes

As stated earlier, ML estimation assumes continuous variables and multivariate normality. Often in behavioral medicine, variables of interest are dichotomous (have a disease or not), represent count variables, or have a preponderance of zeros. Various strategies are available for working with categorical data and non-normal distributions (Finney and DiStefano, 2006). These include various types of adjustments to the χ² test and the standard errors of the parameters; robust weighted least squares methods, including mean and/or variance adjusted weighted least squares (WLSMV; Muthén, 1984); and bootstrapping the standard errors. An important consideration in determining the appropriate method is whether the underlying variable is truly continuous, but the measures available cannot make fine discriminations, as opposed to variables that are truly categorical. One available program, Mplus, is particularly useful for these situations. Using the Mplus framework, researchers can incorporate non-normal and/or categorical outcomes within more comprehensive SEM models and conduct discrete time survival analyses (Muthén and Masyn, 2005) while accounting for measurement error (Masyn, 2008). Advances in this area will make SEM more relevant to current problems in behavioral medicine.
11 Latent Class and Mixture and Multilevel Models

Up to now we have been considering models that apply to single populations. However, it is possible to consider situations where our participants come from different subpopulations not previously identified and to use SEM to identify these mixtures of populations. In this context the mixtures or subpopulations represent latent classes, identified with a categorical latent variable. In latent class analysis the latent classes are determined on the basis of the associations among categorical outcomes. But the latent classes could be subgroups that, for example, have different trajectories in LGM, different factor structures in measurement models, or different path models (Muthén, 2001). When these subpopulations are known ahead of time, their models may be compared with multiple group SEM as described earlier. It is when the classes are unknown that the researcher can employ these more exploratory methods to determine the number of classes (Nylund et al, 2007) and the probability associated with belonging to a given class, and to estimate the parameters within classes. Jung and Wickrama (2008) present a step-by-step illustration of a growth mixture model analysis; Lubke et al (2007) also provide an instructive application of this methodology.

Given that the factors that affect health occur at multiple levels (for example, the cell, the person, the family, and the community), some of the models of health or disease will benefit from a multilevel data structure. Multilevel models are considered in a separate chapter. However, it is worth mentioning that multilevel models can be specified within an SEM framework (Heck, 2001; Mehta and Neale, 2005), with different influences specified at different levels.
12 Concluding Comments

In this chapter I have attempted to provide an overview of the principles involved in using an SEM framework, as well as some introductory comments on more advanced topics. Adopting this framework when designing studies and analyzing data has several advantages, including control for measurement error, the examination and testing of mechanisms, and the quantification of change. SEM allows the researcher to match the analyses to the complex questions of interest in behavioral medicine research.
References

Akaike, H. (1974). A new look at the statistical model identification. IEEE Trans Automat Contr, 19, 716–723.
American Psychiatric Association (1994). Diagnostic and statistical manual of mental disorders, 4th Ed (DSM-IV). Washington, DC: Author.
Arbuckle, J. L. (2003). Amos 5. Chicago: SmallWaters.
Bentler, P. M. (1990). Comparative fit indexes in structural models. Psychol Bull, 107, 238–246.
Bentler, P. M. (1995). EQS Structural Equations Program Manual. Encino, CA: Multivariate Software.
Bleil, M. E., Gianaros, P. J., Jennings, J. R., Flory, J. D., and Manuck, S. B. (2008). Trait negative affect: toward an integrated model of understanding psychological risk for impairment in cardiac autonomic function. Psychosom Med, 70, 328–337.
Bollen, K. (1989). Structural Equations with Latent Variables. New York: Wiley and sons.
Bollen, K., and Lennox, R. (1991). Conventional wisdom on measurement: a structural equation perspective. Psychol Bull, 110, 305–314.
Chou, C., Bentler, P. M., and Pentz, M. A. (1998). Comparisons of two statistical approaches to study growth curves: the multilevel model and the latent curve analysis. Struct Equation Model, 5, 247–266.
Collins, L. M., Schafer, J. L., and Kam, C. M. (2001). A comparison of inclusive and restrictive strategies in modern missing data procedures. Psychol Methods, 6, 330–351.
Crocker, L., and Algina, J. (1986). Introduction to Classical and Modern Test Theory. New York: Holt, Rinehart and Winston.
Cudeck, R., Du Toit, S., and Sorbom, D. (Eds.). (2001). Structural Equation Modeling: Present and Future. Chicago: Scientific Software International.
Duncan, T. E., Duncan, S. C., Strycker, A. L., Li, F., and Alpert, A. (1999). An Introduction to Latent Variable Growth Curve Modeling: Concepts, Issues, and Applications. Mahwah, NJ: Erlbaum.
Enders, C. K. (2006). Analyzing structural equation models with missing data. In G. R. Hancock & R. O. Mueller (Eds.), Structural Equation Modeling: A Second Course (pp. 313–344). Greenwich, CT: Information Age Publishing.
Finney, S. J., and DiStefano, C. (2006). Nonnormal and categorical data in structural equation modeling. In G. R. Hancock & R. O. Mueller (Eds.), Structural Equation Modeling: A Second Course (pp. 269–312). Greenwich, CT: Information Age Publishing.
Hamagami, F., and McArdle, J. J. (2000). Advanced studies of individual differences linear dynamic models for longitudinal data analysis. In G. Marcoulides & R. Schumaker (Eds.), New Developments and Techniques in Structural Equations Modeling (pp. 203–246). Mahwah, NJ: Erlbaum.
Hancock, G. R. (2006). Power analysis in covariance structure modeling. In G. R. Hancock & R. O. Mueller (Eds.), Structural Equation Modeling: A Second Course (pp. 69–118). Greenwich, CT: Information Age Publishing.
Hancock, G. R., and Mueller, R. O. (2006). Structural Equation Modeling: A Second Course. Greenwich, CT: Information Age Publishing.
Heck, R. H. (2001). Multilevel modeling with SEM. In G. A. Marcoulides & R. E. Schumaker (Eds.), New Developments and Techniques in Structural Equation Modeling (pp. 89–128). Mahwah, NJ: Erlbaum.
Howell, R. D., Breivik, E., and Wilcox, J. B. (2007). Reconsidering formative measurement. Psychol Methods, 12, 205–218. Plus comments and reply.
Hu, L., and Bentler, P. M. (1999). Cutoff criteria for fit indices in covariance structure analysis: conventional criteria vs new alternatives. Struct Equ Modeling, 6, 1–55.
Joreskog, K. G., and Sorbom, D. (1996). LISREL 8: User’s Reference Guide. Chicago: Scientific Software International.
Jung, T., and Wickrama, K. A. S. (2008). An introduction to latent class growth analysis and growth mixture modeling. Soc Personal Psychol Compass, 2, 302–317.
Kaplan, D. (2000). Structural Equation Modeling: Foundations And Extensions. Thousand Oaks: Sage.
King, L. A., King, D. W., McArdle, J. J., Doron-LaMarca, S., and Orazem, R. J. (2006). Latent difference score approach to longitudinal trauma research. J Traumatic Stress, 19, 771–785.
Kline, R. (2010). Principles and Practice of Structural Equation Modeling, 2nd Ed. New York: Guilford Press.
Little, R. J., and Rubin, D. B. (2002). Statistical Analysis with Missing Data, 2nd Ed. Hoboken, NJ: Wiley.
Llabre, M. M., Spitzer, S. B., Saab, P. G., and Schneiderman, N. (2001). Piecewise latent growth curve modeling of systolic blood pressure reactivity and recovery from the cold pressor test. Psychophysiology, 38, 951–960.
Llabre, M. M., Spitzer, S., Siegel, S., Saab, P. G., and Schneiderman, N. (2004). Applying latent growth curve modeling to the investigation of individual differences in cardiovascular recovery from stress. Psychosom Med, 66, 29–41.
Llabre, M. M., Weaver, K., Duran, R., Antoni, M., McPhearson-Baker, S., and Schneiderman, N. (2006). A measurement model of medication adherence to highly active antiretroviral therapy and its relation to viral load in HIV+ adults. AIDS Patient Care STDS, 20, 701–711.
Lubke, G., Muthén, B., Moilanen, I. K. et al (2007). Subtypes versus severity differences in attention-deficit/hyperactivity disorder in the Northern Finnish birth cohort. J Acad Child Adolesc Psychiat, 46, 1584–1593.
MacCallum, R. C., Browne, M. W., and Sugawara, H. M. (1996). Power analysis and determination of sample size for covariance structure modeling. Psychol Methods, 1, 130–149.
Marcoulides, G. A., and Schumaker, R. E. (2001). New Developments and Techniques in Structural Equation Modeling. Mahwah, NJ: LEA.
Masyn, K. E. (2008). Modeling measurement error in event occurrence for single, non-recurring events in discrete-time survival analysis. In G. R. Hancock & K. M. Samuelsen (Eds.), Advances in Latent Variable Mixture Models (pp. 105–145). Charlotte, NC: Information Age Publishing, Inc.
McArdle, J. J., and Hamagami, F. (2000). Linear dynamic analysis of incomplete longitudinal data. In L. Collins & A. Sayer (Eds.), New Methods for the Analysis of Change (pp. 139–175). Washington, DC: APA.
Mehta, P., and Neale, M. (2005). People are variables too: Multilevel structural equations modeling. Psychol Methods, 10, 259–284.
Miller, G. E., Chen, E., Fok, A. K., Walker, H., Lim, A. et al (2009). Low early-life social class leaves a biological residue manifested by decreased glucocorticoid and increased proinflammatory signaling. Proc Natl Acad Sci USA, 106, 14716–14721.
Muthén, B. (1984). A general structural equation model with dichotomous, ordered categorical, and continuous latent variable indicators. Psychometrika, 49, 115–132.
Muthén, B. (2001). Second-generation structural equation modeling with a combination of categorical and continuous latent variables: new opportunities for latent class/latent growth modeling. In L. M. Collins & A. Sayer (Eds.), New Methods for the Analysis of Change (pp. 291–322). Washington, DC: APA.
Muthén, B., and Masyn, K. (2005). Discrete time survival mixture analysis. J Educ Behav Stats, 30, 27–28.
Muthén, L., and Muthén, B. (1998–2004). Mplus (version 5.1). Los Angeles: Muthén and Muthén.
Muthén, L. K., and Muthén, B. O. (2002). How to use a Monte Carlo study to decide on sample size and determine power. Struct Equ Modeling, 4, 599–620.
Neale, M. C., Boker, S. M., Xie, G., and Maes, H. H. (2002). Mx: Statistical Modeling, 6th Ed. Richmond: Virginia Commonwealth University.
Nylund, K. L., Asparouhov, T., and Muthén, B. (2007). Deciding on the number of classes in latent class analysis and growth mixture modeling. A Monte Carlo simulation study. Struct Equ Modeling, 14, 535–569.
Schafer, J. L., and Graham, J. W. (2002). Missing data: our view of the state of the art. Psychol Methods, 7, 147–177.
Schwarz, G. E. (1978). Estimating the dimension of a model. Ann. Statist, 6, 461–464.
Shen, B. J., Goldberg, R. B., Llabre, M. M., and Schneiderman, N. (2006). Is the factor structure of the metabolic syndrome comparable between men and women and across three ethnic groups: The Miami Community Health Study. Ann Epidemiol, 16, 131–137.
Steiger, J. H. (1990). Structural model evaluation and modification: an interval estimation approach. Multivar Behav Res, 25, 173–180.
Weaver, K. E., Llabre, M. M., Durán, R. E., Antoni, M. H., Ironson, G. et al (2005). A stress and coping model of medication adherence and viral load in HIV+ men and women on highly active antiretroviral therapy (HAART). Health Psychol, 24, 385–392.
Chapter 58
Meta-analysis Larry V. Hedges and Elizabeth Tipton
1 Introduction

The research literature in behavioral medicine, like that in other areas of medicine, is experiencing dramatic growth. This expansion has made it essential to conduct systematic reviews of research that can organize and synthesize findings. The fundamental statistical tool in systematic reviews of research is meta-analysis, which represents the results of research studies (such as clinical trials) by numerical indices of effect size and then summarizes these results across studies by using statistical procedures. There are many non-statistical aspects of carrying out systematic reviews in any area (see Cooper et al, 2009; Counsell, 1997; Meade and Richardson, 1997). This chapter, however, provides an introduction to meta-analysis for clinical trials in behavioral medicine. For a more complete introduction to meta-analysis we recommend Borenstein et al (2009) and Cooper et al (2009).
L.V. Hedges, Department of Statistics, Northwestern University, 2046 Sheridan Road, Evanston, IL 60208, USA, e-mail: [email protected]

2 Effect Sizes

Effect sizes are numerical indices of study results. They are selected to represent the results of a study in a manner that will be comparable across studies. Depending on the design of the study, different effect sizes may be more natural, and often more than one effect size index might be chosen. In this chapter we will focus on studies that compare a treated group with a control group (as in most randomized controlled trials). When the outcome is measured as a discrete variable (e.g., alive or not), the natural effect sizes are the risk ratio or the odds ratio (although the risk difference is sometimes used as well). When the outcome is measured as a continuous variable (such as a cognitive test score or a subjective rating of pain), but not measured on exactly the same scale in every study, a natural measure of effect size is the standardized mean difference (sometimes called Cohen’s d). Finally, when both the independent variable and the outcome are continuous variables, a natural measure of the effect size is the Pearson correlation coefficient ρ.

The effect sizes usually used in meta-analysis have the property that they are approximately normally distributed with standard errors that are largely a function of the sample size in the study and can be computed from analytical formulas. In this section we describe several effect size indices and show how to compute their sampling variances (the square of their standard errors). The (sample) effect size (estimates) and their variances are the basic inputs required from each study in the meta-analysis. All of the effect size estimates that we describe in this section have approximately normal sampling distributions. For each of these, we can therefore construct 95% confidence intervals of the following form:

$$T - 1.96\sqrt{v} \le \theta \le T + 1.96\sqrt{v},$$
A. Steptoe (ed.), Handbook of Behavioral Medicine, DOI 10.1007/978-0-387-09488-5_58, © Springer Science+Business Media, LLC 2010
where $\theta$ is the parameter of interest, $T$ the sample estimate, and $\sqrt{v}$ the standard error of the estimate.
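As a small illustration of this interval, the sketch below computes the 95% confidence limits from an effect size estimate and its sampling variance. The function name and the input values are hypothetical.

```python
import math

def confidence_interval(estimate, variance, z_crit=1.96):
    """Normal-theory confidence interval: estimate +/- z_crit * sqrt(variance)."""
    half_width = z_crit * math.sqrt(variance)
    return estimate - half_width, estimate + half_width

# Hypothetical study: effect size estimate T = 0.40 with variance v = 0.04
lower, upper = confidence_interval(0.40, 0.04)
print(round(lower, 3), round(upper, 3))
```

Note that the function takes the variance and not the standard error, matching the convention in this section that each study contributes an estimate and its variance.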
2.1 Studies Measuring Outcomes on a Binary Scale

Suppose that each study measures the outcome on a binary scale (such as survived to 6 months or not), with one of those two outcomes selected as a target outcome. Let $\pi^T$ and $\pi^C$ be the underlying parameters describing the proportion of individuals in the treatment and control groups, respectively, that experience the target outcome. One might describe these proportions as the “risks” of the target outcome in the treatment and control groups. A treatment effect (and therefore an effect size index) can be defined in one of three ways. The simplest but least statistically satisfactory is the risk difference:

$$\pi^T - \pi^C.$$

Although it is simple, the risk difference has the undesirable property that its range is limited by the baseline risk (the risk in the control group); for example, if $\pi^C = 0.05$, the risk difference can be no smaller than −0.05, even if the treatment reduces the risk to 0. There are also technical shortcomings that suggest that the risk difference may not be the ideal index to use in summarizing effects across studies. An alternative representation of a treatment effect is the risk ratio:

$$\pi^T / \pi^C.$$

The risk ratio is an intuitive index that is frequently used in epidemiological studies, and it has advantages over the risk difference for summarizing estimates across studies. A third representation of a treatment effect is the odds ratio:

$$\omega = \frac{\pi^T / (1 - \pi^T)}{\pi^C / (1 - \pi^C)} = \frac{\pi^T (1 - \pi^C)}{\pi^C (1 - \pi^T)}.$$
Note that the odds (in the sense of betting on a horserace) of the target outcome in the treatment group are $\pi^T/(1 - \pi^T)$ and the odds of the target outcome in the control group are $\pi^C/(1 - \pi^C)$, so the odds ratio $\omega$ is literally the ratio of the odds in the treatment group to that in the control group. When the prevalence $\pi^C$ (and therefore usually $\pi^T$) is small, $(1 - \pi^C)/(1 - \pi^T)$ will be close to unity and the odds ratio will be close to the risk ratio.

Of these three measures, the odds ratio is generally preferable for the statistical analysis. In addition to having superior mathematical properties, the odds ratio can be computed in both retrospective and prospective studies, while the risk ratio and risk difference can only be calculated in prospective studies. Additionally, some empirical investigations have found that the odds ratio is the more consistent estimate across studies. However, an important disadvantage of the odds ratio is that it is less intuitive and harder to interpret than the risk difference and the risk ratio. Consequently, it is sometimes useful to carry out a meta-analysis in the metric of odds ratios, and then convert the resulting measure back into a risk ratio or risk difference for interpretation. Such a conversion requires that a value of the prevalence ($\pi^C$) be assumed, which is often selected as a typical value that might be expected.

The data from a study with binary outcomes can be summarized in a 2×2 table such as Table 58.1. In a prospective study such as a randomized trial, any of the three effect sizes can be estimated from the data given in Table 58.1. The simplest estimates arise by substituting the sample proportions for the corresponding population parameters in the definitions of the effect size (that is, substituting $p^T = a/(a + b)$ for $\pi^T$ and $p^C = c/(c + d)$ for $\pi^C$).
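The conversion from an odds ratio back to a risk ratio or risk difference follows directly from the definition of the odds: multiply the control odds by the odds ratio to obtain the treatment odds, then turn those odds back into a risk. A minimal sketch, assuming a hypothetical summary odds ratio and two assumed control-group risks; the function name is illustrative.

```python
def odds_ratio_to_risks(odds_ratio, control_risk):
    """Convert an odds ratio to (risk ratio, risk difference) for an assumed control risk."""
    control_odds = control_risk / (1 - control_risk)
    treatment_odds = odds_ratio * control_odds
    treatment_risk = treatment_odds / (1 + treatment_odds)
    return treatment_risk / control_risk, treatment_risk - control_risk

# Hypothetical summary odds ratio of 0.5 with two assumed prevalences
rr_common, rd_common = odds_ratio_to_risks(0.5, 0.20)  # common outcome
rr_rare, rd_rare = odds_ratio_to_risks(0.5, 0.01)      # rare outcome
print(round(rr_common, 3), round(rd_common, 3))
print(round(rr_rare, 3), round(rd_rare, 4))
```

With the rare outcome the risk ratio is nearly equal to the odds ratio, as the text notes, whereas with the more common outcome the two differ noticeably, which is why the assumed prevalence matters for interpretation.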
Table 58.1 Generic data summary table from a study with binary outcome

              Outcome
              Target    Non-target    Total
Treatment     a         b             $n^T$
Control       c         d             $n^C$
Total         a + c     b + d         N
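The estimators for binary outcomes described in this section can be collected into one small function that takes the four cell counts of a table laid out like Table 58.1. This is a sketch under the stated formulas, including the practice of adding 1/2 to each cell when a zero count makes a logarithm or its variance undefined; the function and key names are illustrative assumptions, and the counts are invented.

```python
import math

def _add_half(cells):
    """Add 1/2 to every cell of the 2x2 table."""
    return tuple(x + 0.5 for x in cells)

def binary_effect_sizes(a, b, c, d):
    """Effect sizes and variances from a 2x2 table.

    a, b: treatment group target / non-target counts
    c, d: control group target / non-target counts
    """
    cells = (a, b, c, d)
    n_t, n_c = a + b, c + d
    p_t, p_c = a / n_t, c / n_c

    # Risk difference and its variance
    risk_diff = p_t - p_c
    v_risk_diff = p_t * (1 - p_t) / n_t + p_c * (1 - p_c) / n_c

    # Log risk ratio: adjust all cells if a or c is zero
    a1, b1, c1, d1 = _add_half(cells) if (a == 0 or c == 0) else cells
    ln_rr = math.log(a1 * (c1 + d1) / (c1 * (a1 + b1)))
    v_ln_rr = b1 / (a1 * (a1 + b1)) + d1 / (c1 * (c1 + d1))

    # Log odds ratio: adjust all cells if any cell is zero
    a2, b2, c2, d2 = _add_half(cells) if 0 in cells else cells
    ln_or = math.log(a2 * d2 / (b2 * c2))
    v_ln_or = 1 / a2 + 1 / b2 + 1 / c2 + 1 / d2

    return {"rd": risk_diff, "v_rd": v_risk_diff,
            "ln_rr": ln_rr, "v_ln_rr": v_ln_rr,
            "ln_or": ln_or, "v_ln_or": v_ln_or}

# Hypothetical trial: 10 of 100 treated and 20 of 100 controls have the target outcome
es = binary_effect_sizes(10, 90, 20, 80)
print({k: round(val, 4) for k, val in es.items()})
```

Working on the log scale for the risk ratio and odds ratio is what allows the normal-theory confidence interval to be applied; the limits can be exponentiated afterwards for reporting.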
The estimate of the risk difference is

$$D = p^T - p^C = \frac{a}{a+b} - \frac{c}{c+d},$$

and the variance of $D$ is estimated by

$$v = \frac{p^T (1 - p^T)}{n^T} + \frac{p^C (1 - p^C)}{n^C} = \frac{ab}{(a+b)^3} + \frac{cd}{(c+d)^3}.$$

The estimate of the risk ratio is

$$r = \frac{p^T}{p^C} = \frac{a(c+d)}{c(a+b)}.$$

Statistical analyses involving risk ratios (including meta-analyses) typically use the (natural) logarithm of the risk ratio, not the raw risk ratio. The variance of $\ln(r)$ is estimated by

$$v = \frac{1 - p^T}{n^T p^T} + \frac{1 - p^C}{n^C p^C} = \frac{b}{a(a+b)} + \frac{d}{c(c+d)}.$$

Note that if either $p^C = 0$ or $p^T = 0$ (that is, if $c = 0$ or $a = 0$ in Table 58.1), neither the estimate $\ln(r)$ of the log-risk ratio nor its variance can be calculated. In this case, we usually add 1/2 to each cell of Table 58.1, so that the estimate of the risk ratio becomes $[(a + 1/2)(c + d + 1)]/[(c + 1/2)(a + b + 1)]$ and the estimate of the variance of $\ln(r)$ becomes

$$v = \frac{b + 1/2}{(a + 1/2)(a + b + 1)} + \frac{d + 1/2}{(c + 1/2)(c + d + 1)}.$$

The estimate of the odds ratio is

$$o = \frac{p^T (1 - p^C)}{p^C (1 - p^T)} = \frac{ad}{bc}.$$

As in the case of the risk ratio, statistical analyses involving odds ratios (including meta-analyses) typically use the (natural) logarithm of the odds ratio, not the raw odds ratio. The estimated variance of $\ln(o)$ is

$$v = \frac{1}{n^T p^T} + \frac{1}{n^T (1 - p^T)} + \frac{1}{n^C p^C} + \frac{1}{n^C (1 - p^C)} = \frac{1}{a} + \frac{1}{b} + \frac{1}{c} + \frac{1}{d}.$$

Note that if either $p^C$ or $p^T$ is 0 or 1 (that is, if any of $a$, $b$, $c$, or $d$ in Table 58.1 is 0), neither the estimate $\ln(o)$ of the log-odds ratio nor its variance can be calculated. In this case, we usually add 1/2 to each cell of Table 58.1, so that the estimate of the odds ratio becomes $[(a + 1/2)(d + 1/2)]/[(b + 1/2)(c + 1/2)]$ and the variance becomes

$$v = \frac{1}{a + 1/2} + \frac{1}{b + 1/2} + \frac{1}{c + 1/2} + \frac{1}{d + 1/2}.$$

2.2 Studies Measuring Outcomes on a Continuous Scale

Suppose that each study evaluates the effect of a treatment by comparing the mean of a group of treated individuals with the mean of a group of control individuals. If the outcome measurements are normally distributed within the treatment groups with equal variances, the natural analysis would involve a t-test or an analysis of variance. The natural effect size parameter in this case is the standardized mean difference (sometimes called Cohen’s d):

$$\delta = \frac{\mu^T - \mu^C}{\sigma},$$

where the parameters $\mu^T$ and $\mu^C$ are the treatment and control group means and the parameter $\sigma$ is the within-group standard deviation. The quantity $\delta$ represents the treatment effect in standard deviation units. However, because $\delta$ is a population parameter, it is not observed. In fact we carry out the study to estimate or draw inferences about $\delta$. The natural estimate of $\delta$ is the sample standardized mean difference:

$$d = \frac{\bar{Y}^T - \bar{Y}^C}{S},$$

where $\bar{Y}^T$ and $\bar{Y}^C$ are the treatment and control group sample means and $S$ is the pooled within-group standard deviation. This estimate is often modified slightly to adjust for small sample bias to produce an unbiased estimate of $\delta$ (sometimes called Hedges’ g):

$$g = d \left[ 1 - \frac{3}{4(n^T + n^C) - 9} \right],$$
where $n^T$ and $n^C$ are the sample sizes in the treatment and control groups of the study. The variance of $g$ is determined (mostly) by the sample sizes and (slightly) by the magnitude of $g$. Specifically, the variance, $v$, of $g$ can be computed as

$$v = \frac{n^T + n^C}{n^T n^C} + \frac{g^2}{2(n^T + n^C)}.$$
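The steps from group summary statistics to the bias-adjusted standardized mean difference and its variance can be sketched as follows. The pooled standard deviation is computed with the usual degrees-of-freedom weighted formula, which is an assumption here because the text does not spell it out; the function name and the summary statistics are hypothetical.

```python
import math

def hedges_g(mean_t, mean_c, sd_t, sd_c, n_t, n_c):
    """Return (d, g, v): standardized mean difference, bias-adjusted g, variance of g."""
    # Pooled within-group standard deviation (assumed standard definition)
    pooled_var = ((n_t - 1) * sd_t ** 2 + (n_c - 1) * sd_c ** 2) / (n_t + n_c - 2)
    s = math.sqrt(pooled_var)
    d = (mean_t - mean_c) / s
    # Small-sample adjustment
    g = d * (1 - 3 / (4 * (n_t + n_c) - 9))
    # Variance of g: a sample-size term plus a small term that depends on g
    v = (n_t + n_c) / (n_t * n_c) + g ** 2 / (2 * (n_t + n_c))
    return d, g, v

# Hypothetical trial: means 24 vs 20, standard deviations 8 and 8, 30 per group
d, g, v = hedges_g(24.0, 20.0, 8.0, 8.0, 30, 30)
print(round(d, 4), round(g, 4), round(v, 4))
```

With 30 participants per group the adjustment shrinks d by a little over 1%, and the first term of the variance dominates, in line with the remark that the variance depends mostly on the sample sizes.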
The effect size g is approximately normally distributed with a mean of δ and a variance of v. Finally, suppose that both the outcome and independent variables are continuous measures as in the case when the studies are correlational. In this case, the natural effect size parameter is ρ, the Pearson correlation coefficient. Its sample estimate is r, where
$$r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \sum_{i=1}^{n} (y_i - \bar{y})^2}}.$$
In order to apply normal theory, we must use a transformation of r, the Fisher z transformation, where
$$z = \frac{1}{2} \ln\left(\frac{1 + r}{1 - r}\right).$$
The result, z, is approximately unbiased, with mean
$$\zeta = \frac{1}{2} \ln\left(\frac{1 + \rho}{1 - \rho}\right)$$
and variance v = 1/(n – 3). Here n is the total sample size in the correlational study. Finally, note that it is possible to compute confidence intervals for both δ and ζ from single values of g or z, so that we can compute the confidence interval for the effect size from each study in the meta-analysis.
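The Fisher z transformation and the resulting confidence interval can be sketched as follows. The values r = 0.30 and n = 50 are hypothetical. Converting the limits back to the correlation metric with the inverse transformation is a common convenience that the text above does not describe, so treat that step as an assumption of this sketch.

```python
import math


def fisher_z(r, n):
    """Return (z, v): the Fisher z transform of r and its variance 1 / (n - 3)."""
    z = 0.5 * math.log((1 + r) / (1 - r))
    return z, 1.0 / (n - 3)


def z_to_r(z):
    """Invert the Fisher z transformation (the hyperbolic tangent)."""
    return math.tanh(z)


# Hypothetical correlational study: r = 0.30 observed in n = 50 participants.
z, v = fisher_z(r=0.30, n=50)
half_width = 1.96 * math.sqrt(v)
zeta_low, zeta_high = z - half_width, z + half_width

# Assumed extra step: express the interval in the metric of the correlation.
r_low, r_high = z_to_r(zeta_low), z_to_r(zeta_high)
print(f"z = {z:.3f}, v = {v:.4f}")
print(f"95% CI for zeta: {zeta_low:.3f} to {zeta_high:.3f}")
print(f"95% CI for rho:  {r_low:.3f} to {r_high:.3f}")
```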
3 Combining Estimates of Effect Size Across Studies
Methods for combining estimates of effect size across studies are generally the same, regardless of the effect size index used. The one exception is when studies being combined have binary outcomes and very small sample sizes, a case in which special methods (so-called Mantel-Haenszel methods) are needed. Therefore, we will present the methods for meta-analysis using a general effect size parameter which will be denoted by θ, and a general effect size estimate denoted by T, and its variance denoted by v. Thus the raw data for the meta-analysis of k studies are the effect size estimates T1, ..., Tk and their variances v1, ..., vk. The estimate from the ith study Ti estimates the unknown effect size parameter θi.
The summary of a collection of effect sizes via meta-analysis addresses two basic questions. The first concerns the typical or average value of the effect sizes. The second concerns the consistency of effect sizes across studies. The typical effect size in meta-analyses is estimated by averaging estimates across studies. However, because some studies produce more precise estimates (that is, they have smaller variances) than others, it makes sense to give more weight to some (the more precise) estimates than others. Two major statistical approaches to meta-analysis differ in how they compute these weights. Fixed effects methods do not include between-study differences in computing weights, while random effects methods include between-study variations in computing weights. We will describe each one of them below.
3.1 Fixed Effects Methods Combining Estimates
If the effect size parameters are identical across studies so that θ1 = ... = θk = θ, then the most precise estimate of θ is given by the weighted mean effect size
$$\bar{T}_\bullet = \frac{\sum_{i=1}^{k} w_i T_i}{\sum_{i=1}^{k} w_i},$$
58
Meta-analysis
where wi = 1/vi, so that the weight given to a particular effect size is the inverse of its variance. Because each of the effect size estimates is normally distributed, the weighted mean T̄• is also normally distributed, and the variance v• of T̄• is the reciprocal of the sum of the weights:
$$v_\bullet = \left(\sum_{i=1}^{k} w_i\right)^{-1}.$$
A 95% confidence interval for θ is given by
$$\bar{T}_\bullet - 1.96\sqrt{v_\bullet} \le \theta \le \bar{T}_\bullet + 1.96\sqrt{v_\bullet}.$$
In cases such as the risk ratio or the odds ratio where the statistical analysis is carried out in the log metric, the confidence interval is first computed in that transformed metric, then the confidence limits are transformed back to the metric of the effect size by using the exponential function exp(x) = e^x. For example, the 95% confidence interval in the log metric (e.g., for the log-risk ratio or log-odds ratio) is transformed back to the unlogged metric (e.g., the risk ratio or odds ratio) as
$$\exp\left(\bar{T}_\bullet - 1.96\sqrt{v_\bullet}\right) \le \theta \le \exp\left(\bar{T}_\bullet + 1.96\sqrt{v_\bullet}\right).$$
A test of the hypothesis that θ = 0 uses the test statistic
$$Z = \frac{\bar{T}_\bullet}{\sqrt{v_\bullet}}.$$
The level α two-tailed test rejects the null hypothesis when |Z| exceeds the 100α percent critical value of the standard normal distribution (e.g., 1.96 for α = 0.05). When the statistical analysis is performed in the log metric (e.g., for risk ratios or odds ratios), the significance test is conducted in the metric of the log-transformed effects. The reason is that the null hypothesis that ρ = 1 is equivalent to the null hypothesis that ln(ρ) = 0 and similarly the null hypothesis that ω = 1 is equivalent to the null hypothesis that ln(ω) = 0.
The weighted mean provides a summary of the common effect size estimates if they are reasonably homogeneous, but it is important to understand whether the hypothesis that θ1 = ··· = θk is reasonably consistent with the evidence. To test the hypothesis that the effect sizes are the same across studies, we usually use the statistic
$$Q = \sum_{i=1}^{k} w_i \left(T_i - \bar{T}_\bullet\right)^2.$$
When the effect size parameters are identical, Q has a chi-square distribution with (k – 1) degrees of freedom. Therefore a test of the hypothesis that effect sizes are identical across studies at significance level α consists of comparing the obtained value of Q with the upper α critical value of the chi-square distribution with (k – 1) degrees of freedom, and rejecting the hypothesis of identical effect sizes if Q exceeds this critical value.
Note, however, that this test need not be very powerful if the number of studies is small or if the variances of the effect sizes are large (e.g., if the sample sizes in most studies are small) (see Hedges and Pigott, 2001). Thus one should not routinely conclude that effect size parameters are identical (or essentially identical) across studies unless the number of studies is large and they also have large sample sizes (and thus vi are small).
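The fixed effects computations can be sketched in Python as follows. This is one plausible way to organize the arithmetic, not the authors' software. The function names are hypothetical, and the input is the set of 2 × 2 cell counts (a, b, c, d) for the ten smoking cessation studies listed in Table 58.2, with 1/2 added to every cell of a study only when one of its cells is zero.

```python
import math

# Cell counts (a, b, c, d) for the ten studies in Table 58.2.
COUNTS = [
    (14, 88, 2, 102), (29, 97, 12, 104), (56, 388, 18, 191), (4, 42, 1, 23),
    (9, 34, 11, 36), (57, 343, 35, 379), (12, 86, 3, 98), (12, 181, 11, 187),
    (5, 21, 0, 30), (4, 46, 0, 47),
]


def log_odds_ratio(a, b, c, d):
    """Return (ln(o), v); add 1/2 to each cell if any cell is empty."""
    if 0 in (a, b, c, d):
        a, b, c, d = (x + 0.5 for x in (a, b, c, d))
    return math.log(a * d / (b * c)), 1 / a + 1 / b + 1 / c + 1 / d


def fixed_effects(estimates, variances):
    """Return (weighted mean, its variance, Q) using weights w_i = 1 / v_i."""
    weights = [1 / v for v in variances]
    sum_w = sum(weights)
    mean = sum(w * t for w, t in zip(weights, estimates)) / sum_w
    q = sum(w * (t - mean) ** 2 for w, t in zip(weights, estimates))
    return mean, 1 / sum_w, q


effects = [log_odds_ratio(*row) for row in COUNTS]
T = [t for t, _ in effects]
V = [v for _, v in effects]

mean, var, q = fixed_effects(T, V)
half = 1.96 * math.sqrt(var)
z = mean / math.sqrt(var)

print(f"weighted mean log-odds ratio = {mean:.3f}, variance = {var:.3f}")
print(f"odds ratio = {math.exp(mean):.2f}, "
      f"95% CI {math.exp(mean - half):.2f} to {math.exp(mean + half):.2f}")
print(f"Z = {z:.2f}, Q = {q:.3f} on {len(T) - 1} degrees of freedom")
```

Run on these counts, the sketch should agree, up to rounding, with the totals in Table 58.2 and the values reported in the example (weighted mean 0.621, variance 0.019, Q of about 13.5).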
3.1.1 Example
A systematic review and meta-analysis of k = 10 studies of interventions for smoking cessation for pregnant women was reported by Naughton and colleagues (2008). The data reported in Table 58.2 are from ten of their studies that used assignment of individuals (as opposed to clusters of individuals) to treatment. The number of individuals assigned to treatment who ceased smoking, the number who did not, and the total number assigned to treatment, the number assigned to the control condition who ceased smoking, the number who did not, and the total number assigned to control (a, b, nT, c, d, and nC from Table 58.1) are given in columns two to seven of Table 58.2. The next three columns give the odds ratio, the log-odds ratio, and the variance of the log-odds ratio. Note that studies 9 and 10 have 0 individuals in the control group who experienced smoking cessation; therefore, we added 1/2 to each of the a, b, c, and d values for those two studies in order to compute the odds ratio and its variance.
Table 58.2 Example data for computing fixed and random effects meta-analyses by combining log-odds ratios
                Treatment           Control
Study       a     b    nT     c     d    nC        o  ln(o)=T        v        w       w²       wT      wT²       w*      w*T
1          14    88   102     2   102   104    8.114    2.094    0.593    1.687    2.848    3.533    7.396    1.427    2.988
2          29    97   126    12   104   116    2.591    0.952    0.138    7.260   52.708    6.912    6.581    4.069    3.874
3          56   388   444    18   191   209    1.532    0.426    0.081   12.311  151.570    5.248    2.237    5.285    2.253
4           4    42    46     1    23    24    2.190    0.784    1.317    0.759    0.576    0.595    0.467    0.702    0.550
5           9    34    43    11    36    47    0.866   −0.144    0.259    3.858   14.883   −0.554    0.079    2.723   −0.391
6          57   343   400    35   379   414    1.800    0.588    0.052   19.354  374.572   11.371    6.680    6.263    3.680
7          12    86    98     3    98   101    4.558    1.517    0.438    2.281    5.201    3.459    5.248    1.830    2.776
8          12   181   193    11   187   198    1.127    0.120    0.185    5.402   29.182    0.646    0.077    3.412    0.408
9           5    21    26     0    30    30   15.605    2.748    2.261    0.442    0.196    1.215    3.339    0.422    1.160
10          4    46    50     0    47    47    9.194    2.219    2.265    0.442    0.195    0.980    2.173    0.421    0.935
Totals                                                                     53.796  631.931   33.405   34.277   26.554   18.233
Using the summaries in Table 58.2 we see that the weighted mean of the log-odds ratios is
$$\bar{T}_\bullet = 33.405/53.796 = 0.621$$
with a variance of
$$v_\bullet = 1/53.796 = 0.019,$$
which leads to a 95% confidence interval for the log-odds ratio ln(ω) of
$$0.354 = 0.621 - 1.96\sqrt{0.019} \le \ln(\omega) \le 0.621 + 1.96\sqrt{0.019} = 0.888.$$
Converting these into the metric of the (unlogged) odds ratio ω yields the estimate o = exp(0.621) = 1.86 and the 95% confidence interval
$$1.42 = \exp(0.354) \le \omega \le \exp(0.888) = 2.43.$$
The test for the homogeneity of effect sizes is computed as
$$Q = 34.277 - (33.405)^2/53.796 = 13.534.$$
Comparing Q = 13.534 with the critical values of the chi-square distribution with (10 – 1) = 9 degrees of freedom, we see that a Q value this large could occur between 10 and 15% of the time by chance if the odds ratios were identical across studies.
3.2 Mantel-Haenszel Methods
A special problem arises in the meta-analysis of studies with binary outcomes when the individual studies are very small. When only one or a few studies have 2 × 2 data tables with empty cells (that is, when a, b, c, or d from Table 58.1 is zero), the methods described above may be used after adding 1/2 to the numbers in each cell. However, when individual studies are very small, a large proportion of studies may have empty cells. In this case, special methods, known as Mantel-Haenszel methods, may be more appropriate. There is no hard and fast rule as to when Mantel-Haenszel methods are needed, but it might be wise to use these methods whenever more than a small proportion of studies (say 5%) have empty cells. We describe Mantel-Haenszel methods for estimating the odds ratio, but there are analogous methods for the risk ratio and the risk difference (see Greenland, 1982; Tarone, 1981). The Mantel-Haenszel method of combining odds ratios uses the statistic:
$$o_{MH} = \frac{\sum_{i=1}^{k} a_i d_i / N_i}{\sum_{i=1}^{k} b_i c_i / N_i},$$
where ai, bi, ci, and di are the quantities in Table 58.1 for the ith study, and Ni = ai + bi + ci + di is the total sample size for the ith study. Note that the Mantel-Haenszel statistic computes the summary directly from the cell counts and dispenses with the intermediate computation of odds ratios from each study. As long as none of the Ni are zero, it is unnecessary (and it would be incorrect) to add 1/2 to the cell counts in order to compute the Mantel-Haenszel statistic, even if some of the ai, bi, ci, and di are zero. A (somewhat complex) expression for the variance of ln(oMH) is
$$v_{MH} = \frac{\sum_{i=1}^{k} P_i R_i}{2\left(\sum_{i=1}^{k} R_i\right)^2} + \frac{\sum_{i=1}^{k} (P_i S_i + Q_i R_i)}{2 \left(\sum_{i=1}^{k} R_i\right)\left(\sum_{i=1}^{k} S_i\right)} + \frac{\sum_{i=1}^{k} Q_i S_i}{2\left(\sum_{i=1}^{k} S_i\right)^2},$$
where
$$P_i = \frac{a_i + d_i}{N_i}, \quad Q_i = \frac{b_i + c_i}{N_i}, \quad R_i = \frac{a_i d_i}{N_i}, \quad \text{and} \quad S_i = \frac{b_i c_i}{N_i}.$$
To compute confidence intervals for the odds ratio ω, we first compute a confidence interval for ln(ω) and then use the exponential function exp(x) = e^x to convert the confidence limits for the log-odds ratio into confidence limits for the (unlogged) odds ratio. For example, a 95% confidence interval for ω is given by
$$\exp\left(\ln(o_{MH}) - 1.96\sqrt{v_{MH}}\right) \le \omega \le \exp\left(\ln(o_{MH}) + 1.96\sqrt{v_{MH}}\right).$$
The Mantel-Haenszel method is basically a fixed effects procedure. There is no entirely satisfactory way to extend this method to random effects analysis. There are methods with the same motivation, such as those based on generalized mixed model procedures (see, e.g., Schall, 1991; or Breslow and Clayton, 1993).
3.2.1 Example
Return to our example of k = 10 studies of interventions to promote smoking cessation among pregnant women. Table 58.3 illustrates the calculation of the quantities necessary for computing the Mantel-Haenszel estimate oMH of the odds ratio and the variance of ln(oMH) directly. The estimate of the odds ratio is oMH = 83.494/42.252 = 1.98, which is very similar to the odds ratio estimate of 1.86 computed by averaging the log-odds ratios. The variance of ln(oMH) is computed as
$$v_{MH} = \frac{42.351}{2(83.494)^2} + \frac{20.687 + 41.142}{2(83.494)(42.252)} + \frac{21.565}{2(42.252)^2} = 0.018,$$
which is quite similar to the variance of the log-odds ratio of 0.019 computed by averaging the log-odds ratios. Note that neither estimate required adding 1/2 to compensate for the zero cell counts in studies 9 and 10. A 95% confidence interval for ω based on the Mantel-Haenszel estimate is given by
$$1.52 = \exp\left(\ln(1.98) - 1.96\sqrt{0.018}\right) \le \omega \le \exp\left(\ln(1.98) + 1.96\sqrt{0.018}\right) = 2.57.$$
This confidence interval is quite similar to the interval of 1.42–2.43 computed in connection with the analysis that averaged the log-odds ratios.
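The Mantel-Haenszel calculation in this example can be sketched in Python as follows. The function name `mantel_haenszel` is hypothetical, and the cell counts are those of the ten studies in Table 58.3. No continuity correction is applied, in keeping with the remark that adding 1/2 is unnecessary for this estimator.

```python
import math

# Cell counts (a, b, c, d) for the ten studies in Table 58.3.
COUNTS = [
    (14, 88, 2, 102), (29, 97, 12, 104), (56, 388, 18, 191), (4, 42, 1, 23),
    (9, 34, 11, 36), (57, 343, 35, 379), (12, 86, 3, 98), (12, 181, 11, 187),
    (5, 21, 0, 30), (4, 46, 0, 47),
]


def mantel_haenszel(counts):
    """Return (o_MH, v_MH): the pooled odds ratio and the variance of its log."""
    sum_r = sum_s = sum_pr = sum_ps_qr = sum_qs = 0.0
    for a, b, c, d in counts:
        n = a + b + c + d
        p, q = (a + d) / n, (b + c) / n
        r, s = a * d / n, b * c / n
        sum_r += r
        sum_s += s
        sum_pr += p * r
        sum_ps_qr += p * s + q * r
        sum_qs += q * s
    o_mh = sum_r / sum_s
    v_mh = (sum_pr / (2 * sum_r**2)
            + sum_ps_qr / (2 * sum_r * sum_s)
            + sum_qs / (2 * sum_s**2))
    return o_mh, v_mh


o_mh, v_mh = mantel_haenszel(COUNTS)
half = 1.96 * math.sqrt(v_mh)
low = math.exp(math.log(o_mh) - half)
high = math.exp(math.log(o_mh) + half)
print(f"o_MH = {o_mh:.2f}, v_MH = {v_mh:.3f}, 95% CI {low:.2f} to {high:.2f}")
```

Because the sums R and S are accumulated over studies before dividing, the two studies with c = 0 contribute to the numerator without causing a division by zero.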
Table 58.3 Example data for computing fixed effects meta-analyses using the Mantel-Haenszel method
          Treatment     Control
Study       a     b     c     d     N        P        Q        R        S       PR       PS       QR       QS
1          14    88     2   102   206    0.563    0.437    6.932    0.854    3.903    0.481    3.029    0.373
2          29    97    12   104   242    0.550    0.450   12.463    4.810    6.849    2.643    5.613    2.166
3          56   388    18   191   653    0.378    0.622   16.380   10.695    6.196    4.046   10.184    6.650
4           4    42     1    23    70    0.386    0.614    1.314    0.600    0.507    0.231    0.807    0.369
5           9    34    11    36    90    0.500    0.500    3.600    4.156    1.800    2.078    1.800    2.078
6          57   343    35   379   814    0.536    0.464   26.539   14.748   14.215    7.900   12.324    6.849
7          12    86     3    98   199    0.553    0.447    5.910    1.296    3.267    0.717    2.643    0.580
8          12   181    11   187   391    0.509    0.491    5.739    5.092    2.921    2.592    2.818    2.500
9           5    21     0    30    56    0.625    0.375    2.679    0.000    1.674    0.000    1.004    0.000
10          4    46     0    47    97    0.526    0.474    1.938    0.000    1.019    0.000    0.919    0.000
Totals                                   5.125    4.875   83.494   42.252   42.351   20.687   41.142   21.565
Note: There are no columns for ad/N or bc/N because ad/N = R and bc/N = S
3.3 Random Effects Methods
If the effect size parameters are not identical (or almost so) across studies, an alternative method for combining estimates across studies is the random effects model. In this model, studies are considered as a sample of possible studies and their effect size parameters are considered as a sample from a universe of possible effect size estimates. In this model the object is to estimate the mean µ and variance τ² of the population of effect sizes (the population of θ values) from which the observed study effect sizes are sampled. If the effect size parameters corresponding to the studies in our sample of studies (θ1, ..., θk) were observed, we could simply compute their variance as a sample estimate of τ². Because they are not observed we must estimate their variance indirectly by noting that the variance of the observed effect size estimates (T1, ..., Tk) depends partly on vi, which represent estimation errors and partly on τ², which represents true heterogeneity among θi. The Q-statistic used to test heterogeneity is a weighted sample variance that can be used to obtain an indirect estimate of τ². In particular,
$$\hat{\tau}^2 = \frac{Q - (k - 1)}{c},$$
if the quantity on the right-hand side of the equation is positive, and zero otherwise, where c is a normalizing constant given by
$$c = \sum_{i=1}^{k} w_i - \frac{\sum_{i=1}^{k} w_i^2}{\sum_{i=1}^{k} w_i}.$$
Random effects methods compute the weighted mean effect size as
$$\bar{T}^*_\bullet = \frac{\sum_{i=1}^{k} w^*_i T_i}{\sum_{i=1}^{k} w^*_i},$$
where w*i = 1/v*i = 1/(vi + τ̂²). This corresponds to weighting each effect size by the inverse of a new variance, v*i = vi + τ̂², which includes a component of between-study variation. As in the fixed effects case, the weighted mean T̄•* is also normally distributed, the variance v•* of T̄•* is the reciprocal of the sum of the weights,
$$v^*_\bullet = \left(\sum_{i=1}^{k} w^*_i\right)^{-1},$$
and a 95% confidence interval for the average effect size µ is given by
$$\bar{T}^*_\bullet - 1.96\sqrt{v^*_\bullet} \le \mu \le \bar{T}^*_\bullet + 1.96\sqrt{v^*_\bullet}.$$
A test of the hypothesis that µ = 0 uses the test statistic
$$Z^* = \frac{\bar{T}^*_\bullet}{\sqrt{v^*_\bullet}}.$$
The level α two-tailed test rejects the null hypothesis when |Z*| exceeds the 100α percent critical value of the standard normal distribution (e.g., 1.96 for α = 0.05).
The fixed and random effects weighted means are similar in form and differ only in the weights used to compute them. When τ̂² > 0, the w*i are more similar to one another than the wi. This means that studies receive more equal weights in the random effects weighted mean than in the fixed effects weighted mean, where one study can dominate (receive very large weight) if it has a very small variance (usually because it has a very large sample size). By contrast, in the random effects weighted mean, where the weight given to each study is more similar, no single study can completely dominate. Similarly, when τ̂² > 0, each w*i is smaller than the corresponding wi. Because the variance of the weighted mean is the inverse of the sum of the weights, this means that the variance v•* of the random effects weighted mean T̄•* is larger than the variance v• of the fixed effects weighted mean T̄•. One implication of this is that confidence intervals for the random effects weighted mean are wider than those of the fixed effects weighted mean.
Note that the test of the hypothesis that τ² = 0 in the random effects analysis is exactly the test of the hypothesis that θ1 = ··· = θk based on the Q-statistic described in connection with the fixed effects analysis, since if τ² = 0, the effect size parameters will be identical.
A quantitative description of the amount of heterogeneity can be provided in either one of two ways. The estimate of τ² provides one such description. The square root of this estimate, τ̂, is an estimate of the standard deviation of the distribution of the effect size parameters across studies. An alternative way to characterize heterogeneity is to describe the proportion of variation in the observed effect size estimates that is due to τ².
The estimate
$$I^2 = \frac{Q - (k - 1)}{Q} \times 100\%$$
does just this. Because τ̂ describes the absolute amount of variation in the θs and I² describes the amount of variation relative to the total variation of estimates (including the amount of variation due to both variation of the θs and errors of estimation), the two are complementary ways to describe variation in effect size parameters.
3.3.1 Example
Returning to our example of k = 10 studies of interventions to promote smoking cessation among pregnant women, we use the quantities in Table 58.2 to compute an estimate of the between-studies variance component (τ²), the random effects weights w*, the products w*T, and their sums. First compute the normalizing constant c as
$$c = 53.796 - 631.931/53.796 = 42.05,$$
then use this quantity along with the Q-statistic computed in the fixed effects analysis (Q = 13.534) to compute the estimate of τ² as
$$\hat{\tau}^2 = \frac{13.534 - (10 - 1)}{42.05} = 0.108.$$
This value is used to compute the w* values and the w*T values in Table 58.2; for example, the random effects weight in study 1 is w*1 = 1/(0.593 + 0.108) = 1.427. Using these random effects weights, the random effects weighted mean of the log-odds ratios is
$$\bar{T}^*_\bullet = 18.233/26.554 = 0.687$$
with a variance of
$$v^*_\bullet = 1/26.554 = 0.038,$$
which leads to the 95% confidence interval for the log-odds ratio ln(ω) of
$$0.306 = 0.687 - 1.96\sqrt{0.038} \le \ln(\omega) \le 0.687 + 1.96\sqrt{0.038} = 1.067.$$
Converting these into the metric of the (unlogged) odds ratio ω yields the estimate o = exp(0.687) = 1.99 and the 95% confidence interval
$$1.36 = \exp(0.306) \le \omega \le \exp(1.067) = 2.91.$$
Note that the point estimate of the odds ratio is slightly larger than that computed using the fixed effects model, and the variance of the log-odds ratio computed using the random effects model is twice as large as the value of 0.019 computed using the fixed effects model. Similarly, the confidence interval for ω computed using the random effects model is wider (1.36–2.91) than that computed using the fixed effects model (1.42–2.43). Using the value of the Q statistic of Q = 13.534 computed in the example for the fixed effects analysis, the value of I², representing the proportion of variance in the estimates that is due to variation in effect size parameters across studies, is
$$I^2 = 100\% \times [13.534 - (10 - 1)]/13.534 = 33.5\%.$$
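The random effects steps in this example can be sketched in Python as follows. The inputs are the log-odds ratios and variances as printed (rounded to three decimals) in Table 58.2, so the results should match the example only to about two decimal places. The function name `random_effects` and the truncation of I² at zero when Q falls below k − 1 are assumptions of this sketch, not statements from the text.

```python
import math

# Log-odds ratios T and their variances v as printed (rounded) in Table 58.2.
T = [2.094, 0.952, 0.426, 0.784, -0.144, 0.588, 1.517, 0.120, 2.748, 2.219]
V = [0.593, 0.138, 0.081, 1.317, 0.259, 0.052, 0.438, 0.185, 2.261, 2.265]


def random_effects(estimates, variances):
    """Return a dict with Q, c, tau2, I2 (percent), and the weighted mean and variance."""
    k = len(estimates)
    w = [1 / v for v in variances]
    sum_w = sum(w)
    fixed_mean = sum(wi * ti for wi, ti in zip(w, estimates)) / sum_w
    q = sum(wi * (ti - fixed_mean) ** 2 for wi, ti in zip(w, estimates))
    c = sum_w - sum(wi**2 for wi in w) / sum_w
    tau2 = max(0.0, (q - (k - 1)) / c)  # set to zero when the estimate is negative
    # Assumption: I2 is also truncated at zero, mirroring the rule for tau2.
    i2 = 100 * max(0.0, (q - (k - 1)) / q) if q > 0 else 0.0
    w_star = [1 / (v + tau2) for v in variances]
    sum_w_star = sum(w_star)
    mean = sum(wi * ti for wi, ti in zip(w_star, estimates)) / sum_w_star
    return {"q": q, "c": c, "tau2": tau2, "i2": i2,
            "mean": mean, "var": 1 / sum_w_star}


res = random_effects(T, V)
half = 1.96 * math.sqrt(res["var"])
print(f"Q = {res['q']:.2f}, c = {res['c']:.2f}, tau^2 = {res['tau2']:.3f}, "
      f"I^2 = {res['i2']:.1f}%")
print(f"mean log-odds ratio = {res['mean']:.3f}, variance = {res['var']:.3f}")
print(f"odds ratio = {math.exp(res['mean']):.2f}, "
      f"95% CI {math.exp(res['mean'] - half):.2f} to "
      f"{math.exp(res['mean'] + half):.2f}")
```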
4 Methods for Testing for Differences Between Groups of Studies
There are also meta-analytic methods for modeling variation across studies as a function of study level covariates. Perhaps the most common such analyses are designed to determine whether the average effect sizes of subgroups of studies differ from one another, a meta-analytic generalization of analysis of variance. Another type of analysis examines the relation between continuously measured covariates and effect size, a meta-analytic generalization of regression analysis (sometimes called meta-regression). For more information about the fixed effects versions of these techniques, see Konstantopoulos and Hedges (2009), and for information about the random effects versions of these techniques, see Raudenbush (2009).
5 Forest Plots
Note that for all the effect size indices we have considered (and a great many more as well), it is possible to compute a confidence interval for the effect size parameter from a single value of the effect size estimate, so that we can compute a confidence interval for the effect size parameter associated with each study in the meta-analysis. A plot depicting all the confidence intervals from all of the studies in a meta-analysis is called a forest plot (as in seeing the forest and the trees). A forest plot provides an overview of all the effect size estimates in a meta-analysis along with their uncertainties (depicted by the error bars in the confidence intervals). An example of a forest plot arising from the data of Naughton and colleagues (2008) used in our example is given in Fig. 58.1. Note two things in particular about this forest plot. First, the error bars are of very different lengths, denoting that the sampling uncertainties of different studies are quite different. Second, the centers of these error bars (denoting the point estimates of effect sizes from different studies) are somewhat different from one another, indicating that there is variation in the effect size estimates across studies.
Fig. 58.1 Forest plot of odds ratios found in Table 58.2 (rows: studies 1 to 10 and the fixed and random effects summaries; horizontal axis: odds ratio and 95% CI on a logarithmic scale from 0.01 to 100, with values below 1 labeled "Favours C" and values above 1 labeled "Favours T")
6 Publication Bias
Publication selection is the tendency of studies that are published, reported, or otherwise available for review to be a non-random sample of the studies that were actually conducted. If studies producing effect size estimates that tend to be smaller are less likely to be available (e.g., if effects that are too small to be statistically significant are less likely to be published) then publication selection may lead to biases in the effect size estimates computed from observed studies. Such biases can be severe when selection is severe. There is a substantial literature on the detection and possible correction of publication bias which is beyond the scope of this chapter (see Rothstein et al., 2005; Sutton, 2009).
7 Conclusion
Meta-analysis can be a valuable tool for summarizing research findings across studies. It permits reviewers to describe the results of each study on a common effect size metric, combine information from many studies in an optimal fashion, and understand the degree to which the findings from different studies agree with one another.
References
Borenstein, M., Hedges, L. V., Higgins, J. P. T., and Rothstein, H. (2009). Introduction to Meta-analysis. London: Wiley.
Breslow, N. E., and Clayton, D. G. (1993). Approximate inference in generalized linear mixed models. J Am Stat Assoc, 88, 9–25.
Cooper, H., Hedges, L. V., and Valentine, J. (2009). The Handbook of Research Synthesis and Meta-analysis, 2nd Ed. New York: Russell Sage Foundation.
Counsell, C. (1997). Formulating questions and locating primary studies for inclusion in systematic reviews. Ann Intern Med, 127, 380–387.
Greenland, S. (1982). Interpretation and estimation of summary ratios under heterogeneity. Stat Med, 1, 217–227.
Hedges, L. V., and Pigott, T. D. (2001). The power of statistical tests in meta-analysis. Psychol Methods, 6, 203–217.
Konstantopoulos, S., and Hedges, L. V. (2009). Fixed effects models. In H. Cooper, L. V. Hedges, & J. Valentine (Eds.), The Handbook of Research Synthesis and Meta-analysis, 2nd Ed (pp. 279–294). New York: Russell Sage Foundation.
Meade, M. O., and Richardson, W. S. (1997). Selecting and appraising studies for a systematic review. Ann Intern Med, 127, 531–537.
Naughton, F., Prevost, A. T., and Sutton, S. (2008). Self-help smoking cessation interventions in pregnancy: a systematic review and meta-analysis. Addiction, 103, 566–579.
Raudenbush, S. W. (2009). Random effects models. In H. Cooper, L. V. Hedges, & J. Valentine (Eds.), The Handbook of Research Synthesis and Meta-analysis, 2nd Ed (pp. 295–316). New York: Russell Sage Foundation.
Rothstein, H., Sutton, A., and Borenstein, M. (Eds.) (2005). Publication Bias in Meta-analysis. New York: Wiley.
Schall, R. (1991). Estimation in generalized linear models with random effects. Biometrika, 78, 719–727.
Sutton, A. (2009). Publication bias. In H. Cooper, L. V. Hedges, & J. Valentine (Eds.), The Handbook of Research Synthesis and Meta-analysis, 2nd Ed (pp. 435–451). New York: Russell Sage Foundation.
Tarone, R. E. (1981). On summary estimators of relative risk. J Chron Dis, 34, 463–468.
Part X
Behavioral and Psychosocial Interventions
Chapter 59
Trial Design in Behavioral Medicine Kenneth E. Freedland, Robert M. Carney, and Patrick J. Lustman
K.E. Freedland, Behavioral Medicine Center, Department of Psychiatry, Washington University School of Medicine, 4320 Forest Park Avenue, Suite 301, St. Louis, MO 63108, USA; e-mail: [email protected]
A. Steptoe (ed.), Handbook of Behavioral Medicine, DOI 10.1007/978-0-387-09488-5_59, © Springer Science+Business Media, LLC 2010
1 Introduction
The design of a clinical trial depends on the purpose of the study. Most behavioral medicine trials test the efficacy of an intervention under carefully controlled conditions or its effectiveness in clinical practice. However, some trials are conducted for other purposes, and they may not be designed the same way as standard efficacy or effectiveness trials. Also, many efficacy trials can be designed in more than one way. Design decisions frequently involve difficult tradeoffs and they can be controversial. Investigators, grant reviewers, institutional review boards, and other interested parties often disagree about these decisions. Differences of opinion about control groups drive many of these disagreements. These differences can widen when the decision makers represent multiple disciplines or fields of research with different methodological traditions. Researchers who only conduct drug trials may be unfamiliar with the distinctive challenges involved in designing and conducting randomized trials of nonpharmacological interventions. Social and behavioral scientists who only conduct trials involving healthy subjects may be unaware of the complex design and logistical challenges involved in studies of medically ill patients. These crosswinds often buffet trialists who work at the nexus of behavioral and biomedical research.
This chapter is not intended to be a comprehensive guide to clinical trial design. Instead, it focuses on design issues that are particularly relevant to contemporary behavioral medicine research. It may not resolve the controversies that often surround trial design decisions, but it should at least help to clarify some of the reasons why they are controversial.
2 Control Conditions
2.1 Control vs. Comparison
In a randomized controlled trial (RCT), there is at least one experimental group and at least one control group. The latter is so named because its primary purpose is to “control” for threats to internal validity, i.e., the validity of conclusions about the causal relationship between the intervention and the outcome (Campbell and Stanley, 1966). However, some control groups have other purposes, in addition to controlling for threats to internal validity. In some trials, participants are randomly assigned to a new, experimental intervention or to an existing, standard treatment. The aim is to determine whether the new treatment is superior, or at least equivalent, to the standard treatment. Thus, the principal reason for including the standard treatment arm is to compare it to the experimental treatment. This design does control for most of the standard threats to internal validity, but if that were the only consideration, a different control condition such as a placebo might be a better choice. When designing a trial, both the control and the comparison functions of the control group(s) must be considered (Kazdin, 2003b).
In a randomized comparison of two active treatments, the positive outcomes of both interventions could be due to factors such as the placebo effect rather than to their putative active ingredients. On the other hand, both treatments could actually be less effective than a mere placebo if their side effects were too aversive. Neither possibility would be apparent unless a placebo control group was included in the trial design. However, adding a placebo arm would increase costs and could raise ethical concerns. Given these constraints, it is necessary to consider whether comparison of the active interventions would be sufficiently informative to justify the trial, even without a placebo arm. An RCT of two interventions may be worth conducting even if some threats to internal validity cannot be completely ruled out.
2.2 The Standard Hierarchy of Control Conditions
Control conditions differ with respect to the amount of manipulation or intervention they deliver. The minimal condition is a no-treatment control arm. It can only be used when it is possible to ensure that (1) Group A is given the experimental treatment and Group B is not, (2) Group B cannot obtain it any other way, and (3) Group B cannot obtain any other treatment for the same indication. A recent trial (Wolfe et al., 2003) compared a community-based dating violence prevention program for at-risk youth to a no-treatment control condition. The program was not available from any other source, and no other treatment for this problem was available in the community. Consequently, the
investigators could be confident that the control group would not receive any dating violence prevention services during the trial. If it is possible to employ a true, no-treatment control condition, then it is also possible to construct a hierarchy of control conditions that range from less to more intervention: (1) no treatment; (2) wait list; (3) nonspecific intervention or placebo; (4) isolated component(s) of an active treatment; (5) a complete, active treatment. Wait lists provide more intervention than no-treatment conditions in the sense that the participants are expecting to receive the experimental treatment. Nonspecific and placebo control conditions do not provide the putative active ingredients of the experimental intervention, but they do provide an intervention that can affect outcomes via therapist contact, expectations for improvement, or other mechanisms. Treatment component conditions are active interventions, albeit missing one or more ingredients of the full experimental intervention. Active treatment control arms generally provide as much intervention as the experimental treatment(s) to which they are compared; they simply represent different forms of treatment. Both treatment component and active treatment control conditions also have nonspecific or placebo effects. This hierarchy, or something similar, is presented in many research methodology textbooks (e.g., Kazdin, 2003b). It assumes that it is possible to withhold treatment altogether from untreated controls. This is possible in many experimental environments, but most clinical research in behavioral medicine is conducted in settings wherein pristine, no-treatment control conditions are not possible. If the target problem or disorder involves physical and/or mental health and if the participants have access to health-care services, then a true, no-treatment control condition is usually not an option. 
If the participants are randomly assigned to a nominal “no-treatment” condition, they may still receive some form of usual care (UC) or treatment as usual (TAU). They have access to non-study clinical services, whether or not they utilize them. Thus, in many trials, one cannot guarantee that
the participants will refrain from non-study treatment.
2.3 Usual Care, Treatment as Usual, and Standard of Care Controls
“Usual care” and “treatment as usual” are often used interchangeably, but they have different connotations. UC is the preferred term in medical research (e.g., Minneci et al., 2008). It implies that the non-study care extends beyond the specific treatment or disorder being studied. For example, cancer patients enrolled in a psychosocial intervention trial might be receiving non-study treatment not only for cancer and its complications but also for other medical and psychiatric problems, whether related or unrelated to cancer. TAU terminology is used more often in mental health services research and psychosocial intervention studies than in medical research (Street and Luoma, 2002). It implies that an identifiable treatment of some sort is routinely provided for the target problem or disorder. It does not imply that other forms of care are being provided in the trial setting, although it does not preclude this either.
UC might be very narrowly construed as including only the non-study psychiatric, psychological, social work, or primary care services that are available to the participants for the same psychosocial problem that is the target of the experimental intervention. It could be viewed more broadly as including services for any mental health problem, whether or not targeted in the trial, or even more broadly as comprising all of the health services the participants are receiving for all of their medical and psychiatric problems. This last definition not only is the most comprehensive but also captures more threats to internal validity. For example, care received for other medical problems can affect psychosocial adjustment. Also, UC often refers not only to health services that are actually utilized but also to services that could be utilized. Assignment to a UC control group does not necessarily
mean that all participants will actually receive treatment for the disorder in question. Some might not pursue, be offered, or accept treatment. UC typically includes whatever health services a patient would routinely receive, regardless of clinical trial participation. In contrast, enhanced usual care (EUC) conditions provide something other than purely naturalistic usual care. There are several circumstances in which an EUC control might be employed instead of a naturalistic UC condition. One of the most common occurs when the patient’s health-care provider(s) must be informed, for ethical and/or clinical reasons, about the results of assessments performed for the study. Doing so enhances usual care and might trigger additional nonstudy tests or treatments that would not have occurred if the patient were not participating in the trial. Even minimal enhancement of usual care can have substantial effects on the patient’s care and medical outcomes. Another reason to use an EUC control is to ensure that all participants receive the current, guideline-adherent standard of care. Even if a novel treatment is found to be superior to UC, it will not necessarily be seen as representing a significant advance in medical care unless the UC was state-of-the-art. An EUC control condition may be needed to provide this level of care. This can create a dilemma for behavioral trialists working at tertiary or quaternary care medical centers. Consider a trial that would compare a behavioral intervention to improve glycemic control in outpatients with Type 2 diabetes, to UC for diabetes. The behavioral intervention would be added on to UC for diabetes. Thus, the design can be depicted as comparing UC plus behavioral intervention to UC alone. 
If the researcher casts a broad net in recruiting for the trial, some of the participants might be receiving their diabetes care from the medical center’s renowned, state-of-the-art diabetes specialty clinic, while others might be receiving it from their own primary care physician, a community health center, or some other provider. The quality of UC for diabetes is likely to differ markedly across these settings. If it does, the researcher will not be able to claim that
the behavioral intervention improves outcomes above and beyond state-of-the-art medical care for diabetes, even if the trial’s results are positive. One alternative would be to restrict recruitment to the diabetes specialty clinic, where UC is uniformly state-of-the-art. The disadvantages of this approach are that it would limit recruitment and jeopardize the feasibility and generalizability of the trial. Another alternative would be to stratify by site. This approach would permit a comparison with state-of-the-art care within the specialty clinic sample, but it would only be possible to do this if the trial were fairly large. Yet another approach would be to ensure that all participants receive the best possible care, even if some of them would otherwise receive care that falls short of the state-of-the-art. This would change the trial design to EUC + behavioral intervention vs. EUC alone. This sort of enhancement is also easier said than done. The trialist would either have to arrange for all participants to be seen at the diabetes specialty clinic, thereby increasing the cost of the trial and possibly alienating the participants’ primary care physicians, or else find a way to ensure that all of the primary care physicians provide state-of-the-art diabetes care. Neither possibility would be a realistic option for some behavioral medicine research centers. Another reason to employ an EUC control is to standardize the non-study care received by all participants, in order to reduce extraneous variability in exposure to relevant treatments. A trial of a behavioral treatment for insomnia might compare the experimental treatment to UC. In this design, some participants in both arms might receive medications for insomnia from their own physicians, while others might not receive any medication, so UC would not consist of the same sort of medical treatment for insomnia from one patient to the next. 
In an EUC alternative, these physicians might be asked to prescribe the same medication, at the same dosage, to every patient. Of course, trialists usually cannot dictate nonstudy care or guarantee that every physician will comply with their requests to standardize care. The more the UC is enhanced, the more it resembles an experimental intervention in
its own right and the more it diverges from actual clinical practice. Thus, a trial comparing a novel experimental intervention to extensively enhanced UC is, essentially, a comparison of two active interventions. For these reasons, UC or minimally enhanced UC controls are more common than substantially enhanced or standardized EUC controls in behavioral medicine research. Evidence-based clinical guidelines exist for many different conditions and treatments. Treatments that are of the highest quality and that adhere closely to state-of-the-art clinical guidelines or to well-established best practices are often labeled standard of care (SoC). SoC should not be confused with standard care (SC), which is usually synonymous with usual or routine care. There are many reasons why the usual care in a particular setting or community might not always conform to established standards of care. There are unfortunate reasons, such as disparities in the quality of health care that fall along socioeconomic, demographic, or geographic lines (Werner et al, 2008). Entire health-care systems can differ with respect to their institutional commitment to, or ability to, provide state-of-the-art care. There is also a good reason for variability in UC: expert clinicians often depart from guidelines when doing so is clinically indicated. A basic example is a physician’s decision to withhold a guideline-recommended medication if the patient is allergic to it. Another is a therapist’s effort to tailor an evidence-based intervention to the idiosyncratic needs of an individual patient. Thus, appropriately individualized care (IC) accounts for some of the variability in usual care. In some circumstances, individualized care might yield better outcomes than rigid adherence to a specific SoC, but in others, the best way to achieve optimal outcomes might be to uniformly follow the SoC. 
Thus, the most rigorous comparator in some RCTs might be IC rather than SoC, and in some trials, it might be the reverse. It might even be desirable in some cases to compare a novel intervention to both an SoC and an IC control condition (Thompson and Schoenfeld, 2007). Like other three-arm designs,
however, this type of trial may be prohibitively expensive in many instances, and it may be necessary to choose just one of these control conditions even if it would be informative to include both. UC, SoC, and other conditions that are not entirely (or not at all) determined by the investigator’s research protocol can change over time. The chances that consequential change will occur increase over time. This pertains both to the length of the intervention phase of the trial and to the overall duration of the trial itself. The longer the patients remain in the intervention phase of a trial, the greater the chance that their non-study care will evolve. For example, nonstudy cardiac treatment regimens are unlikely to change over the course of a 4-week intervention phase in a group of patients with stable coronary disease, but they could very well change if the trial’s intervention phase lasts 6 months or a year. Similarly, the longer the trial as a whole takes to complete, the greater is the chance that there will be changes in the ways that the study population will be treated. New drugs can enter the market, existing ones can fall abruptly into disfavor, and new or revised clinical guidelines can be issued, to state just a few examples. Thus, temporal variability in usual care can be problematic in trials that involve long intervention phases and in ones that take years to complete. Clearly, UC, EUC, SoC, and IC can be rather complex control conditions. Failure to give due consideration to the characteristics of these control conditions can have unfortunate consequences for clinical trials, and even for entire lines of research (Burns, 2009). The composition of UC in any given trial depends upon the problem or disorder in question, the other problems that are present, the state-of-the-art of treatment for the target problem, the availability of treatment services, the patient population being studied, and the settings from which the participants are recruited. 
In multicenter studies, usual care may differ from one center to the next or from one recruitment venue to the next within a single center. It can differ from one patient to the next within the same setting, and it can change in the middle of the study. Given
its heterogeneity, it is essential to document key elements of UC. For example, it is important to systematically document any non-study medications that participants were taking during the trial. Data on non-study treatments can be very useful for analyses of the roles of differential treatment intensification and differential adherence in study outcomes, particularly in behavioral RCTs that cannot be double blinded and that involve some form of usual care. Differential treatment intensification occurs when one group tends to receive more intensive non-study care than another group, and differential adherence occurs when the patients in one group tend to adhere more closely to their nonstudy treatment regimens than those in another group. For example, patients with arthritis might be randomly assigned to a coping skills intervention plus usual care for arthritis or to usual care alone. The intervention might, either inadvertently or by design, encourage the participants to contact their physician more frequently or to be more assertive in requesting medications or other non-study treatments. Consequently, they may receive more frequent or intensive nonstudy medical care than the UC participants. Depending upon the aims of the intervention, this form of differential intensification could be viewed either as a rival explanation for better outcomes in the intervention arm or as a mechanism of action through which the intervention produced better outcomes. Conversely, the UC participants in such a trial might seek more non-study treatment than their counterparts in the intervention arm, because they are not receiving any special intervention for their arthritis and may have more difficulty coping with it. This might turn out to be a potential explanation for why there was no significant difference between the groups in the primary study outcome. Differential adherence can have the same causes and effects as differential intensification. 
The coping intervention might, for example, induce closer patient adherence to nonstudy treatments for arthritis in the intervention than in the usual care arm.
Treatment intensification measurement methods have only recently started to gain attention. So far, most of the emphasis has been on the intensification of medication regimens, as indicated by increases in the number of drug classes, increased dosages within a drug class, or switches to different drug classes (Schmittdiel et al, 2008; Selby et al, 2009). Other dimensions of treatment intensification, such as whether there is an increased frequency of non-study outpatient clinic visits, may also be useful to measure in some behavioral RCTs. Measurement of patient adherence is addressed in Chapter 7.
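The medication-based indicators mentioned above (more drug classes, a higher dosage within a class, or a switch to a different class) can be sketched as a comparison of a participant's non-study regimen at two assessments. This is only an illustration: the data layout (a mapping from drug class to total daily dose) and the function names are hypothetical and are not taken from the cited studies.

```python
def intensification_flags(baseline, followup):
    """Compare two non-study regimens for one participant.

    Each regimen is a dict mapping drug class -> total daily dose
    (hypothetical layout, chosen for illustration only).
    """
    base_classes = set(baseline)
    follow_classes = set(followup)

    added = follow_classes - base_classes
    dropped = base_classes - follow_classes
    continued = base_classes & follow_classes

    # More drug classes at follow-up than at baseline.
    more_classes = len(follow_classes) > len(base_classes)
    # A higher dose within any class that was continued.
    dose_increase = any(followup[c] > baseline[c] for c in continued)
    # A switch: at least one class stopped and at least one new class started.
    switched = bool(added) and bool(dropped)

    return {
        "more_classes": more_classes,
        "dose_increase": dose_increase,
        "switched_class": switched,
        "any_intensification": more_classes or dose_increase or switched,
    }


def intensification_rate(pairs):
    """Proportion of participants in one arm with any intensification.

    `pairs` is a list of (baseline, followup) regimen dicts.
    """
    flags = [intensification_flags(b, f)["any_intensification"] for b, f in pairs]
    return sum(flags) / len(flags) if flags else 0.0
```

Computing `intensification_rate` separately for the intervention and UC arms would give a simple descriptive check for differential treatment intensification. Whether a class switch should count as intensification in a given trial is a judgment the investigators would have to make.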
2.4 Usual Care and Its Variants in the Hierarchy of Control Conditions

UC does not fit into the standard hierarchy of control conditions. In fact, if non-study clinical care is available to the participants of a trial, the standard hierarchy disintegrates. In the presence of UC, there is often no way to guarantee that participants randomly assigned to no-treatment control conditions will abstain from non-study treatment for the target problem or disorder (Kazdin, 2003a). For example, in a trial of a psychotherapeutic intervention for depression, some “no-treatment” participants might obtain antidepressants from their own physicians, as might some in the treatment arm. In addition, the presence of UC in the background of a trial often precludes pure controls for placebo or nonspecific treatment effects, as well as pure comparisons with treatment components or active interventions. If the depression trial were designed to compare the full experimental psychotherapeutic intervention against a treatment component condition, for example, the design would ordinarily be described simply as a comparison of the component in question vs. that component + the remaining ingredients. However, a more complete description would be UC + treatment component vs. UC + treatment component + the remaining ingredients, given that UC is present in the background of the trial.
The UCs do not necessarily cancel each other out in this type of design because UC may interact differentially with the two conditions. For example, if the remaining ingredients in our example included educational materials about treatments for depression, the participants in the full intervention arm might be more likely than those in the component-only arm to seek non-study antidepressants. This potential source of differential intensification of non-study care should be addressed whenever an RCT is conducted with a UC control group or even when UC is merely present in the trial’s milieu. This also holds for EUC, SoC, and IC controls. Clinical trials targeting problems that are not routinely (or ever) addressed in the setting of interest or for the population of interest might be regarded as an exception to this rule, but even these types of trials can be affected by UC. For example, self-forgiveness has been shown to predict better psychosocial adjustment and quality of life among women being treated for breast cancer (Romero et al, 2006). If a research team were to propose a clinical trial of self-forgiveness therapy, they would probably find that UC for breast cancer at their medical center does not include formal self-forgiveness therapy. If they were to dig a little deeper, however, they might find that the center’s pastoral care or support groups encourage self-forgiveness. Even if the medical center were bereft of self-forgiveness services, the participants would still be receiving interventions (e.g., visits with their physician or social worker) that could influence some of the outcomes of the self-forgiveness trial, such as quality of life. Thus, even if the experimental intervention is only available to trial participants, UC is still non-ignorable.
3 Design Issues in Behavioral Medicine Research

3.1 Efficacy and Effectiveness Trials

In addition to obliterating the standard hierarchy of control conditions, the presence of UC
blurs the distinction between efficacy and effectiveness. In principle, the efficacy of an intervention should be tested under tightly controlled conditions, and its generalizability and effectiveness should subsequently be tested in the “real world” of clinical practice. However, when efficacy studies are conducted in clinical settings, the presence of UC constrains the investigator’s control over the conditions to which the participants are exposed (Kazdin, 2003a). Also, many behavioral medicine RCTs are designed to adapt or extend, to medically ill patients, interventions that have already proven efficacious in medically well subjects. Consequently, many of our “efficacy” trials are actually efficacy–effectiveness hybrids. In effectiveness trials, interventions with known efficacy are tested as replacements for current practices or as ways to augment them. The control arm is either UC or EUC. The basic replacement effectiveness design compares a translational intervention to UC or EUC. The basic augmentation effectiveness design is similar except that the new intervention is added to existing practice: UC or EUC + translational intervention vs. UC or EUC alone. Some efficacy trials in behavioral medicine are analogous to augmentation effectiveness studies, except that the efficacy of the intervention has not yet been established, at least not in the setting of interest or for the population of interest. They compare UC or EUC + an experimental intervention to UC or EUC alone. For example, we recently compared two experimental interventions for depression in patients with a recent history of coronary artery bypass graft (CABG) surgery to UC (Freedland et al, 2009). For simplicity, the present discussion focuses on just one of the interventions, cognitive behavior therapy (CBT). 
We chose CBT because it had well-established efficacy for depression in psychiatric patients and could be adapted to address the needs and problems of post-CABG patients, but its efficacy had not yet been established in this population. We compared CBT to EUC for depression which, in this patient population, sometimes includes antidepressants but rarely includes psychotherapy. About half of
the participants were already taking an antidepressant but were still depressed at enrollment in the trial. They were permitted to continue their antidepressants regardless of group assignment. Consequently, about half of the patients in the experimental arm received a combination of CBT plus non-study antidepressants, and the other half received only CBT for depression. Similarly, half of the control group received nonstudy antidepressants, and the other half received no treatment for depression at all. The participants and their physicians were informed of the patient’s depression status at baseline. Thus, this trial provides an example of minimally enhanced usual care and the following design: EUC + CBT vs. EUC alone. Criticism of the UC or EUC control condition is common in this type of trial, particularly by reviewers who are uncomfortable with departures from more familiar efficacy designs. A frequent concern is that this design does not control for attention or placebo effects. It is true that the participants in the EUC arm are unlikely to receive as much clinical attention as their intervention arm counterparts, since CBT provides ample clinical attention. However, researchers may disagree about whether it is necessary or even desirable to control for attention in this type of study. Because the presence of UC in the background of a trial generally precludes pure placebo or attention controls, controlling for attention implies this design: EUC + attention vs. EUC + attention + other ingredients of CBT. A serious drawback of this design is that it is uninformative as to whether CBT yields better depression outcomes than UC alone after CABG surgery. That may seem like an effectiveness rather than an efficacy question, but as discussed above, the presence of UC blurs the distinction between efficacy and effectiveness. 
Unless CBT is first shown to be superior to UC for depression in this patient population, there is little reason for researchers, clinicians, patients, administrators, or policymakers to care whether its apparent effects are due to attention rather than to the other ingredients of CBT. One way to overcome this limitation would be to add an EUC-only group to the design. This
three-arm trial would require a larger sample and it would cost more than a two-arm trial. Would it be worth the additional expense? That depends on how important it is to determine the extent to which clinical attention explains the effects of CBT and that is not a simple question. Unlike the standard threats to internal validity (e.g., history, maturation, or regression to the mean), attention is not a rival explanation for the effects of CBT. Attention is an integral part of CBT, so the attention control design is analogous to the treatment component design, and its principal function is comparison rather than control. In other words, this design does not truly “control” for attention, since attention is not a threat to internal validity; instead it permits a comparison of the full intervention (CBT) to one of its key ingredients (clinical attention). However, clinical attention cannot be extracted from the therapy in which it is embedded and delivered in a pure, active ingredient-free form. Attention group therapists cannot simply sit and stare at their patients for the same number of hours that the experimental group therapists interact with theirs. They have to provide some sort of intervention in order to have a vehicle within which to deliver clinical attention. Furthermore, CBT provides a distinctive form of clinical attention known as a collaborative therapeutic relationship, in which the patient and therapist collaborate on cognitive-behavioral treatment goals and strategies (Beck, 1995). This special relationship cannot be extracted from CBT without changing it. At best, an attention condition might approximate CBT’s collaborative relationship, but there would still have to be some sort of non-CBT therapeutic content for the collaboration to form around. Assuming that a credible UC or EUC + attention condition can be implemented, what is gained from comparing it to the full intervention? 
In basic psychotherapy research, it is important to determine whether the particular form of therapy under investigation provides benefits above and beyond clinical attention. In applied behavioral medicine research, determining whether particular therapies have specific
active ingredients is usually less important than ascertaining whether the therapies are beneficial. Nonspecific therapies, the kind of interventions that may be used as attention control conditions for established interventions such as CBT, may require less training and experience, may be less expensive to deliver, and may or may not be as efficacious as the experimental intervention. These are important but secondary questions. The more pressing goal for behavioral medicine RCTs is to develop highly efficacious interventions for clinically significant problems. However, the behavioral medicine research community is increasingly aware of the need for behavioral and psychosocial interventions that are not only highly efficacious but effective and practical to implement in clinical practice (Glasgow and Emmons, 2007). Practical clinical trials (Glasgow et al, 2006; Tunis et al, 2003) are expected to play an increasingly important role in testing interventions in clinical practice settings. In some instances, two or more evidence-based interventions for a given problem may already be available to practitioners, but whether, for whom, and under which circumstances one intervention is superior to another, is unknown. A comparative effectiveness trial can address such questions. In other instances, practical clinical trials are needed for further evaluation of innovative behavioral interventions that have been found to be superior to other therapies, medications, and/or control conditions in tightly controlled efficacy studies. In these cases, the new intervention should be compared with the current standard of care, if one exists. If there is no evidence-based standard of care, or if current clinical practices in the settings of interest do not adhere to whatever evidence-based guidelines may be available, then comparison to some form of usual care would be appropriate (Glasgow et al, 2006).
3.2 Factorial Designs in Efficacy Research

Factorial trial designs are uncommon in most areas of medical research, including behavioral
medicine, but they can be very useful. The Canadian Cardiac Randomized Evaluation of Antidepressant and Psychotherapy Efficacy (CREATE) trial (Frasure-Smith et al, 2006; Lesperance et al, 2007) is a good example. Patients with stable coronary disease and major depression were randomly assigned to either clinical management (CM) or interpersonal psychotherapy (IPT) + CM. Within each group, the participants were then randomized to citalopram or to a pill placebo. This yielded four groups: IPT + CM + pill placebo, IPT + CM + citalopram, CM + pill placebo, and CM + citalopram, and it permitted independent comparisons of citalopram vs. placebo and IPT vs. CM. Because the participants were medical patients, UC was present in the background of all four groups; this will be ignored here, for simplicity. The key advantage of this design is that it requires fewer subjects to test the two hypotheses (i.e., drug vs. placebo and IPT vs. CM) than if two separate trials were conducted instead. A critical assumption is that there is no interaction between the two interventions (Frasure-Smith et al, 2006; Friedman et al, 1998). If there is an interaction, the effects of each intervention depend upon whether the other is coadministered. This is more likely to occur when the two interventions have similar mechanisms of action than when they operate through different pathways. Consequently, it is an appropriate design for simultaneously testing the efficacy of a drug and a behavioral or psychotherapeutic intervention. It might not be a good design for simultaneously testing two similar drugs or two forms of psychotherapy. A factorial design can also be used to test interaction hypotheses (Piantadosi, 2005). For example, a trial might be designed to determine whether the combination of a drug plus a behavioral intervention produces greater weight loss in obese subjects than either the drug or the behavioral intervention alone. 
The control group receives neither treatment, but like all of the other participants, they may, depending on the setting and population, receive some form of UC. Double randomization is not required for this type of study. Instead, the participants would be
randomly assigned just once, to the drug alone, to the behavioral intervention alone, to combination therapy, or to neither treatment. A larger sample is required for this type of trial than for the double-randomization factorial RCT design discussed above (Friedman et al, 1998).
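The logic of the 2 × 2 factorial layout can be made concrete with a short sketch. It assumes a balanced allocation to the four cells and a hypothetical numeric outcome per participant (for example, a depression score); the function names are illustrative, and the one-step allocation is used only for brevity and is not a description of the CREATE randomization procedure. Each main effect is estimated by pooling over the other factor, which is why every participant contributes to both comparisons, and the interaction contrast shows what the no-interaction assumption refers to.

```python
import random
from statistics import mean

# The four cells of the 2 x 2 layout (therapy factor x pill factor).
CELLS = [(t, p) for t in ("IPT+CM", "CM") for p in ("citalopram", "placebo")]


def randomize_factorial(participant_ids, seed=0):
    """Balanced random allocation of participants to the four cells."""
    ids = list(participant_ids)
    # Repeat the four cells often enough to cover everyone, then shuffle.
    slots = (CELLS * (len(ids) // 4 + 1))[:len(ids)]
    rng = random.Random(seed)
    rng.shuffle(slots)
    return {pid: {"therapy": t, "pill": p} for pid, (t, p) in zip(ids, slots)}


def marginal_difference(allocation, outcomes, factor, level_a, level_b):
    """Mean outcome at level_a minus level_b, pooled over the other factor."""
    a = [outcomes[p] for p, arm in allocation.items() if arm[factor] == level_a]
    b = [outcomes[p] for p, arm in allocation.items() if arm[factor] == level_b]
    return mean(a) - mean(b)


def interaction_contrast(allocation, outcomes):
    """Drug effect with IPT minus drug effect without IPT (0 if purely additive)."""
    def cell_mean(therapy, pill):
        return mean(outcomes[p] for p, arm in allocation.items()
                    if arm["therapy"] == therapy and arm["pill"] == pill)

    drug_effect_with_ipt = cell_mean("IPT+CM", "citalopram") - cell_mean("IPT+CM", "placebo")
    drug_effect_with_cm = cell_mean("CM", "citalopram") - cell_mean("CM", "placebo")
    return drug_effect_with_ipt - drug_effect_with_cm
```

When the interaction contrast is plausibly near zero, the two marginal differences answer the two main-effect questions with the full sample. When the interaction is itself the hypothesis, the comparison rests on individual cells, which is consistent with the larger sample requirement noted above.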
3.3 Safety Trials

The ultimate goals of medical care are to save lives, prevent or decrease disability, and maintain or improve quality of life. Unfortunately, many medical and surgical therapies have been found to increase the risk of death or disability, or to diminish quality of life, despite being efficacious for a target condition of some sort. The Cardiac Arrhythmia Suppression Trial (CAST) is a classic example. CAST showed that certain medications decrease cardiac arrhythmias but do so at the expense of increasing the risk of mortality (CAST Investigators, 1989). Such findings have generated widespread skepticism about claims of treatment efficacy that are based on improvements in surrogate end points such as arrhythmias without concomitant evidence of improved functioning, survival, or quality of life (Fleming and DeMets, 1996). In behavioral medicine, many trials target outcomes that could be considered surrogate end points, and do so without being designed or powered to evaluate survival or other ultimate end points. There are often good reasons for this. Depression, for example, is a risk factor for morbidity and mortality in patients with heart disease, but it is also a disorder in its own right. In the post-CABG depression trial discussed earlier, the goal was to test the efficacy of an intervention for depression in a patient population that had been excluded from almost every other depression trial ever conducted. It is possible, but unlikely, that exposure to CBT during the year after CABG surgery increases the risk of mortality. To determine whether it indeed poses this risk would have required a much larger trial. Conducting a large, expensive trial simply to confirm that CBT is not lethal would have
been hard to justify. It is essential to monitor serious adverse events (SAEs) in any RCT, but few medical or behavioral trials are adequately powered to analyze group differences in SAEs (Tsang et al, 2009). Safety is a more serious consideration when antidepressants are used to treat depression in patients who also have heart disease or other serious medical conditions. The Sertraline Antidepressant Heart Attack Randomized Trial (SADHART) evaluated the safety and efficacy of sertraline for major depression in patients with a recent acute myocardial infarction (MI). Left ventricular ejection fraction (LVEF) was the primary indicator of the drug’s safety, and other cardiac variables (e.g., angina) were secondary indicators. The trial was not adequately powered to study mortality, but deaths were examined in an exploratory analysis. SADHART was a double-blind, placebo-controlled trial. All of the participants received UC for heart disease and other medical problems, but they were not permitted to take non-study antidepressants while participating in the trial. In this sense, something was subtracted from the participants’ usual care and replaced with either sertraline or a pill placebo. So, instead of receiving enhanced UC for depression, they received restricted usual care (RUC). In this type of design, it may be necessary to enhance some aspects of UC while restricting others. The SADHART team notified the participant’s cardiologist if significant cardiac abnormalities were found. Thus, although SADHART is usually described simply as a placebo-controlled trial, a complete description of the design compares EUC for heart disease, etc., + RUC for depression + sertraline to EUC for heart disease, etc., + RUC for depression + pill placebo.
3.4 Mediation Trials

Successful mediation trials are the “holy grail” in many areas of behavioral medicine research.
They are designed to show that medical outcomes can be improved by treating behavioral problems. They are pragmatic trials in that they aim to yield new approaches to preventing or treating medical conditions, but they are also explanatory studies in that they are designed to determine whether the behavioral problem is a causal risk factor rather than a non-causal risk marker. They are referred to herein as mediation trials, because they target putative mediators of medical outcomes, rather than the medical outcomes themselves. For example, considerable evidence has accumulated that both depression and low perceived social support (LPSS) increase the risk of morbidity and mortality after acute MI. It is not known, however, whether these variables are causal risk factors or whether modifying them can reduce the incidence of recurrent infarction and death. The Enhancing Recovery in Coronary Heart Disease (ENRICHD) clinical trial was designed to address these questions. Patients with a recent MI plus depression and/or LPSS were randomly assigned to an intervention that included CBT and, in some cases, sertraline, or to minimally enhanced UC for depression. The intervention had modest effects on depression and LPSS, but no effect on the combined primary end point of reinfarction-free survival (Berkman et al, 2003). The findings provided a reasonably clear answer to the pragmatic question of whether post-MI depression and LPSS are treatable: they are, but the best available treatments have relatively weak effects on these problems. The findings yielded an ambiguous answer to the pragmatic question of whether post-MI medical outcomes can be improved by treating depression and LPSS. The medical outcomes did not differ between the groups, but they might have differed if the intervention had stronger effects on depression and LPSS. 
The findings also left open the explanatory issue of whether depression and LPSS are causal risk factors: they may be, but ENRICHD neither proved nor disproved this. ENRICHD illustrates some of the challenges involved in conducting behavioral mediation trials. The intervention targeted two behavioral risk markers which, assuming that they are even
59
Trial Design in Behavioral Medicine
part of a causal process that leads to reinfarction or cardiac death after an MI, are distal mediators. The processes that are involved in blood clot formation are, in contrast, proximal mediators of recurrent infarction. All else being equal, it would be much easier to demonstrate that aspirin and clopidogrel prevent reinfarction, because they target proximal mediators. Establishing that a behavioral problem predicts adverse medical outcomes is a precondition for conducting a trial such as ENRICHD. The trial tests the mediation hypothesis by demonstrating that (1) the intervention improves medical outcomes, (2) it improves the behavioral risk factor, and (3) the improvement in medical outcomes is due, at least partially, to improvement in the behavioral risk factor. Whether the hypothesis will be supported depends on (1) the strength of the association between the behavioral risk factor and the medical outcome, (2) the efficacy of the intervention in relation to the behavioral risk factor, and (3) whether the intervention affects the medical outcome via other pathways. The strength of the association between the risk factor and the medical outcome is not something that the investigator can control, but it is an important consideration in deciding whether the trial should be conducted in the first place. If the intervention is not very efficacious for the behavioral risk factor, then there is little hope that it will affect the medical outcome even if the risk factor is a strong determinant of it. This was problematic in ENRICHD; the differences in depression and social support between the intervention and EUC groups were too small to affect the medical outcomes. If the intervention operates through multiple pathways, then it may be challenging to demonstrate behavioral mediation even if the medical outcomes improve. For example, sertraline is an antidepressant but it also interferes with blood clotting by binding with platelet receptors. 
If sertraline helps to prevent MIs (a potential benefit which has not yet been proven), it might do so by improving depression, inhibiting platelet activation, or both. These issues make it difficult to demonstrate that behavioral interventions can improve medical outcomes and even more difficult to test
causal hypotheses. It is often said that unlike observational studies, clinical trials are experiments and therefore provide stronger tests of causal hypotheses about risk factors (Elwood, 2007). They are indeed experiments, but they are less tightly controlled than most laboratory experiments. Also unlike laboratory research, the experimental manipulation in a behavioral trial does not necessarily produce distinct groups. In SADHART, for example, depression was neither totally abolished in the sertraline arm nor uniformly persistent in the placebo arm. No known treatment is efficacious enough to turn a depressed sample into a 100% fully remitted sample, and the placebo effect and spontaneous improvement conspire to ensure that some control subjects will partially or fully remit. In general, mediation trials should only be conducted if highly efficacious treatments for the behavioral risk factor exist. However, many behavioral problems are difficult to treat. Treatment development and efficacy studies are needed to pave the way for multicenter mediation trials. In any RCT, the efficacy of an experimental intervention is judged in relation to the control condition. The effect size has as much to do with the control condition as it does with the experimental treatment (Mohr et al, 2009). For example, antidepressant medications are more efficacious in relation to no-treatment controls than to pill placebos. This is especially important in mediation trials, which require highly efficacious treatments for behavioral targets in order to affect medical outcomes and to provide strong tests of the mediation hypotheses. For clinical problems like depression in medical patients, no-treatment control groups are not an option because of the presence of UC. Among the designs that are feasible in this circumstance, UC or minimally enhanced UC control designs are the most powerful. 
To employ a UC + attention control instead of UC or EUC could defeat the purpose of a mediation trial. If sufficient resources are available, it might be possible to add a third arm in order to “control” for attention, but doing so might not serve the primary aims of the trial. It could serve a secondary aim, but whether this is a wise use of limited resources in
an explanatory mediation trial should be questioned. It might be better to wait until more pragmatic questions come to the fore; in other words, after it has been shown that treating the behavioral risk factor can improve medical outcomes. Once this has been accomplished, it is then time to address such questions as whether the risk factor can be treated more rapidly, less expensively, and in a broader range of settings. On the other hand, if an applicable intervention has a powerful effect on a behavioral risk factor that is known to have a strong relationship with a medical outcome, it may be preferable to move directly to a more stringent mediation trial design. The Diabetes Prevention Program (DPP; Knowler et al, 2002) is an example. The DPP randomly assigned individuals at risk for Type 2 diabetes to an intensive lifestyle intervention, metformin, or placebo. The lifestyle intervention was designed to reduce obesity and increase physical activity. UC was enhanced by a brief lifestyle education intervention and by assisting the patients’ non-study health-care providers in following treatment guidelines for concomitant conditions such as hypertension. UC was also restricted in that treatments that could have affected study outcomes were discouraged if alternate treatments were available. The intensive lifestyle group lost more weight and engaged in more physical activity than the other groups. The incidence of diabetes was reduced by 58% in the lifestyle group and by 31% in the metformin group, as compared with the placebo group. The lifestyle intervention was also more efficacious than metformin. Thus, the DPP showed that the intensive lifestyle intervention is an efficacious treatment for obesity and physical inactivity in individuals at risk for Type 2 diabetes, and that it also helps to prevent the onset of diabetes in this population. It did so even though stringent control conditions were employed.
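The mediation logic discussed in this section can be made concrete with a small simulation. The sketch below is an illustration under stated assumptions, not an analysis from any of the trials cited: the function names (`simulate_trial`, `mediation_paths`), the linear model, and the path values are all hypothetical. It estimates the effect of treatment on the behavioral mediator (path a), the effect of the mediator on the outcome (path b), and the direct effect through other pathways, using ordinary least squares. In a linear model the total effect equals the direct effect plus the product a × b, so a weak effect on the mediator limits how much benefit can reach the medical outcome through it.

```python
import random


def simulate_trial(n, a, b, direct, seed=0):
    """Simulate a two-arm trial with one mediator (hypothetical linear model).

    a      : effect of treatment on the mediator (e.g., a depression score)
    b      : effect of the mediator on the medical outcome
    direct : effect of treatment on the outcome via other pathways
    """
    rng = random.Random(seed)
    arms = [0, 1] * (n // 2)   # balanced allocation: 0 = control, 1 = treated
    rng.shuffle(arms)          # random order of assignment
    rows = []
    for x in arms:
        m = a * x + rng.gauss(0, 1)
        y = b * m + direct * x + rng.gauss(0, 1)
        rows.append((x, m, y))
    return rows


def mediation_paths(rows):
    """Estimate the a, b, direct, and total effects by ordinary least squares."""
    n = len(rows)
    mean_x = sum(r[0] for r in rows) / n
    mean_m = sum(r[1] for r in rows) / n
    mean_y = sum(r[2] for r in rows) / n
    sxx = sum((r[0] - mean_x) ** 2 for r in rows)
    smm = sum((r[1] - mean_m) ** 2 for r in rows)
    sxm = sum((r[0] - mean_x) * (r[1] - mean_m) for r in rows)
    sxy = sum((r[0] - mean_x) * (r[2] - mean_y) for r in rows)
    smy = sum((r[1] - mean_m) * (r[2] - mean_y) for r in rows)
    det = sxx * smm - sxm ** 2
    return {
        "a": sxm / sxx,                              # treatment -> mediator
        "b": (smy * sxx - sxy * sxm) / det,          # mediator -> outcome, given arm
        "direct": (sxy * smm - smy * sxm) / det,     # treatment -> outcome, given mediator
        "total": sxy / sxx,                          # treatment -> outcome overall
    }


if __name__ == "__main__":
    # Hypothetical scenarios: a weak versus a strong effect on the mediator.
    for label, a in (("weak", -0.2), ("strong", -1.0)):
        p = mediation_paths(simulate_trial(2000, a=a, b=0.3, direct=0.0, seed=1))
        print(label, {k: round(v, 3) for k, v in p.items()},
              "indirect:", round(p["a"] * p["b"], 3))
```

With the same mediator-to-outcome path, the scenario with the weaker effect on the mediator yields a proportionally smaller indirect effect, which is the difficulty the ENRICHD discussion describes. A real mediation analysis would also need confidence intervals for the indirect effect and would have to address confounding of the mediator-outcome relationship, neither of which this sketch attempts.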
3.5 Statistical Power and Trial Design

As noted previously, the effect size of an intervention depends not only on the intervention
K.E. Freedland et al.
itself but also on the condition to which it is compared (Mohr et al, 2009). A corollary of this fact is that trial design also affects statistical power. For example, if an RCT is designed to test the efficacy of an intervention in which positive outcome expectations are likely to be associated with better outcomes (as is the case with many psychosocial interventions), then for any given sample size, the statistical power of the primary hypothesis test will be lower if the RCT includes a placebo control group than if the control group receives no treatment. Comparative efficacy and effectiveness trials tend to require relatively large samples because they are usually designed to compare two active interventions to one another. In some trials, however, two different active interventions are compared to a control condition such as a placebo or usual care. In the DPP trial, both of the active treatments reduced the incidence of diabetes, in comparison to the placebo. Furthermore, the trial was large enough to compare the two active interventions to one another, and the results showed that the lifestyle intervention was superior to metformin (Knowler et al, 2002). Smaller trials with similar designs may have sufficient power to compare both active interventions to the control condition, but not to compare the active treatments to each other. This was the case, for example, in our recent trial of treatments for depression in patients with a recent history of CABG surgery. There was adequate power to compare the cognitive-behavioral and stress management interventions to the EUC control condition, but not enough to compare the interventions to each other (Freedland et al, 2009). In such a trial, even if Treatment A has a bigger effect than Treatment B vis-à-vis the control condition, it is not clear whether Treatment A is superior to Treatment B. Consequently, there will be some uncertainty about the clinical implications of the results. 
It could be argued that Treatment A is the more efficacious intervention, yet one cannot say with certainty that Treatment A is more efficacious than Treatment B. Thus, whenever possible, these sorts of trials should be powered to support comparisons
among all of the groups, not just between the active intervention and control arms.
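A rough calculation shows why comparing two active treatments demands a larger sample than comparing either one with a control. The sketch below uses the standard normal-approximation formula for a two-sided comparison of two means, n per arm = 2(z_{1-α/2} + z_{power})² / d², where d is the standardized difference. The function name `n_per_arm` and the effect sizes are hypothetical values chosen for illustration; they are not taken from any trial cited in this chapter.

```python
from math import ceil
from statistics import NormalDist


def n_per_arm(d, alpha=0.05, power=0.80):
    """Approximate sample size per arm for a two-sided, two-group comparison.

    d is the standardized mean difference between the two groups.
    Uses the normal approximation, so it slightly understates the
    sample size that a t-test-based calculation would give.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    return ceil(2 * (z_alpha + z_power) ** 2 / d ** 2)


if __name__ == "__main__":
    # Hypothetical standardized effects versus the control condition.
    d_a_vs_control = 0.5
    d_b_vs_control = 0.3
    d_a_vs_b = d_a_vs_control - d_b_vs_control  # difference between the actives

    print("A vs control:", n_per_arm(d_a_vs_control))
    print("B vs control:", n_per_arm(d_b_vs_control))
    print("A vs B:      ", n_per_arm(d_a_vs_b))
```

Because the required sample size grows with the inverse square of the effect size, a trial sized for the active-versus-control contrasts can be far too small for the active-versus-active contrast. The sketch ignores attrition and any adjustment for multiple comparisons, both of which would increase the numbers further.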
3.6 Falsification Research

The philosopher of science Karl Popper argued that a hypothesis is scientific only if it is falsifiable (Popper, 1972). Much of the research that is conducted in our field is designed to determine whether behavioral variables are risk factors for adverse medical outcomes. Much less research aims to falsify these hypotheses. For example, there is a strong association between depression and Type 2 diabetes (Anderson et al, 2001). This raises the question of whether treatment of depression can prevent diabetes. If so, that would lend support to the hypothesis that depression is a causal risk factor for diabetes. It is also possible that diabetes and depression are associated because diabetes causes depression. An RCT targeting glycemic control in diabetes, with depression as the primary outcome, would test this hypothesis. However, reciprocal causality is also possible. Consequently, even if diabetes treatment were shown to decrease depression, this would not literally falsify the hypothesis that depression causes diabetes. Nevertheless, if diabetes treatment were shown to prevent or improve depression, but depression treatment could not be shown to prevent or improve diabetes, then the “depression causes diabetes” hypothesis would essentially have been falsified. It would be almost impossible to design a clinical trial that could completely falsify any of behavioral medicine’s major hypotheses in one fell swoop. Falsification can emerge from a line of research, but it is unlikely to result from a single trial.
3.7 Mechanistic Research

Long after behavioral variables such as mental stress, anger, or anxiety have been established as risk factors for a disease or an adverse medical
outcome, questions tend to remain about the underlying mechanism(s). Observational studies and laboratory experiments comprise most of the mechanistic studies in behavioral medicine, but clinical trials also provide opportunities to test mechanistic hypotheses. If a candidate mechanism has been associated in cross-sectional research both with a behavioral risk factor and with an adverse medical outcome that the risk factor predicts, one of the next questions to ask is whether change in the risk factor is associated with change in the candidate mechanism. This can be accomplished by correlating the change scores derived from an observational study, but not if the risk factor tends to remain stable. By intervening in the risk factor, an investigator can induce change, or promote more rapid change than would otherwise occur, and thereby create the variability needed to correlate the risk factor and candidate mechanism change scores. This strategy utilizes an intervention to perturb a behavioral risk factor. The purpose of the study is not to test the efficacy of the intervention, but rather to use an intervention with established efficacy to manipulate the risk factor. There is no need to control for threats to any conclusions that might be drawn about the efficacy of the treatment, so an uncontrolled, “open label” trial is a satisfactory way to test this type of hypothesis. Including a control group would waste resources and needlessly expose additional subjects to experimentation. For example, Carney et al (2000) used CBT to treat a group of patients with stable coronary disease and comorbid depression in an uncontrolled trial. The aim was to induce change in depression, in order to correlate it with change in various indicators of cardiovascular autonomic dysregulation, including heart rate and heart rate variability. CBT was chosen because it was an efficacious, nonpharmacological treatment for depression. 
It was considered to be less likely than antidepressant medications to have direct, physiological effects on the cardiovascular variables of interest, i.e., effects that are not mediated by change in depression. Since CBT and antidepressants improve depression via different pathways, it might have
been informative to conduct a randomized comparison of CBT vs. an antidepressant. The primary aim would still have been to correlate change in depression with change in cardiovascular variables, but this design would make it possible to determine whether the type of treatment explained any of the variance in the cardiovascular change scores. This might have provided some new insights into the cardiovascular effects of antidepressants, but it would not have provided better data than the uncontrolled trial with regard to whether cardiovascular autonomic dysregulation is a plausible mechanism linking depression to cardiac events. It would also have been more expensive and would have exposed more subjects to experimentation. Here again, a trial design may appear at first glance to be rigorous by familiar methodological standards, yet not serve the purpose of the study as well as a seemingly less rigorous design. Mechanistic questions can also be investigated as ancillary studies within standard efficacy RCTs. Unlike the kinds of studies discussed above, these trials need control groups. The controls are justified by the primary (efficacy) aims of the trial, not by the mechanistic aims. If the control group data are useful for ancillary mechanistic studies, all the better, but that is not why the control groups are included in these studies. Although there are not many examples in the behavioral medicine literature to draw upon, RCTs can also be used to falsify mechanistic hypotheses. For example, depression is known to increase the risk of Type 2 diabetes, but the mechanisms underlying this risk have not been firmly established. Glucose dysregulation is one of the leading candidates. However, it is also possible that glucose dysregulation causes depression. If the causal arrow runs in this direction, then glucose dysregulation may be a “third variable” cause of both depression and diabetes.
One way to study this question would be to randomly assign a sample of depressed, pre-diabetic individuals to treatment with either metformin or a selective serotonin reuptake inhibitor. If the antidepressant failed to improve glucose control despite improving depression, and if the metformin improved depression by
improving glucose control, this would help to falsify the hypothesis that depression causes diabetes by causing glucose dysregulation. This would be a two-arm comparison of two active treatments, with no placebo control arm. A placebo could have mild effects on both depression and glycemic control, so the inclusion of a placebo control arm would probably not be very informative.
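The change-score strategy described earlier in this section, in which an intervention is used to perturb a risk factor so that its change can be correlated with change in a candidate mechanism, can be sketched in a few lines. This is a minimal illustration under stated assumptions: the helper names (`change_scores`, `pearson_r`) and all of the numbers are invented for the example and do not come from Carney et al (2000) or any other study.

```python
from math import sqrt


def change_scores(pre, post):
    """Return post-minus-pre change scores for paired measurements."""
    if len(pre) != len(post):
        raise ValueError("pre and post must have the same length")
    return [after - before for before, after in zip(pre, post)]


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / sqrt(sxx * syy)


if __name__ == "__main__":
    # Hypothetical data for six patients in an uncontrolled trial.
    depression_pre = [24, 19, 28, 21, 30, 17]    # symptom score before treatment
    depression_post = [12, 15, 14, 20, 11, 16]   # symptom score after treatment
    hrv_pre = [30, 35, 28, 33, 26, 36]           # heart rate variability index
    hrv_post = [38, 37, 36, 33, 39, 37]

    d_dep = change_scores(depression_pre, depression_post)
    d_hrv = change_scores(hrv_pre, hrv_post)
    print("r between change scores:", round(pearson_r(d_dep, d_hrv), 2))
```

If the risk factor barely changes, the change scores have little variance and the correlation is uninformative, which is why the intervention is needed to create variability. Raw change scores are used here for simplicity; they are sensitive to measurement error and regression to the mean, so an actual analysis might prefer residualized change or a regression model.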
4 Summary

Clinical trial designs that are suitable for a given purpose, population, or setting may not be suitable for others. The design of a trial should match its purpose, even if this means challenging some common assumptions about what a rigorous trial should look like. For example, a trial whose primary purpose is to test a mechanistic hypothesis should not necessarily resemble a treatment efficacy trial. Many behavioral medicine researchers conduct their trials in medical care settings and enroll participants who either already have or are at risk for developing medical illnesses. The pervasive influence of usual care in these settings is a salient consideration for trial designers. It is especially important to take usual care into account when choosing control groups. Careful consideration of these issues could help to resolve many of the controversies that surround the design of clinical trials in behavioral medicine, including disagreements about control conditions. Greater agreement about trial design issues among all interested parties (including investigators, grant reviewers, and institutional review boards) would help to foster the growth of clinical trials in behavioral medicine.
References

Anderson, R. J., Freedland, K. E., Clouse, R. E., and Lustman, P. J. (2001). The prevalence of comorbid depression in adults with diabetes: a meta-analysis. Diabetes Care, 24, 1069–1078.
Beck, J. S. (1995). Cognitive Therapy: Basics and Beyond. New York: Guilford. Berkman, L. F., Blumenthal, J., Burg, M., Carney, R. M., Catellier, D. et al (2003). Effects of treating depression and low perceived social support on clinical events after myocardial infarction: the Enhancing Recovery in Coronary Heart Disease Patients (ENRICHD) Randomized Trial. JAMA, 289, 3106–3116. Burns, T. (2009). End of the road for treatment-as-usual studies? Br J Psychiatry, 195, 5–6. Campbell, D. T., and Stanley, J. C. (1966). Experimental and Quasi-experimental Designs for Research. Chicago: R. McNally. Carney, R. M., Freedland, K. E., Stein, P. K., Skala, J. A., Hoffman, P. et al (2000). Change in heart rate and heart rate variability during treatment for depression in patients with coronary heart disease. Psychosom Med, 62, 639–647. CAST Investigators (1989). Preliminary report: effect of encainide and flecainide on mortality in a randomized trial of arrhythmia suppression after myocardial infarction. The Cardiac Arrhythmia Suppression Trial (CAST) Investigators. New Engl J Med, 321, 406–412. Elwood, J. M. (2007). Critical Appraisal of Epidemiological Studies and Clinical Trials, 3rd Ed. Oxford: Oxford University Press. Fleming, T. R., and DeMets, D. L. (1996). Surrogate end points in clinical trials: are we being misled? Ann Intern Med, 125, 605–613. Frasure-Smith, N., Koszycki, D., Swenson, J. R., Baker, B., van Zyl, L. T. et al (2006). Design and rationale for a randomized, controlled trial of interpersonal psychotherapy and citalopram for depression in coronary artery disease (CREATE). Psychosom Med, 68, 87–93. Freedland, K. E., Skala, J. A., Carney, R. M., Rubin, E. H., Lustman, P. J. et al (2009). Treatment of depression after coronary bypass surgery: a randomized, controlled trial. Arch Gen Psychiatry, 66(4), 387–396. Friedman, L. M., Furberg, C., and DeMets, D. L. (1998). Fundamentals of Clinical Trials, 3rd Ed. New York: Springer. Glasgow, R. E., Davidson, K. 
W., Dobkin, P. L., Ockene, J., and Spring, B. (2006). Practical behavioral trials to advance evidence-based behavioral medicine. Ann Behav Med, 31, 5–13. Glasgow, R. E., and Emmons, K. M. (2007). How can we increase translation of research into practice? Types of evidence needed. Ann Rev Public Health, 28, 413–433. Kazdin, A. E. (2003a). Methodological Issues and Strategies in Clinical Research, 3rd Ed. Washington, DC: American Psychological Association. Kazdin, A. E. (2003b). Research Design in Clinical Psychology, 4th Ed. Boston: Allyn and Bacon. Knowler, W. C., Barrett-Connor, E., Fowler, S. E., Hamman, R. F., Lachin, J. M. et al (2002). Reduction in the incidence of type 2 diabetes with lifestyle
intervention or metformin. New Engl J Med, 346, 393–403. Lesperance, F., Frasure-Smith, N., Koszycki, D., Laliberte, M. A., van Zyl, L. T. et al (2007). Effects of citalopram and interpersonal psychotherapy on depression in patients with coronary artery disease: the Canadian Cardiac Randomized Evaluation of Antidepressant and Psychotherapy Efficacy (CREATE) trial. JAMA, 297, 367–379. Minneci, P. C., Eichacker, P. Q., Danner, R. L., Banks, S. M., Natanson, C. et al (2008). The importance of usual care control groups for safety monitoring and validity during critical care research. Int Care Med, 34, 942–947. Mohr, D. C., Spring, B., Freedland, K. E., Beckner, V., Arean, P. et al (2009). The selection and design of control conditions for randomized controlled trials of psychological interventions. Psychother Psychosom, 78, 275–284. Piantadosi, S. (2005). Clinical Trials: A Methodologic Perspective, 2nd Ed. Hoboken, NJ: Wiley-Interscience. Popper, K. R. (1972). The Logic of Scientific Discovery, 6th impression revised Ed. London: Hutchinson. Romero, C., Friedman, L. C., Kalidas, M., Elledge, R., Chang, J. et al (2006). Self-forgiveness, spirituality, and psychological adjustment in women with breast cancer. J Behav Med, 29, 29–36. Schmittdiel, J. A., Uratsu, C. S., Karter, A. J., Heisler, M., Subramanian, U. et al (2008). Why don’t diabetes patients achieve recommended risk factor targets? Poor adherence versus lack of treatment intensification. J Gen Intern Med, 23, 588–594. Selby, J. V., Uratsu, C. S., Fireman, B., Schmittdiel, J. A., Peng, T. et al (2009). Treatment intensification and risk factor control: toward more clinically relevant quality measures. Med Care, 47, 395–402. Street, L. L., and Luoma, J. B. (2002). Control groups in psychosocial intervention research: ethical and methodological issues. Ethics Behav, 12, 1–30. Thompson, B. T., and Schoenfeld, D. (2007). Usual care as the control group in clinical trials of nonpharmacologic interventions.
Proc Am Thorac Soc, 4, 577–582. Tsang, R., Colley, L., and Lynd, L. D. (2009). Inadequate statistical power to detect clinically significant differences in adverse event rates in randomized controlled trials. J Clin Epidemiol, 62, 609–616. Tunis, S. R., Stryer, D. B., and Clancy, C. M. (2003). Practical clinical trials: increasing the value of clinical research for decision making in clinical and health policy. JAMA, 290, 1624–1632. Werner, R. M., Goldman, L. E., and Dudley, R. A. (2008). Comparison of change in quality of care between safety-net and non-safety-net hospitals. JAMA, 299, 2180–2187. Wolfe, D. A., Wekerle, C., Scott, K., Straatman, A. L., Grasley, C. et al (2003). Dating violence prevention with at-risk youth: a controlled outcome evaluation. J Consult Clin Psychol, 71, 279–291.
Chapter 60
Methodological Issues in Randomized Controlled Trials for the Treatment of Psychiatric Comorbidity in Medical Illness

David C. Mohr, Sarah W. Kinsinger, and Jenna Duffecy
1 Why Do We Need RCTs for Treatments for Psychiatric Disorders in Medical Patients?

An enormous literature has validated the use of a variety of psychological and behavioral treatments for many common psychiatric disorders, including the mood and anxiety disorders (Cuijpers et al, 2008) that commonly afflict people with medical illnesses. Presumably many of these trials enroll participants who are reasonably representative of the general public, which includes many people with many medical illnesses. So why do we need randomized controlled trials (RCTs) of psychological and behavioral treatments for psychiatric disorders in patients with medical illness? Why not accept the findings of RCTs conducted in the general public? There are at least three broad answers to that question: First, RCTs of psychological and behavioral treatments are needed when there are questions of whether a validated treatment generalizes from one population to another. Second, treatments are sometimes altered to meet the specific needs of a patient population. For example, treatments may include components to manage symptoms of the medical illness, such as pain or fatigue, or may be delivered differently to
D.C. Mohr, Department of Preventive Medicine, Northwestern University, Feinberg School of Medicine, 680 N. Lakeshore Drive, Suite 1220, Chicago, IL 60611, USA; e-mail: [email protected]
accommodate disabilities. These alterations may change the efficacy of the treatment. Finally, we will propose that trials are needed in part because our ability to accurately diagnose psychiatric disorders is diminished by medical illness. We will argue that when outcomes in RCTs of validated psychological and behavioral treatments are substantially smaller in medical populations than in non-medical populations, the problem may lie in our ability to accurately identify the psychiatric disorder in that population, rather than in the intervention. We will discuss the implications of this for RCT design.
2 The Influences of Medical Illness on Psychological Functioning

2.1 Occurrence of Psychiatric Disorders in Medical Populations

The literature on the relationship between medical illness and psychiatric disorders is mixed. By far the largest literature on comorbid psychiatric problems and medical illness has focused on depressive symptoms and disorders. Chronic medical conditions have generally been associated with increased prevalence of depression and anxiety (Scott et al, 2007). Rates of depression have been shown to be higher among patients with coronary artery disease, particularly following myocardial infarction and stroke (Connerney et al, 2001; Hackett et al, 2005), chronic obstructive pulmonary disease (COPD)
A. Steptoe (ed.), Handbook of Behavioral Medicine, DOI 10.1007/978-0-387-09488-5_60, © Springer Science+Business Media, LLC 2010
(Yohannes et al, 2000), autoimmune diseases such as multiple sclerosis (Patten et al, 2003) and inflammatory bowel disease (IBD) (Graff et al, 2009), rheumatoid arthritis (Dickens et al, 2002), hepatitis C (Carta et al, 2007), and sickle cell disease (Levenson et al, 2008), just to name a few. However, not all medical conditions are consistently associated with increased psychiatric distress. For example, while rates of depression and anxiety are high among patients with traumatic conditions such as limb amputations or spinal cord injuries up to 2 years post-amputation, these rates fall to general population levels once the condition stabilizes (Horgan and MacLachlan, 2004). Among illnesses that are not necessarily stable, the prevalence of psychiatric disorders may vary with the amount of disease activity. For example, patients whose epilepsy is uncontrolled are at greater risk for depression, while prevalence among patients with controlled epilepsy is similar to that in the general population (Kanner, 2003). However, even when the prevalence of diagnosable psychiatric disorders is not elevated, this does not mean that there is no distress. For example, among diabetes patients, rates of diagnosable depression have been shown to be similar to those in the general population; however, subthreshold depressive symptoms remain elevated (Fisher et al, 2007). These patterns of variable relationships between psychiatric illness, severity of psychiatric symptoms, and medical illness are also seen within disease groups. There are fairly consistent findings that rates of depression are higher among cancer patients generally. However, rates are particularly high among some cancer populations (oropharyngeal, pancreatic, breast, lung) and lower in others (colon, gynecological, lymphoma) (Massie, 2004). Psychiatric disorders are also more prevalent in the first year or two after diagnosis, but their prevalence becomes equivalent to that in the general population thereafter (Stanton, 2006).
Although rates of diagnosable psychiatric problems decline, patients may nevertheless continue to experience adjustment difficulties. There is a substantial literature indicating that many cancer patients experience long-term psychosocial difficulties, including impaired quality of life and
disease-specific concerns (e.g., fears of recurrence, body image concerns, sexual dysfunction) (Stanton, 2006). The increased rates of psychiatric disorders among medical populations may be attributable to either psychosocial or biological factors. Arguments for the role of psychosocial factors in producing psychiatric symptoms generally fit into a diathesis-stress model (Banks and Kerns, 1996), in which stress triggers a psychiatric disorder among a subset of the population with a specific illness who also carry specific genetic, biological, psychological, or social vulnerabilities. Perceived threats or losses in health, wellbeing, and social functioning coupled with the belief that these threats or losses cannot be effectively managed or controlled can increase risk of psychological distress in vulnerable individuals. The decline in prevalence of psychiatric disorders following initial adjustment to a medical problem may be related to patients’ capacities to adjust to adverse events and circumstances such as disability or threat of recurrence (Burgess et al, 2005; Horgan and MacLachlan, 2004; Kanner, 2003). On the other hand, other disease-related symptoms, most notably pain, are more difficult to adjust to and may constitute an ongoing stressor that triggers increased psychological distress (Banks and Kerns, 1996). Psychiatric symptoms may also be directly related to the disease processes. This can occur in at least two ways. Increased psychiatric symptoms can result from anatomical pathology related to the medical disorder. For example, illnesses that result in the destruction of brain tissue, such as cerebrovascular disease or multiple sclerosis, can produce lesions in regions of the brain that regulate emotion and behavior, resulting in increased risk of psychological and behavioral symptoms (Fang and Cheng, 2009; Feinstein et al, 2004).
To the degree that the anatomical pathology is chronic, the vulnerability to psychological and behavioral symptoms may also be permanent. Alternatively, disease processes or the pathogenesis of medical illness may produce psychiatric symptoms. For example, inflammation-associated injury, such as spinal cord injury, can produce psychological and behavioral symptoms of depression and
anxiety that subside as inflammation decreases (Riegger et al, 2009). Alternatively, inflammatory diseases such as multiple sclerosis (MS) and IBD are characterized by periodic increases in inflammation that may result in symptoms of depression and anxiety (Gold and Irwin, 2006; Rosenkranz, 2007). Likewise, neuroendocrine dysregulation associated with medical diseases such as cardiovascular disease could produce depressive symptoms (Joynt et al, 2003). In many cases these pathogenic causes may exert episodic and time-limited influences on psychiatric symptoms. Admittedly, the conceptual distinctions between psychosocial and biological or between anatomical pathology and pathogenesis may be somewhat blurred in reality. That is, inflammation may also increase pain, which in turn may contribute to psychological distress. Damage to central nervous system tissue may produce both permanent damage and inflammation that varies over time. Also, as noted in Chapters 44 and 45, psychiatric symptoms can cause changes in neuroendocrine and immune function. Despite the overlapping and recursive nature of these relationships, it is useful to consider these sources independently when considering the effects of medical illness on RCTs for psychiatric disorders.
2.2 Identifying Psychiatric Disorders in Medical Populations

A critical component of an RCT is selecting a sample of patients with the disorder or problem that the intervention is intended to treat. A sample that is well defined with respect to the target problem will support a good test of the experimental treatment. A sample that is poorly defined, containing misdiagnoses or false positives, is less likely to provide an accurate test of the intervention. Much of medicine has improved diagnostic validity and reliability through the use of laboratory tests, imaging, and other technologies. In contrast, psychiatric diagnosis continues to rely on clusters of symptoms that are often relatively non-specific and
which, even among patients without medical illness, produce groups that are likely heterogeneous in terms of the underlying etiology of the symptoms used in diagnosis. This heterogeneity can decrease power in RCTs, since those who are false positives may be less likely to respond to the treatments (Cipriani et al, 2009). To the degree that medical illness creates symptoms that mimic, but are unrelated to, psychiatric disorders, the problem of heterogeneity is only aggravated. One example is the symptoms used to diagnose post-stroke depression, which can have a variety of etiologies including brain tissue damage, inflammation, and psychosocial stress-related reactions (Fang and Cheng, 2009). The heterogeneity of etiological factors underlying symptoms of post-stroke depression is further evidenced by the frequent lack of responsiveness to antidepressant medications (Hackett et al, 2008). Thus, among patients with medical illness and presumed comorbid psychiatric diagnoses, the symptoms used to diagnose psychiatric illness may be caused by the medical illness rather than the psychiatric disorder. Complicating the matter, there is no reason why any individual symptom must have a single etiology. For example, fatigue is a common symptom in MS experienced by 65–97% of all patients (Bakshi, 2003). Depression is also common, with 15–26% of patients experiencing a major depressive episode in a 12-month period (Patten et al, 2003) and nearly 50% experiencing significant symptoms of depression (Chwastiak et al, 2002). Thus, for many patients with MS, fatigue may be multiply determined. Simply understanding whether or not the symptom is caused by the medical illness may not necessarily be of assistance in determining whether or not it should be counted as a symptom for a psychiatric assessment.
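The loss of power from false positives can be illustrated with a rough calculation. The sketch below is not taken from the chapter: it assumes that false positives show no treatment response at all, that outcome variance is unchanged by their presence, and that the usual normal-approximation formula for a two-arm comparison of means applies. The function names and the example effect size of 0.5 are hypothetical.

```python
from math import ceil
from statistics import NormalDist


def diluted_effect(d_true, false_positive_rate):
    """Standardized effect seen in a sample where a fraction of patients are
    false positives. Assumption: false positives do not respond at all and
    outcome variance is unchanged."""
    return d_true * (1 - false_positive_rate)


def n_per_arm(d, alpha=0.05, power=0.80):
    """Approximate sample size per arm for a two-sided, two-arm comparison of
    means using the normal approximation: n = 2 * (z_alpha/2 + z_beta)^2 / d^2."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return ceil(2 * (z_alpha + z_beta) ** 2 / d ** 2)


if __name__ == "__main__":
    d_true = 0.5  # hypothetical effect among true cases
    for fp_rate in (0.0, 0.1, 0.3):
        d_obs = diluted_effect(d_true, fp_rate)
        print(f"false positives {fp_rate:.0%}: d = {d_obs:.2f}, "
              f"n per arm = {n_per_arm(d_obs)}")
```

Under these assumptions the required sample grows with the inverse square of the diluted effect, so even a modest share of misdiagnosed patients roughly doubles the sample needed.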
2.3 Measurement Issues Specific to Medical Populations
The potential complications of confounded symptoms emerge in RCTs in the assessment of the psychiatric disorder. Numerous symptoms of
psychiatric disorders may be confounded with medical illness, such as fatigue, changes in sleep and appetite, agitation and/or tremors, sweating, gastrointestinal symptoms, and neuropsychological changes (Koenig et al, 1997; Mohr et al, 1997). Thus, assessments of psychiatric disorders in populations or samples with medical illnesses may produce false positives, both on symptoms and diagnoses, and may result in elevated levels of symptom severity. Several methods have been proposed to handle the problem of confounded symptoms (Cohen-Cole and Harpe, 1987). The etiologic approach requires the evaluator to determine the etiology of the symptom. This can be effective, but requires a high level of expertise from the evaluators (Koenig et al, 1995). The exclusive approach simply excludes those symptoms that are confounded. This is reliable, as the decision can be made a priori and not on a case-by-case basis, but it requires making a diagnostic determination based on a smaller number of symptoms, which may reduce validity. The substitutive approach substitutes non-confounded symptoms for confounded ones. For example, social withdrawal might be substituted for fatigue in the diagnosis of major depressive episode. This can also be reliable, but it is unclear if modified diagnostic criteria will identify the same patients as the original categories. Finally, the inclusive approach includes any symptom, without consideration of its potential etiology. While these different approaches likely identify different groups of people as meeting diagnostic criteria, no method is necessarily superior to any other (Koenig et al, 1997). Not surprisingly, inclusive approaches are among the most reliable, since there is no determination, either by the evaluator or a priori by the investigator, as to whether a psychiatric symptom is confounded with a medical symptom or not. However, inclusive approaches produce high false-positive rates. 
Exclusive approaches, on the other hand, may increase false negatives. When following patients longitudinally, as in an RCT, an approach that excludes symptoms based on etiology has been shown to be the most sensitive to change, although it likely produces
D.C. Mohr et al.
false negatives (Koenig et al, 1997). Etiologic approaches are a middle road. However, etiologic approaches require a high level of skill on the part of evaluating clinicians in making such determinations (Koenig et al, 1995), and thus may be beyond the budgets of many RCTs. The relationship between medical illness and psychiatric symptoms is complex and has important implications for RCTs. Psychiatric and medical symptoms often overlap and symptom etiology is not always clear. These comorbidities can interfere with accurate detection of psychiatric disorders, thereby increasing heterogeneity and false-positive rates in an RCT sample. We encourage researchers to consider the ways in which these overlapping symptoms can influence recruitment and assessment when designing RCTs of psychological and behavioral treatments.
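The four approaches to confounded symptoms described above can be contrasted in a small sketch. It is only an illustration under stated assumptions: it uses a simplified version of the familiar rule for a major depressive episode (at least five of nine symptoms, one of which is depressed mood or loss of interest), treats fatigue as the only confounded symptom, uses social withdrawal as its substitute as in the example in the text, and keeps the threshold of five unchanged under every approach. The symptom labels, function names, and example patient are hypothetical.

```python
# Nine symptom domains of a major depressive episode (simplified labels).
STANDARD = frozenset({
    "depressed_mood", "anhedonia", "appetite_change", "sleep_change",
    "psychomotor_change", "fatigue", "worthlessness", "concentration",
    "suicidal_ideation",
})
CORE = frozenset({"depressed_mood", "anhedonia"})

# Assumption for illustration: fatigue is the only confounded symptom and
# social withdrawal is its substitute.
SUBSTITUTES = {"fatigue": "social_withdrawal"}
CONFOUNDED = frozenset(SUBSTITUTES)


def counted_symptoms(endorsed, approach, attributed_to_illness=frozenset()):
    """Return the symptoms that count toward diagnosis under one approach."""
    standard = set(endorsed) & STANDARD
    if approach == "inclusive":      # count everything, whatever its cause
        return standard
    if approach == "exclusive":      # drop confounded symptoms a priori
        return standard - CONFOUNDED
    if approach == "substitutive":   # count a substitute in place of each confounded symptom
        kept = standard - CONFOUNDED
        swapped = {orig for orig, sub in SUBSTITUTES.items() if sub in endorsed}
        return kept | swapped
    if approach == "etiologic":      # drop symptoms the evaluator attributes to illness
        return standard - set(attributed_to_illness)
    raise ValueError(f"unknown approach: {approach}")


def meets_criteria(symptoms, threshold=5):
    """Simplified rule: enough symptoms, including at least one core symptom."""
    return len(symptoms) >= threshold and bool(symptoms & CORE)


if __name__ == "__main__":
    patient = {"depressed_mood", "sleep_change", "appetite_change",
               "concentration", "fatigue"}
    for approach in ("inclusive", "exclusive", "substitutive"):
        print(approach, meets_criteria(counted_symptoms(patient, approach)))
    # The etiologic approach depends on a case-by-case clinical judgment.
    print("etiologic", meets_criteria(
        counted_symptoms(patient, "etiologic", attributed_to_illness={"fatigue"})))
```

The same hypothetical patient meets criteria under the inclusive approach but not under the others, which is the sense in which the approaches identify different groups of people.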
3 Effects of Medical Illness and Environmental Factors on Psychiatric Symptoms Longitudinally: Implications for RCTs
Much of the discussion above has focused mainly on the impact of medical illness on psychiatric disorders at a single point in time. However, both medical illness and psychiatric comorbidities can change over time, as can the relationship between the two. The complexity of these relationships can make designing an RCT with medically ill populations particularly challenging. On the medical side of the equation, the pathological and pathogenic features of an illness can change over time or exert a consistent influence on psychiatric symptoms. Treatment of the medical illness can also change over time and influence psychiatric symptoms. On the psychiatric comorbidity side of the equation, the natural history of the disorder can change – indeed many psychiatric problems improve to some degree without treatment. A person’s ability to adapt to the symptoms, problems, and
sequelae of long-term medical illness can also change over time. Environmental factors complicate the picture even further. Environmental factors specific to medical patients can influence psychiatric symptoms during the course of an RCT. For example, patients with chronic medical conditions are treated in medical centers where their psychiatric symptoms are more likely to be identified and treated (Harman et al, 2005), which can produce effects that compete with the treatment effect under scrutiny in the trial. Thus, to understand the relationship between medical illness and psychiatric disorders in an RCT, one must conceptualize these variables as processes that occur and interact with each other over time, and not just as fixed, unchanging constructs.
3.1 Interactions Between Medical Illness and Psychiatric Symptoms Longitudinally
The interaction between medical illness and psychiatric disorder over the course of an RCT is displayed graphically in Fig. 60.1. The relationship between medical illness and psychiatric disorder at baseline is depicted as arrow (a). As described above, this relationship includes pathological and pathogenic processes related to the medical illness that produce symptoms that aggravate or mimic the symptoms of psychiatric illness, as well as psychological reactions related to symptoms and adjustment. Below we will describe many potential relationships between medical illness and psychiatric disorders that can occur over time. Each potential relationship is represented by an arrow. As noted above, many medical illnesses produce symptoms of psychiatric disorders. In some cases these may be comparatively stable symptoms, at least over the course of a 2–4 month trial of a psychological or behavioral treatment. This is depicted in arrow (c). In other words, the medical illness might be expected to exert a fairly consistent effect on the
psychiatric disorder (and measured outcome) over the course of the trial. For example, among post-stroke patients, depression does not change appreciably in the first weeks and months following stroke, and response to psychological and behavioral treatment is very modest at best (Hackett et al, 2008). The persistence of depressive symptoms and their apparent resistance to treatments that are known to work in other populations suggests that these symptoms are in part driven by pathologic or pathogenic features of the medical disease. The relatively small effect produced by psychological and behavioral treatments indicates that very large samples would be required for trials to be adequately powered. Trials sometimes select time points at which depression is likely to be most prevalent or most severe. This can occur inadvertently or by design. For example, selecting patients when the psychiatric disorder is at its worst often happens inadvertently, since people often seek treatment when symptoms have worsened. Randomization with an appropriate control arm should control for any improvement that occurs as part of the natural course of the illness (Mohr et al, 2009). A problem particular to trials in behavioral medicine occurs when the heightened psychiatric symptoms are due to medical illness, as illustrated by arrow (b). In this case, the baseline medical condition affects not only the baseline psychiatric symptoms (a), but also symptoms of depression later in the course of the RCT (b). Examples include trials that examine interventions for depression following myocardial infarction (MI). There is considerable interest in post-MI depression, given the strong relationships between depression and both mortality and morbidity in this population (Carney et al, 2002). This has prompted numerous trials examining treatment for depression, most of which have produced small or even negligible effect sizes (Thombs et al, 2008).
Fig. 60.1 Temporal relationship between medical factors, psychiatric disorder, and experimental treatments (solid lines indicate effects increasing psychiatric symptoms; dashed lines indicate effects decreasing psychiatric symptoms). The diagram connects Medical Factors (illness, treatments, environment), Psychiatric Disorder, and Experimental Treatment Effect through arrows (a) to (g) across three time points: Randomization, End-of-Tx, and Follow-up.
(a) Relationship between medical factors and psychiatric outcome at baseline.
(b) Medical factors exert a relatively constant effect over time on psychiatric disorder over the course of the RCT.
(c) Changes in the course of medical illness may affect the course of psychiatric disorder.
(d) Medical disease events (e.g., exacerbations, relapses, etc.) can impact psychiatric disorder.
(e) Medical factors can impact maintenance of gains in psychiatric symptoms during the post-treatment follow-up period.
(f) Experimental treatment is expected to have an effect on the psychiatric disorder during the treatment period.
(g) Experimental treatment is expected to have an effect on the psychiatric disorder after treatment cessation.
Part of the reason for the failure of these trials may lie in the effect of heart disease on depressive symptoms. While depression is very common immediately following MI, spontaneous remission occurs frequently, with nearly half of all depressions remitting within a year following the MI (Hance et al, 1996). This change in depressive symptoms may be in part due to changes in pathogenic factors of cardiovascular disease such as neuroendocrine dysregulation or inflammation that may cause symptoms similar to depression (Joynt et al, 2003). Improvement in these factors in the months following MI may lead to decreasing depressive symptoms. If a substantial number of patients in a trial experience improvement in measures of psychiatric outcomes that are the result of improvements in medical conditions (e.g., reduced depression resulting from lower inflammation), the trial will require a larger sample size to be powered to detect a difference. Thus, the prognosis at baseline for the psychiatric symptoms, and for the medical factors that may drive depression, should be taken into account when designing clinical trials. Medical illnesses also may have pathogenic and clinical features that are episodic or
relapsing remitting. For many such illnesses, including MS, IBD, sickle cell, and others, these relapses are associated with significant increases in psychiatric symptoms such as depression and anxiety (Dalos et al, 1983; Graff et al, 2009; Levenson et al, 2008), which can result in a unique set of disease-related effects on psychiatric outcomes in an RCT (see arrow (d)). MS is a good example of this phenomenon. Multiple sclerosis is in part an autoimmune disease in which many patients experience sudden exacerbations or increases in inflammation and symptoms that can last a period of weeks or months. During disease exacerbation, distress may be experienced by as many as 90% of patients (Dalos et al, 1983). Depression in MS may be due in part to the increased inflammation and cytokine production that are part of the pathogenesis of multiple sclerosis (Gold and Irwin, 2006). Furthermore, these exacerbations are most commonly treated with high-dose infusions
of corticosteroids, which are known to produce side effects that include euphoria, and less commonly depression and psychosis (Lyons et al, 1988). Typically trials of treatments for psychiatric disorders among illnesses like MS exclude patients from enrollment while exacerbations are occurring to reduce the likelihood of spontaneous improvement in psychiatric symptoms resulting from resolution of the exacerbation (Mohr et al, 2005). This avoids problems illustrated above with arrow (c), in which changes in illness-driven psychiatric symptoms result in high rates of spontaneous remission. However, given that MS patients with relapsing forms of the disease may have an exacerbation every 1–2 years, 16.7–33.3% of all patients enrolled in a trial might be expected to experience an exacerbation during the course of a 16-week (roughly 4-month) intervention, resulting in MS-related increases in psychiatric symptoms. Assuming that an exacerbation may last 2 months, 8.3–16.7% of the sample could be in exacerbation at the time of the outcome assessment. The process of randomization should remove any bias in analyses comparing treatment arms. However, the potential influence of increased psychiatric symptoms resulting from exacerbation and inflammation could increase error variance, reduce variance associated with time by treatment effects, and thereby reduce a study’s power to detect treatment differences. Because waxing and waning symptoms that are potentially linked to the primary outcome could have an impact on power, it is advisable to consider these potential effects during study design, and adjust the sample size accordingly. If one is solely interested in the question of whether or not a treatment is efficacious, it may be useful to include the occurrence of disease exacerbations in the analytic model as a time-dependent covariate. Maintenance of gains is an increasingly important question in RCTs of psychological and behavioral treatments (Hollon et al, 2005).
The effects of medical illness on psychiatric outcomes may be different during post-treatment follow-up, compared to during treatment. For example, in MS, disease severity, level of
cognitive impairment (a common symptom of MS), and brain lesion volume are generally unrelated to the efficacy of treatments for depression (Mohr et al, 2003a). However, depression is significantly more likely to worsen during the first 6 months of post-treatment follow-up among patients with greater neuropsychological impairment and greater brain lesion volume. This suggests that treatment may buffer the negative effects that medical illness has on psychiatric symptoms while treatment is occurring, as illustrated by arrow (f). But once treatment is completed, the effects of the medical illness (arrow e) are no longer buffered by the treatments (arrow g), and the psychiatric symptoms can return. Including follow-up periods to examine maintenance of gains is particularly important in trials conducted with medically ill populations. It is also important to examine potential moderating effects of medical illness factors not only on treatment outcome, but also on maintenance of gains.
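The exacerbation percentages quoted earlier in this section follow from simple proportions, which can be sketched as below. The calculation assumes, as the text appears to, that a 16-week intervention is treated as about 4 months, that exacerbations occur at a uniform rate of one per 12 to 24 months, and that a patient has at most one exacerbation in the window. The function name and constants are hypothetical.

```python
def expected_fraction(window_months, interval_months):
    """Expected fraction of patients affected within a window, assuming one
    event per `interval_months` at a uniform rate and at most one event per
    patient in the window."""
    return min(1.0, window_months / interval_months)


INTERVENTION_MONTHS = 4   # assumption: 16 weeks treated as about 4 months
EXACERBATION_MONTHS = 2   # assumed duration of an exacerbation

if __name__ == "__main__":
    for interval in (12, 24):  # one exacerbation every 1 or 2 years
        during = expected_fraction(INTERVENTION_MONTHS, interval)
        at_outcome = expected_fraction(EXACERBATION_MONTHS, interval)
        print(f"every {interval} months: {during:.1%} exacerbate during "
              f"treatment, {at_outcome:.1%} in exacerbation at outcome")
```

Counting weeks exactly (16 of 52 or 104) would give slightly lower figures, so these values are best read as planning approximations when adjusting the sample size.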
3.2 The Influence of Environmental Factors on Psychiatric Symptoms
The environments of medical patients, compared to the environments of non-medical populations, may contain factors that have unique influences on their psychiatric symptoms. For example, many medical patients have frequent contact with medical providers. Given that most RCTs of psychological and behavioral treatments do not preclude pharmacotherapy for the target psychiatric problems, the increased potential for competing treatments may increase power requirements, compared to trials focused on populations that do not have frequent contact with medical clinics. On the other hand, many of the treatments for medical illness can produce psychiatric side effects. For example, medications such as interferon-alpha, beta blockers, or chemotherapies can increase the risk of depression (Ried et al, 2005; Russo and Fried, 2003; Wichers et al, 2006), while other medications such as levodopa and corticosteroids
can produce symptoms of mania and psychosis (Black and Friedman, 2006; Lyons et al, 1988). Similar to how features of the medical illness can have variable effects on psychiatric symptoms, so can treatments of the illness. These treatments can exert continuous or episodic influences on psychiatric symptoms, potentially influencing RCT outcomes. Changing patterns of contact with medical providers may also exert effects on psychological adjustment during an RCT. For example, the transition from active treatment to early survivorship (i.e., re-entry phase) can be a particularly distressing time for cancer patients (Stanton et al, 2005). This difficult adjustment period is thought to be in part due to the loss of a “safety net.” Patients typically have less frequent contact with health-care providers following active treatment and they might receive less support from family and friends as they transition back to their “normal” lives. Continued side effects from treatment (e.g., fatigue, menopause, sexual dysfunction, lymphedema) are often unexpected and can also contribute to this difficult transition. There is some evidence that most patients do not experience significant psychiatric distress during the re-entry phase (Costanzo et al, 2007) and only a subset of patients experience adjustment difficulties. However, preliminary evidence suggests that psychological and behavioral treatments aimed at facilitating adjustment during this phase can be beneficial, particularly for patients at high risk for adjustment difficulties (e.g., younger women with breast cancer) (Scheier et al, 2005). Just as characteristics of the disease should be considered when designing RCTs of psychological and behavioral treatments for medical patients, so should these environmental factors.
4 The Effects of Medical Illness on Access and Adherence to Psychological and Behavioral Treatments
RCTs of psychological interventions in patients with medical illnesses are plagued by problems
of generalizability resulting from who is enrolled and completes studies. RCTs of psychological and behavioral treatments often focus on samples drawn from narrow socioeconomic strata. For example, critiques of the oncology literature argue that ethnic minorities, men, patients with advanced cancer, and patients of lower socioeconomic status are underrepresented in RCTs of psychological and behavioral treatments in cancer (Helgeson, 2005). Samples may be further biased, as barriers to psychological and behavioral care can reduce access to treatments and treatment arms that require frequent clinic visits. Up to two-thirds of general primary care patients identify one or more barriers to attending psychological and behavioral treatments, and that rate rises to 75% among patients with depression (Mohr et al, 2006). While cost is certainly a barrier, other barriers include transportation problems, time constraints, interference from medical symptoms, and living too far from specialized care. These barriers are even more pronounced among medical patients, given that aspects of the illness likely aggravate these factors. For example, approximately 40% of individuals screened for a recent RCT examining treatment for depression following coronary artery bypass graft surgery were excluded due to transportation issues (Freedland et al, 2009). This is not uncommon even in well-executed trials such as this one. The resulting biases may both reduce the generalizability of findings to a broader population and limit the potential public health impact of such interventions. Over the past 15–20 years there has been a growing effort to develop and to evaluate treatments that overcome barriers to access, primarily by bringing the treatment to the patient. Some studies have examined extending psychological and behavioral care by providing home visits, for example in treating post-partum depression (Dennis, 2005) and distress in terminal cancer patients (Mohr et al, 2003b).
Increasingly, the telephone has been examined as a tool to deliver treatments to patients who have cancer, HIV, MS, are blind, are elderly, or are caregivers of disabled patients, just to name a few
patient populations. A recent meta-analysis suggests that telephone-delivered treatment results in rates of attrition that are much lower than those seen in face-to-face delivery (Mohr et al, 2008). More recently there has been a remarkable increase in investigations into internet-delivered treatments, which hold promise as cost-effective methods of delivering treatments (Spek et al, 2007; see Chapter 64). Advances in telecommunication are greatly increasing the capacity to bring treatments to patients. Evaluations of these trials have focused almost exclusively on efficacy, which may be appropriate for early-stage evaluations. However, many of these interventions are being developed for medical populations with the goal of overcoming barriers to care. Yet this goal remains largely untested. In the design of these trials, access to care remains an implicit rather than explicit goal that is not measured or analyzed. The use of different treatment delivery methods may decrease some barriers while creating entire classes of new barriers (Eysenbach, 2005). As these telemental health treatments begin to show initial efficacy, it will be important to make the goals of increasing reach and adherence explicit and to develop designs that can test these hypotheses directly.
5 Reconceptualizing RCTs of Psychological and Behavioral Treatments in Medical Populations to Include Prognosis
RCTs of psychological and behavioral treatments for psychiatric disorders in patients with medical disorders often produce effect sizes that compare quite favorably to those seen in trials with medically healthy populations (Freedland et al, 2009). However, not uncommonly the outcomes are more mixed, less robust than seen in healthy populations, or the treatments are simply ineffective (Hackett et al, 2008; Sheard and Maguire, 1999). A common
response in the field is to attempt to alter the treatment by tailoring it more specifically to the needs of the population. We will argue here that a second approach is to refine our diagnostic capabilities and incorporate the evaluation of prognostic indicators into RCTs. Much of the discussion in this chapter has involved problems that arise from difficulties in detecting and measuring symptoms of psychiatric disorders in medical populations. Diagnosis is used clinically to determine the utility of a treatment, and under good circumstances it provides some information about the differential prognoses associated with various treatment options. Hence the oft-quoted notion in psychiatry that diagnosis is prognosis (Goodwin and Guze, 1974). However, if our diagnostic system is producing a highly heterogeneous group, with numerous false positives who either do not respond to treatment or improve even without treatment, then that diagnosis is no longer providing much prognostic value. The identification and validation of prognostic indicators that can differentiate patients who are likely to respond to treatment from those who do not need treatment would substantially improve our ability to provide effective and efficient care. Randomized trials (using control arms or comparative outcome designs) should supply the evidence upon which this prognostic information is based. The RCT design has generally been conceptualized as a method of testing the efficacy or effectiveness of an intervention or therapy. However, when conducting an RCT of a validated intervention or therapy in a medical population, the conceptual clarity of the RCT methodology is muddied, largely due to the confounding factors discussed earlier. Therefore, when we conduct an RCT of a validated treatment in a medical population, we are really asking if that treatment, which is known to be effective in a psychiatric population without medical comorbidity, is also effective in the medical population.
There are two reasons why a validated treatment may not be effective under these circumstances. One reason is that factors related to the medical illness could interfere with the efficacy of the treatment. As
described earlier in the chapter, many features of medical illness and its treatment can threaten the validity of RCTs of psychological and behavioral treatments. That is, the patients in fact have the psychiatric disorder, but something is preventing the treatment from working properly. The other reason that a validated treatment might be less effective in a medical population is that the method of identifying patients is producing large numbers of false positives. In other words, when validated treatments fail to have results similar to those seen in a medically healthy population, the problem may not be the treatment – the treatment failure may reflect our inability to accurately diagnose the psychiatric disorder in that medical population. From this perspective, an RCT is required not only to test the effectiveness of an experimental treatment for a psychiatric disorder, but also to know if our diagnostic procedures are identifying the disorder for which the treatment is known to be effective. If we explicitly recognize that part of the reason for conducting RCTs of otherwise validated treatments in medical populations is because we are unable to accurately identify psychiatric problems in these populations, this has substantial implications for the design of RCTs. In other words, an RCT under these circumstances, to be of maximum benefit, has two broad aims. One is the question of whether the treatment works. The other is to reduce unexplained heterogeneity in the targeted samples. To address this second aim, the focus of RCTs would have to include methodologies that identify symptoms, features, or characteristics of patients that can be used to provide prognostic information as to who is likely to improve and who is likely not to improve, and reduce unexplained heterogeneity in the sample, with respect to the target psychiatric disorder. 
The design of such a prognostic trial would require careful consideration of which psychiatric symptoms are likely to remain unaffected by the medical illness, which symptoms may be confounded, and which measurable features of the medical illness predict non-response, and could potentially even define the mechanisms by which the confounding may occur. Such a design would
essentially use the active treatment, validated in non-medical populations, as a method of identifying prognostically useful features in the patient population. Once identified, the prognostic model would have to be tested, using the treatment response as the predictive criterion validity for the diagnostic model. Validation could only occur if the prognostic strategy were identified a priori. One argument against this thesis is that by identifying symptoms that are responsive to our treatments, we are confounding diagnosis and outcome in a way that could lead to a loss of diagnostic clarity. Traditionally, methodology in clinical research suggests that first a problem should be identified (diagnosis) and only afterwards can a solution to that problem (treatment) be developed. To identify symptoms that predict response to treatment is in effect developing a solution and then looking for a problem that it fixes. The drawback of using a traditional linear approach to validate psychological and behavioral treatments with medical patients is that it limits our ability to develop effective care strategies for populations who experience significant psychological and psychiatric difficulties and it is out of keeping with practices and standards that are currently the norm among clinical investigators in medicine. We would counter this argument for a linear approach with three points. First, the process of problem identification and evaluation of solutions is not nearly so linear in practice. Certainly the development of pharmaceutical therapies usually begins with the identification of a specific problem and the attempt to manufacture a pharmacological therapy that is safe and effective. But it is not uncommon for the target of a promising compound to change as more is learned about the effects of the agent. 
And the development of off-label alternative uses, even for problems that are not clearly diagnosable under the International Classification of Diseases, is also common. Second, we are not suggesting developing a new diagnosis. Rather, we are suggesting identifying prognostic factors that can augment the diagnosis in medical populations, thereby providing information that could
be very useful for clinicians and policymakers. Third, it is true that expecting a trial to validate a treatment, investigate predictors of response, and test a prognostic model would overburden any single RCT. But treatments are not validated by single trials; they are validated by programs of research and multiple trials. Likewise, no single study could validate both a diagnostic/prognostic strategy and a treatment. However, programs of research in which initial trials include methodological components that promote the identification of prognostic indicators, and later trials that validate those indicators, have the potential to move behavioral medicine forward in populations where validated treatments have been less effective than in medically healthy populations.
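One simple way to look for a prognostic indicator within a trial is to compare the treatment effect across strata defined by a candidate feature. The sketch below is a hedged illustration rather than a method proposed by the chapter: the counts are invented toy numbers, the candidate feature is unnamed and hypothetical, and a real analysis would specify the feature a priori and test the treatment-by-feature interaction with an appropriate statistical model.

```python
def risk_difference(stratum):
    """Treated response rate minus control response rate within one stratum.
    Each arm is given as (responders, total enrolled)."""
    (t_resp, t_n), (c_resp, c_n) = stratum["treated"], stratum["control"]
    return t_resp / t_n - c_resp / c_n


# Invented toy counts, for illustration only (not data from any trial).
TOY_TRIAL = {
    "feature_absent": {"treated": (30, 50), "control": (15, 50)},
    "feature_present": {"treated": (18, 50), "control": (16, 50)},
}

if __name__ == "__main__":
    effects = {name: risk_difference(s) for name, s in TOY_TRIAL.items()}
    for name, effect in effects.items():
        print(f"{name}: treatment effect = {effect:.2f}")
    # Interaction contrast: how much the treatment effect differs by feature.
    interaction = effects["feature_absent"] - effects["feature_present"]
    print(f"interaction contrast = {interaction:.2f}")
```

In this toy example the treatment appears to help mainly patients without the feature, which is the kind of pattern an initial trial might flag and a later trial would need to confirm.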
6 Summary
Given the prevalence of psychiatric disorders in medically ill populations, there is great interest in understanding whether traditional psychological and behavioral treatments of psychiatric disorders are effective for these patients. Unfortunately, there are numerous confounding features of a medical illness that can threaten validity and influence outcomes in a standard RCT. The illness itself, the treatment of the illness, and the environmental factors can all interact with psychiatric symptoms. Furthermore, in an RCT, medical illnesses and psychiatric disorders change over time, and the relationship between the two can change across all stages of a trial, from recruitment to post-treatment follow-up. Despite these challenges, RCTs of psychological and behavioral treatments in medical populations provide unique opportunities. We have proposed a modification in RCT conceptualization that makes explicit the challenges of diagnosis and the role of prognosis in RCTs of psychological and behavioral treatments in medical populations. By designing trials aimed at identifying prognostic features, behavioral medicine researchers have the opportunity to gain valuable information about the specific features that do or do not predict treatment response
among these patient groups and thereby improve our ability to provide effective targeted care.
Chapter 61
Quality of Life in Light of Appraisal and Response Shift Sara Ahmed and Carolyn Schwartz
The value of evaluating quality of life (QOL) has always resonated in the minds of patients, clinicians, and society at large. While the term has existed since the time of Pigou in 1920 (Pigou, 1920), it is only in the past two decades that QOL has been operationally defined and that standardized measures exist to allow us to attach a meaningful metric that can be considered for monitoring patient progress and clinical research. The accumulation of such advances in the development of QOL measures and of other patient-reported outcomes (PRO) is reflected in two major changes that serve as the foundation for the practical application of PROs in research and clinical care. The first is the 2006 publication of the Food and Drug Administration (FDA) Guidance on the use of PROs in medical product development to support labeling claims (Guidance for Industry, 2006). This Guidance formalized the use of PROs in drug development and emphasized the use of symptom and function measures in such research (Puhan et al, 2004). The Guidance also provided a clear and unignorable link between PROs and commercial products aimed at improving health (see Chapter 8).
C. Schwartz, DeltaQuest Foundation Inc, 31 Mitchell Road, Concord, MA, USA; Research Professor of Medicine and Orthopaedic Surgery, Tufts University School of Medicine, Boston, MA, USA
The second is that psychometric methods and theory have grown substantially. These methods have opened the door to a deeper and more nuanced approach to working with data and to thinking about change over time. They have also provided useful tools for characterizing clinically important change (Browne and Cudeck, 1993), which is relevant both for individual patient monitoring and for assessment of treatment value (Maltais et al, 2008). Theoretical advances have focused on the impact of adaptation on the interpretation of QOL scores. For example, the meaning of change depends on where you start (Hays and Woolley, 2000). Such “response shifts” represent health-related changes in the meaning of measured concepts, due to changes in the individual’s internal standards, values, and conceptualization of the concept(s) being measured. The growing evidence base for response shift suggests that it is of primary importance in rehabilitation research, since many interventions aimed at helping people with disability outcomes involve teaching response shifts. If response shifts are not adequately measured, the intervention may appear to have no impact because relevant changes are obfuscated. This chapter discusses the relevance of QOL to clinical care and research. We also describe the evolution of the theoretical scope of QOL research, extending from theories in psychology and other social sciences, and we highlight methodological challenges in evaluating change in QOL and how these may be mitigated by incorporating appraisal and response shift assessments.
A. Steptoe (ed.), Handbook of Behavioral Medicine, DOI 10.1007/978-0-387-09488-5_61, © Springer Science+Business Media, LLC 2010
1 Patient-Reported Outcomes of Quality of Life PROs are measurements of any aspect of a patient’s QOL or health status that come directly from the patient. QOL is defined by the WHO as an individual’s perception of his/her position in life in the context of the culture and value systems in which he/she lives, and in relation to his/her goals, expectations, standards and concerns. It is a broad-ranging concept, incorporating in a complex way the person’s physical health, psychological state, level of independence, social relationships, and their relationship to salient features of their environment (World Health Organization, 1998). QOL can be thought of as a hierarchical concept, similar to Maslow’s hierarchy of needs (Maslow, 1943; Smith, 1981) (Fig. 61.1). This hierarchy would have at its base physical aspects of functioning, including mobility, fatigue, and pain. At the next layer would be social functioning, participation, etc., followed by emotional functioning. At the top of the pyramid would be existential well-being, including such concepts as purpose in life and self-acceptance. All of the domains are best assessed by PROs because they are subjective by nature and thus require the unique perspective of the patient. QOL PROs may provide finer-grained estimates of QOL by including standardized evaluations of pain, fatigue, disability, participation in life roles, and other domains related to physical, social, and emotional functions. Such measures are important for evaluating the impact of disease and for assessing the efficacy of treatments (Donaldson, 2006; Greenhalgh et al, 2005; Haywood, 2007; Lipscomb et al, 2007; Stull et al, 2007). Health-related quality of life (HRQL) is a more restricted term in that it refers specifically to the impact of disease and treatment on the lives of patients and is defined as “the capacity to perform the usual daily activities for a person’s age and major social role” (Guyatt et al, 1993). The concepts we describe in this chapter can be applied to any PRO, but from this point on we refer to QOL because it encompasses the other PRO domains.
Fig. 61.1 Integrating QOL into Maslow’s hierarchy of needs (pyramid levels from top to bottom, each paired with its QOL domain)
Self-Actualization (morality, creativity, spontaneity, problem-solving, lack of prejudice, acceptance of facts): existential well-being
Esteem (self-esteem, confidence, achievement, respect of others, respect by others): emotional functioning
Love/Belonging (friendship, family, sexual intimacy): social functioning, sexual functioning
Safety (security of body, of employment, of resources, of morality, of the family, of health): physical functioning, financial security
Physiological (breathing, food, water, sleep, sex, homeostasis, excretion): fatigue, pain, balance
1.1 Generic and Disease-Specific Measures Generic measures of QOL include broad domains and can be used across a wide range of healthy and chronic disease populations. The advantage of generic measures is that they
allow for comparisons across groups and have often been used in population and health services delivery studies. Commonly used measures include the SF-36 (Ware Jr., 2000; Ware Jr. et al, 1994), the Sickness Impact Profile (SIP) (Bergner et al, 1976; Bergner et al, 1981), and the WHO-QOL (World Health Organization, 1998). Clinical researchers recognized that generic measures were not specific enough to capture changes in clinical populations. Consequently, over the years, several disease-specific measures have emerged that capture particular domains of relevance to a specific patient population. Examples for HIV include the MOS-HIV (Holmes and Shea, 1999; O’Leary et al, 1998) and the WHOQOL-HIV (WHOQOL HIV Group, 2004); examples for cancer include the Functional Assessment of Cancer Therapy-General (FACT-G) (Cella et al, 2002b; Cormier et al, 2008) and the EORTC Core Quality of Life Questionnaire (EORTC QLQ-C30) (Groenvold et al, 1997). While disease-specific measures may be more sensitive to changes in a particular patient population, they do not allow for broad comparisons across populations. To benefit from both types of measures, some developers have used a modular approach whereby a disease-specific component is built as an adjunct to a core generic measure that allows for greater generalisability.
1.2 The Value of Evaluating QOL With advances in medical technology and drug therapy, individuals in developed countries are living longer with chronic illness (Bodenheimer et al, 2002). This has broadened the focus from only measuring outcome indicators, such as survival, to also evaluating the impact of disease and treatment on the QOL of individuals for the years gained (e.g., Quality-Adjusted Life Year (QALY)) (Donaldson, 2006). QOL assessments have been used as an outcome, a predictor, or an intervention. As an outcome measure, QOL assessments have provided
information about the benefits of an intervention in randomized controlled trials (Lim et al, 2003; Mayo et al, 2000), helped to attach a value to an increased length of survival (Siddiqui et al, 2008), and helped to evaluate the long-term impact of illness (Mayo et al, 2001). The importance of QOL measures for capturing concepts beyond clinical indicators is reflected in the prognostic value of QOL scores (Sprangers, 2002). In cancer research there is evidence that HRQL is an independent predictor of survival (Siddiqui et al, 2008), and in some studies QOL was found to be even more predictive of survival than known biologic prognostic factors (Sprangers, 2002). QOL assessments have also been used as an intervention in clinical care by providing a mechanism to improve patient–clinician communication (Chumbler et al, 2007; Jacobsen et al, 2002). This allows doctors to identify problems that may otherwise go unnoticed and that can be treated by the medical team if they are medically related, such as symptoms or activity limitations. If the problems are non-medical, they may lead to referrals to social workers or psychologists. Therefore, QOL evaluations can play a central role in enhancing the richness of the patient–clinician encounter. In clinical care, the prognostic value of QOL assessments, supported by studies in cancer research, may allow them to be used to tailor medical and psychosocial therapy for patients soon after diagnosis.
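The quality-adjusted life year mentioned at the start of this section can be made concrete with a small worked calculation. The sketch below assumes the conventional undiscounted definition, in which each period of life is weighted by a utility score (1 for full health, 0 for death). The function name `qalys` and the example values are hypothetical and are not taken from any study cited in this chapter.

```python
def qalys(periods):
    """Return undiscounted quality-adjusted life years.

    `periods` is an iterable of (years, utility) pairs, where `utility`
    is a preference weight (1.0 = full health, 0.0 = death).
    """
    total = 0.0
    for years, utility in periods:
        if years < 0:
            raise ValueError("duration must be non-negative")
        total += years * utility  # weight each period by its utility
    return total


# Hypothetical patient: 2 years at utility 0.8, then 3 years at utility 0.5.
example_periods = [(2, 0.8), (3, 0.5)]
print(qalys(example_periods))
```

In this illustration, five years of survival count for fewer than five QALYs because time spent in poorer health is down-weighted; economic evaluations typically also apply discounting, which this sketch omits.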
2 Methodological Advances in QOL Research Methodological advances from educational testing using item response theory (IRT) (Embretson and Reise, 2000) have been applied to HRQL assessments, leading to a paradigm shift in patient-reported outcomes assessment (see Chapter 9). These methods allow the selection of items for short forms based on the range
of the underlying trait that is of most interest. Thus, short forms can be created for different disease groupings or levels of disability, rather than having one short form for all. These IRT methods have also led to the development of generic computerized adaptive tests, expanded and made widely available through the NIH Roadmap initiative called the Patient-Reported Outcomes Measurement Information System (PROMIS). The PROMIS collaboration has yielded item banks for 11 different QOL domains for use across patient populations (Reeve et al, 2007). Static short forms and dynamic computerized adaptive tests were developed for the following domains: Emotional Distress (Anger, Anxiety, Depression); Fatigue; Pain (Behavior, Impact); Physical Function; Satisfaction (with Discretionary Social Roles, with Social Roles); Sleep Disturbance; and Wake Disturbance. Additionally, a static short form was developed for Global Health. It is unknown, however, how well these computerized adaptive tests function for disease-specific applications. A comparison of the responsiveness of generic computerized adaptive tests and disease-specific short forms is essential for determining the best tool battery for use in clinical research and patient monitoring.
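The idea of selecting short-form items according to the trait range of interest can be sketched as follows. This is a simplified illustration only: it assumes a dichotomous two-parameter logistic (2PL) model, whereas operational item banks generally use polytomous models, and the item names and parameters are hypothetical. Items are ranked by their average Fisher information over the targeted range of the latent trait.

```python
import math


def item_information(theta, a, b):
    """Fisher information of a 2PL item at trait level theta.

    a = discrimination, b = difficulty (location) of the item.
    """
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return a * a * p * (1.0 - p)


def select_short_form(item_bank, theta_range, n_items, grid_points=21):
    """Pick the n_items with the highest mean information over theta_range.

    item_bank is a list of (name, a, b) tuples.
    """
    low, high = theta_range
    grid = [low + (high - low) * k / (grid_points - 1) for k in range(grid_points)]

    def mean_information(item):
        _, a, b = item
        return sum(item_information(t, a, b) for t in grid) / len(grid)

    ranked = sorted(item_bank, key=mean_information, reverse=True)
    return [name for name, _, _ in ranked[:n_items]]


# Hypothetical fatigue item bank: (name, discrimination, difficulty).
bank = [
    ("fatigue_mild", 1.5, -1.5),
    ("fatigue_moderate", 1.5, 0.0),
    ("fatigue_severe", 1.5, 1.5),
    ("fatigue_extreme", 1.5, 3.0),
]

# Short form aimed at respondents with high levels of the trait.
print(select_short_form(bank, theta_range=(1.0, 3.0), n_items=2))
```

A computerized adaptive test applies the same principle dynamically, choosing each next item to be most informative at the respondent's current trait estimate rather than over a fixed range.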
3 The Influence of Adaptation and Appraisal Processes on QOL Evaluations Clinicians have often noted that their clients are continually adapting to their illness, and that patients who would be expected to feel despair given their physical health often report being happier and more satisfied with life than expected. Over the past 10 years the QOL field has taken note of the possible influence of “response shift” on QOL assessments. When individuals experience a health-state change, they may change their internal standards (i.e., recalibration), values (i.e., reprioritization), or meaning (i.e., reconceptualization) of the target
construct one is asking them to self-report, in this case QOL (Schwartz and Sprangers, 1999; Sprangers and Schwartz, 1999). For example, people with a substantial physical disability may experience a severity of fatigue that they did not know prior to the development of the disability. Consequently, they would recalibrate what “severe fatigue” means to them, making it difficult to compare their pre-disability and post-disability ratings of fatigue as it relates to physical health. They may also reprioritize life domains, such that sense of community and interpersonal intimacy become more important to their sense of well-being than career success or material gains. Finally, they may reconceptualize QOL to focus on those domains where they continue to have control and be effective when rating their QOL. These subtle and not-so-subtle response shifts are to be expected with evaluative constructs, which are assessed by idiosyncratic rather than objective criteria (Sprenkle et al, 2004). Evaluative ratings of participation are products of an appraisal process, where individuals must consider what QOL means to them, what experiences they have had that are relevant to QOL, how experiences compare to desired circumstances or outcomes, and the relative importance of different experiences (Rapkin and Schwartz, 2004). Although clinicians and philosophers have long noted response shift phenomena, with early references linked to Aristotle (Jette et al, 2008) and Heraclitus (Kahn, 1981), the challenge for researchers has been to operationalize the construct in ways that are measurable and robust.
3.1 History of Response Shift Response shift was originally noticed and studied in educational intervention and management science research in the 1970s, where investigators noticed that students’ internal standards of competency changed as a result of learning more about the subject (Armenakis and Zmud,
1979; Hoogstraten, 1982). For example, students rated their abilities or knowledge in a particular area as stronger or better until they learned more about it, and then, after learning more, rated their abilities as less or the same as before the educational intervention. Similarly, people with spinal disorders may rate themselves as more disabled after treatment than they did before treatment because the yardstick has changed. In the 1990s, interest in response shift developed within QOL research. Clinicians began to recognize that response shift could obfuscate important treatment-related changes and indeed might even be the subtext or the desired effect in rehabilitation and psychosocial interventions (Schwartz and Sprangers, 1990; Schwartz et al, 1999, 2007). Response shift has now been studied and recognized to affect adaptation to a wide range of health conditions, including multiple sclerosis (Christensen et al, 1999; Brandtstadter and Renner, 1990; Helson, 1964; Schwartz and Sendor, 1999), cancer (Ahmed et al, 2009a; Bernhard et al, 1999, 2001; Boyd et al, 1990; Breetvelt and Van Dam, 1991; Cella et al, 2002a; Hagedoorn et al, 2002; Jansen et al, 2000; Kagawa-Singer, 1993; Oort et al, 2005; Schwartz et al, 1999; Sprangers et al, 1999), stroke (Ahmed et al, 2003, 2004, 2005), diabetes (Li and Rapkin, 2009; Postulart and Adang, 2000), geriatrics (Daltroy et al, 1999; Rapkin, 2009), palliative care (Schwartz et al, 2002, 2004a, 2005), dental disorders (Ring et al, 2005), and, most recently, orthopedics (Razmjou et al, 2006). A meta-analysis of response shift reported effect sizes ranging from small to moderate (Schwartz et al, 2006). Although this may seem of minor clinical significance, Oort and colleagues demonstrated that adjusting for response shift in the data analytic phase of a study can boost effect sizes from moderate to large for clinical interventions for cancer patients (Oort et al, 2005). 
Other research has demonstrated that adjusting for response shift can even reverse putative null or deleterious findings (Schwartz et al, 2007), suggesting that a Type II error can occur if response shift is not accounted for (Ring et al, 2005).
3.2 Theoretical Foundation of Response Shift The motivation for research on response shift in relation to QOL outcomes began in the late 1990s with the development of the Sprangers and Schwartz response shift theoretical model (Sprangers and Schwartz, 1999). Within this framework, response shift refers to health-related changes in the self-evaluation of a concept (e.g., health, quality of life, pain) due to (1) changes in internal standards (i.e., recalibration); (2) changes in values (i.e., reprioritization); or (3) changes in the conceptualization (i.e., reconceptualization) (Sprangers and Schwartz, 1999). The model proposed that with a change in a person’s health as a catalyst, antecedents (i.e., stable characteristics of the individual such as gender, personality, expectations, and spiritual identity) interact with mechanisms (i.e., behavioral, cognitive, and affective processes that accommodate the catalyst) that may initiate a response shift and result in an overestimation or underestimation of HRQL as measured by objective criteria (Sprangers and Schwartz, 1999). Rapkin and Schwartz (Rapkin and Schwartz, 2004; Sprenkle et al, 2004) (Fig. 61.2) further expanded the model to distinguish mechanisms that are initial responses to catalysts from response shifts that continue the process of adaptation. In an attempt to make these distinctions, the Rapkin and Schwartz model incorporates appraisal processes as a possible explanation for intra-individual variations in HRQL change scores. Based on the Rapkin and Schwartz model (Rapkin and Schwartz, 2004), individual differences in longitudinal changes in appraisal will affect how people respond to HRQL items. Any response to an HRQL item is dependent on four distinct cognitive processes which correspond to psychological aspects of coping and adjustment. These include (1) induction of a frame of reference; (2) recall and sampling of salient experiences; (3) use of standards of comparison to appraise experiences; and (4) application of a subjective algorithm to prioritize and combine appraisals to arrive at a QOL rating (Rapkin and Schwartz, 2004). Within the appraisal framework, response shift is inferred when changes in appraisal explain discrepancies between expected and observed HRQL scores.
Fig. 61.2 Rapkin and Schwartz model of appraisal and quality of life. Components shown: antecedents, catalyst, mechanisms, appraisal, expected and observed QOL, direct response shift, and moderated response shift. Response shift is inferred when changes in appraisal explain the discrepancy between expected and observed QOL scores. Adapted and reprinted with permission from BioMed Central (Rapkin and Schwartz, 2004)
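The inference rule stated above, that response shift is inferred when changes in appraisal explain discrepancies between expected and observed scores, can be illustrated with a deliberately simple two-step sketch. It is not the authors' analytic procedure: it assumes a single predictor, ordinary least squares, and hypothetical change scores. Step one estimates the expected QOL change from a clinical predictor; step two asks whether the unexplained part (observed minus expected) covaries with change in an appraisal measure.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least-squares slope and intercept for y on x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def residuals(x, y):
    """Observed minus expected y, where expected y comes from the fitted line."""
    slope, intercept = fit_line(x, y)
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


def pearson_r(u, v):
    """Pearson correlation between two equally long sequences."""
    mu, mv = mean(u), mean(v)
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / (suu * svv) ** 0.5


# Hypothetical change scores for five patients (not real data).
symptom_change = [-2.0, -1.0, 0.0, 1.0, 2.0]
qol_change = [3.0, 0.5, 0.0, -0.5, -3.0]
appraisal_change = [1.0, -0.5, 0.0, 0.5, -1.0]

# Step 1: the part of QOL change not expected from symptom change.
unexplained = residuals(symptom_change, qol_change)

# Step 2: does change in appraisal account for that discrepancy?
print(pearson_r(unexplained, appraisal_change))
```

A substantial association in step two would be read, under these assumptions, as consistent with response shift; a real analysis would use multiple predictors, validated appraisal measures, and appropriate significance testing.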
3.3 The Relationship Between QOL and Response Shift to Other Frameworks from Psychology and the Social Sciences Empirically based research on adaptation increasingly highlights that the personal level of happiness is more flexible and thus changeable than was previously thought (Diener, 2006). The field of positive psychology has provided mounting evidence that sustainable increases in happiness levels are possible via interventions that teach ways of refocusing one’s perspective and priorities, and that these increases are sustained over time (Lyubomirsky and Sheldon, 2005; Lyubomirsky et al, 2006; McCullough, 2000; Otake et al, 2006; Seligman et al, 2005). In contrast to this demonstrated flexibility is the increasingly documented genetic influence on HRQOL (Christensen et al, 1999; Kendler et al, 2000; Leinonen et al, 2005; Lykken and
Tellegen, 1996; Romeis et al, 2000, 2005; Roysamb, 2002; Roysamb et al, 2003; Stubbe et al, 2005; Svedberg et al, 2005, 2006). Although distinct, these areas of research have in common that they provide new insights into the changeability of quality of life (research on adaptation and positive psychology) versus its stability (genetic research). The convergence of these lines of investigation thus supports a state (i.e., situational) and trait (i.e., genetic) conceptualization of HRQOL (Schwartz and Sprangers, 2009). This trait and state distinction has implications for methods and clinical applications of response shift. Measuring relevant personality characteristics may be needed to predict who will undergo response shifts, with what magnitude and in which direction. While personality is included under Antecedents in the original theoretical model proposed by Sprangers and Schwartz (1999), the work on the genetic predisposition for personality and well-being underscores the need to measure it. Further, it points to the need to take different personality characteristics into account that encompass “affective reserve.” One may also want to measure targeted characteristics, such as resilience or emotional flexibility. For healthcare professionals to help patients achieve a response shift, we should focus on those aspects that can change. For example, we can teach better affective, behavioral, and cognitive methods for dealing with health state changes but these may only work optimally for people who have an adequate “affective reserve” or emotional flexibility. This notion does not mean, however, that the constellation of personality characteristics underlying “affective reserve” is given and unchangeable. It is possible that this affective reserve is something that can be hidden or obfuscated by maladaptive traits that can be modified by affective, behavioral, or cognitive methods. 
For example, cognitive behavioral interventions that teach people how to modify negative appraisals or self-talk may also help people to uncover or enliven traits related to their response shift potential. Thus, interventions to teach response shifts may be able to heighten
one’s response shift proneness. We have to keep in mind that only 50% of a personality trait is estimated to be genetically determined; thus the remaining 50% is amenable to change.
4 Limitations of Current Measures of QOL in Light of Response Shift

4.1 Psychometric Properties of QOL Measures in Light of Response Shift

One of the most challenging aspects of response shift research is that it calls into question fundamental assumptions of questionnaires (e.g., measurement invariance) and psychometric criteria, such as reliability, validity, and responsiveness. Schwartz and Rapkin (2004) noted that every quantitative index of reliability, validity, and responsiveness may be distorted by reasonable and expected adaptation-related changes over time. For example, high internal consistency (reliability) and cross-measure correlations (validity) provide little psychometric information about what a measure is evaluating, but rather support the idea that people are answering a set of items in a similar way and that these items reflect a narrow “bandwidth” of a given construct. Similarly, for inter-observer agreement to be high (another aspect of reliability), observers must share a frame of reference, sample the same experiences, apply the same standards, and give experiences equal priority. It is likely, however, that observers may differ in many aspects of QOL appraisal, particularly if their health trajectory has been quite different (Schwartz and Rapkin, 2004). Responsiveness, another key psychometric index that is an extension of validity (Hays and Hadorn, 1992), may also not reflect what is assumed. A measure that is not responsive, that is, it does not change in step with objective indices of health, may be reflecting a provisionally stable set point, to which an individual returns despite a constant level of stress (Carver
and Scheier, 2000; Helson, 1964; Schwartz et al, 2004b). This unresponsiveness or stability may be due to habituation (Folkman et al, 1997) or active coping (Brandtstadter and Renner, 1990) and may follow a pattern described by engineers and economists as hysteresis (Mayergoyz, 1991). That is, stress may be added without inducing apparent change in a system (or a person), up to a certain level of tolerance, beyond which the system may undergo permanent and profound change that makes it impossible to return to earlier tolerances. The impact of these response shifts on psychometric characteristics such as reliability, validity, and responsiveness is not only conceptually important but also operationally important because it influences the interpretation of clinical research findings.
4.2 Implications of Response Shift for Evaluation of Psychosocial and Healthcare Interventions

As HRQL is increasingly becoming part of the evaluation profile for interventions, particularly for chronic disease and for those interventions involving health services delivery, developing an estimator of HRQL that differentiates between objective change and changes in standards, conceptualization, and values is essential for the interpretation of the results. The strength of randomized trials is that balance is achieved at the outset on measured and unmeasured variables, which would include conceptualization of HRQL and internal standards. However, in trials where the intervention involves a psychosocial component or support from a healthcare team, the intervention arm may receive information and support to help them cope and manage their illness. As a consequence, the intervention may induce a response shift in the intervention arm, leading to a differential response shift between the two groups. This differential response shift effect may attenuate or exaggerate findings from clinical trials that use HRQL as an end-point (Schwartz
and Sprangers, 1999). The implications are that the conclusions drawn from evaluations of the impact of disease or health interventions on HRQL may be incorrect and in turn may guide clinical-care decisions in the wrong direction. The dynamic nature of individuals’ standpoint regarding their health may explain several paradoxical findings in health care. Even a small response shift effect can move an effect size from small to moderate or moderate to large (Oort et al, 2005). The impact of response shift on evaluations of change in HRQL has been reported in a number of studies including those that have evaluated the effects of support groups (Schwartz, 1999) and self-management programs (Ahmed et al, 2009b; Osborne et al, 2006).
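The remark above that even a small response shift effect can move an effect size from one category to the next can be illustrated with simple arithmetic. The sketch below is only an illustration under stated assumptions: the scores are made-up values on a 0-100 scale, the response shift is modeled as a constant recalibration offset in the intervention arm, and the category cut-offs are Cohen's conventional benchmarks (0.2, 0.5, 0.8). None of these numbers or function names come from the studies cited in this chapter.

```python
def cohens_d(mean_intervention, mean_control, pooled_sd):
    """Standardized mean difference between two trial arms."""
    if pooled_sd <= 0:
        raise ValueError("pooled_sd must be positive")
    return (mean_intervention - mean_control) / pooled_sd


def size_label(d):
    """Map |d| onto Cohen's conventional benchmarks (0.2, 0.5, 0.8)."""
    magnitude = abs(d)
    if magnitude >= 0.8:
        return "large"
    if magnitude >= 0.5:
        return "moderate"
    if magnitude >= 0.2:
        return "small"
    return "negligible"


# Hypothetical follow-up means on a 0-100 HRQL scale (not data from any study).
CONTROL_MEAN = 50.0
INTERVENTION_MEAN_OBSERVED = 54.0
POOLED_SD = 10.0
# Assumed recalibration: intervention-arm respondents now rate the same
# health state 2 points lower than they would have at baseline.
RECALIBRATION_OFFSET = 2.0

observed_d = cohens_d(INTERVENTION_MEAN_OBSERVED, CONTROL_MEAN, POOLED_SD)
adjusted_d = cohens_d(
    INTERVENTION_MEAN_OBSERVED + RECALIBRATION_OFFSET, CONTROL_MEAN, POOLED_SD
)
print(size_label(observed_d), size_label(adjusted_d))
```

With these illustrative values, a 2-point recalibration on a scale with a standard deviation of 10 changes the standardized difference by 0.2, which is enough to cross from the small to the moderate category. A shift in the opposite direction would exaggerate rather than attenuate the observed effect.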
5 Methodological Advances in Evaluating Changes in QOL and Response Shift Detection

In order to draw appropriate conclusions regarding treatment effects and to fully understand the impact of illness over time, methodological approaches that detect response shift are needed before scores are analyzed and interpreted as actual change. Diverse approaches for assessing response shift have been developed (Schwartz and Sprangers, 1999; Schwartz et al, 2006; Visser et al, 2005). Some of these stem from work initiated in the educational (Howard, 1979) and management sciences (Schmitt, 1982; Schmitt et al, 1984) (Golembiewski et al, 1976; Norman and Parker, 1996). The range and details of these approaches have been outlined elsewhere (Ahmed et al, 2009b; Schwartz and Sprangers, 1999, 2000). This chapter highlights recent advances, mainly statistical approaches, which show promise in being able to monitor and provide estimates that distinguish response shift from changes in QOL for an individual patient. Current methods for detecting response shifts are evolving from a predominant focus on the ‘then-test’ design approach to an emphasis on
statistical or individualized methods. The then-test defines the magnitude of the response shift as the difference between the pre-assessment and then-test, which is a retrospective rating of the pre-assessment (Howard and Bray, 1979; Sprangers et al, 1999). The then-test approach has the advantage of being easy to administer and analyze but the disadvantages of random error and/or confounding with recall bias as well as being difficult to interpret. For these reasons, we now briefly describe promising statistical or individualized methods that have evolved in the past few years. There are three statistical methods that have been applied to response shift detection that hold promise: structural equation modeling, latent trajectory analysis with subject-centered residuals, and classification and regression tree analysis. All of these methods require substantial sample sizes, on the order of 10 subjects per variable and a minimum of 200 subjects. These methods vary in how much they focus on aggregate versus individual patient-focused analyses, and thus in how sensitive they are to individual response shifts. Originally evolving from factor analytic methods, structural equation modeling (SEM) is a technique that combines factor analysis and regression analysis to solve multivariate research questions at a group level (Bollen, 1989; Hoyle, 1995) (see Chapter 57). By analyzing covariance matrices, this technique fits measurement and structural models, first to test the assumption of measurement invariance and then to examine whether relationships among variables are similar over time (i.e., the structural model). Recent advances in this method were made by Oort and colleagues (Oort et al, 2005), who clarified how distinct changes detectable with SEM reflect different aspects of response shift. This work extended earlier work done by Schmitt (1982) and yielded more sensitive algorithms for detecting response shifts. 
Although this method has the advantage of allowing secondary analysis of existing data to test response shift hypotheses, it has the disadvantage of being sensitive to response shifts only when a majority of the sample exhibits them (Ahmed et al, 2009a, b). Since preliminary estimates of the prevalence of response
shift suggest that about one-third to one-half of respondents exhibit response shifts that are detectable by these methods (Finkelstein et al, 2009; Mayo et al, 2009), one would have to over-sample people prone to response shifts to be able to detect such change using SEM. Oversampling will be feasible when we are better able to predict who experiences response shifts. Latent trajectory analysis with subject-centered residuals is a method developed by Mayo and colleagues that focuses on the individual and seeks to develop a predictive model to examine patterns in discrepancies between expected and observed scores (Bryk and Raudenbush, 1992; Mayo et al, 2009). By obtaining and scaling model residuals, Mayo creates subject-centered residuals to categorize respondents (Cupples and McKnight, 1994) as (1) exhibiting no response shift, i.e., the person’s residuals are consistent over time, but there was some change in their perceived QOL; (2) exhibiting a positive response shift, i.e., the person’s evaluation started low, and then shifted or was reassessed upward; or (3) exhibiting a negative response shift, i.e., the person’s evaluation started higher than expected and was then reassessed downward over time. This method is of interest because it classifies response shift at the individual rather than group level, and because it distinguishes groups based on the timing as well as the direction of the response shift. It is useful for stratified analyses with existing data and thus does not impose additional demands on the respondent. Its primary weakness is that it cannot distinguish random error from response shift. Like other statistically sophisticated methods, it requires a substantial sample size measured over multiple time points to create a predictive model. 
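The three-way classification described above can be sketched in a few lines of code. This is a deliberately simplified illustration and not the procedure published by Mayo and colleagues: it assumes that expected scores have already been produced by some previously fitted predictive model, it scales residuals by their pooled standard deviation, and it labels each person by the drift between the first and last residual using a hypothetical threshold of 1.0. The function names, the threshold, and the scores are all assumptions made for the example.

```python
from statistics import pstdev


def scaled_residuals(observed, expected):
    """Observed-minus-expected scores per person, divided by the pooled residual SD."""
    raw = {
        person: [o - e for o, e in zip(observed[person], expected[person])]
        for person in observed
    }
    pooled_sd = pstdev([r for series in raw.values() for r in series])
    if pooled_sd == 0:
        raise ValueError("residuals have no spread; nothing to scale")
    return {person: [r / pooled_sd for r in series] for person, series in raw.items()}


def classify(series, threshold=1.0):
    """Label one person's residual series by its drift from first to last wave."""
    drift = series[-1] - series[0]
    if drift >= threshold:
        return "positive response shift"
    if drift <= -threshold:
        return "negative response shift"
    return "no response shift"


# Hypothetical scores at three waves (not data from any study).
observed = {"A": [50, 52, 54], "B": [40, 50, 62], "C": [60, 54, 46]}
expected = {"A": [49, 51, 53], "B": [50, 52, 54], "C": [50, 52, 54]}

labels = {p: classify(s) for p, s in scaled_residuals(observed, expected).items()}
print(labels)
```

In this toy data, person A changes over time but stays a constant distance from the model's expectation, so no response shift is flagged; person B starts below expectation and ends above it, and person C does the reverse. As the text notes, a rule of this kind cannot by itself separate random error from response shift, which is why the choice of threshold matters.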
Classification and regression tree analysis (CART) (Breiman et al, 1993; Haykin, 2002) is a method applied by Li and Rapkin (Li and Rapkin, 2006; Li et al, 2007; Li and Rapkin, 2009) that combines qualitative and quantitative methods to yield a rich analysis of complex data. These investigators utilized the Appraisal Profile developed by Rapkin (Rapkin and Schwartz, 2004), which yields qualitative text data in response to open-ended questions
as well as quantitative data in response to multiple-choice questions. The tool measures four distinct parameters of the appraisal process: (a) Framing, i.e., what does quality of life mean to the individual; (b) Sampling, i.e., what relevant experiences does the individual have; (c) Evaluating, i.e., how do experiences compare to relevant standards; (d) Combining, i.e., what is the relative importance of different experiences. These data are then content analyzed to yield categories (Li and Rapkin, 2009) amenable to quantitative analysis, and “trees” are generated. The final product of this analysis is homogeneous groupings of respondents who share patterns of appraisal. In this example, Li and Rapkin evaluated appraisal processes in 644 AIDS patients 6 months after enrollment into a study evaluating how appraisal patterns were related to reported general health. The method revealed substantial differences in level of reported general health as a function of distinct combinations of appraisal preferences (Li and Rapkin, 2006, 2009; Li et al, 2007). All of the above-mentioned methods require large sample sizes, which can be a hindrance for researchers conducting small trials or observational studies. We would make two suggestions for response shift detection methods for such researchers. First, we would suggest collecting data using the Rapkin Appraisal Profile (Rapkin and Schwartz, 2004) and working with the data descriptively. For example, one could simply describe how patients answered the open-ended questions using qualitative methods or summarize the most frequent categories endorsed in the multiple-choice questions. A second suggestion would be to use another individualized method, the Schedule for the Evaluation of Individual Quality of Life (SEIQOL) (Joyce et al, 1999). This method explicitly allows the cues, levels, and weights to vary within and across individuals. 
This approach reduces the ambiguity of QOL scores by making this variation explicit and measurable, while retaining the option of comparing a global score over time. In contrast, current QOL measures such as the SF-36 (Ware Jr. et al, 1994) commonly compare overall scores
Table 61.1 Summary of strengths of response shift detection methods
Methods compared (columns): Then-test, SEM, Latent trajectory analysis, CART, SEIQOL
Strengths assessed (rows): Easy to use; Easy to analyze; Meaningful interpretation; Individual-level interpretation; Low participant burden; RSP adjustment possible; RSP stratification possible; Does not need large samples
[Cell entries (check marks, “if expert involved,” “TBD”) are not recoverable from this copy.]
but they do not query or contain information about the disparate domains, cues, or anchors being considered and combined in these overall scores. All of the methods described above have strengths and weaknesses, as summarized in Table 61.1. Regardless of the response shift detection approach, however, the investigator should adhere to the following guidelines: always have a comparison/control group to enable theory-driven hypothesis testing; have clearly stated hypotheses about when the response shift will occur (catalyst and change); use a combination of approaches to provide information about convergence among methods; and include an objective clinical criterion measure so that it is possible to distinguish between expected and observed change in quality of life over time.
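For researchers who rely on the then-test design discussed earlier in this section, the underlying arithmetic can be sketched as follows. The response shift estimate follows the definition given in the text (pre-assessment minus then-test). Treating the post-assessment minus the then-test as the adjusted change is the conventional then-test comparison and is stated here as an assumption, since this chapter does not spell it out. The function name and the scores are hypothetical.

```python
def then_test_estimates(pre, post, then):
    """Summarize one respondent's then-test design scores.

    pre  : rating given at baseline
    post : rating given at follow-up
    then : retrospective re-rating of baseline, collected at follow-up
    """
    return {
        "response_shift": pre - then,     # definition used in the text
        "observed_change": post - pre,    # conventional pre-post change
        "adjusted_change": post - then,   # assumed then-test comparison
    }


# Hypothetical ratings on a 0-100 scale (not data from any study).
estimates = then_test_estimates(pre=60, post=65, then=50)
print(estimates)
```

In this made-up case the respondent retrospectively rates baseline 10 points lower than at the time, so a conventional change of 5 points becomes an adjusted change of 15. By construction, the adjusted change equals the observed change plus the response shift estimate, which means that any recall bias or random error in the then-test rating passes directly into the adjusted estimate.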
6 Future Directions in QOL and Response Shift Research

The theoretical and methodological developments of QOL assessments have progressed over the past decade. As work using advanced psychometric approaches continues, the routine use of QOL in clinical care will become more feasible. Future developments based on methods that can generate a response shift parameter for each individual will provide stronger insight into our ability to evaluate and account for response shift when estimating change in QOL.
Our understanding of response shift will change as different disease trajectories are investigated using existing and novel approaches currently being developed, building on existing tools. Evaluating the biopsychosocial determinants of QOL appraisal will also be critical in understanding whether individuals differ in their cognitive and affective capacities relevant to QOL appraisal, what factors influence different kinds of changes in QOL appraisal, and the timing of changes in appraisal. Not only will this help inform methodological developments for assessing response shift but it will also lead to a better understanding of when, how, and for whom to intervene to improve QOL (Rapkin, 2009). Considering response shift and appraisal can enrich and increase the detected impact of illness and behavioral and psychosocial interventions on individuals’ well-being and QOL. Further QOL research that integrates appraisal and response shift evaluations will ensure that scores are interpreted correctly and that the patient perspective is accurately reflected in the resulting change estimates.
References

Ahmed, S., Mayo, N., Hanley, J., and Wood-Dauphinee, S. (2003). Individualized health-related quality of life (HRQL) post stroke: revealing response shift. Qual Life Res, 12, 765. Ahmed, S., Mayo, N. E., Wood-Dauphinee, S., Hanley, J. A., and Cohen, S. R. (2004). Response shift influenced estimates of change in health-related quality of life poststroke. J Clin Epidemiol, 57, 561–570.
Ahmed, S., Mayo, N. E., Wood-Dauphinee, S., Hanley, J. A., and Cohen, R. (2005). The structural equation modeling technique did not show a response shift, contrary to the results of the then test and the individualized approaches. J Clin Epidemiol, 58, 1125–1133. Ahmed, S., Schwartz, C., Ring, L., and Sprangers, M. A. (2009a). Applications of health-related quality of life for guiding healthcare: advances in response shift research. Ed J Clin Epidemiol, 62(11), 1115–1117. Ahmed, S., Bourbeau, J., Maltais, F., and Mansour, S. (2009b). The Oort structural equation modeling approach detected a response shift after a COPD selfmanagement program not detected by the Schmitt technique. J Clin Epidemiol, 62, 1165–1172. Armenakis, A. A., and Zmud, R. W. (1979). Interpreting the measurement of change in organizational research. Person Psychol, 32, 709–723. Bergner, M., Bobbitt, R. A., Pollard, W. E., Martin, D. P., and Gilson, B. S. (1976). The sickness impact profile: validation of a health status measure. Med Care, 14, 57–67. Bergner, M., Bobbitt, R. A., Carter, W. B., and Gilson, B. S. (1981). The sickness impact profile: development and final revision of a health status measure. Med Care, 19, 787–805. Bernhard, J., Hurny, C., Maibach, R., Herrmann, R., and Laffer, U. (1999). Quality of life as subjective experience: reframing of perception in patients with colon cancer undergoing radical resection with or without adjuvant chemotherapy. Ann Oncol, 10, 775–782. Bernhard, J., Lowy, A., Maibach, R., and Hurny, C. (2001). Response shift in the perception of health for utility evaluation. An explorative investigation. Eur J Cancer, 37, 1729–1735. Bodenheimer, T., Lorig, K., Holman, H., and Grumbach, K. (2002). Patient self-management of chronic disease in primary care. Jama, 288, 2469–2475. Bollen, K. A. (1989). Structural Equations With Latent Variables. New York, NY: Wiley And Sons. Boyd, N. F., Sutherland, H. J., Heasman, K., Tritchler, D., and Cummings, B. (1990). 
Whose utilities for decision analysis? Med Decis Making, 10, 58–67. Brandtstadter, J., and Renner, G. (1990). Tenacious goal pursuit and flexible goal adjustment: explication and age-related analysis of assimilative and accommodative strategies of coping. Psychol Aging, 5, 58–67. Breetvelt, I. S., and Van Dam, F. S. (1991). Underreporting by cancer patients: the case of response-shift. Soc Sci Med, 32, 981–987. Breiman, L., Friedman, J. H., Olshen, R. A., and Stone, C. J. (1993). Classification and Regression Trees. New York, NY: Chapman and Hall/Crc. Browne, M., and Cudeck, R. (1993). Alternative ways of assessing model fit. In K. A. Bollen & J. S. Long (Eds.), Testing Structural Equation Models (pp. 136–162). London: Sage Publications.
Bryk, A. S., and Raudenbush, S. W. (1992). Hierarchical Linear Models: Applications and Data Analysis Methods. Thousand Oaks,CA: Sage Publications. Carver, C. S., and Scheier, M. F. (2000). Scaling back goals and recalibration of the affect system are processes in normal adaptive self-regulation: understanding ‘response shift’ phenomena. Soc Sci Med, 50, 1715–1722. Cella, D., Hahn, E. A., and Dineen, K. (2002a). Meaningful change in cancer-specific quality of life scores: differences between improvement and worsening. Qual Life Res, 11, 207–221. Cella, D., Eton, D. T., Fairclough, D. L., Bonomi, P., Heyes, A. E. et al (2002b). What is a clinically meaningful change on the functional assessment of cancer therapy-lung (Fact-L) questionnaire? Results From Eastern Cooperative Oncology Group (Ecog) Study 5592. J Clin Epidemiol, 55, 285–295. Christensen, K., Holm, N. V., Mcgue, M., Corder, L., and Vaupel, J. W. (1999). A Danish population-based twin study on general health in the elderly. J Aging Health, 11, 49–64. Chumbler, N. R., Mkanta, W. N., Richardson, L. C., Harris, L., Darkins, A. et al (2007). Remote patientprovider communication and quality of life: empirical test of a dialogic model of cancer care. J Telemed Telecare, 13, 20–25. Cormier, J. N., Ross, M. I., Gershenwald, J. E., Lee, J. E., Mansfield, P. F. et al (2008). Prospective assessment of the reliability, validity, and sensitivity to change of the functional assessment of cancer therapy-melanoma questionnaire. Cancer, 112, 2249–57. Cupples, M. E., and Mcknight, A. (1994). Randomised controlled trial of health promotion in general practice for patients at high cardiovascular risk. BMJ, 309, 993–996. Daltroy, L. H., Larson, M. G., Eaton, H. M., Phillips, C. B., and Liang, M. H. (1999). Discrepancies between self-reported and observed physical function in the elderly: the influence of response shift and other factors. Soc Sci Med, 48, 1549–1561. Diener, E. (2006). 
Guidelines for national indicators of subjective well-being and ill-being. J Happiness Stud, 7, 397–404. Donaldson, M. (2006). Using patient-reported outcomes in clinical oncology practice: benefits, challenges and next steps. Expert Rev Pharmacoecon Outcomes Res, 6, 87–95. Embretson, S. E., and Reise, S. P. (2000). Item Response Theory for Psychologists. Mahwah, NJ: Lawrence Erlbaum Associates. Finkelstein, J. A., Razmjou, H., and Schwartz, C. E. (2009). Response shift and outcome assessment in orthopedic surgery: is there a difference between complete vs. partial treatment? J Clin Epidemiol, 62, 1189–1190. Folkman, S., Moskowitz, J. T., Ozer, E. M., Park, C. L., and Gottlieb, B. H. (1997). Positive meaningful
events and coping in the context of HIV/AIDS. In B. H. Gottlieb (Ed.), Coping with Chronic Stress (pp. 293–315). New York, NY: Plenum Press. Golembiewski, R. T., Billingsley, K., and Yeager, S. (1976). Measuring change and persistence in human affairs: types of change generated by OD designs. J Appl Behav Sci, 12, 133–157. Greenhalgh, J., Long, A. F., and Flynn, R. (2005). The use of patient reported outcome measures in routine clinical practice: lack of impact or lack of theory? Soc Sci Med, 60, 833–843. Groenvold, M., Klee, M. C., Sprangers, M. A., and Aaronson, N. K. (1997). Validation of the EORTC QLQ-C30 quality of life questionnaire through combined qualitative and quantitative assessment of patient-observer agreement. J Clin Epidemiol, 50, 441–450. Guidance for Industry: Patient-Reported Outcome Measures: Use in Medical Product Development to Support Labeling Claims: Draft Guidance (2006). Health Qual Life Outcomes, 4, 79. Guyatt, G. H., Feeny, D. H., and Patrick, D. L. (1993). Measuring health-related quality of life. Ann Intern Med, 118, 622–629. Hagedoorn, M., Sneeuw, K. C. A., and Aaronson, N. K. (2002). Changes in physical functioning and quality of life in patients with cancer - response shift and relative evaluation of one’s condition. J Clin Epidemiol, 55, 176–183. Haykin, S. (2002). Neural Networks: A Comprehensive Foundation, 2nd Ed. Delhi, India: Pearson Education (Singapore). Hays, R. D., and Hadorn, D. (1992). Responsiveness to change: an aspect of validity, not a separate dimension. Qual Life Res, 1, 73–75. Hays, R. D., and Woolley, J. M. (2000). The concept of clinically meaningful difference in health-related quality-of-life research: how meaningful is it? Pharmacoeconomics, 18, 419–423. Haywood, K. L. (2007). Patient-reported outcome II: selecting appropriate measures for musculoskeletal care. Muscoskel Care, 5, 72–90. Helson, H. (1964). Adaptation Level Theory. New York, NY: Harper and Row. Holmes, W. C., and Shea, J. A. (1999). 
Two approaches to measuring quality of life in the HIV/AIDS population: Hat-Qol and Mos-HIV. Qual Life Res, 8, 515–527. Hoogstraten, J. (1982). The retrospective pretest in an educational training context. J Exp Educ, 50, 200–204. Howard, G. S. (1979). Response-Shift bias: a source of contamination of self-report measures. J Appl Psychol, 64, 144–150. Howard, G. S., and Bray, J. H. (1979). Internal invalidity in studies employing self-report instruments: a suggested remedy. J Educ Meas, 16, 129–135. Hoyle, R. H. (1995). Structural Equation Modeling: Concepts, Issues, And Application. Thousand Oaks: Sage Publications, C1995.
Jacobsen, P. B., Davis, K., and Cella, D. (2002). Assessing quality of life in research and clinical practice. Oncology (Williston Park) 16(9 Suppl 10): 133–9. Jansen, S. J., Stiggelbout, A. M., Nooij, M. A., Noordijk, E. M., and Kievit, J. (2000). Response shift in quality of life measurement in early-stage breast cancer patients undergoing radiotherapy. Qual Life Res, 9, 603–615. Jette, A. M., Haley, S. M., Ni, P., Olarsch, S., and Moed, R. (2008). Creating a computer adaptive test version of the late-life function and disability instrument. J Gerontol A Biol Sci Med Sci, 63, 1246–1256. Joyce, C. R. B., O’Boyle, C. A., and McGee, H. (1999). Individualising questionnaires. In C. R. B. Joyce, C. A. O’Boyle, & H. McGee (Eds.), Individual Quality of Life. Approaches to Conceptualization and Assessment (pp. 87–104). Amsterdam: Harwood. Kagawa-Singer, M. (1993). Redefining health: living with cancer. Soc Sci Med, 37, 295–304. Kahn, C. H. (1981). The Art and Thought of Heraclitus: An Edition of the Fragments with Translation and Commentary. London: Cambridge University Press. Kendler, K. S., Myers, J. M., and Neale, M. C. (2000). A multidimensional twin study of mental health in women. Am J Psychiatry, 157, 506–513. Leinonen, R., Kaprio, J., Jylha, M., Tolvanen, A., Koskenvuo, M. et al (2005). Genetic influences underlying self-rated health in older female twins. J Am Geriatr Soc, 53, 1002–1007. Li, Y., and Rapkin, B. (2006). HIV/AIDS patients’ quality of life appraisal depends on their personal meaning of quality of life and frame of reference. Qual Life Res, E-Suppl 15, A–36. Li, Y., and Rapkin, B. (2009). Classification and regression tree uncovered hierarchy of psychosocial determinants underlying quality-of-life response shift in HIV/AIDS. J Clin Epidemiol, 62, 1138–1147. Li, Y., Rapkin, B., and Patel, S. (2007). Attainment of goals in HIV/AIDS patients in New York City. Qual Life Res Suppl, A-39. Lim, W. K., Lambert, S. F., and Gray, L. C. 
(2003). Effectiveness of case management and post-acute services in older people after hospital discharge. Med J Aust, 178, 262–266. Lipscomb, J., Gotay, C. C., and Snyder, C. F. (2007). Patient-reported outcomes in cancer: a review of recent research and policy initiatives. CA Cancer J Clin, 57, 278–300. Lykken, D. T., and Tellegen, A. (1996). Happiness is a stochastic phenomenon. Psychol Sci, 7, 186–189. Lyubomirsky, S., and Sheldon, K. M. (2005). Pursuing happiness: the architecture of sustainable change. Rev Gen Psychol, 9, 111–131. Lyubomirsky, S., Sousa, L., and Dickerhoof, R. (2006). The costs and benefits of writing, talking, and thinking about life’s triumphs and defeats. J Pers Soc Psychol, 90, 692–708.
Maltais, F., Bourbeau, J., Shapiro, S., Lacasse, Y., Perrault, H. et al (2008). Effects of home-based pulmonary rehabilitation in patients with chronic obstructive pulmonary disease: a randomized trial. Ann Intern Med, 149, 869–878. Maslow, A. H. (1943). A theory of human motivation. Psychol Rev, 50, 370–396. Mayergoyz, I. D. (1991). Mathematical Models of Hysteresis. New York, NY: Springer-Verlag. Mayo, N., Scott, C., and Ahmed, S. (2009). Case management post-stroke did not induce response shift: the value of residuals. J Clin Epidemiol, 62, 1148–1156. Mayo, N. E., Wood-Dauphinee, S., Cote, R., Gayton, D., Carlton, J. et al (2000). There’s no place like home : an evaluation of early supported discharge for stroke. Stroke, 31, 1016–1023. Mayo, N. E., Wood-Dauphinee, S., Cote, R., Durcan, L., and Carlton, J. (2001). Activity, participation, and quality of life six months post-stroke. Arch Phys Med Rehabil, 83, 1035–1042. Mccullough, M. E. (2000). Forgiveness : Theory,Research, And Practice. New York, NY: Guilford Press. Norman, P., and Parker, S. (1996). The interpretation of change in verbal reports: implications for health psychology. Psychol Health, 11, 301–314. O’Leary, J. F., Ganz, P. A., Wu, A. W., Coscarelli, A., and Petersen, L. (1998). Toward a better understanding of health-related quality of life: a comparison of the medical outcomes study hiv health survey (MOSHIV) and the HIV overview of problems-evaluation system (Hopes). J Acquir Immune Defic Syndr Hum Retrovirol, 17, 433–441. Oort, F. J., Visser, M. R., and Sprangers, M. A. (2005). An application of structural equation modeling to detect response shifts and true change in quality of life data from cancer patients undergoing invasive surgery. Qual Life Res, 14, 599–609. Osborne, R. H., Hawkins, M., and Sprangers, M. A. (2006). 
Change of perspective: a measurable and desired outcome of chronic disease self-management intervention programs that violates the premise of preintervention/postintervention assessment. Arthritis Rheum, 55, 458–465. Otake, K., Shimai, S., Tanaka-Matsumi, J., Otsui, K., and Fredrickson, B. L. (2006). Happy people become happier through kindness: a counting kindnesses intervention. J Happiness Stud, 7, 361–375. Pigou, A. C. (1920). The Economics of Welfare. London: Mac Millan. Postulart, D., and Adang, E. M. (2000). response shift and adaptation in chronically ill patients. Med Decis Making, 20, 186–193. Puhan, M. A., Behnke, M., Laschke, M., Lichtenschopf, A., Brandli, O. et al (2004). Self-administration and standardisation of the chronic respiratory questionnaire: a randomised trial in three German-speaking countries. Respir Med, 98, 342–350.
Rapkin, B. (2009). Considering the application of the trait/state distinction for response shift research: continuing the conversation. J Clin Epidemiol, 62, 1124–1125. Rapkin, B. D., and Schwartz, C. E. (2004). Toward a theoretical model of quality-of-life appraisal: implications of findings from studies of response shift. Health Qual Life Outcomes, 2, 14. Razmjou, H., Yee, A., Ford, M., and Finkelstein, J. A. (2006). Response shift in outcome assessment in patients undergoing total knee arthroplasty. J Bone Joint Surg, 88, 2590–2595. Reeve, B. B., Burke, L. B., Chiang, Y. P., Clauser, S. B., Colpe, L. J. et al (2007). Enhancing measurement in health outcomes research supported by agencies within the US Department of Health and Human Services. Qual Life Res, 16 Suppl, 175–186. Ring, L., Hofer, S., Heuston, F., Harris, D., and O’boyle, C. A. (2005). Response Shift masks the treatment impact on patient reported outcomes (Pros): the example of individual quality of life in edentulous patients. Health Qual Life Outcomes, 3, 55. Romeis, J. C., Scherrer, J. F., Xian, H., Eisen, S. A., Bucholz, K. et al (2000). Heritability of self-reported health. Health Serv Res, 35, 995–1010. Romeis, J. C., Heath, A. C., Xian, H., Eisen, S. A., Scherrer, J. F. et al (2005). Heritability of Sf-36 among middle-age, middle-class, male-male twins. Med Care, 43, 1147–1154. Roysamb, E. H. (2002). Subjective well being. sex specific effects of genetic and environmental factors. Pers Indiv Differ, 32, 211–223. Roysamb, E., Tambs, K., Reichborn-Kjennerud, T., Neale, M. C., and Harris, J. R. (2003). Happiness and health: environmental and genetic contributions to the relationship between subjective well-being, perceived health, and somatic illness. J Pers Soc Psychol, 85, 1136–1146. Schmitt, N. (1982). The use of analysis of covariance structures to assess beta and gamma change. Multivariate Behav Res, 17, 343–358. Schmitt, N., Pulakos, E., and Lieblein, A. (1984). 
Comparison of three techniques to assess group-level beta and gamma change. Appl Psychol Meas, 8, 249–260. Schwartz, C. E. (1999). Teaching coping skills enhances quality of life more than peer support: results of a randomized trial with multiple sclerosis patients. Health Psychol, 18, 211–220. Schwartz, C. E., and Rapkin, B. D. (2004). Reconsidering the psychometrics of quality of life assessment in light of response shift and appraisal. Health Qual Life Outcomes, 2, 16. Schwartz, C. E., and Sendor, M. (1999). Helping others helps oneself: response shift effects in peer support. Soc Sci Med, 48, 1563–1575. Schwartz, C. E., and Sprangers, M. A. G. (1999). Introduction to symposium on the challenge of
response shift in social science and medicine. Soc Sci Med, 48, 1505–1506. Schwartz, C. E., and Sprangers, M. A. (1999). Methodological approaches for assessing response shift in longitudinal health-related quality-of-life research. Soc Sci Med, 48, 1531–1548. Schwartz, C. E., and Sprangers, M. A. G. (2000). Adaptation to Changing Health Response Shift in Quality-of-Life Research, 1st Ed. Washington, DC: American Psychological Association. Schwartz, C. E., and Sprangers, M. A. G. (2009). Reflections on genes and sustainable change: toward a trait and state conceptualization of response shift. J Clin Epidemiol, 62, 1118–1123. Schwartz, C. E., Feinberg, R. G., Jilinskaia, E., and Applegate, J. C. (1999). An evaluation of a psychosocial intervention for survivors of childhood cancer: paradoxical effects of response shift over time. Psycho-Oncology, 8, 344–354. Schwartz, C. E., Wheeler, H. B., Hammes, B., Basque, N., Edmunds, J. et al (2002). Early intervention in planning end-of-life care with ambulatory geriatric patients: results of a pilot trial. Arch Intern Med, 162, 1611–1618. Schwartz, C. E., Merriman, M. P., Reed, G. W., and Hammes, B. J. (2004a). Measuring patient treatment preferences in end-of-life care research: applications for advance care planning interventions and response shift research. J Palliat Med, 7, 233–245. Schwartz, C. E., Sprangers, M. A. G., Carey, A., and Reed, G. (2004b). Exploring response shift in longitudinal data. Psychol Health, 19, 51–69. Schwartz, C. E., Merriman, M. P., Reed, G., and Byock, I. (2005). Evaluation of the Missoula-Vitas quality of life index--revised: research tool or clinical tool? J Palliat Med, 8, 121–135. Schwartz, C. E., Bode, R., Repucci, N., Becker, J., Sprangers, M. A. et al (2006). The clinical significance of adaptation to changing health: a meta-analysis of response shift. Qual Life Res, 15, 1533–50. Schwartz, C. E., Andresen, E. M., Nosek, M. A., Krahn, G. 
L., And Rrtc Expert Panel On Health Status Measurement (2007). Response shift theory: important implications for measuring quality of life in people with disability. Arch Phys Med Rehabil, 88, 529–36. Seligman, M. E., Steen, T. A., Park, N., and Peterson, C. (2005). Positive psychology progress: empirical validation of interventions. Am Psychol, 60, 410–421. Siddiqui, F., Pajak, T. F., Watkins-Bruner, D., Konski, A. A., Coyne, J. C. et al (2008). Pretreatment quality of life predicts for locoregional control in head and neck cancer patients: a radiation therapy oncology group analysis. Int J Radiat Oncol Biol Phys, 70, 353–60.
Smith, J. A. (1981). The idea of health: a philosophical inquiry. Ans Adv Nurs Sci, 3, 43–50. Sprangers, M. A. (2002). Quality-of-life assessment in oncology. Achievements and challenges. Acta Oncol, 41, 229–237. Sprangers, M. A., and Schwartz, C. E. (1999). Integrating response shift into health-related quality of life research: a theoretical model. Soc Sci Med, 48, 1507–1515. Sprangers, M. A., Van Dam, F. S., Broersen, J., Lodder, L., Wever, L. et al (1999). Revealing response shift in longitudinal research on fatigue--the use of the thentest approach. Acta Oncol, 38, 709–718. Sprenkle, M. D., Niewoehner, D. E., Nelson, D. B., and Nichol, K. L. (2004). The veterans short form 36 questionnaire is predictive of mortality and healthcare utilization in a population of veterans with a self-reported diagnosis of asthma Or Copd. Chest, 126, 81–89. Stubbe, J. H., Posthuma, D., Boomsma, D. I., and De Geus, E. J. (2005). Heritability of life satisfaction in adults: a twin-family study. Psychol Med, 35, 1581–1588. Stull, D. E., Leidy, N. K., Jones, P. W., and Stahl, E. (2007). Measuring functional performance in patients with Copd: a discussion of patient-reported outcome measures. Curr Med Res Opin, 23, 2655–65. Svedberg, P., Gatz, M., Lichtenstein, P., Sandin, S., and Pedersen, N. L. (2005). Self-rated health in a longitudinal perspective: a 9-year follow-up twin study. J Gerontol B Psychol Sci Soc Sci, 60, S331–S340. Svedberg, P., Bardage, C., Sandin, S., and Pedersen, N. L. (2006). A prospective study of health, life-style and psychosocial predictors of self-rated health. Eur J Epidemiol, 21, 767–776. Visser, M. R., Oort, F. J., and Sprangers, M. A. (2005). Methods to detect response shift in quality of life data: a convergent validity study. Qual Life Res, 14, 629–39. Ware, J. E. Jr. (2000). Sf-36 health survey update. Spine, 25, 3130–3139. Ware, J. E., Jr., Kosinski, M., and Keller S.D. (1994). 
Sf-36 Physical and Mental Scales: A User’s Manual. Boston, MA: The Health Institute, New England Medical Center. Whoqol HIV Group (2004). Whoqol-HIV for quality of life assessment among people living with HIV and AIDS: results from the field test. Aids Care, 16, 882–9. World Health Organization Quality Of Life Assessment (Whoqol): Development And General Psychometric Properties (1998). Soc Sci Med, 46, 1569–1585.
Chapter 62
Behavioral Interventions for Prevention and Management of Chronic Disease Brian Oldenburg, Pilvikki Absetz, and Carina K.Y. Chan
B. Oldenburg, Department of Epidemiology and Preventive Medicine, Monash University, 89 Commercial Rd, Melbourne, VIC 3004, Australia, e-mail: [email protected]
1 Background
Many different factors influence changing patterns of morbidity, mortality, and the spread of diseases, both globally and within and between countries. Global economic forces influence health trends around the world, as do demographic changes related to population growth, ageing, and social patterns. More locally, changes in people’s living and working environments, and in other settings where individuals’ health is more directly affected, also play a crucial role. An impressive amount of epidemiological evidence collected over the past 50 years has identified the influence of a number of key behavioral determinants and lifestyle risk factors on social, physical, and mental health. A small subset of these health behaviors are particularly critical lifestyle risk factors for noncommunicable chronic diseases and for their management and progression (WHO, 2005). Physical inactivity, unhealthy eating, alcohol consumption, and tobacco use are the primary behavioral risk factors for cardiovascular and respiratory disease. These potentially modifiable behavioral risk factors are contributing to an ever-increasing global burden of disease and rising health-care costs in most countries (WHO, 2005). Indeed, tobacco use is a risk factor for six of the eight leading causes of death in the world and is the single most preventable cause of death today (WHO, 2008). Sedentary lifestyle and poor nutrition are major risk factors for overweight and obesity, which lead to adverse metabolic changes including increases in blood pressure, unfavorable cholesterol levels, and increased resistance to insulin. These changes in turn increase the risk of coronary heart disease (CHD), stroke, diabetes mellitus, and several forms of cancer (WHO, 2002). The World Health Organization’s 2002 World Health Report indicated that physical inactivity alone now causes about 15% of the disease burden associated with diabetes, heart disease, and some cancers (WHO, 2002). Additionally, poor nutrition, including low intake of fruit and vegetables and high intake of (saturated) fat, sugar, and salt, is responsible for almost 3 million deaths a year due to the resulting development of cardiovascular disease (CVD) and cancer (WHO, 2002). Ischemic heart disease and cerebrovascular disease were identified as the global leading causes of mortality, accounting for 42.4% of all deaths across the world in 2000 (WHO, 2001). It has been estimated that without action to address the underlying lifestyle risk factors, noncommunicable chronic diseases will account for a further 17% of deaths globally by 2015 (WHO, 2005). Of even greater concern is the fact that while unhealthy lifestyle behaviors and their associated diseases are already at high levels in developed countries, they are now becoming increasingly prevalent in developing countries as well (WHO, 2002).
A. Steptoe (ed.), Handbook of Behavioral Medicine, DOI 10.1007/978-0-387-09488-5_62, © Springer Science+Business Media, LLC 2010
In summary, the links between behavior and health are now well established. Research in a number of different chronic diseases has clearly established the complex interplay that occurs between behavioral, psychological, social, and environmental factors and how collectively all of these factors can have an important bearing on disease progression, quality of life, and health outcomes (WHO, 2002). Addressing key lifestyle risk factors can lead to improved health (primary prevention), reduced risk of disease (secondary prevention), and improved health outcomes (tertiary prevention). While a substantial evidence base has established the effectiveness of lifestyle change approaches as an important component of smoking cessation (Barth et al, 2008; DiClemente et al, 1991), chronic disease prevention and management through lifestyle changes related to diet and physical activity are not nearly as well developed (Yach et al, 2005). For example, the first US Surgeon General’s Report on Smoking and Health was published in 1964 (US Department of Health and Human Services, 1964), but the first US Surgeon General’s Reports on Nutrition (US Department of Health and Human Services, 1988) and on Physical Activity (US Department of Health and Human Services, 1996) were not published until over 20 years later, in 1988 and 1996, respectively. Furthermore, management of chronic diseases such as heart disease and diabetes often also requires a number of self-care behaviors, such as regular blood glucose monitoring and adherence to treatment regimens that involve multiple medications and complex clinical care. Therefore, it is important to establish the effectiveness of approaches targeting self-care behaviors either alone or in combination with other lifestyle behaviors. This chapter reviews the existing evidence base for behavioral interventions in relation to the prevention and management of chronic disease. 
Our focus is on key lifestyle and self-care behaviors – dietary behaviors, exercise, smoking, and disease management behaviors – that are causally linked to circulatory and commonly related conditions, including CVD, diabetes, and respiratory conditions. The review is conducted in three steps. First, we consider the evidence for the effectiveness of behavioral interventions by considering the systematic reviews in this field. Next, we supplement these findings with issues and findings from relevant narrative reviews. Finally, we discuss the implications of these findings for future research and practice in the field.
2 Overview of Systematic Reviews of Behavioral Change Interventions
2.1 Search Strategy and Selection Criteria
We conducted a review of systematic reviews of behavioral intervention trials targeting lifestyle risk factors related to the prevention and/or management of circulatory and related conditions. Suitable reviews were identified by an electronic search of the Database of Systematic Reviews of the Cochrane Library (Issue 3, 2009), using the keywords ‘diet,’ ‘eating,’ ‘physical activity,’ ‘exercise,’ ‘smoking,’ ‘nutrition,’ ‘lifestyle,’ ‘behavior,’ ‘change,’ ‘obesity,’ ‘overweight,’ and ‘adiposity.’ These were crossed separately with ‘cardiovascular,’ ‘heart disease,’ ‘coronary,’ ‘metabolic syndrome,’ ‘type 2 diabetes,’ ‘pre-diabetes,’ and ‘chronic disease.’ Last, the search was combined with ‘intervention’ or ‘trial.’ We identified 165 reviews published between 1997 and 2009. We selected only reviews that were published in English and related to adults aged 18 or above. Reviews were excluded if they: (1) related to medical conditions other than circulatory and associated conditions (e.g., psychiatric disorders, pregnancy, HIV/AIDS, cancer); (2) covered pharmacological interventions and did not incorporate any explicit lifestyle or behavioral change strategies; (3) focused primarily on interventions to treat biological mechanisms; (4) focused primarily on determinants of health behaviors rather than health behavior change per se; or (5) were primarily narrative or qualitative reviews. We identified 27 reviews that met these inclusion criteria, and these reviews form the basis of our evaluation of the effectiveness of lifestyle interventions for this chapter. Additionally, we conducted a search of other databases (PSYCINFO, MEDLINE) in order to identify non-systematic and narrative reviews that considered other relevant issues that are not usually well addressed by systematic reviews. These other issues are discussed in more detail in the final section of this chapter. Among the 27 systematic reviews included from the Cochrane Library, there were six on nutrition and diet (Table 62.1, SR1–6), four on exercise (SR7–10), three on both diet and exercise (SR11–13), five on smoking (SR14–18), and one on multiple risk factors (SR19). Another eight reviews evaluated interventions targeting different aspects of disease management (SR20–27).
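The keyword crossing described in the search strategy can be sketched in a few lines of Python. This is only an illustration of the combinatorial logic: the `AND`/`OR` query syntax, the function names, and the phrase-quoting rule are assumptions made for the example and are not the authors' actual Cochrane Library search strings.

```python
from itertools import product

# Keyword sets quoted in the search strategy (each keyword listed once).
BEHAVIOR_TERMS = ["diet", "eating", "physical activity", "exercise", "smoking",
                  "nutrition", "lifestyle", "behavior", "change",
                  "obesity", "overweight", "adiposity"]
CONDITION_TERMS = ["cardiovascular", "heart disease", "coronary",
                   "metabolic syndrome", "type 2 diabetes", "pre-diabetes",
                   "chronic disease"]
STUDY_TERMS = ["intervention", "trial"]


def quote(term):
    """Wrap multi-word terms in double quotes so they read as phrases."""
    return f'"{term}"' if " " in term else term


def build_queries(behaviors, conditions, study_terms):
    """Cross each behavior term with each condition term, then AND the pair
    with the OR-ed study-design terms (hypothetical query syntax)."""
    study_clause = "(" + " OR ".join(quote(t) for t in study_terms) + ")"
    return [f"{quote(b)} AND {quote(c)} AND {study_clause}"
            for b, c in product(behaviors, conditions)]


queries = build_queries(BEHAVIOR_TERMS, CONDITION_TERMS, STUDY_TERMS)
print(len(queries))  # number of behavior x condition combinations
print(queries[0])
```

Under these assumptions the crossing yields one query per behavior and condition pair, which makes it easy to see how a modest keyword list expands into a broad search.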
2.2 Characteristics of the Intervention Trials in the Systematic Reviews
2.2.1 Target Population
The reviews considered lifestyle interventions that target adult populations on a continuum from healthy people to people with elevated disease risk through to people with established disease.
2.2.2 Intervention Setting
Diet and physical activity interventions were predominantly conducted in health-care settings, although some worksite and other community interventions were also represented in the reviews. Smoking intervention settings were more varied and included other community settings and delivery systems, including the use of mass media. Disease management interventions were most often delivered via health-care or related settings.
2.2.3 Mode of Delivery
Channels of delivery were varied and diverse, although face-to-face delivery – either individually or in groups – was the most common approach utilized. Other approaches included delivery via telephone, internet, mail, and mass media. Most interventions were delivered by health professionals. Many different kinds of professionals were involved in delivery of smoking interventions including counselors, psychotherapists, teachers, and pharmacists, in addition to nurses and physicians. There were two systematic reviews of intervention delivery by peers.
2.2.4 Purpose of Systematic Reviews
The primary objective for all reviews was to establish the efficacy of the interventions in terms of clinical, behavioral, and/or other outcomes. Typically, a number of comparisons were included, with the main one being a comparison between a single intervention condition and some kind of usual or routine care. A number of reviews also assessed the relative efficacy of different types of interventions (e.g., diet only vs. diet and exercise); different modes of delivery (e.g., physicians vs. nurses; individuals vs. groups); or different intensities or duration of interventions. Often, however, when these latter comparisons were a secondary objective of the review, the data available and sample sizes were insufficient for drawing any definitive conclusions. With a few exceptions, the reviews did not evaluate the influence of setting or delivery on outcomes.
Table 62.1 (Continued)
Column headings: Authors; Brief titles; Outcomes (Behavior; Physiological and anthropometric outcomes; Disease outcomes; Mortality; Quality of life/cost-effectiveness)
Authors and brief titles:
SR3 (Brunner et al, 2007): Dietary advice for reducing cardiovascular risk
SR4 (Nield et al, 2008): Dietary advice for the prevention of T2DM in adults
SR5 (Nield et al, 2007): Dietary advice for treatment of T2DM in adults
SR6 (Thompson et al, 2003): Dietary advice for cholesterol reduction
SR7 (Rees et al, 2004b): Exercise-based rehabilitation for heart failure
SR23 (Deakin et al, 2005): Group-based training for self-management strategies in people with T2DM
SR27 (Foster et al, 2007): Self-management education programs by lay leaders for people with chronic conditions
Entries by outcome column:
Behavior: Smoking: mixed findings in eight studies. Heterogeneous measures for self-management, exercise, diet, foot care, self-monitoring: modest findings in support of intervention in 5/6 studies.
Physiological and anthropometric outcomes: Urinary sodium excretion: −35.5 mmol/l; systolic RR: −1.1 mmHg; diastolic RR: −0.6 mmHg. Urinary sodium excretion: −44.2 mmol/l; systolic RR: −2.1 mmHg; diastolic RR: −1.1 mmHg; total cholesterol: −0.16 mmol/l; LDL cholesterol: −0.18 mmol/l. Significant (p 8%; no difference individual versus group. Total cholesterol: −0.27 mmol/l; LDL, HDL cholesterol: no effect; triglycerides: no effect.
Disease outcomes: Cardiovascular events inconsistently defined/reported. Cardiovascular events: −16% for all studies; −24% for studies with follow-up > 2 years. OR for reduction in non-fatal reinfarction: 0.78. OR for reduced need for diabetes medication: 11.8 (NNT = 5).
Mortality: Cardiac mortality: no effect. Total mortality: no effect.
Quality of life/cost-effectiveness: HRQoL: 7/9 studies found improvements. HRQoL: no effect. Anxiety: SMD −0.08; depression: SMD −0.3; composite measure for mental health: SMD −0.22. HRQoL: no effect in 2/3 studies. HRQoL: mixed findings from two studies. QoL: no significant effect. Cost-effectiveness: $2.12 per point gained in QoL in one study.
QOL = quality of life; HRQOL = health-related quality of life; HbA1c = glycated hemoglobin; OR = odds ratio; SMD = standardized mean difference
2.3 Intervention Outcomes
2.3.1 Dietary Interventions
Evaluation of dietary advice to reduce disease risk factors or to treat an existing disease is based on six reviews with approximately 44,000 participants. Outcomes of these reviews are listed in Table 62.1 (SR1–6). Usually, they were physiological or anthropometric risk factors. While behavior was reported in some trials as an indicator of intervention compliance, it was rarely measured as an outcome. Quality of life and cost-effectiveness were not reported. Overall, small but statistically significant improvements were found in physiological and anthropometric outcomes (Table 62.1, SR1–6). Only one review (Brunner et al, 2007) reported on behavioral outcomes: in comprehensive dietary interventions for reducing CVD risk, modest beneficial changes in behavior were translated into statistically significant improvements in physiological and anthropometric outcomes. Beneficial disease outcomes included reduction of cardiovascular events, but this effect could only be established for interventions to reduce fat (Hooper et al, 2000). Another disease outcome was reduction of type 2 diabetes incidence (Nield et al, 2008), but the finding was based on just one trial and therefore provided only weak evidence. Outcome evaluation of diet-only interventions in treatment of type 2 diabetes (Nield et al, 2007) was rendered impossible by the heterogeneity and poor quality of the studies. The only review on dietary interventions capable of reporting on mortality (Hooper et al, 2000) found no statistically significant effects on either cardiovascular or total mortality. Effects of intervention delivery and duration were evaluated in two of the dietary intervention reviews. While dietary advice from a dietitian was more effective than advice from a doctor, it was no more effective than fairly simple self-help material. Comparison between dietitians and nurses was not possible because of limited data (Thompson et al, 2003). The beneficial effect of fat reduction on cardiovascular events was limited to trials with interventions extending beyond 2 years (Hooper et al, 2000). Although one review (Hooper et al, 2000, p. 18) specifically referred to the potential benefits of applying a behavioral theory for improving intervention outcomes, none actually discussed the interventions within any sort of theoretical framework.
2.3.2 Exercise Only Interventions
Evaluation of exercise interventions covers over 10,000 participants, but it is mainly based on reviews among patient populations. The intervention programs were heterogeneous (e.g., exercise alone or as part of a comprehensive rehabilitation program including educational or psychological interventions), which often meant that the independent effect of exercise could not be separated. A range of outcome measures was covered, but typically the focus was on physiological measures and, as a secondary outcome, quality of life. Morbidity and mortality were rarely studied, and no review reported on cost-effectiveness. Behavior was regarded as a compliance factor rather than a primary intervention outcome. None of the reviews discussed behavioral theories used in the interventions, although the comprehensive rehabilitation programs often included an educational or psychosocial component. Overall, exercise training was found to improve several of the measured physiological or anthropometric factors (Table 62.1, SR7–10), such as glycated hemoglobin (HbA1c) among patients with type 2 diabetes (Thomas et al, 2006) and lipid profile among patients in comprehensive cardiac rehabilitation (Jolliffe et al, 2001). No reduction was found in body mass index (BMI), but body composition changed significantly, with adipose tissue decreasing and fat-free mass increasing. In most cases only short-term effects on risk factors could be established due to lack of long-term follow-ups. However, the one review with a longer follow-up was able to show reduction in both all-cause and cardiac mortality (Jolliffe et al, 2001). Findings on quality of life were mixed, although most studies tended to find small improvements. Duration of the interventions ranged widely, and one review evaluated its effect on outcomes. Among patients with type 2 diabetes, the decrease in HbA1c was greater for briefer (< 6 months) interventions than for longer interventions (6–12 months) (Thomas et al, 2006). One review also evaluated the effects of intervention delivery on outcomes (Ashworth et al, 2005). Assessment of home- versus center-based physical activity programs in older adults showed that in the short term, center-based training produced better outcomes. However, two studies in the review were able to evaluate longer-term adherence (1–2 years), which was shown to be better in the home-based program.
2.3.3 Combined Diet and Exercise/Weight Reduction Interventions
Altogether 15,000 patients are included in the assessment of the efficacy of combined diet and exercise or weight reduction interventions on prevention (Norris et al, 2005b; Orozco et al, 2008) and treatment (Norris et al, 2005a) of type 2 diabetes. Prevention interventions included participants with elevated risk for type 2 diabetes, who typically (but not necessarily) had an impaired glucose tolerance. Behavior was not reported as a specific outcome measure, although one review included physical activity and diet as indicators of compliance (Orozco et al, 2008). Typically, outcomes included incidence of type 2 diabetes and physiological and anthropometric risk factors (Table 62.1, SR11–13). None of the reviews were able to report on behavior or mortality. No aggregated data were provided for cost-effectiveness, although two trials in one review provided support for cost-effectiveness (Orozco et al, 2008). Length of follow-up ranged from 12 months to 10 years (Norris et al, 2005b). Although most of the reviews mentioned the use of behavioral theories and/or specific intervention strategies (including goal setting, self-monitoring and feedback, and stress management and coping), these were not systematically evaluated. Small improvements were found in weight, BMI, and waist circumference (Table 62.1, SR11–13), although statistical heterogeneity for these outcomes was high and effects were minimized by significant weight loss in the comparison groups (Norris et al, 2005a). Modest but statistically significant improvements were shown on many physiological and anthropometric outcomes. Both reviews on prevention of type 2 diabetes reported statistically significant reductions in the incidence of type 2 diabetes, but only one provided a pooled effect (Orozco et al, 2008). Intervention duration per se was not shown to affect outcomes, although the number of contacts correlated positively with weight loss among adults with pre-diabetes (Norris et al, 2005b). Furthermore, examination of different intervention arms suggested that multi-component interventions with low or very low calorie diets might help to achieve weight loss among patients with type 2 diabetes (Norris et al, 2005a).
2.3.4 Tobacco Control Interventions
Evaluation of tobacco control interventions is based on nearly 110,000 participants, mainly healthy adults. Without exception, outcomes were measured in terms of behavior. For interventions addressing cessation the outcome was abstinence; in prevention interventions it was smoking behavior (Table 62.1, SR14–18). None of the reviews reported on physiological or anthropometric outcomes or quality of life, and only one study reported on disease or mortality outcomes. Cost-effectiveness was not commonly reported, although three reviews estimated the number needed to treat for one individual to quit successfully (NNT). Unlike the dietary and exercise interventions, smoking interventions were commonly based on theoretical models, especially if they were delivered by professionals other than physicians or nurses. However, none of the reviews compared different theoretical approaches to behavior change. Nursing interventions (Rice and Stead, 2008), physician advice (Stead et al, 2008), psychosocial interventions (Barth et al, 2008), and interventions delivered by oral health professionals in connection with oral examination (Carr and Ebbert, 2006) were all shown to be effective. The overall likelihood of quitting was 28–66% higher in these interventions in comparison to usual care. Heterogeneity of community interventions prevented pooling of those data, so overall quitting rates could not be established. However, only 2 out of the 13 community interventions were more effective than no treatment (Sowden et al, 2003). Estimated NNT ranged between 10 and 120. The lowest NNT was in CHD patients, who tend to have high unassisted quit rates in any case (30–50%) (Barth et al, 2008), and the highest NNT was in primary care patients, whose unassisted quit rate was estimated to be only 2–3% (Stead et al, 2008). The minimum follow-up time for all interventions was 6 months and the majority had a follow-up of 12 months or longer. Higher intensity increased the effectiveness of interventions. Psychosocial interventions including behavioral therapies outperformed usual care, as did those based on self-help or telephone support, but none of these was found to be superior to the others. Furthermore, a subgroup analysis among patients in the dental setting showed the interventions to be effective regardless of whether participants had actively sought treatment (Table 62.1).
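The relationship between unassisted quit rates and NNT noted above can be illustrated with a short calculation. The sketch below assumes that the "28–66% higher" likelihood of quitting acts as a relative increase on the unassisted quit proportion (a risk ratio); the reviews themselves may report odds ratios and derive NNT differently, so the function name and the numbers it produces are illustrative only.

```python
def nnt_from_relative_increase(control_rate, relative_increase):
    """Number needed to treat, assuming the intervention raises the
    unassisted (control) quit proportion by `relative_increase`
    (e.g. 0.28 for a 28% higher likelihood of quitting)."""
    if not 0 < control_rate < 1:
        raise ValueError("control_rate must be a proportion between 0 and 1")
    if relative_increase <= 0:
        raise ValueError("relative_increase must be positive")
    # Extra quitters per person treated (absolute benefit).
    absolute_gain = control_rate * relative_increase
    return 1.0 / absolute_gain


# Hypothetical scenarios built from the ranges quoted in the text.
scenarios = [
    ("primary care, 3% unassisted quit rate", 0.03),
    ("CHD patients, 30% unassisted quit rate", 0.30),
]
for label, rate in scenarios:
    for rel in (0.28, 0.66):
        nnt = nnt_from_relative_increase(rate, rel)
        print(f"{label}, +{rel:.0%} relative increase: NNT about {nnt:.0f}")
```

Under these assumptions, a 3% unassisted quit rate with a 28% relative increase gives an NNT of roughly 119, while a 30% unassisted quit rate gives an NNT of about 12. This is consistent with the pattern described above: the same relative benefit translates into a much lower NNT where unassisted quit rates are high.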
2.3.5 Multiple Risk Factor Interventions
One extensive review with almost 150,000 participants evaluated the effects of multiple risk factor interventions among adults without clinical evidence of established CVD. All the trials compared an intervention comprising some form of education or counseling targeting combinations of diet, exercise, weight loss, smoking cessation, diabetes management, and use of medication with control groups receiving either usual care or no treatment. Behavioral theories underlying the interventions were rarely specified, with a few studies using the Transtheoretical Model of Stages of Change as an exception (DiClemente et al, 1991). Smoking was the most common behavioral outcome included; other reported outcomes included blood pressure, cholesterol, and mortality. Quality of life outcomes and cost-effectiveness were not reported. Overall, the interventions had a small, positive effect on smoking prevalence (Table 62.1, SR19). Also, modest but statistically significant improvements were shown in blood pressure and cholesterol, but these were most likely related to pharmacological treatments rather than the lifestyle interventions used. Ten of the trials provided data on CHD mortality and total mortality, but overall, no effect could be established. Studies where participants had the highest initial risk factor levels demonstrated larger improvements in these factors (Ebrahim et al, 2006).
2.3.6 Disease Management Interventions
Interventions for improving risk or disease management were assessed among almost 30,000 patients with vascular conditions. Included were interventions with a narrow focus on adherence to treatment recommendations (Table 62.1, SR20–21) as well as broader self-management education and support programs delivered by professionals (SR22–25) and peers (SR26–27). Reporting on outcomes focused on physiological and anthropometric risk factors, but behavior and quality of life were also included. Only one review reported on disease outcomes and mortality. The disease management interventions were probably even more heterogeneous in content, intensity, and duration than the other lifestyle interventions already described in this chapter. While some tackled only relatively simple behaviors (such as blood glucose monitoring or taking a lipid-lowering medicine), others addressed very complex sets of behaviors (e.g., comprehensive disease management including lifestyle, self-care, and adherence to medical care). Rather surprisingly, the interventions targeting simpler behaviors were often more heterogeneous and more poorly described in the reviews. They also typically lacked any description of the behavioral component(s). The more comprehensive programs, however, were often theory based and well described, and they also allowed comparison of different modes of delivery. Behavioral outcomes were rarely established in interventions targeting adherence (Table 62.1, SR20–21), and improvements in physiological outcomes tended to be small at best. None of the few studies that included quality of life measures showed any significant effect. Morbidity and mortality outcomes were not reported for adherence interventions. The effect of intervention characteristics on the outcomes was not evaluated in any of the reviews. The more comprehensive self-management programs delivered by professionals (Table 62.1, SR22–25) provided heterogeneous results in terms of behavior change. In terms of physiological or anthropometric outcomes among patients with type 2 diabetes, individual patient education was no better than usual care or group education (Duke et al, 2009). However, group training resulted in moderate improvement in most of these risk factors (Deakin et al, 2005). Disease and mortality outcomes were only measured among CHD patients. No effect was found on mortality, and the significant reduction in the number of non-fatal reinfarctions was found to be influenced by publication bias (Rees et al, 2004a). Quality of life outcomes in interventions among type 2 diabetes patients were mixed, but CHD patients were shown to gain modest psychological benefit from the interventions in terms of reductions in anxiety and depression (Rees et al, 2004a). Self-management programs by lay leaders had very little effect on behaviors, physiological and anthropometric outcomes, or quality of life (Table 62.1, SR26–27). None of the studies reported on morbidity or mortality (Dale et al, 2008; Foster et al, 2007).
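Several of the psychological outcomes in Table 62.1 are expressed as standardized mean differences (SMD). As a reminder of what such a figure represents, the sketch below computes an SMD in its simplest form (Cohen's d with a pooled standard deviation). The input values are hypothetical and chosen only for illustration, and the reviews may have used a bias-corrected variant rather than this exact formula.

```python
import math


def standardized_mean_difference(mean_tx, sd_tx, n_tx, mean_ctl, sd_ctl, n_ctl):
    """Cohen's d: difference in group means divided by the pooled SD.
    Negative values mean lower scores in the intervention group."""
    pooled_var = ((n_tx - 1) * sd_tx ** 2 + (n_ctl - 1) * sd_ctl ** 2) / (n_tx + n_ctl - 2)
    return (mean_tx - mean_ctl) / math.sqrt(pooled_var)


# Hypothetical depression scores (not taken from any of the reviews):
# intervention mean 10.5, control mean 12.0, SD 5.0 and n = 100 in each arm.
d = standardized_mean_difference(10.5, 5.0, 100, 12.0, 5.0, 100)
print(round(d, 2))
```

In this made-up example a 1.5-point lower mean on a scale with a standard deviation of 5 corresponds to an SMD of −0.3, which shows why SMDs of this size are usually read as modest effects.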
B. Oldenburg et al.
3 Relevant Findings from Narrative Reviews
3.1 Intervention Settings
Some settings are likely to make recruitment, targeting, and tailoring of interventions easier than others. For example, schools and worksites are community settings where many people can easily be reached. However, findings from worksite diet and exercise programs (L. Anderson et al, 2009) showed only very modest reductions in weight and BMI. The worksite interventions were typically based on informational and behavioral strategies, with few having promoted changes to the work environment to support healthy behavioral choices. As with similar interventions in other settings, more intensive interventions (duration, number of components, structured vs. unstructured) were more effective (L. Anderson et al, 2009). It also seems that workplace interventions have tended to be quite focused compared to more generic lifestyle interventions. A recent meta-analysis (Abraham and Graham-Rower, 2009) showed effect sizes more than three times larger for physical activity interventions than for general lifestyle change interventions.
3.2 Information and Communications Technology in Intervention Delivery
3.2.1 Web-Based Interventions
Information and communications technology (ICT) has become an increasingly popular channel for delivery of interventions. However, despite the exciting potential for ICT-delivered interventions, program reach and adherence are still a significant concern. Wantland et al reviewed web-based interventions with nearly 12,000 participants (Wantland et al, 2004). Compared with interventions utilizing more
“traditional” means of delivery, web-based interventions report reaching an equal proportion of men and women. Although the average drop-out rate was relatively low (only 21%), this needs to be considered in relation to measures of program exposure and intensity. For example, participants showed significant variation in time spent per session (4.5–45 min). Some had only a few logons while others entered the intervention site very frequently (from 2.6 over 32 weeks to 1008 logons/person over 36 weeks). The interventions included one-time studies, self-paced interventions, and longitudinal, repeated measures intervention studies (3–78 weeks). Despite wide variation in intensity, nearly all (16/17) studies revealed improved knowledge and/or behavioral outcomes (e.g., exercise duration, 18-month weight loss maintenance, participation in health care). Among users with chronic disease (Murray, 2006), interactive health communication applications have also been shown to improve knowledge, social support, self-efficacy, health behaviors, and clinical outcomes.
3.2.2 Interventions Delivered via Telephone
The telephone provides another channel with easy access for participants. Eakin et al reviewed 26 studies on diet and physical activity interventions, most delivered by different health professionals but some with automated telephone systems (that fully free the participants from both temporal and spatial restrictions) (Eakin et al, 2007). Recruitment methods influenced reach, with studies recruiting highly selected clinical samples and having stringent criteria reporting higher participation rates. However, few of the studies reported how representative their study populations were. Unlike many other interventions reviewed in this chapter, a majority of the telephone interventions were based on one or more specific theories, with Transtheoretical Model, Social Cognitive Theory, and/or Motivational Interviewing being the most commonly reported; however, the effect
of theory was not formally evaluated in the review. The majority of the studies (20/26) reported significant behavioral improvements with a medium average effect size (0.60, [0.24–1.19]). Positive outcomes were reported for 69% of exercise studies, 83% of dietary behavior studies, and 75% of studies addressing both behaviors. Furthermore, the positive outcomes were associated with duration and intensity (number of calls) of the intervention (Eakin et al, 2007).
3.3 Effectiveness of Theory-Based Interventions
We found only a couple of reviews where the effectiveness of theory-based interventions was formally evaluated. Stage models propose that individuals can be distinguished by their behavior-related cognitions into discrete stages of action readiness; hence, behavior change interventions are claimed to be most effective when tailored to match the needs of groups defined by the stages. van Sluijs et al reviewed the effectiveness of stage-based lifestyle interventions in primary care with two kinds of outcomes: positive stage changes and behavior changes (van Sluijs et al, 2004). For physical activity and smoking, neither kind of change was achieved. For diet, limited evidence supported an effect on stage change and an effect on behavior change. Altogether, the findings do not lend much support for stage theories. Motivational Interviewing (MI) (Miller and Rollnick, 2002) is a theory-based technique rather than a theory. It is based on the empowerment ideology (J. M. Anderson, 1996) and Self-Determination Theory (Deci and Ryan, 1980). It was first developed as a counseling method for working with patients with substance abuse, but it is increasingly used in other lifestyle interventions as well. Dunn et al reviewed 29 randomized trials using motivational interviewing interventions across different behaviors including diet and exercise (Dunn et al, 2001).
Although this method was found effective in substance abuse interventions, data were inadequate to judge the effects in other domains. However, the increase in exercise was consistent in size and direction (although many of the studies were underpowered). A positive finding across domains was that the effects of MI did not diminish with longer follow-ups. Interactions between client attributes and treatment were understudied and “sparse and inconsistent findings revealed little about the mechanism by which MI works” (p. 1725). Furthermore, it remained unknown what levels of MI training, skill, and duration would be optimal. Self-regulation theories emphasize the importance of goal setting, planning, and review in behavior change and maintenance. Reviewing worksite physical activity interventions, Abraham and Graham-Rower (2009) found that setting specific goals that defined the frequency and duration of physical activity, setting of graded tasks, and goal review techniques enhanced outcomes of the interventions in comparison to interventions without these techniques. Furthermore, interventions providing advice, even if it was individually tailored, were not effective (Abraham and Graham-Rower, 2009). Another review suggested that problem solving strategies – also a technique based on self-regulation theories – might be a critical intervention element in promoting long-term weight loss (Seo and Sa, 2008).
4 Lifestyle Change – Current Issues and Future Challenges
While systematic reviews are a good method for identifying and summarizing the effects of lifestyle change interventions on important behavioral, clinical, and disease outcomes, it is important to acknowledge that they do have some significant limitations. Importantly, information on many key issues can be very limited in systematic reviews and detailed information
cannot be reported from all the original studies on issues and topics that would be important for the research question being addressed in the review. The analysis of and reporting on sub-questions can also be problematic because of the small number of studies and/or sample sizes related to these. We have outlined some of the important unresolved issues and questions in relation to behavioral interventions in Table 62.2. The problems related to study designs and measurements have been adequately discussed in the original reviews and also in the previous section, so they will not be addressed further here. In this final section, we will discuss major findings from the reviews in light of the two other sets of issues, the intervention features and delivery, and sustainability and future uptake of the interventions.
4.1 Features of the Intervention and Its Delivery
Generally, it is impossible to say whether interventions targeted to one specific behavioral component or behavior are more effective in addressing it than a more comprehensive intervention might be. When targeting physical activity in the worksite, less comprehensive interventions were not necessarily more effective. When addressing disease management, more comprehensive interventions were reportedly more effective. Many interventions were shown to have a positive effect on some lifestyle behaviors or clinical risk factors, while not affecting others. This strongly suggests that a number of interventions and delivery components are likely needed to address all aspects of behavior change related to preventing and managing a specific chronic disease such as CVD or diabetes. This is certainly the case at a population level; however, it is also likely to be the case at a more individual level. This view is more strongly supported by the evidence from the field of tobacco control (WHO, 2008) with further support coming from community intervention trials over the past 30 years
Table 62.2 Important issues arising from systematic reviews of interventions targeting lifestyle factors
1. Study design and measurement
  o Small, underpowered studies
  o Heterogeneity and lack of specificity in relation to participants:
    – Socio-demographic characteristics
    – Clinical characteristics
  o Heterogeneity in measurement
    – Different outcomes
    – Lack of key outcomes in relation to behavior, morbidity, mortality, cost-effectiveness
    – Variability in the quality and type of measures used
  o Heterogeneity in length of follow-up
    – Lack of long-term follow-up
  o Lack of implementation and process measurement
  o Difficulty in disentangling the effects of other factors
    – Medication use
    – Mediating and moderating factors
2. Features of the intervention and its delivery
  o Heterogeneity and lack of specificity in relation to:
    – Content
    – Setting
    – Intensity
    – Duration
    – Delivery person/system
  o Intervention is a “black box,” i.e., components are either undefined or impossible to separate from each other
  o Inadequate use and reporting of health behavior theory, including:
    – Theoretical model for expected behavior changes and determinants
    – Techniques to change behaviors
    – Compliance by program users with techniques to change behaviors
    – Systematic analysis of theory-based moderators and mediators
3. Intervention sustainability and future uptake
  o What were the necessary versus sufficient components?
  o Intensive interventions but small effects
  o Economic outcomes and costing data are lacking
  o Long-term outcomes are not established
(Sowden et al, 2003). It is further supported by more recent evidence in relation to interventions that focus on reducing absolute risk of a number of chronic diseases (Ebrahim et al, 2006; Goldstein et al, 2004; Pronk et al, 2004; WHO, 2002). The fact that little particularly useful information was found in reviews in relation to settings for program delivery reflects the complexity of this issue as well. Clearly, some settings are likely to make recruitment, targeting, and tailoring of interventions easier than others. For example, schools and worksites are community settings where many people can easily be reached. When recruiting people for telephone interventions where no “natural” setting necessarily exists, recruitment was generally shown
to be easier when conducted via a clinical setting (Eakin et al, 2007). Furthermore, as the review on exercise training among older adults showed (Ashworth et al, 2005), what works best in terms of setting might change over time as participants’ needs change. Instead of thinking in terms of home-based versus center-based programs, maybe the ideal would be a program where the participant can choose from either or both and change back and forth between the options as their personal circumstances and needs change. Intensity and duration are issues for which the reviews do provide some important findings. More intensive and longer interventions are generally more effective than less intensive and briefer interventions, and it is critically important that follow-up is included in an
intervention in order to enhance maintenance and sustainability. However, there may be a trade-off between effectiveness and availability of resources, but this cannot be assessed without cost-effectiveness studies. Attrition can also be very problematic in long-lasting interventions. However, we do not know much about individual differences in relation to intervention intensity and duration. As the review on internet interventions showed (Strecher, 2007), if people get to choose for themselves, some will decide to be intensively involved for long periods of time while others only have very fleeting and brief contact with the intervention. There is not much information in relation to which professionals are best able to deliver which kinds of lifestyle interventions. While physicians can deliver lifestyle advice and programs in an effective and durable fashion under certain circumstances, there are likely to be many other professionals who can do so much more cost-effectively. However, they do not necessarily have the same “window of opportunity” as physicians might have, particularly those in the primary care setting. There are also some interventions, such as dietary advice, which may be delivered quite effectively and efficiently through the use of information and communications technology. The delivery of such programs by lay leaders or peers (Dale et al, 2008; Foster et al, 2007) is another area that needs more investigation, particularly when associated with management of a disease such as diabetes (Fisher et al, 2010). Automated telephone programs and the internet also have great potential to supplement and support health-care settings and professionals as platforms for effective intervention delivery. The outcomes are by and large moderately positive on several measures (Dale et al, 2008; Eakin et al, 2007; Strecher, 2007).
However, despite the burgeoning interest in internet-based interventions, the potential of the internet in interactivity – user navigation, collaborative filters, expert systems, and human-to-human interaction – is still poorly utilized and understood for the delivery of lifestyle change programs (Strecher, 2007). It is certainly the case that the
internet provides tremendous opportunities for making use of the individual’s characteristics as active ingredients for tailoring and delivery of programs. However, the ways these characteristics moderate the impact of interventions need to be explored first and then purposefully utilized. Furthermore, there are now tremendous opportunities to combine current knowledge with new interactive and mobile technologies, as well as with consumer and other (medical and public health) informatics systems (Strecher, 2007). Probably more important than who or which system delivers an intervention per se is how well the intervention components, and the system used to deliver them, address the participant’s needs, and the extent to which these are related to their current knowledge, attitudes, skills or support, or, most likely, a combination of all of these. The need for theory to inform these issues more appropriately, as well as the development, implementation, and evaluation of lifestyle change programs, has received increasing attention in recent years, and whole textbooks have been devoted to it (e.g. Bartholomew et al, 2006; Glanz et al, 2002). As already mentioned, few of the systematic reviews considered in this chapter discuss in detail the importance of behavioral or other kinds of theories and only a couple of reviews actually analyzed the use of theory-based interventions. The Transtheoretical Model of Stages of Change (Prochaska and DiClemente, 1982) has been one of the most frequently cited theories in both smoking and telephone interventions in the reviews that have been considered in this chapter. Consequently, it is really the only theory with adequate data for recent evaluation. These data, however, only lend equivocal support to the theory.
Abraham, among others, has critiqued stage models for oversimplifying “the cognitive architecture” by defining stage transitions by single determinants and for implying cognitive uniformity within stages (Abraham, 2008). Moreover, instead of stage-based interventions, he has suggested use of multi-determinant, multi-goal continuum approaches. Such an approach recognizes graded discontinuities throughout the
development of action readiness from attitude formation to maintenance of behavior change as a process that is not linear and that includes movement in both directions (Abraham, 2008). In addition to health behavior theories that help intervention designers to identify psychosocial determinants for behavior change and target and tailor interventions, the need for explicit use of theory-based health behavior techniques has also been acknowledged (Abraham and Michie, 2008). Reporting the use of these specific techniques in interventions would take the field forward and allow gathering of evidence for what works. The approach advocated by Abraham is interesting, not just because of the perspective it provides on behavior change, but also because it provides a framework for conceptualizing a more menu-based approach to interventions. In other words, instead of providing one uniform or stage-specified intervention to all program participants, it is probably more appropriate to provide a menu of interventions from which potential program participants can then self-tailor a combination of interventions that best suit their personal needs and circumstances. Utilizing such an approach in relation to the interactive potential that different ICT and web-based systems can provide is likely to lead to a significant paradigm shift in the way in which lifestyle change interventions are provided to and accessed by the community over the next few years. This will lead to a shift away from the view of the individual/patient as an almost passive recipient of expert-driven interventions toward the individual becoming a much more active participant in deciding on their own needs and how to address these.
4.2 Intervention Sustainability in the “Real World” and Future Uptake of Interventions
With a few notable exceptions, most published lifestyle change intervention trials have only
achieved, at best, quite modest outcomes, even when evaluated under very controlled conditions. The further implementation and dissemination of such programs under more “real world” settings is often not evaluated very well (Glanz and Oldenburg, 2008), so we do not usually know whether even these modest effects are maintained. The Diabetes Initiative of the Robert Wood Johnson Foundation in the United States evaluated the resources and supports for self-management of diabetes in various community settings. The program identified six key supports for program success: individualized assessment and tailored measurement; collaborative goal setting; enhancement of key skills for disease management, health behaviors, and problem solving; continuity of high-quality, safe clinical care; ongoing follow-up and support; and a very important role for supportive community resources (Fisher et al, 2005). The authors conclude that the concept of “equifinality” is especially helpful for thinking about how such programs can work for individuals in community settings, that is, that different procedures, strategies, or programs can work in complementary ways to achieve similar ends or effects. Generally speaking, intensive and costly lifestyle change interventions for people with minimal risk might not be very cost-effective, and therefore, more community-wide, population-based, or upstream social and economic interventions are likely to be more cost-effective. Given the increasing pressures on limited resources for health care and prevention in most countries and the increasing burden of chronic diseases, it is important that resources are prioritized for populations where the interventions will be most effective.
Although none of the systematic reviews we have described contained significant findings in relation to cost-effectiveness, a number of the authors raised the issue that interventions evaluated under very controlled conditions are too resource intensive for broader uptake (Ebrahim et al, 2006). Further investigation of the cost-effectiveness of lifestyle interventions is essential in order to allow for priority setting and for governments and major donors to justify
spending resources on modifying behavioral risk factors. The WHO has recognized the importance of reducing lifestyle risk factors in cost-effective ways, stating in their 2002 World Health Report that their ultimate goal is to help governments of all countries to raise the healthy life expectancy of their populations. However, the cost-effectiveness of lifestyle interventions to prevent mortality and morbidity from preventable chronic diseases should also be appropriately demonstrated in resource-poor countries before recommending their “scaling up.”
5 Summary
If properly developed, implemented, and evaluated, lifestyle change interventions have excellent potential to prevent disease, to improve the self-management of existing conditions, and to increase the quality of life of individuals in all countries. Some evidence also points to the cost-effectiveness of lifestyle change interventions, even when compared to more traditional medical interventions, but this is definitely a field that needs more research. Most importantly, however, it is clear that such approaches can not only have a beneficial impact on particular disease(s) or risk factor(s) in individuals, but they can also have significant effects and benefits for prevention in populations. Use of contemporary communication technologies is an especially exciting development, particularly when combined with more traditional delivery approaches used by health professionals, peer leaders, and others in health care and other community settings. Given the very rapid increase of disease burden attributable to chronic noncommunicable disease as a result of lifestyle behaviors in developing regions of the world, these kinds of approaches also urgently need further development and adaptation to the growing health needs and challenges of these parts of the world (Beaglehole and Bonita, 2008).
Acknowledgment
Thanks to Carla Renwick for so much invaluable assistance with the preparation and finalizing of the manuscript.
References
Abraham, C. (2008). Beyond stages of change: multideterminant continuum models of action readiness and menu-based interventions. Applied Psychol, 57, 30–41.
Abraham, C., and Graham-Rower, E. (2009). Are worksite interventions effective in increasing physical activity? A systematic review and meta-analysis. Health Psychol Rev, 3, 108–144.
Abraham, C., and Michie, S. (2008). A taxonomy of behavior change techniques used in interventions. Health Psychol, 27, 379–387.
Anderson, J. M. (1996). Empowering patients: issues and strategies. Soc Sci Med, 43, 697–705.
Anderson, L., Quinn, T., Glanz, K., Ramirez, G., Kahwati, L. et al (2009). The effectiveness of worksite nutrition and physical activity interventions for controlling employee overweight and obesity. Am J Prev Med, 37, 340–357.
Ashworth, N. L., Chad, K. E., Harrison, E. L., Reeder, B. A., and Marshall, S. C. (2005). Home versus center based physical activity programs in older adults. Cochrane Database Syst Rev, CD004017.
Barth, J., Critchley, J., and Bengel, J. (2008). Psychosocial interventions for smoking cessation in patients with coronary heart disease. Cochrane Database Syst Rev, CD006886.
Bartholomew, L. K., Parcel, G., Kok, G., and Gottlieb, N. (2006). Planning Health Promotion Programs: An Intervention Mapping Approach, 2nd Ed. San Francisco: Jossey-Bass.
Beaglehole, R., and Bonita, R. (2008). Global public health: a scorecard. Lancet, 372, 1988–1996.
Brunner, E. J., Rees, K., Ward, K., Burke, M., and Thorogood, M. (2007). Dietary advice for reducing cardiovascular risk. [Update of Cochrane Database Syst Rev. 2005;(4)]. Cochrane Database Syst Rev, CD002128.
Carr, A. B., and Ebbert, J. O. (2006). Interventions for tobacco cessation in the dental setting. Cochrane Database Syst Rev, CD005084.
Dale, J., Caramlau, I. O., Lindenmeyer, A., and Williams, S. M. (2008). Peer support telephone calls for improving health. Cochrane Database Syst Rev, CD006903.
Deakin, T., McShane, C. E., Cade, J. E., and Williams, R. D. (2005). Group based training for self-management strategies in people with type 2 diabetes mellitus. Cochrane Database Syst Rev, CD003417.
Deci, E. L., and Ryan, R. M. (1980). Self-determination theory – the iteration of psychophysiology and motivation. Psychophysiology, 17, 321.
DiClemente, C. C., Prochaska, J. O., Fairhurst, S. K., Velicer, W. F., Velasquez, M. M. et al (1991). The process of smoking cessation: an analysis of precontemplation, contemplation, and preparation stages of change. J Consult Clin Psychol, 59, 295–304.
Duke, S. A., Colagiuri, S., and Colagiuri, R. (2009). Individual patient education for people with type 2 diabetes mellitus. Cochrane Database Syst Rev, CD005268.
Dunn, C., Deroo, L., and Rivara, F. P. (2001). The use of brief interventions adapted from motivational interviewing across behavioral domains: a systematic review. Addiction, 96, 1725–1742.
Eakin, E. G., Lawler, S. P., Vandelanotte, C., and Owen, N. (2007). Telephone interventions for physical activity and dietary behavior change: a systematic review. Am J Prev Med, 32, 419–434.
Ebrahim, S., Beswick, A., Burke, M., and Davey Smith, G. (2006). Multiple risk factor interventions for primary prevention of coronary heart disease. Cochrane Database Syst Rev, CD001561.
Fisher, E. B., Brownson, C. A., O’Toole, M. L., Shetty, G., Anwuri, V. V. et al (2005). Ecological approaches to self-management: the case of diabetes. Am J Public Health, 95, 1523–1535.
Fisher, E. B., Earp, J. A., Maman, S., and Zolotor, A. (2010). Cross-cultural and international adaptation of peer support for diabetes management. Fam Pract, 27(1), i6–16. Epub 2009 Mar 10.
Foster, G., Taylor, S. J., Eldridge, S. E., Ramsay, J., and Griffiths, C. J. (2007). Self-management education programmes by lay leaders for people with chronic conditions. Cochrane Database Syst Rev, CD005108.
Glanz, K., Lewis, F. M., and Rimer, B. (Eds.). (2002). Health Behavior and Health Education: Theory, Research and Practice. San Francisco: Jossey-Bass.
Glanz, K., and Oldenburg, B. (2008). Diffusions of Innovation. In K. Glanz, B. Rimer, & K. Viswanath (Eds.), Health Behavior and Health Education: Theory, Research, and Practice, 4th Ed. San Francisco: Jossey-Bass Inc.
Goldstein, M. G., Whitlock, E. P., DePue, J., and Planning Committee of the Addressing Multiple Behavioral Risk Factors in Primary Care, P. (2004). Multiple behavioral risk factor interventions in primary care. Summary of research evidence. Am J Prev Med, 27, 61–79.
Hooper, L., Bartlett, C., Davey, S. G., and Ebrahim, S. (2004). Advice to reduce dietary salt for prevention of cardiovascular disease. Cochrane Database Syst Rev, CD003656.
Hooper, L., Summerbell, C. D., Higgins, J. P., Thompson, R. L., Clements, G. et al (2000). Reduced or modified dietary fat for prevention of cardiovascular disease. Cochrane Database Syst Rev, CD002137.
Jolliffe, J. A., Rees, K., Taylor, R. S., Thompson, D., Oldridge, N. et al (2001). Exercise-based rehabilitation for coronary heart disease. Cochrane Database Syst Rev, CD001800.
Miller, W. R., and Rollnick, S. (2002). Motivational Interviewing: Preparing People for Change, 2nd Ed. New York: Guilford Press.
Murray, S. (2006). Doubling the burden: chronic disease. CMAJ, 174, 771.
Nield, L., Moore, H. J., Hooper, L., Cruickshank, J. K., Vyas, A. et al (2007). Dietary advice for treatment of type 2 diabetes mellitus in adults. Cochrane Database Syst Rev, CD004097.
Nield, L., Summerbell, C. D., Hooper, L., Whittaker, V., and Moore, H. (2008). Dietary advice for the prevention of type 2 diabetes mellitus in adults. Cochrane Database Syst Rev, CD005102.
Norris, S. L., Zhang, X., Avenell, A., Gregg, E., Brown, T. J. et al (2005a). Long-term non-pharmacologic weight loss interventions for adults with type 2 diabetes. Cochrane Database Syst Rev, CD004095.
Norris, S. L., Zhang, X., Avenell, A., Gregg, E., Schmid, C. H. et al (2005b). Long-term non-pharmacological weight loss interventions for adults with prediabetes. Cochrane Database Syst Rev, CD005270.
Orozco, L. J., Buchleitner, A. M., Gimenez-Perez, G., Roque, I. F. M., Richter, B. et al (2008). Exercise or exercise and diet for preventing type 2 diabetes mellitus. Cochrane Database Syst Rev, CD003054.
Prochaska, J. O., and DiClemente, C. C. (1982). Transtheoretical therapy: toward a more integrative model of change. Psychother Theory Res Pract, 19(3), 276–288.
Pronk, N. P., Peek, C. J., and Goldstein, M. G. (2004). Addressing multiple behavioral risk factors in primary care. A synthesis of current knowledge and stakeholder dialogue sessions. Am J Prev Med, 27, 4–17.
Rees, K., Bennett, P., West, R., Davey, S. G., and Ebrahim, S. (2004a). Psychological interventions for coronary heart disease. Cochrane Database Syst Rev, CD002902.
Rees, K., Taylor, R. S., Singh, S., Coats, A. J., and Ebrahim, S. (2004b). Exercise based rehabilitation for heart failure. Cochrane Database Syst Rev, CD003331.
Rice, V. H., and Stead, L. F. (2008). Nursing interventions for smoking cessation. Cochrane Database Syst Rev, CD001188.
Schedlbauer, A., Schroeder, K., Peters, T. J., and Fahey, T. (2004). Interventions to improve adherence to lipid lowering medication. Cochrane Database Syst Rev, CD004371.
Seo, D. C., and Sa, J. (2008). A meta-analysis of psycho-behavioral obesity interventions among US multiethnic and minority adults. Prev Med, 47, 573–582.
Sowden, A., Arblaster, L., and Stead, L. (2003). Community interventions for preventing smoking in young people. Cochrane Database Syst Rev, CD001291.
Stead, L. F., Bergson, G., and Lancaster, T. (2008). Physician advice for smoking cessation. Cochrane Database Syst Rev, CD000165.
Strecher, V. (2007). Internet methods for delivering behavioral and health-related interventions (eHealth). Annu Rev Clin Psychol, 3, 53–76.
Thomas, D. E., Elliott, E. J., and Naughton, G. A. (2006). Exercise for type 2 diabetes mellitus. Cochrane Database Syst Rev, 3, CD002968.
Thompson, R. L., Summerbell, C. D., Hooper, L., Higgins, J. P., Little, P. S. et al (2003). Dietary advice given by a dietitian versus other health professional or self-help resources to reduce blood cholesterol. Cochrane Database Syst Rev, CD001366.
US Department of Health and Human Services (1964). Smoking and Health: A Report of the Surgeon General. Washington.
US Department of Health and Human Services (1988). The Surgeon General’s Report on Nutrition and Health. Washington.
US Department of Health and Human Services (1996). Physical Activity and Health: A Report of the Surgeon General. Washington.
van Sluijs, E., van Poppel, M., and van Mechelen, W. (2004). Stage-based lifestyle interventions in primary care – Are they effective? Am J Prev Med, 26, 330–343.
Vermeire, E., Wens, J., Van Royen, P., Biot, Y., Hearnshaw, H. et al (2005). Interventions for improving adherence to treatment recommendations in people with type 2 diabetes mellitus. Cochrane Database Syst Rev, CD003638.
Wantland, D. J., Portillo, C. J., Holzemer, W. L., Slaughter, R., and McGhee, E. M. (2004). The effectiveness of web-based vs. non-web-based interventions: a meta-analysis of behavioral change outcomes. J Med Internet Res, 6, e40. Welschen, L. M., Bloemendal, E., Nijpels, G., Dekker, J. M., Heine, R. J. et al (2005). Self-monitoring of blood glucose in patients with type 2 diabetes who are not using insulin. Cochrane Database Syst Rev, CD005060. WHO (2001). World Health Report. Mental Health: New Understanding, New Hope. Geneva: World Health Organization. WHO (2002). The World Health Report 2002. Reducing Risks, Promoting Healthy Life. Geneva: World Health Organization. WHO (2005). Chronic Disease Risk Factors. Geneva: World Health Organization. WHO (2008). WHO Report on the Global Tobacco Epidemic: The MPOWER Package. Geneva: World Health Organization. Yach, D., McKee, M., Lopez, A. D., and Novotny, T. (2005). Improving diet and physical activity: 12 lessons from controlling tobacco smoking. BMJ, 330, 898–900.
Chapter 63
Psychosocial–Behavioral Interventions and Chronic Disease
Neil Schneiderman, Michael H. Antoni, Frank J. Penedo, and Gail H. Ironson
1 Introduction
According to the World Health Organization (WHO, 2008), the leading causes of death worldwide are coronary heart disease (CHD), stroke, chronic obstructive pulmonary diseases, diarrhea, and HIV/AIDS. When a slightly different metric that aggregates cancers is used, cancer is either the first or second leading cause of mortality (WHO, 2009). As McGinnis and Foege (2004) have pointed out, however, reporting of deaths, diseases, and disabilities using traditional diagnostic categories obscures the importance of antecedent factors that are responsible for disease outcomes. Mokdad and colleagues (2004), for instance, have reported that about half of all deaths in the United States could be attributed to a very limited number of largely preventable behaviors and exposures. Furthermore, the INTERHEART Study has provided evidence that nine potentially modifiable risk factors associated with myocardial infarction (MI) account for more than 90% of population attributable MI risk worldwide (Yusuf et al, 2004). According to INTERHEART, smoking, abdominal obesity, hypertension, diabetes, and psychosocial stressors are associated with increased risk, whereas daily consumption of fruits or vegetables, moderate exercise, and alcohol consumption are protective. The population attributable risk associated with psychosocial stressors is 32.5% (Rosengren et al, 2004). Most of the risk factors identified in INTERHEART, particularly smoking, abdominal obesity, and psychosocial stressors, have also been associated with the mortality risk for cancer (Duffy et al, 2009) and other diseases.
Of further interest to behavioral scientists is that, for the most part, the risk factors identified in INTERHEART are amenable to behavior modification. Even when patients reach the stage where they are in need of medication, behavioral skills can help to improve adherence. During the past several decades behavioral scientists have developed a number of psychosocial–behavioral interventions based upon research showing how psychosocial and biobehavioral factors influence quality of life and disease-related outcomes. Whereas early studies adhered to a strict dichotomy between psychosocial and behavioral interventions, it has become increasingly apparent that efficacious interventions for patients with chronic disease require behavioral skill interventions that address psychosocial, lifestyle, and medical adherence issues. This chapter describes some of the psychosocial and biobehavioral factors that moderate and/or mediate the outcomes of chronic disease prevention and management programs, with particular reference to CHD, HIV/AIDS, and cancer.
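The population attributable risk figures cited in this introduction can be made concrete with a small calculation. The sketch below uses Levin's textbook formula, which combines exposure prevalence with relative risk. It is an illustration only: INTERHEART was a case-control study that estimated attributable risk from adjusted models, so this is not the investigators' method, and the function name and input values are hypothetical.

```python
def population_attributable_fraction(prevalence: float, relative_risk: float) -> float:
    """Levin's formula: share of cases in a population attributable to an exposure.

    prevalence    -- proportion of the population exposed (0 to 1)
    relative_risk -- risk in the exposed divided by risk in the unexposed (> 0)
    """
    if not 0.0 <= prevalence <= 1.0:
        raise ValueError("prevalence must lie between 0 and 1")
    if relative_risk <= 0.0:
        raise ValueError("relative_risk must be positive")
    excess = prevalence * (relative_risk - 1.0)
    return excess / (1.0 + excess)


# Hypothetical inputs for illustration only (not INTERHEART data):
# 30% of the population is exposed and exposure doubles the risk.
paf = population_attributable_fraction(prevalence=0.30, relative_risk=2.0)
print(f"Population attributable fraction: {paf:.1%}")
```

With these assumed inputs, a little under a quarter of cases would be attributed to the exposure. The formula also shows why a common exposure with a modest relative risk can carry a larger attributable fraction than a rare exposure with a large one.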
N. Schneiderman, Department of Psychology, University of Miami, P.O. Box 248185, Coral Gables, FL 33124-0751, USA; e-mail: [email protected]
A. Steptoe (ed.), Handbook of Behavioral Medicine, DOI 10.1007/978-0-387-09488-5_63, © Springer Science+Business Media, LLC 2010
2 Coronary Heart Disease
2.1 Risk Factors
The leading cause of death worldwide is CHD (WHO, 2008). Although atherosclerosis, the preclinical antecedent of CHD, begins in childhood, the clinical manifestations of CHD occur in adulthood and include angina pectoris, MI, heart failure, and sudden death. Major cardiovascular risk factors are those that independently influence the development of atherosclerosis and CHD. More than a half century ago the Framingham Heart Study identified cigarette smoking, elevated serum cholesterol, hypertension, and advancing age as major risk factors (Dawber et al, 1951). Since then, conventional wisdom has come to accept that four modifiable traditional cardiovascular risk factors (i.e., smoking, hypertension, hypercholesterolemia, type 2 diabetes mellitus) account for “only 50%” of the risk for CHD (Braunwald, 1997; Hennekens, 1998). However, some investigators have contended that the 50% figure is a myth and that traditional risk factors account for far more than half the prevalence of CHD (Canto and Iskandrian, 2003). In fact, given what we now know about modifiable risk factors, it appears that they account for almost all CHD mortality.
INTERHEART was a standardized case-control study of acute MI in 52 countries representing every inhabited continent (Yusuf et al, 2004). As might be expected in a study whose age distribution is determined by MI, the median age for men was in the 50s and for women in the 60s, although there was variation related to geographic region and ethnic origin. The 15,152 cases and 14,820 controls were compared in terms of self-reported smoking, history of hypertension, history of diabetes, dietary patterns, physical activity, consumption of alcohol, and psychosocial factors as well as by tape measurements for adiposity and blood measurement for apolipoproteins (Apo).
Abnormal lipids, smoking, hypertension, diabetes, abdominal obesity, and psychosocial stressors were found to be associated with increased risk, whereas daily
consumption of fruits or vegetables, moderate or strenuous exercise, and consumption of alcohol were protective. INTERHEART found that the major risk factors having odds ratios (OR) of 2 or greater in univariate analyses included smoking, abnormal lipids, psychosocial factors, hypertension, diabetes, and abdominal obesity. These risk factors were qualitatively similar and consistently adverse in all regions of the world and in all ethnic groups.
INTERHEART (Yusuf et al, 2004) made an important contribution to our knowledge of cardiovascular risk by documenting the generalizability of modifiable risk factors across diverse regions and ethnicities. In order to accomplish this monumental task, the investigators made a number of important compromises. Thus, rather than using fasting blood to evaluate triglycerides, HDL-, and LDL-cholesterol, they used the ratio of ApoB/ApoA1 from non-fasting blood as an index of abnormal lipids. Blood pressure, blood glucose, and plasma insulin were not assessed directly. Similarly, psychosocial stress was examined by four simple questions about stress at work and at home, financial stress, and major life events in the past year (Rosengren et al, 2004). Depression was evaluated by a modified version of the short form of the Composite International Diagnostic Interview questionnaire (Patten, 1997). Interestingly, all of these psychosocial variables were associated with increased risk of MI. For severe global stress, the size of the effect appeared to be less than that for smoking, but comparable with that for hypertension and abdominal obesity. The measurement deficiencies in INTERHEART, essential as they may have been in order to meet study objectives, suggest that some of the methods used may have produced risk estimates that could be improved by more sensitive measurement (e.g., blood pressure, fasting lipids, impaired glucose tolerance, psychosocial distress).
This would be particularly important for planning secondary prevention in CHD patients who, although usually offered pharmacological treatment for traditional risk factors, may have 5–7 times
the relative risk of recurrent MI when compared with the general population of adults of the same age (National Cholesterol Education Program, 1994). Thus, in such patients it is important that both psychosocial–behavioral and pharmacological treatment be guided by an understanding of the variables likely to be mediating the associations between traditional risk factors and cardiovascular mortality, including inflammation, insulin resistance, oxidative stress, and platelet coagulation. The design of such rehabilitation programs for post-MI patients should consider behavioral variables including medication adherence and lifestyle modification, reduction of sympathetic nervous system arousal and glucocorticoid dysregulation, and the bidirectional interaction between behavior and stress. There is also a need to assess the role of moderating variables such as low socioeconomic status (e.g., Marmot et al, 1984) (see Chapter 22), whose adverse effects upon cardiovascular mortality may operate through behavioral, biological, psychosocial, and environmental (including access to health care, fresh fruits and vegetables, and safe neighborhoods) risk factors (Albert et al, 2006; Steptoe and Marmot, 2002).
2.2 Psychosocial–Behavioral Interventions with Acute Coronary Syndrome Patients
Several meta-analyses have examined randomized psychosocial–behavioral interventions in patients with CHD (Clark et al, 2005; Dusseldorp et al, 1999; Linden et al, 1996, 2007). Most of the studies that were analyzed compared a psychosocial–behavioral intervention with usual care. The meta-analysis by Dusseldorp and colleagues examined the effects of health education and stress management in 37 studies and found a 34% reduction in cardiovascular mortality, a 29% reduction in MI recurrence and significant positive effects for blood pressure, cholesterol, body weight, smoking, physical exercise, and eating habits.
Cardiac rehabilitation programs that were successful in improving traditional risk factor profiles were also more effective in decreasing cardiovascular mortality and MI recurrence than those that were not successful in risk factor reduction. Linden et al (1996) conducted a meta-analysis on 3,180 CHD patients in 23 randomized controlled trials (RCTs) and found that patients who did not receive psychosocial–behavioral treatment showed greater mortality (OR=1.70; 95% confidence interval [CI], 1.09–2.64) and MI recurrence (OR=1.84; CI, 1.12–2.99) than those who did. Similarly, Clark et al (2005) conducted a meta-analysis on 21,295 CHD patients in 63 RCTs and reported an OR=0.85; CI, 0.77–0.94 for all-cause mortality and OR=0.83; CI, 0.74–0.94 for recurrent MI. More recently, in a meta-analysis conducted on 9,856 CHD patients in 43 RCTs, Linden et al (2007) found that trials initiating treatment at least 2 months after a cardiovascular event revealed greater mortality savings than those beginning treatment sooner (OR=0.28; CI, 0.11–0.70 vs OR=0.87; CI, 0.86–1.15, respectively). Moreover, the mortality benefits applied to men (OR=0.73; CI, 0.57–1.00) but not to women (OR=1.01; CI, 0.87–1.72). In general then, meta-analyses have confirmed that psychosocial–behavioral interventions in MI patients can improve cardiovascular risk factor profiles, decrease mortality, and reduce recurrent MI. Beneficial effects appear to be more likely if the intervention begins at least 2 months after the MI and are more likely in men than in women. Although meta-analyses are useful in providing a broad overview of outcomes in a research area, it is important to examine key RCTs in order to assess the quality of the data and to begin to understand differences and similarities in outcomes.
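The odds ratios quoted above are not all expressed in the same direction: Linden et al (1996) compare untreated with treated patients (OR above 1 means more deaths without treatment), whereas Clark et al (2005) compare treated with control patients (OR below 1 means fewer deaths with treatment). The following sketch shows how an OR and its confidence limits can be inverted so that both are read on the same scale. The class and function names are illustrative assumptions. Only the numbers are taken from the text.

```python
from typing import NamedTuple


class OddsRatio(NamedTuple):
    estimate: float
    lower: float  # lower 95% confidence limit
    upper: float  # upper 95% confidence limit


def invert(odds_ratio: OddsRatio) -> OddsRatio:
    """Swap the comparison groups: take reciprocals and exchange the limits."""
    return OddsRatio(
        estimate=1.0 / odds_ratio.estimate,
        lower=1.0 / odds_ratio.upper,
        upper=1.0 / odds_ratio.lower,
    )


def excludes_null(odds_ratio: OddsRatio) -> bool:
    """True when the confidence interval does not contain OR = 1."""
    return odds_ratio.lower > 1.0 or odds_ratio.upper < 1.0


# Mortality results as quoted in the text.
linden_1996_untreated_vs_treated = OddsRatio(1.70, 1.09, 2.64)
clark_2005_treated_vs_control = OddsRatio(0.85, 0.77, 0.94)

linden_1996_treated_vs_untreated = invert(linden_1996_untreated_vs_treated)
for label, result in [
    ("Linden et al (1996), inverted", linden_1996_treated_vs_untreated),
    ("Clark et al (2005)", clark_2005_treated_vs_control),
]:
    print(f"{label}: OR={result.estimate:.2f}; CI, {result.lower:.2f}-{result.upper:.2f}; "
          f"excludes 1: {excludes_null(result)}")
```

Inversion changes only the direction of the comparison, not the strength of the evidence, so an interval that excludes 1 before inversion still excludes it afterwards.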
Among the psychosocial–behavioral RCTs that have been conducted upon post-MI patients, there have been exceptionally few large-scale trials that meet the reporting criteria of the Consolidated Standards of Reporting Trials (CONSORT) statement (Moher et al, 2001). The few trials that have approximated these standards have yielded both positive and null
results. Because of the heterogeneity of the procedures employed, the exact reasons for the discrepancies in results have not been entirely obvious.
The Recurrent Coronary Prevention Project (RCPP) randomized 862 post-MI patients (90% men; 98% white) into either a control condition receiving group-based traditional risk factor counseling (diet, exercise, medication adherence) or an intervention condition receiving group-based risk factor counseling plus cognitive behavior therapy (CBT) to reduce type A behaviors (i.e., hostility, impatience, time urgency) and relaxation training to decrease behavioral arousal (Friedman et al, 1986). Patients were enrolled at least 6 months after their MI. The average control participant attended 25 (76% of total available) sessions and the average intervention participant attended 38 (61% of total available) sessions over 4.5 years. The rate of combined fatal and nonfatal recurrence was significantly lower in the intervention than in the control group. Participants in the intervention group also showed significant decreases in hostility, time urgency, impatience, and depressed mood as well as reliable gains in perceived self-efficacy (Mendes de Leon et al, 1991).
In a subsequent RCT, Jones and West (1996) randomized 2,328 post-MI patients into either an intervention condition receiving seven weekly sessions of psychological counseling and therapy, relaxation, and stress management (some in a group format) or a usual care condition. Other components of rehabilitation dealing with smoking, diet, weight control, or exercise were not included in the program. Patients were enrolled within 28 days after their MI. Data on the age, sex distribution, or race/ethnicity of participants are not described in the published article.
The investigators found no significant differences within or between groups in reported anxiety and depression between baseline and 6 months and no differences between conditions in clinical complications, clinical sequelae, or mortality after 1 year.
The Montreal Heart Attack Readjustment Trial (M-HART) was an RCT carried out in 1,376 post-MI patients assigned to an intervention or control condition (Frasure-Smith et al, 1997). Intervention participants were telephoned by a research assistant 1 week after discharge, then monthly for a year. They responded to the 20-item General Health Questionnaire (Goldberg, 1972), which assesses psychological distress from anxiety, depressed mood, and activity impairment. Participants who scored 5 or higher on the questionnaire or who were readmitted to the hospital were then contacted by a cardiology nurse who made a home visit and provided reassurance, education, practical advice, and, when necessary, referral to a health-care provider. Nurses were not given specific training for implementing the protocol beyond their cardiology nursing training. About 75% of patients in the intervention condition received on average five to six 1-h nursing visits. In general, the program had no overall impact upon either cardiac or all-cause mortality or on psychological outcomes (depressive symptoms, anxiety, anger, or perceived social support) between intervention and control groups. However, treated women did reveal marginally greater all-cause mortality than control women (OR=1.99; CI, 0.99–4.00), suggesting that the intervention may actually have been harmful to women. The OR for cardiac mortality was 1.96 (CI, 0.95–4.06).
Subsequently, the Enhancing Recovery in Coronary Heart Disease (ENRICHD) trial randomized 2,481 post-MI patients (44% women; 34% ethnic minority), selected because they were depressed and/or had low social support, into a CBT-based psychosocial–behavioral intervention or to usual medical care (Berkman et al, 2003). The intervention was initiated at a median of 17 days after MI for a median of 11 individual sessions throughout 6 months.
During this 6-month period 30% of participants also received group-based CBT and relaxation training and were placed on a selective serotonin reuptake inhibitor if they had severe depression or less than 50% reduction in Beck Depression Inventory scores after 5 weeks of intervention.
By 6 months after randomization ENRICHD modestly decreased depression and increased social support in the intervention compared with the control group. However, after an average follow-up of 29 months, there was no significant difference in event-free survival between the usual care and the psychosocial intervention conditions. Because ENRICHD was designed to enroll large numbers of women and minorities, it was possible to conduct a secondary analysis examining the outcome of sex by ethnicity subgroups (Schneiderman et al, 2004). This secondary analysis indicated that the intervention decreased the incidence of both cardiac death (OR=0.63; CI, 0.40–0.99) and nonfatal MI (OR=0.61; CI, 0.40–0.92) in white men but not in the other subgroups.
Most recently, the Stockholm Women’s Intervention Trial for Coronary Heart Disease (SWITCHD) randomized 237 patients with severe CHD incidents into a group-based psychosocial–behavioral intervention program or usual medical care (Orth-Gomér et al, 2009). Initiated 4 months after hospitalization, intervention groups of 4–8 women met for a total of 20 sessions over the course of an entire year. The intervention program, in which 75% of the women attended 15–20 sessions, included education about risk factors, self-care, and adherence to medical advice, as well as skills training in relaxation and coping with stress exposure from family and work. The nurses who delivered the intervention were pre-trained and certified in the behavior modification techniques used in the trial. From randomization until the end of follow-up (mean duration 7.1 years), the intervention yielded an almost threefold protective effect on mortality rate (OR=0.33; CI, 0.1–0.74).
The meta-analyses that have been conducted on psychosocial–behavioral interventions in patients with severe CHD-related events indicate that such treatments can reduce the incidence of nonfatal and/or fatal events.
Examination of major studies carried out on such patients reveals that the studies reporting positive results were initiated several months after the index
event, used a group-based format, conducted the intervention for a relatively long temporal duration, and followed the patients for a number of years (Friedman et al, 1986; Orth-Gomér et al, 2009). These trials addressed a broad range of modifiable traditional and psychosocial risk factors, medication adherence, and lifestyle adjustment as well as provided training in behavior change methods by group leaders who themselves were certified in such procedures. Some of the conclusions drawn from these trials are based on post hoc analyses and reviews of trial data, so prospective replication studies are needed.
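The trial results in this section are summarized as odds ratios with 95% confidence intervals. As a reminder of where such figures come from, the sketch below computes an OR and a Wald-type interval from a two-by-two table of deaths by treatment arm. The counts are hypothetical and the function name is an assumption. The trials discussed above typically used adjusted or time-to-event analyses, so this is the simplest textbook version rather than any investigator's actual method.

```python
import math


def odds_ratio_with_ci(events_treated: int, total_treated: int,
                       events_control: int, total_control: int,
                       z: float = 1.96) -> tuple[float, float, float]:
    """Odds ratio (treated vs control) with a Wald confidence interval.

    The interval is built on the log-odds scale and then exponentiated;
    z = 1.96 corresponds to a 95% interval.
    """
    a = events_treated
    b = total_treated - events_treated
    c = events_control
    d = total_control - events_control
    if min(a, b, c, d) <= 0:
        raise ValueError("every cell of the 2x2 table must be positive")
    estimate = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    log_or = math.log(estimate)
    return estimate, math.exp(log_or - z * se_log), math.exp(log_or + z * se_log)


# Hypothetical trial: 30 deaths among 200 treated, 45 among 200 controls.
estimate, lower, upper = odds_ratio_with_ci(30, 200, 45, 200)
print(f"OR={estimate:.2f}; CI, {lower:.2f}-{upper:.2f}")
```

An interval whose upper limit lies at or just above 1, as in this hypothetical case, would usually be described as a marginal or nonsignificant effect, which is how results such as the M-HART findings for women are characterized in the text.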
3 HIV/AIDS
3.1 Disease Processes in HIV/AIDS
Human immunodeficiency virus infection and acquired immune deficiency syndrome (HIV/AIDS) are caused by the HIV retrovirus, which is transmitted most commonly through unprotected sexual intercourse or intravenous drug use. HIV selectively targets a subset of lymphocytes expressing a surface T4 glycoprotein, most commonly found in a subpopulation of lymphocytes referred to as CD4+ T helper cells. These CD4+ T cells serve as the host cells for the transcription of HIV RNA and protein synthesis, which begins the process of creating new HIV virions that target other host cells. The infected person undergoes a progressive loss of CD4+ T cells while HIV virus concentration in the circulation (i.e., viral load) is increasing (Pantaleo et al, 1993). The rapid replication and mutation rate of HIV thwarts the effectiveness of immune mechanisms in controlling the infection. Later, individuals may develop full-blown AIDS, defined as a decline in the number of CD4+ cells to critically low levels (