Multivariable Analysis: A Practical Guide for Clinicians and Public Health Researchers [3 ed.] 0521760984, 9780521760980, 0521141079, 9780521141079

English Pages 251 Year 2011

Table of contents:
Half-title
Title
Copyright
Dedication
Contents
Preface
1.1 Why should I do multivariable analysis?
1.2 What are confounders and how does multivariable analysis help me to deal with them?
1.3 What are suppressers and how does multivariable analysis help me to deal with them?
1.4 What are interactions and how does multivariable analysis help me to deal with them?
2.2 How is multivariable analysis used in observational studies of etiology?
2.3 How is multivariable analysis used in intervention studies (randomized and nonrandomized)?
2.4 How is multivariable analysis used in studies of diagnosis?
2.5 How is multivariable analysis used in studies of prognosis?
3.1 How does the nature of the outcome variable influence the choice of which type of multivariable analysis to do?
3.2 What type of multivariable analysis should I use with an interval outcome?
3.2.A Multiple linear regression
3.2.B Analysis of variance (ANOVA)
3.2.C Underlying assumptions of multiple linear regression and ANOVA
3.2.D Choosing between multiple linear regression and ANOVA
3.3 What type of multivariable analysis should I use with a dichotomous outcome?
3.4 What type of multivariable analysis should I use with an ordinal variable?
3.5 What type of multivariable analysis should I use with a nominal outcome?
3.6 What type of multivariable analysis should I use with a time-to-outcome variable?
3.7.A Loss to follow-up
3.7.B Alternative outcome
3.7.C Withdrawal
3.7.D Varying time of enrollment
3.8 How can I test the validity of the censoring assumption for my data?
3.9 What is the proportionality assumption of proportional hazards analysis?
3.10 What type of multivariable analysis should I use with counts?
3.10.A Poisson regression
3.10.B Negative binomial models
3.11 What type of multivariable analysis should I use with an incidence rate?
3.12.A. Dichotomizing an interval variable
3.12.B. Changing time-to-outcome to a dichotomous outcome (yes/no)
3.12.C. Dichotomizing ordinal variables or treating them as nominal variables
3.12.D. Converting a count to time to outcome or to a dichotomous outcome
3.12.E. General advice for changing the coding of outcome variables
4.2 How do I incorporate nominal independent variables into a multivariable analysis?
4.3 How do I incorporate interval-independent variables into a multivariable model?
4.3.A Mathematical transformations
4.3.B Splines
4.3.C Multiple dichotomous variables
4.5 How do I incorporate ordinal independent variables into a multivariable model?
5.1 Does it matter if my independent variables are related to each other?
5.2 How do I assess whether my variables are multicollinear?
5.3 What should I do with multicollinear variables?
6.2 How do I decide what confounders to include in my model?
6.3 What independent variables should I exclude from my multivariable model?
6.4 How many subjects do I need to do multivariable analysis?
6.5 What if I have too many independent variables given my sample size?
6.5.A Exclude variables that are not empirically operating as confounders
6.5.B Choose one variable to represent two or more related variables
6.5.C.2 Scores
6.5.C.3 Multi-item scales
6.5.C.4 Factor analysis
6.6 What should I do about missing data on my independent variables?
6.7 What should I do about missing data on my outcome variable?
7.1 What numbers should I assign for dichotomous or ordinal variables in my analysis?
7.2 Does it matter what I choose as my reference category for multiple dichotomous (“dummied”) variables?
7.3 How do I enter interaction terms into my analysis?
7.4 How do I enter time into my proportional hazards or other survival analysis?
7.5 What about subjects who experience their outcome on their start date?
7.6 What about subjects who have a survival time shorter than physiologically possible?
7.7 How do I incorporate time into my Poisson analysis?
7.8 What are variable selection techniques?
7.9 My model won’t converge. What should I do?
8.2.A Multiple linear regression
8.2.B Multiple logistic regression
8.2.D Multinomial logistic regression
8.2.F Poisson regression and negative binomial regression
8.3.A Coefficients in multiple linear regression
8.3.B Coefficient in multiple (binary) logistic regression
8.3.C Coefficients in proportional odds regression
8.3.D Coefficients in multinomial logistic regression
8.3.F Coefficients in Poisson regression and negative binomial regression
8.5 Do I have to adjust my multivariable regression coefficients for multiple comparisons?
9.2 What are residuals? How are they used to assess the fit of models?
9.3 How do I test the normal distribution and equal variance assumptions of a multiple linear regression model?
9.4 How do I test the linearity assumption of a multivariable model?
9.5 What are outliers and how do I detect them in a multivariable model?
9.6 What should I do when I detect outliers?
9.7 What is the additive assumption and how do I assess whether my multiple independent variables fit this assumption?
9.9 How do I test the proportionality assumption?
9.9.A log-minus-log survival plot
9.9.D. Time-dependent covariates
9.10 What if the proportionality assumption does not hold for my data?
10.1 What are propensity scores? Why are they used?
11.1 What circumstances lead to correlated observations?
11.2 Should I avoid study designs that lead to correlated observations?
11.3 How do I analyze correlated observations?
11.3.A Transform repeated observations into a single measure
11.3.B Generalized estimating equations
11.3.C Mixed-effects model
11.3.D Repeated measures analysis of variance/repeated measures analysis of covariance
11.3.E Conditional logistic regression
11.3.F Anderson–Gill formation of the proportional hazards model
11.3.G Marginal approach for proportional hazards analysis
11.4 How do I calculate the needed sample size for studies with correlated observations?
12.1 How can I validate my models?
13.2 What are the advantages and disadvantages of time-dependent covariates?
13.3 What are classification and regression trees (CART) and should I use them?
13.5 How do I choose which software package to use?
14.1 How much information about how I constructed my multivariable models should I include in the Methods section?
14.2 Do I need to cite a statistical reference for my choice of method of multivariable analysis?
14.3 Which parts of my multivariable analysis should I report in the Results section?
15 Summary: Steps for constructing a multivariable model
Index


Author / Uploaded
Mitchell H. Katz


Multivariable Analysis A Practical Guide for Clinicians and Public Health Researchers

Why do you need this book?

Multivariable analysis is confusing! Whether you are performing your first research project or attempting to interpret the output from a multivariable model, you have undoubtedly found this to be true. Basic biostatistics books are of little or no help to you, since their coverage often stops short of multivariable analysis. However, existing multivariable analysis books are too dense with mathematical formulae and derivations and are not designed to answer your most basic questions. Is there a book that steps aside from the math and simply explains how to understand, perform, and interpret multivariable analyses? Yes. Multivariable Analysis: A Practical Guide for Clinicians and Public Health Researchers, as this new edition is titled, is precisely the reference that will lead your way. In fact, Dr. Mitchell Katz has asked and answered all of your questions for you! Why should I do multivariable analysis? How do I choose which type of multivariable analysis to use? How many subjects do I need to do multivariable analysis? What if I have repeated observations of the same persons? Answers and detailed explanations to these questions and more are found in this book. Also, it is loaded with useful tips, summary charts, figures, and references. If you are a medical student, resident, or clinician, Multivariable Analysis: A Practical Guide for Clinicians and Public Health Researchers will prove an indispensable guide through the confusing terrain of statistical analysis.

This third edition has been fully revised to build on the enormous success of its predecessors. New features include new sections on Poisson and negative binomial regression, proportional odds analysis, and multinomial logistic regression, and an expanded section on interpretation of residuals.

Praise for first edition

“This is the first nonmathematical book on multivariable analysis addressed to clinicians. Its range, organization, brevity, and clarity make it useful as a reference, a text, and a guide for self-study. This book is ‘a practical guide for clinicians.’” Leonard E. Braitman, Ph.D., Annals of Internal Medicine

Mitchell H. Katz is Clinical Professor of Medicine, Epidemiology and Biostatistics at the University of California, San Francisco; and Director of the Los Angeles Department of Health Services, Los Angeles, USA.

Multivariable Analysis A Practical Guide for Clinicians and Public Health Researchers Third Edition

Mitchell H. Katz Department of Medicine, Epidemiology and Biostatistics, University of California, USA

Cambridge University Press
Cambridge, New York, Melbourne, Madrid, Cape Town, Singapore, São Paulo, Delhi, Tokyo, Mexico City

Cambridge University Press
The Edinburgh Building, Cambridge CB2 8RU, UK

Published in the United States of America by Cambridge University Press, New York

www.cambridge.org
Information on this title: www.cambridge.org/9780521760980

© M. H. Katz, 1999, 2006, 2011

This publication is in copyright. Subject to statutory exception and to the provisions of relevant collective licensing agreements, no reproduction of any part may take place without the written permission of Cambridge University Press.

First published 1999
Second edition published 2006
Third edition published 2011

Printed in the United Kingdom at the University Press, Cambridge

A catalog record for this publication is available from the British Library

Library of Congress Cataloging in Publication data
Katz, Mitchell H., 1959– author.
Multivariable analysis : a practical guide for clinicians and public health researchers / Mitchell H. Katz, Department of Medicine, Epidemiology, and Biostatistics, University of California, USA. – 3rd Edition.
p. ; cm.
Includes bibliographical references and index.
ISBN 978-0-521-76098-0 (hardback) – ISBN 978-0-521-14107-9 (paperback)
1. Medicine–Research–Statistical methods. 2. Multivariate analysis. 3. Biometry. 4. Medical statistics. I. Title.
[DNLM: 1. Multivariate Analysis. 2. Biometry–methods. WA 950]
R853.S7K38 2011
610.72–dc22 2010052187

ISBN 978-0-521-76098-0 Hardback
ISBN 978-0-521-14107-9 Paperback

Cambridge University Press has no responsibility for the persistence or accuracy of URLs for external or third-party internet websites referred to in this publication, and does not guarantee that any content on such websites is, or will remain, accurate or appropriate.

Every effort has been made in preparing this book to provide accurate and up-to-date information which is in accord with accepted standards and practice at the time of publication. Although case histories are drawn from actual cases, every effort has been made to disguise the identities of the individuals involved. Nevertheless, the authors, editors and publishers can make no warranties that the information contained herein is totally free from error, not least because clinical standards are constantly changing through research and regulation. The authors, editors and publishers therefore disclaim all liability for direct or consequential damages resulting from the use of material contained in this book. Readers are strongly advised to pay careful attention to information provided by the manufacturer of any drugs or equipment that they plan to use.

To my parents, for their unwavering support

Contents

Preface

1 Introduction
  1.1 Why should I do multivariable analysis?
  1.2 What are confounders and how does multivariable analysis help me to deal with them?
  1.3 What are suppressers and how does multivariable analysis help me to deal with them?
  1.4 What are interactions and how does multivariable analysis help me to deal with them?

2 Common uses of multivariable models
  2.1 What are the most common uses of multivariable models in clinical research?
  2.2 How is multivariable analysis used in observational studies of etiology?
  2.3 How is multivariable analysis used in intervention studies (randomized and nonrandomized)?
  2.4 How is multivariable analysis used in studies of diagnosis?
  2.5 How is multivariable analysis used in studies of prognosis?

3 Outcome variables in multivariable analysis
  3.1 How does the nature of the outcome variable influence the choice of which type of multivariable analysis to do?
  3.2 What type of multivariable analysis should I use with an interval outcome?
  3.3 What type of multivariable analysis should I use with a dichotomous outcome?
  3.4 What type of multivariable analysis should I use with an ordinal variable?
  3.5 What type of multivariable analysis should I use with a nominal outcome?
  3.6 What type of multivariable analysis should I use with a time-to-outcome variable?
  3.7 How likely is it that the censoring assumption is valid in my study?
  3.8 How can I test the validity of the censoring assumption for my data?
  3.9 What is the proportionality assumption of proportional hazards analysis?
  3.10 What type of multivariable analysis should I use with counts?
  3.11 What type of multivariable analysis should I use with an incidence rate?
  3.12 May I change the coding of my outcome variable to use a different type of multivariable analysis?

4 Independent variables in multivariable analysis
  4.1 How do I incorporate independent variables into a multivariable analysis?
  4.2 How do I incorporate nominal independent variables into a multivariable analysis?
  4.3 How do I incorporate interval-independent variables into a multivariable model?
  4.4 Assuming that my interval-independent variable fits a linear assumption, is there any reason to group it into interval categories or create multiple dichotomous variables?
  4.5 How do I incorporate ordinal independent variables into a multivariable model?

5 Relationship of independent variables to one another
  5.1 Does it matter if my independent variables are related to each other?
  5.2 How do I assess whether my variables are multicollinear?
  5.3 What should I do with multicollinear variables?

6 Setting up a multivariable analysis
  6.1 What independent variables should I include in my multivariable model?
  6.2 How do I decide what confounders to include in my model?
  6.3 What independent variables should I exclude from my multivariable model?
  6.4 How many subjects do I need to do multivariable analysis?
  6.5 What if I have too many independent variables given my sample size?
  6.6 What should I do about missing data on my independent variables?
  6.7 What should I do about missing data on my outcome variable?

7 Performing the analysis
  7.1 What numbers should I assign for dichotomous or ordinal variables in my analysis?
  7.2 Does it matter what I choose as my reference category for multiple dichotomous (“dummied”) variables?
  7.3 How do I enter interaction terms into my analysis?
  7.4 How do I enter time into my proportional hazards or other survival analysis?
  7.5 What about subjects who experience their outcome on their start date?
  7.6 What about subjects who have a survival time shorter than physiologically possible?
  7.7 How do I incorporate time into my Poisson analysis?
  7.8 What are variable selection techniques?
  7.9 My model won’t converge. What should I do?

8 Interpreting the results
  8.1 What information will my multivariable analysis produce?
  8.2 How do I assess how well my model fits the data?
  8.3 What do the coefficients tell me about the relationship between each variable and the outcome?
  8.4 How do I interpret the results of interaction terms?
  8.5 Do I have to adjust my multivariable regression coefficients for multiple comparisons?

9 Delving deeper: Checking the underlying assumptions of the analysis
  9.1 How do I know if the assumptions of my multivariable model are met?
  9.2 What are residuals? How are they used to assess the fit of models?
  9.3 How do I test the normal distribution and equal variance assumptions of a multiple linear regression model?
  9.4 How do I test the linearity assumption of a multivariable model?
  9.5 What are outliers and how do I detect them in a multivariable model?
  9.6 What should I do when I detect outliers?
  9.7 What is the additive assumption and how do I assess whether my multiple independent variables fit this assumption?
  9.8 How do I test the proportional odds assumption?
  9.9 How do I test the proportionality assumption?
  9.10 What if the proportionality assumption does not hold for my data?

10 Propensity scores
  10.1 What are propensity scores? Why are they used?

11 Correlated observations
  11.1 What circumstances lead to correlated observations?
  11.2 Should I avoid study designs that lead to correlated observations?
  11.3 How do I analyze correlated observations?
  11.4 How do I calculate the needed sample size for studies with correlated observations?

12 Validation of models
  12.1 How can I validate my models?

13 Special topics
  13.1 What if the independent variable changes value during the course of the study?
  13.2 What are the advantages and disadvantages of time-dependent covariates?
  13.3 What are classification and regression trees (CART) and should I use them?
  13.4 How can I get best use of my biostatistician?
  13.5 How do I choose which software package to use?

14 Publishing your study
  14.1 How much information about how I constructed my multivariable models should I include in the Methods section?
  14.2 Do I need to cite a statistical reference for my choice of method of multivariable analysis?
  14.3 Which parts of my multivariable analysis should I report in the Results section?

15 Summary: Steps for constructing a multivariable model

Index

Preface

There has been astounding growth in the use of multivariable analysis in clinical research. When the first edition of this book was published in 1999, logistic regression and proportional hazards models were cutting-edge techniques. Now for many researchers, these are old, staid models and the new edge is mixed-effects models, generalized estimating equations, Poisson regression, and propensity score analysis. The use of these more sophisticated models is fueled by the development of user-friendly software for constructing multivariable models, increased availability of electronic databases (medical records, disease and procedure registries) that provide longitudinal data on large populations, and increased funding for and interest in clinical effectiveness studies – studies comparing different treatments in use – as a method of improving quality and reducing healthcare costs.

What hasn’t changed in the past 11 years is the need for an easy-to-follow guide for nonstatisticians on how to perform and interpret these models. Although the available software (e.g., SPSS, SAS, S-plus, R) doesn’t require programming experience or mathematical aptitude to conduct the analyses, if the analysis is not set up correctly, the answer is sure to be wrong! Even when the analysis is performed correctly, researchers may not draw the correct conclusions from the output. To prevent these problems, throughout the book I have focused on how to set up and interpret multivariable analysis. I use examples from the medical and public health literature because illustrations of how to correctly analyze data and present the results will help you analyze and present your data correctly. Modeling your work based on successful published studies is one of the best and most efficient strategies for correctly analyzing data.

The biggest changes in this edition are that I have written new sections on Poisson and negative binomial regression, proportional odds analysis, and multinomial logistic regression because these models are increasingly in use. I have improved the section on mixed-effects models and generalized estimating equations, and also expanded the section on checking the underlying assumptions of multivariable models (Chapter 9) using residuals and other techniques.

While taking on new and more complicated material, I have maintained the basic organization of the book. Besides retaining the question-and-answer approach, the order of the book mirrors the process of doing multivariable analysis: deciding whether you need to do multivariable analysis (Chapters 1 and 2), choosing the correct model (Chapter 3), preparing your independent variables (Chapters 4 and 5), setting up the model (Chapter 6), performing the analysis (Chapter 7), interpreting the basic output (Chapter 8), delving deeper into the underlying assumptions of the model (Chapter 9), validating your model (Chapter 12) and publishing your study (Chapter 14).

One of the reasons I prefer this approach to the more traditional approach (i.e., having a separate chapter on each type of multivariable model) is that it illustrates the similarities and differences of the different approaches. In my experience, when the results are strong, different (but reasonable) approaches lead to similar answers; conversely, when the results are very different with different techniques, be suspicious. Also, I have found that the most efficient way to end an argument over what the best way is to analyze a data set is to analyze it multiple ways and see whether the results differ. If there are few differences then you have strengthened your results. When there are differences, you have probably learned something important about the nature of your data. Also, by structuring the book to parallel the research process, it allows readers to join the book at whatever stage they are at in the research process.

This book assumes that you are familiar with basic biostatistics. If not, I recommend S. Glantz’s Primer of Biostatistics (sixth edition, McGraw-Hill, 2005). I have also written a basic statistics book using a question-and-answer approach similar to that used in this book called Study Design and Statistical Analysis: A Practical Guide for Clinicians (Cambridge University Press, 2006). Some reviewers have suggested that the two books be combined, and while I see the merit in that, I also see a much fatter text that might be more expensive and off-putting to clinical researchers. Please forgive me therefore if I cite that book or my other book on performing interventions (Evaluating Clinical and Public Health Interventions, Cambridge University Press, 2010). It is not an exercise of ego, but rather an attempt to keep each book inexpensive and short.

One of the challenges in writing a book for clinical researchers is deciding how much detail to include. One could easily have (and many have) written books larger than this about just one of the procedures described. To keep the presentations short and the material accessible, I direct readers who wish to know more about a particular procedure to more detailed sources in the footnotes. Since statistical textbooks are expensive, and many journal articles are not easy to find, I have particularly emphasized web resources that I have found useful.

Twenty years of students in the University of California, San Francisco, Clinical Research Program have contributed to this book through their insightful questions and observations. Serving as the Deputy Editor for the Archives of Internal Medicine during the past two years has definitely sharpened my eye as to how best to conduct multivariable research. For this opportunity I am grateful to the Editor, Rita Redberg, M.D., our two biostatistical editors who have taught me much, John Neuhaus, Ph.D. and David Glidden, Ph.D., and the other editors, Patrick O’Malley, M.D. and Kirsten Johansen, M.D., who have shared their critical observations with me on hundreds of articles. I greatly appreciate the support of my editor Richard Marley and the staff at Cambridge University Press for encouraging me to do this third edition.

The best part of writing and updating this book is the number of researchers who have emailed me with their comments, compliments, and questions. Writing textbooks is a lonely business and I wouldn’t do it unless I had evidence that the books were actually helping people to conduct better research. If you have questions or suggestions for future editions, email me at [email protected]

1

Introduction

1.1 Why should I do multivariable analysis?

DEFINITION: Multivariable analysis is a tool for determining the relative contributions of different causes to a single event.

We live in a multivariable world. Most events, whether medical, political, social, or personal, have multiple causes. And these causes are related to one another. Multivariable analysis¹ is a statistical tool for determining the relative contributions of different causes to a single event or outcome.

Clinical researchers, in particular, need multivariable analysis because most diseases have multiple causes, and prognosis is usually determined by a large number of factors. Even for those infectious diseases that are known to be caused by a single pathogen, a number of factors affect whether an exposed individual becomes ill, including the characteristics of the pathogen (e.g., virulence of strain), the route of exposure (e.g., respiratory route), the intensity of exposure (e.g., size of inoculum), and the host response (e.g., immunologic defense).

Multivariable analysis allows us to sort out the multifaceted nature of risk factors and their relative contribution to outcome. For example, observational epidemiology has taught us that there are a number of risk factors associated with premature mortality, notably smoking, a sedentary lifestyle, obesity, elevated cholesterol, and hypertension. Note that I did not say that these factors cause premature mortality. Statistics alone cannot prove that a relationship between a risk factor and an outcome is causal.² Causality is established on the basis of biological plausibility and rigorous study designs, such as randomized controlled trials, which eliminate sources of potential bias.

¹ The terms “multivariate analysis” and “multivariable analysis” are often used interchangeably. In the strict sense, multivariate analysis refers to simultaneously predicting multiple outcomes. Since this book deals with techniques that use multiple variables to predict a single outcome, I prefer the more general term multivariable analysis.

² Throughout the text I use the terms “associated with” and “related to” interchangeably. Similarly, I use the terms “risk factor,” “exposure,” “predictor,” and “independent variable,” and the terms “outcome” and “dependent variable,” interchangeably. Although some of these terms, such as “risk factor,” “predictor,” and “outcome,” imply causality, remember that causality can never be proven with statistical analysis. The best way to establish causality is through rigorous study design (e.g., randomization to eliminate confounding, longitudinal observations to minimize the chance that the “outcome” caused the “risk factor”).

Identification of risk factors of premature mortality through observational studies has been particularly important because you cannot randomize people to many of the conditions that cause premature mortality, such as smoking, sedentary lifestyle, or obesity. And yet these conditions tend to occur together; that is, people who smoke tend to exercise less and be more likely to be obese. How does multivariable analysis separate the independent contribution of each of these factors?

Let’s consider the case of exercise. Numerous studies have shown that persons who exercise live longer than persons with sedentary lifestyles. But if the only reason that persons who exercise live longer is that they are less likely to smoke and more likely to eat low-fat meals leading to lower cholesterol, then initiating an exercise routine would not change a person’s life expectancy.

The Aerobics Center Longitudinal Study tackled this important question.³ They evaluated the relationship between exercise and mortality in 25 341 men and 7080 women. All participants had a baseline examination between 1970 and 1989. The examination included a physical examination, laboratory tests, and a treadmill evaluation to assess physical fitness. Participants were followed for an average of 8.4 years for the men and 7.5 years for the women. Table 1.1 compares the characteristics of survivors with those of persons who died during the follow-up. You can see that there are a number of significant differences between survivors and decedents among men and women. Specifically, survivors were younger, had lower blood pressure and lower cholesterol, were less likely to smoke, and were more physically fit (based on the length of time they stayed on the treadmill and their level of effort).
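To make the bivariate comparison concrete, the group sizes and low-fitness percentages reported in Table 1.1 can be turned into an approximate crude (unadjusted) risk ratio for low fitness. The sketch below is only an illustration and is not part of the original study's analysis: the function name is hypothetical, the counts are rebuilt from rounded percentages, and length of follow-up is ignored.

```python
# Approximate crude (unadjusted) risk ratios for low fitness, rebuilt from the
# rounded summary figures in Table 1.1. Illustrative only: follow-up time is
# ignored and no other risk factor is taken into account.

def crude_risk_ratio(n_survivors, n_decedents, pct_low_survivors, pct_low_decedents):
    """Risk of death in the low-fitness group divided by the risk of death in
    the moderate/high-fitness group, using counts rebuilt from percentages."""
    low_alive = n_survivors * pct_low_survivors / 100
    low_dead = n_decedents * pct_low_decedents / 100
    other_alive = n_survivors - low_alive
    other_dead = n_decedents - low_dead
    risk_low = low_dead / (low_alive + low_dead)
    risk_other = other_dead / (other_alive + other_dead)
    return risk_low / risk_other

# Group sizes and percentage with low fitness, as reported in Table 1.1.
rr_men = crude_risk_ratio(24740, 601, 20.1, 41.6)
rr_women = crude_risk_ratio(6991, 89, 18.8, 44.9)

print(f"Crude risk ratio for low fitness, men:   {rr_men:.2f}")
print(f"Crude risk ratio for low fitness, women: {rr_women:.2f}")
```

These crude ratios come out at roughly 2.7 for men and 3.4 for women. Because they take no account of age, smoking, blood pressure, or any other difference between the groups, they describe an association only and say nothing about the independent contribution of fitness.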
Table 1.1 Baseline characteristics of survivors and decedents, Aerobics Center Longitudinal Study.

| Characteristics | Men: survivors (n = 24 740) | Men: decedents (n = 601) | Women: survivors (n = 6991) | Women: decedents (n = 89) |
|---|---|---|---|---|
| Age, y (SD) | 42.7 (9.7) | 52.1 (11.4) | 42.6 (10.9) | 53.3 (11.2) |
| Body mass index, kg/m² (SD) | 26.0 (3.6) | 26.3 (3.5) | 22.6 (3.9) | 23.7 (4.5) |
| Systolic blood pressure, mm Hg (SD) | 121.1 (13.5) | 130.4 (19.1) | 112.6 (14.8) | 122.6 (17.3) |
| Total cholesterol, mg/dL (SD) | 213.1 (40.6) | 228.9 (45.4) | 202.7 (40.5) | 228.2 (40.8) |
| Fasting glucose, mg/dL (SD) | 100.4 (16.3) | 108.1 (32.0) | 94.4 (14.5) | 99.9 (25.0) |
| Fitness, %: low | 20.1 | 41.6 | 18.8 | 44.9 |
| Fitness, %: moderate | 42.0 | 39.1 | 40.6 | 33.7 |
| Fitness, %: high | 37.9 | 19.3 | 40.6 | 21.3 |
| Current or recent smoker, % | 26.3 | 36.9 | 18.5 | 30.3 |
| Family history of coronary heart disease, % | 25.4 | 33.8 | 25.2 | 27.0 |
| Abnormal electrocardiogram, % | 6.9 | 26.3 | 4.8 | 18.0 |
| Chronic illness, % | 18.4 | 40.3 | 13.4 | 20.2 |

Adapted with permission from Blair, S. N., et al. “Influences of cardiorespiratory fitness and other precursors on cardiovascular disease and all-cause mortality in men and women.” JAMA 276 (1996): 205–10. Copyright 1996, American Medical Association. Additional data provided by authors.

Although the results are interesting, Table 1.1 does not answer our basic question: Does being physically fit independently increase longevity? It doesn’t answer the question because whereas the high-fitness group was less likely to die during the study period, those who were physically fit may just have been younger, been less likely to smoke, or had lower blood pressure.

To determine whether exercise is independently associated with mortality, the authors performed proportional hazards analysis, a type of multivariable analysis. The results are shown in Table 1.2. If you compare the number of deaths per 10 000 person-years in men, you can see that there were more deaths in the low-fitness group (38.1) than in the moderate/high fitness group (25.0). This difference is reflected in the elevated relative risk for lower fitness (38.1/25.0 = 1.52). These results are adjusted for all of the other variables listed in the table. This means that low fitness is associated with higher mortality, independent of the effects of other known risk factors for mortality, such as smoking, elevated blood pressure, cholesterol, and family history. A similar pattern is seen for women.

³ Blair, S. N., Kampert, J. B., Kohl, H. W., et al. “Influences of cardiorespiratory fitness and other precursors on cardiovascular disease and all-cause mortality in men and women.” JAMA 276 (1996): 205–10.

DEFINITION Stratified analysis assesses the effect of a risk factor on outcome while holding another variable constant.

Was there any way to answer this question without multivariable analysis? One could have performed stratified analysis. Stratified analysis assesses the effect of a risk factor on outcome while holding another variable constant. So, for example, we could compare physically fit to unfit persons separately among smokers and nonsmokers. This would allow us to calculate a relative risk for the impact of fitness on mortality, independent of smoking. This analysis is shown in Table 1.3. Unlike the multivariable analysis in Table 1.2, the analyses in Table 1.3 are bivariate.⁴ We see that the mortality rate is greater among those at low fitness

⁴ Some researchers use the term “univariate” to describe the association between two variables. I think it is more informative to restrict the term univariate to analyses of a single variable (e.g., mean, median), while using the term “bivariate” to refer to the association between two variables.
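The idea of stratified analysis can be sketched in a few lines of code. The counts below are invented for illustration and are not taken from the Aerobics Center Longitudinal Study. The sketch computes the relative risk of death for low fitness separately among smokers and nonsmokers, and compares it with the crude relative risk that ignores smoking. The Mantel-Haenszel pooled estimate at the end is one common way of combining strata; it is included here as an assumption and is not a method described in the text.

```python
# Hypothetical cohort counts (NOT from the Aerobics Center Longitudinal Study).
# Each stratum maps fitness level -> (deaths, persons).
strata = {
    "smokers": {"low": (40, 1000), "fit": (50, 2000)},
    "nonsmokers": {"low": (30, 1500), "fit": (75, 6000)},
}

def relative_risk(deaths_low, n_low, deaths_fit, n_fit):
    """Risk of death with low fitness divided by the risk in fit persons."""
    return (deaths_low / n_low) / (deaths_fit / n_fit)

# Stratified analysis: one relative risk per smoking stratum.
stratum_rr = {
    name: relative_risk(*groups["low"], *groups["fit"])
    for name, groups in strata.items()
}

# Crude analysis: pool everyone and ignore smoking.
crude_rr = relative_risk(
    sum(g["low"][0] for g in strata.values()),
    sum(g["low"][1] for g in strata.values()),
    sum(g["fit"][0] for g in strata.values()),
    sum(g["fit"][1] for g in strata.values()),
)

def mantel_haenszel_rr(strata):
    """Mantel-Haenszel pooled risk ratio across strata (an assumed choice)."""
    numerator = 0.0
    denominator = 0.0
    for groups in strata.values():
        deaths_low, n_low = groups["low"]
        deaths_fit, n_fit = groups["fit"]
        total = n_low + n_fit
        numerator += deaths_low * n_fit / total
        denominator += deaths_fit * n_low / total
    return numerator / denominator

mh_rr = mantel_haenszel_rr(strata)

for name, rr in stratum_rr.items():
    print(f"Relative risk among {name}: {rr:.2f}")
print(f"Crude relative risk (smoking ignored): {crude_rr:.2f}")
print(f"Mantel-Haenszel pooled relative risk: {mh_rr:.2f}")
```

In these made-up data, low fitness is more common among smokers, so the crude relative risk is larger than the relative risk within either stratum. Holding smoking constant removes that part of the association, which is exactly what stratification is meant to do.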


Table 1.2 Multivariable analysis of risk factors for all-cause mortality, Aerobics Center Longitudinal Study.

| Risk factor | Men: deaths per 10 000 person-years | Men: adjusted relative risk (95% CI) | Women: deaths per 10 000 person-years | Women: adjusted relative risk (95% CI) |
|---|---|---|---|---|
| Fitness: low | 38.1 | 1.52 (1.28–1.82) | 27.8 | 2.10 (1.36–3.26) |
| Fitness: moderate/high | 25.0 | 1.0 (ref.) | 13.2 | 1.0 (ref.) |
| Smoking status: current or recent smoker | 39.4 | 1.65 (1.39–1.97) | 27.8 | 1.99 (1.25–3.17) |
| Smoking status: past or never smoked | 23.9 | 1.0 (ref.) | 14.0 | 1.0 (ref.) |

Systolic blood pressure ≥140 mm Hg