126 74 14MB
English Pages 440 [436] Year 2023
Josafhat Salinas Ruíz Osval Antonio Montesinos López Gabriela Hernández Ramírez Jose Crossa Hiriart
Generalized Linear Mixed Models with Applications in Agriculture and Biology
Generalized Linear Mixed Models with Applications in Agriculture and Biology
Josafhat Salinas Ruíz • Osval Antonio Montesinos López • Gabriela Hernández Ramírez Jose Crossa Hiriart
Generalized Linear Mixed Models with Applications in Agriculture and Biology
Josafhat Salinas Ruíz Innovación Agroalimentaria Sustentable Colegio de Postgraduados Córdoba, Mexico
Osval Antonio Montesinos López Facultad de Telemática University of Colima Colima, Mexico
Gabriela Hernández Ramírez Ciencia de los Alimentos y Biotecnología Tecnológico Nacional de México/ITS de Tierra Blanca, Mexico
Jose Crossa Hiriart International Maize and Wheat Improvement Center (CIMMYT) Texcoco, Mexico
ISBN 978-3-031-32799-5 ISBN 978-3-031-32800-8 https://doi.org/10.1007/978-3-031-32800-8
(eBook)
The translation was done with the help of artificial intelligence (machine translation by the service DeepL.com). A subsequent human revision was done primarily in terms of content. © The Editor(s) (if applicable) and The Author(s) 2023. This book is an open access publication. Open Access This book is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made. The images or other third party material in this book are included in the book's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the book's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. This Springer imprint is published by the registered company Springer Nature Switzerland AG The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
Foreword
This book is another example of CIMMYT’s commitment to scientific advancement, and comprehensive and inclusive knowledge sharing and dissemination. By using alternative statistical models and methods to describe and analyze data sets from different disciplines, such as biology and agriculture, the book facilitates the adoption and effective use of these tools by publicly funded researchers and practitioners of national agricultural research extension systems (NARES) and universities across the Global South. The authors aim to offer different and new models, methods, and techniques to agricultural scientists who often lack the resources to adopt these tools or face practical constraints when analyzing different types of data. This work would not be possible with the continuous support of CIMMYT’s outstanding partners and donors who invest in non-profit frontier research for the benefit of millions of farmers and low-income communities worldwide. For that reason, it could not be more fitting for this book to be published as an open access resource for the international community to benefit from. I trust that this publication will greatly contribute to accelerate the development and deployment of resourceefficient and nutritious crops for a food secure future. Bram Govaerts Director General, CIMMYT
v
Acknowledgments
The work presented in this book described models and methods for improving the statistical analyses of continuous, ordinal, and count data usually collected in agriculture and biology. The authors are grateful to the past and present CIMMYT Directors General, Deputy Directors of Research, Deputy Directors of Administration, Directors of Research Programs, and other Administration offices and Laboratories of CIMMYT for their continuous and firm support of biometrical genetics, and statistics research, training, and service in support of CIMMYT’s mission: “maize and wheat science for improved livelihoods.” This work was made possible with support from the CGIAR Research Programs on Wheat and Maize (wheat.org, maize.org), and many funders including Australia, United Kingdom (DFID), USA (USAID), South Africa, China, Mexico (SAGARPA), Canada, India, Korea, Norway, Switzerland, France, Japan, New Zealand, Sweden, and the World Bank. We thank the financial support of the Mexico Government throughout MASAGRO and several other regional projects and close collaboration with numerous Mexican researchers. We acknowledge the financial support provided by the (1) Bill and Melinda Gates Foundation (INV-003439 BMGF/FCDO Accelerating Genetic Gains in Maize and Wheat for Improved Livelihoods [AG2MW]) as well as (2) USAID projects (Amend. No. 9 MTO 069033, USAID-CIMMYT Wheat/AGGMW, AGG-Maize Supplementary Project, AGG [Stress Tolerant Maize for Africa]). Very special recognition is given to Bill and Melinda Gates Foundation for providing the Open Access fee of this book. We are also thankful for the financial support provided by the (1) Foundations for Research Levy on Agricultural Products (FFL) and the Agricultural Agreement Research Fund (JA) in Norway through NFR grant 267806, (2) Sveriges Llantbruksuniversitet (Swedish University of Agricultural Sciences) Department of
The original version of this book has been revised. The Acknowledgment section which was inadvertently omitted after the Foreword has now been included. vii
viii
Acknowledgments
Plant Breeding, Sundsvägen 10, 23053 Alnarp, Sweden, (3) CIMMYT CRP, (4) the Consejo Nacional de Tecnología y Ciencia (CONACYT) of México, and (5) Universidad de Colima of Mexico. We highly appreciate and thank the several students and professors at the Universidad de Colima and students as well professors from the Colegio de PostGraduados (COLPOS) who tested and made suggestions on early version of the material covered in the book; their many useful suggestions had a direct impact on the current organization and the technical content of the book.
Contents
1
2
Elements of Generalized Linear Mixed Models . . . . . . . . . . . . . . . . . 1.1 Introduction to Linear Models . . . . . . . . . . . . . . . . . . . . . . . . . . 1.2 Regression Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1.2.1 Simple Linear Regression . . . . . . . . . . . . . . . . . . . . . . . . 1.2.2 Multiple Linear Regression . . . . . . . . . . . . . . . . . . . . . . . 1.3 Analysis of Variance Models . . . . . . . . . . . . . . . . . . . . . . . . . . . 1.3.1 One-Way Analysis of Variance . . . . . . . . . . . . . . . . . . . . 1.3.2 Two-Way Nested Analysis of Variance . . . . . . . . . . . . . . 1.3.3 Two-Way Analysis of Variance with Interaction . . . . . . . 1.4 Analysis of Covariance (ANCOVA) . . . . . . . . . . . . . . . . . . . . . 1.5 Mixed Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1.5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1.5.2 Mixed Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1.5.3 Distribution of the Response Variable Conditional on Random Effects (y| b) . . . . . . . . . . . . . . . . . . . . . . . . 1.5.4 Types of Factors and Their Related Effects on LMMs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1.5.5 Nested Versus Crossed Factors and Their Corresponding Effects . . . . . . . . . . . . . . . . . . . . . . . . . . 1.5.6 Estimation Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1.5.7 One-Way Random Effects Model . . . . . . . . . . . . . . . . . . 1.5.8 Analysis of Variance Model of a Randomized Block Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1.6 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Appendix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Generalized Linear Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.2 Components of a GLM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.2.1 The Random Component . . . . . . . . . . . . . . . . . . . . . . . .
1 1 2 2 4 6 6 9 13 18 22 22 23 25 26 27 27 30 31 35 40 43 43 44 44 ix
x
3
4
Contents
2.2.2 The Systematic Component . . . . . . . . . . . . . . . . . . . . . . 2.2.3 Predictor’s Link Function η . . . . . . . . . . . . . . . . . . . . . . 2.3 Assumptions of a GLM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.4 Estimation and Inference of a GLM . . . . . . . . . . . . . . . . . . . . . . 2.5 Specification of a GLM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.5.1 Continuous Normal Response Variable . . . . . . . . . . . . . . 2.5.2 Binary Logistic Regression . . . . . . . . . . . . . . . . . . . . . . . 2.5.3 Poisson Regression . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.5.4 Gamma Regression . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.5.5 Beta Regression . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.6 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Appendix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
44 45 48 48 49 49 51 59 67 72 76 83
Objectives of Inference for Stochastic Models . . . . . . . . . . . . . . . . . . 3.1 Three Aspects to Consider for an Inference . . . . . . . . . . . . . . . . . 3.1.1 Data Scale in the Modeling Process Versus Original Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.1.2 Inference Space . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.1.3 Inference Based on Marginal and Conditional Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.2 Illustrative Examples of the Data Scale and the Model Scale . . . . 3.3 Fixed and Random Effects in the Inference Space . . . . . . . . . . . . 3.3.1 A Broad Inference Space or a Population Inference . . . . . 3.3.2 Mixed Models with a Normal Response . . . . . . . . . . . . . 3.4 Marginal and Conditional Models . . . . . . . . . . . . . . . . . . . . . . . 3.4.1 Marginal Versus Conditional Models . . . . . . . . . . . . . . . 3.4.2 Normal Distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.4.3 Non-normal Distribution . . . . . . . . . . . . . . . . . . . . . . . . 3.5 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
85 86
Generalized Linear Mixed Models for Non-normal Responses . . . . . 4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4.2 A Brief Description of Linear Mixed Models (LMMs) . . . . . . . . 4.3 Generalized Linear Mixed Models . . . . . . . . . . . . . . . . . . . . . . . 4.4 The Inverse Link Function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4.5 The Variance Function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4.6 Specification of a GLMM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4.7 Estimation of the Dispersion Parameter . . . . . . . . . . . . . . . . . . . 4.8 Estimation and Inference in Generalized Linear Mixed Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4.8.1 Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4.8.2 Inference . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4.9 Fitting the Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4.10 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
87 87 88 88 101 101 103 105 105 107 108 111 113 113 114 114 116 117 117 119 123 123 125 126 126
Contents
5
6
Generalized Linear Mixed Models for Counts . . . . . . . . . . . . . . . . . . 5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.2 The Poisson Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.2.1 CRD with a Poisson Response . . . . . . . . . . . . . . . . . . . . 5.2.2 Example 2: CRDs with Poisson Response . . . . . . . . . . . . 5.2.3 Example 3: Control of Weeds in Cereal Crops in an RCBD . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.2.4 Overdispersion in Poisson Data . . . . . . . . . . . . . . . . . . . . 5.2.5 Factorial Designs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.2.6 Latin Square (LS) Design . . . . . . . . . . . . . . . . . . . . . . . . 5.3 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Appendix 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Generalized Linear Mixed Models for Proportions and Percentages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6.1 Response Variables as Ratios and Percentages . . . . . . . . . . . . . . 6.2 Analysis of Discrete Proportions: Binary and Binomial Responses . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6.2.1 Completely Randomized Design (CRD): Methylation Experiment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6.3 Factorial Design in a Randomized Complete Block Design (RCBD) with Binomial Data: Toxic Effect of Different Treatments on Two Species of Fleas . . . . . . . . . . . . . . . . . . . . . 6.4 A Split-Plot Design in an RCBD with a Normal Response . . . . . . 6.4.1 An RCBD Split Plot with Binomial Data: Carrot Fly Larval Infestation of Carrots . . . . . . . . . . . . . . . . . . . . . . 6.5 A Split-Split Plot in an RCBD:- In Vitro Germination of Seeds . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6.6 Alternative Link Functions for Binomial Data . . . . . . . . . . . . . . . 6.6.1 Probit Link: A Split-Split Plot in an RCBD with a Binomial Response . . . . . . . . . . . . . . . . . . . . . . . 6.6.2 Complementary Log-Log Link Function: A Split Plot in an RCBD with a Binomial Response . . . . . . . . . . . . . . 6.7 Percentages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6.7.1 RCBD: Dead Aphid Rate . . . . . . . . . . . . . . . . . . . . . . . . 6.7.2 RCBD: Percentage of Quality Malt . . . . . . . . . . . . . . . . . 6.7.3 A Split Plot in an RCBD: Cockroach Mortality (Blattella germanica) . . . . . . . . . . . . . . . . . . . . . . . . . . . 6.7.4 A Split-Plot Design in an RCBD: Percentage Disease Inhibition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6.7.5 Randomized Complete Block Design with a Binomial Response with Multiple Variance Components . . . . . . . . 6.8 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Appendix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
xi
129 129 129 131 134 138 142 149 157 170 178 209 209 210 210
217 219 221 232 237 238 239 242 242 245 247 250 253 257 265
xii
7
8
9
Contents
Time of Occurrence of an Event of Interest . . . . . . . . . . . . . . . . . . . . 7.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7.2 Generalized Linear Mixed Models with a Gamma Response . . . . 7.2.1 CRD: Estrus Induction in Pelibuey Ewes . . . . . . . . . . . . . 7.2.2 Randomized Complete Block Design (RCBD): Itch Relief Drugs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7.2.3 Factorial Design: Insect Survival Time . . . . . . . . . . . . . . 7.2.4 A Split Plot with a Factorial Structure on a Large Plot in a Completely Randomized Design (CRD) . . . . . . . 7.3 Survival Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7.3.1 Concepts and Definitions . . . . . . . . . . . . . . . . . . . . . . . . 7.3.2 CRD: Aedes aegypti . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7.3.3 RCBD: Aedes aegypti . . . . . . . . . . . . . . . . . . . . . . . . . . 7.4 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Appendix 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Generalized Linear Mixed Models for Categorical and Ordinal Responses . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8.2 Concepts and Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8.3 Cumulative Logit Models (Proportional Odds Models) . . . . . . . . 8.3.1 Complete Randomize Design (CRD) with a Multinomial Response: Ordinal . . . . . . . . . . . . . . . 8.3.2 Randomized Complete Block Design (RCBD) with a Multinomial Response: Ordinal . . . . . . . . . . . . . . . 8.4 Cumulative Probit Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8.5 Effect of Judges’ Experience on Canned Bean Quality Ratings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8.6 Generalized Logit Models: Nominal Response Variables . . . . . . . 8.6.1 CRDs with a Nominal Multinomial Response . . . . . . . . . 8.6.2 CRD: Cheese Tasting . . . . . . . . . . . . . . . . . . . . . . . . . . . 8.7 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Appendix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Generalized Linear Mixed Models for Repeated Measurements . . . . 9.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9.2 Example of Turf Quality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9.3 Effect of Insecticides on Aphid Growth . . . . . . . . . . . . . . . . . . . 9.4 Manufacture of Livestock Feed . . . . . . . . . . . . . . . . . . . . . . . . . 9.5 Characterization of Spatial and Temporal Variations in Fecal Coliform Density . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9.6 Log-Normal Distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9.6.1 Emission of Nitrous Oxide (N2O) in Beef Cattle Manure with Different Percentages of Crude Protein in the Diet . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
279 279 280 280 282 284 287 291 293 295 297 301 306 321 321 322 323 324 328 336 338 348 350 353 357 374 377 377 378 382 387 390 395
397
Contents
Effect of a Chemical Salt on the Percentage Inhibition of the Fusarium sp. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9.8 Carbon Dioxide (CO2) Emission as a Function of Soil Moisture and Microbial Activity . . . . . . . . . . . . . . . . . . . . . . . . 9.9 Effect of Soil Compaction and Soil Moisture on Microbial Activity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9.10 Joint Model for Binary and Poisson Data . . . . . . . . . . . . . . . . . . 9.11 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Appendix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
xiii
9.7
398 402 407 409 413 416
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 425
Chapter 1
Elements of Generalized Linear Mixed Models
1.1
Introduction to Linear Models
Linear models are commonly used to describe and analyze datasets from different research areas, such as biological, agricultural, social, and so on. A linear model aims to best represent/describe the nature of a dataset. A model is usually made up of factors or a series of factors that can be nominal or discrete variables (sex, year, etc.) or continuous variables (age, height, etc.), which have an effect on the observed data. Linear models are the most commonly used statistical models for estimating and predicting a response based on a set of observations. Linear models get their name because they are linear in the model parameters. The general form of a linear model is given by y = Xβ þ ε
ð1:1Þ
where y is the vector of dimension n × 1 observed responses, X is the design matrix of n × ( p + 1) fixed constants, β is the vector of ( p + 1) × 1 parameters to be estimated (unknown), and ε is the vector of n × 1 random errors. Linearity arises because the mean response of vector y is linear to the vector of unknown parameters β. Mathematically, this is demonstrated by obtaining the first derivative of the predictor with respect to β, and, if after derivation it is still a function of any of the beta parameters, then the model is said to be nonlinear; otherwise, it is a linear model. In this case, the derivative of the predictor (1.1) with respect to beta is equal to X, so, mathematically, the model in (1.1) is linear, since after derivation, the predictor no longer depends on the β parameters. Several models used in statistics are examples of the general linear model y = Xβ + e. These include regression models and analysis of variance (ANOVA) models. Regression models generally refer to those in which the design matrix X is
© The Author(s) 2023 J. Salinas Ruíz et al., Generalized Linear Mixed Models with Applications in Agriculture and Biology, https://doi.org/10.1007/978-3-031-32800-8_1
1
2
1
Elements of Generalized Linear Mixed Models
of a full column rank, whereas in analysis of variance models, the design matrix X is not of a full column rank. Some linear models are briefly described in the following sections.
1.2
Regression Models
Linear models are often used to model the relationship between a variable, known as the response or dependent variable, y, and one or more predictors, known as independent or explanatory variables, X1, X2, ⋯, Xp.
1.2.1
Simple Linear Regression
Consider a model in which a response variable y is linearly related to an explanatory variable X1 via yi = β0 þ β1 X 1i þ εi where εi are uncorrelated random errors (i = 1, 2, ⋯, n) which are commonly assumed to be normally distributed with mean 0 and variance constant σ 2 > 0, εi ~ N(0, σ 2). If X11, X12, ⋯, X1n are constant (fixed), then this is a general linear model y = Xβ + e where
yn × 1 =
y1 y2 ⋮ yn
, Xn × 2 =
1 1 ⋮ 1
X 11 X 12 ⋮ X 1n
, β2 × 1 =
β0 , en × 1 = β1
E1 E2 ⋮ En
Example Let us consider the relationship between the performance test scores and tissue concentration of lysergic acid diethylamide commonly known as LSD (from German Lysergsäure-diethylamid) in a group of volunteers who received the drug
Table 1.1 Average mathematical test scores and LSD tissue concentrations
Tissue concentration of LSD 1.17 2.97 3.26 4.69 5.83 6.00 6.41
Mathematical average 78.93 58.20 67.47 37.47 45.65 32.92 29.97
1.2
Regression Models
3
Table 1.2 Results of the simple regression analysis
(a) Type III tests of fixed effects Effect Num DF Den DF Conc 1 5 (b) Parameter estimates Standard Effect Estimate error Intercept 89.1239 7.0475 Conc -9.0095 1.5031 50.7763 32.1137 Scale σ2
F-value 35.93
DF 5 5 .
t-value 12.65 -5.99 .
Pr > F 0.0019
Pr > |t| F 0.1034 0.0368
F-value 4.42 9.51 Pr > |t| 0.0935 0.1034 0.0368 .
α 0.05 0.05 0.05 .
Lower -1122.13 -0.7243 0.4564 .
Upper 132.10 5.2388 8.7053 .
β
yi = β0 þ X i11 þ β2 X i2 þ ⋯ þ βk expðX ik Þ þ Ei The first example is a linear model, whereas the second one is not, since its β derivatives do not depend on the beta coefficients, with the exception of the term X i11 β1 whose derivative is equal to X i1 logðX i1 Þ . This clearly shows that the second example is a nonlinear model because the derivative of the predictor depends on β1.
1.3 1.3.1
Analysis of Variance Models One-Way Analysis of Variance
Consider an experiment in which you want to test t treatments (t > 2), to the level of the ith treatment with ni experimental units that are selected and randomly assigned to the ith treatment. The model describing this experiment is as follows: yij = μ þ τi þ Eij for i = 1, 2, ⋯, t and j = 1, 2, ⋯, ni . Here, Eij are the uncorrelated random errors with normal distribution with a zero mean and a variance constant σ 2 (εij ~ N(0, σ 2)). If the treatment effects are considered as fixed constants (drawn from a finite number), then this model is a special case of the general linear model (1), with the total number of experimental units n =
t
ni . i
1.3
Analysis of Variance Models
7
Table 1.5 Biomass production of the three types of bacteria
Bacteria A 12 15 9
Table 1.6 Analysis of variance
Sources of variation Bacteria type Error Total
Bacteria B 20 19 23
Bacteria C 40 35 42
Degrees of freedom t-1=3-1=2 t(r - 1) = 3 × 2 = 6 tr - 1 = 3 × 3 - 1 = 8
In matrix terms, the information under this design of experiment is equal to:
yn × 1 =
y11 y12 ⋮ ytnt
en × 1 =
ε11 ε12 E13 ⋮ Etnt
, Xn × ðtþ1Þ =
1n 1 1n 2 ⋮ 1n t
1n 1 0n 2 ⋮ 0n t
0n1 1n2 ⋮ 0n t
⋯ ⋯ ⋱ ⋯
0n1 0n2 ⋮ 1nt
, βtþ1 × 1 =
μ τ1 τ2 ⋮ τt
,
where 1ni is the vector of ones of order ni and 0ni is the vector of zeros of order ni. Note that the matrix Xn×(t+1) is not of a full column rank because its first column can be obtained as a linear combination of its remaining columns. Example Assume that measurements of the biomass produced by three different types of bacteria are collected in three separate Petri dishes (replicates) in a glucose broth culture medium for each bacterium (Table 1.5). The sources of variation and degrees of freedom (DFs) for this experiment are shown in Table 1.6. The components for this one-way model, assuming that each of the response variable yij is normally distributed, are as follows: Distribution: yij N μij , σ 2 Linear predictor: ηi = α þ τi Link function: μi = ηi ðidentityÞ where yij is the response observed at the jth repetition in the ith bacterium, ηi is the linear predictor, α is the intercept (the grand mean), and τi is the fixed effect due to the type of bacterium.
8
1
Table 1.7 Results of the one-way analysis of variance
Elements of Generalized Linear Mixed Models
(a) Fit statistics -2 Res log likelihood 33.36 AIC (Akaike information criterion) (smaller is better) 41.36 AICC (Corrected Akaike information criterion) (smaller 81.36 is better) BIC (Bayesian information criterion) (smaller is better) 40.52 CAIC (Consistent Akaike's information criterion) 44.52 (smaller is better) HQIC (Hannan and Quinn information criterion) 38.02 (smaller is better) Pearson’s chi-square 52.67 Pearson’s chi-square / DF 8.78 (b) Type III tests of fixed effects Effect Num DF Den DF F-value Pr > F Bacteria 64.95 |t| 0.0004 |t|