Handbook of Statistics 13: Design and Analysis of Experiments [13, First ed.] 9780444820617, 0444820612


English Pages [1236] Year 1996



Editors: C. R. Rao and S. Ghosh


Handbook of Statistics Volume 13

Design and Analysis of Experiments

About the Book

The purpose of this volume is to provide the reader with statistical design and methods of analysis, as well as the research activities in developing new and better methods for performing such tasks. Scientific methods in medicine, industry, agriculture and other disciplines are covered.


Edited by

S. Ghosh and C. R. Rao


North-Holland is an imprint of Elsevier
Radarweg 29, PO Box 211, 1000 AE Amsterdam, Netherlands
The Boulevard, Langford Lane, Kidlington, Oxford OX5 1GB, United Kingdom

Copyright © 1996 Elsevier B.V. All rights reserved.

No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording, or any information storage and retrieval system, without permission in writing from the publisher. Details on how to seek permission, further information about the Publisher’s permissions policies and our arrangements with organizations such as the Copyright Clearance Center and the Copyright Licensing Agency, can be found at our website: www.elsevier.com/permissions.

This book and the individual contributions contained in it are protected under copyright by the Publisher (other than as may be noted herein).

Notices

Knowledge and best practice in this field are constantly changing. As new research and experience broaden our understanding, changes in research methods, professional practices, or medical treatment may become necessary.

Practitioners and researchers must always rely on their own experience and knowledge in evaluating and using any information, methods, compounds, or experiments described herein. In using such information or methods they should be mindful of their own safety and the safety of others, including parties for whom they have a professional responsibility.

To the fullest extent of the law, neither the Publisher nor the authors, contributors, or editors, assume any liability for any injury and/or damage to persons or property as a matter of products liability, negligence or otherwise, or from any use or operation of any methods, products, instructions, or ideas contained in the material herein.

ISBN: 978-0444820617
ISBN: 0444820612

For information on all North-Holland publications visit our website at https://www.elsevier.com/books-and-journals

Publisher: Zoe Kruze
Acquisition Editor: Sam Mahfoudh
Editorial Project Manager: Peter Llewellyn
Production Project Manager: Vignesh Tamil
Cover Designer: Mark Rogers
Typeset by SPi Global, India


Preface

In scientific investigations, the collection of pertinent information and the efficient analysis of the collected information using statistical principles and methods are very important tasks. The purpose of this volume of the Handbook of Statistics is to provide the reader with the state of the art of statistical design and methods of analysis that are available, as well as the frontiers of research activities in developing new and better methods for performing such tasks. The volume is a tribute to all individuals who helped in developing this particular area of knowledge to be useful in our everyday life.

Scientific experiments in medicine, industry, agriculture, computer and many other disciplines are covered in this volume. Statistical methods like parametric, semiparametric, nonparametric, adaptive, univariate, multivariate, frequentist and Bayesian are discussed. Block, row-column, nested, factorial, response surface, spatial, robust, optimum, search, single-stage, multistage, exact and approximate designs are presented. The chapters are written in expository style and should be of value to students, researchers, consultants and practitioners in universities as well as industries.

We would like to express our deep appreciation to the following colleagues for their cooperation and help in reviewing the chapters: Deborah Best, Tadeusz Calinski, Kathryn Chaloner, Richard Cutler, Angela Dean, Benjamin Eichhorn, Richard Gunst, Sudhir Gupta, Linda Haines, Thomas Hettmansperger, Klaus Hinkelmann, Theodore Holford, Jason Hsu, Sanpei Kageyama, Andre Khuri, Dibyen Majumdar, John Matthews, Douglas Montgomery, Christine Muller, William Notz, Thomas Santner, Pranab Sen, Stanley Shapiro, Carl Spruill, Jagdish Srivastava, John Stufken, Ajit Tamhane, Isabella Verdinelli and Shelley Zacks.

We are very grateful to the authors for their contributions. Our sincere thanks go to Dr. Gerard Wanrooy and others at North-Holland Publishing Company, also to Dr. Rimas Maliukevičius and others of VTEX Ltd. Subir Ghosh would like to thank two very special individuals in his life, the late Mrs. Benu Roy and Mr. Amalendu Roy, as well as his wife Susnata and daughter Malancha for their patience, help and understanding.

S. Ghosh and C. R. Rao

Contributors

C. M. Anderson-Cook, Department of Statistical and Actuarial Sciences, University of Western Ontario, London, Ontario, N6A 5B7, Canada (Ch. 8)
P. Armitage, University of Oxford, 2 Reading Road, Wallingford, Oxon OX10 9DP, England (Ch. 1)
A. C. Atkinson, Department of Statistics, London School of Economics, Houghton Street, London WC2A 2AE, UK (Ch. 14)
E. Brunner, Abt. Med. Statistik, University of Goettingen, Humboldt Allee 32, 37073, Goettingen, Germany (Ch. 19)
T. Caliński, Department of Mathematics and Statistical Methods, Agricultural University of Poznan, Wojska Polskiego 28, PL 60-637 Poznan, Poland (Ch. 22)
Y.-J. Chang, Department of Mathematical Sciences, Idaho State University, Pocatello, ID 83201, USA (Ch. 28)
C.-S. Cheng, Department of Statistics, University of California, Berkeley, CA 94720, USA (Ch. 26)
A. DasGupta, Department of Statistics, Purdue University, 1399 Mathematical Sciences Building, West Lafayette, IN 47907-1399, USA (Ch. 29)
A. M. Dean, Department of Statistics, Ohio State University, 141 Cockins Hall, 1958 Neil Avenue, Columbus, OH 43210, USA (Ch. 20)
N. R. Draper, Department of Statistics, University of Wisconsin, 1210 W. Dayton Street, Madison, WI 53706, USA (Ch. 11)
V. Fedorov, Oak Ridge National Laboratory, P.O. Box 2008, Oak Ridge, TN 37831-6367, USA (Ch. 16)
N. Gaffke, Fakultät für Mathematik, Institut für Mathematische Stochastik, Otto-von-Guericke-Universität Magdeburg, Postfach 4120, D-39016 Magdeburg, Germany (Ch. 30)
S. Ghosh, Department of Statistics, University of California, Riverside, CA 92521-0138, USA (Ch. 13)
S. S. Gupta, Department of Statistics, Purdue University, 1399 Mathematical Sciences Building, West Lafayette, IN 47907-1399, USA (Ch. 17)
S. Gupta, Division of Statistics, Northern Illinois University, DeKalb, IL 60115-2854, USA (Ch. 23)
L. M. Haines, Department of Statistics and Biometry, Faculty of Science, University of Natal, Private Bag X01 Scottsville, Pietermaritzburg 3209, South Africa (Ch. 14)
B. Heiligers, Fakultät für Mathematik, Institut für Mathematische Stochastik, Otto-von-Guericke-Universität Magdeburg, Postfach 4120, D-39016 Magdeburg, Germany (Ch. 30)
S. Kageyama, Department of Mathematics, Faculty of School Education, Hiroshima University, 1-1-1 Kagamiyama, Higashi-Hiroshima 739, Japan (Ch. 22)
A. I. Khuri, Department of Statistics, The University of Florida, 103 Griffin-Floyd Hall, Gainesville, Florida 32611-8545, USA (Ch. 12)
J. R. Koehler, Department of Mathematics, University of Colorado, Denver, CO 80217-3364, USA (Ch. 9)
D. K. J. Lin, Department of Statistics, University of Tennessee, Knoxville, TN 37996-0532, USA (Ch. 11)
D. Majumdar, Department of Mathematics, Statistics and Computer Science (M/C 249), University of Illinois, 851 South Morgan Street, Chicago, IL 60607-3041, USA (Ch. 27)
R. J. Martin, School of Mathematics and Statistics, University of Sheffield, P.O. Box 597, Sheffield S3 7RH, UK (Ch. 15)
J. P. Morgan, Department of Mathematics and Statistics, Old Dominion University, Norfolk, Virginia 23529-0077, USA (Ch. 25)
R. Mukerjee, Indian Institute of Management, P.O. Box 16757, Calcutta 700027, India (Ch. 23)
W. I. Notz, Department of Statistics, Ohio State University, 1958 Neil Avenue, Columbus, OH 43210, USA (Ch. 28)
A. B. Owen, Department of Statistics, Stanford University, Sequoia Hall, Stanford, CA 94305-4065, USA (Ch. 9)
S. Panchapakesan, Department of Mathematics, Southern Illinois University, Carbondale, IL 62901-4408, USA (Ch. 17)
H. I. Patel, Department of Epidemiology and Medical Affairs, Berlex Laboratories, Inc., 300 Fairfield Road, Wayne, NJ 074770-7358, USA (Ch. 2)
M. L. Puri, Department of Mathematics, Indiana University, Rawles Hall, Bloomington, IN 47405-5701, USA (Ch. 19)
P. R. Rosenbaum, Department of Statistics, The Wharton School, University of Pennsylvania, 3013 Steinberg-Dietrich Hall, Philadelphia, PA 19104-6302, USA (Ch. 6)
P. K. Sen, Department of Biostatistics, University of North Carolina at Chapel Hill, CB#7400, Chapel Hill, NC 27599-7400, USA (Ch. 4)
K. R. Shah, Statistics Department, University of Waterloo, Waterloo, N2L 3G1, Canada (Ch. 24)
B. K. Sinha, Stat.-Math. Division, Indian Statistical Institute, 203 B.T. Road, Calcutta 700035, India (Ch. 24)
J. N. Srivastava, Department of Statistics, Colorado State University, Fort Collins, CO 80523-1877, USA (Ch. 10)
D. M. Steinberg, Department of Statistics and Operations Research, Tel Aviv University, Ramat Aviv 69978, Tel Aviv, Israel (Ch. 7)
D. J. Street, University of Technology, Sydney, Broadway NSW 2007, Australia (Ch. 21)
J. Stufken, Department of Statistics, Iowa State University, 102-E Snedecor Hall, Ames, IA 50011-1210, USA (Ch. 3)
A. C. Tamhane, Department of Industrial Engineering, Northwestern University, Evanston, IL 60208-3119, USA (Ch. 18)
D. A. Wolfe, Department of Statistics, Ohio State University, 141 Cockins Hall, 1958 Neil Avenue, Columbus, OH 43210, USA (Ch. 20)
S. Zacks, Department of Mathematical Sciences, Binghamton University, P.O. Box 6000, Binghamton, NY 13902-6000, USA (Ch. 5)

S. Ghosh and C. R. Rao, eds., Handbook of Statistics, Vol. 13 © 1996 Elsevier Science B.V. All rights reserved.

1


The Design and Analysis of Clinical Trials

Peter Armitage

1. Introduction

1.1. The background

Of all the current forms of statistical experimentation, the clinical trial is the most likely to be familiar to a layperson, and the most likely to engage him or her as a participant. If we date statistical experimentation back to the mid-1920s (Fisher, 1926), then clinical trials have been with us for some 50 of the subsequent 70 years, so they can reasonably claim to be one of the oldest as well as one of the most prolific and pervasive branches of statistical science. That assessment makes the customary assumption that the first published clinical trial to meet modern standards of rigour is the trial of streptomycin in the treatment of pulmonary tuberculosis (Medical Research Council, 1948). This was indeed a landmark, but there were important forerunners, gradually preparing the ground for the new methods.

An essential feature in a clinical trial is the comparison between the effects of different medical procedures. This principle had been recognized in the 18th and 19th centuries by many investigators, who had realised the need to provide a fair basis for the comparison; Bull (1959) presents an excellent survey of this early work. But the solution to the problem of ensuring fairness in the comparison had to await Fisher's exposition of the case for random assignment in the 1920s.

During the 1930s many statisticians and others who appreciated the value of randomized field studies in agricultural research may well have dismissed the possibility of applying these methods in medicine. For the experimental units were no longer inanimate objects or plots of ground, but were human beings - and sick ones at that. The early trials organized by the (British) Medical Research Council, and those run in the United States by the Veterans' Administration and, later, the National Institutes of Health, showed that such fears were unjustified.
The developments were supported by the strong advocacy of statisticians such as A. Bradford Hill (Hill, 1962) in the United Kingdom and Harold Dorn in the United States, and by many influential physicians. A glance at the current contents of the major medical journals shows how well established the method has now become. It seems at times to be threatened by financial or legal setbacks, but it is strongly supported by the requirements of statutory bodies for the regulation of new medical procedures, by the needs of the pharmaceutical industry, and by a growing realisation by the medical profession that there is no other reliable way of assessing the relative merits of competing medical treatments.

In this paper I shall outline some of the main features of clinical trials that need to be taken into account by a statistician involved in their planning, analysis and interpretation. Further details may be obtained from a number of excellent books, such as those by Schwartz et al. (1980), Shapiro and Louis (1983), Pocock (1983), Buyse et al. (1984), Friedman et al. (1985) and Meinert (1986).

1.2. The move towards randomization

The random assignment of treatments was advocated by Peirce and Jastrow for psychological experiments as early as 1883, but the nearest approximation to gain any currency during the first three decades of the 20th century was the idea of systematic assignment of two treatments to alternate cases in a series (in most trials patients become available for treatment serially rather than simultaneously, so alternation is feasible). Alternation would usually be effective if there were no possibility that the order of presentation or the assessment of eligibility could be affected by a knowledge of the treatment to be applied. But with a systematic assignment schedule this type of selection bias is very difficult to avoid. For this reason, Hill was led to advocate strictly random assignment for the tuberculosis trial published in 1948, and for a trial of pertussis vaccines started earlier but published later (Medical Research Council, 1951).

These remarks might have been of purely historical interest, were it not that the principle of random assignment still comes under occasional attack. Gehan and Freireich (1974), for instance, sought to identify situations for which the use of historical controls was preferable to randomization. However, the impossibility of ensuring the absence of bias between the groups being compared, and of measuring the extent of that bias, has led most investigators to reject this approach, except in very special circumstances (see Section 1.3). Similarly, Byar (1980) argued powerfully against the use of data banks which supposedly might provide substantial information about the effects of standard treatments, against which results with new treatments might be compared. Again, biases will almost certainly be present, and there are likely to be deficiencies in the recording of information about patient characteristics which is needed to validate the comparison. For an extended discussion, see Section 4.2 of Pocock (1983).
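
The contrast between alternation and strictly random assignment can be illustrated with a minimal Python sketch. It is not taken from any of the trials cited above: the function names, the two arm labels and the seed are assumptions made purely for illustration, and practical trials generally use more elaborate schemes than the simple independent randomization shown here.

```python
import random


def alternate_assignment(n_patients, treatments=("A", "B")):
    """Systematic alternation: each assignment follows from the one before."""
    return [treatments[i % len(treatments)] for i in range(n_patients)]


def random_assignment(n_patients, treatments=("A", "B"), seed=None):
    """Simple randomization: each patient is assigned independently at random.

    The seed is an illustrative convenience so that a schedule can be
    regenerated; it is an assumption of this sketch, not part of the text.
    """
    rng = random.Random(seed)
    return [rng.choice(treatments) for _ in range(n_patients)]


if __name__ == "__main__":
    # Patients are assumed to enter serially, as described in the text.
    print("alternation:", alternate_assignment(8))
    print("random:     ", random_assignment(8, seed=1948))
```

Under alternation, anyone who knows the previous assignment knows the next one, which is the route by which selection bias can enter. Under random assignment the next arm cannot be deduced from the sequence so far.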

1.3. The medical context

The randomized trial has pervaded almost every branch of medical research. Indeed, it is unduly limiting to refer to 'clinical' trials (since clinical medicine is strictly concerned with patients). Many important randomized trials are conducted in preventive medicine, where the subjects have no specific illness and are receiving prophylactic agents such as vaccines with the hope of reducing the future occurrence of disease. Cochrane (1972) introduced the term randomized controlled trial (RCT), to include not only the traditional therapeutic and prophylactic trials, but also trials to compare the effectiveness of different measures of medical care, such as the choice between home or hospital treatment for patients of a particular type.

Trials occur in virtually every branch of clinical medicine, notable examples being in cardiovascular medicine, oncology, psychopharmacology, ophthalmology and rheumatic disease. Most trials involve comparisons of therapeutic drugs, but other forms of treatment or prophylaxis may be examined, such as surgical procedures, radiotherapy regimens, forms of psychotherapy, screening programs, medical devices or behavioural interventions. Within therapeutic medicine the aim may be to induce a rapid improvement in a patient's condition, as in the treatment of an infectious disease; to improve the long-term prognosis for patients with chronic disease, as in many trials in cancer medicine; or to prevent future exacerbations of disease, as in the treatment of patients who have had a myocardial infarction (heart attack) with the aim of reducing the risk of further attacks (so-called secondary prevention).

Trials fall broadly into three categories, according to their financial and scientific sponsorship. First, there are trials, usually relatively small, initiated by single research workers or small localized groups, supported by the resources of a local medical school or hospital. Many of the earliest examples of clinical trials were of this type. The second category, now more prominent in medical research, is that of larger trials, usually multicentre, supported financially by national or international research agencies or by fund-raising organizations concerned with specific branches of medicine. The third category comprises trials supported, and often initiated, by the pharmaceutical industry.

A convenient distinction may be drawn between trials conducted at different stages of drug development. Phase I studies, on healthy volunteers, are concerned with the establishment of a safe dosage, and need not be considered here. Phase II trials are small-scale studies to see whether prima facie evidence of efficacy exists. They are often regarded as the first stages in a screening process, the successful contenders proceeding to full-scale evaluation in the next phase. Since Phase II trials are not intended to provide conclusive evidence, they are often conducted without random assignment. Phase III trials are the main concern of the present review. They provide authoritative evidence of efficacy, and are an essential feature of submissions to regulatory bodies for marketing approval. They provide evidence also of safety, although adverse events occurring with low frequency, or after prolonged intervals of time, are likely to be missed. Phase IV studies, based on post-marketing experience, may reveal long-term safety and efficacy effects, but do not concern us here.

1.4. Objectives and constraints

Finally, in this introductory section, it is useful to review some of the broader aspects of clinical trials: the varying objectives of their investigators, and the constraints under which they must operate.

A simplistic view of the objective in a clinical trial (the selection model) is that the investigator seeks to choose the better, or best, of a set of possible treatments for some specific condition. This may sometimes be appropriate: the investigator may decide that treatment A is clearly better than treatment B, and immediately switch to using A on the relevant patients. More often, though, the decision will not be so clear-cut. Even though a trial shows clear evidence of a difference in efficacy, the doctor may well need to balance this against the suspected or known influence of adverse effects, to take into account special problems presented by individual patients, and to consider whether the picture is affected by evidence emerging from other related studies. From this point of view the clinical trial is best regarded as a means of building up a reliable bank of information which enables individual physicians to make balanced and well-founded choices. A good deal of decision-theoretic work has been done on the selection model of clinical trials (see Armitage (1985) for a review); we shall not examine this in detail.

Another useful distinction is that drawn by Schwartz and Lellouch (1967) between explanatory and pragmatic attitudes to clinical trials. An explanatory trial is intended to be as closely analogous as possible to a laboratory experiment. Conditions in different groups of patients are rigidly controlled, so that a clear picture emerges of the relative efficacy of carefully defined treatments, all other circumstances being kept constant. For instance, one might attempt to compare the effects of two regimens of radiotherapy for patients with malignant disease, the two regimens being clearly defined and adhered to during the trial. A pragmatic trial, by contrast, aims to simulate more closely the conditions of routine medical practice, where treatment regimens rarely receive 100% compliance. In the example cited above, the investigator might choose to compare two therapeutic strategies based on the two regimens, but recognizing that deviations from the prescribed regimens will occur, often for good reason and with the approval of the patient's physician. Most trials exhibit both explanatory and pragmatic features, but in practice the pragmatic attitude tends to dominate.
The distinction has important consequences for the analysis of trial results, as will be seen later (Section 4.4).

In most trials the investigator will be primarily interested in differences in efficacy between different treatment regimens, but concern for improved efficacy - a higher proportion of rapid cures of an infectious disease, more rapid relief of pain, a reduced risk of recurrence of a myocardial infarction - must always be tempered by an awareness of adverse effects. Unwanted side-effects must always be recorded and their incidence compared between treatment groups. However, adverse effects may be exhibited only after the end of the trial, and the investigator may need to observe patients' progress during a subsequent follow-up period.

In some trials, particularly those carried out at an early stage of drug development, the aim may be not so much to show improved efficacy, but rather to show that one treatment has an efficacy very similar to a standard treatment. The new treatment may, for example, be a different formulation of a drug whose efficacy has already been established. In these equivalence trials, it will be necessary to define an acceptable range of equivalence, on some appropriate scale, and to show whether or not the responses to the two treatments are sufficiently similar to justify a claim of equivalence.

In the analysis of any clinical trial, a major aim will be to estimate the difference in efficacy between treatments, with an appropriate statement of uncertainty. In the typical efficacy trial, interest will often centre on a null hypothesis of no difference, but it should not be assumed that the establishment of a nonzero difference in a particular direction is necessarily of clinical importance. Adverse effects or other disadvantages may outweigh small benefits in efficacy, and it is useful to draw a distinction between statistical significance and clinical significance (see Section 2.3). In equivalence trials a test of significance of a null hypothesis of zero difference is pointless, and the essential question is whether, allowing for uncertainty, the true difference clearly lies within the range of equivalence. (A similar point arises in bioequivalence studies, where the equivalence of different formulations of the same active agent is assessed by comparison of blood levels measured at intervals after administration, rather than by their clinical effects. See Chapter 2.)

Clinical trials are feasible only insofar as their conduct is consistent with accepted standards of medical ethics. The crucial point is that, in general, physicians will not employ treatments which they believe to be inferior to others currently available. Normally, therefore, an investigator will be prepared to assign treatments randomly only if he or she is agnostic about their relative merits. In forming a judgement on such questions the investigator may strike a balance between prior evidence which may suggest superiority for one treatment in a specific therapeutic response, and uncertainty about long-term or other effects. In a multicentre trial each investigator will need to make an individual judgement on the ethical question, although taking into consideration the views of colleagues. Certainly, ethical issues must be considered in any trial, however small, and however mild the condition under study.

In recent decades it has become usual to require the informed consent of any patient entered into a trial, perhaps by a written statement. The justification for this procedure may seem self-evident, yet it remains anomalous that physicians are not required to obtain such consent for procedures (however weakly supported by research) employed in routine practice, whereas they need to do so for well-designed research studies. Fortunately, in most trials consent can be readily obtained.
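
The equivalence question raised in this section, namely whether the true difference clearly lies within the range of equivalence once uncertainty is allowed for, can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not a method prescribed by the text: it assumes a continuous response, two independent groups large enough for a normal approximation, a symmetric range of equivalence (-margin, +margin), and a 95% confidence level. The function names and the choice of level are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def difference_ci(x, y, level=0.95):
    """Large-sample confidence interval for mean(x) - mean(y).

    Assumes independent groups and a normal approximation; the level
    is an illustrative choice, not one taken from the text.
    """
    diff = mean(x) - mean(y)
    se = sqrt(variance(x) / len(x) + variance(y) / len(y))
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return diff - z * se, diff + z * se


def within_equivalence_range(ci, margin):
    """True only if the whole interval lies inside (-margin, +margin)."""
    lower, upper = ci
    return -margin < lower and upper < margin


if __name__ == "__main__":
    # Hypothetical responses to a new and a standard formulation.
    new = [10.0, 11.0, 9.0, 10.5, 9.5] * 20
    standard = [10.1, 10.9, 9.1, 10.4, 9.6] * 20
    ci = difference_ci(new, standard)
    print(ci, within_equivalence_range(ci, margin=0.5))
```

The sketch deliberately reports an interval rather than a test of zero difference: a small observed difference with a wide interval would not support a claim of equivalence, whereas a narrow interval contained in the agreed range would.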

2. The planning of a clinical trial

2.1. The protocol

It is important that the investigator(s) should draw up, in advance, a detailed plan of the study, to be documented in the protocol. This should include a summary of the case for undertaking the trial, and of the evidence already available on the relative merits of the treatments under study. There should be an unambiguous definition of the categories of patients to be admitted, and of the treatment schedules to be applied. The protocol should also include detailed instructions for the implementation of the trial, definitions of the outcome variables (or endpoints) to be used in the analysis, statistical details of the design, the number of patients and (where appropriate) the length of follow-up, and a brief outline of the proposed methods of analysis. The protocol should be accompanied by, or include, copies of the case report form(s) [CRF(s)], in the design of which great care must be exercised to avoid ambiguities and to enable information to be recorded as effectively as possible.

Most of these topics are discussed in later sections of this review, but we consider now the question of defining the patient categories and treatments.


In broad terms these features will have been agreed at an early stage. The investigators may, for instance, have agreed to compare a new and a standard drug for the relief of pain in patients with rheumatoid arthritis. The question may then arise as to whether the patients should have as uniform a pattern of disease as possible, say by restriction to a rather narrow age range, a specific combination of symptoms, and a minimum period since the disease was diagnosed. This would be a natural consequence of the 'explanatory' attitude referred to earlier, just as in laboratory experiments one tries to keep experimental units as uniform as possible. By contrast, the 'pragmatic' approach would be to widen the selection to all categories for whom the choice of these treatments is relevant, which might mean almost any patient with the disease in question. This has the merit of more nearly simulating the choice facing the physician in routine practice, and of enabling more patients to be recruited and thereby potentially improving the precision of the trial (although perhaps to a lesser extent than expected, since the variability of outcome variables might be increased). If the results for a narrowly defined type of patient are of particular interest, these can always be extracted from the larger set of data.

A similar contrast presents itself in the definition of treatments. The explanatory attitude would be to define precisely the treatment to be applied - for instance, in a drug trial, the dosage, frequency and time of administration - and to control strictly the use of concomitant treatments. Departures from this defined regimen would be regarded as lapses in experimental technique. The pragmatic attitude would be to recognise that in routine practice treatments are administered in a flexible way, schedules being changed from time to time by the physician's perception of the patient's progress or by the patient's own choice.
A clear definition of each treatment regimen is required, otherwise the results of the trial are useless, but this definition may well be one of the strategy to be implemented for varying the schedule rather than of a single, unvarying regimen.

2.2. Some basic designs

The range of experimental designs used in clinical trials is much more restricted than in some other branches of technology, such as agricultural or industrial research. The reason is perhaps partly the need to avoid complexity in the context of medical care which often has to be applied urgently; and partly the fact that elaborate blocking systems are impracticable when the experimental units are patients who enter over a period of time and whose characteristics cannot, therefore, be listed at the outset. All trials involve comparisons between different treatments, and an important distinction is between parallel-group trials in which each subject receives only one of the contrasted treatments, so that the comparison is between subjects, and crossover trials in which each subject receives different treatments on different occasions, so that the comparison is within subjects. We consider briefly some special cases.

Subtrials

In some multicentre studies, perhaps involving controversial treatments, it may happen that not all the investigators are willing to randomize between all the treatments.

The design and analysis of clinical trials

For instance, if a trial is designed to compare three treatments, A, B and C, some investigators might be prepared to use any of these, others might object to A but be willing to compare B and C, while others might wish to compare A and B, or A and C. There could then be four subtrials, each investigator being free to join whichever subtrial is preferred. (In other circumstances, a similar range of choice might be offered to individual patients.) Note that information about the contrast of, say, A and B, is available (a) from the (A, B, C) subtrial; (b) from the (A, B) subtrial; and (c) from the contrast between A versus C in the (A, C) subtrial, and B versus C in the (B, C) subtrial. The design is not as efficient as an (A, B, C) trial alone, but it may enable more investigators and patients to enrol than would otherwise be possible. Note, however, that the effect of any treatment contrast may vary between subtrials, because of the different characteristics of patients choosing the various options, and it may not be possible to pool results across subtrials. An example of this approach was the ISIS-3 trial of agents for the treatment of suspected myocardial infarction (ISIS-3 Collaborative Group, 1992). Here, patients for whom physicians believed that fibrinolytic therapy was required were randomized between three potentially active agents, whereas those for whom the indication for fibrinolytic therapy was regarded as uncertain were randomized between these three agents and a fourth treatment which was a placebo (Section 2.4). Jarrett and Solomon (1994) discuss a number of subtrial designs which might be used in a proposed trial to compare heroin, methadone, and their combination, for heroin-dependent users.
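The three sources of information on the contrast of A and B can be illustrated numerically. The following is a minimal sketch in Python, assuming that each subtrial supplies an estimate with a standard error, that the estimates are independent, and that the treatment contrast is the same in every subtrial (the very assumption which, as noted above, may fail). The function names and the figures are hypothetical.

```python
from math import sqrt


def indirect_contrast(a_vs_c, b_vs_c):
    """Indirect estimate of A - B from separate (A, C) and (B, C) subtrials.

    Each argument is a pair (estimate, standard_error). The two subtrials
    are assumed independent, so their variances add.
    """
    (est_ac, se_ac), (est_bc, se_bc) = a_vs_c, b_vs_c
    return est_ac - est_bc, sqrt(se_ac ** 2 + se_bc ** 2)


def pool_inverse_variance(estimates):
    """Pool independent estimates of one contrast, weighting by 1 / SE^2."""
    weights = [1.0 / se ** 2 for _, se in estimates]
    total = sum(weights)
    pooled = sum(w * est for w, (est, _) in zip(weights, estimates)) / total
    return pooled, sqrt(1.0 / total)


if __name__ == "__main__":
    # Hypothetical A - B information from the three sources (a), (b), (c).
    direct_abc = (2.0, 1.0)   # (a) within the (A, B, C) subtrial
    direct_ab = (3.0, 1.0)    # (b) within the (A, B) subtrial
    indirect = indirect_contrast((5.0, 1.0), (3.0, 1.0))  # (c)
    print(pool_inverse_variance([direct_abc, direct_ab, indirect]))
```

The indirect estimate carries the error of two separate comparisons, which is one reason why the design is less efficient than an (A, B, C) trial alone.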

Factorial designs

The advantages of factorial designs are well-understood and these designs are increasingly widely used. There are many situations where different features of treatment can be varied and combined, and it makes sense to study them together in one trial rather than as separate projects. An example is provided by the ISIS-3 trial referred to above (ISIS-3 Collaborative Group, 1992), which compared the effects of three levels of one factor (streptokinase, tPA and APSAC) in combination with two levels of a second factor (aspirin plus heparin and heparin alone). The recently completed ISIS-4 trial, again for the treatment of suspected myocardial infarction (ISIS-4 Collaborative Group, 1995), studied the eight combinations of three binary factors: oral mononitrate, oral captopril and intravenous magnesium. The factors, the levels of which act in combination, may be different facets of treatment with the same agent, rather than essentially different agents. For example, Stephen et al. (1994) reported a trial to compare the anti-caries efficacy of different fluoride toothpastes in children initially aged 11-12, over a period of three years. There were three active compounds, each of which provided fluoride at one of two levels. The two factors were, therefore, the compound (three levels) and the fluoride dose (two levels). Factorial designs are a powerful scientific tool for the simultaneous study of several questions. If attention focusses on one particular combination of factor levels, the data for that combination alone may be inadequate; the factorial design has overcome this

P. Armitage

inadequacy by amalgamating information at several combinations. This may be a problem in some drug trials, where a manufacturer may aim to submit a particular combination of levels for regulatory approval. In such cases it may be helpful to follow a factorial trial by a more clearly focussed trial using the combination in question, together with one control level.
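The way in which a factorial design amalgamates information across combinations can be seen in the simplest case of two binary factors. The sketch below is illustrative only: it assumes four cell means from a 2 x 2 layout with equal numbers per cell, and the function name and data are hypothetical. The interaction is expressed here as the difference between the two simple effects of the first factor; other scalings are in use.

```python
def factorial_2x2_effects(cell_means):
    """Main effects and interaction from the four cell means of a 2 x 2 design.

    cell_means maps (level_of_factor_1, level_of_factor_2) -> mean response,
    with levels coded 0 and 1. Equal cell sizes are assumed.
    """
    m00 = cell_means[(0, 0)]
    m10 = cell_means[(1, 0)]
    m01 = cell_means[(0, 1)]
    m11 = cell_means[(1, 1)]

    # Simple effects of factor 1 at each level of factor 2.
    effect_1_at_0 = m10 - m00
    effect_1_at_1 = m11 - m01

    return {
        # Each main effect averages over both levels of the other factor,
        # so every subject contributes to both main effects.
        "main_1": (effect_1_at_0 + effect_1_at_1) / 2,
        "main_2": ((m01 - m00) + (m11 - m10)) / 2,
        # Difference between the simple effects of factor 1.
        "interaction": effect_1_at_1 - effect_1_at_0,
    }


if __name__ == "__main__":
    means = {(0, 0): 10.0, (1, 0): 14.0, (0, 1): 12.0, (1, 1): 18.0}
    print(factorial_2x2_effects(means))
```

An estimate for one particular combination, by contrast, rests on the data in a single cell, which is the inadequacy referred to above.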

Crossover trials

Crossover trials have the dual advantage that (a) treatment comparisons are affected by within-subject random variation, which is usually smaller than between-subject variation; and (b) they economize in the number of subjects required for a fixed number of observations. However, they are often impracticable - for instance, if the patient's condition is changing rapidly over time or may undergo irreversible deterioration, if the purpose of treatment is to produce a rapid cure, or if each treatment has to be used for a long period. The ideal situation is the comparison of methods for the palliative treatment of a chronic disease, where after administration of each treatment the patient's condition reverts to a fairly constant baseline. The temporary alleviation of chronic cough might be an example. Crossover designs are often used for Phase II trials of new drugs, where a rapid response to treatment may be available and the longer-term effects can be studied later if the drug is selected for a Phase III trial. They are useful also for bioequivalence studies, where the response may be derived from measurements of blood or urinary concentrations after ingestion of the drug. Crossover trials have generated a formidable literature (see, for instance, Jones and Kenward, 1989; Senn, 1993; and an issue of Statistical Methods in Medical Research, 1994, devoted to crossover designs). In the simplest design, for two treatments (A and B) and two periods, subjects are assigned randomly to two sequences: AB and BA. For a standard linear model with Gaussian errors, the analysis proceeds straightforwardly, with treatment (T) and period (P) effects. Subjects are usually taken as a random effect. The T x P interaction plays a crucial role. It may be caused by a carryover of response, from the first to the second period, and investigators should try to minimize the chance of this happening by allowing an adequate wash-out period between the treatments. 
If the interaction exists, the estimation of treatment effect becomes difficult and ambiguous. Unfortunately, the interaction is aliased with the contrast between the two sequences, which is affected by between-subject error and may therefore be imprecisely measured. These difficulties have led many writers to deprecate the use of this simple design (Freeman, 1989; Senn, 1992), although it should perhaps retain its place in suitably understood situations. Armitage and Hills (1982) pointed out that 'a single crossover trial cannot provide the evidence for its own validity' and suggested that 'crossover trials should be regarded with some suspicion unless they are supported by evidence, from previous trials with similar patients, treatments and response variables, that the interaction is likely to be negligible'. Some of the deficiencies of the simple two-treatment, two-period design can be reduced by extending the number of sequences, treatments and/or periods. The extensive literature on this broad class of repeated-measures designs has been reviewed

by Matthews (1988, 1994b), Jones and Kenward (1989), Shah and Sinha (1989) and Afsarinejad (1990). Optimal design theory can be applied, but optimality depends on the assumed model for the response variable. Even with a Gaussian model, there are ambiguities about the modelling of carry-over effects (Fleiss, 1989; Senn, 1992), about possible serial correlation between successive observations, and in the method of estimation (for instance, ordinary or generalized least squares). Moreover, responses will often be clearly non-Gaussian, the extreme case being that of binary responses. It is, therefore, difficult to go beyond broad recommendations.

For two treatments and periods, the simple design (AB and its dual) may be extended by adding AA and its dual (Balaam, 1968); these additional sequences provide no extra within-subject information on the treatment effect, but help to elucidate the interaction. Still with two treatments, the number of periods can be extended beyond two. In general a chosen sequence should be accompanied by equal assignment of subjects to the dual sequence: for example, the sequence ABB should be accompanied by BAA. Under a certain model the sequences ABB and its dual are optimal, but some robustness against different assumptions is provided by the addition of two other sequences: either ABA and its dual (Ebbutt, 1984) or AAB and its dual (Carrière, 1994).

Exploration of larger designs, particularly for more than two treatments, has been somewhat tentative. One problem is uncertainty about the effects of subjects who drop out before their sequence of treatments is completed, particularly as drop-outs are likely to be nonrandom (see Section 4.4). Matthews (1994a) reports the successful completion of a trial with 12 periods and 12 subjects; however, the subjects were healthy volunteers, who may have been more likely than patients to cooperate fully.
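The aliasing in the simple AB/BA design described earlier can be made concrete. The sketch below assumes a continuous response, a standard linear model and equal variances in the two sequence groups; the function name and the data are hypothetical. The treatment and period effects are estimated from within-subject differences (period 1 minus period 2), whereas the T x P interaction can be assessed only through the contrast between sequences in the subject totals, which carries between-subject error.

```python
from math import sqrt
from statistics import mean, variance


def analyse_ab_ba(seq_ab, seq_ba):
    """Estimates (with standard errors) for a two-treatment, two-period crossover.

    Each argument is a list of (period_1, period_2) responses, one pair per
    subject, for the AB and BA sequence groups respectively.
    """
    d_ab = [y1 - y2 for y1, y2 in seq_ab]   # within-subject differences
    d_ba = [y1 - y2 for y1, y2 in seq_ba]
    s_ab = [y1 + y2 for y1, y2 in seq_ab]   # subject totals
    s_ba = [y1 + y2 for y1, y2 in seq_ba]

    def se_of_difference(x, y):
        # Standard error of mean(x) - mean(y), using a pooled variance.
        pooled = ((len(x) - 1) * variance(x) + (len(y) - 1) * variance(y)) / (
            len(x) + len(y) - 2
        )
        return sqrt(pooled * (1 / len(x) + 1 / len(y)))

    se_within = se_of_difference(d_ab, d_ba) / 2
    return {
        # A minus B: the period effect cancels between the two sequences.
        "treatment": ((mean(d_ab) - mean(d_ba)) / 2, se_within),
        # Period 1 minus period 2: the treatment effect cancels.
        "period": ((mean(d_ab) + mean(d_ba)) / 2, se_within),
        # Sequence contrast in subject totals, aliased with T x P.
        "sequence": (mean(s_ab) - mean(s_ba), se_of_difference(s_ab, s_ba)),
    }


if __name__ == "__main__":
    ab = [(10, 8), (12, 9), (11, 9)]
    ba = [(8, 11), (9, 11), (7, 10)]
    for name, (estimate, se) in analyse_ab_ba(ab, ba).items():
        print(f"{name}: {estimate:.3f} (SE {se:.3f})")
```

In data of this kind the standard error of the sequence contrast is typically much larger than that of the treatment effect, which is why a single trial gives so little evidence about its own validity.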

2.3. The prevention of bias

Methods of randomization

The case for random assignment of treatments has already been discussed (Section 1.2), and need not be re-emphasized here. Randomization schedules are normally prepared in advance by computer routines, and (to avoid selection bias) individual assignments are revealed only after a patient has been formally entered into the trial. Randomization will, of course, produce similarity between the characteristics of patients allotted to different treatments, within limits predictable by probability theory. In some trials it is thought advisable to ensure closer agreement between treatment groups, for instance by ensuring near-equality of numbers on different treatments within particular subgroups or strata defined by certain pre-treatment characteristics (or baseline variables). One method used in the past is that of permuted blocks, whereby numbers assigned to different treatments in a particular stratum are arranged to be equal at fixed intervals in the schedule. More recently methods of minimization are often used (Taves, 1974; Begg and Iglewicz, 1980). These assign treatments adaptively so as to minimize (in some sense) the current disparity between the groups, taking account simultaneously of a variety of baseline variables. In multicentre trials, such methods are often used to ensure approximate balance between treatment numbers within each centre. In any scheme of balanced assignment, the effect of

the balanced factors in reducing residual variation should be taken into account, for example by an analysis of covariance. If this is done, the statistical advantage of balanced assignment schemes, over simple randomization with covariance adjustment, is very small (Forsythe and Stitt, 1977). Their main merit is probably psychological, in reassuring investigators in centres providing few patients that their contribution to a multicentre study is of value, and in ensuring that the final report of a trial can produce convincing evidence of similarity of groups. In most parallel-group trials the numbers of patients assigned to different treatments are (apart from random variation) equal or in simple ratios which are retained throughout the trial. Many authors have argued that, on ethical grounds, the proportionate assignments should change during the trial, in such a way that more patients are gradually assigned to the apparently more successful treatments (see, for instance, Zelen, 1969; Chernoff and Petkau, 1981; Berry and Fristedt, 1985; Bather, 1985). Such data-dependent allocation schemes have rarely been used in practice, and can give rise to problems in implementation and analysis. See Armitage (1985) for a general discussion, and Ware (1989) and the published discussion following that paper for a case-study of a controversial trial of extracorporeal membrane oxygenation (ECMO) in which these methods were used.
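The two balancing devices mentioned above, permuted blocks and minimization, can be sketched as follows. This is an illustration under stated assumptions rather than a description of any particular published algorithm: the block size, the unweighted count used as the measure of disparity and the random breaking of ties are choices made here for simplicity, and all names are hypothetical.

```python
import random


def permuted_block_schedule(n_blocks, treatments=("A", "B"), per_block=2, rng=None):
    """Schedule for one stratum: each block holds every treatment per_block times."""
    rng = rng or random.Random()
    schedule = []
    for _ in range(n_blocks):
        block = list(treatments) * per_block
        rng.shuffle(block)            # random order within the block
        schedule.extend(block)        # numbers are equal at the end of each block
    return schedule


def minimization_assign(patient, allocated, treatments=("A", "B"), rng=None):
    """Assign the treatment that minimizes the current disparity.

    patient is a dict of baseline variables, e.g. {"sex": "F", "centre": 3}.
    allocated is a list of (patient_dict, treatment) pairs already entered.
    For each treatment, count the earlier patients on that treatment who share
    each baseline level with the new patient; the smallest total is chosen.
    """
    rng = rng or random.Random()
    totals = {}
    for treatment in treatments:
        totals[treatment] = sum(
            1
            for other, arm in allocated
            if arm == treatment
            for factor, level in patient.items()
            if other.get(factor) == level
        )
    smallest = min(totals.values())
    candidates = [t for t in treatments if totals[t] == smallest]
    return rng.choice(candidates)     # ties are broken at random


if __name__ == "__main__":
    print(permuted_block_schedule(3, rng=random.Random(1)))
    entered = [
        ({"sex": "F", "age": "<50"}, "A"),
        ({"sex": "F", "age": ">=50"}, "A"),
        ({"sex": "M", "age": "<50"}, "B"),
    ]
    print(minimization_assign({"sex": "F", "age": "<50"}, entered))
```

In practice the assignment would be revealed only after the patient had been formally entered, and a random element is often added to minimization so that the next assignment is not fully predictable.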

The prevention of response bias

Randomization, properly performed, ensures the absence of systematic bias in the selection of subjects receiving different treatments. This precaution would, though, be useless if other forms of bias caused different standards to be applied in the assessment of response to different treatment regimens. In a trial in which the use of a new treatment is compared with its absence, perhaps in addition to some standard therapy, bias may arise if the subjects are aware whether or not they receive the new treatment, for instance by taking or not taking additional numbers of tablets. The pharmacological effect of a new drug may then be confounded with the psychological effect of the knowledge that it is being used. For this reason, the patients in the control group may be given a placebo, an inert form of treatment formulated to be indistinguishable in taste and consistency from the active agents. Placebos are commonly used in drug trials, although complete disguise may be difficult, particularly if the active agents have easily detectable side-effects. Their use is clearly much more difficult, perhaps impossible, in other fields such as surgery. The principle of masking the identity of a treatment may be extended to trials in which different active agents are compared, for instance in a comparison of two drugs or of different doses of the same drug.

Such steps would not be necessary if the endpoints used in the comparison of treatments were wholly objective. Unfortunately the only such endpoint is death. Any other measure of the progress of an illness, such as the reporting of symptoms by the patient, the eliciting of signs by a physician, or major recurrences of disease, may be influenced by knowledge of the treatment received. This includes knowledge by the patient, which may affect the reporting of symptoms or the general course of the disease, and also by medical and other staff

whose recording of events may be affected and who may (perhaps subconsciously) transmit to the patient their own enthusiasm or scepticism about the treatment. For these reasons, it is desirable where possible to arrange that treatments are administered not only single-blind (masked from the patient), but double-blind (masked also from the physician and any other staff concerned with the medical care or assessment of response). When different drugs are administered in essentially different ways, for instance by tablet or capsule, it may be necessary to use the double-dummy method. For instance, to compare drug A by tablet with drug B by capsule, the two groups would receive

Active A tablets, plus Placebo B capsules, or
Placebo A tablets, plus Active B capsules.

In a drug trial, medicaments should be made available in packages labelled only with the patient's serial number, rather than by code letters such as A and B, since the latter system identifies groups of patients receiving the same treatment and can easily lead to a breakdown of the masking device.

2.4. Trial size

The precision of treatment comparisons is clearly affected by the size of the trial, the primary consideration being the number of subjects assigned to each treatment. Where repeated measurements are made on each subject, for instance with regular blood pressure determinations, the precision may be improved somewhat by increasing the number of repeated measurements, but the overriding factor will still be the number of subjects. In many chronic disease studies the response of interest is the incidence over time of some critical event, such as the relapse of a patient with malignant disease, the reoccurrence of a cardiovascular event, or a patient's death. Here the crucial parameter is the expected number of events in a treatment group, and this can be increased either by enrolling more patients or by lengthening the follow-up time for each patient. The latter choice has the advantage of extending the study over a longer portion of the natural course of the disease and the possible advantage of avoiding additional recruitment from less reliable centres, but the disadvantage of delaying the end of the trial.

The traditional methods for sample-size determination are described in detail in many books on statistics (for example, Armitage and Berry, 1994, Section 6.6), and tables for many common situations are provided by Machin and Campbell (1987). The details depend on the nature of the comparison to be made - for instance, the comparison of means, proportions or counts of events. The calculations usually refer to the comparison of two groups, and are essentially along the following lines:

(a) A parameter $\delta$ is chosen to represent the contrast between responses on the two treatments, taking the value 0 under the null hypothesis $H_0$ that there is no difference.

(b) A significance test (usually two-sided) of $H_0$ will be based on a statistic X.
(c) The sample size is chosen to ensure that the test has a specified power $1 - \beta$ for an alternative hypothesis $H_1$ that $\delta = \delta_A$, with a specified significance level $\alpha$.

Allowance may be made for a hypothetical rate at which patients might withdraw from prescribed treatment, thus diluting any possible effect. The introduction of sample-size calculations into the planning stages of a trial, and their incorporation into the protocol, have the clear advantage of making investigators aware of some consequences of their choice of trial size, and perhaps of preventing the implementation of trials which are too small to be of any real value. There are, however, some problems about a rigid adherence to the formulation described above:

(i) The probabilities $\alpha$ and $\beta$ are arbitrary. The power $1 - \beta$ is usually taken to be between 0.8 and 0.95, and the (usually two-sided) significance level $\alpha$ is usually 0.05 or 0.01. Different choices have substantial effects on the required group size.
(ii) The critical value $\delta_A$ is difficult to interpret. As pointed out by Spiegelhalter et al. (1994), it may represent (A) the smallest clinically worthwhile difference, (B) a difference considered 'worth detecting', or (C) a difference thought likely to occur. Whichever interpretation is preferred, the value chosen will be subjective, and agreement between investigators may not be easy to achieve.
(iii) For any choice of the parameters referred to in (a)-(c), the consequent group size will depend on other, unknown, parameters, such as the variability of continuous measurements, the mean level of success proportions or of incidence rates, and the withdrawal rate. Well-informed estimates of these quantities may be available from other studies, or from early data from the present study, but some uncertainty must remain.
(iv) The standard argument assumes a null hypothesis of zero difference, whereas (as implied in Section 1.4) a zero difference may have no clinical significance, and some nonzero value, trading off improved efficacy against minor inconvenience, may be more appropriate.
(v) More broadly, the standard approach is centred around the power of a significance test, although precision in estimation is sometimes more important.
(vi) The investigators may intend to combine the results of the trial with those of other studies (Section 5.3). The power of the current trial, considered in isolation, may then be less relevant than that of the complete data set.

These reservations suggest that sample-size calculations should be regarded in a flexible way, as providing guidance in the rational choice of group size, rather than as a rigid prescription. In any case, the outcome of formal calculations always needs to be balanced against practical constraints: any determination of trial size is likely to involve some degree of compromise.

Many authors have argued that uncertainties of the sort outlined above are best recognised and taken into account within a Bayesian framework (Spiegelhalter et al., 1986; Spiegelhalter and Freedman, 1986; Brown et al., 1987; Moussa, 1989; Spiegelhalter et al., 1994). The Bayesian approach will be discussed further in a later section (Section 5.2), but we note here some points relevant to the determination of trial size (Spiegelhalter et al., 1994). Some basic steps are:

(i) The choice of a prior distribution for the difference parameter $\delta$. This may reasonably vary between investigators, and in a multicentre study some compromise

may have to be reached. It may be useful to perform calculations based on alternative priors, representing different degrees of enthusiasm or scepticism about the possible difference.
(ii) The identification of an indifference interval $(\delta_I, \delta_S)$ for the parameter. The implication is that for $\delta < \delta_I$ one treatment (the 'standard', say) is taken to be superior, and for $\delta > \delta_S$ the other treatment (the 'new') is superior. When the parameter falls within the indifference interval no clear preference is expressed.
(iii) For any given trial size, and with appropriate assumptions about nuisance parameters, the probability may be calculated at the outset that a central posterior interval for $\delta$ will exclude one of the critical values $\delta_I$ or $\delta_S$. For instance, a high posterior probability that $\delta > \delta_I$ would provide assurance that the new treatment could not be ruled out of favour; a similar result for $\delta > \delta_S$ would imply that the new treatment was definitely superior.

With this approach, the trial size may be chosen with some assessment of the chance of obtaining a result favouring one treatment. It should be understood, though, that an outcome clearly in favour of one treatment is not necessary for the success of a trial. Reliable information is always useful, even when it suggests near-equivalence of treatment effects.
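Both the traditional calculation and the Bayesian one can be illustrated for the simplest case, the comparison of two means with a known common standard deviation. The sketch below rests on assumptions not made in the text: a Gaussian response with known standard deviation sigma, equal group sizes, the usual normal-approximation formula for the group size, and a Gaussian prior for the difference. The function names and numerical inputs are hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def n_per_group(delta_a, sigma, alpha=0.05, power=0.90):
    """Group size for a two-sided test at level alpha with the given power.

    Uses n = 2 * sigma^2 * (z_{1 - alpha/2} + z_{1 - beta})^2 / delta_a^2.
    """
    z_alpha = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_beta = _STD_NORMAL.inv_cdf(power)
    return ceil(2 * (sigma * (z_alpha + z_beta) / delta_a) ** 2)


def prob_interval_above(n, sigma, prior_mean, prior_sd, delta_i, level=0.95):
    """Probability, at the outset, that the central posterior interval for
    the difference will lie wholly above delta_i, with n subjects per group.

    The observed difference d has sampling variance v = 2 * sigma^2 / n, and
    its prior predictive distribution is N(prior_mean, prior_sd^2 + v).
    """
    z = _STD_NORMAL.inv_cdf(0.5 + level / 2)
    v = 2 * sigma ** 2 / n
    post_var = 1 / (1 / prior_sd ** 2 + 1 / v)
    post_sd = sqrt(post_var)
    # The posterior mean is post_var * (prior_mean / prior_sd^2 + d / v);
    # the lower posterior limit exceeds delta_i exactly when d > d_min.
    d_min = v * ((delta_i + z * post_sd) / post_var - prior_mean / prior_sd ** 2)
    predictive_sd = sqrt(prior_sd ** 2 + v)
    return 1 - _STD_NORMAL.cdf((d_min - prior_mean) / predictive_sd)


if __name__ == "__main__":
    print(n_per_group(delta_a=0.5, sigma=1.0, power=0.90))
    print(n_per_group(delta_a=0.5, sigma=1.0, power=0.80))
    for n in (50, 100, 200):
        print(n, round(prob_interval_above(n, 1.0, 0.5, 0.25, 0.0), 3))
```

Repeating the second calculation under an enthusiastic and a sceptical prior shows how far the chance of a clear-cut result depends on prior opinion, in the spirit of step (i).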

3. The conduct and monitoring of a clinical trial

3.1. The implementation of the protocol

The investigators should have been fully involved in the drawing up of the protocol, and will normally be anxious to ensure a high degree of compliance with the agreed procedure. Nevertheless, departures from the protocol are likely to occur from time to time. The impact of protocol departures by individual patients on the analysis of results is discussed in Section 4.4. The investigators will need to set up a procedure for checking that high standards are maintained in the administration of the trial, and in a multicentre study a separate committee is likely to be needed for this purpose. This administrative monitoring will check whether the intended recruitment rate is being achieved, detect violations in entry criteria, and monitor the accuracy of the information recorded. The data processing (coding of information recorded on the CRFs, checking of data for internal consistency, computer entry and verification, and subsequent analysis) is often done by specialist teams with standard operating procedures (SOPs) which should be overseen as part of the administrative monitoring. A number of software packages for trial management are commercially available.

As the trial proceeds it may be clear that certain provisions in the protocol should be changed. The rate of accrual or of critical events may be lower than expected, leading to a decision to extend the trial. Experience with the clinical or other technical procedures may point to the desirability of changes. It may, for various reasons, become desirable to augment or reduce the entry criteria. Although investigators will clearly wish to avoid protocol changes without very strong reason, it is better to

complete the study with a relevant framework rather than persist with an original protocol which has major deficiencies. In deciding whether to enter patients into a trial an investigator will normally screen a larger number of patients for potential eligibility. There may be (a) some patients in broadly the right disease category, but found on examination to be ineligible for the trial, for instance on grounds of age or particular form of disease; and (b) some patients who would have been fully eligible according to the protocol, but were not admitted to the trial, either by error, for administrative convenience, because of individual characteristics not covered in the protocol but thought by the physician to be important, or through the patient's refusal. Patients in category (b) should ideally be very few, and their numbers and the reasons for exclusion should be recorded. Category (a) is often regarded as interesting, and many investigators insist that such patients be entered into a patient log which may indicate the extent to which the trial population is selected from the wider population of patients entering the relevant clinics. The importance of the patient log is perhaps exaggerated, since the populations screened are likely to differ considerably in number and characteristics in different clinics, and the numerical proportion of screened patients who are entered into the trial may be hard to interpret. The compliance of patients with the treatment regimen required by the protocol is clearly important, and steps should be taken to make this as nearly complete as possible. Some aspects of compliance, such as attendance at follow-up clinics and acceptance of trial medication, are measurable by the doctor. The actual taking of trial medication is another matter, at any rate for non-hospitalized patients. 
Statements by patients of their consumption of medication are unreliable, but these may sometimes be augmented by biochemical tests of concentrations of relevant substances in blood or urine. Occasionally, patients will cease to follow the regimen prescribed by the protocol. The reasons may be various: death or a severe deterioration in medical condition, change of address, an unspecific refusal to continue, the occurrence of severe adverse effects, the occurrence of a medical complication which leads the physician to advise a change in management, and so on. All such occurrences must be carefully documented. The physician will no doubt try to advise the patient to continue the prescribed treatment as long as possible, but in the last resort the patient has the right to opt out, and the physician has the right to advise whether or not this would be a sensible step to take. Any patient withdrawing from the protocol regimen should be regarded as still in the trial, and should be included in follow-up examinations as far as possible.

3.2. Data and safety monitoring and interim analyses

The administrative monitoring referred to in Section 3.1 is a form of quality assurance about the structure of the trial. A quite different form of monitoring is needed for the accumulating results of the trial. The issues here relate to the safety and efficacy of the treatments under study.

Many medical treatments produce minor side effects or affect the biochemistry of the body. Minor effects of this sort need not be a source of concern, particularly when their occurrence is expected from previous experience. Serious adverse effects (SAEs), of a potentially life-threatening nature, are a different matter, and must be monitored throughout the trial. An unduly high incidence of SAEs, not obviously balanced by advantages in survival, is likely to be a good reason for early stopping of the trial or at least for a change in one or more treatment regimens. Differences in efficacy may emerge during the course of the trial and give rise to ethical concerns. The investigators will have been uncertain at the outset as to whether the differences, if any, were large enough to be of clinical importance, and this ignorance provided the ethical justification for the trial. If the emerging evidence suggests strongly that one treatment is inferior to another, there may be a strong incentive to stop the trial or at least to abandon the suspect treatment. In a double-blind trial the investigators will be unaware of the treatment assignments and unable themselves to monitor the results. They could arrange to be given serial summaries of the data by the data-processing team, but it is usually better for the monitoring to be handled by an independent group, with a title such as Data [and Safety] Monitoring Committee (D[S]MC). A DMC would typically comprise one or more statisticians, some medical specialists in the field under investigation, and perhaps lay members such as ethicists or patient representatives. Commercial sponsors are not normally represented. The data summaries are usually presented by the trial data-processing team, although occasionally independent arrangements are made. The trial investigators are occasionally represented on the DMC, but otherwise are not made aware of the DMC's discussions. 
In a large multicentre trial designed to last several years, it would be usual for the DMC to meet several times a year. Its main function is to keep the accumulating data under continuous review, and to make recommendations to the investigators if at any time the DMC members think the trial should be terminated or the protocol revised. (For a different view, favouring data monitoring by the investigators, or a committee reporting direct to them, rather than by an independent DMC, see Harrington et al. (1994).) Any analysis of accumulating data is a form of sequential analysis, and much of the relevant literature seeks to adapt standard methods of sequential analysis to the context presented in a clinical trial (Armitage, 1975; Whitehead, 1992). Two points commonly emphasized are (a) the need to have closed sequential designs, defining the maximum length of the trial; and (b) the merits of group sequential designs, in which the data are analysed at discrete points of time (as, for instance, at successive meetings of a DMC) rather than continuously after each new item of data. In a sequential analysis of either safety or efficacy data, it may be useful to do repeated significance tests of some null hypothesis (often specifying no difference in effects of treatments, but perhaps allowing for a nonzero difference for reasons discussed in Section 2.4). The consequences of such repeated testing are well-known (Armitage, 1958, 1975; Pocock, 1977): the probability of rejecting a null hypothesis at some stage (the Type I error rate) is greater than the nominal significance level at any single stage. A number of different schemes have been devised which provide for early stopping when a statistic measuring an efficacy difference exceeds a predetermined bound, and

ensure an acceptable and known Type I error probability. Other properties such as the power function and sample size distribution are also of interest. See Whitehead (1992) and Jennison and Turnbull (1990) for general surveys. Most of the statistical advantages of continuous sequential schemes are retained by group sequential schemes, in which the accumulating results are examined at a small number of time-points. Three systems have been particularly widely used:

(a) Repeated significance tests at a constant nominal significance level (Pocock, 1977);
(b) Tests at interim analyses at a very stringent level (say, differences of three standard errors), with a final test at a level close to the required Type I probability (Haybittle, 1971; Peto et al., 1976);
(c) Even more stringent tests at earlier stages, with bounds for a cumulative difference determined by the test at the final stage (O'Brien and Fleming, 1979).

The choice between these is perhaps best made by considering whether large effects are a priori likely. A trial showing a large difference would be more likely to stop early with scheme (a), and least likely with scheme (c). Sceptical investigators might therefore prefer (c), whereas those regarding a large difference as inherently plausible might prefer (a).

With these schemes the stopping rule is predetermined. Lan and DeMets (1983), Kim and DeMets (1992) and others have described methods by which a predetermined Type I error probability can be 'spent' in a flexible way, for instance to meet the schedule of a DMC. However, most workers prefer to regard the whole data-monitoring procedure as being too flexible to fit rigidly into any formal system of rules. A decision whether to recommend termination of a trial is likely to depend on analyses of many endpoints for both safety and efficacy, to be affected by levels of recruitment and compliance, and to take into account other ongoing research.
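The inflation of the Type I error rate under repeated testing, and its control by the three schemes listed above, can be checked by simulation. The sketch below assumes five equally spaced analyses of a Gaussian statistic under the null hypothesis. The constants 2.413 (constant nominal level) and 2.040 (final-stage bound for scheme (c)) are commonly tabulated values for five looks and a two-sided level of 0.05, and are quoted here as assumptions rather than taken from the text; the function name is hypothetical.

```python
import math
import random


def overall_type1_rate(bounds, n_sim=20000, seed=0):
    """Monte Carlo estimate of the chance of crossing any bound under H0.

    bounds[k-1] is the critical value for |Z_k| at the k-th analysis, where
    Z_k is the standardized statistic based on the first k equal groups.
    """
    rng = random.Random(seed)
    crossings = 0
    for _ in range(n_sim):
        cumulative = 0.0
        for k, bound in enumerate(bounds, start=1):
            cumulative += rng.gauss(0.0, 1.0)   # increment from group k
            z_k = cumulative / math.sqrt(k)
            if abs(z_k) > bound:
                crossings += 1                  # trial stops at this look
                break
    return crossings / n_sim


LOOKS = 5
SCHEMES = {
    "unadjusted 1.96 at every look": [1.96] * LOOKS,
    "(a) constant nominal level": [2.413] * LOOKS,
    "(b) three standard errors, then 1.96": [3.0] * (LOOKS - 1) + [1.96],
    "(c) bounds shrinking to the final test": [
        2.040 * math.sqrt(LOOKS / k) for k in range(1, LOOKS + 1)
    ],
}

if __name__ == "__main__":
    for name, bounds in SCHEMES.items():
        print(f"{name}: {overall_type1_rate(bounds):.3f}")
```

The same function, applied to a statistic with a nonzero drift, would give the power and the distribution of stopping times, which are the other properties of interest mentioned above.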
According to this view (Armitage, 1993; Liberati, 1994), formal schemes for stopping rules provide guidelines, and usefully illustrate some of the consequences of data-dependent stopping, but should not be regarded as rigidly prescriptive. A more radical attack on the formal stopping-rule schemes outlined above comes from a Bayesian approach (Berry, 1987; Spiegelhalter and Freedman, 1988; Spiegelhalter et al., 1994). In Bayesian inference, as distinct from frequency theory, probabilities over sample spaces, such as Type I error probabilities, play no part. Inferences at any stage in data collection are uninfluenced by stopping rules used earlier or by those which it is proposed to use in the future. With the sort of framework for Bayesian inference described in Section 2.4, a stopping rule might propose termination if, at any stage, the posterior probability of a definitive difference (i.e., one affecting clinical practice) was sufficiently high. As noted in the last paragraph, considerations of this sort would form merely one part of the DMC's deliberations. Early stopping might occasionally be called for if the efficacy difference was remarkably small. This situation would present no ethical problems, but might suggest that research effort would be better concentrated on other projects. Such a decision might require a prediction of the likely upper limit to the size of the difference to be expected at the termination point originally intended. This prediction could be based on frequency theory, leading to methods of stochastic curtailment (Lan et al., 1982, 1984), or on Bayesian methods (Spiegelhalter and Freedman, 1988). Early stopping because of the emergence of negligible differences is generally to be discouraged, unless there is urgent need to proceed to other projects. It is often

The design and analysis of clinical trials

important to show that only small differences exist, and to estimate these as precisely as possible, i.e., by continuing the study. Even 'negative' results of this sort may contribute usefully to an overview covering several studies (Section 5.3). Moreover, if a decision is to be taken to stop early because of small differences, it is preferable to base this on inferences from the data available, rather than introduce unnecessary random error by predicting the future. More extensive discussion on all the topics covered in this section may be found in Armitage (1991) and in the two special issues of Statistics in Medicine edited by Ellenberg et al. (1993) and Souhami and Whitehead (1994).
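The effect of repeated significance testing on the Type I error rate, and the way the three group sequential schemes (a), (b) and (c) restore it, can be illustrated by simulation. The following is a minimal sketch, not a design tool. It assumes five equally spaced looks, a normally distributed standardized statistic, and a two-sided 5% level. The constants 2.413 (Pocock) and 2.040 (O'Brien and Fleming) are the values commonly tabulated for five looks and should be treated as approximate; the function name `crossing_probability` is illustrative.

```python
import math
import random

def crossing_probability(bounds, n_sims=20000, seed=1):
    """Estimate, under the null hypothesis, the probability that |Z_k|
    reaches bounds[k-1] at some look k (equal-sized groups assumed)."""
    rng = random.Random(seed)
    crossings = 0
    for _ in range(n_sims):
        cumulative = 0.0
        for look, bound in enumerate(bounds, start=1):
            cumulative += rng.gauss(0.0, 1.0)   # contribution of the new group
            z = cumulative / math.sqrt(look)    # standardized statistic at this look
            if abs(z) >= bound:
                crossings += 1
                break                           # trial stops at first crossing
    return crossings / n_sims

LOOKS = 5  # assumed number of interim-plus-final analyses
schemes = {
    "single final test": [1.96],
    "unadjusted repeated tests": [1.96] * LOOKS,
    "(a) Pocock": [2.413] * LOOKS,
    "(b) Haybittle-Peto": [3.0] * (LOOKS - 1) + [1.96],
    "(c) O'Brien-Fleming": [2.040 * math.sqrt(LOOKS / k) for k in range(1, LOOKS + 1)],
}
results = {name: crossing_probability(b) for name, b in schemes.items()}
for name, p in results.items():
    print(f"{name:28s} {p:.3f}")
```

In this sketch the unadjusted scheme rejects far more often than the nominal 5%, whereas the three adjusted schemes stay close to it. They differ in how much of that probability falls at the early looks.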

4. The analysis of data

4.1. Basic concepts

Statistical analyses will be needed for any interim inspections required by the DMC, and, much more extensively, for the final report(s) on the trial results. Almost any standard statistical technique may come into play, and it is clearly impossible to provide a comprehensive review here. Instead, we shall outline some of the considerations which are especially important in the analysis of clinical trial data. Descriptive statistics, mainly in the form of tables and diagrams, are needed to show the main baseline (i.e., before treatment) characteristics of the patients in the various groups. Statistical inference is needed to draw conclusions about treatment effects, for both efficacy and safety. The principal contrasts under study, and the intended ways of measuring these, should have been defined in the protocol. As noted in Section 4.2, it is important to avoid the confusion created by the analysis of a very large number of response variables and/or a large number of different contrasts. For this reason it is usually advisable to define, in the protocol, a small number of primary endpoints and a small number of contrasts. The principal analysis will be confined to these. Any remarkable effects seen outside this set should be noted, reported with caution, and perhaps put forward for further study in another trial. Statistical analyses will usually follow familiar lines: significance tests (perhaps of nonzero differences (Section 2.4)), point and interval estimation for measures of treatment effects. Alternatively, Bayesian methods may be used, whereby inferences are summarized by posterior distributions for the relevant treatment-effect parameters. Bayesian methods are discussed further in Section 5.2.

4.2. Multiplicity

Clinical trials often generate a vast amount of data for each patient: many measures of clinical response and biochemical test measurements, each perhaps recorded at several points of time. An uncritical analysis, placing all these variables on the same footing and emphasizing those which suggested the most striking treatment effects, would clearly be misleading. Attempts may be made to allow for multiplicity of response

variables (Pocock et al., 1987), but methods of adjustment depend on the correlation between variables, and lead to conservative inferences which may obscure real effects on important variables. The most satisfactory approach is that outlined in Section 4.1, to perform unadjusted analyses on a small number of primary endpoints, and to regard findings on other variables as hypothesis-generating rather than hypothesis-testing. Multiplicity arises in other contexts. The problems of repeated inspections of data have been discussed in Section 3.2, and can be dealt with either informally, by conservative inferences, or by application of a formal sequential scheme. Jennison and Turnbull (1989) have described methods by which repeated confidence intervals for a treatment effect may be calculated at arbitrary stages during the accumulation of trial data, with a guaranteed confidence coefficient. However, these tend to be much wider than standard intervals and may be of limited value. Multiple comparisons of the traditional type, arising through the multiplicity of treatment groups (Miller, 1981), are of less concern. Most clinical trials use only a few treatments, and contrasts between them are usually all of primary importance: their impact should not be reduced by conservative adjustments. A more difficult question concerns interactions between treatments and baseline variables. It would clearly be important to know whether certain treatment effects, if they exist, apply only to certain subgroups of the patient population, or affect different subgroups differently. However, many baseline variables will have been recorded, and it will be possible to subdivide by baseline characteristics in many ways, each of which provides an opportunity to observe an apparent interaction. Such subgroup comparisons should be treated in the same light as the analysis of secondary endpoints: that is, to provide suggestions for further studies but to be interpreted with reserve at present. 
Some subgroup analyses may have been defined in the protocol as being of primary interest, and these, of course, may be reported without reserve. In a multicentre study, an interaction may exist between treatments and centres. Like other interactions it may be regarded as being affected by multiplicity and therefore of secondary importance. It may, though, be too pronounced to be easily ignored. In that case the first step would be to seek rational explanations, such as known differences between centres in ancillary treatments or in the distribution of baseline characteristics. If no such explanation is found, there are two choices. The interaction may be ignored on the grounds that the relevant patient population is that provided by all centres combined (Fleiss, 1986). Alternatively, the variation in treatment effect between centres may be regarded as an additional component of random variation, to be taken into account in measuring the precision of the treatment effect (Grizzle, 1987). Essentially the same point arises in meta-analysis, and will be discussed further in Section 5.3.
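The scale of the multiplicity problem can be seen from a simple calculation. The sketch below assumes, purely for illustration, that the m tests are independent and each is carried out at level alpha. In practice response variables and subgroups are correlated, which is exactly why formal adjustments tend to be conservative. The function names are hypothetical.

```python
def prob_any_false_positive(m, alpha=0.05):
    """Probability that at least one of m independent tests is significant
    at level alpha when no real effect exists (independence is an assumption)."""
    return 1.0 - (1.0 - alpha) ** m

def bonferroni_level(m, alpha=0.05):
    """Per-test level that keeps the overall error rate at or below alpha."""
    return alpha / m

for m in (1, 5, 10, 20):
    print(m, round(prob_any_false_positive(m), 3), bonferroni_level(m))
```

With ten endpoints or subgroups the chance of at least one spurious 'significant' finding is already about 40% under these assumptions. This supports treating such findings as hypothesis-generating.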

4.3. Adjustment for covariates

Randomization should ensure that any baseline variables that are prognostic for outcome variables are similarly distributed in different treatment groups and therefore produce no systematic biases in treatment comparisons. Nevertheless, adjustment for

prognostic baseline variables will tend to improve the precision of treatment comparisons, and will have the secondary effect of correcting for any major imbalance in baseline distributions that may have occurred by chance. If the relation between an outcome variable and one or more baseline variables can be reasonably approximated by a standard linear model with normal errors, the correction can be done by the analysis of covariance, usefully implemented by a general linear model program. A similar approach may be made for binary outcomes using a logistic regression program, and for survival data using a proportional hazards or other standard program. In some instances it may be hard to justify such models, and a more pragmatic approach is to estimate a treatment effect separately within each of a number of subgroups defined by combinations of baseline characteristics, and then combine these estimates by an appropriately weighted mean. Methods of minimization referred to in Section 2.3 have the effect of making treatment groups more alike in baseline characteristics than would occur purely by chance. An analysis ignoring baseline characteristics therefore underestimates precision, although the effect may well be small. Ideally, therefore, adjustment for covariates should always be performed when minimization has been used for treatment assignment.
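The gain in precision from covariate adjustment can be illustrated by a small simulation. This is a sketch under stated assumptions: a single baseline covariate, a linear relation with normal errors, simple randomization with equal group sizes, and hypothetical values for the treatment effect and slope. The adjusted estimate is the classical analysis-of-covariance estimate based on the pooled within-group slope.

```python
import random
import statistics

def simulate_trial(rng, n_per_group=50, effect=1.0, slope=2.0):
    """Generate (group, baseline, outcome) triples for a two-group trial."""
    data = []
    for group in (0, 1):
        for _ in range(n_per_group):
            x = rng.gauss(0.0, 1.0)                               # prognostic baseline value
            y = slope * x + effect * group + rng.gauss(0.0, 1.0)  # outcome
            data.append((group, x, y))
    return data

def unadjusted_effect(data):
    """Simple difference between group means of the outcome."""
    y1 = [y for g, _, y in data if g == 1]
    y0 = [y for g, _, y in data if g == 0]
    return statistics.fmean(y1) - statistics.fmean(y0)

def ancova_effect(data):
    """Difference in outcome means corrected by the pooled within-group slope."""
    means = {}
    for group in (0, 1):
        xs = [x for g, x, _ in data if g == group]
        ys = [y for g, _, y in data if g == group]
        means[group] = (statistics.fmean(xs), statistics.fmean(ys))
    sxy = sum((x - means[g][0]) * (y - means[g][1]) for g, x, y in data)
    sxx = sum((x - means[g][0]) ** 2 for g, x, _ in data)
    b = sxy / sxx
    return (means[1][1] - means[0][1]) - b * (means[1][0] - means[0][0])

rng = random.Random(2)
trials = [simulate_trial(rng) for _ in range(500)]
unadjusted = [unadjusted_effect(t) for t in trials]
adjusted = [ancova_effect(t) for t in trials]
sd_unadjusted = statistics.stdev(unadjusted)
sd_adjusted = statistics.stdev(adjusted)
print(round(sd_unadjusted, 3), round(sd_adjusted, 3))
```

Both estimators are centred on the same treatment effect. The adjusted one varies less from trial to trial because the variation explained by the baseline covariate has been removed.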

4.4. Protocol deviations

However much effort is made to ensure adherence to the protocol, most trials will include some patients whose treatment does not conform precisely to the prescribed schedule. How should their results be handled in the analysis? One approach is to omit the protocol deviants from the analysis. This is the per protocol or (misleadingly) efficacy analysis. It seeks to follow the explanatory approach (Section 1.4) by examining the consequences of precisely defined treatment regimens. The problem here is that protocol deviants are likely to be unrepresentative of the total trial population, and the deviants in different treatment groups may well differ both in frequency and in disease characteristics. A comparison of the residual groups of protocol compliers has therefore lost some of the benefit of randomization, and the extent of the consequent bias cannot be measured. A per protocol analysis may be useful as a secondary approach to the analysis of a Phase III trial, or to provide insight in an early-stage trial to be followed by a larger study. But it should never form the main body of evidence for a major trial. The alternative is an intention-to-treat (ITT) analysis of all the patients randomized to the different groups. Clearly, if there is a high proportion of protocol deviations leading to cessation of the trial regimens, any real difference between active agents may be diluted and its importance underestimated. For this reason, every attempt should be made to minimize the frequency of deviations. An ITT analysis preserves the benefit of randomization, in that the treatment comparisons are made on groups sufficiently comparable for baseline characteristics. It follows the pragmatic approach to trial design (Section 1.4), in that the groups receive treatments based on ideal

strategies laid down in the protocol, with the recognition that (as in routine medical practice) rigidly prescribed regimens will not always be followed. In an ITT analysis, it may be reasonable to omit a very small proportion of patients who opted out of treatment before that treatment had started. But this should only be done when it is quite clear (as in a double-blind drug trial) that the same pressures to opt out apply to all treatment groups. In some cases, for instance in a trial to compare immediate with delayed radiotherapy, this would not necessarily be so. A similar point arises in trials with a critical event as an endpoint. For some patients the critical event may occur before the trial regimen is expected to have had any effect. Unless it is quite clear that these expectations are the same for all treatments, and are defined as such in the protocol, such events should not be excluded from the analysis. One form of protocol deviation is the loss of contact between the patient and the doctor, leading to gaps in the follow-up information. (Deaths can, however, often be traced through national death registration schemes.) In an ITT analysis of, say, changes in serum concentrations of some key substance 12 months after start of treatment, what should be done about the gaps? A common device is the last observation carried forward (LOCF) method, whereby each patient contributes the latest available record. This may involve an important bias if the groups are unbalanced in the timing of the drop-outs and if the response variable shows a time trend. Brown (1992) has suggested a nonparametric approach. An arbitrarily poor score is given to each missing response. This is chosen to be the median response in a placebo or other control group, and all patients with that response or worse are grouped into one broad response category. Treatment groups are then compared by a standard nonparametric method (the Mann-Whitney test), allowing for the broad grouping.
Some information is lost by this approach, and estimation (as distinct from testing) is somewhat complicated. For another approach to the analysis of data with censoring related to changes in the outcome variable, see Wu and Bailey (1988, 1989) and Wu and Carroll (1988).
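The LOCF device mentioned above, and the bias it can introduce when the response shows a time trend, can be sketched in a few lines. The data and the function name `locf` are hypothetical, and missing visits are represented by `None`.

```python
def locf(visits):
    """Return the last non-missing value in a patient's sequence of visits,
    or None if the patient has no recorded value at all."""
    observed = [v for v in visits if v is not None]
    return observed[-1] if observed else None

# Hypothetical serum concentrations at four visits, falling by one unit per visit.
completer = [10.0, 9.0, 8.0, 7.0]
early_dropout = [10.0, 9.0, None, None]   # lost to follow-up after the second visit

print(locf(completer))       # final value actually observed
print(locf(early_dropout))   # second-visit value stands in for the final one
```

Here the drop-out is credited with a value of 9.0 although the trend suggests a final value near 7.0. If drop-outs occur earlier in one treatment group than in another, a distortion of this kind affects the groups unequally.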

4.5. Some specific methods

It would be impracticable to discuss in detail the whole range of statistical methods which might be applied in the analysis of trial data. We merely comment here on a few types of data which commonly arise, and on some of the associated forms of analysis.

Survival data

In many trials the primary endpoint is the time to occurrence of some critical event. This may be death, but other events such as the onset of specific disease complications may also be of interest. The standard methods of survival analysis (Cox and Oakes, 1984) may be applied for comparison of treatment groups with any of these endpoints. For a simple comparison of two groups the logrank test and associated estimates of hazard ratio may be used, stratifying where appropriate by categories formed by baseline covariates. The Cox proportional-hazards model is a powerful way of testing

for and estimating treatment effects in the presence of a number of covariates, which may be introduced in the linear predictor or by defining separate strata. The survival curves for different subgroups are conveniently calculated, and displayed graphically, by the Kaplan-Meier method. Such plots will occasionally show that the proportional-hazards assumption underlying standard methods is inappropriate, and more research is needed to develop diagnostic methods and alternative models (Therneau et al., 1990; Chen and Wang, 1991).
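The Kaplan-Meier calculation referred to above is simple enough to sketch directly. The following is a minimal product-limit estimator for one group. It adopts the usual convention that a patient censored at time t is still counted as at risk at t. The data and the function name are hypothetical.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time with at least one event.

    times  : follow-up time for each patient
    events : 1 if the critical event was observed, 0 if the time is censored
    """
    pairs = sorted(zip(times, events))
    at_risk = len(pairs)
    survival = 1.0
    curve = []
    i = 0
    while i < len(pairs):
        t = pairs[i][0]
        n_events = 0
        n_removed = 0
        # Gather everyone whose follow-up ends at time t.
        while i < len(pairs) and pairs[i][0] == t:
            n_events += pairs[i][1]
            n_removed += 1
            i += 1
        if n_events:
            survival *= 1.0 - n_events / at_risk   # product-limit step
            curve.append((t, survival))
        at_risk -= n_removed
    return curve

# Hypothetical follow-up times (months); 0 marks a censored observation.
curve = kaplan_meier([2, 3, 3, 5, 8], [1, 1, 0, 1, 0])
for t, s in curve:
    print(t, round(s, 3))
```

Plotting such step functions for each treatment group gives the graphical display described in the text. Curves that cross are one informal sign that the proportional-hazards assumption may be inappropriate.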

Categorical data

In many analyses the endpoint will be binary: for instance the occurrence of a remission after treatment for a particular form of cancer. The analysis of binary data is now a well-researched area, based principally on the technique of logistic regression (Cox and Snell, 1989). In many trials the endpoint is polytomous, i.e., with several response categories. This is often the case in Phase II and other small trials, where short-term changes in a patient's condition may be more appropriately measured by a subjective judgement by the patient or the doctor, using a small number of categories. There is often a natural ordering of categories in terms of the severity of the disease state or the patient's discomfort. The analysis of ordered categorical data has been a subject of active research in recent years (Agresti, 1989, 1990). Models have now been developed which incorporate treatment and baseline effects, permitting significance tests and estimation of treatment effects.

Repeated measurements

In some trials repeated observations on the same physiological or biochemical variable are made, for each patient, over a period of time. The question of interest may be that of treatment effects on the trends in these variables during a prolonged period of treatment for a chronic disease. In bioequivalence studies attention may focus on the trend in serum concentrations during a fairly short period after ingestion of a drug. Multivariate methods of handling repeated measures data concentrate on an efficient estimate of the time trend, but in trials the emphasis is on the estimation of treatment effects. A useful approach, described in detail by Matthews et al. (1990), is to summarize the essential features of an individual's responses by a small number of summary measures. Examples might be the mean response over time, and the linear regression on time. Each such summary measure is then analysed in a straightforward way. The last example serves to remind us that multivariate methods do not play a major part in the analysis of trial results. The reason is partly the emphasis in multivariate analysis on significance tests, whereas it is important in trials to be able to estimate effects in a clinically meaningful way. Moreover, when a small number of primary variables have been defined at the outset, it is usually essential to provide clear and

separate information about each, rather than to combine the variables in obscure ways. However, when there are many endpoints of interest, and perhaps many treatments, multivariate methods may help to solve some of the problems of multiplicity; see Geary et al. (1992) for an example of the use of multivariate analysis of variance in the analysis of four dental trials in which several measures of dental health were available.
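The summary-measures approach to repeated measurements described above reduces each patient's series to one or two numbers, which are then compared between treatment groups by standard methods. A minimal sketch for the two summary measures mentioned, the mean response and the linear regression slope on time, is given below. The data and the function name are hypothetical.

```python
import statistics

def summary_measures(times, responses):
    """Return (mean response, least-squares slope on time) for one patient."""
    mean_response = statistics.fmean(responses)
    t_bar = statistics.fmean(times)
    sxx = sum((t - t_bar) ** 2 for t in times)
    sxy = sum((t - t_bar) * (y - mean_response) for t, y in zip(times, responses))
    return mean_response, sxy / sxx

# Hypothetical patient measured at months 0, 1, 2 and 3.
mean_response, slope = summary_measures([0, 1, 2, 3], [5.0, 4.0, 3.5, 2.5])
print(mean_response, slope)
```

Each patient contributes one value of each summary measure. A two-sample comparison of, say, the slopes then estimates the treatment effect on the time trend in clinically interpretable units.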

5. The reporting and interpretation of results

5.1. The relevance of trial results

Whether the patients in a clinical trial were selected by narrow eligibility criteria, or, as advocated in Section 2.1, they were entered more liberally, the results of the trial will be of potential interest for the treatment of patients whose characteristics vary and do not necessarily replicate those of the trial population. How safe is it to generalize beyond the trial itself? It will clearly be safer to do so if the entry criteria are broad rather than narrow, if the trial provides no suggestion of treatment interactions with relevant baseline variables, and if there is no external evidence, either empirical or theoretical, to suggest that important interactions exist. Cowan and Wittes (1994) suggest that extrapolation is safer for treatments having essentially biological effects than for those, such as methods of contraception, the success of which is likely to depend on social factors. Clearly, an element of faith is always needed. It would be impossible to provide strong evidence of treatment effects separately for every possible combination of baseline factors. For this reason, recent proposals that NIH-sponsored trials should aim to provide strong evidence for all minority groups are self-defeating (Piantadosi and Wittes, 1993). There are other possible reasons for failures to convince other workers that the results of a clinical trial are widely applicable. Doubts may exist about some aspects of the treatments used, such as the details of drug administration. There is sometimes a tendency for groups of clinicians to want to verify, in their own environment, results reported elsewhere, even though no explicit doubts are raised. One source of doubt about positive findings (i.e., those indicating real effect differences) should be taken seriously. It seems likely that most treatments compared in trials do not in fact differ greatly in their effects.
Purely by chance a few of these will show significant differences. Of those trials reporting statistically significant differences, then, a certain proportion will be 'false positives'. The position is aggravated by publication bias (Begg and Berlin, 1988), which ensures that, at least for relatively small trials, 'positive' results are more likely to be published than 'negative' ones. Such doubts may be difficult to overcome if reported effects are only marginally significant. Questions may arise also about the clinical relevance of the endpoints reported in a trial. In many research investigations the primary outcome measure may involve long-term follow-up, perhaps to death, and in a trial of finite length the times to these events may be censored by termination of the trial observation period. It may be tempting to

use a surrogate endpoint, namely a measure of response available more quickly (or in some cases more conveniently) than the main endpoint. Unfortunately, even when the two endpoints are clearly correlated, it cannot be assumed that treatment effects on one imply similar effects on the other. An example is provided by the report of the Concorde Coordinating Committee (1994) on the Concorde trial to compare immediate and deferred use of zidovudine in symptom-free individuals with HIV infection. The primary endpoints were survival, serious adverse events and progression to clinical disease (ARC or AIDS). Survival and progression are known to be related to levels of CD4 cell counts, and immediate use of zidovudine certainly caused CD4 counts to rise above the declining levels seen in the deferred group. Unfortunately, the benefit of immediate use was not reflected in the patterns of survival or progression, which remained similar for the two groups.
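The argument made earlier in this section about 'false positives' among significant results can be made concrete with a little arithmetic. The figures below are assumptions chosen only for illustration: 10% of the treatment comparisons studied involve a real difference, trials have 80% power to detect it, and the significance level is 5%.

```python
def false_positive_fraction(prior_true, power, alpha):
    """Fraction of statistically significant results that are false positives,
    given the proportion of comparisons with a real difference (prior_true)."""
    true_positives = prior_true * power
    false_positives = (1.0 - prior_true) * alpha
    return false_positives / (true_positives + false_positives)

fraction = false_positive_fraction(prior_true=0.10, power=0.80, alpha=0.05)
print(round(fraction, 2))
```

Under these assumed figures roughly a third of the significant findings would be false positives, before any allowance for publication bias. Lower power, as in small trials, raises the fraction further.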

5.2. The Bayesian approach

Bayesian methods have already been referred to, particularly in relation to trial size (Section 2.4) and interim analyses (Section 3.2). It is, of course, common ground to take some account of prior evidence in assessing the plans for a new trial or the results from a recently completed trial. The discussion in Section 5.1, about the prior supposition that differences are likely to be small, is in principle Bayesian. It has been argued strongly in recent years, by a number of statisticians experienced in clinical trials, that an explicitly and exclusively Bayesian approach to the design, analysis and interpretation of trials would overcome some of the ambiguities associated with traditional, frequency-theory, statistics (Berry, 1987, 1993; Spiegelhalter and Freedman, 1988; Spiegelhalter et al., 1993, 1994). The Bayesian approach enables alternative analyses to be explored, using a variety of prior distributions representing different degrees of scepticism or enthusiasm about the likely magnitude of treatment effects. Modern computing methods such as Gibbs sampling have greatly facilitated this process (Gilks et al., 1993; George et al., 1994). Spiegelhalter et al. (1994) suggest that the main analysis of a trial, usually presented in the Results section of a report, should be based on noninformative priors, so as to express as objective a summary as possible, essentially by means of likelihood functions. The effect of alternative priors could then be described in the Discussion section. The use of noninformative priors will usually lead to conclusions similar (apart from changes of nomenclature) to those from frequency analyses, and from this point of view the proposal is not as revolutionary as might at first appear. Each of these two approaches has its own advantages.
The frequency approach has the advantage of greater familiarity within the community of clinical trialists, of richness in the available choice of statistical methods, and of easy access to these through statistical packages. The Bayesian approach has the advantage of enabling the effect of different prior beliefs to be explored and presented for discussion. In my view both approaches will co-exist for some time, and many statisticians will make use of them in an eclectic, if not always consistent, manner. Spiegelhalter et al. (1994), for example, writing from a Bayesian standpoint, adduce pragmatic reasons for paying attention to the Type I error probability in a monitoring scheme, in order to avoid too many false positive results.
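The exploration of alternative priors can be sketched for the simplest case, in which both the prior and the likelihood for the treatment effect are taken to be normal. That conjugate form is an assumption made here for convenience, not a requirement of the Bayesian approach. The numerical values and the prior labels are hypothetical.

```python
import math

def normal_posterior(prior_mean, prior_var, estimate, estimate_var):
    """Posterior mean and variance for a normal prior and normal likelihood."""
    precision = 1.0 / prior_var + 1.0 / estimate_var
    post_var = 1.0 / precision
    post_mean = post_var * (prior_mean / prior_var + estimate / estimate_var)
    return post_mean, post_var

def prob_greater(threshold, mean, var):
    """P(effect > threshold) for a normal distribution with given mean and variance."""
    return 0.5 * math.erfc((threshold - mean) / math.sqrt(2.0 * var))

estimate, estimate_var = 0.4, 0.2 ** 2      # hypothetical trial result and squared standard error
priors = {
    "near-noninformative": (0.0, 1.0e6),    # very large prior variance
    "sceptical": (0.0, 0.2 ** 2),           # centred on no difference
    "enthusiastic": (0.4, 0.2 ** 2),        # centred on a worthwhile benefit
}
posteriors = {}
for name, (m, v) in priors.items():
    posteriors[name] = normal_posterior(m, v, estimate, estimate_var)
    post_mean, post_var = posteriors[name]
    print(name, round(post_mean, 3), round(prob_greater(0.0, post_mean, post_var), 3))
```

With the near-noninformative prior the posterior essentially reproduces the frequency estimate and its standard error, in line with the remark above. The sceptical prior pulls the estimate halfway back towards zero in this example.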

5.3. Meta-analysis and overviews

In most branches of science it is regarded as necessary for experiments to be repeated and confirmed by other workers before being accepted as well-established. During the first few decades of clinical trials, it was very rare to find any trial being replicated by other workers. A positive result, showing treatment differences, would usually inhibit random assignment of the weaker treatment in any future trial. A negative result would often discourage repetition of the same comparison, particularly if other potentially important questions remained to be studied. Unfortunately, early trials were often too small to permit detection of important differences, which may therefore have remained undisclosed (Pocock et al., 1978). During the last 20 years or so, there have been many more instances of replicated trials. The protocols in any one such collection of trials will rarely, if ever, be exactly the same. The treatments may differ in some details, and the inclusion criteria for patients may vary. But it will often be reasonable to assume that the effects of these variations are small in comparison with sampling errors. It therefore seems reasonable to pool the information from a collection of similar trials, and to take advantage of the consequent increase in precision. Such studies are called meta-analyses or overviews. Some examples are overviews of beta blockade in myocardial infarction (Yusuf et al., 1985); antiplatelet treatment in vascular disease (Antiplatelet Trialists' Collaboration, 1988); side-effects of nonsteroidal anti-inflammatory drug treatment (Chalmers et al., 1988); and hormonal, cytotoxic or immune therapy for early breast cancer (Early Breast Cancer Trialists' Collaborative Group, 1992). In a meta-analysis a separate estimate of some appropriate measure of treatment effect is obtained from each trial, and it is these estimates that are pooled rather than the original data.
Suppose that for the $i$th of $k$ trials the estimated treatment effect is $y_i$ with variance $v_i = w_i^{-1}$. A standard method of combining the estimates is to take the weighted mean $\bar{y} = \sum w_i y_i / \sum w_i$. The homogeneity of the $k$ estimates may be tested by the heterogeneity index $G = \sum w_i y_i^2 - (\sum w_i y_i)^2 / \sum w_i$, which is distributed approximately as $\chi^2$ on $k - 1$ degrees of freedom. If homogeneity is assumed, $\operatorname{var}(\bar{y}) = (\sum w_i)^{-1}$ approximately. In trials in which the primary endpoint is the rate of occurrence of critical events, it is often convenient to measure the treatment effect by the odds ratio of event rates. A commonly used approach (Yusuf et al., 1985) is to test the significance of the effect by a score statistic asymptotically equivalent to the weighted mean, and to use a related estimate and heterogeneity test. This method is somewhat biased when the effect is not small (Greenland and Salvan, 1990). Meta-analysis is a powerful and important technique. Two aspects remain somewhat controversial. The first is the difficulty of deciding precisely which trials should be included. Some authors (Chalmers, 1991) emphasize that primary attention should be given to the quality of the research. Others (Peto, 1987) emphasize the importance of including unpublished as well as published results, in order to minimize publication bias and to ensure inclusion of very recent data. The second question arises when a meta-analysis shows evidence of heterogeneity between treatment effects in different trials. Clearly, an effort should then be made

to seek reasons, such as differences in protocols, which might plausibly explain the interaction. If no such reasons can be found, what should follow? One school of thought (Yusuf et al., 1985; Early Breast Cancer Trialists' Collaborative Group, 1990) leads to use of the estimates described above, as being appropriate for the particular mix of patients and study characteristics found in these trials. A different approach is to regard the variation in treatment effect between trials as a component of variation which is effectively random in the sense of being unexplained. According to this view, any generalization beyond the current trial population must take this variation into account, thereby leading to a less precise estimate of the main effect than would otherwise be obtained (DerSimonian and Laird, 1986). For further discussion of meta-analysis in clinical trials see the papers in the special issue of Statistics in Medicine edited by Yusuf et al. (1987); also Berlin et al. (1989), Pocock and Hughes (1990), Whitehead and Whitehead (1991) and Jones (1995).
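The two ways of combining estimates discussed in this section can be sketched as follows. The first function computes the inverse-variance weighted mean, its approximate variance under homogeneity, and the heterogeneity index. The second adds a between-trial variance component estimated by the moment method usually attributed to DerSimonian and Laird (1986). The trial estimates are hypothetical and the function names are illustrative.

```python
def fixed_effect(estimates, variances):
    """Weighted mean, its variance under homogeneity, and heterogeneity index G."""
    weights = [1.0 / v for v in variances]
    sw = sum(weights)
    swy = sum(w * y for w, y in zip(weights, estimates))
    g = sum(w * y * y for w, y in zip(weights, estimates)) - swy ** 2 / sw
    return swy / sw, 1.0 / sw, g

def random_effects(estimates, variances):
    """Pooled estimate allowing for a between-trial variance component tau2."""
    k = len(estimates)
    _, _, g = fixed_effect(estimates, variances)
    weights = [1.0 / v for v in variances]
    sw = sum(weights)
    sw2 = sum(w * w for w in weights)
    tau2 = max(0.0, (g - (k - 1)) / (sw - sw2 / sw))   # moment estimate, truncated at zero
    star = [1.0 / (v + tau2) for v in variances]        # weights including tau2
    pooled = sum(w * y for w, y in zip(star, estimates)) / sum(star)
    return pooled, 1.0 / sum(star), tau2

# Hypothetical effect estimates from three trials, each with variance 0.04.
estimates = [0.0, 0.3, 0.9]
variances = [0.04, 0.04, 0.04]
fixed_mean, fixed_var, g = fixed_effect(estimates, variances)
random_mean, random_var, tau2 = random_effects(estimates, variances)
print(fixed_mean, fixed_var, g)
print(random_mean, random_var, tau2)
```

In this example G is well above its degrees of freedom, and the random-effects variance is several times the fixed-effect variance. This is the loss of precision referred to in the text when between-trial variation is taken into account.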

6. Conclusion

This has been a deliberately wide-ranging account of the role of statistics in clinical trials. Design, execution, analysis and interpretation are inextricably interwoven, and each aspect gains by being considered in context rather than in isolation. Methodology moves rapidly and standards of performance become ever more rigorous. Provided that financial, legal and societal constraints are not unduly repressive we can look forward to further developments in trial methodology, a wider appreciation of the need for reliable comparisons of medical treatments, and a continued contribution of this important branch of statistical experimentation to the progress of medical science.

Acknowledgements

I am grateful to Ray Harris, John Matthews and a reviewer for helpful comments on a draft of this paper.

References

Afsarinejad, K. (1990). Repeated measurements designs - A review. Comm. Statist. Theory Methods 19, 3985-4028.
Agresti, A. (1989). A survey of models for repeated ordered categorical response data. Statist. Med. 8, 1209-1224.
Agresti, A. (1990). Categorical Data Analysis. Wiley, New York.
Antiplatelet Trialists' Collaboration (1988). Secondary prevention of vascular disease by prolonged antiplatelet treatment. British Med. J. 296, 320-331.
Armitage, P. (1958). Sequential methods in clinical trials. Amer. J. Public Health 48, 1395-1402.
Armitage, P. (1975). Sequential Medical Trials, 2nd edn. Blackwell, Oxford.
Armitage, P. (1985). The search for optimality in clinical trials. Internat. Statist. Rev. 53, 15-24.
Armitage, P. (1991). Interim analysis in clinical trials. Statist. Med. 10, 925-937.
Armitage, P. (1993). Interim analyses in clinical trials. In: F. M. Hoppe, ed., Multiple Comparisons, Selection Procedures and Applications in Biometry. Dekker, New York, 391-402.
Armitage, P. and G. Berry (1994). Statistical Methods in Medical Research, 3rd edn. Blackwell, Oxford.
Armitage, P. and M. Hills (1982). The two-period crossover trial. Statistician 31, 119-131.
Balaam, L. N. (1968). A two-period design with t² experimental units. Biometrics 24, 61-73.
Bather, J. A. (1985). On the allocation of treatments in sequential medical trials. Internat. Statist. Rev. 53, 1-13.
Begg, C. B. and J. A. Berlin (1988). Publication bias: A problem in interpreting medical data (with discussion). J. Roy. Statist. Soc. Ser. A 151, 419-453.
Begg, C. B. and B. Iglewicz (1980). A treatment allocation procedure for clinical trials. Biometrics 36, 81-90.
Berlin, J. A., N. M. Laird, H. S. Sacks and T. C. Chalmers (1989). A comparison of statistical methods for combining event rates from clinical trials. Statist. Med. 8, 141-151.
Berry, D. A. (1987). Interim analysis in clinical research. Cancer Invest. 5, 469-477.
Berry, D. A. (1993). A case for Bayesianism in clinical trials. Statist. Med. 12, 1377-1393.
Berry, D. A. and B. Fristedt (1985). Bandit Problems: Sequential Allocation of Experiments. Chapman and Hall, London.
Brown, B. M. (1992). A test for the difference between two treatments in a continuous measure of outcome when there are dropouts. Cont. Clin. Trials 13, 213-225.
Brown, B. W., J. Herson, N. Atkinson and M. E. Rozell (1987). Projection from previous studies: A Bayesian and frequentist compromise. Cont. Clin. Trials 8, 29-44.
Bull, J. P. (1959). The historical development of clinical therapeutic trials. J. Chronic Dis. 10, 218-248.
Buyse, M. E., M. J. Staquet and R. J. Sylvester, eds. (1984). Cancer Clinical Trials: Methods and Practice. Oxford Univ. Press, Oxford.
Byar, D. P. (1980). Why data bases should not replace randomized clinical trials. Biometrics 36, 337-342.
Carrière, K. C. (1994). Crossover designs for clinical trials. Statist. Med. 13, 1063-1069.
Chalmers, T. C. (1991). Problems induced by meta-analysis. Statist. Med. 10, 971-980.
Chalmers, T. C., J. Berrier, P. Hewitt, J. A. Berlin, D. Reitman, R. Nagalingam and H. S. Sacks (1988). Meta-analysis of randomized control trials as a method of estimating rare complications of non-steroidal anti-inflammatory drug therapy. Alim. Pharm. Therapeut. 2 (Suppl. 1), 9-26.
Chen, C.-H. and P. C. Wang (1991). Diagnostic plots in Cox's regression model. Biometrics 47, 841-850.
Chernoff, H. and A. J. Petkau (1981). Sequential medical trials involving paired data. Biometrika 68, 119-132.
Cochrane, A. L. (1972). Effectiveness and Efficiency: Random Reflections on Health Services. Nuffield Provincial Hospitals Trust, London.
Concorde Coordinating Committee (1994). Concorde: MRC/ANRS randomized double-blind controlled trial of immediate and deferred zidovudine in symptom-free HIV infection. Lancet 343, 871-881.
Cowan, C. D. and J. Wittes (1994). Intercept studies, clinical trials, and cluster experiments: To whom can we extrapolate? Cont. Clin. Trials 15, 24-29.
Cox, D. R. and D. Oakes (1984). Analysis of Survival Data. Chapman and Hall, London.
Cox, D. R. and E. J. Snell (1989). Analysis of Binary Data. Chapman and Hall, London.
DerSimonian, R. and N. Laird (1986). Meta-analysis in clinical trials. Cont. Clin. Trials 7, 177-188.
Early Breast Cancer Trialists' Collaborative Group (1990). Treatment of Early Breast Cancer, Vol. 1: Worldwide Evidence 1985-1990. Oxford Univ. Press, Oxford.
Early Breast Cancer Trialists' Collaborative Group (1992). Systemic treatment of early breast cancer by hormonal, cytotoxic, or immune therapy. Lancet 339, 1-15; 71-85.
Ebbutt, A. F. (1984). Three-period crossover designs for two treatments. Biometrics 40, 219-224.
Ellenberg, S., N. Geller, R. Simon and S. Yusuf, eds. (1993). Proceedings of the Workshop on Practical Issues in Data Monitoring of Clinical Trials. Statist. Med. 12, 415-616.
Fisher, R. A. (1926). The arrangement of field experiments. J. Min. Agric. Great Britain 33, 503-513.
Fleiss, J. L. (1986). Analysis of data from multiclinic trials. Cont. Clin. Trials 7, 267-275.
Fleiss, J. L. (1989). A critique of recent research on the two-treatment crossover design. Cont. Clin. Trials 10, 237-243.
Forsythe, A. B. and F. W. Stitt (1977). Randomization or minimization in the treatment assignment of patient trials: Validity and power of tests. Tech. Report No. 28, BMDP Statistical Software. Health Sciences Computing Facility, University of California, Los Angeles.

The design and analysis of clinical trials

27

Freeman, P. R. (1989). The performance of the two-stage analysis of two-treatment, two-period crossover trials. Statist. Med. 8, 1421-1432. Friedman, L. M., C. D. Furberg and D. L. DeMets (1985). Fundamentals of Clinical Trials, 2nd edn. Wright, Boston. Geary, D. N., E. Huntington and R. J. Gilbert (1992). Analysis of multivariate data from four dental clinical trials. J. Roy. Statist. Soc. Ser. A, 155, 77-89. Gehan, E. A. and E. J. Freireich (1974). Non-randomized controls in cancer clinical trials. New Eng. J. Med. 290, 198-203. George, S. L., L. Chengchang, D. A. Berry and M. R. Green (1994). Stopping a clinical trial early: Frequentist and Bayesian approaches applied to a CALGB trial in non-small-cell lung cancer. Statist. Med. 13, 1313-1327. Gilks, W. R., D. G. Clayton, D. J. Spiegelhalter, N. G. Best, A. J. McNeil, L. D. Sharples and A. J. Kirby (1993). Modelling complexity: Applications of Gibbs sampling in medicine (with discussion). J. Roy. Statist. Soc. Ser. B, 55, 39-102. Greenland, S. and A. Salvan (1990). Bias in the one-step method for pooling study results. Statist. Med. 9, 247-252. Grizzle, J. E. (1987). Letter to the Editor. Cont. Clin. Trials 8, 392-393. Harrington, D., J. Crowley, S. L. George, T. Pajak, C. Redmond and S. Wieand (1994). The case against independent monitoring committees. Statist. Med. 13, 1411-1414. Haybittle, J. L. (1971). Repeated assessment of results in clinical trials of cancer treatment. British J. RadioL 44, 793-797. Hill, A. Bradford (1962). Statistical Methods in Clinical and Preventive Medicine. Livingstone, Edinburgh. ISIS-3 (Third International Study of Infarct Survival) Collaborative Group (1992). ISIS-3: A randomized trial of streptokinase vs tissue plasminogen activator vs anistreplase and of aspirin plus heparin vs aspirin alone among 41 299 cases of suspected acute myocardial infarction. Lancet 339, 753-770. ISIS-4 (Fourth International Study of Infarct Survival) Collaborative Group (1995). 
ISIS-4: A randomized factorial trial comparing oral captopril, oral mononitrate, and intravenous magnesium sulphate in 58 050 patients with suspected acute myocardial infarction. Lancet 345, 669-685. Jarrett, R. G. and P. J. Solomon (1994). An evaluation of possible designs for a heroin trial. In: Issues for Designing and Evaluating a 'Heroin Trial'. Three Discussion Papers. Working Paper Number 8. National Centre for Epidemiology and Population Health, Canberra, 11-30. Jennison, C. and B. W. Tumbull (1989). Interim analyses: The repeated confidence interval approach (with discussion). J. Roy. Statist. Soc. Ser. B. 51, 305-361. Jennison, C. and B. W. Tumbull (1990). Interim monitoring of clinical trials. Statist. Sci. 5, 299-317. Jones, B. and M. G. Kenward (1989). Design and Analysis of Cross-over Trials. Chapman and Hall, London. Jones, D. R. (1995). Meta-analysis: Weighing the evidence. Statist. Med. 14, 137-149. Kim, K. and D. L. DeMets (1992). Sample size determination for group sequential clinical trials with immediate response. Statist. Med. 11, 1391-1399. Lan, K. K. G. and D. L. DeMets (1983). Discrete sequential boundaries for clinical trials. Biometrika 70, 659-663. Lan, K. K. G., R. Simon and M. Halperin (1982). Stochastically curtailed tests in long-term clinical trials. Comm. Statist. C 1, 207-219. Lan, K. K. G., D. L. DeMets and M. Halperin (1984). More flexible sequential and non-sequential designs in long-term clinical trials. Comm. Statist. Theory Methods 13, 2339-2353. Liberati, A. (1994). Conclusions. 1: The relationship between clinical trials and clinical practice: The risks of underestimating its complexity. Statist. Med. 13, 1485-1491. Machin, D. and M, J. Campbell (1987). Statistical Tables .for the Design of Clinical Trials. Blackwell, Oxford. Matthews, J. N. S. (1988). Recent developments in crossover designs, lnternat. Statist. Rev. 56, 117-127. Matthews, J. N. S. (1994a). 
Modelling and optimality in the design of crossover studies for medical applications. J. Statist. Plann. Inference 42, 89-108. Matthews, J. N. S. (1994b). Multi-period crossover trials. Statist. Methods Med. Res. 3, 383-405. Matthews, J. N. S., D. Altman, M. J. Campbell and P. Royston (1990). Analysis of serial measurements in medical research. British Med. J. 300, 230-235.

28

~ Armi~ge

Medical Research Council (1948). Streptomycin treatment of pulmonary tuberculosis: A report of the Streptomycin in Tuberculosis Trials Committee. British Med. J. 2, 769-782. Medical Research Council (1951). The prevention of whooping-cough by vaccination: A report of the Whooping-Cough Immunization Committee. British Med. J. 1, 1463-1471. Meinert, C. L. (1986). Clinical Trials: Design, Conduct and Analysis. Oxford Univ. Press, New York. Miller, R. G., Jr. (1981). Simultaneous Statistical Inference. Springer, New York. Moussa, M. A. A. (1989). Exact, conditional, and predictive power in planning clinical trials. Cont. Clin. Trials 10, 378-385. O'Brien, P. C. and T. R. Fleming (1979). A multiple testing procedure for clinical trials. Biometrics 35, 549-556. Peto, R. (1987). Why do we need systematic overviews of randomized trials? Statist. Med. 6, 233-240. Peto, R., M. C. Pike, P. Armitage, N. E. Breslow, D. R. Cox, S. V. Howard, N. Mantel, K. McPherson, J. Peto and P. G. Smith (1976). Design and analysis of randomized clinical trials requiring prolonged observation of each patient. I. Introduction and design. British J. Cancer 34, 585-612. Piantodosi, S. and J. Wittes (1993). Letter to the Editor: Politically correct clinical trials. Cont. Clin. Trials 14, 562-567. Pocock, S. J. (1977). Group sequential methods in the design and analysis of clinical trials. Biometrika 64, 191-199. Pocock, S. J. (1983). Clinical Trials: A Practical Approach. Wiley, Chichester. Pocock, S. J. and M. D. Hughes (1990). Estimation issues in clinical trials and overviews. Statist. Med. 9, 657-671. Pocock, S. J., P. Arrnitage and D. A. G. Galton (1978). The size of cancer clinical trials: An international survey. UICC Tech. Rep. Ser. 36, 5-34. Pocock, S. J., N. L. Geller and A. A. Tsiatis (1987). The analysis of multiple endpoints in clinical trials. Biometrics 43, 487498. Schwartz, D., R. Flamant and J. Lellouch (1980). Clinical Trials (transl. M. J. R. Healy). Academic Press, London. 
Schwartz, D. and J. Lellouch (1967). Explanatory and pragmatic attitudes in therapeutic trials. J. Chronic Dis. 20, 637-648. Senn, S. J. (1992). Is the 'simple carry-over' model useful? Statist. Med. 11, 715-726. Senn, S. (1993). Cross-over Trials in Clinical Research. Wiley, Chichester. Shah, K. R. and B. K. Sinha (1989). Theory of Optimal Designs. Springer, Berlin. Shapiro, S. H. and T. A. Louis, eds. (1983). Clinical Trials: Issues and Approaches. Dekker, New York. Souhami, R. L. and J. Whitehead (1994). Proceedings of the Workshop on Early Stopping Rules in Cancer Clinical Trials. Statist. Med. 13, 1289-1500. Spiegelhalter, D. J. and L. S. Freedman (1986). A predictive approach to selecting the size of a clinical trial, based on subjective clinical opinion. Statist. Med. 5, 1-13. Spiegelhalter, D. J. and L. S. Freedman (1988). Bayesian approaches to clinical trials. In: J. M. Bemado et al., eds., Bayesian Statistics, 3. Oxford Univ. Press, Oxford, 453-477. Spiegelhalter, D. J., L. S. Freedman and P. R. Blackburn (1986). Monitoring clinical trials: Conditional or predictive power? Cont. Clin. Trials 7, 8-17. Spiegelhalter, D. J., L. S. Freedman and M. K. B. Parmar (1993). Applying Bayesian thinking in drug development and clinical trials. Statist. Med. 12, 1501-1511. Spiegelhalter, D. J., L. S. Freedman and M. K. B. Parmar (1994). Bayesian approaches to randomized trials (with discussion), J. Roy. Statist. Soc. Ser. A 157, 357-416. Statistical Methods in Medical Research (1994). Editorial and five review articles on Crossover Designs. 3, 301-429. Stephen, K. W., I. G. Chestnutt, A. P. M. Jacobson, D. R. McCall, R. K. Chesters, E. Huntington and E Schafer (1994). The effect of NaF and SMFP toothpastes on three-year caries increments in adolescents. Internat. Dent. J. 44, 287-295. Tares, D. R. (1974). Minimization: A new method of assigning patients to treatment and control groups. Clin. Pharmacol. Ther. 15, 443-453. Therneau, T. M., P. M. Grambsch and T. R. 
Fleming (1990). Martingale-based residuals for survival models. Biometrika 77, 147-160.

The design and analysis of clinical trials

29

Ware, J. H. (1989). Investigating therapies of potentially great benefit: ECMO (with discussion). Statist. Sci. 4, 298-340. Whitehead, J. (1992). The Design and Analysis of Sequential Clinical Trials, 2nd edn. Horwood, Chichester. Whitehead, A. and J. Whitehead (1991). A general parametric approach to the meta-analysis of randomized clinical trials. Statist. Med. 10, 1665-1677. Wu, M. C. and K. Bailey (1988). Analyzing changes in the presence of informative right censoring caused by death and withdrawal. Statist. Med. 7, 337-346. Wu, M. C. and K. Bailey (1989). Estimation and comparison of changes in the presence of informative right censoring: Conditional linear model. Biometrics 45, 939-955. Wu, M. C. and R. J. Carroll (1988). Estimation and comparison of changes in the presence of informative right censoring by modelling the censoring process. Biometrics 44, 175-188. Yusuf, S., R. Peto, J. Lewis, R. Collins and P. Sleight (1985). Beta-blockade during and after myocardial infarction: An overview of the randomized clinical trials. Prog. Cardiovasc. Dis. 27, 335-371. Yusuf, S., R. Simon and S. Ellenberg, eds. (1987). Proceedings of the Workshop on Methodologic Issues in Overviews of Randomized Clinical Trials. Statist. Med. 6, 217-409. Zelen, M. (1969). Play the winner rule and the controlled clinical trial. J. Amer. Statist. Assoc. 64, 131-146.

S. Ghosh and C. R. Rao, eds., Handbook of Statistics, Vol. 13 © 1996 Elsevier Science B.V. All rights reserved.


Clinical Trials in Drug Development: Some Statistical Issues

H. I. Patel

1. Introduction

1.1. Background

The search for better alternative or new therapies continues in order to improve the quality of health care. To demonstrate the efficacy and safety of a therapy, the sponsor conducts a sequence of clinical trials. A clinical trial is generally understood to be a designed experiment in patients with the targeted disease, intended to answer questions about the efficacy and safety of a drug or an intervention therapy in comparison with a control group. In a broader sense a clinical trial is not limited to patients. For example, in the early stages of drug development healthy volunteers are administered single or multiple doses of the drug to evaluate its tolerability. The field of clinical trials is fascinating and challenging because it deals with complex scientific experiments conducted in human beings. As in any scientific experiment, many factors other than the treatments influence the clinical responses, which are commonly referred to as endpoints. Some of these factors are confounded with the treatments and others behave as extraneous sources of variation. The factors of the first group bias the estimate of the treatment difference; often it is not easy to understand the nature of the bias. The factors of the second group can be divided into two subgroups: the patients themselves and the environmental conditions during the course of a trial. The sources of between-patient variability include age, gender, race, disease severity, disease duration and genetic make-up, among others. The environmental conditions include life style, diet, compliance, use of concomitant medications, etc. Experimental designs offer some ways to reduce the impact of between-patient variability, but it is practically impossible to control environmental factors well during the course of the trial. There exists an extensive literature on general methodology for designing and analyzing clinical trials.
An excellent paper on this topic, citing many books and review papers and giving a historical perspective on clinical trials, is written by Professor Armitage in this book. His paper also nicely covers the implementation in clinical trials of the experimental design principles of randomization, local control and replication. In the next subsection we describe the scope of this paper.


Randomized controlled trials have revolutionized the way we draw inferences about treatment benefit and risk. However, evaluating efficacy and safety alone is no longer regarded as sufficient. Attempts are being made to integrate these aspects with patients' views on their quality of life and with the economic consequences of the treatment alternatives.

1.2. The scope of the paper

The purpose of this paper is to describe briefly the various steps of the drug development process in the industrial setting and to highlight the commonly used statistical designs and their analyses. The clinical trials sponsored by pharmaceutical companies are designed, conducted and analyzed under the umbrella of regulatory requirements. Therefore, the environment in which they are done is more restricted than that for trials sponsored by the NIH or academic institutions, even though all of them use the same basic statistical principles. Since Professor Armitage's paper leans towards the latter trials, we emphasize the industry side in this paper. Every attempt has been made to minimize the overlap between this paper and the paper by Professor Armitage. Professor Sen's paper in this book emphasizes different aspects of clinical trials; it covers the use of nonparametric methods in clinical trials. Despite the common theme of developing new compounds, there are differences within the industry in the general approach to selecting designs and their analysis models. In this context, it would not be unfair to say that the tone of this paper may have been influenced by the author's own experience. Here we have adopted a broader definition of a clinical trial to include experiments done in healthy volunteers. Section 2 briefly summarizes the pharmacokinetic studies done in healthy volunteers. It emphasizes bioequivalence studies, which are of primary interest for the approval of generic drugs. Dose-response studies are described in Section 3. Mixed effects linear and nonlinear models for applications in clinical trials are described in Section 4. Section 5 briefly covers Markovian models used for analyzing repeated measures designs, both between-subject and within-subject designs.
While attempting to tackle some practical problems faced by biostatisticians in the pharmaceutical industry, this chapter emphasizes statistical modelling and analysis for some of the recent topics in clinical trials.

1.3. Clinical trial phases in industry

The experiments begin with pre-clinical research. The drug is tested in vitro (in an environment not involving living organisms) and in vivo (in an environment involving living organisms). Pharmacokinetic and pharmacologic profiles are studied in animals. Pharmacokinetic studies in humans are described in Section 2. Pharmacology is the branch of science dealing with the (desirable or undesirable) effects of chemical substances on biological activities. An initial formulation is developed and its manufacturing process is documented. Physical and chemical properties of the formulation are studied. Such studies include drug stability studies, which are conducted to project the shelf life of the drug. If the drug is perceived to be safe and effective in humans, the
sponsor files an IND (Investigational New Drug) application, supported by the in vitro and in vivo pre-clinical data and clinical data if available from foreign countries along with a clinical program outline. If the sponsor does not hear from the US (United States) regulatory agency within a certain period of time, the clinical program may begin. The drug testing in humans is generally divided into three phases. In Phase I small studies, generally using 8 to 10 healthy male volunteers per study, are conducted to investigate how well the subjects tolerate single and multiple doses of different strengths of the experimental drug. Pharmacokinetic studies are also conducted in this phase using single and multiple doses of different strengths. The purpose here is to project safety based on observed pharmacokinetic characteristics. Phase II studies are conducted in patients with the targeted disease. The primary goal here is to see whether the therapy is efficacious and evaluate the dose-response relationship. Sometimes pharmacokinetic information is obtained along with efficacy measurements to understand their causal-effect relationship. Phase II studies are relatively small, but well controlled and monitored, evaluating patients from a tightly defined population. Typically fewer than 200 patients are studied in this phase. The management of the sponsor company uses the efficacy and safety results obtained from this phase to decide whether or not to proceed to Phase III. A good pilot study may play an important role for such decision making. Phase III studies are large, involving several participating centers. A typical Phase III study enrolls several hundred to several thousand patients. These studies generally have longer treatment duration to evaluate safety and efficacy than Phase II studies. The entry criteria for the patients to be studied in the trial are generally much broader than those in Phase II studies. 
The clinical and statistical design issues depend on the therapeutic area. The knowledge gained from Phase II studies and historical studies conducted for other similar therapies is used in planning Phase III studies. In the United States, after completion of all three phases, a New Drug Application (NDA) is submitted by the sponsor to the Food and Drug Administration (FDA) which is part of the US Government's Department of Health and Human Services. Similar submissions are made in European and other foreign countries by the sponsors of new drugs to their respective government agencies. After a thorough review the regulatory agency decides whether or not to approve the drug. Often the approval of the drug is subject to some restrictions. The FDA may require some additional studies to gain more information on long-term safety and overall benefits. In Phase III it may not have been possible to treat all segments of the patient population of the targeted disease. In this case the FDA may ask the sponsor to conduct additional studies. In general, the FDA may ask the sponsor to conduct additional studies to answer the questions that could not have been answered from the NDA submission. The studies conducted after NDA approval are referred to as Phase IV studies which are planned and conducted while the drug is available in the market for treating patients. Phase IV studies, conducted for long-term safety, are usually simple but involve thousands of patients. The sponsors usually maintain special safety databases to monitor the incidences of major adverse experiences. Not too long ago, the FDA tried to convince the industry to collect pharmacokinetic information from Phase III trials and relate it to efficacy and safety measurements.


However, because of the cost and the logistic problems involved in the data collection, this concept never became widely accepted in the industry. Recently, the cost-effectiveness of a marketed drug has become an issue. In the current competitive environment, showing that the drug is safe and efficacious is not considered enough. Patients should be able to feel that the drug improves their quality of life and also provides economic benefits.

1.4. The role of statistics in clinical trials

Typically we do randomized, double-blind, multicenter, comparative trials. Unlike experiments in the physical sciences, it is practically impossible to control the sources of variation due to environmental factors during the course of the trial. Statistics plays a very important role in all stages of clinical trials: planning, data collection, analysis and interpretation of the results. During the planning stage, a clinical researcher and a statistician jointly prepare a protocol detailing the objectives of the study, the patient population to be enrolled, clinical and statistical designs, sample size, measurement issues, rules of conduct, the statistical analysis plan and many other aspects of the trial. During this stage, we face many hurdles imposed by science, resources, ethics, and regulatory requirements. In the presence of many sources of variation, estimating treatment effects as purely as possible and with as few units as possible becomes a special challenge to a statistician. Here the statistician tries to introduce both innovative designs and quality assurance measures. The sample size calculation is valid only to the extent that the premises under which it is calculated hold. One needs a good projection of the variance from historical data and a clinically meaningful difference to be detected between the two treatments. Critical thinking is required here to make sure that the study design will help answer the questions posed by clinical researchers. A bad design and poor planning cannot be rescued by the data. A statistician also prepares an analysis plan as a part of the protocol. A few primary efficacy variables, usually not more than three, are identified. The remaining endpoints are regarded as tentative. Many other issues such as hypothesis formulation, modelling the endpoints, multiple comparisons, subgroup analysis, interim analysis, handling dropouts, etc. are outlined in the protocol.
Unfortunately, some statisticians prefer to stick rigidly to the proposed analysis plan even though the data fail to satisfy important assumptions underlying the chosen models. This is probably for fear of being criticized by the FDA. One solution is not to give too much detail when writing the methodology. What is written before examining the data should not lead us to undermine the importance of performing a valid statistical analysis. The statistician plays at least an equally important role in analyzing the data and interpreting the results in the context of the data collected. We know the limitations of statistical methods; they cannot remove all biases in estimating the treatment differences. Sometimes the patient population actually studied is different from the one defined by the protocol. In this case the interpretation of the results is limited in scope. As professional statisticians we cannot afford to work in isolation; we must take an active role in the process of designing a trial, be prepared to present the truth, continue learning each other's disciplines, and have better communication with our colleagues at the FDA. This will certainly help earn respect for our profession.
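The remark above, that a sample size calculation rests on a projected variance and a clinically meaningful difference, can be illustrated with a minimal sketch. It assumes the textbook normal-approximation formula for a two-sided comparison of two means with equal allocation and a common standard deviation; the function name `n_per_group` and the numerical inputs are hypothetical and are not taken from this chapter.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(sigma, delta, alpha=0.05, power=0.80):
    """Approximate patients per arm for a two-sided, two-sample comparison of means.

    Assumption (not from the chapter): normal approximation, equal allocation,
    common standard deviation `sigma`, clinically meaningful difference `delta`.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value of the two-sided test
    z_beta = z.inv_cdf(power)           # quantile matching the target power
    return ceil(2 * (sigma * (z_alpha + z_beta) / delta) ** 2)


# Hypothetical inputs: the same difference with two projections of the SD.
print(n_per_group(sigma=10.0, delta=5.0))
print(n_per_group(sigma=12.0, delta=5.0))
```

Because the required size grows with the square of `sigma / delta`, a modest underestimate of the variance from historical data translates into a noticeably underpowered trial, which is the sense in which the calculation is only as good as its premises.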

2. Pharmacokinetic studies

2.1. Introduction

Gibaldi and Perrier (1982) describe pharmacokinetics as the study of the time course of drug absorption, distribution, metabolism and excretion. This branch of the biological sciences describes the processes and rates of drug movement from the site of absorption into the blood stream, distribution to the tissues and elimination by metabolism or excretion. It deals with mathematical modelling of the kinetics, related inference problems and applications to pharmacology. There exists a vast literature on this topic, dealing with the kinetics under single and multicompartment models and under single and multiple dosing (see, e.g., Wagner, 1971; Metzler, 1994; Rowland and Tozer, 1980; Gibaldi and Perrier, 1982). The usefulness of pharmacokinetic models in patient monitoring in clinical practice is appreciated once the relationships between drug kinetics and the therapeutic and toxicologic effects are understood. Recently, some pharmacokineticists have advocated special studies called population kinetic studies. Here the kinetic parameters are treated as random variables (varying from patient to patient) having some prior distributions, rather than as fixed quantities as in pharmacokinetic mathematical modelling. This topic is further discussed in Section 4. Of special interest are applications to bioavailability (the extent and rate at which the drug is available in blood over a period of time after administration of a given dose) and bioequivalence (assessment of the equivalence of two formulations with respect to drug bioavailability) studies in the development of new formulations. The effectiveness of the drug therapy depends on the percentage of the administered dose that is available at the site of pharmacologic action. For a solid dosage form this amount depends on patient characteristics, the physicochemical properties of the drug and the manufacturing process.
Factors such as particle size, salt form, the coating, and the conditions under which the dose is administered also play an important role. It has become common practice to rely on the rate and extent of absorption of the drug into the bloodstream to measure drug availability. This assumes that the drug distribution to the site of pharmacologic action and its elimination from the body are proportional to the amount of drug absorbed. For an IV (intravenous) dose the absorption is 100 percent because the dose enters the blood stream directly, and therefore it is regarded as the yardstick for computing the (absolute) bioavailability of any other route of administration. Sometimes the bioavailability of an oral formulation is computed relative to that of a solution given orally. Two formulations are called bioequivalent if their bioavailabilities are equivalent. The rates and extents of the bioavailability of various doses are used in determining the dose and dosing frequency that are expected to be optimal for efficacy and tolerability of the drug, depending on the type of disease. This requires a dose-response study in pharmacokinetics. For chronic and subchronic diseases, when a patient receives multiple doses, the question of achieving a steady state arises. Bioequivalence and dose-response studies are further discussed in Sections 2.2 and 2.3, respectively.
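As an illustration of how the extent and rate of availability described above are usually summarized from a single concentration-time profile, the following minimal sketch computes the area under the curve (AUC) by the linear trapezoidal rule, the peak concentration and the time of the peak, and then an absolute bioavailability relative to an IV dose. The trapezoidal rule, the dose-normalized ratio, the function names and all numbers are assumptions made for illustration; they are not taken from this chapter.

```python
def summarize_profile(times, conc):
    """Return (AUC, Cmax, Tmax) for one subject's concentration-time profile.

    Assumption: AUC is taken over the observation period only, using the
    linear trapezoidal rule between consecutive sampling times.
    """
    auc = 0.0
    for i in range(1, len(times)):
        auc += (times[i] - times[i - 1]) * (conc[i] + conc[i - 1]) / 2.0
    cmax = max(conc)                # maximum observed concentration
    tmax = times[conc.index(cmax)]  # first sampling time at which it occurs
    return auc, cmax, tmax


def absolute_bioavailability(auc_test, auc_iv, dose_test=1.0, dose_iv=1.0):
    """Dose-normalized AUC ratio, with the IV dose as the 100 percent yardstick."""
    return (auc_test / dose_test) / (auc_iv / dose_iv)


# Hypothetical profiles (hours, concentration units) for the same dose.
hours = [0, 1, 2, 4, 8]
oral = [0, 4, 6, 3, 1]
iv = [10, 8, 6, 3, 1]

auc_oral, cmax_oral, tmax_oral = summarize_profile(hours, oral)
auc_iv, _, _ = summarize_profile(hours, iv)
print(auc_oral, cmax_oral, tmax_oral)
print(absolute_bioavailability(auc_oral, auc_iv))
```

In practice the tail of the curve beyond the last sampling time is often extrapolated as well; that step is omitted here to keep the sketch short.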

2.2. Bioavailability and bioequivalence studies

Pharmacokinetic studies for bioavailability and bioequivalence are conducted, generally in normal subjects, to compare two or more formulations with respect to the extent and rate of drug availability. The extent is measured by AUC (area under the plasma concentration-time curve) and the rate by Cmax (maximum concentration over the observation period) and Tmax (time at which Cmax occurs). According to the current regulatory guidance (OGD, Office of Generic Drugs, FDA, 1993), two formulations are considered bioequivalent if the ratio of a location parameter of the test (new) formulation to that of the reference (existing and already tested) formulation for responses such as AUC and Cmax falls between 0.8 and 1.25. The previous guidelines (FDA, Food and Drug Administration, 1985; WHO, World Health Organization, 1986) required that these limits be 0.8 and 1.20. If $\mu_T$ and $\mu_R$ are location parameters, generally the population means, for the test and reference formulations, the problem of bioequivalence essentially reduces to that of testing

$$H_0:\ \mu_T/\mu_R \le 0.8 \ \text{ or } \ \mu_T/\mu_R \ge 1.25 \quad \text{vs.} \quad H_1:\ 0.8 < \mu_T/\mu_R < 1.25. \tag{2.1}$$
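For reference, the log-scale form of (2.1) can be written out as a short worked restatement. It uses only the identity $\log(\mu_T/\mu_R) = \log\mu_T - \log\mu_R$ and assumes positive location parameters; whether $\log\mu_T$ and $\log\mu_R$ coincide with the location parameters of the log-transformed data depends on the distributional assumptions and is not claimed here.

```latex
\begin{align*}
H_0 &:\ \log\mu_T - \log\mu_R \le \log 0.8
      \quad\text{or}\quad
      \log\mu_T - \log\mu_R \ge \log 1.25, \\
H_1 &:\ \log 0.8 < \log\mu_T - \log\mu_R < \log 1.25, \\
    &\quad \log 0.8 = -\log 1.25 \approx -0.2231 .
\end{align*}
```

On this scale the interval boundaries are known constants placed symmetrically about zero, which is one way to see why the limits 0.8 and 1.25 are reciprocals of each other.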

Obviously, $H_0$ is not a linear hypothesis; it becomes linear with known interval boundaries only after the log transformation. We limit our attention to the 2 x 2 crossover design because of its wide use in bioequivalence studies. This design assigns each subject randomly to one of the two sequences TR and RT, where T and R represent the test and reference formulations, respectively. The subjects assigned to Sequence TR receive T during the first period and R during the second period, and the subjects assigned to Sequence RT receive the formulations in the reverse order. A washout period of adequate length is generally included in this design. The sample sizes are generally small, with as few as 6 subjects per sequence. Let $y_{ijk}$ be a derived measurement, e.g., AUC, from the plasma level profile of subject $k$ of Sequence $j$ during Period $i$, $i = 1, 2$; $j = 1, 2$; $k = 1, \ldots, n_j$. We assume a bivariate normal distribution for the pairs $(y_{1jk}, y_{2jk})$, leading to a more general model than Grizzle's (1965) model. There are two concepts of bioequivalence: average and individual. The average bioequivalence, commonly used in practice, is based on a statistic defined as a function of location parameter estimators. On the other hand, the individual bioequivalence emphasizes the likelihood of the ratio $y_T/y_R$ falling within specified interval boundaries, where $y_T$ and $y_R$ are the test and reference formulation responses from the same subject. There exists a vast literature on the methods of evaluating the average bioequivalence. The original work of Westlake (1972) and Metzler (1974) in this field is
noteworthy. A good reference book is by Chow and Liu (1993). The most commonly used methods in the pharmaceutical industry for assessing the bioequivalence are (1) interpretation of a 90% conventional CI (confidence interval), referred to as Westlake (1972) CI approach, and (2) an equivalent approach of performing two one-sided 5% level t-tests, referred to as Shuirmann (1987) method. These formulas, however, use incorrect variances. Some statisticians have recently advocated a routine use of the log transformation of the data followed by normal-theory based methods. Of course, after the log transformation, one can linearize the hypotheses H01 and H02 of (2.1). But this is not a scientific approach because one then routinely assumes that the data are lognormal. For any continuous distribution, in the presence or absence of wild observations, one can perform two one-sided Wilcoxon rank-sum tests on log transformed data as proposed by Hauschke et al. (1990). However, the power of this procedure would not be satisfactory especially in small samples. Another problem is that the normal approximation for the rank-sum statistic may be poor for small sample sizes and the exact distribution of the test statistic introduces the discrete sample space which may not lead to an exact c~-size test. Recently Hsu et al. (1994) have introduced a 1 - c~ level CI for the difference between the formulation means which is contained in Westlake's (1976) 1 - c~ level symmetric CI and is therefore shorter than the latter. However, their CI cannot be converted into the CI for the ratio unless one assumes normality for the log transformed data. There exist many alternative procedures including the methods of Fieller (1954) and Anderson-Hauck (1983) and Bayesian methods for evaluating bioequivalence. The Anderson-Hauck method is approximate because it replaces a naturally resulting noncentral t-statistic with the central t-statistic. 
Empirical studies have shown that this approximation tends to inflate the Type I error rate. Mandallaz and Mau (1981) showed, under the assumption of fixed subject effects, that Fieller's method is highly unlikely to fail. They provided the exact symmetrical CI for $\mu_T/\mu_R$ when the specified interval boundaries for decision making are symmetric about 1 and compared it with Fieller's CI. Westlake's (1976) symmetric CI is an approximation to their exact symmetric CI. While Bayesian CIs are appealing in the context (1.1), they have not become popular in the pharmaceutical industry, primarily because of difficulties in justifying a chosen prior. On the other hand, it is difficult to interpret a non-Bayesian CI in this context using a fiducial argument. Recently, Shen and Iglewicz (1994) have bootstrapped trimmed t-statistics, in the context of a two one-sided tests procedure, for the difference of location parameters rather than for the ratio. Their procedure is useful in the presence of outliers, but it is associated with loss of information. Anderson and Hauck (1990) proposed a method for assessing individual bioequivalence after formulating the hypotheses on $\pi = \Pr\{|y_T/y_R - 1| < 0.2\}$, where $y_T$ and $y_R$ are responses from the same subject. This approach does measure, in some sense, a simultaneous shift in both location and scale parameters. However, $\pi$ is not a good indicator of bioequivalence because even in the case of equal population means and variances it depends heavily on the size of the common variance. Furthermore, the test (sign test) is approximate because of the discreteness of the sample space. Wellek


H. I. Patel

(1993) recently extended this concept to lognormal data and introduced a noncentral F-statistic for testing the null hypothesis. Although derived under different considerations, this is exactly the same test as was proposed by Patel and Gupta (1984) in the context of equivalence in clinical trials, where the sample sizes are usually large. Esinhart and Chinchilli (1994) extended the tolerance interval approach, originally proposed by Westlake (1988), to higher order designs for evaluating individual bioequivalence. Schall and Luus (1993) provided a unified approach for jointly assessing average and individual bioequivalence through a less economical design. It should be noted that this area of research is still evolving from the regulatory viewpoint.
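
The equivalence between the 90% CI approach and the two one-sided 5% level t-tests mentioned above can be checked numerically. The following is only a minimal sketch under stated assumptions: it uses two independent groups with a pooled variance (a crossover analysis would compute the standard error differently), the function name `tost_vs_ci` and its parameters are hypothetical, the equivalence limits are whatever the analyst supplies on the scale of analysis, and the critical value `t_crit` is taken from t tables by the caller rather than computed.

```python
import math
from statistics import mean, variance


def tost_vs_ci(test, ref, t_crit, lower, upper):
    """Compare two one-sided t-tests with the matching confidence interval.

    test, ref : responses on the scale of analysis (assumed independent groups)
    t_crit    : upper 5% point of t on len(test) + len(ref) - 2 df (from tables)
    lower, upper : equivalence limits for the difference of means (assumed)
    """
    n1, n2 = len(test), len(ref)
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * variance(test) + (n2 - 1) * variance(ref)) / df
    se = math.sqrt(pooled_var * (1.0 / n1 + 1.0 / n2))
    diff = mean(test) - mean(ref)

    t_lower = (diff - lower) / se   # one-sided test against diff <= lower
    t_upper = (upper - diff) / se   # one-sided test against diff >= upper
    tost_ok = t_lower > t_crit and t_upper > t_crit

    # With t_crit the upper 5% point, this is the conventional 90% CI.
    ci = (diff - t_crit * se, diff + t_crit * se)
    ci_ok = lower < ci[0] and ci[1] < upper
    return tost_ok, ci_ok, ci


# Illustrative (made-up) data on a log scale; 1.734 is the upper 5% point of t on 18 df.
base = [0.1, 0.0, -0.1, 0.05, -0.05, 0.0, 0.1, -0.1, 0.02, -0.02]
shifted = [v + 0.03 for v in base]
wide = tost_vs_ci(base, shifted, 1.734, math.log(0.8), math.log(1.25))
narrow = tost_vs_ci(base, shifted, 1.734, -0.05, 0.05)
print(wide[0], wide[1], narrow[0], narrow[1])
```

Both decisions always agree because each one-sided rejection is algebraically the same statement as one end of the interval lying inside the corresponding limit.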

2.3. Dose-linearity

Pharmacokinetic studies are done to examine whether the kinetics of a drug, measuring the bioavailability in terms of AUC or $C_{\max}$, are proportional to the dose within a certain range of doses. Dose proportionality implies that the dose-response curve is linear and passes through the origin. However, this concept is commonly interpreted by pharmacokineticists as dose linearity. Linearity in a statistical sense would mean that the dose-response curve is linear and the regression line has an arbitrary intercept. Consider a design where the pharmacokinetic measurements AUC and $C_{\max}$ are assessed from plasma profiles over a certain period of time in a single dose or multiple dose setting. Because of a relatively large between-subject variability, a dose-proportionality study is done using a within-subject design, generally a Latin square or a balanced incomplete block (BIB) design which is replicated a certain number of times. Subjects are assigned randomly to the treatment sequences in such a way that each sequence is received by an equal number of subjects. Sufficiently long washout periods are planned between successive doses so that the carryover effects can be assumed zero. Suppose $y_{ij}$ is the response corresponding to dose $x$ for subject $i$ ($i = 1, \ldots, n$) and period $j$ ($j = 1, \ldots, p$); then the dose-response is linear if $E(y_{ij} \mid x) = \alpha + \beta x$. We assume that $y$ is normally distributed. There are situations where the plasma drug levels are endogenous. For example, hormones are naturally produced by the body. So in a hormone study even a placebo subject will show some level of hormone in the plasma. In this case the intercept $\alpha$ is positive. Also, when the drug is exogenous, there exist situations where the dose-response is nonlinear at the first pass, i.e., within a small dose-interval near zero. 
This means that even if the dose-response is linear over the observed dose-range, it cannot be assumed linear over the dose-range starting from zero, even though the plasma levels are zero at the zero dose. Another problem in this type of study is that for the variables AUC and $C_{\max}$ the variance of $y$ is proportional to $x^2$. Hence, the conventional split-plot type analysis of a within-subject design is inappropriate. So the data analysts consider the transformation $z = y/x$ to make the variance of $z$ independent of $x$. However, in this case the regression of $z$ on $x$ is $E(z_{ij} \mid x) = \alpha/x + \beta$, which is nonlinear unless $\alpha = 0$. The data analysts assume that $\alpha = 0$ and test the hypothesis of equal $z$-means associated with the dose levels using a split-plot type analysis. If the null hypothesis


is not rejected, they conclude the dose proportionality of the response. There are two problems, however, with this currently practiced method. First, $\alpha$ must be zero for $E(z_{ij} \mid x)$ to be constant. The second problem is that the lack of significance of the test of equal $z$-means does not prove the null hypothesis. It may be that the design is not sensitive enough to pick up the differences in the $z$-means. This argument is similar to the one made in the problem of showing equivalence of two populations. Patel (1994a) has proposed a maximum likelihood procedure for stepwise fitting of polynomials over the observed dose-range. This procedure assumes multivariate normality of the $y$-responses of a subject with the dispersion matrix equal to $D_i \Omega D_i$, where $D_i = \mathrm{diag}(x_{i1}, \ldots, x_{ip})$, $x_{ij}$ is the dose received by the $i$th subject in the $j$th period and $\Omega$ has a compound symmetry structure, i.e., all diagonal elements $\omega_{ii}$ are equal and positive and all off-diagonal elements are equal, $\omega_{ij} = \eta > 0$ for $i \neq j$. Patel has also shown by a simulation study that the successive tests in a stepwise polynomial fitting procedure can be regarded as independent. The approach of polynomial regression fitting answers the question of whether the dose-response is linear. If the dose-response is linear, one can express the predicted response as being proportional to the corrected dose, i.e., $\hat{y} = \hat{\beta}(x - \hat{\gamma})$, where $\hat{\gamma} = -\hat{\alpha}/\hat{\beta}$. However, this prediction is good only for a dose in the dose-range examined in the dose proportionality study.
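
The role of the intercept in the ratio transformation can be illustrated with a small calculation. Dividing the linear model by the dose gives a straight line in 1/x, so an ordinary least squares fit of y/x on 1/x is the same as a weighted fit of y on x with weights 1/x^2, which matches a variance proportional to the squared dose. The sketch below is not the maximum likelihood procedure of Patel (1994a): it ignores the within-subject correlation, and the function name `fit_ratio_regression` and the data are hypothetical.

```python
def fit_ratio_regression(doses, responses):
    """Least squares fit of z = y/x on u = 1/x, i.e. z = beta + alpha * u.

    Minimises sum((y - alpha - beta * x)**2 / x**2), the weighted criterion
    appropriate when the variance of y is proportional to x**2.
    Returns (alpha, beta): intercept and slope of the line for y on x.
    """
    u = [1.0 / x for x in doses]
    z = [y / x for x, y in zip(doses, responses)]
    n = len(u)
    u_bar = sum(u) / n
    z_bar = sum(z) / n
    s_uu = sum((ui - u_bar) ** 2 for ui in u)
    s_uz = sum((ui - u_bar) * (zi - z_bar) for ui, zi in zip(u, z))
    alpha = s_uz / s_uu            # slope in u is the intercept of y on x
    beta = z_bar - alpha * u_bar   # intercept in u is the slope of y on x
    return alpha, beta


# Made-up noiseless responses from y = 2 + 3x: linear but not proportional.
doses = [1.0, 2.0, 4.0, 8.0]
responses = [2.0 + 3.0 * x for x in doses]
alpha_hat, beta_hat = fit_ratio_regression(doses, responses)
corrected_offset = -alpha_hat / beta_hat   # dose offset in y = beta * (x - offset)
print(alpha_hat, beta_hat, corrected_offset)
```

With a nonzero intercept the dose-normalised means y/x differ across doses (here 5, 4, 3.5 and 3.25) even though the response is exactly linear in dose, which is why equal z-means cannot by themselves establish or refute linearity.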

3. Dose-response studies

3.1. Introduction

Before determining a narrow range of safe and efficacious doses that could be prescribed to patients with a given disease, a series of dose-response studies are done in animals and humans. Although the discussion in this section is primarily limited to dose-response studies in humans, we give some background of animal studies designed to help choose safe and efficacious doses for clinical development. Following the drug screening program, special studies are done to assess the pharmacological effect at various doses of a chemical substance. Bioassay is a special field of research where several doses of test and standard preparations (in the form of drugs, vaccines, antibiotics, vitamins, etc.) are administered to living organisms (animals, animal tissues, or micro-organisms), relevant pharmacological responses are obtained and the potency of the test preparation relative to that of the standard preparation is estimated. The literature on bioassay designs and their analysis abounds. See Finney (1971), Carter and Hubert (1985), Srivastava (1986), Hubert et al. (1988), Meier et al. (1993), Yuan and Kshirsagar (1993), among others. A valid and efficient estimate of the relative potency from dose-response profiles of the two preparations is of primary concern in bioassay studies. Potentially harmful effects of human exposure to hazardous chemicals in the environment are generally extrapolated from animal studies. A similar situation arises in a long-term patient treatment. From animal toxicology studies, we try to predict safety for patients who would be taking a pharmaceutical product especially for the treatment of a chronic disease. This is an extremely difficult problem because the extrapolation comes in two ways. The animal models could be quite different from


human models and animal studies are generally conducted at very high doses relative to the doses patients are generally exposed to. There exists a vast literature on animal carcinogenicity studies. See, for example, Cornfield (1977), Armitage (1982), Krewski and Brown (1981), Van Ryzin and Rai (1987), Schoenfeld (1986) and Carr and Portier (1993). Several short-term dose-response studies designed to evaluate safety in animals precede Phase I clinical studies. Long-term carcinogenicity studies in animals are conducted to compare the incidences of various types of tumors associated with different dose levels and a placebo. The regulatory agency has issued guidelines for the timing of these studies and design related issues. Selwyn (1988) has written an excellent review paper on design and analysis aspects of animal toxicology dose-response studies that are conducted by the pharmaceutical industry. Phase I studies are primarily designed to address safety and tolerability whereas Phase II dose-response studies address both efficacy and safety issues. Except in cancer research, Phase I studies are done in healthy volunteers. Phase II studies, on the other hand, use patients of the targeted patient population. We describe these commonly used study designs in the following sub-sections.

3.2. Phase I studies

Phase I trials enroll a small number of subjects and are done in a relatively short time because of the time pressure in drug development. The primary purpose of these studies is to find a range of doses that are tolerable or are associated with toxicity no higher than some acceptable level.

3.2.1. Cancer chemotherapy studies

Phase I cancer chemotherapy studies, where we expect severe toxic effects, are different in nature from the Phase I studies done for other therapeutic areas. Korn et al. (1994) have described what is called a "standard" Phase I design in cancer research as follows: First, there is a precise definition in the protocol of what is considered dose-limiting toxicity (DLT); it may differ in different settings. The dose levels are fixed in advance, with level 1 (the lowest dose) being the starting dose. Cohorts of three patients are treated at a time. Initially, 3 patients are treated with dose level 1. If none of them experiences DLT, one proceeds to the next higher level with a cohort of 3 patients. If 1 out of 3 patients experiences DLT, an additional 3 patients are treated at the same dose level. If 1 out of 6 patients experiences DLT at the current level, the dose escalates for the next cohort of 3 patients. This process continues. If $\geq 2$ out of 6 patients experience DLT, or $\geq 2$ out of 3 patients experience DLT in the initial cohort treated at a dose level, then one has exceeded the MTD (maximum tolerable dose; note that the definition of MTD varies with investigators). Some investigations will, at this point, declare the previous dose level as the MTD, but a more common requirement is to have 6 patients treated at the MTD (if it is higher than level 0). To satisfy this requirement, one would treat another 3 patients at this previous dose level if there were only 3 already treated. The MTD is then defined as the highest dose level ($\geq 1$) in which 6 patients have been treated with $\leq 1$ instance of DLT, or


dose level 0 if there were $\geq 2$ instances of DLT at dose level 1. The number of doses recommended in these studies is about 6. Once the MTD is estimated, Phase II trials start to evaluate efficacy at doses lower than or equal to the MTD. Storer (1989) defined the MTD as the dose associated with some specified percentile of a tolerance distribution and derived its MLE (maximum likelihood estimate), assuming a logistic dose-toxicity curve. He also considered some designs which are variations on what is called an "up and down" method. O'Quigley et al. (1990) introduced the concept of the CRM (continual reassessment method), where patients are studied one at a time. They used a Bayesian approach to compute the posterior probability at each step to choose a dose for the next patient. The objective in CRM is to estimate the dose associated with some acceptable targeted toxicity level. O'Quigley and Chevret (1991) have reviewed some dose finding designs and provided simulation results. Korn et al. (1994) compared the performances of the "standard" and CRM methods with simulations. They observed that with CRM, more patients will be treated at very high doses and the trial will take longer to complete than with the "standard" design. They have also proposed some modifications to these two designs. Because of small sample sizes, generally less than 30, the MTD cannot be estimated with good precision. Because of ethical issues, it is desirable to be on the conservative side in estimating the MTD.

3.2.2. Other therapeutic areas

The treatments for such diseases as hypertension or diabetes are not as toxic as those in cancer chemotherapy. Phase I studies are therefore done in healthy subjects without major ethical issues. The sample sizes are small, ranging generally from 8 to 20. Typically, these are single dose studies in young, healthy male volunteers. In a typical design, subjects are randomly assigned to k + 1 sequences formed with a placebo and k (about 5) doses of drug, separated by washout periods, in such a way that the doses appear in an increasing order and a placebo at the diagonal places. Each subject receives all doses and a placebo. Here the placebo helps maintain pseudo double-blinding and also serves as a control. Various safety measurements, including vital signs and laboratory determinations (blood chemistry, hematology, etc.), are collected both before and after a treatment. This design meets ethical concerns in that the dose is not increased until the safety and tolerability of the subject's current dose is assured. Sometimes a Latin square design with washout periods between successive doses is considered if the order of doses in a sequence does not pose an ethical concern. An alternating panel design is sometimes used. In this design a group of subjects is randomly divided into two subgroups; one is assigned to the lowest dose and the other to a placebo. After safety and tolerability for these subjects is assured, another group of subjects is enrolled to the next higher dose and a placebo. After assuring the safety and tolerability of the second dose, the next higher dose and a placebo are given to another group of subjects. This process continues until all doses are tested or the study is terminated because of safety problems. The termination of the study depends on the investigator's judgement. These designs can be used for a multiple dose study, but the length of the study may be prohibitive. It is becoming more and more common to take blood samples in these early phase studies for pharmacokinetic measurements and to associate them with safety measurements.
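
The escalation rules of the "standard" Phase I cancer design described in Section 3.2.1 can be sketched as a small simulation. This is only an illustration under stated assumptions: the function name `run_three_plus_three` is hypothetical, the true DLT probabilities are made up, the trial is assumed to stop escalating at the highest pre-specified level, and, if the expanded cohort at a candidate level shows two or more DLTs, the sketch steps down one further level (the quoted description does not spell out these last two cases).

```python
import random


def run_three_plus_three(dlt_probs, rng):
    """Simulate one trial; return the declared MTD level (0 = below level 1)."""
    n_levels = len(dlt_probs)
    treated = [0] * (n_levels + 1)   # index 0 unused; dose levels are 1..n_levels
    dlts = [0] * (n_levels + 1)

    def treat_cohort(level):
        # Treat a cohort of three patients at `level` and count DLTs.
        for _ in range(3):
            treated[level] += 1
            if rng.random() < dlt_probs[level - 1]:
                dlts[level] += 1

    level = 1
    while True:
        treat_cohort(level)
        if dlts[level] == 1:
            treat_cohort(level)      # 1 of 3 with DLT: add three more patients
        if dlts[level] >= 2:
            level -= 1               # MTD exceeded: fall back to previous level
            break
        if level == n_levels:
            break                    # assumption: no dose above the top level
        level += 1

    # Require six patients with at most one DLT at the declared MTD.
    while level >= 1:
        if treated[level] == 3:
            treat_cohort(level)
        if dlts[level] <= 1:
            return level
        level -= 1                   # assumption: step down and re-check
    return 0


# Made-up true DLT probabilities for six dose levels.
true_probs = [0.05, 0.10, 0.20, 0.35, 0.50, 0.65]
rng = random.Random(2024)
counts = [0] * (len(true_probs) + 1)
for _ in range(2000):
    counts[run_three_plus_three(true_probs, rng)] += 1
print(counts)   # how often each level (0..6) was declared the MTD
```

Running many simulated trials in this way shows how widely the declared MTD can vary from trial to trial with such small cohorts, which is the imprecision noted in the text.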


3.3. Dose-response studies for efficacy

After a range of safe and tolerable doses is determined from Phase I studies, Phase II efficacy studies in patients begin. Phase II studies are primarily done to evaluate efficacy of the drug in an exploratory manner and to further narrow down the range of doses for designing a Phase III study. Because of the time pressure, the management often wants to skip a good part of Phase II. There have been several cases where a wrong dose, generally higher than what would be necessary, has been on the market because a Phase II dose-response study was lacking or poorly planned. Note that finding a range of (reasonably) safe and efficacious doses is of primary interest throughout the development process.

3.3.1. Objectives

One should ideally consider fitting two dose-response curves, one for efficacy and the other for safety, to estimate such characteristics as a therapeutic window and CID (a clinically important dose). The lower limit of the therapeutic window is traditionally estimated as a minimum effective dose (MINED) and the upper limit as the minimum of the maximum effective dose (MAXED) and the maximum tolerable dose (MTD). A dose below the therapeutic window would be considered ineffective and a dose above it unsafe. The term MAXED is defined as the lowest dose associated with the maximum benefit that the drug can produce. As defined earlier, the highest dose for which no clinically important adverse experiences occur is chosen as MTD. The phrase "clinically important" is subjective and is often left to the discretion of individual investigators to interpret. The quantities MAXED and MTD are drug specific and therefore are estimated from the efficacy and safety dose-response profiles, respectively. Although it is difficult to pinpoint a globally agreeable efficacy value that would correspond to CID, clinical researchers have a better understanding of this concept than of the efficacy corresponding to MINED. Ekholm et al. (1990) have defined MINED as the lowest dose that produces a clinically meaningful response. This definition is easily confused with the definition of CID. Logically, MINED should be the dose that produces an efficacy that is similar to, but visibly distinct from, that of a placebo. This may be interpreted as an upper or lower bound of an interval consisting of efficacy values that are practically equivalent to the placebo effect. As with a clinically important difference, one should define a priori, for a given therapeutic area, what minimal efficacy is, regardless of the drug tested. 
As examples of MINED and CID, in a treatment for prevention of bone loss (osteoporosis) in postmenopausal women, a zero percent change in bone mineral density from baseline can be considered as MINED and a 2.5 to 3 percent change as CID. The bone mineral density decreases in the absence of any treatment. The ultimate goal of a dose-response study is to recommend one or two doses of the drug that are safe and efficacious for further development in Phase III. Depending on the therapeutic area and availability of drugs in the market, one may judge the extent to which safety can be compromised over efficacy. The more potent the dose, the more unsafe it is likely to be.
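
The definition of the therapeutic window given above (lower limit MINED, upper limit the smaller of MAXED and MTD) can be written down directly. This is a small sketch only: the function name `therapeutic_window`, the dose values, and the convention of returning `None` when the upper limit falls below the lower limit are assumptions made for illustration.

```python
def therapeutic_window(mined, maxed, mtd):
    """Return (lower, upper) dose limits, or None if no dose qualifies.

    lower = minimum effective dose (MINED)
    upper = min(maximum effective dose MAXED, maximum tolerable dose MTD)
    """
    lower = mined
    upper = min(maxed, mtd)
    if upper < lower:
        # Assumed convention: no dose is both effective and tolerable.
        return None
    return (lower, upper)


# Made-up estimates in mg.
print(therapeutic_window(mined=10, maxed=80, mtd=40))   # (10, 40)
print(therapeutic_window(mined=50, maxed=80, mtd=40))   # None
```

In practice the three inputs are themselves estimates from the efficacy and safety dose-response profiles, so the limits inherit their uncertainty.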


3.3.2. Designs

Several designs, including parallel-group, Latin square, dose-escalation and dose-titration designs, are used for a dose-response study. Some are more commonly used than others. The treatment duration for each dose depends on the therapeutic area. When a good surrogate measurement for the efficacy variable exists, it is used to reduce the length of the observation period for each dose. For example, the mean increase in gastric pH from baseline for a single dose is used as a surrogate measurement for the rate of ulcer healing. Sometimes repeated measurements (two or more observations at protocol scheduled time points) are obtained during the treatment period for each dose. Latin square and systematic dose-escalation designs would be acceptable for a chronic disease provided they have sufficiently long washout periods. However, they require a long time to finish. Also, the patient dropout rate increases with the length of a trial. The purpose of a dose-response study is to estimate certain characteristics of the dose-response curve. Thus, how precisely the expected response is predicted at a given dose is more important in such studies than a test of significance of the difference between placebo and a dose. Tests of hypotheses have a place in Phase III trials as they are considered confirmatory. In a dose-titration study, each patient starts the treatment with the lowest dose. After some minimum length of treatment, if the patient shows a response based on well defined criteria, the patient continues receiving the same dose as before. If a patient fails to show a response, the dose is increased to the next level. At any time during the study, if clinically important adverse events occur, the dose is lowered or the patient is dropped from the study. 
Because of its resemblance to clinical practice, a dose-titration design is sometimes preferred to a parallel-group design where a patient is assigned randomly to one of the doses and continues receiving it throughout the study. A dose titration design has several problems. It does not clearly answer the safety question. Because of a limited study length, the successive doses are titrated faster than are normally done in clinical practice. As a result, the doses recommended for Phase III tend to be overestimated. The titration design is not suitable for fitting a dose-response curve for the following reasons: (1) There is no clear causal-effect relationship between the dose and effect; the dose for the next period depends on the observed response of the preceding period. (2) Carryover effects and confounding between time and doses cloud the inference. (3) Only nonresponders continue receiving higher doses. As a result, the observed efficacy at the highest doses could, on average, be worse than that at the lower doses, which is contrary to what we would expect in a true dose-response profile. Several attempts have been made to analyze a titration study using stochastic models (see, for example, Chuang, 1987; Shih et al., 1989). However, it is difficult to make sure that an assumed model adequately represents the true data-generating mechanism and to verify the underlying model assumptions. Recently, Chi et al. (1994) have suggested a modified forced titration design as a compromise between a parallel group design and a systematic dose-escalation design. Their design is illustrated in Figure 1. In this design subjects are randomly assigned to


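[Figure 1: dose level (vertical axis: Placebo, D1, D2, ..., D6) against time (horizontal axis: 0, t1, t2, t3, t4, ...), showing the pre-defined dose sequences.]

Fig. 1. Design proposed by Chi et al.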

pre-defined dose-sequences (with placebo being Dose 0). Depending on the sequence assigned, a subject either continues receiving the same dose as received in the previous time interval or receives the next higher dose at a scheduled time point. The authors indicate that this design should help estimate the net incremental effect at each titration step when there is potential for carryover effect. Even for this design, the questions on dose-response cannot be clearly answered for the reasons stated earlier. Although less ethical than a titration design, a parallel-group design should be preferred so that the questions on a therapeutic range can be answered with clarity. This, of course, requires the evaluation of both safety and efficacy. Given a well-established therapeutic range from a sufficiently large study, a physician would be comfortable titrating the doses in actual practice.
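
Reason (3) given above against fitting a dose-response curve to titration data, namely that only nonresponders move on to higher doses, can be illustrated with a deterministic toy population. Everything here is an assumption made for illustration: each patient is given a hypothetical threshold dose at or above which a response occurs (infinity for a patient who never responds), and carryover and time effects are ignored.

```python
import math


def titration_rates(thresholds, dose_levels):
    """Observed response rate per dose when only nonresponders escalate."""
    remaining = list(thresholds)
    rates = []
    for dose in dose_levels:
        if not remaining:
            break                      # everyone has already responded
        responders = sum(1 for t in remaining if t <= dose)
        rates.append(responders / len(remaining))
        remaining = [t for t in remaining if t > dose]
    return rates


def parallel_rates(thresholds, dose_levels):
    """Response rate per dose if the whole population received that dose."""
    n = len(thresholds)
    return [sum(1 for t in thresholds if t <= dose) / n for dose in dose_levels]


# Ten hypothetical patients: four respond at dose 1, two more at dose 2,
# one more at dose 3, and three never respond.
thresholds = [1, 1, 1, 1, 2, 2, 3, math.inf, math.inf, math.inf]
levels = [1, 2, 3]
observed = titration_rates(thresholds, levels)
true_curve = parallel_rates(thresholds, levels)
print(observed)     # falls with dose
print(true_curve)   # rises with dose
```

In this toy population the rate observed under titration falls from 40% to 25% across the three doses while the population dose-response rises from 40% to 70%, because the patients still being titrated are increasingly those who cannot respond.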

3.3.3. Analysis

Some statisticians consider this an isotonic regression problem and apply the Williams (1972) method for multiple comparisons (dose vs. placebo), assuming that the dose-response curve is monotonic. They declare the lowest dose showing a significantly greater (or lower) effect than a placebo as MINED. This approach focuses on statistical significance rather than clinical interpretation of efficacy. Another problem is that the dose-response curve may not be monotonic. Rom et al. (1994) mention some situations where the curve initially increases and then, after reaching a peak, starts decreasing. They also proposed two closed test procedures for multiple comparisons which are based on p-values and are therefore not limited to any particular method of analysis. Whatever multiple comparison method is used, it is not going to solve the basic problem of estimating certain characteristics of a dose-response profile. In practice, we cannot consider a continuum of doses. Only a finite number of doses are selected and the choice of dose intervals might be quite arbitrary. Therefore, the interpretation of the results based on tests of hypotheses is design dependent. Another problem is that clinical researchers take the p-values seriously and a p-value depends on the


sample size. So two trials based on different sample sizes could lead to different conclusions on the therapeutic range. If an adequate number of doses is considered in the design, a suitable nonlinear or polynomial dose-response curve can be fitted. Some insight regarding the number of doses and computing a common sample size per dose for a binary response is given by Strijbosch et al. (1990). They considered a logit-linear model imposing a restriction that at least a prespecified number of doses would satisfy certain bounds on the response on a logit scale. Patel (1992) has provided a method to compute the sample size per dose so that a fitted logistic curve yields a desired level of precision for estimating CID. This method applies to both binary and normal data.
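
To make the idea of estimating a dose characteristic from a fitted curve concrete, the sketch below inverts a logistic dose-response curve for a binary response and attaches a delta-method standard error to the resulting dose. It is not the sample size method of Patel (1992). It assumes a curve that is linear in dose on the logit scale with already fitted coefficients, and the function names, coefficient values and variance inputs are all hypothetical.

```python
import math


def logit(p):
    """Log-odds of a probability p in (0, 1)."""
    return math.log(p / (1.0 - p))


def dose_for_response(intercept, slope, target_p):
    """Dose x* solving logit(target_p) = intercept + slope * x*."""
    return (logit(target_p) - intercept) / slope


def delta_method_se(intercept, slope, target_p, var_intercept, var_slope, cov):
    """Approximate standard error of x* from the coefficient covariance.

    Uses dx*/d(intercept) = -1/slope and dx*/d(slope) = -x*/slope.
    """
    x_star = dose_for_response(intercept, slope, target_p)
    var = (var_intercept + 2.0 * x_star * cov + x_star ** 2 * var_slope) / slope ** 2
    return math.sqrt(var)


# Made-up fitted curve: logit p(x) = -2 + 0.5 x, target response rate 50%.
dose_hat = dose_for_response(-2.0, 0.5, 0.5)
se_hat = delta_method_se(-2.0, 0.5, 0.5, var_intercept=0.04, var_slope=0.0, cov=0.0)
print(dose_hat, se_hat)
```

Because the coefficient variances shrink as the sample size per dose grows, a calculation of this kind can be turned around to ask how many subjects per dose are needed for a stated precision of the estimated dose.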

4. Mixed effects models

4.1. Introduction

As in any experimental situation, linear models play an important role in making inference in clinical trials. A linear combination of fixed and random effects defines a linear model. Fixed effects represent the population mean effects associated with a finite number of levels of a factor. For example, in a dose-response study the effects of a placebo and low, medium, and high doses of a drug are considered fixed effects. The random effects for a given factor are regarded as a realization of a random sample from the population of a large set of levels representing the factor. For example, in a simple, two-sequence, two-period, crossover trial, subjects (patients) are randomly assigned to one of the two sequences AB and BA, where A and B represent the treatments. Here subjects are assumed to represent the entire population of subjects described by the protocol. Another example of random effects is the centers in a multicenter trial. Some controversy still surrounds labeling the centers as random effects. We discuss this point in more detail in Section 4.2. The first example of a crossover design introduces nesting of some effects. The between-subject variation is partitioned as between sequences (fixed) and between subjects within sequences (random). The within-subject variation is partitioned as between periods (fixed), sequence-by-period interaction (fixed), and subjects within sequences-by-period interactions (random). The sequence-by-period interaction is also the between treatment difference. Thus this design is like a split-plot design, except that a unit (subject) cannot be split in order to control the time effect, and this imposes some restriction on the analysis of a crossover design. In the second example of random effects, the treatments are crossed with the centers and the model includes the treatments (fixed), centers (random), treatment × center interactions (random), and the experimental error (random). 
Since the treatments are randomly assigned to subjects within each center, one may regard this as a nested design with treatments within centers as being the sum of the treatment effects and the treatment-by-center interactions. When a subject is treated with one of randomly assigned treatments throughout the trial and observed repeatedly, the design is referred to as a parallel group study with repeated measurements or simply a longitudinal study. This design is also referred


to as a between-subject design as opposed to a within-subject design where some or all subjects receive two or more treatments. In a within-subject design, a subject is assigned randomly to one of several sequences, formed after permuting the treatments with or without restrictions, and receives the treatments in the order of the assigned sequence. A Latin square design, balanced incomplete design, and Williams (1950) designs are a few examples of within-subject designs. The simple 2 x 2 (2 sequence, 2 period) design, a special case of a Latin square design, is widely used in early phases of clinical trials. It is not uncommon to have incomplete data in a long-term trial. A subject could be discontinued from the trial for one or more reasons. Occasionally, subjects miss a few intermediate visits and this leads to additional missing values. There are two types of early withdrawals: response related and response unrelated. However, this classification may not be clear-cut. Moreover, for the subjects who are just lost to follow-up, without informing the investigator, it is difficult to say whether the reason for discontinuation is response related. Rubin (1976) defined three types of missing data: missing completely at random (MCAR), missing at random (MAR) and informative. If the missing data (censoring) mechanism is independent of both observed and unobserved data, the missing values are MCAR. If the mechanism is independent of unobserved data, the missing values are MAR. If the mechanism depends on unobserved data, the missing values are informative. Rubin (1976) also showed that MCAR values are ignorable in the sense that the analysis based on existing data will give unbiased estimates of the location parameters. For MAR values he showed that likelihood based procedures give valid inference for the treatment comparisons. It is difficult to account for the influence of informative missing values in the analysis. 
Some attempts have been made to model the probability of informative censoring at successive time points either through a Markovian model or a random effects model (see, e.g., Wu and Carroll, 1988; Wu and Bailey, 1989; Diggle and Kenward, 1994). These methods are, however, model dependent and in practice it is extremely difficult to verify the goodness of fit of a given model. An extensive literature, including books by Jones and Kenward (1989), Jones (1993), Longford (1993), and Diggle et al. (1994), exists on the analysis of repeated measures designs and clustered data. In this section, we limit our attention to mixed effects models, i.e., models having both fixed and random effects, and we emphasize making inference on fixed effects rather than estimating the variance components. The latter topic is covered extensively in the books by Searle et al. (1992) and Rao and Kleffe (1988). In Section 4.2, we consider a model for a multicenter trial when each subject has a single response. We introduce the Laird-Ware (Laird and Ware, 1982) model to represent a class of repeated measures designs in Section 4.3. Section 4.4 deals with non-normal distributions.
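
The practical difference between MCAR and MAR dropout discussed above can be seen in a small simulation with two visits. The setup is entirely hypothetical: a bivariate normal response with true second-visit mean zero, dropout that either ignores the data (MCAR) or depends only on the observed first visit (MAR), and a regression-based estimate standing in for a likelihood-based analysis under the normal model. The function name `simulate_dropout` and all numerical settings are assumptions.

```python
import random
from statistics import mean


def simulate_dropout(n=20000, seed=1):
    """Return (MCAR completer mean, MAR completer mean, MAR regression estimate)."""
    rng = random.Random(seed)
    y1 = [rng.gauss(0.0, 1.0) for _ in range(n)]
    y2 = [a + rng.gauss(0.0, 0.5) for a in y1]   # true mean of y2 is 0

    # MCAR: each subject drops out with probability 0.5, regardless of data.
    mcar_mean = mean(b for b in y2 if rng.random() < 0.5)

    # MAR: subjects with a high first-visit value drop out before visit 2.
    completers = [(a, b) for a, b in zip(y1, y2) if a <= 0.0]
    naive_mean = mean(b for _, b in completers)

    # Regress y2 on y1 among completers, then evaluate the fitted line at the
    # mean of y1 over all subjects (y1 is observed for everyone).
    a_bar = mean(a for a, _ in completers)
    s_aa = sum((a - a_bar) ** 2 for a, _ in completers)
    s_ab = sum((a - a_bar) * (b - naive_mean) for a, b in completers)
    slope = s_ab / s_aa
    regression_estimate = naive_mean + slope * (mean(y1) - a_bar)
    return mcar_mean, naive_mean, regression_estimate


mcar_mean, naive_mean, regression_estimate = simulate_dropout()
print(mcar_mean, naive_mean, regression_estimate)
```

Under MCAR the completer mean stays near the true value, under MAR it is clearly biased, and the estimate that uses the observed first visit through the model recovers the true mean, in line with the statements attributed to Rubin (1976).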

4.2. Multicenter trial

In a multicenter trial, several centers (clinics) participate in the trial following a common protocol to evaluate efficacy and safety of two or more treatments. The treatments are randomly assigned to subjects at each center. The purpose of designing a multicenter trial is to expedite the patient recruitment and to have a broad coverage


of patient population, medical practice and environmental factors. Since the subjects within a center have something in common, they may be regarded as forming a cluster. The inference on treatment effects is of primary interest and it is made from the data supplied by all centers. For a continuous response variable, suppose we write a linear model as

$$y_{ijk} = \tau_i + \delta_j + \gamma_{ij} + x_{ijk}'\beta + \varepsilon_{ijk}, \tag{4.1}$$

where $y_{ijk}$ is the response from the $k$th ($k = 1, \ldots, n_{ij}$) subject on the $i$th ($i = 1, \ldots, t$) treatment in the $j$th ($j = 1, \ldots, m$) center, $x_{ijk}$ is a $p \times 1$ vector of covariates, $\beta$ is a $p \times 1$ vector of regression coefficients, $\tau$'s are the treatment effects, $\delta$'s are the center effects, $\gamma$'s are the interaction effects, and $\varepsilon$'s are residuals which are assumed to be $IN(0, \sigma^2)$, i.e., independent normal with means zero and variances $\sigma^2$. The treatment effects are clearly fixed effects. Regarding the center effects, however, some controversy exists. The interaction effects are obviously random if at least one of the factors involved is random. For the trials sponsored by the pharmaceutical industry the centers are traditionally considered as fixed effects. Statisticians first test the significance of the treatment-by-center interaction and then test the significance of the treatment difference if the interaction is nonsignificant. When the interaction is present, common practice is to compare the treatments in each center separately. Separate analyses obviously have low power and do not help to reach an overall conclusion about the treatment effects. Thus the current approach of analyzing a multicenter trial may defeat the purpose of designing a multicenter trial. Khatri and Patel (1992) have argued in favor of treating the centers as random effects. We make the following arguments to justify the centers as random: 1. We are not interested in making inference for a particular set of centers. 2. While interpreting the treatment differences, we do not limit the inference to the subjects of a given set of centers. Instead, we extend the results to the general population. 3. Although the participating centers are not chosen randomly from the entire population of centers, this is not a good reason for not considering the centers as random. In biological experiments, for example, this type of situation is common. 
Whatever animals are at disposal of a researcher are used and they do not constitute a random sample from all animals. 4. For a given center, many factors, including type of patients (demography, genetics, socio-economic conditions, disease severity, etc.), clinical methodology, patient behavior (attitude, compliance, etc.), and environmental conditions, play a role in a complex manner. If a few prognostic factors can be identified as covariates, then what is left in the process after removing their influence is simply random noise generated by a center. 5. If we treated the treatment-by-center interaction as fixed, it would mean that in the presence of the interaction we cannot make valid inferences on treatments. This cannot be justified, however, because it would be unlikely to observe the same interaction profile if trials were repeatedly conducted under the same protocol but with different sets of centers. Even the same set of centers is unlikely to generate

H. I. Patel


the same interaction profile under repeated sampling because of many unknown or uncontrolled factors. These reasons lead us to believe that the treatment-by-center interaction is a consequence of randomly occurring events. An excellent review on the topic of interaction is given by Cox (1984). He gave some criteria to judge whether a given interaction is fixed or random. With a mixed effects model the conventional interpretation of the treatment-by-center interaction no longer applies. If the estimate of the interaction variance component is large, we duly pay a penalty for comparing the treatments by increasing the variance. Chakravorti and Grizzle (1975) derived the likelihood ratio test for comparing the treatment effects for Model (4.1) with no covariates. They assumed that the $\delta$'s $\sim IN(0, \sigma_\delta^2)$ and the $\gamma$'s $\sim IN(0, \sigma_\gamma^2)$. A unified theory of a more general problem is given by Harville (1977), which is further discussed in the context of a repeated measures design in Section 4.3. An exact analysis of Model (4.1) with no covariates is given by Gallo and Khuri (1990). As Scheffé (1959) pointed out, Model (4.1) is not general enough and therefore is limited in scope. For example, in a clinical trial designed to compare an active treatment with a placebo, the treatment groups may be associated with different variances. Khatri and Patel (1992) considered a more general model for the $n_{.j}$ observations from the jth center as

$y_j = A_j'\theta_j + X_j'\beta + \varepsilon_j$, for $j = 1, \ldots, m$, (4.2)

where $y_j' = (y_{1j}', \ldots, y_{tj}')$, $y_{ij}' = (y_{ij1}, \ldots, y_{ijn_{ij}})$, $n_{ij} > 0$ for $i = 1, \ldots, t$, $A_j' = \mathrm{diag}(1_{n_{1j}}, \ldots, 1_{n_{tj}})$ is an $n_{.j} \times t$ design matrix for the jth center, $X_j = (X_{1j}, \ldots, X_{tj})$ is a $p \times n_{.j}$ matrix of covariate values, $X_{ij} = (x_{ij1}, \ldots, x_{ijn_{ij}})$, $\theta_j$ and $\varepsilon_j$ are independent random vectors such that $\theta_j \sim N_t(\mu, \Omega)$ and $\varepsilon_j \sim N_{n_{.j}}(0, \sigma^2 I_{n_{.j}})$, and $\beta' = (\beta_1, \ldots, \beta_p)$ is a vector of regression slopes. Here $N_q$ stands for a q-dimensional normal distribution. The covariates are centered at their respective means. The dispersion matrix of $y_j$ is $\Phi_j = A_j'\Omega A_j + \sigma^2 I_{n_{.j}}$. Khatri and Patel computed maximum likelihood estimates of the parameters under the conditions that $\sigma^2 > 0$ and $\Omega + \sigma^2 D_j^{-1}$ is non-negative definite for all j, where $D_j = A_j A_j'$, and derived the likelihood ratio test for comparing the treatment effects. When $\Omega$ has a compound symmetry structure (equal variances and equal covariances), Model (4.2) reduces to Model (4.1). If the assumption of compound symmetry is satisfied, the analysis based on Model (4.1) will be more powerful. It is not clear what the minimum number of centers, $m$, should be for a reasonably reliable estimate of $\Omega$ and thus for a satisfactory approximation for the likelihood ratio test for the treatment comparisons. The numerical examples in Chakravorti and Grizzle (1975), Gallo and Khuri (1990) and Khatri and Patel (1992) included 3, 9 and 6 centers, respectively. As a rule of thumb, we suggest that for $m \le 5$, a fixed effects model be preferred to a mixed effects model. If $m \le 5$, one approach is to choose a priori a fixed effects model without the interaction term. The alternative is to choose a fixed effects model with the interaction term and follow these steps:
(1) If the treatment-by-center interaction is not significant at the 0.2 level of significance (preliminary test), drop it from the model to increase the efficiency of the test for treatments. The reduced model, however, includes the centers as a factor.

Clinical trials in drug development: Some statistical issues


(2) If the interaction is significant at the 0.05 level of significance, do an exploratory analysis to see if any covariate (in addition to those included in the model) can explain the interaction. Sometimes a few outlying observations play a major role in introducing inconsistency of treatment differences across centers. This is not to suggest that such exploratory analysis should be limited to a fixed effects model. If nothing helps explain the interaction, one can adopt one of the following strategies:
(i) If the interaction is quantitative in nature, i.e., the treatment differences are in one direction, one can still use the original model to test the treatment effects after making adjustment for the interaction term.
(ii) If the interaction is qualitative in nature, one may rely on the analysis in subgroups of the centers and interpret the results appropriately, making inference in relation to the center-specific characteristics, if possible.
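The preliminary-test strategy in steps (1) and (2) can be sketched as a small decision function. This is only an illustration: the function name `choose_strategy`, its arguments and the returned labels are hypothetical, the caller is assumed to supply the p-value of the interaction test and the estimated treatment difference in each center, and the steps above do not say what to do when the p-value lies between 0.05 and 0.2, so the sketch merely flags that case.

```python
from typing import Sequence


def choose_strategy(p_interaction: float,
                    center_differences: Sequence[float],
                    explained: bool = False) -> str:
    """Return a label for the analysis path suggested by the preliminary test.

    p_interaction: p-value of the treatment-by-center interaction test.
    center_differences: estimated treatment difference in each center.
    explained: True if the exploratory analysis (extra covariate, outliers)
        accounted for the interaction.
    All names and labels are illustrative assumptions.
    """
    # Step (1): interaction not significant at the 0.2 level.
    if p_interaction > 0.2:
        return "drop interaction, keep centers"
    # Between the two thresholds the text gives no instruction.
    if p_interaction > 0.05:
        return "not specified in the text"
    # Step (2): interaction significant at the 0.05 level.
    if explained:
        return "interaction explained by exploratory analysis"
    same_direction = (all(d > 0 for d in center_differences)
                      or all(d < 0 for d in center_differences))
    if same_direction:
        # Strategy (i): quantitative interaction.
        return "quantitative: test treatments adjusting for interaction"
    # Strategy (ii): qualitative interaction.
    return "qualitative: analyze subgroups of centers"
```

Treating "all center differences share one sign" as the definition of a quantitative interaction is a simplification; in practice the judgment would also account for the sampling error of the center-specific estimates.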

4.3. The Laird-Ware model

In this section we concentrate on longitudinal data. In the 1970s, the analysis of longitudinal data generally used a split-plot type model that required an assumption of equal variances and equal covariances for the repeated measurements, even though the multivariate analysis of variance approach (Cole and Grizzle, 1966) and growth curve analysis methods (Rao, 1959; Potthoff and Roy, 1964; Rao, 1965; Khatri, 1966; and Grizzle and Allen, 1969) already existed. In a growth curve analysis, the expected response is modelled as a continuous function of time. In general, growth curve models are unsuitable for clinical trial data because of ill-behaved response profiles of subjects over the treatment period. A linear growth curve model has been used by Lee and DeMets (1991) in analyzing a group sequential trial, where the treatments are compared once or more than once during the course of the trial. Techniques for analyzing incomplete data from general (unstructured dispersion matrix) multivariate normal populations were developed (see, for example, Hartley and Hocking, 1971; Kleinbaum, 1973; Beale and Little, 1975; Dempster et al., 1977). However, these methods were not found suitable for clinical trial data because of overparameterization, especially when the number of measurements on a subject is large relative to the number of subjects. The work having a major impact on clinical trials with repeated measures designs came from Laird and Ware (1982). Based on the work of Harville (1977), they developed ML (maximum likelihood) and REML (restricted maximum likelihood) procedures for analyzing a general mixed effects model for repeated measurements. Suppose $y_i = (y_{i1}, \ldots, y_{iT_i})'$ is a $T_i \times 1$ vector of responses from the ith ($i = 1, \ldots, n$) subject at times $t_{i1}, \ldots, t_{iT_i}$, with $T_i$ …

… The computational methods for analyzing models of Section 4 cannot accommodate ante-dependence models. On the other hand, the models of Section 4 do not require the times of measurements to be common for all subjects. In Section 5.2, we review applications of ante-dependence models for analyzing between-subject designs (longitudinal data) with an emphasis on monotone patterned data. We also give a brief review of applications of Markovian models for analyzing non-normal data in this section. Markovian models for analyzing within-subject designs are reviewed in Section 5.3.

5.2. Between-subject designs

Anderson (1957) and Bhargava (1975) derived maximum likelihood estimators under a monotone pattern assuming multivariate normality for the repeated measurements. Patel (1979) extended this to a k-sample problem for analysis of covariance. All this work assumed the general dependence; the likelihood was written as the product of


conditionally independent densities under an ante-dependence model of order $p - 1$. Patel and Khatri (1981) used the Markovian (ante-dependence of order one) normal density for analyzing one-way classification with fixed covariates and provided the likelihood ratio test statistics and their improved asymptotic distributions for testing various hypotheses. Byrne and Arnold (1983) studied this model for a one-sample problem and showed that the resulting test is more powerful than Hotelling's $T^2$, which requires an unstructured dispersion matrix. Ante-dependence models of order $g > 1$ for analyzing monotone data in a parallel group design have been applied by Kenward (1987) and Patel (1991). They also derived better approximations for the distributions of likelihood ratio statistics. Murray and Findlay (1988) analyzed blood pressure data with a monotone pattern to compare two treatments using an ante-dependence model of order 1. Patel (1991) also provided a method of estimating the marginal treatment mean profiles after adjusting for time-independent covariates. Using the adjusted cell (time-by-treatment) means and their dispersion matrix, one can make inference on a set of between- and within-treatment contrasts. For example, a test for treatment-by-time interaction may be of interest. Software is written in PROC IML of SAS 6.07. The likelihood ratio test of the hypothesis that the order of the ante-dependence model is $g$ versus that it is $h > g$ is derived by Kenward (1987). Although this test is sensitive to departures from the assumption of normality, it can serve as a guideline in choosing an approximate value of $g$. If there exist a few intermediate missing values, they should be estimated first in order to analyze a monotone pattern. A method for this can be found in Patel (1991). Recently, Patel (1994b) applied an ante-dependence model for implementing a group sequential procedure for a longitudinal trial. This model is also used by Patel (1994c) in another application.
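To make the factorization concrete, the following minimal sketch fits an order-1 ante-dependence normal model to monotone data for a single group without covariates. Under these assumptions the likelihood is a product of the density of $y_1$ and the conditional densities of $y_k$ given $y_{k-1}$, so each factor can be maximized separately by regressing $y_k$ on $y_{k-1}$ over the subjects still observed at time $k$. The function names and the data layout are hypothetical, and the sketch is not the procedure of any of the papers cited above.

```python
from typing import List, Tuple


def fit_ad1_monotone(rows: List[List[float]]) -> List[Tuple[float, float, float]]:
    """Fit an order-1 ante-dependence normal model to monotone data.

    rows[i] holds the responses y_1, ..., y_{T_i} observed for subject i
    before withdrawal. Returns one tuple (intercept, slope, variance) per
    time point: for time 1 the marginal mean, a slope of 0 and the marginal
    variance; for time k > 1 the regression of y_k on y_{k-1}.
    Variances use the ML divisor n. Assumes y_{k-1} is not constant.
    """
    p = max(len(r) for r in rows)
    first = [r[0] for r in rows]
    mean1 = sum(first) / len(first)
    var1 = sum((v - mean1) ** 2 for v in first) / len(first)
    fits = [(mean1, 0.0, var1)]
    for k in range(1, p):
        # Every subject still on study at time k contributes to this factor.
        pairs = [(r[k - 1], r[k]) for r in rows if len(r) > k]
        n = len(pairs)
        xbar = sum(x for x, _ in pairs) / n
        ybar = sum(y for _, y in pairs) / n
        sxx = sum((x - xbar) ** 2 for x, _ in pairs)
        sxy = sum((x - xbar) * (y - ybar) for x, y in pairs)
        slope = sxy / sxx
        intercept = ybar - slope * xbar
        resid = sum((y - intercept - slope * x) ** 2 for x, y in pairs) / n
        fits.append((intercept, slope, resid))
    return fits


def implied_means(fits: List[Tuple[float, float, float]]) -> List[float]:
    """Marginal mean profile implied by the fitted conditional regressions."""
    means = [fits[0][0]]
    for intercept, slope, _ in fits[1:]:
        means.append(intercept + slope * means[-1])
    return means
```

The implied means at later time points use the information carried by subjects who withdrew early, through the fitted regressions, rather than averaging only the subjects who remained.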
For longitudinal data representing a member of the exponential family, a generalized linear model can be written to represent the Markovian property. For example, for repeated binary responses we can write a logit regression model for each response using the past history as covariates. These models also allow time-dependent or time-independent covariates. Analogous to ante-dependence models, we have a Markov chain of order $g$ for a sequence of binary responses. Bristol and Patel (1990) used a Markov chain model for comparing k treatment profiles of incidences of AEs (adverse experiences), associated with successive visits, in a monotone dataset. They derived the likelihood ratio test allowing non-stationary transition probabilities, which does not require equally spaced observations. This method takes into account the fact that the AEs are recurrent events. For a single sequence of binary responses $y_t$ ($t = 1, \ldots, n$), Zeger and Qaqish (1988) modelled a logit link function of the conditional mean $\mu_t$ at time t as

$\mathrm{logit}(\mu_t) = x_t'\beta + \sum_{i=1}^{g} \theta_i y_{t-i}$,

where $x_t$ is a vector of present and past covariates at time t and the $\theta$'s are stationary regression coefficients associated with the past y-observations. This is the Markov chain of order $g$ suggested by Cox (1970). Here $\exp(\theta_i)$ represents the odds of a positive response at time t given $y_{t-i} = 1$ relative to the odds given $y_{t-i} = 0$ after


adjusting for the other covariates. Since the $\theta$'s do not depend on t, this model represents a stationary Markov chain. This idea is extended to a repeated measures design in Diggle et al. (1994). For $g > 1$, the full likelihood cannot be written and therefore they relied on conditional likelihood and quasi-likelihood based inference. Since the conditional means rather than marginal means are modelled, the regression coefficients in such models are interpreted as being conditional on the past history and covariates. Zeger and Qaqish (1988) also applied the analysis of a time series to Poisson and gamma data. Recently, Azzalini (1994) considered a non-homogeneous Markov chain with time-varying transition probabilities for analyzing repeated measures binary data.
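A minimal sketch of the order-$g$ Markov logit model may help fix ideas. It assumes the linear predictor has the form $x_t'\beta + \sum_i \theta_i y_{t-i}$, with $\beta$ denoting the covariate coefficients, and it evaluates the log-likelihood conditional on the first $g$ responses. The function names and argument layout are hypothetical.

```python
import math
from typing import Sequence


def conditional_prob(x_t: Sequence[float], beta: Sequence[float],
                     theta: Sequence[float], past_y: Sequence[int]) -> float:
    """P(y_t = 1 | history) when logit(mu_t) = x_t'beta + sum_i theta_i * y_{t-i}.

    past_y[0] is y_{t-1}, past_y[1] is y_{t-2}, and so on.
    """
    eta = sum(x * b for x, b in zip(x_t, beta))
    eta += sum(th * y for th, y in zip(theta, past_y))
    return 1.0 / (1.0 + math.exp(-eta))


def conditional_loglik(y: Sequence[int], X: Sequence[Sequence[float]],
                       beta: Sequence[float], theta: Sequence[float]) -> float:
    """Log-likelihood of y_{g+1}, ..., y_n given the first g responses."""
    g = len(theta)
    total = 0.0
    for t in range(g, len(y)):
        past = [y[t - i] for i in range(1, g + 1)]
        mu = conditional_prob(X[t], beta, theta, past)
        total += math.log(mu) if y[t] == 1 else math.log(1.0 - mu)
    return total
```

With a zero covariate effect and $\theta_1 = \log 3$, the conditional odds of a positive response are 3 after a positive previous response and 1 after a negative one, so $\exp(\theta_1)$ is the ratio of the two odds.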

5.3. Within-subject designs

We introduced within-subject designs, commonly referred to as crossover designs, in Section 4. For a review of within-subject designs the readers are referred to Matthews (1988) and Jones and Kenward (1989). The primary advantages of allowing run-in and washout periods and obtaining baseline measurements at the start of successive treatment periods in a within-subject design are (1) to have a valid interpretation of carryover effects; (2) to increase the power for testing the significance of their differences; and (3) to obtain valid estimates of the treatment differences. The issues related to baseline measurements in the context of the 2 × 2 design are discussed in Patel (1990). In an incomplete block design each subject receives a subset of all treatments and consequently the influence of between-subject variability may appear on a test for treatment comparisons. It would therefore be desirable to consider a few of the most important prognostic variables as between-subject (time-independent) covariates. For example, in the treatment of respiratory disease or hyperglycemia, smoking is considered a good prognostic variable. Let us consider a design using a total of t treatments, p periods and m sequences where (a) the uth treatment sequence is administered to $n_u$ subjects, allocations being random; (b) the uth sequence has $k_u$ ($\le p$) distinct treatments, some being possibly repeated; (c) baseline measurements are used as within-subject (time-dependent) covariates; and (d) one or more prognostic factors are used as between-subject covariates when every subject does not receive all t treatments. We assume that the vector of period responses is multivariate normal. When the number of periods is large relative to the sample size, the power associated with the unrestricted dispersion matrix will not be satisfactory. We therefore assume that the vector of period responses follows a multivariate normal distribution with an ante-dependence structure of order $g$. Matthews (1990) considered AR(1) and moving average process models to evaluate the efficiency of a within-subject design relative to the variance structure that assumes independence of the errors. Let $y_{ij}$ be the treatment response for Period i and Subject j and $x_{ij}$ the corresponding baseline measurement, measured from the x-mean at Period i, $i = 1, \ldots, p$; $j = 1, \ldots, n$. We consider the model allowing unequal numbers of subjects per treatment sequence. Let $u_{kj}$ be the value of the kth ($k = 1, \ldots, q$) between-subject covariate, measured from its mean, for the jth subject.


We define $Y = (y_{ij})$, $X = (x_{ij})$ and $U = (u_{kj})$ as known matrices of orders $p \times n$, $p \times n$, and $q \times n$, respectively. Now we consider the model as

$Y = \mu A + \Gamma X + \beta U + E$, (5.1)

where $\mu$ is a $p \times m$ matrix of unknowns whose rows correspond to the periods and columns to the sequences, $A = (a_{ij})$ is a design matrix of order $m \times n$ so that $a_{ij} = 1$ if the jth subject is in the ith treatment sequence and 0 otherwise, $\Gamma = \mathrm{diag}(\gamma_1, \ldots, \gamma_p)$ with $\gamma_i$ being a within-subject regression parameter for the ith period, $\beta = (\beta_{ij})$ is a $p \times q$ matrix with $\beta_{ij}$ being a regression parameter associated with the ith period response and the jth between-subject covariate, and $E$ is a $p \times n$ error matrix. The model assumes that the columns of $E$ are independently distributed with a common multivariate normal distribution with the $p \times 1$ mean vector of zeros and a $p \times p$ dispersion matrix $\Sigma$. A model with several time-dependent and time-independent covariates for a repeated measures design with complete data was studied by Patel (1986). We did not allow missing values in Model (5.1). However, if missing values exist and if they are caused by premature withdrawals, i.e., when the data form a monotone pattern, this model can be modified and analyzed along the same lines as in Patel (1991). If a few intermediate values are missing, they can be imputed using the models at successive time points. This type of modelling is general for analyzing any within-subject design. Patel (1985) considered a special case for analyzing monotone data (missing data only at Period 2) without covariates from the 2 × 2 design assuming normality. Here it was assumed that the missing data at Period 2 are non-informative. We hope that future research will tackle non-normal distributions from the exponential family to analyze a class of within-subject designs with incomplete data.
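The dimensions in Model (5.1) are easy to mix up, so the following sketch builds the design matrix $A$ from a list of sequence labels and evaluates the mean structure $\mu A + \Gamma X + \beta U$ with plain Python lists. The helper names (`matmul`, `matadd`, `design_matrix`, `expected_response`) and the numbers used in the usage note are made up for illustration; only the matrix shapes follow the definitions above.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Product of an (r x k) and a (k x c) matrix."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def matadd(*ms: Matrix) -> Matrix:
    """Element-wise sum of matrices of equal shape."""
    return [[sum(m[i][j] for m in ms) for j in range(len(ms[0][0]))]
            for i in range(len(ms[0]))]


def design_matrix(sequence_of_subject: List[int], m: int) -> Matrix:
    """A is m x n with a_ij = 1 if subject j is in sequence i, else 0."""
    return [[1.0 if s == i else 0.0 for s in sequence_of_subject]
            for i in range(m)]


def expected_response(mu: Matrix, A: Matrix, gamma: List[float],
                      X: Matrix, beta: Matrix, U: Matrix) -> Matrix:
    """E(Y) = mu A + Gamma X + beta U, where Gamma = diag(gamma)."""
    p = len(gamma)
    Gamma = [[gamma[i] if i == j else 0.0 for j in range(p)] for i in range(p)]
    return matadd(matmul(mu, A), matmul(Gamma, X), matmul(beta, U))
```

For a 2 × 2 crossover with four subjects (p = 2, m = 2, n = 4, q = 1), calling `design_matrix([0, 0, 1, 1], 2)` places the first two subjects in the first sequence and the last two in the second, and `expected_response` then returns a 2 × 4 matrix whose jth column is the mean response profile of subject j.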

6. Discussion

A great deal of statistical expertise is required in both theory and computation for designing and analyzing today's clinical trials. Because of space limitations we have only briefly reviewed a few statistical topics. Such important topics as survival analysis, group sequential trials and multiple comparisons are widely applied in clinical trials, but are not included here. Several computer packages such as SAS, BMDP and S-PLUS are available for statistical analysis and are widely used by biostatisticians and data processing people. The industry, CROs (contract research organizations), academia and government employ a large number of biostatisticians whose primary responsibilities are to consult with clinical researchers on statistical designs, prepare analysis plans, analyze data and interpret results. Biostatisticians in the industry have to work in a somewhat restricted environment because of regulatory requirements.

There are some limitations of clinical trials. Because of a limited patient-time exposure to the treatment in clinical trials it is difficult to predict long-term toxicity. There have been some instances where an unacceptable level of toxicity was observed after the drug was marketed. Sometimes a sub-optimal (for both efficacy and safety) dose


enters the market. Another problem is confounding factors. For example, drug compliance and the use of concomitant medications are major factors that influence efficacy and safety measurements. Not only can clinical trials not simulate actual clinical practice, but it is also extremely complicated, if not impossible, to estimate the drug efficacy and safety after removing the influence of these factors. The importance of a good dose-response study cannot be overstated. More work is needed in both designing and analyzing such a study. Although some attempts have been made at comparing the treatment groups using incomplete longitudinal data, more research is needed in this area. When patient recruitment becomes a major problem, innovative designs, including within-subject designs, should be considered for a chronic disease. Another approach would be to recruit a large number of centers without worrying about a minimum number of patients per center and treat the centers as random effects. A meta-analysis where several independent studies are combined to increase the power of the test can be considered in the same spirit. The current environment of health policy demands an expanded role of clinical trials. Besides the drug efficacy and safety, the cost-effectiveness and quality of life are expected to play important roles in the future. There is an increasing interest in collecting data on medical care utilization and quality of life from clinical trials, especially for disabling diseases. Two evolving fields, Pharmacoeconomics and Pharmacoepidemiology, will hopefully help in offering clearer choices of treatment strategies to patients under the managed health care system.

Acknowledgements

I am grateful to Professors Peter Armitage and Subir Ghosh and a referee for helpful suggestions.

References

Anderson, S. and W. W. Hauck (1983). A new procedure for testing equivalence in comparative bioavailability and other clinical trials. Comm. Statist. Theory Methods 12, 2663-2692.
Anderson, S. and W. W. Hauck (1990). Consideration of individual bioequivalence. J. Pharmacokinetics and Biopharmaceutics 18, 259-273.
Anderson, T. W. (1957). Maximum likelihood estimates for a multivariate distribution when some observations are missing. J. Amer. Statist. Assoc. 52, 200-203.
Armitage, P. (1982). The assessment of low-dose carcinogenicity. In: Proceedings of the Memorial Symposium in Honor of Jerome Cornfield. Biometrics (Supplement: Current Topics in Biostatistics and Epidemiology) 38, 119-129.
Azzalini, A. (1994). Logistic regression for autocorrelated data with application to repeated measures. Biometrika 81, 767-775.
Beal, S. L. and L. B. Sheiner (1988). Heteroscedastic nonlinear regression. Technometrics 30, 327-338.
Beale, E. M. L. and R. J. A. Little (1975). Missing values in multivariate analysis. J. Roy. Statist. Soc. Ser. B 37, 129-146.
Beitler, P. J. and J. R. Landis (1985). A mixed effects model for categorical data. Biometrics 41, 991-1000.
Bhargava, R. P. (1975). Some one-sample testing problems when there is a monotone sample from a multivariate normal population. Ann. Inst. Statist. Math. 27, 327-339.


Breslow, N. E. (1984). Extra-Poisson variation in log-linear models. Appl. Statist. 33, 38-44.
Bristol, D. R. and H. I. Patel (1990). A Markovian model for comparing incidences of side effects. Statist. Med. 9, 803-809.
Byrne, P. J. and S. F. Arnold (1983). Inference about multivariate means for a nonstationary autoregressive model. J. Amer. Statist. Assoc. 78, 850-856.
Carr, G. J. and C. J. Portier (1993). An evaluation of some methods for fitting dose-response models to quantal-response developmental toxicology data. Biometrics 49, 779-792.
Carter, E. M. and J. J. Hubert (1985). Analysis of parallel-line assays with multivariate responses. Biometrics 41, 703-710.
Chakravorti, S. R. and J. E. Grizzle (1975). Analysis of data from multiclinic experiments. Biometrics 31, 325-338.
Chi, G. Y. H., H. M. J. Hung, S. D. Dubey and R. J. Lipicky (1994). Dose response studies and special populations. In: Proc. Amer. Statist. Assoc., Biopharmaceutical Section.
Chi, E. M. and G. C. Reinsel (1989). Models for longitudinal data with random effects and AR(1) errors. J. Amer. Statist. Assoc. 84, 452-459.
Chow, S. C. and J. P. Liu (1992). Design and Analysis of Bioavailability and Bioequivalence Studies. Marcel Dekker, New York.
Chuang, C. (1987). The analysis of titration study. Statist. Med. 6, 583-590.
Cole, J. W. L. and J. E. Grizzle (1966). Applications of multivariate analysis of variance to repeated measures experiments. Biometrics 22, 810-828.
Cornfield, J. (1977). Carcinogenic risk assessment. Science 198, 693-699.
Cox, D. R. (1970). Analysis of Binary Data. Chapman and Hall, London.
Cox, D. R. (1984). Interaction. Internat. Statist. Rev. 52, 1-31.
Crowder, M. J. (1985). Gaussian estimation for correlated binary data. J. Roy. Statist. Soc. Ser. B 47, 229-237.
Dempster, A. P., N. M. Laird and D. B. Rubin (1977). Maximum likelihood from incomplete data via the EM algorithm (with discussion). J. Roy. Statist. Soc. Ser. B 39, 1-38.
DerSimonian, R. and N. M. Laird (1986). Meta-analysis in clinical trials. Contr. Clin. Trials 7, 177-188.
Diggle, P. J. (1988). An approach to the analysis of repeated measurements. Biometrics 44, 959-971.
Diggle, P. J. and M. G. Kenward (1994). Informative dropout in longitudinal data analysis (with discussion). Appl. Statist. 43, 49-93.
Diggle, P. J., K.-Y. Liang and S. L. Zeger (1994). Analysis of Longitudinal Data. Oxford Univ. Press, Oxford.
Ekholm, B. R., T. L. Fox and J. A. Bolognese (1990). Dose-response: Relating doses and plasma levels to efficacy and adverse experiences. In: D. A. Berry, ed., Statistical Methodology in the Pharmaceutical Sciences. Marcel Dekker, New York.
Esinhart, J. D. and V. M. Chinchilli (1994). Extension to the use of tolerance intervals for the assessment of individual bioequivalence. J. Biopharm. Statist. 4, 39-52.
Fieller, E. C. (1954). Some problems in interval estimation (with discussion). J. Roy. Statist. Soc. Ser. B 16, 175-185.
Finney, D. J. (1971). Statistical Methods in Biological Assay, 2nd edn. Griffin, London.
Fitzmaurice, G. M., N. M. Laird and S. R. Lipsitz (1994). Analyzing incomplete longitudinal binary responses: A likelihood-based approach. Biometrics 50, 601-612.
Fitzmaurice, G. M., N. M. Laird and N. M. Rotnitsky (1993). Regression models for discrete longitudinal data (with discussion). Statist. Sci. 8, 248-309.
FDA (1985). Code of Federal Regulations 21 (Food and Drugs). Part 320.22.
Gabriel, K. R. (1962). Ante-dependence analysis of an ordered set of variables. Ann. Math. Statist. 33, 201-212.
Gallo, J. and A. I. Khuri (1990). Exact tests for the random and fixed effects in an unbalanced mixed two-way cross-classification model. Biometrics 46, 1087-1095.
Gibaldi, M. and D. Perrier (1982). Pharmacokinetics, 2nd edn. Marcel Dekker, New York.
Gilmour, A. R., R. D. Anderson and A. L. Rae (1985). The analysis of binomial data by a generalized linear mixed model. Biometrika 72, 593-599.


Grizzle, J. E. (1965). The two-period change-over design and its use in clinical trials. Biometrics 21, 467-480.
Grizzle, J. E. and D. M. Allen (1969). Analysis of growth and dose-response curves. Biometrics 25, 357-381.
Hartley, H. O. and R. R. Hocking (1971). The analysis of incomplete data. Biometrics 27, 783-808.
Harville, D. A. (1977). Maximum likelihood approaches to variance component estimation and to related problems. J. Amer. Statist. Assoc. 72, 320-340.
Hauschke, D., V. W. Steinijans and E. Diletti (1990). A distribution-free procedure for the statistical analysis of bioequivalence studies. Internat. J. Clinical Pharmacology, Therapy and Toxicology 28, 72-78.
Hills, M. and P. Armitage (1979). The two period crossover clinical trial. British J. Clinical Pharmacology 8, 7-20.
Hsu, J. C., J. T. Hwang, H. K. Liu and S. Ruberg (1994). Confidence intervals associated with tests for bioequivalence. Biometrika 81, 103-114.
Hubert, J. J., N. R. Bohidar and K. E. Peace (1988). Assessment of pharmacological activity. In: K. E. Peace, ed., Biopharmaceutical Statistics for Drug Development. Marcel Dekker, New York.
Jennrich, R. I. and M. D. Schluchter (1986). Unbalanced repeated measures models with structured covariance matrices. Biometrics 42, 805-820.
Jones, B. and M. G. Kenward (1989). Design and Analysis of Cross-over Trials. Chapman and Hall, London.
Jones, R. H. (1987). Serial correlation in unbalanced mixed models. Bull. Internat. Statist. Inst. Proc. 46th Session, Tokyo, 8-16 Sept. 1987, Book 4, 105-122.
Jones, R. H. (1993). Longitudinal Data with Serial Correlation: A State-Space Approach. Chapman and Hall, London.
Jones, R. H. and F. Boadi-Boateng (1991). Unequally spaced longitudinal data with AR(1) serial correlation. Biometrics 47, 161-175.
Kenward, M. G. (1987). A method for comparing profiles of repeated measurements. Appl. Statist. 36, 296-308.
Kenward, M. G. and B. Jones (1992). Alternative approaches to the analysis of binary and categorical repeated measurements. J. Biopharm. Statist. 2, 137-170.
Khatri, C. G. (1966). A note on a MANOVA model applied to problems in growth curves. Ann. Inst. Statist. Math. 18, 75-86.
Khatri, C. G. and H. I. Patel (1992). Analysis of a multicenter trial using a multivariate approach to a mixed linear model. Comm. Statist. Theory Methods 21, 21-39.
Kleinbaum, D. G. (1973). A generalization of the growth curve model which allows missing data. J. Multivariate Anal. 3, 117-124.
Koch, G. G., J. R. Landis, J. L. Freeman, D. H. Freeman and R. G. Lehnen (1977). A general methodology for the analysis of experiments with repeated measurements of categorical data. Biometrics 33, 133-158.
Korn, E. L., D. Midthune, T. T. Chen, L. V. Rubinstein, M. C. Christian and R. M. Simon (1994). A comparison of two Phase I trial designs. Statist. Med. 13, 1799-1806.
Krewski, D. and C. Brown (1981). Carcinogenic risk assessment: A guide to the literature. Biometrics 37, 353-366.
Laird, N. M. and J. H. Ware (1982). Random-effects models for longitudinal data. Biometrics 38, 963-974.
Laska, E. M. and M. J. Meisner (1989). Testing whether the identified treatment is best. Biometrics 45, 1139-1151.
Lee, J. W. and D. L. DeMets (1991). Sequential comparison of changes with repeated measurements data. J. Amer. Statist. Assoc. 86, 757-762.
Liang, K.-Y. and S. L. Zeger (1986). Longitudinal data analysis using generalized linear models. Biometrika 73, 13-22.
Lindstrom, M. J. and D. M. Bates (1990). Nonlinear mixed effects models for repeated measures data. Biometrics 46, 673-687.
Lipsitz, S. R., K. Kim and L. Zhao (1994). Analysis of repeated categorical data using generalized estimating equations. Statist. Med. 13, 1149-1163.
Lipsitz, S. R., N. M. Laird and D. P. Harrington (1991). Generalized estimating equations for correlated binary data: Using the odds ratio as a measure of association. Biometrika 78, 153-160.
Longford, N. T. (1993). Random Coefficient Models. Oxford Univ. Press, Oxford.


Longford, N. T. (1994). Logistic regression with random coefficients. Comput. Statist. Data Anal. 17, 1-15.
Mandallaz, D. and J. Mau (1981). Comparison of different methods for decision-making in bioequivalence assessment. Biometrics 37, 213-222.
Matthews, J. N. S. (1988). Recent developments in crossover designs. Internat. Statist. Rev. 56, 117-127.
Matthews, J. N. S. (1990). The analysis of data from crossover designs: The efficiency of ordinary least squares. Biometrics 46, 689-696.
Meier, K. L., A. J. Bailer and C. J. Portier (1993). A measure of tumorigenic potency incorporating dose-response shape. Biometrics 49, 917-926.
Metzler, C. M. (1974). Bioavailability: A problem in equivalence. Biometrics 30, 309-317.
Murray, G. D. and J. G. Findlay (1988). Correcting for the bias caused by drop-outs in hypertension trials. Statist. Med. 7, 941-946.
Ochi, Y. and R. L. Prentice (1984). Likelihood inference in a correlated probit regression model. Biometrika 71, 531-543.
OGD (1993). In vivo bioequivalence guidances. Pharmacopeial Forum 19, 6067-6077.
O'Quigley, J. and S. Chevret (1991). Methods for dose finding studies in cancer trials: A review and results of a Monte Carlo study. Statist. Med. 10, 1647-1664.
O'Quigley, J., M. Pepe and L. Fisher (1990). Continual reassessment method: A practical design for Phase I clinical trials in cancer. Biometrics 46, 33-48.
Patel, H. I. (1979). Analysis of covariance of incomplete data on experiments with repeated measurements in clinical trials. Proc. Fourth Conference of the SAS Users Group, International, 84-92. SAS Institute, Inc., Cary, NC.
Patel, H. I. (1985). Analysis of incomplete data in a two-period crossover design with reference to clinical trials. Biometrika 72, 411-418.
Patel, H. I. (1986). Analysis of repeated measures designs with changing covariates in clinical trials. Biometrika 73, 707-715.
Patel, H. I. (1990). Baseline measurements in a 2 × 2 crossover trial. In: K. E. Peace, ed., Statistical Issues in Pharmaceutical Development. Marcel Dekker, New York, 177-184.
Patel, H. I. (1991). Analysis of incomplete data from a clinical trial with repeated measurements. Biometrika 78, 609-619.
Patel, H. I. (1992). Sample size for a dose-response study. J. Biopharm. Statist. 2, 1-8.
Patel, H. I. (1994a). Dose-response in pharmacokinetics. Comm. Statist. Theory Methods 23, 451-465.
Patel, H. I. (1994b). Group sequential analysis of a clinical trial with repeated measurements. Comm. Statist. Theory Methods 23, 981-995.
Patel, H. I. (1994c). A repeated measures design with repeated randomization. J. Statist. Plann. Inference 42, 257-270.
Patel, H. I. and G. D. Gupta (1984). A problem of equivalence in clinical trials. Biom. J. 5, 471-474.
Patel, H. I. and C. G. Khatri (1981). Analysis of incomplete data in experiments with repeated measurements using a stochastic model. Comm. Statist. Theory Methods 22, 2259-2277.
Potthoff, R. F. and S. N. Roy (1964). A generalized multivariate analysis of variance model useful especially for growth curve problems. Biometrika 51, 313-326.
Prentice, R. L. and L. P. Zhao (1991). Estimating equations for parameters in means and covariances of multivariate discrete and continuous responses. Biometrics 47, 825-839.
Raghunathan, T. E. (1994). Monte Carlo methods for exploring sensitivity to distributional assumptions in a Bayesian analysis of a series of 2 × 2 tables. Statist. Med. 13, 1525-1538.
Raghunathan, T. E. and Y. Ii (1993). Analysis of binary data from a multicenter clinical trial. Biometrika 80, 127-139.
Rao, C. R. (1959). Some problems involving linear hypotheses in multivariate analysis. Biometrika 46, 49-58.
Rao, C. R. (1965). A theory of least squares when the parameters are stochastic and its application to the analysis of growth curves. Biometrika 52, 447-458.
Rao, C. R. and J. Kleffe (1988). Estimation of Variance Components and Applications. North-Holland, Amsterdam.
Rom, D. A., R. J. Costello and L. T. Connell (1994). On closed test procedures for dose-response analysis. Statist. Med. 13, 1583-1596.

Clinical trials in drug development: Some statistical issues


Rowland, M. and T. N. Tozer (1980). Clinical Pharmacokinetics: Concepts and Applications. Lea and Febiger, Philadelphia, PA.
Rubin, D. B. (1976). Inference and missing data. Biometrika 63, 581-592.
Rutter, C. M. and R. M. Elashoff (1994). Analysis of longitudinal data: Random coefficient regression modelling. Statist. Med. 13, 1211-1231.
Schall, R. (1991). Estimation in generalized linear models with random effects. Biometrika 78, 719-727.
Schall, R. and H. G. Luus (1993). On population and individual bioequivalence. Statist. Med. 12, 1109-1124.
Scheffé, H. (1959). The Analysis of Variance. Wiley, New York.
Schoenfeld, D. A. (1986). Confidence bounds for normal means under order restrictions, with application to dose-response curves, toxicological experiments, and low-dose extrapolation. J. Amer. Statist. Assoc. 81, 186-195.
Schuirmann, D. J. (1987). A comparison of the two one-sided tests procedure and the power approach for assessing the equivalence of average bioavailability. J. Pharmacokin. Biopharm. 15, 657-680.
Searle, S. R., G. Casella and C. E. McCulloch (1992). Variance Components. Wiley, New York.
Selwyn, M. R. (1988). Preclinical safety development. In: K. E. Peace, ed., Biopharmaceutical Statistics for Drug Development. Marcel Dekker, New York.
Sheiner, L. B. and S. L. Beal (1983). Evaluation of methods for estimating population pharmacokinetic parameters. III. Nonexperimental model: Routine clinical pharmacokinetic data. J. Pharmacokin. Biopharm. 11, 303-319.
Shen, C. F. and B. Iglewicz (1994). Robust and bootstrap testing procedures for bioequivalence. J. Biopharmaceut. Statist. 4, 65-90.
Shih, W. J., A. L. Gould and I. K. Hwang (1989). The analysis of titration studies in Phase III clinical trials. Statist. Med. 8, 583-591.
Snapinn, S. M. (1987). Evaluating the efficacy of a combination therapy. Statist. Med. 6, 657-665.
Srivastava, M. S. (1986). Multivariate bioassay, combination of bioassays, and Fieller's theorem. Biometrics 42, 131-141.
SAS Institute Inc. (1990). SAS/IML Software, Version 6. Cary, NC.
SAS Institute Inc. (1992). SAS Technical Report P-229. Software: Changes and Enhancements, Release 6.07. Chapter 16: The MIXED procedure. Cary, NC.
Stiratelli, R., N. M. Laird and J. H. Ware (1984). Random-effects model for several observations with binary response. Biometrics 40, 961-971.
Storer, B. E. (1989). Design and analysis of Phase I clinical trials. Biometrics 45, 925-937.
Strijbosch, L. W. G., R. J. M. M. Does and W. Albers (1990). Design methods for some dose-response models. Statist. Med. 9, 1353-1363.
Van Ryzin, J. and K. Rai (1987). A dose-response model incorporating nonlinear kinetics. Biometrics 43, 95-105.
Vonesh, E. F. and R. L. Carter (1992). Mixed-effects nonlinear regression for unbalanced repeated measures. Biometrics 48, 1-17.
Wagner, J. E. (1971). Biopharmaceutics and Relevant Pharmacokinetics. Drug Intelligence Publications, Hamilton, IL.
Wellek, S. (1993). Basing the analysis of comparative bioavailability trials on an individualized statistical definition of equivalence. Biom. J. 1, 47-55.
Westlake, W. J. (1972). Use of confidence intervals in analysis of comparative bioavailability trials. J. Pharmac. Sci. 61, 1340-1341.
Westlake, W. J. (1976). Symmetrical confidence intervals for bioequivalence trials. Biometrics 32, 741-744.
Westlake, W. J. (1988). Bioavailability and bioequivalence of pharmaceutical formulations. In: K. E. Peace, ed., Biopharmaceutical Statistics for Drug Development. Marcel Dekker, New York, 329-352.
Williams, D. A. (1972). The comparison of several dose levels with a zero dose control. Biometrics 28, 519-531.
Williams, E. J. (1950). Experimental designs balanced for pairs of residual effects. Austral. J. Sci. Res. 3, 351-363.
WHO, Regional Office for Europe (1986). Guidelines for the Investigation of Bioavailability. Copenhagen.

H. I. Patel

Wu, M. C. and K. R. Bailey (1989). Estimation and comparison of changes in the presence of informative right censoring: Conditional linear model. Biometrics 45, 939-955.
Wu, M. C. and R. J. Carroll (1988). Estimation and comparison of changes in the presence of right censoring by modeling the censoring process. Biometrics 44, 175-188.
Yuan, W. and A. M. Kshirsagar (1993). Analysis of multivariate parallel-line bioassay with composite responses and composite doses, using canonical correlations. J. Biopharm. Statist. 3, 57-72.
Zeger, S. L. and M. R. Karim (1991). Generalized linear models with random effects: A Gibbs sampling approach. J. Amer. Statist. Assoc. 86, 79-86.
Zeger, S. L. and K.-Y. Liang (1986). Longitudinal data analysis for discrete and continuous outcomes. Biometrics 42, 121-130.
Zeger, S. L., K.-Y. Liang and P. S. Albert (1988). Models for longitudinal data: A generalized estimating equation approach. Biometrics 44, 1049-1060.
Zeger, S. L. and B. Qaqish (1988). Markov regression models for time series: A quasi-likelihood approach. Biometrics 44, 1019-1032.

S. Ghosh and C. R. Rao, eds., Handbook of Statistics, Vol. 13 © 1996 Elsevier Science B.V. All rights reserved.

Optimal Crossover Designs

John Stufken

1. Introduction

As for many other designs, the use of crossover designs originated in the agricultural sciences. Crossover designs, also known as change-over or repeated measurements designs, were first used in animal feeding trials. Some early references and a small example, providing only part of the entire data set, are presented in Cochran and Cox (1957). Currently crossover designs have applications in many other sciences and research areas; examples are listed in Kershner and Federer (1981) and Afsarinejad (1990). The use of crossover designs in pharmaceutical studies and clinical trials now receives perhaps more attention than applications in any other area. For some examples and further discussion and references, the reader may want to consult the recent books by Jones and Kenward (1989), Ratkowsky et al. (1992) and Senn (1993).

The principal idea associated with crossover designs is to use a number of available units for several measurements at different occasions. We will refer to these units as subjects, and in many applications they are humans, animals or plots of land. The different occasions at which the subjects are used are known as periods. We will assume that the main purpose of the experiment consists of the comparison of $t$ treatments. Each subject will receive a treatment at each of $p$ periods, and a relevant measurement is obtained for each subject in each period. A subject may receive a different treatment in each period, but treatments may also be repeated on a subject. If we denote the number of subjects by $n$, then we may think of a crossover design as a $p \times n$ matrix with entries from $\{1, \ldots, t\}$, where the entry in position $(i, j)$ denotes the treatment that subject $j$ receives in the $i$th period. Corresponding to such a design we will also have an array of $pn$ observable random variables $y_{ij}$, whose values will be determined by the measurements to be made. We will assume that these measurements are of a continuous nature.

One possible motive for using a crossover design, as opposed to using each of $pn$ subjects for one measurement, is that a crossover design requires fewer subjects for the same number of observations. This can obviously be an important consideration when subjects are scarce and when including a large number of subjects in the experiment can be prohibitively expensive. Another possible motive for using crossover designs is that these designs provide within subject information about treatment differences. In many applications the different subjects would exhibit large natural differences, and


inferences concerning treatment comparisons based on between subject information (available if subject effects are assumed to be random effects) would require a much larger replication of the treatments in order to achieve the same precision as inferences based on within subject information. Indeed, designs are at times chosen based on the within subject information that they provide, and the between subject information is conveniently ignored. There are however also various potential problems with the use of crossover designs. Firstly, compared to using each subject only once, the duration of an experiment when using a crossover design may be considerably longer. Typically, it is therefore undesirable to have a large number of periods. Secondly, we may have to deal with carry-over effects. Measurements may not only be affected by the treatment assigned most recently to a subject, but could also be affected by lingering effects of treatments that were assigned to the same subject in one of the previous periods. Such lingering effects are called carry-over effects. One way to avoid or reduce the problem of carry-over effects is to use wash-out periods between periods in which measurements are made. The idea is that the effect of a previously given treatment can wear out during this wash-out period. Use of washout periods will however further increase the duration of the experiment, and may in some cases meet with ethical objections. (It is hard to deny a pain killer to a suffering patient just because he or she happens to be in a wash-out period!) A third potential problem with crossover designs is that an assumption of uncorrelated error terms may not always be reasonable. It may be more realistic to view the data as n short time series, one for each subject. Different error structures may affect recommendations concerning choice of design. We will return to this issue in Section 5. 
Fourthly, the use of crossover designs is of course limited to situations where a treatment does not essentially alter a subject. Crossover designs may be fine if the treatments alleviate a symptom temporarily, but not if the treatments provide a cure for the condition that a subject suffers from. This chapter will focus on the choice of design when a crossover design is to be used. Selected results concerning optimal design choices will be discussed. Of course, while the discussion will concentrate on statistical considerations for selecting a design, in any application there may be practical constraints that should be taken into account. The results concerning optimal designs should only be used as a guide to select good designs, or to avoid the selection of very poor designs. The literature on crossover designs is quite extensive. Many different models have been considered, and different models may result in different recommendations concerning design selection. This chapter represents therefore, inevitably, a selection of available results that is biased by the author's personal interest. There are however a number of other recent review papers and book chapters on crossover designs; the interested reader is referred to these sources for further details, additional references, and, possibly, bias in a different direction. Good sources, in addition to the aforementioned recent books, are Afsarinejad (1990), Barker et al. (1982), Bishop and Jones (1984), Matthews (1988, 1994), Shah and Sinha (1989, Chapter 6) and Street (1989).


2. Terminology and notation

The approach throughout this chapter will be to assume a linear model for the observable random variables $y_{ij}$. While different options are possible for such a model, once we settle for a model we will want to address the question of selecting an optimal design for inferences concerning the treatment effects or the carry-over effects. By $1_a$ and $0_a$ we will mean the $a \times 1$ vectors of 1's and 0's, respectively. By $I_a$ and $0_{a \times b}$ we will mean the $a \times a$ identity matrix and the $a \times b$ matrix with all entries equal to 0, respectively. The basic linear model that we will use is the model

\[
y_{ij} = \mu + \alpha_i + \beta_j + \tau_{d(i,j)} + \gamma_{d(i-1,j)} + \varepsilon_{ij}, \qquad i \in \{1, 2, \ldots, p\}, \quad j \in \{1, 2, \ldots, n\},
\]

where $d(i,j)$ stands for the treatment that is assigned to subject $j$ in period $i$ under design $d$. One may think of $\mu$ as a general mean, of $\alpha_i$ as an effect due to the $i$th period, of $\beta_j$ as an effect due to the $j$th subject, of $\tau_{d(i,j)}$ as a treatment effect due to treatment $d(i,j)$, and of $\gamma_{d(i-1,j)}$ as a carry-over effect due to treatment $d(i-1,j)$. For the latter, we define $\gamma_{d(0,j)} = 0$. The $\varepsilon_{ij}$ are the non-observable random error terms. In matrix notation we will write our model as

\[
y = \mu 1_{pn} + X_1 \alpha + X_2 \beta + X_{d3} \tau + X_{d4} \gamma + \varepsilon, \tag{2.1}
\]

where $y = (y_{11}, y_{21}, \ldots, y_{pn})'$, $\alpha = (\alpha_1, \ldots, \alpha_p)'$, $\beta = (\beta_1, \ldots, \beta_n)'$, $\tau = (\tau_1, \ldots, \tau_t)'$, $\gamma = (\gamma_1, \ldots, \gamma_t)'$, $\varepsilon = (\varepsilon_{11}, \varepsilon_{21}, \ldots, \varepsilon_{pn})'$, the $pn \times p$ and $pn \times n$ matrices $X_1$ and $X_2$ are

\[
X_1 = \begin{bmatrix} I_p \\ \vdots \\ I_p \end{bmatrix} = 1_n \otimes I_p, \qquad
X_2 = \begin{bmatrix} 1_p & 0_p & \cdots & 0_p \\ 0_p & 1_p & \cdots & 0_p \\ \vdots & \vdots & \ddots & \vdots \\ 0_p & 0_p & \cdots & 1_p \end{bmatrix} = I_n \otimes 1_p,
\]

and the $pn \times t$ matrices $X_{d3}$ and $X_{d4}$, which are design dependent, are

\[
X_{d3} = \begin{bmatrix} X_{d31} \\ X_{d32} \\ \vdots \\ X_{d3n} \end{bmatrix}, \qquad
X_{d4} = \begin{bmatrix} X_{d41} \\ X_{d42} \\ \vdots \\ X_{d4n} \end{bmatrix},
\]


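where $X_{d3j}$ stands for the $p \times t$ period-treatment incidence matrix for subject $j$ under design $d$ and where $X_{d4j} = L X_{d3j}$ with the $p \times p$ matrix $L$ defined as

\[
L = \begin{bmatrix}
0 & 0 & \cdots & 0 & 0 \\
1 & 0 & \cdots & 0 & 0 \\
0 & 1 & \cdots & 0 & 0 \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & \cdots & 1 & 0
\end{bmatrix}.
\]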
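As a small illustration of these definitions, the following Python sketch builds the incidence matrix $X_{d3j}$ and the carry-over matrix $X_{d4j} = L X_{d3j}$ for a single subject. The treatment sequence and all function names are hypothetical choices made for this example; they do not come from the chapter.

```python
def incidence(seq, t):
    """p x t period-treatment incidence matrix for one subject's sequence."""
    return [[1 if treatment == u else 0 for u in range(1, t + 1)]
            for treatment in seq]


def lag_matrix(p):
    """p x p matrix L with ones on the first subdiagonal, zeros elsewhere."""
    return [[1 if i == k + 1 else 0 for k in range(p)] for i in range(p)]


def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


seq = [1, 2, 2, 3]                    # hypothetical sequence: p = 4, t = 3
X3 = incidence(seq, 3)                # row i marks the treatment in period i
X4 = matmul(lag_matrix(4), X3)        # row i marks the treatment in period i - 1

for row3, row4 in zip(X3, X4):
    print(row3, row4)
```

Multiplying by $L$ shifts the rows of $X_{d3j}$ down by one period and leaves a zero first row, which matches the convention $\gamma_{d(0,j)} = 0$.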

For the model in (2.1) we will assume that $\varepsilon$ follows a multivariate normal distribution with mean $0_{pn}$ and variance-covariance matrix $\sigma^2 V$, for an unknown scalar $\sigma^2$ and a $pn \times pn$ positive definite matrix $V$, to be specified later.

Some comments concerning the model in (2.1) are in order. Firstly, all of the effects, including subject effects, are assumed to be fixed effects. While for many applications it may be quite reasonable to take the subject effects as random effects, with a relatively large between subject variability it may also in those cases be quite reasonable to make a design choice based on within subject information only. As explained earlier, in most applications the latter would be by far the more precise source of information. For some references and results on optimal design choice when between subject information is also considered, see Mukhopadhyay and Saha (1983), Shah and Sinha (1989) and Carrière and Reinsel (1993).

Secondly, while the model includes carry-over effects, it only allows for the possibility of first-order carry-over effects; only the treatment that was used in the period immediately preceding the current period is considered to have a possible lingering effect on a measurement in the current period. The model also reflects the assumption that there is no carry-over effect for measurements in the first period. Some have called this a non-circular model (see Shah and Sinha, 1989), in contrast to a circular model (Magda, 1980). A circular model would be a model where there is also a carry-over effect for measurements in the first period, as a result of treatments given to the subjects in a preperiod. But the use of preperiods is rather uncommon and unintelligible in many applications. For some results on optimal choice of design in the presence of a preperiod see Afsarinejad (1990).

Thirdly, in some applications additional information on the subjects may be available through concomitant variables. Measurements could, for example, be taken on each subject at the beginning of the experiment or at the beginning of each period. Such so-called baseline measurements could be used in various ways when analyzing data from a crossover design. The model in (2.1) does not include use of such information. Use of baseline measurements can also be incorporated in the design selection problem; see, for example, Laska and Meisner (1985) and Carrière and Reinsel (1992) for the basic ideas.

Fourthly, the model in (2.1) does not include any interactions. It assumes, for example, that the period effects are the same for each of the subjects. It also assumes that the effect of a treatment is the same no matter which treatment contributes the carry-over

Table 2.1
Notation

Symbol               Description
$n_{duj}$            The number of times that treatment $u$ is assigned to subject $j$
$\tilde{n}_{duj}$    The number of times that treatment $u$ is assigned to subject $j$ in the first $p - 1$ periods
$l_{dui}$            The number of times that treatment $u$ is assigned to a subject in period $i$
$m_{duv}$            The number of times that treatment $u$ is immediately preceded by treatment $v$
$r_{du}$             The replication of treatment $u$ throughout the design
$\tilde{r}_{du}$     The replication of treatment $u$ restricted to the first $p - 1$ periods

effect. If these assumptions seem unreasonable, recommendations concerning design choices in the next sections may also be quite unreasonable. For some alternative models see, for example, Kershner and Federer (1981).

By $\mathrm{Tr}(A)$ we will mean the trace of a square matrix $A$. We will say that an $a \times a$ matrix $A$ is completely symmetric, abbreviated as c.s., if $A = b_1 I_a + b_2 1_a 1_a'$ for some constants $b_1$, $b_2$. Following Kunert (1985), for any matrix $A$ we will write $\omega(A)$ to denote the orthogonal projection matrix onto the column space of $A$, that is, $\omega(A) = A(A'A)^{-}A'$. The following basic result on orthogonal projection matrices of partitioned matrices will be quite useful.

LEMMA 2.1. For an $a \times b$ matrix $X = [Y \;\; Z]$ we have that $\omega(X) = \omega(Y) + \omega((I_a - \omega(Y))Z)$.

Our notation and terminology will to a large extent follow that of Cheng and Wu (1980). By $\Omega_{t,n,p}$ we will denote the class of all crossover designs for $t$ treatments, $n$ subjects and $p$ periods. We will say that a design $d \in \Omega_{t,n,p}$ is uniform on the periods if $d$ assigns each treatment to $n/t$ subjects in each period. A design $d \in \Omega_{t,n,p}$ is uniform on the subjects if $d$ assigns each treatment $p/t$ times to each subject. A design is said to be uniform if it is uniform on the periods and uniform on the subjects. We will also make extensive use of the notation presented in Table 2.1. We also define $l_{du0} = 0$.

Among the crossover designs with some desirable properties, as we will see, are those that are balanced or strongly balanced for carry-over effects. A crossover design is said to be balanced for carry-over effects, or balanced for brevity, if no treatment is immediately preceded by itself, and each treatment is immediately preceded by each of the other treatments equally often. A crossover design is called strongly balanced for carry-over effects, or just strongly balanced, if each treatment is immediately preceded by each of the treatments equally often.

Orthogonal arrays of Type I form a useful class of designs when searching for optimal crossover designs if $p \leq t$. These arrays were introduced by Rao (1961). Formally, we define an orthogonal array of Type I and strength 2 as a $p \times n$ array with entries from $\{1, 2, \ldots, t\}$ such that any $2 \times n$ subarray contains all $t(t-1)$ ordered 2-tuples without repetition equally often. We denote such an array by $OA_I(n, p, t, 2)$. Clearly, such an array can only exist if $n$ is a multiple of $t(t-1)$.
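The definitions of uniformity and balance can be checked mechanically for any given design. The sketch below is one possible way to do so in Python, with a design stored as a list of period rows; the function names and the two small example designs are assumptions made for illustration only.

```python
from collections import Counter


def is_uniform_on_periods(design, t):
    """Each treatment is assigned to n/t subjects in every period."""
    n = len(design[0])
    return all(Counter(row)[u] * t == n
               for row in design for u in range(1, t + 1))


def is_uniform_on_subjects(design, t):
    """Each treatment is assigned p/t times to every subject."""
    p = len(design)
    return all(Counter(col)[u] * t == p
               for col in zip(*design) for u in range(1, t + 1))


def carryover_counts(design):
    """m[(u, v)]: number of times treatment u is immediately preceded by v."""
    m = Counter()
    for prev_row, row in zip(design, design[1:]):
        for v, u in zip(prev_row, row):
            m[(u, v)] += 1
    return m


def is_balanced(design, t):
    m = carryover_counts(design)
    ts = range(1, t + 1)
    return (all(m[(u, u)] == 0 for u in ts)
            and len({m[(u, v)] for u in ts for v in ts if u != v}) == 1)


def is_strongly_balanced(design, t):
    m = carryover_counts(design)
    ts = range(1, t + 1)
    return len({m[(u, v)] for u in ts for v in ts}) == 1


two_period = [[1, 2],
              [2, 1]]                 # hypothetical design: t = 2, n = 2, p = 2
three_period = two_period + [[2, 1]]  # same design with the last period repeated

print(is_uniform_on_periods(two_period, 2), is_balanced(two_period, 2))
print(is_strongly_balanced(three_period, 2))
```

The counts returned by `carryover_counts` correspond to the quantities $m_{duv}$ of Table 2.1; the other entries of that table could be tallied in the same way.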

Table 2.2
Examples of orthogonal arrays of Type I

$OA_I(2, 2, 2, 2)$:

1 2
2 1

$OA_I(6, 3, 3, 2)$:

1 1 2 2 3 3
2 3 1 3 1 2
3 2 3 1 2 1

$OA_I(12, 4, 4, 2)$:

1 1 1 2 2 2 3 3 3 4 4 4
2 3 4 1 3 4 1 2 4 1 2 3
4 2 3 3 4 1 4 1 2 2 3 1
3 4 2 4 1 3 2 4 1 3 1 2
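The defining property of an orthogonal array of Type I is easy to verify by direct counting. The following Python sketch does this for the 12-column array on four symbols in Table 2.2, entered column by column; the function name `is_oa_type1` is a hypothetical helper, and the check assumes $t \geq 2$.

```python
from collections import Counter
from itertools import combinations, permutations


def is_oa_type1(array, t):
    """True if every pair of rows contains each ordered pair of distinct
    symbols equally often and no pair with a repeated symbol (t >= 2)."""
    n = len(array[0])
    distinct_pairs = set(permutations(range(1, t + 1), 2))
    if n % len(distinct_pairs) != 0:
        return False
    times = n // len(distinct_pairs)
    for row_a, row_b in combinations(array, 2):
        counts = Counter(zip(row_a, row_b))
        if set(counts) != distinct_pairs:
            return False
        if any(c != times for c in counts.values()):
            return False
    return True


# Columns (subjects) of the 4 x 12 array from Table 2.2.
columns = ["1243", "1324", "1432", "2134", "2341", "2413",
           "3142", "3214", "3421", "4123", "4231", "4312"]
oa = [[int(col[i]) for col in columns] for i in range(4)]

print(is_oa_type1(oa, 4))
print(is_oa_type1([[1, 2], [1, 2]], 2))   # repeated symbols within columns
```

Because the property is symmetric in the two rows of a subarray, it suffices to examine each unordered pair of rows once.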

Orthogonal arrays of Type I are closely related to orthogonal arrays. Orthogonal arrays were introduced by Rao (1947). An orthogonal array of strength 2 is a $p \times n$ array with entries from $\{1, 2, \ldots, t\}$ such that any $2 \times n$ subarray contains all $t^2$ ordered 2-tuples equally often. We denote such an array by $OA(n, p, t, 2)$; a necessary condition for its existence is that $n$ is a multiple of $t^2$. A forthcoming book by Hedayat, Sloane and Stufken (1996) contains an overview of existence and construction results for orthogonal arrays. For the purpose of this chapter, the following two well known results are sufficient.

LEMMA 2.2. A necessary condition for the existence of an $OA(t^2, p, t, 2)$ is that $p$ [...]

[...] 2, then $\tilde{r}_{du}$ is the same for all $u$.

Compared to Theorems 3.1 through 3.4 and 3.7, the additional difficulty in Theorems 3.8 and 3.9 is that for a uniform balanced crossover design the matrix $C_{d12}$ is not $0_{t \times t}$. In applying Theorem 2.1, this makes it tremendously difficult to show that $\mathrm{Tr}(C_d)$ or $\mathrm{Tr}(\tilde{C}_d)$ are maximized by uniform balanced crossover designs. This leads to the restrictions for the subclasses as defined in Theorems 3.8 and 3.9. And, indeed, these restrictions are essential because the two traces are in general not maximized over the entire class $\Omega_{t,n,p}$ by uniform balanced designs. For example, if $p = t$ and $n \equiv 0 \pmod{t(t-1)}$, the universally optimal designs in Theorem 3.7 have a larger value for $\mathrm{Tr}(C_d)$ than uniform balanced designs. Under the same conditions, the efficient designs for $\gamma$ suggested in Stufken (1991), as discussed in Subsection 3.3, have a larger value for $\mathrm{Tr}(\tilde{C}_d)$ than uniform balanced designs. Uniform balanced designs for $p = t$ have, as an immediate consequence of the definition, all $m_{duu}$ values equal to 0. The designs that improve on uniform balanced designs suggest that this is not ideal. They also suggest that good designs for $\gamma$ tend to have larger values for $m_{duu}$ than do good designs for $\tau$ (see Subsection 3.3). As a result, the efficiency of uniform balanced designs for $p = t$ is smaller for $\gamma$ than it is for $\tau$. For some further discussion see Kunert (1984), who also shows the following result.

THEOREM 3.10. A uniform balanced crossover design with $n = \lambda_1 t$ and $p = t$ is universally optimal for $\tau$ in $\Omega_{t,n,p}$ if (i) $\lambda_1 = 1$ and $t \geq 3$, or (ii) $\lambda_1 = 2$ and $t \geq 6$.

Thus for small values of $n$, some uniform balanced designs are universally optimal for $\tau$ in $\Omega_{t,n,p}$. This is also consistent with the recommendations in Stufken (1991).

Necessary conditions for the existence of a uniform balanced design in $\Omega_{t,n,p}$ are (i) $n = \lambda_1 t$ for a positive integer $\lambda_1$, (ii) $p = \lambda_2 t$ for a positive integer $\lambda_2$, and (iii) $\lambda_1(\lambda_2 - 1) \equiv 0 \pmod{t - 1}$. The first two conditions are needed for the uniformity, while the last condition is a consequence of $n(p - 1) \equiv 0 \pmod{t(t-1)}$ which


is required for the balance. We will first address the existence question of uniform balanced designs for $\lambda_1 = \lambda_2 = 1$ and will then return to the more general case.

A uniform balanced crossover design in $\Omega_{t,t,t}$ exists for all even values of $t$, but only for some odd values. A construction for even values of $t$ is due to Williams (1949). Take the first column, say $a_t$, to be $(1, t, 2, t-1, \ldots, t/2 - 1, t/2 + 2, t/2, t/2 + 1)'$, and define $a_l$, $l = 1, 2, \ldots, t - 1$, by $a_l = a_t + l$, where $l$ is added to every entry of $a_t$ modulo $t$. The $t \times t$ array

\[
A_t = [\, a_1 \;\; \cdots \;\; a_t \,] \tag{3.7}
\]

is then the desired uniform balanced crossover design. For odd values of $t$ it is known, among other results, that a uniform balanced design in $\Omega_{t,t,t}$ does not exist for $t = 3, 5, 7$ and does exist for $t = 9$. (See Afsarinejad, 1990.) The existence problem is thus much more difficult for odd $t$ than for even $t$. What is known for every odd value of $t$ is that a uniform balanced design exists in $\Omega_{t,2t,t}$. To see this, let $b_t = (1, t, 2, t-1, \ldots, (t+5)/2, (t-1)/2, (t+3)/2, (t+1)/2)'$. For $l = 1, 2, \ldots, t - 1$, let $b_l = b_t + l$, and for $l = 1, \ldots, t$, let $c_l = -b_l$, where computations are again modulo $t$. The $t \times 2t$ array

\[
B_t = [\, b_1 \;\; \cdots \;\; b_t \;\; c_1 \;\; \cdots \;\; c_t \,] \tag{3.8}
\]

is then the desired uniform balanced crossover design. More general results concerning the existence of uniform balanced crossover designs are formulated in the following theorem.

THEOREM 3.11. The necessary conditions for the existence of a uniform balanced crossover design in $\Omega_{t,n,p}$ are also sufficient if $t$ is even. For odd $t$, a uniform balanced crossover design exists in $\Omega_{t,n,p}$ if in addition to the necessary conditions, $n/t$ is even.

PROOF. Let $\lambda_1 = n/t$ and $\lambda_2 = p/t$ be integers with $\lambda_1(\lambda_2 - 1) \equiv 0 \pmod{t - 1}$. If $t$ is even, let $D = (d_{ij})$ be a $\lambda_2 \times \lambda_1$ matrix with entries from $\{1, 2, \ldots, t\}$ such that, when computed modulo $t$, among the $\lambda_1(\lambda_2 - 1)$ differences $d_{ij} - d_{(i-1)j}$, $i = 2, \ldots, \lambda_2$, $j = 1, \ldots, \lambda_1$, the numbers $0, 1, \ldots, t/2 - 1, t/2 + 1, \ldots, t - 1$ all appear equally often. (Clearly, a matrix $D$ as required exists for all positive integers $\lambda_1$, $\lambda_2$ and even $t$ as in this proof.) With $A_t$ as in (3.7), define $A_l = A_t + l$ for $l = 1, 2, \ldots, t - 1$, where the addition is modulo $t$. Define a $p \times n$ array $A$ by

\[
A = \begin{bmatrix}
A_{d_{11}} & \cdots & A_{d_{1\lambda_1}} \\
\vdots & \ddots & \vdots \\
A_{d_{\lambda_2 1}} & \cdots & A_{d_{\lambda_2 \lambda_1}}
\end{bmatrix}.
\]

Using the definition of $A_l$ and the properties of $D$, it is easily verified that $A$ is a uniform balanced crossover design. If $t$ is odd and $\lambda_1$ is even, construct a $\lambda_2 \times \lambda_1/2$ matrix $D = (d_{ij})$ with entries from $\{1, 2, \ldots, t\}$ such that, when computed modulo $t$, among the $(\lambda_1/2)(\lambda_2 - 1)$


differences $d_{ij} - d_{(i-1)j}$, $i = 2, \ldots, \lambda_2$, $j = 1, \ldots, \lambda_1/2$, the $(t-1)/2$ numbers $(t+3)/2, (t+7)/2, \ldots, (t-3)/2$, reduced modulo $t$ where needed, all appear equally often. It is again obvious that such a matrix $D$ exists. With $B_t$ as in (3.8), let $B_l$, $l = 1, 2, \ldots, t - 1$, be defined as $B_t + l$, with addition again modulo $t$. The $p \times n$ array $B$ defined by

\[
B = \begin{bmatrix}
B_{d_{11}} & \cdots & B_{d_{1(\lambda_1/2)}} \\
\vdots & \ddots & \vdots \\
B_{d_{\lambda_2 1}} & \cdots & B_{d_{\lambda_2(\lambda_1/2)}}
\end{bmatrix}
\]

is then a uniform balanced crossover design. Verification of this result is straightforward, and therefore omitted. []

As pointed out earlier, the condition that $\lambda_1$ is even when $t$ is odd is not a necessary condition for the existence of uniform balanced crossover designs with $n = \lambda_1 t$.

We now return to the existence of strongly balanced crossover designs that are uniform on the periods and uniform on the units when restricted to the first $p - 1$ periods. Cheng and Wu (1980) observed that such strongly balanced designs can be constructed if $p = t + 1$ from a uniform balanced design with $p = t$ by repeating the $t$th period in the balanced design as period $t + 1$ in the strongly balanced design. The following is a simple extension of their idea.

THEOREM 3.12. Let $n = \lambda_1 t$ and $p = \lambda_2 t + 1$, where $\lambda_1$ and $\lambda_2$ are positive integers with $\lambda_1(\lambda_2 - 1) \equiv 0 \pmod{t - 1}$. Then there exists a strongly balanced crossover design in $\Omega_{t,n,p}$ that is uniform on the periods and uniform on the units when restricted to the first $p - 1$ periods if (i) $t$ is even, or (ii) $t$ is odd and $\lambda_1$ is even.

PROOF. For even $t$, let $A_l$ be defined as in the proof of Theorem 3.11. Use $A_t$ and $A_{t/2}$ to form a $\lambda_2 t \times \lambda_1 t$ array as follows. For the first $t$ rows juxtapose $\lambda_1$ copies of $A_t$. For the next $t$ rows juxtapose $\lambda_1$ copies of $A_{t/2}$. Continue like this, alternating between $A_t$ and $A_{t/2}$, until $\lambda_2 t$ rows are obtained. Add one more row to this array, identical to row $\lambda_2 t$. This gives the desired strongly balanced crossover design. If $t$ is odd and $\lambda_1$ is even, let $B_t$ be defined as in (3.8). We start by making a $\lambda_2 t \times \lambda_1 t$ array as follows. For the first $t$ rows juxtapose $\lambda_1/2$ copies of $B_t$. For the next $t$ rows do the same thing, but permute the columns within each copy of $B_t$ such that the first row of this juxtaposition is identical to the last row in the previous juxtaposition. Continue like this until $\lambda_2 t$ rows are obtained, each time, through permutations of columns within copies of $B_t$, making sure that periods $lt$ and $lt + 1$, $l = 1, \ldots, \lambda_2 - 1$, are identical. Add one more period to this array, identical to period $\lambda_2 t$. This gives the desired strongly balanced crossover design. []
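The constructions in (3.7) and (3.8), and the observation of Cheng and Wu (1980) about repeating the last period, can be sketched in Python as follows. This is a minimal illustration under the assumption that the residue 0 modulo $t$ is written as treatment $t$; the function names are hypothetical, and the code covers only the basic cases $n = t$ (even $t$) and $n = 2t$ (odd $t$) with $p = t$.

```python
from collections import Counter


def base_column(t):
    """Column (1, t, 2, t-1, ...) used as a_t in (3.7) and b_t in (3.8)."""
    lo, hi, col = 1, t, []
    while hi >= lo:
        col.append(lo)
        if lo != hi:
            col.append(hi)
        lo, hi = lo + 1, hi - 1
    return col


def shift(col, l, t):
    """Add l to every entry modulo t, writing the residue 0 as t."""
    return [(x + l - 1) % t + 1 for x in col]


def negate(col, t):
    """Replace every entry x by -x modulo t, writing the residue 0 as t."""
    return [(-x - 1) % t + 1 for x in col]


def uniform_balanced(t):
    """A_t of (3.7) for even t (t x t); B_t of (3.8) for odd t (t x 2t)."""
    base = base_column(t)
    cols = [shift(base, l, t) for l in range(1, t + 1)]  # l = t gives base
    if t % 2 == 1:
        cols += [negate(c, t) for c in cols]
    return [list(row) for row in zip(*cols)]             # rows are periods


def carryover_counts(design):
    """m[(u, v)]: number of times treatment u is immediately preceded by v."""
    m = Counter()
    for prev_row, row in zip(design, design[1:]):
        for v, u in zip(prev_row, row):
            m[(u, v)] += 1
    return m


def is_balanced(design, t):
    m = carryover_counts(design)
    ts = range(1, t + 1)
    return (all(m[(u, u)] == 0 for u in ts)
            and len({m[(u, v)] for u in ts for v in ts if u != v}) == 1)


def is_strongly_balanced(design, t):
    m = carryover_counts(design)
    ts = range(1, t + 1)
    return len({m[(u, v)] for u in ts for v in ts}) == 1


for t in (4, 5, 6, 7):
    d = uniform_balanced(t)
    extended = d + [d[-1]]          # repeat the last period as period t + 1
    print(t, is_balanced(d, t), is_strongly_balanced(extended, t))
```

The balance of these arrays rests on the successive differences within the base column: for even $t$ each nonzero residue occurs once, and for odd $t$ the columns $b_l$ and $c_l$ together supply each nonzero residue twice.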

4. Optimal crossover designs when errors are uncorrelated: The special case of two treatments

While the previous section presents results on optimal crossover designs for uncorrelated errors and for general values of the number of treatments t, the special case


$t = 2$ deserves some extra consideration. Firstly, it is quite common that the number of treatments to be compared in a crossover design is small, including $t = 2$. Secondly, while the results in the previous section for general $t$ apply also for $t = 2$, the problem of finding universally optimal designs for $t = 2$ can be simplified considerably. The considerations in this section will also reveal that we have, typically, a choice among several optimal designs. This choice can be important when practical constraints may make one optimal design preferable over another. The basic ideas in this section appear in Matthews (1990); for our notation and presentation we will however heavily rely on the ideas and development in the previous section.

The model assumptions in this section are the same as those in Section 3. The two treatments will be denoted by 1 and 2, and the $2^p$ possible $p \times 1$ vectors with entries from $\{1, 2\}$ represent the sequences of treatments that can be assigned to the subjects. A universally optimal design for $\tau$ in $\Omega_{2,n,p}$ is now simply one that minimizes the variance of the best linear unbiased estimator of $\tau_2 - \tau_1$ over all designs in $\Omega_{2,n,p}$. An analogous interpretation holds for universally optimal designs for $\gamma$.

For a $p \times 1$ treatment sequence $T$ we will say that $3 \cdot 1_p - T$ is its dual sequence; the dual sequence of a treatment sequence $T$ is thus obtained by permuting 1's and 2's in $T$. We will call a design dual balanced if every sequence is used equally often as its dual. If a design is considered to be a probability measure on all possible sequences, then it is easily seen that among the optimal designs, whether for $\tau$ or for $\gamma$, there is one that is dual balanced (Matthews, 1987). Thus, there is no loss of generality by restricting attention to dual balanced designs.

The strategy that we will employ to find universally optimal designs is analogous to that in Section 3. But because $t = 2$, the information matrices $C_d$ and $\tilde{C}_d$ are $2 \times 2$ matrices, and, since their column and row sums are 0, will automatically be c.s. Hence, to find a universally optimal design for $\tau$ we will search among the dual balanced designs for a design $d^*$ that maximizes $\mathrm{Tr}(C_{d11})$ over all designs, and for which $C_{d^*12} = 0_{2 \times 2}$. Since we can restrict our attention to dual balanced designs, the expression in (3.1) reduces to

\[
\frac{np}{2} - \frac{1}{p} \sum_{j=1}^{n} n_{duj}^2,
\]

which implies that $\mathrm{Tr}(C_{d11})$ is maximized if and only if for each sequence in the design the difference in replication for the two treatments in the sequence is at most 1. Thus, for example, if $p = 4$ only the sequences that replicate treatment 1 twice should be included; if $p = 5$ only the sequences in which treatment 1 appears twice or thrice should be included. How often a sequence should be used is determined by the requirement that $C_{d^*12} = 0_{2 \times 2}$. The expression in (3.3) reduces for a dual balanced design to

\[
m_{duu} - \frac{1}{p} \sum_{j=1}^{n} n_{duj} \tilde{n}_{duj}. \tag{4.1}
\]

Table 4.1
Pairs of treatment sequences for $p = 4$: Searching for optimal designs for $\tau$

          Sequence    Dual sequence    $v_l$
Pair 1    1 1 2 2     2 2 1 1           0.5
Pair 2    1 2 1 2     2 1 2 1          -1.5
Pair 3    1 2 2 1     2 1 1 2          -0.5

We only need to evaluate this expression for u = 1, and require that it reduces to 0. Since the design will be dual balanced, we will evaluate (4.1) for n = 2 for each of the pairs of dual sequences, only using those pairs with sequences that replicate treatment 1 as close to p/2 as possible. If there are s such pairs of dual sequences, and if Vl,. • •, vs denote the values in (4.1) for these s pairs, then a universally optimal design for _r is obtained by using the lth pair nft/2 times, where the fz's are nonnegative numbers that add to 1 and for which ~ t ftvz = 0. The latter condition is required in order that Cdl2 = 02×2.

Of course, $n f_l/2$ will only give integer values for all l for certain values of n; but that is inevitable with the approach in this section. As an example, consider the case of p = 4. There are only three pairs of dual sequences for which treatment 1 is replicated twice in each sequence. These pairs, with the corresponding values for the $v_l$'s, are presented in Table 4.1. Since computation of the $v_l$'s is trivial, so is finding optimal designs. With $f_l$, l = 1, 2, 3, denoting the proportion of time that pair l is used, any design with $\sum_l f_l v_l = 0$, or equivalently $f_1 - 3f_2 - f_3 = 0$, is optimal for $\tau$ in $\Omega_{2,n,4}$. Thus, we can take $f_2 \in [0, 1/4]$ and $f_1 = 1/2 + f_2$, $f_3 = 1/2 - 2f_2$. A popular solution is the one with $f_2 = 0$ and $f_1 = f_3 = 1/2$, using pairs 1 and 3 equally often; but, for example, $f_1 = 3/4$, $f_2 = 1/4$ and $f_3 = 0$ is another solution that gives an optimal design for $\tau$. The optimal designs for $\tau$ in $\Omega_{2,8,4}$ corresponding to these two solutions are as follows (rows are periods, columns are sequences):

1 1 2 2 1 1 2 2          1 1 1 2 2 2 1 2
1 1 2 2 2 2 1 1          1 1 1 2 2 2 2 1
2 2 1 1 2 2 1 1          2 2 2 1 1 1 1 2
2 2 1 1 1 1 2 2          2 2 2 1 1 1 2 1
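The values in Table 4.1 and the two sets of proportions above can be checked with a short computation. The sketch below is illustrative only: it assumes that in (4.1) $m_{duu}$ counts how often treatment u is immediately preceded by itself, $n_{duj}$ is the replication of u in sequence j, and $\tilde{n}_{duj}$ is its replication in the first p - 1 periods. This reading reproduces the tabulated values, but the function names (`v_value`, `dual`, `satisfies_condition`) are hypothetical and not part of the original text.

```python
from fractions import Fraction

def v_value(pair, u=1):
    """Evaluate (4.1) for treatment u on a pair of dual sequences (so n = 2).

    Assumed reading of the symbols: m counts occurrences of u immediately
    preceded by u; each cross term is (replication of u in the sequence)
    times (replication of u in its first p - 1 periods).
    """
    p = len(pair[0])
    m = sum(1 for seq in pair for t in range(1, p)
            if seq[t] == u and seq[t - 1] == u)
    cross = sum(seq.count(u) * seq[:-1].count(u) for seq in pair)
    return m - Fraction(cross, p)

def dual(seq):
    """Dual sequence for two treatments: swap the labels 1 and 2."""
    return tuple(3 - t for t in seq)

# The three admissible pairs for p = 4 (first sequence of each pair).
firsts = [(1, 1, 2, 2), (1, 2, 1, 2), (1, 2, 2, 1)]
pairs = [(s, dual(s)) for s in firsts]
v = [v_value(pair) for pair in pairs]
print([str(x) for x in v])  # ['1/2', '-3/2', '-1/2']

def satisfies_condition(f, v):
    """Nonnegative proportions adding to 1 with sum_l f_l v_l = 0."""
    return (sum(f) == 1 and min(f) >= 0
            and sum(fl * vl for fl, vl in zip(f, v)) == 0)

half, quarter = Fraction(1, 2), Fraction(1, 4)
print(satisfies_condition([half, 0, half], v))            # True
print(satisfies_condition([3 * quarter, quarter, 0], v))  # True
```

Exact rational arithmetic is used so that the condition $\sum_l f_l v_l = 0$ can be tested with equality rather than a tolerance.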

Finding optimal designs for $\gamma$ can proceed in a similar way. The only difference is that the set of pairs of dual sequences that can possibly be included in an optimal design will be different. The pairs that need to be considered now are those in which each sequence replicates treatment 1 in the first p - 1 periods as close to (p - 1)/2 times as possible. Using only such pairs will lead to a design that maximizes $\mathrm{Tr}(C_{d22})$ ....


J. Stufken

Table 4.2
Pairs of treatment sequences for p = 4: searching for optimal designs for $\gamma$

          Pair 1     Pair 2     Pair 3     Pair 4     Pair 5     Pair 6
          1 1 2 2    1 2 1 2    1 2 2 1    1 1 2 1    1 2 1 1    1 2 2 2
          2 2 1 1    2 1 2 1    2 1 1 2    2 2 1 2    2 1 2 2    2 1 1 1
$v_l$       .5        -1.5       -.5        -.75       -.75        .25

Table 4.3 Pairs of sequences for optimal two-treatment designs, 4 q + 1; the $e_i$ are i.i.d. r.v.'s having the d.f. F. In this setup, the null and alternative hypotheses of interest are

$$H_0: \beta = 0 \quad \text{vs.} \quad H_1: \beta \ne 0. \tag{2.4}$$

Note that for the canonical multisample model, (2.4) relates to the homogeneity of the location parameters, and we may also want to estimate the paired differences $\theta_k - \theta_q$, $k \ne q = 1, \ldots, c$, in a robust manner. Note that under the null hypothesis in (2.4), the $Y_i$ are i.i.d. r.v.'s with a location parameter $\theta$, so that their joint distribution remains invariant under any permutation of the $Y_i$ among themselves. For this reason, the null hypothesis in (2.4) is termed the hypothesis of randomness or permutation-invariance. In the literature, $H_1$ in (2.4) is termed the regression alternative. For such alternatives, tests for (2.4) do not require the symmetry of the d.f. F, and the same regularity assumptions pertain to the estimation of $\beta$. Further, it is possible to draw statistical inference on $\theta$ under an additional assumption that F is a symmetric d.f. Let $R_{ni}$ be the rank of $Y_i$ among $Y_1, \ldots, Y_n$, for $i = 1, \ldots, n$ and $n \ge 1$. Also, let $a_n(1), \ldots, a_n(n)$ be a set of scores which depend on the sample size n and some chosen score generating function. Of particular interest are the following special scores: (i) Wilcoxon scores: $a_n(i) = i/(n + 1)$, $i = 1, \ldots, n$; (ii) Normal scores: $a_n(i) = E(Z_{n:i})$, $i = 1, \ldots, n$, where $Z_{n:1} \le \cdots$ ... $(n + 1)/2$. In this multisample setup, $\mathcal{L}_n$ actually tests for the homogeneity of the d.f.'s $F_1, \ldots, F_c$ against alternatives which are more general than the simple location or shift ones treated earlier. For example, if we let $\pi_{rs} = P\{Y_{r1} \ge Y_{s1}\}$, for $r \ne s = 1, \ldots, c$, then the Kruskal-Wallis test is consistent against the broader class of alternatives that the $\pi_{rs}$ are not all equal to 1/2, which relate to the so-called stochastically larger (smaller) alternatives. Thus, the linearity of the model is not that crucial in this context.
In the multisample model, when the scores are monotone, $\mathcal{L}_n$ remains consistent against the stochastically larger (smaller) class of alternatives, which contains the shift alternatives as a particular subclass. This explains the robustness of such nonparametric tests. In this context, we may note that the ranks $R_{ni}$ are invariant under any strictly monotone transformation $g(\cdot)$ on the $Y_i$, so that if the $g(Y_i) = Y_i^*$ follow a linear model for some $g(\cdot)$, a rank statistic based on the $Y_i$ and $Y_i^*$ being the same will pertain to such a generalized linear model setup. This invariance eliminates the need for Box-Cox type transformations on the $Y_i$ and

Design and analysis of experiments


thereby adds further to the robustness of such rank tests against plausible departures from the model-based assumptions. The situation is a little less satisfactory for testing subhypotheses or for multiple comparisons, and we shall discuss them later on. We present R-estimators which are based on such rank tests. Although for one- and two-sample location problems such estimates were considered by Hodges and Lehmann (1963) and Sen (1963), for general linear models the developments took place a few years later. Adichie (1967), Sen (1968d) and Jurečková (1971), among others, considered the simple regression model. For the linear model in (2.3), let $Y_i(b) = Y_i - b' t_i$, $i = 1, \ldots, n$, where $b \in \mathbb{R}^q$, and let $R_{ni}(b)$ be the rank of $Y_i(b)$ among the $Y_r(b)$, $r = 1, \ldots, n$, $b \in \mathbb{R}^q$. In (2.5), replacing the $R_{ni}$ by $R_{ni}(b)$, we define the linear rank statistics $L_n(b)$, $b \in \mathbb{R}^q$. As in Jaeckel (1972), we introduce a measure of rank dispersion:

$$D_n(b) = \sum_{i=1}^{n} \{a_n(R_{ni}(b)) - \bar{a}_n\}\, Y_i(b), \quad b \in \mathbb{R}^q, \tag{2.14}$$

where we confine ourselves to monotone scores, so that $a_n(1) \le \cdots \le a_n(n)$, for every $n \ge 1$. An R-estimator of $\beta$ is a solution to the minimization of $D_n(b)$ with respect to $b \in \mathbb{R}^q$, so that we write

$$\tilde{\beta}_n = \arg\min\{D_n(b): b \in \mathbb{R}^q\}. \tag{2.15}$$
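The minimization in (2.15) can be illustrated for a single regressor (q = 1). The sketch below is not the iterative procedure referred to in the text; it assumes Wilcoxon scores $a_n(i) = i/(n + 1)$ and uses the fact that, for q = 1, the ranks of the residuals can change only where two residuals coincide, that is, at a pairwise slope, so that a piecewise linear convex $D_n$ attains its minimum at one of these slopes. The data and the function names are hypothetical.

```python
def ranks(values):
    """Ranks 1..n of the values (ties, if any, are broken by position)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for position, i in enumerate(order, start=1):
        r[i] = position
    return r

def dispersion(b, t, y):
    """Jaeckel's D_n(b) of (2.14) with Wilcoxon scores a_n(i) = i/(n + 1)."""
    n = len(y)
    resid = [yi - b * ti for yi, ti in zip(y, t)]
    a_bar = 0.5  # mean of i/(n + 1) over i = 1, ..., n
    return sum((ri / (n + 1) - a_bar) * ei
               for ri, ei in zip(ranks(resid), resid))

def r_estimate(t, y):
    """Minimize D_n over the pairwise slopes, where its kinks occur (q = 1)."""
    n = len(y)
    slopes = [(y[j] - y[i]) / (t[j] - t[i])
              for i in range(n) for j in range(i + 1, n) if t[j] != t[i]]
    return min(slopes, key=lambda b: dispersion(b, t, y))

# Hypothetical data: roughly y = 2 + 3t, with a gross outlier in the last response.
t = [1, 2, 3, 4, 5, 6, 7]
y = [5.1, 7.9, 11.2, 13.8, 17.1, 20.0, 60.0]
b_hat = r_estimate(t, y)
print(round(b_hat, 3), round(dispersion(b_hat, t, y), 3))
```

The exhaustive scan over pairwise slopes costs on the order of $n^3 \log n$ operations and is meant only to make the definition concrete; for q > 1 an iterative scheme of the kind mentioned in the text would be needed.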

It can be shown (viz., Jurečková and Sen, 1995, Chapter 6) that $D_n(b)$ is a nonnegative, continuous, piecewise linear and convex function of $b \in \mathbb{R}^q$. Note that $D_n(b)$ is differentiable in b almost everywhere and

$$(\partial/\partial b)\, D_n(b)\big|_{b^0} = -L_n(b^0), \tag{2.16}$$

whenever $b^0$ is a point of differentiability of $D_n(\cdot)$. At any other point, one may work with the subgradient $\nabla D_n(b^0)$. Thus, essentially, the task reduces to solving the following estimating equations with respect to $b \in \mathbb{R}^q$:

$$L_n(b) = 0, \tag{2.17}$$

where, to eliminate multiple solutions, we adopt some convention. These R-estimators are generally obtained by iterative procedures (as in the case of maximum likelihood estimators for a density not belonging to the exponential family), and often a one- or two-step procedure starting with a consistent and asymptotically normal (CAN) initial estimator serves the purpose very well; for some theoretical developments along with an extended bibliography, we refer to Jurečková and Sen (1995). It follows from their general methodology that under essentially the same regularity conditions as pertaining to the hypothesis testing problem (treated before), the following first-order asymptotic distributional representation (FOADR) result holds:

$$(\tilde{\beta}_n - \beta) = \gamma^{-1} \sum_{i=1}^{n} d_{ni}\, \phi(F(e_i)) + o_p(n^{-1/2}), \tag{2.18}$$


P. K. Sen

where $d_{ni} = Q_n^{-1}(t_i - \bar{t}_n)$, for $i = 1, \ldots, n$, $\phi(\cdot)$ is the score generating function for the rank scores,

$$\gamma = \int_0^1 \phi(u)\, \phi_f(u)\, du, \tag{2.19}$$

and the Fisher information score generating function $\phi_f(\cdot)$ is defined by (2.12). Note that under a generalized Noether condition on the $d_{ni}$, the classical (multivariate) central limit theorem holds for the principal term on the right hand side of (2.18), so that on defining $Q_n^*$ by $Q_n^* Q_n^{-1} Q_n^{*\prime} = I$, we obtain from (2.18) and (2.19) that for large sample sizes,

$$Q_n^*(\tilde{\beta}_n - \beta) \sim N(0, \gamma^{-2} A^2 I), \tag{2.20}$$

where $A^2$ is the variance of the score function $\phi$. For the normal theory model, the classical maximum likelihood estimator (MLE) agrees with the usual least squares estimator (LSE), and for this (2.20) holds with $\gamma^{-2} A^2$ being replaced by $\sigma^2$, the error variance. As such, the asymptotic relative efficiency (ARE) of the R-estimator, based on the score function $\phi$, with respect to the classical LSE is given by

$$e(R; LS) = \gamma^2 \sigma^2 / A^2, \tag{2.21}$$

which does not depend on the design matrix $Q_n$. In particular, if we use the normal scores for the derived R-estimators, then (2.21) is bounded from below by 1, where the lower bound is attained only when the underlying distribution is normal. This explains the robustness as well as asymptotic efficiency of the normal scores R-estimators in such completely randomized designs. From robustness considerations, often, it may be better to use the Wilcoxon scores estimators. Although for this particular choice of the score generating function (2.21) is not bounded from below by 1, it is quite close to 1 for near normal distributions and may be high for heavy tailed ones. If the error density $f(\cdot)$ is of known functional form, one may use the MLE for that pdf, and in that case, in (2.20), we need to replace $\gamma^{-2} A^2$ by $\{I(f)\}^{-1}$, where $I(f)$ is the Fisher information for location of the density f. Thus, in this case, the ARE is given by

$$e(R; ML) = \gamma^2 / \{I(f) A^2\}, \tag{2.22}$$

which by the classical Cramér-Rao inequality is always bounded from above by 1. Nevertheless, it follows from the general results in Hušková and Sen (1985) that if the score generating function is chosen adaptively, then the corresponding adaptive R-estimator is asymptotically efficient in the sense that in (2.22) the ARE is equal to 1. The same conclusion holds for adaptive rank tests for $\beta$ as well. As has been mentioned earlier, the ranks $R_{ni}(b)$ are translation-invariant so that they provide no information on the intercept parameter $\theta$. Thus, for testing any plausible null hypothesis on $\theta$ or to estimate the same parameter, linear rank statistics are


not of much use. This problem has been eliminated to a great extent by the use of signed rank statistics, which are typically defined as

$$S_n = \sum_{i=1}^{n} \mathrm{sign}(Y_i)\, a_n(R_{ni}^+), \tag{2.23}$$

where the rank scores $a_n(k)$ are defined as before and $R_{ni}^+$ is the rank of $|Y_i|$ among the $|Y_r|$, $r = 1, \ldots, n$. Under the null hypothesis of symmetry of the d.f. F about 0, the vector of the $R_{ni}^+$ and the vector of the $\mathrm{sign}(Y_i)$ are stochastically independent, so that the set of $2^n$ equally likely sign-inversions generates the exact null distribution of $S_n$. This may also be used to derive the related R-estimator of $\theta$. Such a test and estimator share all the properties of the corresponding test and estimator for the regression parameter. But, in the current context, there is a basic problem. The test for $\beta$ based on $\mathcal{L}_n$, being translation-invariant, does not depend on the intercept parameter (which is taken as a nuisance one). On the other hand, for testing a null hypothesis on $\theta$ or estimating the same, the parameter $\beta$ is treated as a nuisance one, and the signed ranks are not regression-invariant. Thus, the exact distribution-freeness (EDF) property may have to be sacrificed in favor of asymptotically distribution-free (ADF) ones. An exception is the case when one wants to test simultaneously for $\theta = 0$ and $\beta = 0$. We denote a suitable R-estimator of $\beta$ by $\tilde{\beta}_n$, and incorporate the same to obtain the residuals:

$$\hat{Y}_{ni} = Y_i - \tilde{\beta}_n' t_i, \quad i = 1, \ldots, n. \tag{2.24}$$

For every real d, let $R_{ni}^+(d)$ be the rank of $|\hat{Y}_{ni} - d|$ among the $|\hat{Y}_{nr} - d|$, $r = 1, \ldots, n$, for $i = 1, \ldots, n$. Also let

$$S_n(d) = \sum_{i=1}^{n} \mathrm{sign}(\hat{Y}_{ni} - d)\, a_n(R_{ni}^+(d)), \quad d \in \mathbb{R}. \tag{2.25}$$
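The behaviour of $S_n(d)$ in (2.25) can be made concrete with a small sketch. It assumes Wilcoxon scores $a_n(k) = k/(n + 1)$ and residuals without ties or zeros after shifting; the residual values and the function names are hypothetical. For these scores the point where $S_n(d)$ changes sign is the median of the pairwise averages of the residuals, which the sketch computes directly rather than by iteration.

```python
from statistics import median

def signed_rank_stat(resid, d):
    """S_n(d) of (2.25) with Wilcoxon scores a_n(k) = k/(n + 1).

    Assumes no residual equals d and no ties among the |resid - d|.
    """
    n = len(resid)
    shifted = [x - d for x in resid]
    order = sorted(range(n), key=lambda i: abs(shifted[i]))
    total = 0.0
    for rank, i in enumerate(order, start=1):
        sign = 1 if shifted[i] > 0 else -1
        total += sign * rank / (n + 1)
    return total

def median_of_pairwise_averages(resid):
    """Median of (x_i + x_j)/2 over all pairs with i <= j."""
    n = len(resid)
    return median((resid[i] + resid[j]) / 2
                  for i in range(n) for j in range(i, n))

# Hypothetical residuals.
resid = [-1.3, 0.4, 2.1, 0.9, -0.2, 3.5, 1.1]
theta_hat = median_of_pairwise_averages(resid)

# S_n(d) is nonincreasing in d; the grid is offset so that it avoids ties.
grid = [-2.03 + 0.25 * k for k in range(25)]
values = [signed_rank_stat(resid, d) for d in grid]
print(round(theta_hat, 3), values[0], values[-1])
```

Evaluating `signed_rank_stat` on either side of `theta_hat` shows the sign change that defines the estimator.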

If the scores $a_n(k)$ are monotone (in k, for each n), then it is easy to show that $S_n(d)$ is monotone in $d \in \mathbb{R}$, and hence we may equate $S_n(d)$ to 0 (with respect to $d \in \mathbb{R}$), and the solution, say $\hat{\theta}_n$, is then taken as a translation-equivariant estimator of $\theta$. In the particular case of the sign statistic, $\hat{\theta}_n$ can be expressed as the median of the residuals $\hat{Y}_{ni}$, and for the case of the Wilcoxon signed-rank statistic, it is given by the median of the pairwise midranges $(\hat{Y}_{ni} + \hat{Y}_{nj})/2$ of these residuals. In general, for other score functions, an iterative procedure is needed to solve for $\hat{\theta}_n$, and in such a case, one may as well start with the Wilcoxon scores estimator as the preliminary one, and in a few steps converge to the desired one. There is a basic difference between this model and the simple location model where $\beta$ is null: in the latter case, the signed rank statistics based on the true value d = 0 are EDF, while in this case they are only ADF. To verify that they are ADF, one convenient way is to appeal to some


asymptotic uniform linearity results on general signed rank statistics (in the location and regression parameters), and such results have been presented in a unified manner in Chapter 6 of Jurečková and Sen (1996), where pertinent references are also cited in detail. Let us discuss briefly the subhypothesis testing problem for this simple design. A particular subhypothesis testing problem relates to the null hypothesis that $\theta = 0$ against $\theta \ne 0$, treating $\beta$ as a nuisance parameter (vector). We have already observed that the basic hypothesis of sign- (or permutation-) invariance does not hold when the above null hypothesis holds, and hence EDF tests may not generally exist. However, ADF tests can be considered by incorporating the residuals $\hat{Y}_{ni}$ instead of the $Y_i$ in the formulation of suitable signed rank statistics. Such tests were termed aligned rank tests by Hodges and Lehmann (1962), who considered the simplest ANOVA model. Here alignment is made by substituting the estimates of the nuisance parameters, as is also done in the classical normal theory linear models. A very similar picture holds for a plausible subhypothesis testing problem on the regression parameter vector. To pose such a problem in a simple manner, we partition the parameter vector $\beta$ as

$$\beta' = (\beta_1', \beta_2'), \tag{2.26}$$

where $\beta_j$ is a $p_j$-vector, $p_j \ge 1$, for j = 1, 2, and $p = p_1 + p_2$. Suppose now that we want to test for

$$H_0: \beta_1 = 0 \quad \text{vs.} \quad H_1: \beta_1 \ne 0, \quad \text{treating } \beta_2 \text{ as a nuisance parameter.} \tag{2.27}$$

Here also, under $H_0$ in (2.27), the hypothesis of permutational invariance may not be generally true, and hence an EDF rank test may not generally exist. But ADF rank tests based on aligned rank statistics can be constructed as follows. Note that if in (2.3) we partition the $t_i$ as $(t_{i1}', t_{i2}')'$, involving $p_1$ and $p_2$ coordinates, then under $H_0$ in (2.27), we obtain that

$$Y_i = \theta + \beta_2' t_{i2} + e_i, \quad i = 1, \ldots, n. \tag{2.28}$$

Based on the model in (2.28), we denote the R-estimator of $\beta_2$ by $\tilde{\beta}_{n2}$, and we form the residuals

$$\hat{Y}_{ni} = Y_i - \tilde{\beta}_{n2}' t_{i2}, \quad \text{for } i = 1, \ldots, n. \tag{2.29}$$

As after (2.4), we define the aligned ranks $\hat{R}_{ni}$, wherein we replace the $Y_i$ by the residuals $\hat{Y}_{ni}$. The vector of linear rank statistics $\hat{L}_n$ is then defined as in (2.5) with the ranks $R_{ni}$ being replaced by $\hat{R}_{ni}$. Also, we partition this p-vector as $(\hat{L}_{n1}', \hat{L}_{n2}')'$, and our test is then based on the first component of this aligned rank statistics vector. This is given by

$$\hat{\mathcal{L}}_{n1} = A_n^{-2} \{\hat{L}_{n1}' Q_{n11.2}^{-1} \hat{L}_{n1}\}, \tag{2.30}$$


where $A_n^2$ is defined as in (2.6), and defining $Q_n$ as in (2.7) and partitioning it into four submatrices, we have

$$Q_{n11.2} = Q_{n11} - Q_{n12} Q_{n22}^{-1} Q_{n21}. \tag{2.31}$$

It follows from the general results in Sen and Puri (1977), further streamlined and discussed in detail in Section 7.3 of Puri and Sen (1985), that under the null hypothesis in (2.27), $\hat{\mathcal{L}}_{n1}$ has asymptotically a chi-squared distribution with $p_1$ DF, so that an ADF test for $H_0$ in (2.27) can be based on the critical level given by $\chi^2_{p_1,\alpha}$, the upper $\alpha$-percentile of this distribution. For local alternatives, the noncentral distribution theory runs parallel to the case of the null hypothesis of $\beta = 0$, with the DF p being replaced by $p_1$ and an appropriate change in the noncentrality parameter as well. The regularity conditions governing these asymptotic distributional results have been unified and relaxed to a certain extent in Chapter 6 of Jurečková and Sen (1996). Another important area where nonparametrics have played a vital role in such completely randomized designs is the so-called mixed effects models. In this setup, we extend (2.3) as follows. Let $Y_1, \ldots, Y_n$ be independent random variables, such that associated with the $Y_i$ there are (i) given design (nonstochastic) (q-)vectors $t_i$ and (ii) observable stochastic concomitant (p-)vectors $Z_i$, $i = 1, \ldots, n$. Then conditionally on $Z_i = z$, we have

$$F_i(y \mid z) = P\{Y_i \le y \mid Z_i = z\} = F(y - \alpha - \beta' t_i - \gamma' z), \quad i = 1, \ldots, n, \tag{2.32}$$

where $\beta$ and $\gamma$ are respectively the regression parameter vectors of Y on the design and concomitant variates, and $\alpha$ is the intercept parameter. In this linear model setup, in a nonparametric formulation, the d.f. F is allowed to be arbitrary (but continuous), so that the finiteness of its second moment is not that crucial. In a parametric as well as nonparametric formulation, a basic assumption on the concomitant variates is that they are not affected by the design variates, so that the $Z_i$ are i.i.d. r.v.'s. Here also, in a nonparametric formulation, the joint distribution of $Z_i$ is taken to be an arbitrary continuous one (defined on $\mathbb{R}^p$). The Chatterjee-Sen (1964) multivariate rank permutation principle plays a basic role in this nonparametric analysis of covariance (ANOCOVA) problem. Basically, for the (p + 1)-variate observable stochastic vectors $(Y_i, Z_i')'$, with respect to the q-variate design vectors $t_i$, one can construct a q × (p + 1) linear rank statistics vector with the elements

$$L_{njk}, \quad \text{for } j = 0, 1, \ldots, p, \quad k = 1, \ldots, q, \tag{2.33}$$

where $L_{n0} = (L_{n01}, \ldots, L_{n0q})'$ stands for the linear rank statistics vector for the primary variate (Y) and is defined as in (2.5) (with the $R_{ni}$ being relabeled as $R_{ni0}$), while for the jth coordinate of the concomitant vectors, adopting the same ranking method as before (2.5) and denoting these ranks by $R_{nij}$, $i = 1, \ldots, n$, we define the linear rank statistics vector $L_{nj} = (L_{nj1}, \ldots, L_{njq})'$ as in (2.5), for $j = 1, \ldots, p$. Note that ranking is done separately for each coordinate of the concomitant vector


and the primary variate, so that we have a (p + 1) × n rank collection matrix $R_n$. The Chatterjee-Sen rank permutation principle applies to the n! column permutations of $R_n$ (which are conditionally equally likely), and this generates conditionally distribution-free (CDF) tests based on the linear rank statistics in (2.33). We may allow the scores (defined before (2.5)) to be possibly different for the primary and concomitant variates, so for the jth coordinate, these scores are taken as $a_{nj}(k)$, $k = 1, \ldots, n$; $j = 0, 1, \ldots, p$, and further, without any loss of generality, we may standardize these scores in such a way that, adopting the definitions in (2.6), the $\bar{a}_{nj}$ are all equal to 0 and the $A_{nj}^2$ are all equal to one, $j = 0, 1, \ldots, p$. Consider then a (p + 1) × (p + 1) matrix $V_n$ whose diagonal elements are all equal to one, and whose elements are given by

$$v_{njl} = \sum_{i=1}^{n} a_{nj}(R_{nij})\, a_{nl}(R_{nil}), \quad j, l = 0, 1, \ldots, p. \tag{2.34}$$

We denote the cofactor of $v_{n00}$ in $V_n$ by $V_{n00}$, let $v_{n0} = (v_{n01}, \ldots, v_{n0p})'$, and denote by

$$W_n = V_{n00}^{-1} v_{n0}, \tag{2.35}$$

$$v_{n00}^* = v_{n00} - v_{n0}' V_{n00}^{-1} v_{n0}, \tag{2.36}$$

and

$$L_{n0}^* = L_{n0} - (L_{n1}, \ldots, L_{np}) W_n. \tag{2.37}$$

Let us define $Q_n$ as in (2.7) and consider the quadratic form

$$\mathcal{L}_{n0}^* = \{(L_{n0}^*)' Q_n^{-1} (L_{n0}^*)\} / v_{n00}^*, \tag{2.38}$$

which may be used as a test statistic for testing the null hypothesis $H_0$: $\beta = 0$ against alternatives that $\beta \ne 0$, treating $\theta$ and $\gamma$ as nuisance parameters. Asymptotic nonnull distribution theory, power properties and optimality of such aligned rank order tests (for local alternatives), studied first by Sen and Puri (1977), can most conveniently be unified by an appeal to the uniform asymptotic linearity of aligned rank statistics, and the results presented in Section 7.3 of Puri and Sen (1985) pertain to this scheme; again, the linearity results in their most general form have been presented in a unified manner in Chapter 6 of Jurečková and Sen (1996). This latter reference also contains a good account of the recent developments on regression rank scores procedures, which may have some advantages (in terms of computational simplicity) over the aligned rank tests. In the above formulation of a mixed-effects model, the linearity of the regression of the primary response variate on the design and concomitant variates has been taken for granted, while the normality of the errors has been waived to a certain extent by less


stringent assumptions. While this can often be done with appropriate transformations on primary and concomitant variates, there are certain cases where it may be more reasonable to allow the regression on the concomitant variate part to be of some arbitrary (unknown) functional form. That is, the regression on the design variates is taken to be of a parametric (viz., linear) form, while the regression on the covariates is taken to be of a nonparametric form. In this formulation, for the conditional d.f.'s in (2.32), we take

$$F_i(y \mid z) = F(y - \beta' t_i - \theta(z)), \quad i = 1, \ldots, n, \tag{2.39}$$

where the d.f. $F(\cdot)$, $\beta$, etc. are all defined as before, while $\theta(z)$ is a translation-equivariant (location-regression) functional, depicting the regression of the errors $Y_i - \beta' t_i$ on the concomitant vector $Z_i$. The basic difference between (2.32) and (2.39) is that in the former case the linear regression function $\gamma' z$ involves a finite-dimensional parameter $\gamma$, while in the latter case the nonparametric regression function $\theta(z)$ may not be finite-dimensional, let alone linear. Thus, here we need to treat $\theta(z)$ as a functional defined on the domain $\mathcal{Z}$ of the concomitant variate Z. This formulation may generally entail extra regularity (smoothness) conditions on this nonparametric functional, and because of that, the estimation of $\theta(z)$, $z \in \mathcal{Z}$, may entail a comparatively slower rate of convergence. Nevertheless, as regards the estimation of the fixed-effects parameters (i.e., $\beta$), the conventional $\sqrt{n}$-rate of convergence still holds, although these conventional estimators may not be fully efficient, even asymptotically. A complete coverage of nonparametric methods in this type of mixed-effects models is beyond the scope of this treatise; we may refer to Sen (1995a, c) where a detailed treatment is included.

3. Two-way layouts nonparametrics

The simplest kind of designs for two-way layouts are the so-called randomized block or complete block designs. 5 equal number of times in each block, and the treatment combinations may Consider a randomized block design comprising n ($\ge 2$) blocks of p ($\ge 2$) plots each, such that p different treatments are applied to the p plots in each block. The allocation of the treatments into the plots in each block is made through randomization. Let $Y_{ij}$ be the response of the plot in the ith block receiving the jth treatment, for $i = 1, \ldots, n$, $j = 1, \ldots, p$. In the normal theory model, it is assumed that

$$Y_{ij} = \mu + \beta_i + \tau_j + e_{ij}, \quad i = 1, \ldots, n; \quad j = 1, \ldots, p, \tag{3.1}$$

where $\mu$ is the mean effect, $\beta_i$ is the ith block effect, $\tau_j$ is the jth treatment effect, and the $e_{ij}$ are the error components which are assumed to be independent and identically distributed according to a normal distribution with zero mean and a finite, positive variance $\sigma^2$. The block and treatment effects may either be fixed or random, resulting in the hierarchy of fixed-, mixed- and random-effects models. As in the case of one-way layouts, a departure from such model assumptions can take place along the routes


of nonlinearity of the model, possible heteroscedasticity, dependence or nonnormality of the errors. It is quite interesting to note that the method of m-ranking, one of the earliest nonparametric procedures, has a basic feature that it does not need many of these regularity assumptions, and yet works out in a very simple manner. Suppose that we desire to test the null hypothesis of no treatment effect, treating the block effects as nuisance parameters. Under this hypothesis, in (3.1), the $\tau_j$ drop out, so that the observations within a block are i.i.d. r.v.'s. We may even allow the errors to be exchangeable (instead of i.i.d.), and this implies that under the above hypothesis, the observations within a block are exchangeable or interchangeable r.v.'s. Therefore, if we denote by $r_{ij}$ the rank of $Y_{ij}$ among $Y_{i1}, \ldots, Y_{ip}$, for $j = 1, \ldots, p$, then, for each i (= 1, ..., n), under the hypothesis of no treatment effect, the ranks $r_{i1}, \ldots, r_{ip}$ are interchangeable r.v.'s. Moreover, for different blocks, such intra-block rank-vectors are stochastically independent of each other. Therefore, the problem of testing the null hypothesis of no treatment effect in a randomized block design can be reduced to that of testing the interchangeability of the within-block rankings. On the other hand, this hypothesis can also be stated in terms of the exchangeability of the within-block response variables, and in that setup, the linearity of the block and treatment effects is not that crucial.
This scenario leaves us to adopt either of two routes for nonparametrics in two-way layouts: (i) incorporate such intra-block rankings with the major emphasis on robustness against possible nonnormality of the errors as well as nonlinearity of the effects, and (ii) deemphasize the normality of errors, but with due respect to the linearity of the model, incorporate inter-block comparisons in a more visible manner to develop appropriate rank procedures which are robust to possible nonnormality of errors. Aligned rank procedures are quite appropriate in this context, and we shall discuss them later on. For intra-block ranking procedures, we consider a set of scores $\{a(1), \ldots, a(p)\}$ which may depend on p and some underlying score generating function (but not on the number of blocks). In general these are different from the ones introduced in Section 2 (for one-way layouts). For optimal scores for specific types of local alternatives, we may refer to Sen (1968a). Then, we may define

$$T_{nj} = \sum_{i=1}^{n} a(r_{ij}), \quad j = 1, \ldots, p. \tag{3.2}$$

Moreover, let

$$\bar{a} = p^{-1} \sum_{j=1}^{p} a(j), \quad A^2 = (p - 1)^{-1} \sum_{j=1}^{p} [a(j) - \bar{a}]^2. \tag{3.3}$$

Then, a suitable test statistic for testing the hypothesis of no treatment effect is the following:

$$\mathcal{L}_n = (nA^2)^{-1} \sum_{j=1}^{p} (T_{nj} - n\bar{a})^2. \tag{3.4}$$


In particular, if we let a(j) = j, j = 1, ..., p, the $T_{nj}$ reduce to the rank sums, $\bar{a} = (p + 1)/2$ and $A^2 = p(p + 1)/12$, so that (3.4) reduces to the classical Friedman (1937) $\chi_r^2$ test statistic:

$$\chi_r^2 = \frac{12}{np(p + 1)} \sum_{j=1}^{p} \Big\{ \sum_{i=1}^{n} r_{ij} - n(p + 1)/2 \Big\}^2. \tag{3.5}$$
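The reduction of (3.4) to (3.5) under the scores a(j) = j can be checked numerically. The sketch below computes the general intra-block statistic from (3.2) to (3.4) and the classical form (3.5) on a small table; the data and the function names are hypothetical, and within-block ties are assumed absent.

```python
def intra_block_ranks(block):
    """Ranks 1..p of the responses within one block (no ties assumed)."""
    order = sorted(range(len(block)), key=lambda j: block[j])
    r = [0] * len(block)
    for rank, j in enumerate(order, start=1):
        r[j] = rank
    return r

def intra_block_statistic(data, score=lambda j: j):
    """General statistic (3.4) for an n x p table (rows are blocks)."""
    n, p = len(data), len(data[0])
    a = [score(j) for j in range(1, p + 1)]
    a_bar = sum(a) / p                                   # (3.3)
    A2 = sum((x - a_bar) ** 2 for x in a) / (p - 1)      # (3.3)
    T = [0.0] * p
    for block in data:
        for j, r in enumerate(intra_block_ranks(block)):
            T[j] += a[r - 1]                             # (3.2)
    return sum((Tj - n * a_bar) ** 2 for Tj in T) / (n * A2)

def friedman(data):
    """Classical form (3.5), written directly in terms of the rank sums."""
    n, p = len(data), len(data[0])
    rank_sums = [0] * p
    for block in data:
        for j, r in enumerate(intra_block_ranks(block)):
            rank_sums[j] += r
    centre = n * (p + 1) / 2
    return 12 / (n * p * (p + 1)) * sum((s - centre) ** 2 for s in rank_sums)

# Hypothetical responses: 4 blocks (rows), 3 treatments (columns).
data = [[8.2, 9.1, 7.5],
        [6.9, 8.8, 7.2],
        [7.7, 9.4, 7.0],
        [8.0, 8.6, 7.9]]
print(intra_block_statistic(data), friedman(data))  # 6.5 6.5
```

Passing another score function, for instance one that returns 0 for j at most (p + 1)/2 and 1 otherwise, gives the corresponding median-type statistic from the same code path.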

Similarly, letting a(j) = 0 or 1 according as j is $\le (p + 1)/2$ or not, we obtain the well known Brown and Mood (1951) median test statistic. In either case, and in general, for (3.4), the exact distribution (under the null hypothesis) can be obtained by complete enumeration of all possible equally likely $(p!)^n$ permutations of the intra-block rank vectors, each over (1, ..., p). This process may become quite cumbersome as p and/or n increase. Fortunately, the central limit theorems are applicable to the intra-block rank vectors, which are independent of each other, and hence it follows that under the null hypothesis, $\mathcal{L}_n$ has approximately the central chi-squared distribution with p - 1 DF when n is large. The main advantage of using an intra-block rank test, such as $\mathcal{L}_n$ in (3.4), is that it eliminates the need for assuming additive block effects, and the treatment effects need not be additive either. As in the case of the Kruskal-Wallis test, introduced for one-way layouts in the last section, stochastic ordering of the treatment responses (within each block) suffices for the consistency of the test based on such intra-block ranks. Thus, such tests are very robust. The Brown-Mood median test is asymptotically optimal for local shift alternatives when the underlying d.f. F is Laplace, while for a logistic F, the Friedman $\chi_r^2$ test is locally optimal. We may refer to Sen (1968a) for a detailed discussion of the choice of locally optimal intra-block rank tests in some specific models. The main drawback of such intra-block rank tests is that they may not adequately incorporate the inter-block information as is generally provided by comparisons of observations from different blocks.
For example, if the block effects are additive then a contrast in the ith block observations has the same distribution as in any other block, and hence some comparisons of such contrasts may provide additional information and may lead to more efficient tests. There are various ways of inducing such inter-block comparisons in rank tests, and among them the two popular ones are the following: (i) ranking after alignment, and (ii) weighted ranking. In a weighted ranking method, instead of having the sum statistics $\sum_{i=1}^{n} a(r_{ij})$, the intra-block rankings or rank scores are weighted to reflect possible inter-block variation, and such weights are typically inversely proportional to some measure of the within-block dispersion of the observations (such as the range or standard deviation or even some rank measures of dispersion). Thus, we may use the statistics $\sum_{i=1}^{n} w_{ni}\, a(r_{ij})$, $j = 1, \ldots, p$, where the $w_{ni}$ are nonnegative weights, and are typically random elements. The analysis can then be carried out in the same manner as before. Note that such a measure of intra-block dispersion is typically independent of the ranks $r_{ij}$, so that given these weights, a very similar test statistic can be worked out by reference to the $(p!)^n$ permutations of the intra-block rankings. However, such a law is conditional on the given set of weights, so that we end up with conditionally distribution-free tests instead of EDF tests based on $\mathcal{L}_n$. One way of achieving the


EDF property of such weighted ranking procedures is to replace the $w_{ni}$ by their ranks and allow these ranks to have all possible (n!) realizations. Since these have been pursued in some other chapters of this volume (and also presented in detail in Chapter 10 of Handbook of Statistics, Volume 4), we shall not go into further details. The main drawback of such weighted ranking procedures is that the choice of the weights (typically stochastic) retains some arbitrariness and thereby introduces some extra variability, which in turn may generally lead to some loss of efficiency, in particular when the block effects are additive. This feature is shared by the other type of weighting where the ranks of the $w_{ni}$ are used instead of their ordinary values. However, if the block effects are not additive and the intra-block error components have the same distribution with possibly different scale parameters, weighting by some measure of dispersion alone may not be fully rational, and hence, from that perspective, such weighting procedures are also subject to criticism. Ranking after alignment has a natural appeal for the conventional linear model even when the errors are not normally distributed. The basic idea is due to Hodges and Lehmann (1962), who considered a very simple setup, and it has been shown by Mehra and Sarangi (1967) and in a more general setup by Sen (1968b) that such procedures are quite robust under plausible departures from model-based assumptions (including homoscedasticity, normality and independence of the errors). As such, we may like to provide more practical aspects of this methodology. To motivate the alignment procedure, we go back to the conventional linear model in (3.1) (sans the normality of the error components). Suppose further that the block effects are either random variables (which may be taken as i.i.d.) or they are fixed, and the errors in the same block are interchangeable or exchangeable random variables.
In this way, we are able to include both fixed- and mixed-effects models in our formulations. Let $\tilde{Y}_i$ be a translation-equivariant function of $(Y_{i1}, \ldots, Y_{ip})$, such that it is symmetric in its p arguments. Typically, we choose a robust estimator of the ith block mean response, and in order to preserve robustness, instead of the block average, the median, trimmed mean or other measures of central tendency can be adopted. Define then the aligned observations as

$$\hat{Y}_{ij} = Y_{ij} - \tilde{Y}_i, \quad j = 1, \ldots, p; \quad i = 1, \ldots, n. \tag{3.6}$$

By (3.1) and (3.6), we may write

$$\hat{Y}_{ij} = \tau_j - \tilde{\tau} + \tilde{e}_{ij}; \quad \tilde{e}_{ij} = e_{ij} - \tilde{e}_i, \tag{3.7}$$

for $j = 1, \ldots, p$; $i = 1, \ldots, n$, where $\tilde{\tau}$ and $\tilde{e}_i$ are defined by the same functional form as the $\tilde{Y}_i$. Note that for each i (= 1, ..., n), the joint distribution of $(\tilde{e}_{i1}, \ldots, \tilde{e}_{ip})$ is symmetric in its p arguments, and moreover these vectors have the same joint distribution for all blocks. Therefore, it seems very logical to adopt an overall ranking of all the N = np aligned observations $(\hat{Y}_{11}, \ldots, \hat{Y}_{np})$ and base a rank test statistic on such aligned ranks. The only negative feature is that the overall ranking procedure distorts the independence of the rank vectors from block to block; nevertheless they


retain their permutability, and this provides the access to developing conditionally distribution-free tests for testing the null hypothesis of no treatment effect. Let $\hat{Y}_{i:1},\dots,\hat{Y}_{i:p}$ be the order statistics corresponding to the aligned observations $\hat{Y}_{i1},\dots,\hat{Y}_{ip}$ in the $i$th block, for $i = 1,\dots,n$. Then under the null hypothesis of interchangeability of the $\hat{Y}_{ij}$, $j = 1,\dots,p$, for each $i$ ($= 1,\dots,n$), the vector $(\hat{Y}_{i1},\dots,\hat{Y}_{ip})$ has the (discrete) uniform distribution over the $p!$ possible permutations of the coordinates of $(\hat{Y}_{i:1},\dots,\hat{Y}_{i:p})$, and this permutation law is independent for different blocks. Thus, we obtain a group of $(p!)^n$ permutations generated by the within block permutations of the aligned order statistics, and by reference to this (conditional) law, we can construct conditionally distribution-free tests for the hypothesis of interchangeability of the treatments. Under block-additivity, the vectors of intra-block (aligned) order statistics are interchangeable, and hence, ranking after alignment (ignoring the blocks) remains rational. For the aligned observations, we define the ranks $R_{ij}$ as in the preceding section, so that these $R_{ij}$ take on the values $1,\dots,N$ when ties among them are neglected, which may be done under very mild continuity assumptions on the error distributions. For the pooled sample size $N$, we introduce a set of scores $a_N(k)$, $k = 1,\dots,N$, as in Section 2, and consider the aligned rank statistics:

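$$T_{Nj} = n^{-1}\sum_{i=1}^{n} a_N(R_{ij}), \quad j = 1,\dots,p. \tag{3.8}$$

Also, define

$$\bar{a}_{Ni} = p^{-1}\sum_{j=1}^{p} a_N(R_{ij}), \quad i = 1,\dots,n; \tag{3.9}$$

$$V_N = \{n(p-1)\}^{-1}\sum_{i=1}^{n}\sum_{j=1}^{p}\{a_N(R_{ij}) - \bar{a}_{Ni}\}^2. \tag{3.10}$$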

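Then an aligned rank test statistic for testing the hypothesis of no treatment effect can be formulated as

$$\mathcal{L}_N = \frac{n}{V_N}\sum_{j=1}^{p}\left[T_{Nj} - \bar{a}_N\right]^2, \tag{3.11}$$

where $\bar{a}_N = N^{-1}\sum_{k=1}^{N} a_N(k)$ is the average score.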
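As a rough illustration of (3.6) and (3.8)-(3.11), the following Python sketch aligns each block, ranks all $N = np$ aligned values together, evaluates the statistic, and approximates its conditional null distribution by random draws from the $(p!)^n$ within-block permutations described above. The function names, the use of the block mean as the aligning statistic, the Wilcoxon scores $a_N(k) = k/(N+1)$, and the Monte Carlo sampling of permutations (instead of full enumeration) are assumptions made for this example, not choices stated in the text.

```python
import numpy as np


def aligned_scores(y, center=np.mean):
    """Return scores of the aligned observations for an (n blocks) x (p treatments) array.

    `center` is any translation-equivariant, symmetric block summary
    (assumed here: the block mean).
    """
    y = np.asarray(y, dtype=float)
    n, p = y.shape
    aligned = y - center(y, axis=1, keepdims=True)            # alignment (3.6)
    # Overall ranks 1..N of all N = n*p aligned values (ties assumed absent).
    ranks = aligned.ravel().argsort().argsort().reshape(n, p) + 1
    return ranks / (n * p + 1.0)                              # assumed scores k/(N+1)


def statistic_from_scores(scores):
    """Quadratic form (3.11) computed from an n x p matrix of scores."""
    n, p = scores.shape
    t = scores.mean(axis=0)                                   # T_Nj, (3.8)
    block_means = scores.mean(axis=1, keepdims=True)          # (3.9)
    v = ((scores - block_means) ** 2).sum() / (n * (p - 1))   # V_N, (3.10)
    return n * ((t - scores.mean()) ** 2).sum() / v


def aligned_rank_test(y, n_perm=2000, seed=0):
    """Statistic and Monte Carlo permutation p-value (within-block shuffles)."""
    scores = aligned_scores(y)
    observed = statistic_from_scores(scores)
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(n_perm):
        # Independently permute the scores inside every block.
        order = rng.random(scores.shape).argsort(axis=1)
        permuted = np.take_along_axis(scores, order, axis=1)
        hits += int(statistic_from_scores(permuted) >= observed - 1e-12)
    return observed, (hits + 1) / (n_perm + 1)


# Illustrative data: 6 blocks, 4 treatments, additive block effects.
rng = np.random.default_rng(1)
blocks = rng.normal(scale=3.0, size=(6, 1))
effects = np.array([0.0, 5.0, 10.0, 15.0])
y = blocks + effects + rng.normal(scale=0.1, size=(6, 4))
stat, pval = aligned_rank_test(y)
```

With an odd number of treatments and the block median as `center`, one aligned value per block is exactly zero, so the no-ties assumption fails; this is why the mean is used as the default here.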

For small values of $n$ (and $p$), the permutational (conditional) distribution of $\mathcal{L}_N$ can be incorporated to construct a conditionally distribution-free test for the above hypothesis, while it follows from Sen (1968b) that for large sample sizes, under the null hypothesis, $\mathcal{L}_N$ has closely the chi squared distribution with $p - 1$ DF. Various robustness properties of such aligned rank tests have been studied in detail by Sen (1968c). It has been observed there that it may not be necessary that the aligned errors $\hat{e}_{ij}$ have the common distribution for all $i$ (i.e., blocks). In particular, for the heteroscedastic model, allowing the scale parameters to vary from block to block, it was observed


that an aligned rank test may have greater ARE with respect to the classical ANOVA test than in the homoscedastic case. Some of these details are also reported in Chapter 7 of Puri and Sen (1971). Also, the alignment procedure remains applicable in the mixed-effects model too, where the block effects, being stochastic or not, drop out due to alignment, and hence, better robustness properties percolate. We shall discuss this aspect later on. More important is the fact that the ARE of aligned rank tests relative to the intra-block rank tests based on conjugate scores is generally greater than 1, particularly when $p$ is not so large. For example, for the Wilcoxon score rank statistics, the ARE of the aligned rank test with respect to the Friedman $\chi^2$ test is $\geq (p+1)/p$, so that for small values of $p$, there may be considerable gain in using an aligned rank test, albeit in terms of model robustness, the intra-block rank tests fare better. In the above development, it has been assumed that each treatment is applied to one plot in each block. We may consider a more general case where the $j$th treatment is applied to $m_j$ ($\geq 1$) plots in each block, for $j = 1,\dots,p$. We let M ~j

2, $p \geq 2$, $q \geq 2$. Here the $\mu_i$ relate to the replicate effects, $\nu_j$ and $\tau_k$ to the main effects for the two factors, $\gamma_{jk}$ to the interaction effects of the two factors, and $\omega_{ijk}$ are the residual error components. We may set without any loss of generality

$$\sum_{j=1}^{p}\nu_j = 0, \quad \sum_{k=1}^{q}\tau_k = 0, \tag{6.2}$$

and

$$\gamma_{j\cdot} = q^{-1}\sum_{k=1}^{q}\gamma_{jk} = 0, \quad j = 1,\dots,p; \tag{6.3}$$

$$\gamma_{\cdot k} = p^{-1}\sum_{j=1}^{p}\gamma_{jk} = 0, \quad k = 1,\dots,q. \tag{6.4}$$

It is further assumed that for each $i$, $(\omega_{i11},\dots,\omega_{ipq})$ have a joint d.f. $G$ which is a symmetric function of its $pq$ arguments, and these $n$ ($pq$-)vectors are independent. This includes the conventional assumption of i.i.d. structure of the $\omega_{ijk}$ as a particular case, and more generally, it allows each replicate error vector to have interchangeable components which may still be dependent, a case that may arise if we allow the replicate effects to be possibly stochastic, so that we would have then a mixed effects factorial model. The null hypothesis of interest is

$$H_0: \Gamma = ((\gamma_{jk})) = 0, \tag{6.5}$$

against alternatives that $\Gamma$ is non-null. We would like to formulate suitable aligned rank tests for this hypothesis testing problem.


For an $m$ ($\geq 1$), let $1_m = (1,\dots,1)'$, and consider the following intra-block transformations which eliminate the replicate and main effects. Let $Y_i = ((Y_{ijk}))_{p \times q}$, $\Omega_i$ be the corresponding matrix of the error components, and let

$$Z_i = (I_p - p^{-1}1_p1_p')\,Y_i\,(I_q - q^{-1}1_q1_q'), \quad i = 1,\dots,n; \tag{6.6}$$

$$E_i = (I_p - p^{-1}1_p1_p')\,\Omega_i\,(I_q - q^{-1}1_q1_q'), \quad i = 1,\dots,n. \tag{6.7}$$

Then from (6.4), (6.6) and (6.7), we have

$$Z_i = \Gamma + E_i, \quad i = 1,\dots,n. \tag{6.8}$$
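The alignment in (6.6)-(6.8) can be checked numerically. The sketch below assumes that the model (6.1) has the additive form $Y_{ijk} = \mu_i + \nu_j + \tau_k + \gamma_{jk} + \omega_{ijk}$, which is how the effects are described above; the function names and the simulated values are purely illustrative.

```python
import numpy as np


def centering(m):
    """I_m - m^{-1} 1_m 1_m', the m x m centering matrix."""
    return np.eye(m) - np.ones((m, m)) / m


def align_two_factor(y):
    """Apply the transformation (6.6) to every replicate; y has shape (n, p, q)."""
    y = np.asarray(y, dtype=float)
    _, p, q = y.shape
    return centering(p) @ y @ centering(q)   # matmul broadcasts over replicates


# Simulated layout (illustrative values only).
rng = np.random.default_rng(0)
n, p, q = 4, 3, 5
mu = rng.normal(size=(n, 1, 1))              # replicate effects
nu = rng.normal(size=(1, p, 1))              # main effects of the first factor
tau = rng.normal(size=(1, 1, q))             # main effects of the second factor
# Double-centred interactions, so that (6.3) and (6.4) hold.
gamma = centering(p) @ rng.normal(size=(p, q)) @ centering(q)
omega = rng.normal(size=(n, p, q))           # error components
y = mu + nu + tau + gamma + omega

z = align_two_factor(y)                      # Z_i of (6.6)
e = align_two_factor(omega)                  # E_i of (6.7)
print(np.allclose(z, gamma + e))             # numerical check of (6.8)
```

The replicate and main effects need not be centred in the simulation: the two centering matrices remove them regardless.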

Thus, in this transformed model, the nuisance parameters are all eliminated. Note that the assumed interchangeability condition on the intra-block error components implies that for each $i$ ($= 1,\dots,n$), the components of $E_i$ remain interchangeable too. This provides the access to using permutationally distribution-free procedures based on the stochastic matrices $Z_i$, $i = 1,\dots,n$. On the other hand, the $E_i$ satisfy the same restraints as in (6.3) and (6.4), so that there are effectively only $(p-1)(q-1)$ linearly independent components among the $pq$ ones (for each $i$). It follows from (6.7) and the assumed interchangeability of the elements of $\Omega_i$ that the joint distribution of $E_i$ remains invariant under any of the possible $p!$ permutations of its rows, and also under any of the possible $q!$ permutations of its columns. Thus, there is a finite group $\mathcal{G}$ of $(p!q!)$ permutations which maps the sample space of $E_i$ onto itself and leaves the joint distribution invariant, so that working with the $n$ independent aligned error matrices, we arrive at a group $\mathcal{G}_n$ of transformations having $(p!q!)^n$ elements, and this provides the access to the exact permutation distribution of suitable test statistics based on these aligned observations. We may proceed as in Section 3 with intra-block rankings of these aligned observations and get a robust test, although it may not generally compare favorably in terms of power with aligned rank tests based on overall rankings, justifiable on the ground that the $Z_i$ do not contain any block effect. Let $R_{ijk}$ be the rank of $Z_{ijk}$ among the $N = npq$ aligned observations $Z_{suv}$, $s = 1,\dots,n$; $u = 1,\dots,p$; $v = 1,\dots,q$, and define the scores $a_N(r)$, $r = 1,\dots,N$, as in Section 3. For notational simplicity, we let $\eta_{ijk} = a_N(R_{ijk})$, $i = 1,\dots,n$; $j = 1,\dots,p$; $k = 1,\dots,q$, and let

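$$\bar{\eta}_{ij\cdot} = q^{-1}\sum_{k=1}^{q}\eta_{ijk}, \quad j = 1,\dots,p; \tag{6.9}$$

$$\bar{\eta}_{i\cdot k} = p^{-1}\sum_{j=1}^{p}\eta_{ijk}, \quad k = 1,\dots,q; \tag{6.10}$$

$$\bar{\eta}_{i\cdot\cdot} = (pq)^{-1}\sum_{j=1}^{p}\sum_{k=1}^{q}\eta_{ijk}, \tag{6.11}$$

for $i = 1,\dots,n$, and let $\bar{\eta}_{\cdot\cdot\cdot} = n^{-1}\sum_{i=1}^{n}\bar{\eta}_{i\cdot\cdot}$.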

Define the aligned rank statistics as

$$L_{N,jk} = n^{-1}\sum_{i=1}^{n}\eta_{ijk}; \quad L_N = ((L_{N,jk})). \tag{6.12}$$

Then the rank-adjusted statistics are defined by

$$L_N^* = (I_p - p^{-1}1_p1_p')\,L_N\,(I_q - q^{-1}1_q1_q') = ((L_{N,jk}^*)), \text{ say.} \tag{6.13}$$

Let us also define the rank measure of dispersion:

$$V_N = [n(p-1)(q-1)]^{-1}\sum_{i=1}^{n}\sum_{j=1}^{p}\sum_{k=1}^{q}\left(\eta_{ijk} - \bar{\eta}_{ij\cdot} - \bar{\eta}_{i\cdot k} + \bar{\eta}_{i\cdot\cdot}\right)^2. \tag{6.14}$$

Then, as in Mehra and Sen (1969), we consider the following test statistic:

$$\mathcal{L}_N^* = [n/V_N]\sum_{j=1}^{p}\sum_{k=1}^{q}\left(L_{N,jk}^*\right)^2, \tag{6.15}$$

which is analogous to the classical parametric test statistic based on the variance ratio criterion. It may be appropriate here to mention that as in the case of two-way layouts, if we have a mixed-effects model, where the treatment effects and their interactions are fixed effects, while the block effects are stochastic, the alignment process eliminates the block-effects (fixed or not), and hence, aligned rank tests are usable for such mixed-effects models too. At this stage, it may be appropriate to point out the basic difference between the current alignment procedure and an alternative one, the rank transformation procedure. In the latter case, one simply replaces the original $Y_{ijk}$ by their ranks (within the overall set) and performs the usual ANOVA test for interactions based on such rank matrices. Basically, rank transformations relate to the sample counterpart of the classical probability integral transformation. For a continuous d.f. $F$, the latter is continuous, but still the former is a step function. Moreover, the latter is a bounded and typically nonlinear (monotone) function, so that the original linear model fitted to the $Y_{ijk}$ may not fit their transformed counterparts $F(Y_{ijk})$ when $F$ is highly nonlinear. Thus, even if the block effects are eliminated by intra-block transformations, such nonlinearity effects are present in the foundation of rank transformations, and this makes them generally much less adoptable in factorial designs. In particular, if the main effects are not null, their latent effects in the rank transformation procedure may cause serious problems with respect to the validity and efficiency criteria. The aligned ranking procedure sketched here is free from this drawback as long as the basic linearity of the model in (6.1) is tenable.
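The computation in (6.6) and (6.9)-(6.15) can be sketched as follows. This is a minimal illustration, assuming Wilcoxon scores $a_N(r) = r/(N+1)$ and no ties among the aligned values; the function names are hypothetical. Removing row and column means and adding back the overall mean is the same operation as pre- and post-multiplying by the two centering matrices.

```python
import numpy as np


def double_center(a):
    """Remove row and column means over the last two axes of `a`."""
    return (a
            - a.mean(axis=-1, keepdims=True)
            - a.mean(axis=-2, keepdims=True)
            + a.mean(axis=(-2, -1), keepdims=True))


def aligned_rank_interaction_statistic(y):
    """Aligned rank statistic (6.15) for responses y of shape (n, p, q)."""
    y = np.asarray(y, dtype=float)
    n, p, q = y.shape
    z = double_center(y)                                 # aligned observations, (6.6)
    # Overall ranks among all N = n*p*q aligned values (ties assumed absent).
    ranks = z.ravel().argsort().argsort().reshape(n, p, q) + 1
    eta = ranks / (n * p * q + 1.0)                      # assumed scores r/(N+1)
    l_star = double_center(eta.mean(axis=0))             # (6.12) and (6.13)
    resid = double_center(eta)                           # eta - row - column + overall means
    v = (resid ** 2).sum() / (n * (p - 1) * (q - 1))     # V_N, (6.14)
    return n * (l_star ** 2).sum() / v                   # (6.15)


# Illustrative use: strong replicate and main effects, no interaction.
rng = np.random.default_rng(2)
noise = rng.normal(size=(5, 3, 4))
main_only = (50.0 * rng.normal(size=(5, 1, 1))
             + 7.0 * np.arange(3.0).reshape(1, 3, 1)
             + 11.0 * np.arange(4.0).reshape(1, 1, 4)
             + noise)
stat_null = aligned_rank_interaction_statistic(main_only)
```

Because the ranks are taken after alignment, adding replicate or main effects to the data leaves the statistic unchanged up to rounding, which is the property that the rank transformation procedure lacks. For $p = q = 2$ the four aligned values in a replicate coincide in pairs, so the no-ties assumption of this sketch does not hold there.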
For small values of $n$, $p$ and $q$, the exact (conditional) permutational distribution of $\mathcal{L}_N^*$ can be obtained by considering the $(p!q!)^n$ (conditionally) equally likely row and column permutations of the matrices $H_i = ((\eta_{ijk}))$, $i = 1,\dots,n$, and as this process becomes impracticable for large $n$, we appeal to the following large sample


result: As $n$ increases, the permutational (conditional) as well as the unconditional null distribution of $\mathcal{L}_N^*$ can be approximated by the central chi squared distribution with $(p-1)(q-1)$ DF. For various asymptotic properties we may refer to Mehra and Sen (1969). The procedure extends readily to designs with more than two factors: all we have to do is to define the aligned observations first to eliminate the nuisance parameters, and on such aligned observations we need to incorporate appropriate groups of transformations preserving invariance of their joint distributions, and with respect to such a group, we can obtain the permutational (rank) measure of dispersion. This provides the access to constructing variance-ratio type statistics based on such aligned rank statistics. This prescription is of sufficiently general form so as to include the general class of IBD's treated in Section 5. Moreover, the results discussed here for univariate response variates percolate through general incomplete multiresponse designs (IMD) pertaining to clinical trials and medical studies (viz., Sen, 1994a). At this stage we may refer to rank transformations as have been advocated by a host of researchers. However, one has to keep in mind that the scope of such procedures for blocked designs may be considerably less than the aligned ones presented here. In practice, replicated $m$ ($\geq 2$)-factor experimental designs crop up in a variety of ways, and in this setup, often, each of these factors is adopted at two levels, say 1 and 2. This way, we are led to a class of $n$ replicated $2^m$ factorial experiments. For such designs, a similar ranking after alignment procedure, due to Sen (1970b), works out well. Let $j = (j_1,\dots,j_m)$ represent the combination of the levels $j_1,\dots,j_m$ of $m$ factors $(A_1,\dots,A_m)$, where $j_k = 1, 2$, for $k = 1,\dots,m$. We denote by $J$ the set of all ($2^m$) realizations of $j$.
For the $i$th replicate, the response of the plot receiving the treatment combination $j$ is denoted by $X_{ij}$, and we consider the usual linear model (sans the normality of the errors):

$$X_{ij} = \beta_i + \frac{1}{2}\Big[\sum_{r \in R}(-1)^{(j,r)}\tau_r\Big] + e_{ij}, \quad j \in J,\ i = 1,\dots,n, \tag{6.16}$$

where $(a, b) = a'b$, the $\beta_i$ represent the block effects, the $e_{ij}$ are the error components, $r = (r_1,\dots,r_m)'$ with each $r_j$ either 0 or 1, $R$ is the set of all possible ($2^m$) realizations of $r$, and the treatment effects $\tau_r$ are defined as follows.

$$\tau_r = \tau_{A_1^{r_1}\cdots A_m^{r_m}}, \quad r \neq 0; \quad \tau_0 = 0, \tag{6.17}$$

where for each $j$ ($= 1,\dots,m$), $A_j^0 = 0$. Thus, $\tau_{A_1} = \tau_{1,0,\dots,0},\dots,\tau_{A_m} = \tau_{0,\dots,0,1}$ represent the main effects, $\tau_{A_1A_2} = \tau_{1,1,0,\dots,0}$ etc. represent two-factor interactions, and so on; $\tau_r$ is a $k$-factor interaction effect if $(r, 1) = k$, for $k = 1,\dots,m$. As in earlier sections, we assume here that for each $i$ ($= 1,\dots,n$), the set $\{e_{ij}, j \in J\}$ consists of interchangeable r.v.'s, and the block-effects need not be fixed; they may as well be stochastic. Let now $P$ be a subset of $R$, and suppose that we want to test the null hypothesis

$$H_{0,P}: \{\tau_r,\ r \in P\} = 0, \tag{6.18}$$


against the set of alternatives that these effects are not all equal to 0. Since (6.16) involves the block effects as nuisance parameters (or spurious r.v.'s), by means of the following intra-block transformations, we obtain the aligned observations. These aligned observations provide both the least squares and R-estimators of the $\tau_r$. Let

$$t_{i,r} = 2^{-(m-1)}\sum_{j \in J}(-1)^{(j,r)}X_{ij}, \quad r \in R,\ i = 1,\dots,n. \tag{6.19}$$

Then we may write $t_{i,r} = \tau_r + g_{i,r}$, for every $r \in R$, where the $g_{i,r}$ are the corresponding aligned error components. It is easy to verify that these $g_{i,r}$ remain exchangeable r.v.'s too, within each block. Moreover, it has been shown by Sen (1970b) by simple arguments that the univariate d.f.'s for these aligned errors are all symmetric about zero, and all their bivariate d.f.'s are diagonally symmetric about 0. Actually, the joint distribution of these aligned errors (within each block) is also diagonally symmetric about 0. Thus, for the R-estimation of the $\tau_r$, we may use the (marginal) set $t_{i,r}$, $i = 1,\dots,n$, and as in Section 2 (see (2.25)), incorporate a general signed rank statistic to yield the desired estimator. These are based on i.i.d. r.v.'s, and hence, no residuals are needed to reconstruct the estimators. As regards rank tests for the null hypothesis in (6.18), we may consider the $n$ i.i.d. random vectors

$$(t_{i,r},\ r \in P), \quad i = 1,\dots,n, \tag{6.20}$$

and use multivariate signed-rank test statistics, displayed in detail in Chapter 4 of Puri and Sen (1971). Asymptotic properties of such tests, studied in detail there, remain intact for such aligned rank tests in $2^m$ factorial experiments. Extensions to confounded or partially confounded designs have also been covered in Sen (1970b).
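The aligned contrasts (6.19) and a signed-rank based estimate of each $\tau_r$ can be sketched as below. The ordering of the treatment combinations, the function names and the simulated effects are assumptions of this example. For the R-estimator, the sketch takes the Wilcoxon signed-rank statistic as one particular choice of the general signed rank statistic mentioned above, for which the estimator is the familiar Hodges-Lehmann median of Walsh averages.

```python
import itertools
import numpy as np


def sign_matrix(m):
    """Signs (-1)^{(j, r)} with (j, r) = j'r.

    Rows index the treatment combinations j in {1, 2}^m and columns index
    r in {0, 1}^m, both in itertools.product order (an assumed convention).
    """
    levels = np.array(list(itertools.product((1, 2), repeat=m)))
    rs = np.array(list(itertools.product((0, 1), repeat=m)))
    return rs, (-1.0) ** (levels @ rs.T)


def aligned_contrasts(x, m):
    """t_{i,r} of (6.19); x has shape (n, 2**m) with columns ordered as in sign_matrix."""
    rs, signs = sign_matrix(m)
    return rs, np.asarray(x, dtype=float) @ signs / 2 ** (m - 1)


def hodges_lehmann(t):
    """Median of the Walsh averages (t_i + t_k)/2, i <= k."""
    t = np.asarray(t, dtype=float)
    i, k = np.triu_indices(t.size)
    return np.median((t[i] + t[k]) / 2.0)


# Illustrative 2^3 experiment with n = 8 replicates.
m, n = 3, 8
rs, signs = sign_matrix(m)
tau = np.array([0.0, 1.0, -2.0, 0.0, 0.5, 0.0, 0.0, 3.0])   # tau_0 = 0; values made up
rng = np.random.default_rng(3)
beta = rng.normal(scale=10.0, size=(n, 1))                   # block effects
x = beta + 0.5 * signs @ tau + rng.normal(scale=0.2, size=(n, 2 ** m))
_, t = aligned_contrasts(x, m)
# Column 0 corresponds to r = 0 and still carries the block effect, so it is skipped.
estimates = np.array([hodges_lehmann(t[:, c]) for c in range(1, 2 ** m)])
```

In the noise-free case the columns of `t` with $r \neq 0$ reproduce the $\tau_r$ exactly, which gives a quick check that the block effects have dropped out.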

7. Paired comparisons designs: Nonparametrics

In order to compare a number (say, $t$ ($\geq 2$)) of objects which are presented in pairs to a set of (say, $n$ ($\geq 2$)) judges who (independently) give a verdict of relative preference of one over the other within each pair, the method of paired comparisons (PC), developed mostly by psychologists, allows one to draw statistical conclusions on the relative positions of all the objects. Paired comparisons designs (PCD) are thus incomplete block designs with blocks of size two and a dichotomous response on the ordering of the intra-block plot yields. There are several detours from this simple description of PCD. For example, it may be possible to have observable responses (continuous variates) for each pair of objects: This will relate to the classical IMD with two plots in each block, so that the results developed in earlier sections would be applicable here. Hence, we skip these details. Another route relates to paired characteristics so that the ordering of the two objects within each pair may have four possible outcomes (instead of the two in the case of a single characteristic). Nonparametrics for such paired comparisons for paired characteristics were developed by Sen and David (1968)


and Davidson and Bradley (1969, 1970), among others. A general account of such PCD methodology is given in David (1988) where other references are also cited. A general characteristic of such paired comparisons procedures is that circular triads may arise in a natural way, and this may lead to intransitiveness of statistical inference tools when viewed from a decision theoretic point of view; the problem becomes even more complex in a multivariate setup. However, following David (1988) we may say that it is a valuable feature of the method of paired comparisons that it allows such contradictions to show themselves ..., and hence, the methodology developed addresses this issue in a sound statistical manner. As in earlier sections, it is also possible to work out the (M)ANOVA and (M)ANOCOVA models side by side, and following Sen (1995b), we summarize the main results along the same vein. Paired comparisons procedures in a multivariate setup rest on suitable representations of probability laws for multiple dichotomous attributes. Let us consider $p$ ($\geq 1$) dichotomous attributes, and let $i = (i_1,\dots,i_p)'$, where each $i_j$ can take only two values 0 and 1, for $j = 1,\dots,p$. The totality of all such $2^p$ realizations of $i$ is denoted by the set $\mathcal{I}$, and consider a stochastic $p$-vector $X = (X_1,\dots,X_p)'$, such that

$$P\{X = i\} = \pi(i), \quad i \in \mathcal{I}. \tag{7.1}$$

This probability law is defined on a $2^p$-simplex

$$\Pi = \Big\{\pi(i) \geq 0,\ \forall i \in \mathcal{I};\ \sum_{i \in \mathcal{I}}\pi(i) = 1\Big\},$$

so that there are $2^p - 1$ linearly independent elements in $\Pi$. Since there are $t$ objects (forming $\binom{t}{2}$ pairs), the total number of linearly independent parameters is equal to $\{2^p - 1\}\binom{t}{2}$, and this is generally large when $t$ and/or $p$ is not small. We consider the following modification of the Bahadur (1961) representation for multiple dichotomous attributes. Let

ri*j ) = P { x j = i } ,

i = 0, 1; 1 ~ 1) response variates, denoted by $Y_1,\dots,Y_p$ respectively. On a smallest set, say $S_1$, of experimental units, all these $p$ responses are measured simultaneously; for a larger set $S_2$, containing $S_1$ as a subset, $Y_2,\dots,Y_p$ (but not $Y_1$) are recorded on the subset $S_2 \setminus S_1$, and so on. For the largest set $S_p$, containing $S_{p-1}$ as a subset, $Y_p$ alone is recorded on the subset $S_p \setminus S_{p-1}$. Such a multiresponse design, determined by the inherent nesting $S_1 \subset S_2 \subset \cdots \subset S_p$, is termed a hierarchical design (viz., Roy et al., 1971, Chapter 8). It may not always be desirable or even practicable to impose this basic hierarchy condition. For example, the (random pattern) missing observations in multiresponse designs may distort this hierarchy condition to a certain extent. Nevertheless, it may be feasible to incorporate some IMD's wherein the set $Y = \{Y_1,\dots,Y_p\}$ can be partitioned into various subsets $\{Y_{i_1},\dots,Y_{i_r}\}$, $1 \le r \le p$, $1 \le i_1 < \cdots < i_r \le p$, such that these subsets are not necessarily nested and they are adoptable for possibly different numbers of experimental units. For example, for $p = 2$, we have three possible subsets $\{Y_1\}$, $\{Y_2\}$ and $\{Y_1, Y_2\}$, and possibly different designs (say, $\mathcal{D}_1$, $\mathcal{D}_2$ and $\mathcal{D}_{12}$) may be chosen for these subsets. In this context it may be recalled that in clinical trials, often, the primary emphasis is on a comparative study of a placebo and one or more treatments, so that these designs are to be chosen in a conventional sense with due emphasis on these treatments. In clinical trials, a primary endpoint, in spite of being the most relevant one, may encounter some basic problems regarding its precise measurement (due to possibly excessive cost or some other practical limitations); therefore, it is not uncommon to make use of a very closely related but presumably relatively less expensive variate, termed a surrogate endpoint.
Generally, surrogate endpoints may not contain as much information as contained in the primary endpoint, and such a substitution may have serious effects on valid and efficient statistical modeling and analysis, unless the surrogate variate has some statistical concordance with the primary one. The situation may be particularly bleak when this statistical interface of surrogate and primary endpoints is not clearly known, and this case arises typically when no specific data are available on simultaneous measurement of both these variables. Nevertheless, the use of such surrogate endpoints in clinical trials and medical investigations has generally been accepted by the allied medical community and has caught the attention of statisticians as well. A nice statistical account of such uses, and abuses too, is given in a set of articles published in Statistics in Medicine, Vol. 8, No. 2 (1989). More technical expositions of this field are due to Pepe (1992) and Sen (1994a), among others. Not all auxiliary variables qualify as surrogates, and for qualified ones, it seems very reasonable (if not essential) to design a study in such a way that for a majority of the experimental units, termed the surrogate sample, valid surrogate endpoints and concomitant variates are recorded, while for a smaller subset of experimental units, termed the validation sample, simultaneous recording of the primary and surrogate endpoints throws light on their statistical relation which enables us to combine the evidence from both the subsets of data and draw better statistical conclusions. If statistical conclusions are to be drawn only from the surrogate sample observations,


some stronger regularity assumptions are generally needed to justify the conclusions, while the use of a validation sample may enhance the scope of the study considerably. We may refer to Prentice (1989) and Pepe (1992) for some useful accounts of these pros and cons of surrogate endpoints in clinical trials. The Prentice-Pepe setups can be characterized as both a hierarchical and incomplete multiresponse model with $p = 2$. Many clinical trials encounter a more complex setup involving multiple endpoints resulting in multiresponse primary variates. We may refer to Wei et al. (1989) and Prentice and Cai (1992) for some statistical treatments for such designs. A more comprehensive IMD/HD approach in a nonparametric setup is due to Sen (1994a), and we summarize these results here. There may be in general more than one primary endpoint, and we denote these by $Y = (Y_1,\dots,Y_p)'$, where $p$ ($\geq 1$) and there may be a partial ordering of the importance of these primary endpoints, which may also be taken into account in the design and statistical analysis of the study. Similarly, the surrogate endpoint may also be represented by a $q$-vector $Y_0$, where $q$ is a positive integer. Thus, in general, we have a set of $p + q$ responses, some of which may be costly to record. In order to extract information on the statistical relations between $Y$ and $Y_0$, and to incorporate the same in drawing statistical conclusions, it may be desirable to use IMD's or HD's. In this respect in a conventional approach, one adopts Multivariate General Linear Models (MGLM) for statistical modeling and analysis; however, the basic regularity assumptions are even more unlikely to be tenable in this multivariate situation. Thus, the appropriateness of MGLM's in clinical trials is questionable. Use of Generalized Linear Models (GLM) is also subject to similar limitations, and on model robustness grounds they are even more vulnerable to plausible departures from the assumed regularity assumptions.
The Cox (1972) PHM based partial likelihood approach is also subject to serious nonrobustness constraints, and hence, sans sufficient confidence on such a parametric or semi-parametric model, such procedures should not be advocated in real applications. Some aspects of nonrobustness of the Cox PHM approach are discussed in Sen (1994a), and these remarks pertain to general IMD's as well. Nonparametrics, on the other hand, possess good robustness properties, and are better competitors to these alternative ones. This has been the main motivation of Sen (1994a) in pursuing general nonparametrics for such IMD's with adequate emphasis on the related asymptotics. The basic rank procedures described in the earlier sections, particularly, for randomized blocks, incomplete block designs, factorial experiments and multivariate models, provide the necessary access for this development, and we shall unify these in a convenient mold. Aligned rank procedures are particularly useful in this context. It may be recalled that a surrogate endpoint is a qualified substitute for the primary endpoint only if it reflects a picture with reference to the treatment difference concordant with the primary endpoint; in the literature this condition is also referred to as the validity criterion for a surrogate. Such a condition can be tested if one has a validation sample where both the primary and surrogate endpoints are recorded. But, generally, such a validation sample has a smaller size compared to the surrogate sample. Hence, the general nonparametric approach is based on the following scheme: (i) For testing a plausible hypothesis relating to treatment differences construct a suitable nonparametric test statistic based on the surrogate sample observations and adjusted for covariates,


if any. (ii) For the validation sample, construct a similar nonparametric statistic for both the primary and surrogate endpoints (using the multivariate approach treated in Section 4), also adjusted for concomitant variates, if any. (iii) Test for the concordance of the primary and surrogate endpoints with respect to treatment differences based on the statistics in Step (ii). Again, nonparametric tests can be used here. (iv) Regress the primary endpoint statistics on the surrogate endpoint ones (in the validation sample in Step (ii)), and obtain the aligned statistics for the primary endpoint as residuals from this fitted regression. (v) Combine the statistics in Steps (i) and (iv) by the usual weighted least squares principle and use the same in the formulation of the actual test statistic to be used for testing a hypothesis on treatment differences. In this context the joint (asymptotic) normality of multivariate rank statistics provides the theoretical justifications for the various steps sketched above, and also provides the foundation of general asymptotics relevant to this topic. These details are provided in Sen (1994a). For linear models in the parametric case, IMD's entail a secondary task: Recovery of interblock information from the block totals. In the nonparametric case, although the basic linearity of the model may not be fully appreciated, such recovery of interblock information is possible. The basic motivation is the same, although a somewhat different alignment process is needed to incorporate this recovery in nonparametric analysis. This alignment procedure is very similar to the classical parametric case: Block averages or some other measure of central tendency are used for construction of interblock aligned rank statistics, while the residuals within each block are pooled together for all blocks and replicates to construct aligned rank statistics for intra-block analysis.
These two sets of rank statistics are then combined (as in Step (v) above) in a convenient way to construct suitable test statistics which have greater power than the one based solely on the intra-block residuals. For clinical trials involving (a treatment-wise) IMD and a surrogate endpoint, recovery of interblock information in nonparametric analysis of covariance models has recently been treated in a unified manner by El-Moalem and Sen (1997). The idea is quite simple. In addition to using the aligned rank statistics (for the primary and surrogate endpoints as well as the concomitant variates), in a replicated IMD, it is also possible to use the within replicate block totals, align them with a view to eliminating the replicate effects, and then to use aligned rank statistics on such aligned block totals to extract further information on the treatment effects. Use of such aligned rank statistics eliminates the cruciality of the linearity of the model to a certain extent and makes it possible to use the usual weighted least squares methodology to construct suitable pooled aligned rank statistics which may be incorporated in the construction of a plausible test statistic having, at least asymptotically, better power properties. In this context it is not necessary to assume that the treatments are replicated an equal number of times within each replicate or that any pair of them are done so, and the treatise covers a general class of IMD's.
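The weighted least squares pooling referred to in Step (v) and in the recovery of interblock information can be pictured with a generic sketch. It is not the construction of Sen (1994a) or of El-Moalem and Sen (1997); it only assumes two (asymptotically) uncorrelated statistic vectors that estimate the same parameter vector, each with a known or consistently estimated covariance matrix, and pools them by inverse-covariance weighting. The function name and the numbers are illustrative.

```python
import numpy as np


def combine_wls(t1, v1, t2, v2):
    """Inverse-covariance (weighted least squares) pooling of two estimates.

    t1, t2: estimates of the same parameter vector; v1, v2: their covariance
    matrices. The two estimates are assumed to be uncorrelated.
    """
    t1 = np.atleast_1d(np.asarray(t1, dtype=float))
    t2 = np.atleast_1d(np.asarray(t2, dtype=float))
    w1 = np.linalg.inv(np.atleast_2d(np.asarray(v1, dtype=float)))
    w2 = np.linalg.inv(np.atleast_2d(np.asarray(v2, dtype=float)))
    cov = np.linalg.inv(w1 + w2)                 # covariance of the pooled estimate
    return cov @ (w1 @ t1 + w2 @ t2), cov


# Illustrative use: an intra-block and an inter-block estimate of two contrasts.
intra, v_intra = np.array([1.0, -0.5]), np.diag([0.04, 0.04])
inter, v_inter = np.array([1.4, -0.1]), np.diag([0.16, 0.16])
pooled, v_pooled = combine_wls(intra, v_intra, inter, v_inter)
```

The pooled covariance is never larger than either input covariance, which is the sense in which the combined statistic can be expected to have better power than the intra-block one alone.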

11. Concluding remarks

The current state of the art with the developments on nonparametrics in design and analysis of various types of experiments really calls for a far more thorough treatise of the


subject matter than presented in this writeup. For lack of space, it has not been possible to include the entire battery of topics in design and analysis of experiments where nonparametrics are relevant. As regards the basic nonparametrics presented in the first eight sections of this writeup, the treatment here is fairly thorough. However, the last two sections are presented with more motivations from applications point of view but from methodological point of view somewhat less technical details are provided than they deserve. The nonparametric task remains as much more challenging, although some work has already been in progress in this direction. On the top of this there is another important consideration underlying practical adoption of statistical designs and analysis packages in clinical trials and medical studies in general. Missing observations may be a part of such experimental data, and statistical analysis should address this issue adequately with due considerations to practical adoptions. In clinical trials, such a missing pattern may be due to censoring of various types discussed in earlier sections, while in a general setup, it may be due to other factors as well. In epidemiological studies, it is not uncommon to encounter multiple causes of failures, and hence, a competing risk setup is often judged as appropriate. Again design and statistical analysis (parametrics as well as nonparametrics) for such studies follow somewhat different tracks, and it may be desirable to pay due attention to the developments of nonparametrics for competing risks models in as much generality as possible. Random missing patterns are often introduced as a part of the basic assumptions to deal with messy data sets arising in such studies. In a nonparametric MANOVA setup, for some developments on such random missing patterns, we may refer to Servy and Sen (1987), where other pertinent references are also cited. There remains much more to be accomplished in this direction. 
Competing risks models in a general multiresponse (or multiple endpoint) clinical trial pose even more complex statistical designing and analysis tasks. Only in some simplest situations have some relevant nonparametrics been developed; we may refer to DeMasi (1994) where other references are cited in detail. Since more complex censoring patterns may arise in this context, statistical modeling (underlying either parametric or robust procedures) needs to address the infrastructure in an adequate manner; this not only increases the number of parameters associated with the model, but also may raise some identifiability issues which call for more delicate treatments. These need to be addressed in a more general and integrated manner than done here. Throughout this presentation the major emphasis has been on nonparametrics based on rank statistics and allied estimators. Although such rank procedures can mostly be justified from a global robustness point of view (with very little emphasis on the form of the underlying error distributions), there are some other situations where it may be wiser to take recourse to local robustness properties wherein only small departures from an assumed model are contemplated, so that high efficiency mingled with low sensitiveness to such local departures dominates the scenario. In this setup, as viable competitors to such nonparametrics, robust procedures based on suitable L- and M-statistics are often advocated. Regression quantiles have their genesis in this complex, but in the recent past, they have paved the way for the related regression rank scores estimators and test statistics which compare very favorably with nonparametric procedures based on R-estimators. Recently, Jurečková and Sen (1993) have

Design and analysis of experiments


established certain asymptotic equivalence results on the classical R-estimators and regression rank scores estimators in a linear model based on a common score generating function, and as such, taking into account the relative computational complexities of these two approaches, in some cases we may advocate the use of such regression rank scores procedures as well. For some general accounts of these findings we may refer to Jurečková and Sen (1996, Chapter 6), where mostly fixed-effects models are considered, and to Sen (1995a), where some mixed-effects models have also been treated. In the clinical and epidemiological sectors, due to medical ethics standards and current policies of some of the regulatory agencies in the USA and other industrialized nations, a multi-phase design approach for human usage is generally adopted. In Phase I, primary emphasis is on exploration of biochemical/biomedical effects, toxicity, etc., while in Phase II, some therapeutic factors are taken into account. In this setup, it is quite common to have some animal studies first, and the conclusions gathered from such studies are then incorporated in the design and general formulation of the main study: the Phase III clinical trial. The emerging sub-discipline of clinical epidemiology has been geared to address more complex issues arising in this interdisciplinary field. Because of apparently conflicting attitudes of statisticians and epidemiologists to some of these clinical problems, in clinical epidemiology there is often a blending of ecology and etiology, for which the design as well as analysis aspects may differ drastically even in some simple parametric setups (see, e.g., Sen, 1994c). Nonparametrics play a fundamental role in this setup too. For example, extrapolating the statistical findings from experiments conducted on subhuman primates to human beings raises the question of their validity and scope.
In the statistical literature, such methodologies are categorized under the topic of Accelerated Life Testing (ALT) procedures. In parametric setups, the basic regularity assumptions appear to be quite stringent, and hence nonparametrics are generally advocated for greater scope and reliability. However, in this context too, the validity and reliability of the statistical regularity assumptions need to be assessed properly. Biological assays are the main statistical assessment tools in this venture. Designs for such bio-assays may often be somewhat different, and we may refer to the classical text of Finney (1964) for a detailed account of such developments. His treatise mainly follows a conventional parametric path, with due emphasis on transformations of the response and dose variables (termed the response metameter and the dosage or dose metameter, respectively) under which suitable parametric models can be justified. Nevertheless, in practice, such transformations may not simultaneously achieve the basic linearity of the model and normality (or logistic or some other simple form) of the tolerance distribution. As such, there is good scope for nonparametrics. Some developments in this sector took place during the sixties (viz., Sen, 1963) and early seventies, and a systematic review of this work is reported in Sen (1984a), where pertinent references are also cited. A new area of statistical awareness relates to our endangered environment and the statistical endeavors to cope with such problems; these have led to the development of another frontier of statistical sciences: Environmetrics. The tasks are truly challenging and statistical considerations are overwhelming in this venture. Unlike the case of conventional agricultural experiments, animal studies or even clinical trials, environmental problems are generally characterized by the lack of control in the conduct of


P. K. Sen

a scientific study to a much greater extent. Also, a large number of factors contribute to unaccountable variations in the response patterns. Moreover, the response variables are often imprecisely defined and may also involve serious measurement problems. For example, to assess the air pollution standard of various urban, suburban and rural areas in the USA, the basic task may be to define precisely the response variables and to identify their probable causes or factor variables (viz., automobile exhaust, environmental smoking, industrial emissions, etc.), variation with the weather conditions, day-to-night variation, and many other factors which may not be properly defined and may hardly be controllable to a satisfactory extent. Some of these variables may even be binary or polychotomous in nature. On top of that, even when a variable is quantitative, it may usually be recorded in class intervals, leading to the so-called interval censoring schemes. Thus, measurement errors and, to a certain extent, misclassifications are usually encountered as a vital part of such response as well as dose variables. Even in the simplest case of two- or several-sample problems, for such grouped data ties among the observations may not be negligible, and there may not be a unique way to handle such ties; we may refer to Hájek and Šidák (1967, Chapter 3) for some treatment of ties for the exact null hypothesis distributions of rank statistics, and to Sen (1967a) for asymptotic optimality of rank tests for grouped data in a simple regression model. These results extend directly to general linear models.
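One common convention for the ties produced by grouped (interval-recorded) data is to assign midranks, i.e., to give each tied observation the average of the ranks it occupies. The following is a minimal sketch of that convention only, not of the optimal procedures cited above; the function names and the pollutant readings are hypothetical and serve purely as illustration.

```python
from collections import defaultdict

def midranks(values):
    """Return the midrank of each observation (tied values share the average rank)."""
    positions = defaultdict(list)
    order = sorted(range(len(values)), key=lambda i: values[i])
    for rank, idx in enumerate(order, start=1):
        positions[values[idx]].append(rank)
    return [sum(positions[v]) / len(positions[v]) for v in values]

def rank_sum(sample_x, sample_y):
    """Sum of the midranks of sample_x within the pooled sample."""
    pooled = list(sample_x) + list(sample_y)
    return sum(midranks(pooled)[:len(sample_x)])

# Hypothetical readings recorded only as class-interval midpoints.
urban = [15, 25, 25, 35]
rural = [5, 15, 15, 25]
print(midranks(urban + rural))
print(rank_sum(urban, rural))
```

Other conventions (for example, random tie-breaking) would give a different statistic for the same data, which is the non-uniqueness referred to above.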
However, linear models are hardly appropriate in this complex setup, where the response-dose regression, subject to possible measurement errors/misclassifications, may be quite nonlinear in nature. Suitable transformations of the factor as well as response variables are generally used to induce more linearity in the models; their impact on the distributional assumptions needs to be assessed carefully. Assumptions of independence, homoscedasticity and even the symmetry of the error components are to be examined critically in the particular contexts, and for these reasons statistical designs and analysis schemes are to be developed in more practically adoptable settings. Such environmental problems are not totally out of the reach of clinical trials and biomedical studies. The emerging field of environmental health sciences deals with the impact of the environment on human health and prospects for long-range healthy living. Environmental health effects have been identified to be far more outreaching than in a simple chemical or biochemical setting, and genotoxicity has also been identified as an important ingredient in this phenomenon. In this quest, biological assays involving biological markers are vital tools for assessments on subhuman primates, and suitable designs of (mutagenetic) experiments are generally advocated for extrapolation of the findings from animals to human beings. Because of the fundamental roles of Molecular Biology and Human Genetics in these complex experimental schemes, such designs are generally quite different from the conventional ones considered in this volume. The appropriateness of an interdisciplinary approach is crucial in this context.
Inhalation toxicology, water contamination, air pollution and scores of other serious environmental threats are affecting the Quality of Life (QOL) and in many ways, endangering our lives too, and, in this respect, an interdisciplinary approach is very much needed to provide scientifically sound and operationally manageable solutions. We may refer to Sen and Margolin (1995) for some of the basic statistical issues in some environmental studies with major emphasis on inhalation toxicology, and conclude that statistical planning


and analysis schemes are most vital in this venture. Parametrics or semi-parametrics are less likely to be appropriate in this emerging research field, and nonparametrics are indispensable in this context to a far greater extent.

12. Acknowledgements

I am grateful to Professor Subir Ghosh for critical reading of the manuscript which has eliminated numerous typos and some obscurities as well.

References

Adichie, J. N. (1967). Estimates of regression parameters based on rank tests. Ann. Math. Statist. 38, 894-904.
Andersen, P. K., O. Borgan, R. D. Gill and N. Keiding (1993). Statistical Models Based on Counting Processes. Springer, New York.
Armitage, P. (1975). Sequential Medical Trials, 2nd edn. Blackwell, Oxford.
Bahadur, R. R. (1961). A representation of the joint distribution of responses to n dichotomous items. In: H. Solomon, ed., Studies in Item Analysis and Prediction. Stanford Univ. Press, CA.
Benard, A. and P. van Elteren (1953). A generalization of the method of m-rankings. Indag. Math. 15, 358-369.
Bhapkar, V. P. (1961a). Some nonparametric median procedures. Ann. Math. Statist. 32, 846-863.
Bhapkar, V. P. (1961b). A nonparametric test for the problem of several samples. Ann. Math. Statist. 32, 1108-1117.
Bradley, R. A. and M. E. Terry (1952). Rank analysis of incomplete block designs, I. The method of paired comparison. Biometrika 39, 324-345.
Brown, G. W. and A. M. Mood (1951). On median tests for linear hypotheses. Proc. 2nd Berkeley Symp. Math. Statist. Probab. 159-166.
Chatterjee, S. K. and P. K. Sen (1964). Nonparametric tests for the bivariate two-sample location problem. Calcutta Statist. Assoc. Bull. 13, 18-58.
Chatterjee, S. K. and P. K. Sen (1973). Nonparametric testing under progressive censoring. Calcutta Statist. Assoc. Bull. 22, 13-50.
Cox, D. R. (1972). Regression models and life tables (with discussion). J. Roy. Statist. Soc. Ser. B 34, 187-220.
Cox, D. R. (1975). Partial likelihood. Biometrika 62, 369-375.
David, H. A. (1988). The Method of Paired Comparisons, 2nd edn. Oxford Univ. Press, New York.
Davidson, R. R. and R. A. Bradley (1969). Multivariate paired comparisons: The extension of a univariate model and associated estimation and test procedures. Biometrika 56, 81-94.
Davidson, R. R. and R. A. Bradley (1970). Multivariate paired comparisons: Some large sample results on estimation and test of equality of preference. In: M. L. Puri, ed., Nonparametric Techniques in Statistical Inference. Cambridge Univ. Press, New York, 111-125.
DeLong, D. M. (1981). Crossing probabilities for a square root boundary by a Bessel process. Comm. Statist. Theory Methods A10, 2197-2213.
DeMasi, R. A. (1994). Proportional Hazards models for multivariate failure time data with generalized competing risks. Unpublished Doctoral Dissertation, Univ. North Carolina, Chapel Hill.
Durbin, J. (1951). Incomplete blocks in ranking experiments. Brit. J. Statist. Psychol. 4, 85-90.
El-Moalem, H. and P. K. Sen (1997). Nonparametric recovery of interblock information in clinical trials with a surrogate endpoint. J. Statist. Plann. Inference (to appear).
van Elteren, P. and G. E. Noether (1959). The asymptotic efficiency of the $\chi^2$-test for a balanced incomplete block design. Biometrika 46, 475-477.
Finney, D. J. (1964). Statistical Methods in Biological Assay, 2nd edn. Griffin, London.


Fleming, T. R. and D. P. Harrington (1991). Counting Processes and Survival Analysis. Wiley, New York.
Friedman, M. (1937). The use of ranks to avoid the assumption of normality implicit in the analysis of variance. J. Amer. Statist. Assoc. 32, 675-701.
Gerig, T. M. (1969). A multivariate extension of Friedman's $\chi_r^2$-test. J. Amer. Statist. Assoc. 64, 1595-1608.
Gerig, T. M. (1975). A multivariate extension of Friedman's $\chi_r^2$-test with random covariates. J. Amer. Statist. Assoc. 70, 443-447.
Greenberg, V. L. (1966). Robust estimation in incomplete block designs. Ann. Math. Statist. 37, 1331-1337.
Grizzle, J. E. (1965). The two-period change-over design and its use in clinical trials. Biometrics 21, 467-480.
Hájek, J. and Z. Šidák (1967). Theory of Rank Tests. Academic Press, New York.
Hampel, F. R., E. M. Ronchetti, P. J. Rousseeuw and W. A. Stahel (1986). Robust Statistics: The Approach Based on Influence Functions. Wiley, New York.
Hodges, J. L., Jr. and E. L. Lehmann (1962). Rank methods for combination of independent experiments in analysis of variance. Ann. Math. Statist. 33, 487-497.
Hodges, J. L., Jr. and E. L. Lehmann (1963). Estimates of location based on rank tests. Ann. Math. Statist. 34, 598-611.
Huber, P. J. (1981). Robust Statistics. Wiley, New York.
Jaeckel, L. A. (1972). Estimating regression coefficients by minimizing dispersion of the residuals. Ann. Math. Statist. 43, 1449-1458.
Jurečková, J. (1971). Nonparametric estimate of regression coefficients. Ann. Math. Statist. 42, 1328-1338.
Jurečková, J. (1977). Asymptotic relations of M-estimates and R-estimates in linear models. Ann. Statist. 5, 464-472.
Jurečková, J. and P. K. Sen (1993). Asymptotic equivalence of regression rank scores estimators and R-estimators in linear models. In: J. K. Ghosh et al., eds., Statistics and Probability: A Raghu Raj Bahadur Festschrift, Wiley Eastern, New Delhi, 279-292.
Jurečková, J. and P. K. Sen (1996). Robust Statistical Procedures: Asymptotics and Interrelations. Wiley, New York.
Koch, G. G. (1972). The use of nonparametric methods in the statistical analysis of two-period change-over design. Biometrics 28, 577-584.
Krishnaiah, P. R. (ed.) (1981). Handbook of Statistics, Vol. 1: Analysis of Variance. North-Holland, Amsterdam.
Krishnaiah, P. R. and P. K. Sen (eds.) (1984). Handbook of Statistics, Vol. 4: Nonparametric Methods. North-Holland, Amsterdam.
Lan, K. K. G. and D. L. DeMets (1983). Discrete sequential boundaries for clinical trials. Biometrika 70, 659-663.
Lehmann, E. L. (1963a). Robust estimation in analysis of variance. Ann. Math. Statist. 34, 957-966.
Lehmann, E. L. (1963b). Asymptotically nonparametric inference: An alternative approach to linear models. Ann. Math. Statist. 34, 1494-1506.
Lehmann, E. L. (1964). Asymptotically nonparametric inference in some linear models with one observation per cell. Ann. Math. Statist. 35, 726-734.
Majumdar, H. and P. K. Sen (1978). Nonparametric tests for multiple regression under progressive censoring. J. Multivariate Anal. 8, 73-95.
Mantel, N. and W. Haenszel (1959). Statistical aspects of analysis of data from retrospective studies of disease. J. Nat. Cancer Inst. 22, 719-748.
Mehra, K. L. and J. Sarangi (1967). Asymptotic efficiency of some rank tests for comparative experiments. Ann. Math. Statist. 38, 90-107.
Mehra, K. L. and P. K. Sen (1969). On a class of conditionally distribution-free tests for interactions in factorial experiments. Ann. Math. Statist. 40, 658-666.
Murphy, S. A. and P. K. Sen (1991). Time-dependent coefficients in a Cox-type regression model. Stochast. Proc. Appl. 39, 153-180.
Pepe, M. S. (1992). Inference using surrogate outcome data and a validation sample. Biometrika 79, 495-512.
Prentice, R. L. (1989). Surrogate endpoints in clinical trials: Definition and operational criteria. Statist. Med. 8, 431-440.


Prentice, R. L. and J. Cai (1992). Covariance and survival function estimation using censored multivariate failure time data. Biometrika 79, 495-512.
Pukelsheim, F. (1993). Optimal Experimental Design. Wiley, New York.
Puri, M. L. and P. K. Sen (1967). On robust estimation in incomplete block designs. Ann. Math. Statist. 38, 1587-1591.
Puri, M. L. and P. K. Sen (1971). Nonparametric Methods in Multivariate Analysis. Wiley, New York.
Puri, M. L. and P. K. Sen (1985). Nonparametric Methods in General Linear Models. Wiley, New York.
Quade, D. (1984). Nonparametric methods in two-way layouts. In: P. R. Krishnaiah and P. K. Sen, eds., Handbook of Statistics, Vol. 4: Nonparametric Methods. North-Holland, Amsterdam, 185-228.
Roy, S. N. (1953). On a heuristic method of test construction and its use in multivariate analysis. Ann. Math. Statist. 24, 220-238.
Roy, S. N., R. Gnanadesikan and J. N. Srivastava (1970). Analysis and Design of Certain Quantitative Multiresponse Experiments. Pergamon Press, New York.
Sen, P. K. (1963). On the estimation of relative potency in dilution (-direct) assays by distribution-free methods. Biometrics 19, 532-552.
Sen, P. K. (1967a). Asymptotically most powerful rank order tests for grouped data. Ann. Math. Statist. 38, 1229-1239.
Sen, P. K. (1967b). A note on the asymptotic efficiency of Friedman's $\chi_r^2$-test. Biometrika 54, 677-679.
Sen, P. K. (1968a). Asymptotically efficient test by the method of n-ranking. J. Roy. Statist. Soc. Ser. B 30, 312-317.
Sen, P. K. (1968b). On a class of aligned rank order tests in two-way layouts. Ann. Math. Statist. 39, 1115-1124.
Sen, P. K. (1968c). Robustness of some nonparametric procedures in linear models. Ann. Math. Statist. 39, 1913-1922.
Sen, P. K. (1968d). Estimates of the regression coefficient based on Kendall's tau. J. Amer. Statist. Assoc. 63, 1379-1389.
Sen, P. K. (1969a). On nonparametric T-method of multiple comparisons in randomized blocks. Ann. Inst. Statist. Math. 21, 329-333.
Sen, P. K. (1969b). Nonparametric tests for multivariate interchangeability. Part II: The problem of MANOVA in two-way layouts. Sankhyā Ser. A 31, 145-156.
Sen, P. K. (1970a). On the robust efficiency of the combination of independent nonparametric tests. Ann. Inst. Statist. Math. 22, 277-280.
Sen, P. K. (1970b). Nonparametric inference in n replicated $2^m$ factorial experiments. Ann. Inst. Statist. Math. 22, 281-294.
Sen, P. K. (1971a). Asymptotic efficiency of a class of aligned rank order tests for multiresponse experiments in some incomplete block designs. Ann. Math. Statist. 42, 1104-1112.
Sen, P. K. (1971b). Robust statistical procedures in problems of linear regression with special reference to quantitative bio-assays, I. Internat. Statist. Rev. 39, 21-38.
Sen, P. K. (1972). Robust statistical procedures in problems of linear regression with special reference to quantitative bio-assays, II. Internat. Statist. Rev. 40, 161-172.
Sen, P. K. (1979). Rank analysis of covariance under progressive censoring. Sankhyā Ser. A 41, 147-169.
Sen, P. K. (1980). Nonparametric simultaneous inference for some MANOVA models. In: P. R. Krishnaiah, ed., Handbook of Statistics, Vol. 1: Analysis of Variance. North-Holland, Amsterdam, 673-702.
Sen, P. K. (1981a). The Cox regression model, invariance principles for some induced quantile processes and some repeated significance tests. Ann. Statist. 9, 109-121.
Sen, P. K. (1981b). Sequential Nonparametrics: Invariance Principles and Statistical Inference. Wiley, New York.
Sen, P. K. (1984a). Some miscellaneous problems in nonparametric inference. In: P. R. Krishnaiah and P. K. Sen, eds., Handbook of Statistics, Vol. 4: Nonparametric Methods. North-Holland, Amsterdam, 699-739.
Sen, P. K. (1984b). Multivariate nonparametric procedures for certain arteriosclerosis problems. In: P. R. Krishnaiah, ed., Multivariate Analysis VI. North-Holland, Amsterdam, 563-581.
Sen, P. K. (1985). Theory and Applications of Sequential Nonparametrics. SIAM, Philadelphia, PA.
Sen, P. K. (1988). Combination of statistical tests for multivariate hypotheses against restricted alternatives. In: S. Dasgupta and J. K. Ghosh, eds., Statistics: Applications and New Directions. Ind. Statist. Inst., Calcutta, 377-402.


Sen, P. K. (1991a). Nonparametrics: Retrospectives and perspectives (with discussion). J. Nonparametr. Statist. 1, 3-53.
Sen, P. K. (1991b). Repeated significance tests in frequency and time domains. In: B. K. Ghosh and P. K. Sen, eds., Handbook of Sequential Analysis, Marcel Dekker, New York, 169-198.
Sen, P. K. (1993a). Statistical perspectives in clinical and health sciences: The broadway to modern applied statistics. J. Appl. Statist. Sci. 1, 1-50.
Sen, P. K. (1993b). Perspectives in multivariate nonparametrics: Conditional functionals and ANOCOVA models. Sankhyā Ser. A 55, 516-532.
Sen, P. K. (1994a). Incomplete multiresponse designs and surrogate endpoints in clinical trials. J. Statist. Plann. Inference 42, 161-186.
Sen, P. K. (1994b). Some change-point problems in survival analysis: Relevance of nonparametrics in applications. J. Appl. Statist. Sci. 1, 425-444.
Sen, P. K. (1994c). Bridging the biostatistics-epidemiology gap: The Bangladesh task. J. Statist. Res. 28, 21-39.
Sen, P. K. (1995a). Nonparametric and robust methods in linear models with mixed effects. Tatra Mount. Math. J. 7, 1-12.
Sen, P. K. (1995b). Paired comparisons for multiple characteristics: An ANOCOVA approach. In: H. N. Nagaraja, D. F. Morrison and P. K. Sen, eds., H. A. David Festschrift. Springer, New York, 237-264.
Sen, P. K. (1995c). Regression rank scores estimation in ANOCOVA. Ann. Statist. (to appear).
Sen, P. K. (1995d). Censoring in Theory and Practice: Statistical perspectives and controversies. IMS Lecture Notes Monograph Ser. 27, 175-192.
Sen, P. K. and H. A. David (1968). Paired comparisons for paired characteristics. Ann. Math. Statist. 39, 200-208.
Sen, P. K. and B. H. Margolin (1995). Inhalation toxicology: Awareness, identifiability and statistical perspectives. Sankhyā Ser. B 57, 253-276.
Sen, P. K. and M. L. Puri (1977). Asymptotically distribution-free aligned rank order tests for composite hypotheses for general linear models. Zeit. Wahrsch. verw. Geb. 39, 175-186.
Senn, S. (1993). Cross-over Trials in Clinical Research. Wiley, New York.
Servy, E. C. and P. K. Sen (1987). Missing values in multisample rank permutation tests for MANOVA and MANOCOVA. Sankhyā Ser. A 49, 78-95.
Shah, K. R. and B. K. Sinha (1989). Theory of Optimal Designs. Lect. Notes Statist. No. 54, Springer, New York.
Sinha, A. N. and P. K. Sen (1982). Tests based on empirical processes for progressive censoring schemes with staggering entry and random withdrawal. Sankhyā Ser. B 44, 1-18.
Tudor, G. and G. G. Koch (1994). Review of nonparametric methods for the analysis of cross over studies. Statist. Meth. Med. Res. 3, 345-381.
Wei, L. J., D. Y. Lin and L. Weissfeld (1989). Regression analysis of multivariate incomplete failure time data by modeling marginal distributions. J. Amer. Statist. Assoc. 84, 1065-1073.

S. Ghosh and C. R. Rao, eds., Handbook of Statistics, Vol. 13 © 1996 Elsevier Science B.V. All rights reserved.


Adaptive Designs for Parametric Models

S. Zacks

1. Introduction

Adaptive designs are those performed in stages (sequentially) in order to correct the design level at each stage and to approach the optimal level(s) as the number of stages grows. Adaptive designs are needed when the optimal design level(s) depend on some unknown parameter(s) or distributions. In the present chapter we discuss adaptive designs for parametric models. These models specify the families of distributions of the observed random variables at the various design levels. An important class of such design problems is that of quantal response analysis of bioassays (see Finney, 1964). A dosage $d$ of some toxic material is administered to $K$ biological subjects. In dilutive assays the probability of response (death, convulsion, etc.) is often related to the dose through a parametric model specifying that the distribution of $Y_d$, the number of subjects responding out of $K$ at dose $d$, is the binomial

$$Y_d \sim B(K, F(\alpha + \beta x)), \tag{1.1}$$

where $x = \log(d)$, $\alpha$ and $\beta$ are some parameters, and $F(u)$ is a c.d.f., called the tolerance distribution. The functional form of $F$ is specified. When the biological subjects are not human, and there are no ethical restrictions, adaptive designs may not be necessary. In such cases the optimal design levels (doses) for estimating the parameters of the model are quite different from the ones presented here. In clinical trials with human subjects, in particular in phase I studies (see Geller, 1984), the objective is to administer the largest possible dose for which the probability of life-threatening toxicity is not greater than a prespecified level $\gamma$, $0 < \gamma < 1$. In this case the optimal log-dose is $x_\gamma = (F^{-1}(\gamma) - \alpha)/\beta$. This optimal level depends on the unknown parameters $\alpha$ and $\beta$. Adaptive designs are performed in such a case, and the information on $\alpha$ and $\beta$, which is gathered sequentially, is used to improve the approximation to the optimal level $x_\gamma$. In clinical trials, when it is important not to exceed $x_\gamma$, a constraint is sometimes imposed on the designs, such that $P\{X_n \le x_\gamma\} \ge 1 - \alpha$, for each $n \ge 1$ and all $(\alpha, \beta)$ (see Section 3). In Section 2 we discuss optimal designs, when the objective is to maximize the Fisher information on the unknown parameters. In Sections 2.1 and 2.2 we develop locally optimal designs and Bayes designs for a single sample situation. Section 2.3


deals with adaptive designs based on the MLE and on Bayesian estimates of the parameters. Section 3 is devoted to adaptive designs for inverse regression problems, which contain the above mentioned phase I clinical trials as a special case. We present results on adaptive designs when the observed random variables have normal distributions, with means and variances which depend on the design levels. Section 4 discusses the important problem of optimal allocation of experiments. These problems are known under the name of "bandit problems". Allocation problems are one type of adaptive designs in which sequential stopping rules play an important role. We provide an example of a problem in which the optimal allocation is reduced to a stopping time problem. An important non-parametric methodology for attacking some of the problems discussed in the present chapter is that of stochastic approximation, introduced by Robbins and Monro (1951). This important methodology deserves a special exposition and is therefore not discussed here. We refer the reader to the papers of Anbar (1977, 1984), in which stochastic approximation methods have been used for problems similar to the ones discussed here.
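To make the target of a phase I adaptive design concrete, the optimal log-dose of the Introduction, x_gamma = (F^{-1}(gamma) - alpha)/beta, can be evaluated directly once the parameters are taken as known. The sketch below assumes a logistic tolerance distribution; the function names and the numerical values of alpha, beta and gamma are hypothetical and chosen only for illustration.

```python
import math

def logistic_quantile(p):
    """Inverse of the logistic c.d.f. F(u) = 1 / (1 + exp(-u))."""
    return math.log(p / (1.0 - p))

def optimal_log_dose(gamma, alpha, beta, quantile=logistic_quantile):
    """x_gamma = (F^{-1}(gamma) - alpha) / beta for a given tolerance quantile function."""
    return (quantile(gamma) - alpha) / beta

# Hypothetical parameter values, for illustration only.
alpha, beta, gamma = -3.0, 2.0, 0.25
x_gamma = optimal_log_dose(gamma, alpha, beta)
dose = math.exp(x_gamma)  # x = log(d), hence d = exp(x)

# The response probability at x_gamma should reproduce gamma.
p_at_x = 1.0 / (1.0 + math.exp(-(alpha + beta * x_gamma)))
print(x_gamma, dose, p_at_x)
```

In an adaptive design the unknown alpha and beta would be replaced at each stage by current estimates, which is what makes the sequence of design levels data dependent.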

2. Optimal designs with respect to the Fisher information

2.1. Locally optimal designs

Let $\mathcal{F} = \{F(\cdot\,; x, \theta);\ \theta \in \Theta,\ x \in \mathcal{X}\}$ be a regular family of distribution functions of random variables $Y_x$. Here $x$ is a known control or design variable. The design space $\mathcal{X}$ is a compact set in $\mathbb{R}^m$. $\theta$ is an unknown parameter of the distribution. The parameter space $\Theta$ is an open set in $\mathbb{R}^k$. The regularity of $\mathcal{F}$ means that all its elements satisfy the well-known Cramér-Rao regularity conditions (Wijsman, 1973). Let $f(y; x, \theta)$ be a p.d.f. of $F(\cdot\,; x, \theta)$ with respect to some $\sigma$-finite measure $\mu$. The Fisher information function (matrix) is

$$I(\theta; x) = \operatorname{Var}_{\theta, x}\left\{ \frac{\partial}{\partial \theta} \log f(Y; x, \theta) \right\}, \tag{2.1}$$

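where, in the $k$-parameter case ($k \ge 2$), $I(\theta; x)$ denotes a $k \times k$ covariance matrix, and $\frac{\partial}{\partial \theta} \log f(y; x, \theta)$ is a gradient vector (score vector). A design level $x^0(\theta)$ is called optimal with respect to the Fisher information if it maximizes some functional of the information matrix.

EXAMPLE 2.1. Consider the simple normal linear regression, i.e., $Y_i \sim N(\beta_0 + \beta_1 x_i, \sigma^2)$, $i = 1, \ldots, n$. $Y_1, \ldots, Y_n$ are mutually independent, and $x_1, \ldots, x_n$ are chosen from a closed interval $[x^*, x^{**}]$. The parameters $(\beta_0, \beta_1, \sigma^2)$ are unknown, $(\beta_0, \beta_1) \in \mathbb{R}^2$ and $\sigma^2 \in \mathbb{R}^+$. The Fisher information matrix is

$$I(\beta_0, \beta_1, \sigma^2) = \frac{1}{\sigma^2} \begin{pmatrix} n & n\bar{x} & 0 \\ n\bar{x} & \sum_{i=1}^n x_i^2 & 0 \\ 0 & 0 & \dfrac{n}{2\sigma^2} \end{pmatrix}, \tag{2.2}$$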


where $\bar{x} = \frac{1}{n} \sum_{i=1}^n x_i$. It is interesting to realize that the information matrix is independent of $(\beta_0, \beta_1)$. The determinant of the Fisher information matrix is $D_n = \frac{n^2}{2\sigma^8} Q_x$, where $Q_x = \sum_{i=1}^n (x_i - \bar{x})^2$. A design which maximizes $D_n$ is called D-optimal. In the present case a D-optimal design, for $n = 2m$, exists independently of $(\beta_0, \beta_1, \sigma^2)$, and is given by the design which uses $m$ values of $x$ at $x^*$ and $m$ values of $x$ at $x^{**}$. This is a special simple case of Elfving's optimal designs for the linear regression (see Chernoff, 1972, p. 13). This example can be generalized to the multiple linear regression in which $Y \sim N(X\beta, \sigma^2 I)$, and $X$ is a design matrix. Zacks (1977) considered the more general case in which the distribution of $Y_x$ is of a one-parameter exponential type, with density functions

$$f(y; \alpha, \beta, x) = g(y; x) \exp\{ y\, w(\alpha + \beta x) + \psi(\alpha + \beta x) \}, \tag{2.3}$$

where $g(y; x) > 0$ on a set $S$ (the support of $f$) independent of $(\alpha, \beta)$, and $w(u)$ and $\psi(u)$ are some analytic functions, such that $\int g(y; u) e^{y w(u)}\, dy < \infty$ for all $u$ and $\psi(u) = -\log \int g(y; u) e^{y w(u)}\, dy$ for all $u$. Of special interest is the class of binomial models, where $Y_x \sim B(K, \theta(\alpha + \beta x))$, with

$$\theta(\alpha + \beta x) = \frac{\exp\{w(\alpha + \beta x)\}}{1 + \exp\{w(\alpha + \beta x)\}}. \tag{2.4}$$
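Returning briefly to Example 2.1: for fixed n and sigma^2 the determinant of the information matrix increases with Q_x, the sum of squared deviations of the design levels from their mean, so the D-optimal design can be checked by brute force. The sketch below is only a numerical illustration under assumed values (a design interval [0, 1], a coarse grid and n = 4); the function names are hypothetical.

```python
from itertools import combinations_with_replacement

def q_x(levels):
    """Q_x: sum of squared deviations of the design levels from their mean."""
    mean = sum(levels) / len(levels)
    return sum((x - mean) ** 2 for x in levels)

def best_design(grid, n):
    """Exhaustive search for the n-point design on `grid` with the largest Q_x."""
    return max(combinations_with_replacement(grid, n), key=q_x)

# Hypothetical design interval [x*, x**] = [0, 1], searched on a coarse grid.
grid = [0.0, 0.25, 0.5, 0.75, 1.0]
design = best_design(grid, n=4)
print(design, q_x(design))

# Equally spaced levels give a smaller Q_x than the two-endpoint design.
print(q_x((0.0, 1 / 3, 2 / 3, 1.0)))
```

The search is exhaustive and therefore only practical for small grids; it is meant to confirm the endpoint design, not to replace the general theory.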

This class of models is applied in quantal response analysis of bioassays. In these models, $\theta(u)$ is a c.d.f. (tolerance distribution). The normal, logistic, Weibull, extreme-value, and other distributions have been applied in the literature. This type of analysis appears also in reliability life testing. The following example is a simple illustration.

EXAMPLE 2.2. $K$ systems are subjected to life testing. The time till failure of each system is exponentially distributed with mean $\beta$, $0 < \beta < \infty$; $\beta$ is an unknown parameter. The times till failure of the $K$ systems, $T_1, T_2, \ldots, T_K$, are i.i.d. $Y_x$ is the number of systems that fail in the time interval $(0, x]$, i.e., $Y_x \sim B(K, 1 - \exp\{-x/\beta\})$. Ehrenfeld (1962) considered the problem of finding the optimal $x$ for minimizing a certain expected loss. See also Zacks and Fenske (1973). Cochran (1973) discussed this example too, under a different parameterization. This example can be traced back to Fisher (1922). The Fisher information function on $\beta$, based on the statistic $Y_x$, for $x^*$ [...]

[...] $1\}$ is a zero-mean martingale. Hence $R_n/n \to 0$ a.s. as $n \to \infty$. Moreover $I(\sigma; X_n) \ge 4K(x^*)^2 \eta^2 > 0$. Thus,

$$\liminf_{n \to \infty} \frac{1}{n} \sum_{i=1}^n I(\sigma; X_i) \ge 4K(x^*)^2 \eta^2 > 0. \tag{2.27}$$

Equations (2.23), (2.24) and (2.27) imply that $\hat{\sigma}_n \to \sigma$ in probability, and

$$\operatorname*{p\,lim}_{n \to \infty} \frac{1}{n} \sum_{i=1}^n I(\sigma; X_i) = I(\sigma; u^0 \sigma). \tag{2.28}$$


From (2.24) and (2.28) we obtain that, as $n \to \infty$,

v~(an

~&(~) -+" Op i( O,

~ F (-}) r (-}) 2 2 Kx,,i (-}) But

for all $(X_i, \sigma)$. Hence, if $n \ge N_1(\varepsilon)$, where $N_1(\varepsilon) =$

K(x**)2(f*)2 eF

7ff P

(2.31)

-~[

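then for all $i = 1, \ldots, n$, $n \ge 1$,

$$P_\sigma\{ |W_{in}| > \varepsilon \mid \mathcal{B}_{i-1} \} = 0 \quad \text{a.s.} \tag{2.32}$$

Moreover, for all $n \ge N_2$ sufficiently large, $|W_{in}| \le 1$ for all $i = 1, \ldots, n$. Hence,

$$E_\sigma\{ W_{in} I\{|W_{in}| > \varepsilon\} \mid \mathcal{B}_{i-1} \} = 0 \quad \text{a.s.,} \tag{2.33}$$

for all $n \ge \max\{N_1(\varepsilon), N_2\}$. Finally,

$$V_\sigma\{ W_{in} I\{|W_{in}| \le \varepsilon\} \mid \mathcal{B}_{i-1} \} = \frac{1}{n} I(\sigma; X_i) \quad \text{a.s., for all } n \ge 1. \tag{2.34}$$

Hence, by the Central Limit Theorem for martingales (see Shiryayev, 1984, p. 509),

$$\sqrt{n}\, S_n(\sigma) \xrightarrow[n \to \infty]{d} N(0, I(\sigma; u^0 \sigma)). \tag{2.35}$$

Equations (2.29) and (2.35) imply (2.20). □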


EXAMPLE 2.4. Consider the logistic tolerance distribution, namely

$$F(u) = (1 + e^{-u})^{-1},$$

-cx)~ 1, where 3n E Bn-1 is a posterior estimator of a. For a finite number of stages, N, define

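$$M_N(x) = x^2 E_H\left\{ \frac{f^2(x/\sigma)}{F(x/\sigma)\, \bar{F}(x/\sigma)} \,\Big|\, \mathcal{B}_{N-1} \right\}. \tag{2.42}$$

The optimal design level, $X_N^0$, is the argument, in $[x^*, x^{**}]$, maximizing $M_N(x)$. Recursively we define, for each $j = 1, 2, \ldots, N-1$,

$$M_{N-j}(x) = x^2 E_H\left\{ \frac{f^2(x/\sigma)}{F(x/\sigma)\, \bar{F}(x/\sigma)} + M_{N-j+1}(X^0_{N-j+1}) \,\Big|\, \mathcal{B}_{N-j-1} \right\}, \tag{2.43}$$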

and $X^0_{N-j}$ is the argument in $[x^*, x^{**}]$ maximizing $M_{N-j}(x)$. Generally it is impossible to obtain explicit expressions for $X^0_j$. Zacks (1977) discussed a two-stage ($N = 2$) optimal Bayesian design. This will be shown in the following example.

EXAMPLE 2.5. Consider the problem of performing $n^*$ trials in two stages ($N = 2$). In stage 1 we perform $n$ trials at level $x_1$. The distribution of $Y_1$ is $B(n, 1 - \exp\{-\theta x_1\})$. In stage 2 we perform $(n^* - n)$ trials at level $x_2$. $\theta$ is an unknown parameter. The problem is to determine $(n, x_1, x_2)$ to maximize the prior expected total information

$$J_H(n, x_1, x_2) = E_H\left\{ n x_1^2 e^{-\theta x_1} (1 - e^{-\theta x_1})^{-1} + (n^* - n) x_2^2 e^{-\theta x_2} (1 - e^{-\theta x_2})^{-1} \right\}.$$

We determine first $x_2^0$ to maximize the posterior expectation $x_2^2 E_H\{ e^{-\theta x_2} (1 - e^{-\theta x_2})^{-1} \mid n, x_1, Y_1 \}$; $n$ and $x_1$ are then determined so as to maximize

$$E_H\left\{ n x_1^2 e^{-\theta x_1} (1 - e^{-\theta x_1})^{-1} + (n^* - n) E_H\{ (x_2^0)^2 e^{-\theta x_2^0} (1 - e^{-\theta x_2^0})^{-1} \mid n, x_1, Y_1 \} \right\}.$$

For a prior gamma distribution for O, with scale parameter 1 and shape parameter u, the posterior expectation of I(O; x2) is

(n* - n)E{x~e-°~:(1 - e-°X2) -1 I TI,,Xl, Y1} N'n-Yl (--l~J (n-Yl~ K"°° [1 ~- (n* - - n ) x 2A-'j=O k , k j }A..Jk=lt ~ - x l ( y l Jeff) '~x2k]-u E~.=-o~ (- 1)~ (n-jr1) (1 + Xl(Y1 q- j ) ) - ~

x°(n, xa, ]I1) is determined numerically. The predictive distribution of Y1, given (n, X 1 ) has the p.d.f.

n--j p(j;n, xl)= ( 3 ) E ( - - 1 ) i ( n ~ J )

(1 +xl(j+i))-u"

i=0
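As a quick numerical check of the predictive p.d.f. just stated, the following minimal sketch evaluates the finite alternating sum and confirms that the probabilities add up to one. The function name `predictive_pmf` and the values of `n`, `x1` and `nu` are illustrative assumptions, not taken from the text.

```python
from math import comb

def predictive_pmf(j, n, x1, nu):
    """Predictive probability p(j; n, x1) of Y1 = j under a gamma prior
    (scale 1, shape nu) on theta, written as a finite alternating sum."""
    return comb(n, j) * sum(
        (-1) ** i * comb(n - j, i) * (1.0 + x1 * (j + i)) ** (-nu)
        for i in range(n - j + 1)
    )

# Hypothetical values, chosen only to exercise the formula.
n, x1, nu = 10, 1.5, 3.0
pmf = [predictive_pmf(j, n, x1, nu) for j in range(n + 1)]
total = sum(pmf)  # should equal 1 up to rounding error
```

For larger `n` the alternating sum can lose precision through cancellation, so exact rational arithmetic (for instance `fractions.Fraction` with an integer shape parameter) may be a safer choice.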

Accordingly,

$$J_H(n, x_1, x_2^0) = n x_1^2 \sum_{k=1}^{\infty} (1 + x_1 k)^{-\nu} + (n^* - n) \sum_{j=0}^{n} \binom{n}{j} (x_2^0(n, j))^2 \sum_{i=0}^{n-j} (-1)^i \binom{n-j}{i} \sum_{k=1}^{\infty} (1 + x_1(i + j) + x_2^0(n, j) k)^{-\nu},$$

and $n$ and $x_1$ are determined by numerically maximizing the last expression. Thus, for $\nu = 3$ and $n^* = 50$, $n^0 = 10$, $x_1^0 = 1.518$. In some situations, like clinical trials, two-stage designs may be preferable to multistage designs, due to feasibility constraints.

There are many papers in the literature in which the Bayesian adaptive sequence is not optimal but is given by $X_n = [u^0 \hat\sigma_n]_{x^*}^{x^{**}}$ (see, for example, Chevret, 1993), where $\hat\sigma_n$ is the Bayesian estimator of $\sigma$ for the squared-error loss, i.e., the posterior mean

$$\hat\sigma_n = \frac{\int_0^\infty \sigma h(\sigma) L_n(\sigma)\, d\sigma}{\int_0^\infty h(\sigma) L_n(\sigma)\, d\sigma}. \tag{2.44}$$

We should comment in this connection that, due to the complexity of the likelihood function $L_n(\sigma)$ when the $x$-levels vary, the evaluation of $\hat\sigma_n$ by equation (2.44) generally requires delicate numerical integration.
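To make the numerical integration in (2.44) concrete, here is a minimal sketch that evaluates the posterior mean as a ratio of two trapezoidal integrals on a grid. Everything specific in it is an assumption made for illustration: the response model P(Y = 1 | x, sigma) = F(x / sigma) with logistic F, the prior density proportional to sigma * exp(-sigma), the integration limits, and the function names.

```python
import math

def log_F(u):
    """Numerically stable log of the logistic c.d.f. F(u) = 1 / (1 + exp(-u))."""
    return -math.log1p(math.exp(-u)) if u >= 0 else u - math.log1p(math.exp(u))

def log_likelihood(sigma, xs, ys):
    # Assumed response model: P(Y = 1 | x, sigma) = F(x / sigma).
    return sum(log_F(x / sigma) if y == 1 else log_F(-x / sigma)
               for x, y in zip(xs, ys))

def posterior_mean(xs, ys, log_prior, lo=0.05, hi=20.0, m=4000):
    """Ratio of two trapezoidal integrals over [lo, hi], in the form of (2.44)."""
    step = (hi - lo) / m
    grid = [lo + k * step for k in range(m + 1)]
    logw = [log_prior(s) + log_likelihood(s, xs, ys) for s in grid]
    shift = max(logw)                      # guard against underflow
    w = [math.exp(v - shift) for v in logw]
    w[0] *= 0.5                            # trapezoid end weights
    w[-1] *= 0.5
    return sum(s * wi for s, wi in zip(grid, w)) / sum(w)

def log_gamma2_prior(sigma):
    # Hypothetical prior density h(sigma) proportional to sigma * exp(-sigma).
    return math.log(sigma) - sigma

prior_only = posterior_mean([], [], log_gamma2_prior)
with_data = posterior_mean([1.0, 2.0, 3.0, 4.0], [0, 0, 1, 1], log_gamma2_prior)
```

Working on the log scale and subtracting the maximum before exponentiating is one simple way to keep the two integrals from underflowing when many trials have been observed.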


A general proof of the consistency of $\hat\sigma_n$, as $n \to \infty$, can be obtained by showing that the Bayesian estimator and the MLE are asymptotically equivalent.

3. Adaptive designs for inverse regression problems

Consider the following tolerance problem. Let $Y_x$ be a normal random variable having mean $\beta_0 + \beta_1 x$ and variance $\sigma_x^2 = \sigma^2 x^2$. It is desired that $Y_x \le \eta$ with probability $\gamma$, where $\eta$ is a specified threshold. The largest value of $x$ satisfying this requirement is

$$x_\gamma(\theta) = \frac{\eta - \beta_0}{\beta_1 + \sigma Z_\gamma}, \tag{3.1}$$

where $\theta = (\beta_0, \beta_1, \sigma)$ and $Z_\gamma = \Phi^{-1}(\gamma)$ is the $\gamma$th quantile of the standard normal distribution. If the parameters $(\beta_0, \beta_1, \sigma)$ are known, the optimal $x$-level is $x_\gamma$. When the parameters are unknown we consider adaptive designs, in which $x$-levels are applied sequentially on groups or on individual trials. We consider here sequential designs in which one trial is performed at each stage. Let $\{X_n; n \ge 1\}$ designate the design levels of a given procedure. It is known a priori that $-\infty < x^* < x_\gamma(\theta) < x^{**} < \infty$. Thus, all design levels $X_n$ are restricted to the closed interval $[x^*, x^{**}]$. We further require that the random variables $X_n$ satisfy

$$P_\theta\{X_n \le x_\gamma(\theta)\} \ge 1 - \alpha, \tag{3.2}$$

for all $\theta = (\beta_0, \beta_1, \sigma)$, and all $n \ge 1$. This requirement is called the feasibility requirement. We also wish the sequence $\{X_n\}$ to be consistent, i.e.,

$$\lim_{n \to \infty} P_\theta\{|X_n - x_\gamma(\theta)| > \varepsilon\} = 0, \tag{3.3}$$

for any $\varepsilon > 0$ and $\theta$. In some cases we can also prove that a feasible sequence $\{X_n^0\}$ is optimal, in the sense that

$$\sum_{n=1}^{N} E_\theta\{(X_n^0 - x_\gamma(\theta))^-\} \le \sum_{n=1}^{N} E_\theta\{(X_n - x_\gamma(\theta))^-\} \tag{3.4}$$

for $N \ge 1$, where $\{X_n\}$ is any other feasible sequence, and where $a^- = -\min(a, 0)$. In the present section we present some of the results of Eichhorn (1973, 1974) and Eichhorn and Zacks (1973, 1981). See also the review article of Zacks and Eichhorn (1975).
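The target level of equation (3.1) can be checked directly when the parameters are known. The sketch below computes the level from the stated formula and verifies that the response at that level stays below the threshold with probability equal to the requested value. The numerical parameter values and the function name `x_gamma` are hypothetical.

```python
from statistics import NormalDist

def x_gamma(eta, beta0, beta1, sigma, gamma):
    """Largest level x with P(Y_x <= eta) = gamma when
    Y_x ~ N(beta0 + beta1 * x, (sigma * x)^2), as in (3.1)."""
    z_gamma = NormalDist().inv_cdf(gamma)
    return (eta - beta0) / (beta1 + sigma * z_gamma)

# Hypothetical parameter values, used only to check the formula.
eta, beta0, beta1, sigma, gamma = 10.0, 1.0, 2.0, 0.5, 0.95
x = x_gamma(eta, beta0, beta1, sigma, gamma)
coverage = NormalDist(mu=beta0 + beta1 * x, sigma=sigma * x).cdf(eta)
```

The check relies on x being positive and on beta1 + sigma * Z_gamma being positive, which holds for the values chosen here.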

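3.1. Non-Bayesian designs, $\sigma$ known

To start, let us consider the special case where $\beta_0$ and $\sigma$ are known. In this case the conditional distribution of $(Y_n - \beta_0)/X_n$, given $X_n$, is that of a $N(\beta_1, \sigma^2)$, almost surely. Hence, the random variables $U_n = (Y_n - \beta_0)/X_n$ are i.i.d. $N(\beta_1, \sigma^2)$ and

$$\hat\beta_1^{(n)} = \frac{1}{n} \sum_{i=1}^{n} \frac{Y_i - \beta_0}{X_i}, \quad n \ge 1, \tag{3.5}$$

is the best unbiased estimator of $\beta_1$. A UMA $(1 - \alpha)$-confidence interval for $\beta_1$ is $(-\infty, \hat\beta_1^{(n)} + Z_{1-\alpha}\sigma/\sqrt{n})$. It follows that the sequence defined by

$$X_{n+1} = [\hat\xi_n]_{x^*}^{x^{**}}, \tag{3.6}$$

where

$$\hat\xi_n = \frac{\eta - \beta_0}{\hat\beta_1^{(n)} + \sigma Z_{1-\alpha}/\sqrt{n} + \sigma Z_\gamma}, \quad n \ge 1, \tag{3.7}$$

is feasible, strongly consistent and optimal. The feasibility is due to the fact that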

Po

ffZl-~ } /3,--.O

oZ1 -oe } t90 ~n) q- 7 ~fll-'}-E ~ Po{~?2 ~/31 q-c}, for all 0. Let

(-

n -/30

and

y = x.v(O ) - ~'.

/31 + e + a Z7

Let n-~0

2 n - ~(n)

~'1,o~ q- °'Z3' "

Then, from (3.8) one obtains that, for all y > 0,

P o { ~ ( o ) - x . / > y} v).

(3.8)


Finally, Eo{(x.~(o) - x n ) +} =

v} dv

Po{x~(O) - 2,~ > v}dv

= Eo{(x.,(O)

2 . ) + },

-

for all $n \ge 1$. This proves the optimality of $\{X_n\}$ given by equations (3.6)-(3.7).

When the slope $\beta_1$ (and $\sigma$) are known, we can obtain a feasible, strongly consistent and optimal design by similar methods (see Zacks and Eichhorn, 1975). On the other hand, when both $\beta_0$ and $\beta_1$ are unknown we can obtain a feasible, consistent design, but not one that is optimal in the previous sense. The reason for this will be explained below. Roughly speaking, one cannot estimate $\beta_0$ and $\beta_1$ if all observations are at one fixed value of $x$. In the adaptive procedure, to assure consistency, one needs to assume that

$$\sum_{i=1}^{n} \left( \frac{1}{X_i} - \frac{1}{n} \sum_{j=1}^{n} \frac{1}{X_j} \right)^2 \to \infty \quad \text{as } n \to \infty.$$

Let $R_i = Y_i/X_i$ and $V_i = 1/X_i$, $i = 1, 2, \ldots$. After $n$ trials, let $X^{(n)} = (X_1, X_2, \ldots, X_n)$,

$$\bar R_n = \frac{1}{n} \sum_{i=1}^{n} R_i, \qquad \bar V_n = \frac{1}{n} \sum_{i=1}^{n} V_i,$$

$$SSD_v = \sum_{i=1}^{n} V_i^2 - n \bar V_n^2, \qquad Q_n = \sum_{i=1}^{n} V_i^2,$$

and

$$SPD_{vR} = \sum_{i=1}^{n} V_i R_i - n \bar V_n \bar R_n.$$

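Notice that $R_i \mid X_i \sim N(\beta_1 + \beta_0 V_i, \sigma^2)$, $i = 1, 2, \ldots$. Let $\hat\beta_{0,n}$ and $\hat\beta_{1,n}$ be the least squares estimators of $\beta_0$ and $\beta_1$, for the linear regression

$$R_i = \beta_1 + \beta_0 V_i + e_i, \quad i = 1, \ldots, n, \tag{3.9}$$

where $e_1, \ldots, e_n$ are i.i.d. $N(0, \sigma^2)$ random variables, independent of $X^{(n)}$. Accordingly,

$$\hat\beta_{0,n} = \frac{SPD_{vR}}{SSD_v}, \qquad \hat\beta_{1,n} = \bar R_n - \hat\beta_{0,n} \bar V_n. \tag{3.10}$$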

Given $X^{(n)}$, the conditional distribution of $(\hat\beta_{0,n}, \hat\beta_{1,n})$ is bivariate normal with mean $(\beta_0, \beta_1)$ and covariance matrix

$$\Sigma(X^{(n)}) = \sigma^2 \begin{pmatrix} \dfrac{1}{SSD_v} & -\dfrac{\bar V_n}{SSD_v} \\[2mm] -\dfrac{\bar V_n}{SSD_v} & \dfrac{1}{n} + \dfrac{\bar V_n^2}{SSD_v} \end{pmatrix}. \tag{3.11}$$

To apply this approach, we have to require that $SSD_v > 0$ for each $n \ge 2$ and $SSD_v \to \infty$ a.s. as $n \to \infty$. This will assure that $\lim_{n \to \infty} \Sigma(X^{(n)}) = 0$ in a proper metric (the largest eigenvalue). Using Fieller's Theorem (Fieller, 1940) we obtain confidence intervals for $x_\gamma(\theta)$ in the following manner. Let

$$W_n = (\eta - \hat\beta_{0,n}) - x_\gamma(\theta)(\hat\beta_{1,n} + \sigma Z_\gamma). \tag{3.12}$$

The conditional distribution of $W_n$, given $X^{(n)}$, is normal with mean zero and variance

$$V\{W_n \mid X^{(n)}\} = \frac{\sigma^2}{SSD_v} \left[ 1 + x_\gamma^2(\theta) \frac{Q_n}{n} - 2 x_\gamma(\theta) \bar V_n \right]. \tag{3.13}$$

It follows that the conditional distribution of $W_n^2$, given $X^{(n)}$, is like that of $V\{W_n \mid X^{(n)}\} \chi^2[1]$. Hence, the two real roots of the quadratic equation

$$A_n \xi^2 - 2 B_n \xi + C_n = 0 \tag{3.14}$$

are $(1 - \alpha)$ level confidence limits for $x_\gamma(\theta)$. Here

$$A_n = (\hat\beta_{1,n} + \sigma Z_\gamma)^2 - \frac{\sigma^2 Q_n}{n\, SSD_v} \chi^2_{1-2\alpha},$$

$$B_n = (\hat\beta_{1,n} + \sigma Z_\gamma)(\eta - \hat\beta_{0,n}) - \frac{\sigma^2 \bar V_n}{SSD_v} \chi^2_{1-2\alpha},$$

and

$$C_n = (\eta - \hat\beta_{0,n})^2 - \frac{\sigma^2}{SSD_v} \chi^2_{1-2\alpha}, \tag{3.15}$$

and where $\chi^2_{1-2\alpha}$ is the $(1 - 2\alpha)$th quantile of the chi-squared distribution with 1 degree of freedom. Two real roots for equation (3.14) exist with probability $1 - \alpha$. Let $\underline{\xi}_{n,\alpha}$ and $\bar\xi_{n,\alpha}$ denote the two roots of equation (3.14). The sequence of dosages is defined as $X_1 = x^*$ and

$$X_{n+1} = [\underline{\xi}_{n,\alpha}]_{x^*}^{x^{**}}, \quad n \ge 1. \tag{3.16}$$
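The Fieller-type limits used in (3.16) can be sketched as follows: regress $R_i = Y_i/X_i$ on $V_i = 1/X_i$, form the three coefficients of the quadratic in (3.14), and return its roots when they are real. This is a minimal illustration under assumptions, not the author's code. The coefficients are obtained by expanding $W_n^2 \le V\{W_n \mid X^{(n)}\}\chi^2_{1-2\alpha}$ in the target level; the function name, the data and the parameter values are hypothetical; and the chi-squared quantile with 1 degree of freedom is computed as the square of the $(1 - \alpha)$ standard normal quantile.

```python
import math
from statistics import NormalDist

def fieller_limits(xs, ys, eta, sigma, gamma, alpha):
    """Roots of A*xi^2 - 2*B*xi + C = 0, or None when no two real roots exist."""
    n = len(xs)
    v = [1.0 / x for x in xs]
    r = [y / x for x, y in zip(xs, ys)]
    v_bar, r_bar = sum(v) / n, sum(r) / n
    q = sum(vi * vi for vi in v)
    ssd = q - n * v_bar ** 2
    spd = sum(vi * ri for vi, ri in zip(v, r)) - n * v_bar * r_bar
    b0 = spd / ssd                  # slope of R on V estimates beta_0
    b1 = r_bar - b0 * v_bar         # intercept estimates beta_1
    z = NormalDist().inv_cdf(gamma)
    chi2 = NormalDist().inv_cdf(1.0 - alpha) ** 2   # (1 - 2*alpha) quantile, 1 d.f.
    s = b1 + sigma * z
    a = s * s - sigma ** 2 * q * chi2 / (n * ssd)
    b = s * (eta - b0) - sigma ** 2 * v_bar * chi2 / ssd
    c = (eta - b0) ** 2 - sigma ** 2 * chi2 / ssd
    disc = b * b - a * c
    if a <= 0 or disc < 0:
        return None
    root = math.sqrt(disc)
    return (b - root) / a, (b + root) / a

# Hypothetical data: beta0 = 1, beta1 = 2, sigma = 0.5 plus fixed perturbations.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.15, 4.5, 7.15, 9.8, 10.5]
limits = fieller_limits(xs, ys, eta=10.0, sigma=0.5, gamma=0.95, alpha=0.05)
```

In an adaptive run the smaller root, truncated to the admissible interval, would give the next design level; returning `None` signals the case in which the previous level is simply repeated.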


If for some no, real roots of equation (3.14) do not exist, we set X n 0 + l = Xno. By construction, Po{Xn 0, ~i~lc~i < c~, and where Yj~ is the random yield from gj~, ji c { 1 , 2 , . . . , K } , i = 1 , 2 , . . . . The sequence a = (cq,c~2,...) is called a discounting sequence. If the distribution functions F1,..., FK are known, there is no problem in attaining the objective. One should perform in each trial an experiment gi0 for which E{Y I Fio} = maxl~~ O,~i=lni = n), define T/(n) = 0 if ni = 0 and T~(n) = ~-2~'~1Ti(Y~j) if ni > O. T (") is a minimal sufficient statistic for the parametric family Y:~ associated with gi. Let T (n) = (T(n),..., T(~)), and let B~ be the crfield generated by the statistic ( N (n) , T (n)). The sequence { ( N (n), T (n)); n ~> 1} is sequentially sufficient and transitive for the model (see Zacks, 1971, p. 83).

A strategy is a sequence $\tau = (\tau_1, \tau_2, \ldots)$ such that $\tau_n \in \mathcal{B}_{n-1}$ for each $n = 1, 2, \ldots$ and where, for each $n \ge 1$, the range of $\tau_n$ is $\{1, 2, \ldots, K\}$. In other words, $\tau_n$ is a discrete random variable such that $\{\tau_n = i\} \in \mathcal{B}_{n-1}$, $i = 1, \ldots, K$, $n \ge 1$. According to $\tau$, if $\tau_n = i$ then experiment $\mathcal{E}_i$ is performed at the $n$th trial. The expected yield of a strategy $\tau$ is

$$W(\tau, \alpha, \theta) = \sum_{n=1}^{\infty} \alpha_n \sum_{j=1}^{K} E_\theta\{Y_{jn} I\{\tau_n = j\}\}, \tag{4.1}$$

where $\theta = (\theta_1, \ldots, \theta_K)$ is the vector of parameters specifying $F_1, \ldots, F_K$. Let $H$ be a prior distribution on $\Omega = \Theta_1 \times \cdots \times \Theta_K$. The prior expected yield of a strategy $\tau$ with respect to $H$ is called the worth of $\tau$ and is given by

$$W_H(\tau, \alpha) = \sum_{n=1}^{\infty} \alpha_n \sum_{j=1}^{K} \int_\Omega E_\theta\{Y_{jn} I\{\tau_n = j\}\}\, dH(\theta). \tag{4.2}$$

A strategy $\tau^0$ is Bayesian (optimal) against $H$ if

$$W_H(\tau^0, \alpha) = \sup_\tau W_H(\tau, \alpha). \tag{4.3}$$

Let $V(H, \alpha) = W_H(\tau^0, \alpha)$.

Since $\tau_n \in \mathcal{B}_{n-1}$ and $E\{Y_{jn} \mid \tau_n = j\} = \mu_j(\theta_j)$, where

$$\mu_j(\theta_j) = \int y\, g_j(y) \exp\{\theta_j T_j(y) + C_j(\theta_j)\}\, d\mu(y) = -C_j'(\theta)\big|_{\theta = \theta_j}, \quad j = 1, \ldots, K, \tag{4.4}$$

we obtain from (4.1) that

$$W(\tau, \alpha, \theta) = \sum_{n=1}^{\infty} \alpha_n \sum_{j=1}^{K} \mu_j(\theta_j) P_\theta\{\tau_n = j\}. \tag{4.5}$$

Furthermore,

=

= j } }.

(4.6)

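Notice that

$$N^{(n)} = N^{(n-1)} + \sum_{j=1}^{K} e_j^{(K)} I\{\tau_n = j\}, \tag{4.7}$$

where $e_j^{(K)}$ is a $K$-dimensional row vector whose $j$th component is 1 and all the other components are 0. Similarly,

$$T^{(n)} = T^{(n-1)} + \sum_{j=1}^{K} Y_{jn} e_j^{(K)} I\{\tau_n = j\}. \tag{4.8}$$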

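We see that the amount of information on $(\theta_1, \ldots, \theta_K)$ after $n$ trials depends on the strategy $\tau$. The likelihood function of $(\theta_1, \ldots, \theta_K)$, given $(N^{(n)}, T^{(n)})$, is equivalent to

$$L(\theta_1, \ldots, \theta_K; N^{(n)}, T^{(n)}) = \exp\left\{ \sum_{i=1}^{K} \theta_i T_i^{(n)} + \sum_{i=1}^{K} N_i^{(n)} C_i(\theta_i) \right\}, \tag{4.9}$$

$(\theta_1, \ldots, \theta_K) \in \Omega$. Notice that the likelihood function $L(\theta; N^{(n)}, T^{(n)})$ depends on the strategy $\tau$ only through the statistic $(N^{(n)}, T^{(n)})$. Let $H(\theta)$ be a prior c.d.f. on $\Omega$. By Bayes Theorem, the posterior distribution of $\theta$, given $(N^{(n)}, T^{(n)})$, is given by

$$dH(\theta_1, \ldots, \theta_K \mid N^{(n)}, T^{(n)}) = \frac{L(\theta_1, \ldots, \theta_K; N^{(n)}, T^{(n)})\, dH(\theta_1, \ldots, \theta_K)}{\int_\Omega L(\theta_1, \ldots, \theta_K; N^{(n)}, T^{(n)})\, dH(\theta_1, \ldots, \theta_K)}. \tag{4.10}$$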

If for some $j$, $N_j^{(n)} = 0$, then the posterior marginal distribution of $\theta_j$ is its prior marginal distribution. The predictive p.d.f. of $Y_{j,n+1}$, given $\mathcal{B}_n$, is

$$f_j^*(y; N^{(n)}, T^{(n)}) = \int_{\Theta_j} f_j(y; \theta)\, dH_j(\theta \mid N^{(n)}, T^{(n)}), \tag{4.11}$$

where $H_j(\theta \mid N^{(n)}, T^{(n)})$ is the marginal posterior c.d.f. of $\theta_j$, given $\mathcal{B}_n$. Thus, the predictive expectation of $Y_{j,n+1}$, given $\mathcal{B}_n$, is, for $j = 1, \ldots, K$, $n \ge 1$,

$$M_j^{(n+1)}(N^{(n)}, T^{(n)}) = \int_{\Theta_j} \mu_j(\theta)\, dH_j(\theta \mid N^{(n)}, T^{(n)}). \tag{4.12}$$
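For a concrete instance of the predictive expectation (4.12), consider Bernoulli yields with independent Beta priors on the success probabilities, a conjugate special case chosen here purely for illustration. The sketch below computes the predictive means from the sufficient statistics (number of trials and number of successes per experiment) and picks the experiment with the largest predictive mean. That myopic choice is an assumption of the example and is not, in general, the Bayesian optimal strategy; the function names and prior parameters are hypothetical.

```python
def predictive_means(successes, trials, a=1.0, b=1.0):
    """Predictive mean of the next yield on each arm under independent
    Beta(a, b) priors; an arm with no trials keeps its prior mean."""
    return [(a + s) / (a + b + n) for s, n in zip(successes, trials)]

def myopic_choice(successes, trials, a=1.0, b=1.0):
    """Index of the arm with the largest predictive mean (one-step look-ahead)."""
    means = predictive_means(successes, trials, a, b)
    return max(range(len(means)), key=means.__getitem__)

# Arm 0: 3 successes in 4 trials; arm 1: not yet tried.
means = predictive_means([3, 0], [4, 0])
choice = myopic_choice([3, 0], [4, 0])
```

An untried arm keeps its prior mean, which mirrors the remark that the posterior marginal of a parameter equals its prior marginal when no trial has been made on the corresponding experiment.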

Without loss of generality, let al = 1. If a2 > 0 then, the function V(H, o~) satisfies the functional equation V(H, or) = l ~ 1 + 5. If N(5) = N1(5) we stop and construct the interval k~Nl(5)(~Tr(1)- - v, "g r~r(l)~Nl(6) -~- 5). If N(5) = N2(5) we stop the trials on $1 and add M ( 5 ) observations on $2, where M ( 5 ) = least integer ra >/0 such that

m >~nl (5) - N2(5)/QN2(6).

(4.19)

The interval estimator is (~ - 5, ~ + 5) where

~, =

(4.20)

" ' " ~.~J N2(6) ~ M ( 5 )

MN:---'~QN2(5 ) ) + N2(5) For N1 (5) we use the Chow-Robbins stopping variable (see Chow and Robbins, 1965) 52 _ N1 (5) ----least integer n / > 1, such that Qn ~< _x2_~

n 2

(4.21)


Table 4.1
Simulation estimates of E{N_2(δ)}, E{M(δ)} and coverage probability; n_1(δ) = 50, γ = .95

          E{N_2(δ)}                E{M(δ)}              P{|μ̂ - μ| < δ}
  σ       P      W      C        P      W      C        P      W      C
 0.25    3.72   2.32   2.32     0.00   0.00   0.00    .9329  .8976  .8976
 0.50    9.11   5.82   5.82     0.00   0.00   0.00    .8223  .7153  .7153
 0.75   17.75  15.07  14.48     2.30   0.00   0.93    .8115  .6995  .6993
 1.00   16.16  23.22  19.10    22.57  11.08  15.65    .8424  .7612  .7629
 1.25    8.41  12.90  12.67    36.47  29.19  29.39    .8492  .7826  .7825
 1.50    5.24   7.96   8.81    42.23  35.98  35.54    .8725  .7978  .7972
 1.75    4.24   6.11   7.50    44.15  39.68  39.26    .8839  .8235  .8236
 2.00    3.71   5.04   6.28    45.83  41.00  40.73    .8989  .8279  .8281

whose properties are well known. The question is what variable one should use for $N_2(\delta)$. Zacks (1973) considered three types of stopping variables for $N_2(\delta)$, called parabolic (P), Wald (W) and Chernoff (C). For the rationale leading to each one of these variables see Zacks (1973). The formulae of these stopping variables are:

N (P) (5) =

least n, n ~> 2,

such that

n2

N(2w)(5)

= least n, n / > 2,

/

such that

Q,~ ) n exp{(log(nl, ( ~ ) ) ) / n } ,

(4.23)

and

N(2c)(5)

= least n, n ~> 2,

Qn ~>

n exp n,

such that log \ ~ ) ]

,

if n ~< (8~)1/3, '~'(~) otherwise.

The performance of these procedures with respect to coverage probability and expected sample size was evaluated by simulations. The results are given in Table 4.1.

Acknowledgement

The author gratefully acknowledges the helpful discussions with Professor Anton Schick.


References

Anbar, D. (1977). The application of stochastic approximation procedures to the bioassay problem. J. Statist. Plann. Inference 1, 191-206.
Anbar, D. (1984). Stochastic approximation methods and their use in bioassay and Phase I clinical trials. Comm. Statist. 13, 2451-2467.
Bather, J. (1983). Optimal stopping of Brownian motion: A comparison technique. In: M. H. Rizvi, J. S. Rustagi and D. Siegmund, eds., Recent Advances in Statistics. Academic Press, New York, 19-49.
Bather, J. and G. Simons (1985). The minimax risk for clinical trials. J. Roy. Statist. Soc. Ser. B 47, 466-475.
Berry, D. A. and B. Fristedt (1985). Bandit Problems: Sequential Allocation of Experiments. Chapman and Hall, London.
Chaloner, K. and I. Verdinelli (1995). Bayesian experimental design: A review. Statist. Sci. 10, 273-304.
Chernoff, H. (1967). Sequential models for clinical trials. In: Proc. Fifth Berkeley Symp. of Math. Statist. and Probab. Vol. 4, 805-812.
Chernoff, H. (1972). Sequential Analysis and Optimal Design. Regional Conferences Series in Applied Mathematics, Vol. 8. SIAM, Philadelphia, PA.
Chernoff, H. and S. N. Ray (1965). A Bayes sequential sampling inspection plan. Ann. Math. Statist. 36, 1387-1407.
Chevret, S. (1993). The continual reassessment method in cancer Phase I clinical trials: A simulation study. Statist. Med. 12, 1093-1108.
Chow, Y. S. and H. Robbins (1965). On the asymptotic theory of fixed-width sequential confidence intervals for the mean. Ann. Math. Statist. 36, 457-462.
Cochran, W. G. (1973). Experiments for non-linear functions, R. A. Fisher memorial lecture. J. Amer. Statist. Assoc. 68, 771-781.
Colton, T. (1963). A model for selecting one of two medical treatments. J. Amer. Statist. Assoc. 58, 388-400.
Ehrenfeld, S. (1962). Some experimental design problems in attribute life testing. J. Amer. Statist. Assoc. 57, 668-679.
Eichhorn, B. H. (1973). Sequential search of an optimal dosage. Naval Res. Logistics Quarterly 20, 729-736.
Eichhorn, B. H. (1974). Sequential search of an optimal dosage for cases of linear dosage-toxicity regression. Comm. Statist. Theory Methods 3, 263-271.
Eichhorn, B. H. and S. Zacks (1973). Sequential search of an optimal dosage, I. J. Amer. Statist. Assoc. 68, 594-598.
Feldman, D. (1962). Contributions to the two-armed bandit problem. Ann. Math. Statist. 33, 847-856.
Fieller, E. C. (1940). The biological standardization of insulin. J. Roy. Statist. Soc. Supplement 7, 1-64.
Finney, D. J. (1964). Statistical Methods in Biological Assay, 2nd edn. Griffin, London.
Fisher, R. A. (1922). On the mathematical foundations of theoretical statistics. Philos. Trans. Roy. Soc. London A 222, 309-368.
Flournoy, N. and W. F. Rosenberger (1995). Adaptive Designs. Selected Proceedings of a 1992 Joint AMS-IMS-SIAM Summer Conference, Lecture Notes - Monograph Series, Vol. 25. IMS.
Geller, N. (1984). Design of Phase I and Phase II clinical trials: A statistician's view. Cancer Invest. 2, 483-491.
Gittins, J. C. (1979). Bandit processes and dynamic allocation indices (with discussion). J. Roy. Statist. Soc. Ser. B 41, 148-177.
Gittins, J. C. and D. M. Jones (1974). A dynamic allocation index for the discounted multiarmed bandit problem. Biometrika 66, 561-565.
Kelley, T. A. (1974). A note on the Bernoulli two-armed bandit problem. Ann. Statist. 2, 1056-1062.
Khan, M. K. (1984). Discrete adaptive design in attribute life testing. Comm. Statist. Theory Methods 13, 1423-1433.
Khan, M. K. (1988). Optimal Bayesian estimation of the median effective dose. J. Statist. Plann. Inference 18, 69-81.
Lai, T. L., H. Robbins and D. Siegmund (1983). Sequential design of comparative clinical trials. In: M. H. Rizvi, J. S. Rustagi and D. Siegmund, eds., Recent Advances in Statistics. Academic Press, New York, 51-68.
Lehmann, E. L. (1959). Testing Statistical Hypotheses. Wiley, New York.
Petkau, A. J. (1978). Sequential medical trials for comparing an experiment with a standard treatment. J. Amer. Statist. Assoc. 73, 328-338.
Prakasa Rao, B. L. S. (1987). Asymptotic Theory of Statistical Inference. Wiley, New York.
Robbins, H. and S. Monro (1951). A stochastic approximation method. Ann. Math. Statist. 22, 400-407.
Rodman, L. (1978). On the many armed bandit problem. Ann. Probab. 6, 491-498.
Ross, S. M. (1983). Introduction to Stochastic Dynamic Programming. Academic Press, New York.
Shiryayev, A. N. (1984). Probability. Springer, New York.
Siegmund, D. (1985). Sequential Analysis: Tests and Confidence Intervals. Springer, New York.
Wijsman, R. A. (1973). On the attainment of the Cramer-Rao lower bound. Ann. Statist. 1, 538-542.
Zacks, S. (1971). The Theory of Statistical Inference. Wiley, New York.
Zacks, S. (1973). Sequential estimation of the common mean of two normal distributions. I: The case of one variance known. J. Amer. Statist. Assoc. 68, 422-427.
Zacks, S. (1977). Problems and approaches in design of experiments for estimation and testing in non-linear models. Multivariate Anal. IV, 209-223.
Zacks, S. and B. H. Eichhorn (1975). Sequential search of optimal dosages: The linear regression case. In: J. N. Srivastava, ed., Survey of Statistical Designs and Linear Models. North-Holland, New York.
Zacks, S. and W. J. Fenske (1973). Sequential determination of inspection epochs for reliability systems with general life time distributions. Naval Res. Logistics Quarterly 3, 377-385.

S. Ghosh and C. R. Rao, eds., Handbook of Statistics, Vol. 13 © 1996 Elsevier Science B.V. All rights reserved.


Observational Studies and Nonrandomized Experiments

Paul R. Rosenbaum

1. An informal history of observational studies in statistics

In an observational study or nonrandomized experiment, treated and control groups formed without random assignment are compared in an effort to estimate the effects of a treatment. The term "observational study" was first used in this way by William G. Cochran; see, for instance, Cochran (1965). In an observational study, the treated and control groups may have differed prior to treatment in ways that are relevant for the outcomes under study, and these pretreatment differences may be mistaken for an effect of the treatment. If the groups differed in terms of relevant pretreatment variables or covariates that were observed and recorded, then there is an overt bias. If the groups differed in terms of covariates that were not recorded, then there is a hidden bias. Controlling overt biases and addressing hidden biases are central concerns in an observational study.

In a sense, studies that would today be called observational studies have been conducted since the beginning of empirical science. In the field of epidemiology, a field that conducts many observational studies, attention is often called to the investigations in the mid 1850's of John Snow concerning the causes of cholera. Snow's observational studies compared cholera rates for individuals in London served by different water companies having different and changing sources of water. Snow's careful studies have a distinctly modern character and are still used to teach epidemiology; see MacMahon and Pugh (1970, pp. 6-11) for detailed discussion of Snow's work.

In another sense, observational studies were born at the same time as randomized experiments, with the work in the 1920's and 1930's of Sir Ronald Fisher and Jerzy Neyman on the formal properties of randomization. Prior to that time, there was no formal distinction between experiments and observational studies. However, the systematic development of statistical methods for observational studies came much later.

Many principles that structure the statistical theory of observational studies were created by individuals actively involved in the 1950s in the controversy about smoking as a cause of lung cancer. Sir Austin Bradford Hill and Sir Ronald Fisher, though taking opposing sides in the controversy, agreed that a key part of an observational study is the effort to detect hidden biases using an "elaborate theory" describing how the treatment produces its effects. See Hill (1965) for a concise statement of his general methodological principles and see Cochran (1965, Section 5) for discussion of Fisher's general views. The first formal method for appraising sensitivity of conclusions to hidden bias came in a paper by Cornfield, Haenszel, Hammond, Lilienfeld, Shimkin and Wynder (1959) which surveyed the evidence linking smoking with lung cancer; see also Greenhouse (1982). Cochran had been an author of the U.S. Surgeon General's report on Smoking and Health (Bayne-Jones, et al., 1964) which reviewed the results of many observational studies. Cochran's survey articles on observational studies identified the common structure of most observational studies: (i) the control of visible biases using matching, stratification, or model based adjustments, and (ii) the devices used to distinguish actual treatment effects from hidden biases. Mantel and Haenszel (1959) introduced a test for treatment effect which, in effect, views a single observational study as a series of independent randomized experiments within strata defined by observed covariates, where the randomization probabilities vary between strata but are constant within strata.

In addition to epidemiology, the statistical theory of observational studies also draws from the large literature concerning educational and public program evaluation. Examples of observational studies in this area are the studies concerning the relative effectiveness of public and Catholic private high schools - e.g., Coleman, Hoffer and Kilgore (1982) and Goldberger and Cain (1982) - and studies of the effects of various approaches to bilingual education - e.g., Meyer and Fienberg (1992). General discussions of methodology in this area include Campbell and Stanley (1963), Campbell (1969), Cook and Campbell (1979), Kish (1987) and Rossi and Freeman (1985). General discussions of statistics in observational studies are found in: Breslow and Day (1980, 1987), Cochran (1965, 1972), Cox (1992), Gastwirth (1988), Holland (1986), Holland and Rubin (1988), Rosenbaum (1995), Rubin (1974, 1977, 1978), Wold (1956).

This chapter is a concise summary of a point of view that is developed in Rosenbaum (1995), though the chapter also discusses a few additional topics and a new example.

2. Inference in randomized experiments

In a randomized experiment, $N$ units are divided into $S$ strata with $n_s$ units in stratum $s$, $N = n_1 + \cdots + n_S$. The strata are defined using observed covariates, that is, variables describing the units prior to treatment. In stratum $s$, a fixed number $m_s$ of units are randomly selected to receive the treatment, where $0 \le m_s \le n_s$, and the remaining $n_s - m_s$ receive the control. If $S = 1$, this is a completely randomized experiment. If $n_s = 2$, $m_s = 1$ for $s = 1, \ldots, S$, then this is a paired randomized experiment. Write $m = (m_1, \ldots, m_S)^T$.

Write $Z_{si} = 1$ if unit $i$ in stratum $s$ receives the treatment and $Z_{si} = 0$ if this unit receives the control, so $m_s = \sum_i Z_{si}$. Write $Z$ for the $N$-dimensional column vector containing the $Z_{si}$. Write $|A|$ for the number of elements of a set $A$. Let $\Omega$ be the set containing the $K = |\Omega| = \prod_{s=1}^{S} \binom{n_s}{m_s}$ possible values of $Z$. In a conventional randomized experiment, $Z$ is selected using a random device or random numbers that ensure that $\text{prob}(Z = z) = 1/K$ for each $z \in \Omega$.

Unit $i$ in stratum $s$ exhibits a (vector) response $r_{si}$. Write $R$ for the matrix whose $N$ rows are the $r_{si}$'s. The null hypothesis of no treatment effect asserts that the responses that units exhibit do not change as the treatment assignment changes; that is, the null hypothesis asserts that $R$ is fixed, so that this same $R$ would have been observed no matter which treatment assignment $Z$ had been randomly selected. Let $T = t(Z, R)$ be a test statistic, and consider the one-sided tail area of this test statistic under the null hypothesis of no effect. Under the null hypothesis, the chance that $T \ge k$ is

$$\text{prob}(T \ge k) = \frac{|\{z \in \Omega : t(z, R) \ge k\}|}{|\Omega|}, \tag{1}$$

that is, the proportion of the equally probable treatment assignments $z \in \Omega$ giving rise to values of the test statistic greater than or equal to $k$. This calculation makes use only of the probability distribution implied by the random assignment of treatments, so it involves no assumptions about distributions, and for this reason Fisher (1935) referred to randomization as the "reasoned basis for inference" in experiments. Many commonly used tests are either of this form or are large sample approximations to (1), including nonparametric tests such as the rank sum and signed rank tests, tests for binary responses such as Fisher's exact test, McNemar's test for paired binary responses, and the test of Mantel and Haenszel (1959), Birch (1964) and Cox (1966) for stratified binary responses, Mantel's (1963) test for discrete scores, as well as various rank tests for censored data. Given the test (1) of the null hypothesis of no effect, a confidence interval for an additive or multiplicative effect is obtained in the usual way by inverting the test. Point estimates are often obtained by the method of Hodges and Lehmann (1963).
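The tail area in (1) can be computed by brute force when the number of possible assignments is small. The following minimal sketch enumerates every assignment with the observed number of treated units in each stratum and counts those whose statistic is at least as large as the observed one. The statistic (sum of treated responses), the data and the function names are illustrative assumptions.

```python
from itertools import combinations, product

def assignments(n_s, m_s):
    """Yield every z in Omega: m_s[s] treated units out of n_s[s] in stratum s."""
    per_stratum = [
        [tuple(1 if i in treated else 0 for i in range(n))
         for treated in combinations(range(n), m)]
        for n, m in zip(n_s, m_s)
    ]
    yield from product(*per_stratum)

def treated_sum(z, r):
    """Example statistic t(z, R): total response of the treated units."""
    return sum(zi * ri for zs, rs in zip(z, r) for zi, ri in zip(zs, rs))

def tail_area(z_obs, r, t=treated_sum):
    """Proportion of assignments in Omega with t(z, R) >= t(z_obs, R)."""
    n_s = [len(rs) for rs in r]
    m_s = [sum(zs) for zs in z_obs]
    k = t(z_obs, r)
    values = [t(z, r) for z in assignments(n_s, m_s)]
    return sum(v >= k for v in values) / len(values)

# Hypothetical data: two strata with 3 and 2 units, one treated unit in each.
r = [[5, 1, 2], [7, 3]]
z_obs = [(1, 0, 0), (1, 0)]
p_value = tail_area(z_obs, r)
```

Here there are 3 x 2 = 6 possible assignments and only the observed one attains the largest treated total, so the tail area is 1/6. Full enumeration grows quickly with the number of units, which is why large sample approximations are used in practice.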

3. Observational studies in the absence of hidden bias

3.1. Overt and hidden bias

In an observational study, treatments are not randomly assigned, so there may be no reason to believe that all units have the same chance of receiving the treatment, and so no reason to trust the significance level (1). This section discusses inference when there are overt biases but no hidden biases, that is, the chance of receiving the treatment varies with the observed covariates used to define the strata but does not vary with unobserved covariates. Later sections discuss the more fundamental concern that hidden biases may be present.

Consider the following model for the distribution of treatment assignments $Z$ in an observational study. Let $\Omega^*$ be the set containing the $|\Omega^*| = 2^N$ vectors of dimension $N$ with coordinates equal to 1 or 0. The model asserts that the $Z_{si}$ are independent with $\text{prob}(Z_{si} = 1) = \pi_{si}$, where the $\pi_{si}$ are unknown with $0 < \pi_{si} < 1$; that is,

$$\text{prob}(Z = z) = \prod_{s=1}^{S} \prod_{i=1}^{n_s} \pi_{si}^{z_{si}} (1 - \pi_{si})^{1 - z_{si}} \quad \text{for all } z \in \Omega^*. \tag{2}$$

By itself, (2) cannot be used as the basis for inference because the $\pi_{si}$ are not known, so (2) cannot be calculated. Using (2), the terms "overt bias" and "hidden bias" will be defined. There is overt bias but no hidden bias if $\pi_{si}$ is constant within each stratum; that is, there is no hidden bias if there exist probabilities $\pi_s$ such that $\pi_{si} = \pi_s$ for $i = 1, \ldots, n_s$. Under model (2), there is hidden bias if $\pi_{si}$ varies within strata, that is, if $\pi_{si} \ne \pi_{sj}$ for some $s$, $i$, $j$. When (2) holds with no hidden bias, Rubin (1977) says there is "randomization on the basis of a covariate".

3.2. Permutation inference in the absence of hidden bias

If there is no hidden bias, then overt bias is not, in principle, a difficult problem. The following argument appears explicitly in Rosenbaum (1984a), but it appears to be implicit in Mantel and Haenszel (1959)'s choice of test statistic. If there is no hidden bias then the $\pi_s$ are unknown, but $m$ is a sufficient statistic for the $\pi_s$, and

$$\text{prob}(Z = z \mid m) = \frac{1}{K} \quad \text{for all } z \in \Omega. \tag{3}$$

In other words, if there is no hidden bias, the conditional distribution of treatment assignments given the numbers $m$ treated in each stratum is the usual randomization distribution. Hence, if there is no hidden bias, conventional randomization inference may be applied to the stratified data. This leads, for instance, to the Mantel and Haenszel (1959) statistic when the responses are binary. In some instances, there is a sufficient statistic for the $\pi_s$ which is of lower dimension than $m$, and this is useful when the data are thinly spread across many strata. Methods of inference in this case are discussed in Rosenbaum (1984a, 1988a). For instance, this approach may be applied to matched pairs when the matching has failed to control imbalances in certain observed covariates. In short, if there is no hidden bias, the analysis of an observational study is not fundamentally different from the analysis of a stratified or matched randomized experiment. If there is no hidden bias, care is needed to appropriately adjust for observed covariates, but careful adjustments will remove overt biases. Addressing hidden bias is, therefore, the central concern in an observational study. The remainder of Section 3 discusses practical aspects of controlling overt biases, and later sections discuss hidden bias.
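The claim that conditioning on the stratum totals recovers the randomization distribution can be verified by enumeration on a small example. The sketch below assumes model (2) with an assignment probability that is constant within each stratum, keeps the assignments with the required totals, and normalizes their probabilities. The stratum probabilities, stratum sizes and function name are hypothetical.

```python
from itertools import product
from math import comb, prod

def conditional_distribution(pi_s, n_s, m):
    """prob(Z = z | m) under model (2) with pi_si = pi_s within stratum s."""
    probs = {}
    for z in product(*[product((0, 1), repeat=n) for n in n_s]):
        if [sum(zs) for zs in z] != list(m):
            continue  # keep only assignments with the required stratum totals
        probs[z] = prod(
            p ** sum(zs) * (1 - p) ** (len(zs) - sum(zs))
            for p, zs in zip(pi_s, z)
        )
    total = sum(probs.values())
    return {z: p / total for z, p in probs.items()}

# Hypothetical strata: unequal assignment probabilities across strata.
pi_s, n_s, m = [0.2, 0.7], [3, 2], [1, 1]
cond = conditional_distribution(pi_s, n_s, m)
K = prod(comb(n, k) for n, k in zip(n_s, m))
```

Every retained assignment has the same unconditional probability, so each receives conditional probability 1/K regardless of the unknown stratum probabilities, which is the content of (3).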

3.3. The propensity score

A practical difficulty in matching and stratification arises when there is a need to control for many observed covariates simultaneously. Suppose that there are $P$ observed covariates. Even if each of the $P$ covariates takes on only two values, there will be $2^P$ possible values of the covariate, or more than a million possible values for $P = 20$ covariates. As a result, with the sample sizes typically available in observational studies, it may be impossible to find for each treated subject a control subject with exactly the same value of all covariates. Exact matching is rarely feasible when there are many covariates.

Fortunately, exact matching is not needed to control overt biases. Examination of the argument in Subsections 3.1 and 3.2 reveals that the argument is valid in the absence of hidden bias providing all subjects in the same stratum or matched pair have the same probability $\pi_s$ of receiving the treatment; however, it is not necessary that subjects in the same stratum have identical values of observed covariates.

The propensity score is the conditional probability of receiving the treatment given the observed covariates. It has several useful properties. First, in the absence of hidden bias, strata or matched sets that are homogeneous in the one-dimensional propensity score control bias due to all observed covariates (Rosenbaum and Rubin, 1983, Theorem 4; Rosenbaum, 1984a, Theorem 1). Second, even in the presence of hidden bias, strata or matched sets that are homogeneous in the propensity score tend to balance observed covariates, that is, to produce treated and control groups with the same distribution of observed covariates (Rosenbaum and Rubin, 1983, Theorem 1). In other words, strata or matched sets that are homogeneous in the propensity score control overt biases even if hidden biases are present.

In practical work, the propensity score is unknown and must be estimated using a model, perhaps a logit model, predicting the binary treatment assignment from the $P$-dimensional vector of observed covariates. Estimated propensity scores appear to perform well. In empirical studies, Rosenbaum and Rubin (1984, 1985) found that estimated propensity scores produced greater balance in observed covariates than theory anticipates from true propensity scores. It appears that estimated propensity scores remove some chance imbalances in observed covariates. In a simulation study, Gu and Rosenbaum (1993) found that matching on an estimated propensity score produces greater covariate balance when $P = 20$ than multivariate distance based matching methods. Rosenbaum (1984a) discusses exact conditional inference given a sufficient statistic for the unknown parameter of the propensity score.
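As a minimal sketch of estimating a propensity score with a logit model, the code below fits the model by plain gradient ascent on the mean log-likelihood and returns the fitted probability of treatment for each subject. The fitting routine, the toy data and the function names are assumptions made for illustration; in practice one would normally rely on an established statistical package rather than a hand-written optimizer.

```python
import math

def propensity(beta, x):
    """Fitted prob(Z = 1 | x) under a logit model with intercept beta[0]."""
    eta = beta[0] + sum(b * xk for b, xk in zip(beta[1:], x))
    return 1.0 / (1.0 + math.exp(-eta))

def fit_logit(xs, z, steps=2000, lr=0.5):
    """Gradient ascent on the mean log-likelihood of the logit model."""
    beta = [0.0] * (len(xs[0]) + 1)        # intercept followed by P slopes
    for _ in range(steps):
        grad = [0.0] * len(beta)
        for x, zi in zip(xs, z):
            resid = zi - propensity(beta, x)
            grad[0] += resid
            for k, xk in enumerate(x, start=1):
                grad[k] += resid * xk
        beta = [b + lr * g / len(xs) for b, g in zip(beta, grad)]
    return beta

# Toy data (hypothetical): treatment depends on the first covariate only.
xs = [(0, 0)] * 4 + [(0, 1)] * 4 + [(1, 0)] * 4 + [(1, 1)] * 4
z = [1, 0, 0, 0, 1, 0, 0, 0, 1, 1, 1, 0, 1, 1, 1, 0]
beta = fit_logit(xs, z)
scores = [propensity(beta, x) for x in xs]
```

Subjects can then be grouped into strata, or matched, on the single number in `scores` rather than on all covariates at once. In this toy example the four covariate patterns collapse to two distinct scores, near 0.25 and 0.75.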

3.4. The structure of an optimal stratification

Stratifications and matchings in observational studies have varied structures. In pair matching, $n_s = 2$ and $m_s = 1$ for $s = 1, \ldots, S = N/2$. In matching with a fixed number $k$ of controls, $n_s = k + 1$ and $m_s = 1$ for $s = 1, \ldots, S = N/(k + 1)$; see, for instance, Ury (1975). In matching with a variable number of controls, $n_s \geq 2$ and $m_s = 1$ for $s = 1, \ldots, S$; see, for instance, the example presented by Cox (1966). Haphazard choices of $n_s$ and $m_s$ are common in practice. Does any particular structure have a claim to priority? Consider $M$ treated subjects and $N - M$ controls and a distance $\delta_{ij} \geq 0$ between treated subject $i$ and control $j$. The distance $\delta_{ij}$ measures the difference between the values of the observed covariate vectors for $i$ and $j$. A stratification is a division of the $N$ subjects into $S$ strata so that each stratum contains at least one treated subject and one control, that is, $m_s \geq 1$ and $n_s - m_s \geq 1$. For a given stratum $s$, let $\Delta_s$ be

the average of the $m_s(n_s - m_s)$ distances $\delta_{ij}$ between the $m_s$ treated and $n_s - m_s$ control subjects in this stratum. Let

$$\Delta = \sum_{s=1}^{S} w(m_s, n_s - m_s)\,\Delta_s,$$

where $w(a, b) \geq 0$ is a weight function defined for positive integers $a$ and $b$. The weight function $w(a, b)$ is neutral if for all $a \geq 2$ and $b \geq 2$, $w(a, b) = w(a - 1, b - 1) + w(1, 1)$; that is, the weight function is neutral if the total weight neither increases nor decreases when a pair is separated from a stratum. For instance, three neutral weight functions are $w(m_s, n_s - m_s) = n_s/N$, $w(m_s, n_s - m_s) = m_s/M$, and $w(m_s, n_s - m_s) = (n_s - m_s)/(N - M)$. A full matching is a stratification in which $\min(m_s, n_s - m_s) = 1$ for $s = 1, \ldots, S$. It may be shown that for any neutral weight function and any distance $\delta_{ij}$, there is a full matching that minimizes $\Delta$ over the set of all stratifications. This is not true of pair matching, matching with $k$ controls or matching with a variable number of controls. Indeed, as measured by $\Delta$, pair matching may be arbitrarily poor compared to the best full matching. More than this, for a neutral weight function, if the covariates are multivariate Normal and $\delta_{ij}$ is the Mahalanobis distance then, with probability 1, a stratification that is not a full matching is not optimal. For proof of these claims and extensions, see Rosenbaum (1991a). In this specific sense, the optimal form for stratification is a full matching. Optimality refers to one particular criterion, in this case distance or overt bias. Under simple constant variance models, full matching is not optimal in terms of minimizing the variance of the mean of the matched treated-minus-control differences; rather, 1 - k matching is optimal. As shown by elementary calculations, a bias that does not diminish as $N \to \infty$ quickly becomes more important to the mean squared error than a variance that is decreasing as $1/N$ as $N \to \infty$. Still, variance is a consideration. In addition, the optimal full matching may be highly unbalanced, which may be unpleasant for aesthetic reasons.
In practice, it is probably best to limit the degree to which matched sets can be unbalanced while not requiring perfect balance. An example of doing this is discussed by Rosenbaum (1989a, Section 3.3). In a simulation, Gu and Rosenbaum (1993, Section 3.3) compared: (i) optimal full matching with M / N = 1/2 to optimal pair matching and (ii) optimal full matching with M / N = 1/4 to optimal matching with k = 3 controls per treated subject. In other words, in both (i) and (ii), the number of controls was the same in full matching and 1 - k matching (k = 1 or k = 3), the difference being the flexible structure of full matching. Two distances were used, the difference in propensity score and the Mahalanobis distance. In terms of both distances and in terms of covariate imbalance, full matching was much better than 1 - k matching for both k = 1 and k = 3.
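
The definitions of Section 3.4 can be checked numerically. The following is a minimal sketch, not taken from the source: it tests the neutrality condition w(a, b) = w(a - 1, b - 1) + w(1, 1) for the three weight functions named in the text and evaluates the weighted distance for two stratifications of a tiny data set. The sizes `N`, `M`, the distance table `dist`, and all function names are hypothetical choices made for illustration.

```python
from fractions import Fraction

N, M = 4, 2  # hypothetical study: 2 treated subjects and 2 controls

# The three neutral weight functions named in the text, as functions of
# a = m_s (treated in the stratum) and b = n_s - m_s (controls in the stratum).
weights = {
    "n_s/N": lambda a, b: Fraction(a + b, N),
    "m_s/M": lambda a, b: Fraction(a, M),
    "(n_s-m_s)/(N-M)": lambda a, b: Fraction(b, N - M),
}

def is_neutral(w, max_size=6):
    """Check w(a, b) == w(a-1, b-1) + w(1, 1) for 2 <= a, b <= max_size."""
    return all(w(a, b) == w(a - 1, b - 1) + w(1, 1)
               for a in range(2, max_size + 1)
               for b in range(2, max_size + 1))

def total_distance(strata, dist, w):
    """Weighted sum over strata of the average treated-control distance."""
    total = Fraction(0)
    for treated, controls in strata:
        m, c = len(treated), len(controls)
        avg = Fraction(sum(dist[i][j] for i in treated for j in controls), m * c)
        total += w(m, c) * avg
    return total

# dist[i][j] is the (made-up) distance between treated i and control j.
dist = [[1, 5],
        [4, 2]]
pairs = [([0], [0]), ([1], [1])]   # a full matching (here, two pairs)
pooled = [([0, 1], [0, 1])]        # one stratum, not a full matching

for name, w in weights.items():
    print(name, is_neutral(w),
          total_distance(pairs, dist, w), total_distance(pooled, dist, w))
```

Exact rational arithmetic is used so that the neutrality check is an equality test and not a floating-point comparison.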

3.5. Constructing optimal matched samples

Matched pairs or sets that minimize the total distance within sets may be constructed using minimum cost network flow algorithms. This process is illustrated in Rosenbaum
(1989). Unlike many combinatorial optimization algorithms, the relevant network flow algorithms are relatively fast, and in particular, some are polynomial bounded. An excellent, modern general discussion of network flow algorithms is given by Bertsekas (1991) who also supplies FORTRAN code and performance comparisons on a Macintosh Plus. Gu and Rosenbaum (1993, Section 3.1) use simulation to compare optimal pair matching and greedy pair matching when 50 treated subjects and 50 or 100 or 150 or 300 controls are available to form the 50 matched pairs. Greedy matching consists of forming pairs one at a time, the pair with the smallest distance being selected first, the remaining pair with the second smallest distance being selected second, and so on. Traditionally, some form of greedy matching has been used to create matched samples in statistics. Greedy matching does not generally minimize the total distance within pairs, and in theory it can be quite poor compared to optimal matching. The simulation suggests that optimal matching is sometimes noticeably better than greedy matching in terms of distance within pairs, sometimes only slightly better, but optimal matching is no better at producing balanced matched samples. Optimal and greedy matching seem to select roughly the same 50 controls, but optimal matching does a better job with the pairing, so distance within pairs is affected but the balance of the entire matched groups is not.
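
The difference between greedy and optimal pair matching can be seen on a very small example. The sketch below is illustrative only: the distance table is made up, and the optimal matching is found by exhaustive enumeration, which is feasible only for tiny problems and stands in for the network flow algorithms mentioned in the text.

```python
from itertools import permutations

def greedy_pairs(dist):
    """Form pairs one at a time, always taking the closest remaining pair."""
    free_t = set(range(len(dist)))
    free_c = set(range(len(dist[0])))
    pairs = []
    while free_t:
        # Ties are broken by index so the result is deterministic.
        i, j = min(((i, j) for i in free_t for j in free_c),
                   key=lambda p: (dist[p[0]][p[1]], p))
        pairs.append((i, j))
        free_t.remove(i)
        free_c.remove(j)
    return sorted(pairs)

def optimal_pairs(dist):
    """Minimize total within-pair distance by brute force over assignments."""
    n_t, n_c = len(dist), len(dist[0])
    best = min(permutations(range(n_c), n_t),
               key=lambda cs: sum(dist[i][c] for i, c in enumerate(cs)))
    return sorted(enumerate(best))

def total(dist, pairs):
    """Total distance within pairs."""
    return sum(dist[i][j] for i, j in pairs)

# Hypothetical distances: 2 treated subjects (rows), 3 controls (columns).
dist = [[1, 2, 9],
        [2, 10, 9]]

g = greedy_pairs(dist)
o = optimal_pairs(dist)
print("greedy ", g, total(dist, g))
print("optimal", o, total(dist, o))
```

Greedy matching takes the single closest pair first and is then forced into a poor second pair, whereas the optimal matching accepts a slightly worse first pair to obtain a much smaller total.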

3.6. Matching and stratification followed by analytical adjustments

When matching or stratification provides imperfect control for observed covariates, analytical adjustments may be added. Rubin (1973b, 1979) uses simulation to study covariance adjustment for covariates that have been imperfectly controlled by matching. He finds that covariance adjustment of matched pair differences is more robust to misspecification of the form of the covariance model than is covariance adjustment of unmatched samples. Holford (1978) and Holford, White and Kelsey (1978) perform model-based adjustments for matched pairs with a binary outcome in case-control studies. Rosenbaum and Rubin (1984, Section 3.3) use a log-linear model to adjust for covariates in conjunction with strata formed from an estimated propensity score. Rosenbaum (1989a) discusses permutation tests that stratify imperfectly matched pairs. This leads to generalizations of Wilcoxon's signed rank test and McNemar's test that control residual imbalances in observed covariates.

3.7. The bias due to incomplete matching Incomplete matching occurs when some treated subjects are discarded because controls cannot be found. It is possible to show that incomplete matching can introduce a bias into a study that is free of hidden bias even if matched pairs are exactly matched for observed covariates; see Rosenbaum and Rubin (1985b). There, an example is presented in which unmatched treated subjects differ more from matched treated subjects than the treated subjects initially differed from the controls. Incomplete matching can substantially and unintentionally alter the nature of the treated group.

3.8. Consequences of adjustment for affected concomitant variables Section 3 has discussed adjustments for covariates, that is, for variables measured prior to treatment and hence unaffected by the treatment. Adjustments are sometimes made for variables measured after treatment that may have been affected by the treatment. For instance, Coleman et al. (1982) compare senior year test scores in public and Catholic high schools after adjusting for sophomore year test scores, reasoning that most of the difference in sophomore test scores existed prior to high school and was primarily not an effect of the difference between public and Catholic high schools. They used the observed sophomore year test score in place of an important unobserved covariate, namely test scores immediately before high school. When there is no hidden bias, adjustment for a variable affected by the treatment may remove part of the treatment effect, that is, it may introduce a bias that would not have been present had adjustments been confined to covariates. When hidden biases are present, adjustments for affected concomitant variables may at times introduce an avoidable bias, or it may reduce a hidden bias, or it may do both to some degree. See Rosenbaum (1984b) for general discussion including several analytical options.

4. Sensitivity to hidden bias in observational studies

4.1. The first sensitivity analysis

A sensitivity analysis asks how hidden biases of various magnitudes might alter the conclusions of an observational study. The first formal sensitivity analysis was conducted by Cornfield et al. (1959); see also Greenhouse (1982). They concluded that if one were to dismiss the observed association between heavy cigarette smoking and lung cancer, if one were to assert that it does not reflect an effect of smoking but rather some hidden bias, then one would need to postulate a hidden bias of enormous proportions. Specifically, they concluded that an unobserved binary covariate would need to be nine times more common among heavy smokers than among nonsmokers and would need to be a near perfect predictor of lung cancer if it were to explain the observed association between smoking and lung cancer. There are many differences between smokers and nonsmokers, differences in diet, personality, occupation, social and economic status; however, it is difficult to imagine that these traits are nine times more common among smokers and are near perfect predictors of lung cancer. A sensitivity analysis restricts the scope of debate about hidden biases by indicating the magnitude of bias needed to alter conclusions. However, a sensitivity analysis does not indicate whether bias is present, nor what magnitudes of bias are plausible. There are examples of observational studies which were contradicted by subsequent randomized experiments even though the observational studies were insensitive to moderately large biases. This suggests that some observational studies have been affected by hidden biases that are quite large. See Rosenbaum (1988b) for discussion of an example, including a sensitivity analysis and a discussion of the subsequent randomized experiment.

4.2. A general method of sensitivity analysis

The sensitivity analysis discussed here is similar in spirit to the analysis discussed by Cornfield et al. (1959); however, it is more general in that it is not confined to binary outcomes and it makes appropriate allowance for sampling error. The method is discussed in Rosenbaum (1987a, 1988b, 1991b) and Rosenbaum and Krieger (1990). Other methods of sensitivity analysis are discussed by Bross (1966), Gastwirth and Greenhouse (1987), Gastwirth (1992), and Schlesselmann (1978). If there is hidden bias, then $\pi_{si} \neq \pi_{sj}$ for some $s, i, j$. To quantify the magnitude of the hidden bias, introduce a parameter $\Gamma \geq 1$ such that:

$$\frac{1}{\Gamma} \leq \frac{\pi_{si}(1 - \pi_{sj})}{\pi_{sj}(1 - \pi_{si})} \leq \Gamma$$

Fig. 3. Realizations for the cubic correlation function with $(\rho, \gamma)$ = (a) (.15, .03), (b) (.45, .20), (c) (.70, .50), and (d) (.95, .90).

J. R. Koehler and A. B. Owen

4.3.2. Cubic

The (univariate) cubic correlation family is parameterized by $\rho \in [0, 1]$ and $\gamma \in [0, 1]$ and is given for $d \in [0, 1]$ by

$$R(d) = 1 - \frac{3(1 - \rho)}{2 + \gamma}\, d^2 + \frac{(1 - \rho)(1 - \gamma)}{2 + \gamma}\, |d|^3$$

where $\rho$ and $\gamma$ are restricted by

$$\rho > \frac{5\gamma^2 + 8\gamma - 1}{\gamma^2 + 4\gamma + 7}$$

to ensure that the function is positive definite (see Mitchell et al., 1990). Here $\rho = \mathrm{corr}(Y(0), Y(1))$ is the correlation between endpoint observations and $\gamma = \mathrm{corr}(Y'(0), Y'(1))$ is the correlation between endpoints of the derivative process. The cubic correlation function implies that the derivative process has a linear correlation process with parameter $\gamma$. A prediction model in one dimension for this family is a cubic spline interpolator. In two dimensions, when the correlation is a product of univariate cubic correlation functions the predictions are piece-wise cubic in each variable. Processes generated with the cubic correlation function are once mean square differentiable. Figure 3 shows several realizations of processes with the cubic correlation function and parameter pairs (.15, .03), (.45, .20), (.70, .50), (.95, .90). Notice that the realizations are quite smooth and almost linear for parameter pair (.95, .90).

4.3.3. Exponential

The (univariate) exponential correlation family is parameterized by $\theta \in (0, \infty)$ and is given by

$$R(d) = \exp(-\theta |d|)$$

for $d \in [0, 1]$. Processes with the exponential correlation function are Ornstein-Uhlenbeck processes (Parzen, 1962). The exponential correlation function is not mean square differentiable. Figure 4 presents several realizations of one dimensional processes with the exponential correlation function and $\theta$ = 0.5, 2.0, 5.0, 20. Figure 4(a) is for $\theta = 0.5$ and these realizations have very small global trends but much local variation. Figure 4(d) is for $\theta = 20$, and is very jumpy. Mitchell et al. (1990) also found necessary and sufficient conditions on the correlation function so that the derivative process has an exponential correlation function. These are called smoothed exponential correlation functions.

4.3.4. Gaussian

Sacks et al. (1989b) generalized the exponential correlation function by using

$$R(d) = \exp(-\theta |d|^q)$$

Computer experiments

Fig. 4. Realizations for the exponential correlation function with $\theta$ = (a) 0.5, (b) 2.0, (c) 5.0, and (d) 20.0.

where $0 < q \leq 2$ and $\theta \in (0, \infty)$. Taking $q = 1$ recovers the exponential correlation function. As $q$ increases, this correlation function produces smoother realizations. However, as long as $q < 2$, these processes are not mean square differentiable. The Gaussian correlation function is the case $q = 2$ and the associated processes are infinitely mean square differentiable. In the Bayesian interpretation, this correlation function puts all of the prior mass on analytic functions (Currin et al., 1991). This correlation function is appropriate when the simulator output is known to be analytic. Figure 5 displays several realizations for various $\theta$ for the Gaussian correlation function. These realizations are very smooth, even when $\theta = 50$.

4.3.5. Matérn

All of the univariate correlation functions described above are either zero, once or infinitely many times mean square differentiable. Stein (1989) recommends a more flexible family of correlation functions (Matérn, 1947; Yaglom, 1987). The Matérn correlation function is parameterized by $\theta \in (0, \infty)$ and $\nu \in (-1, \infty)$ and is given by

$$R(d) = (\theta |d|)^{\nu} K_{\nu}(\theta |d|)$$

Fig. 5. Realizations for the Gaussian correlation function with $\theta$ = (a) 0.5, (b) 2.0, (c) 10.0, and (d) 50.0.

where $K_{\nu}(\cdot)$ is a modified Bessel function of order $\nu$. The associated process will be $m$ times differentiable if and only if $\nu > m$. Hence, the amount of differentiability can be controlled by $\nu$ while $\theta$ controls the range of the correlations. This correlation family is more flexible than the other correlation families described above due to the control of the differentiability of the predictive surface. Figure 6 displays several realizations of processes with the Matérn correlation function with $\nu = 2.5$ and various values of $\theta$. For small values of $\theta$, the realizations are very smooth and flat while the realizations are erratic for large values of $\theta$.

4.3.6. Summary

The correlation functions described above have been applied in computer experiments. Software for predicting with them is described in Koehler (1990). The cubic correlation function yields predictions that are cubic splines. The exponential predictions are non-differentiable while the Gaussian predictions are infinitely differentiable. The Matérn correlation function is the most flexible since the degree of differentiability and the smoothness of the predictions can be controlled. In general, enough prior information to fix the parameters of a particular correlation family and $\sigma^2$ will not be available. A pure Bayesian approach would place a prior distribution on the parameters of a family and use the posterior distribution of the parameters in the estimation

Fig. 6. Realizations for the Matérn correlation function with $\nu = 2.5$ and $\theta$ = (a) 2.0, (b) 4.0, (c) 10.0, and (d) 25.0.

process. Alternatively, an empirical Bayes approach which uses the data to estimate the parameters of a correlation family and $\sigma^2$ is often used. The maximum likelihood estimation procedure will be presented and discussed in the next section.
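
Three of the univariate families of Section 4.3 (cubic, exponential and Gaussian) are simple enough to write down directly. The sketch below is an illustration under stated assumptions and is not the software of Koehler (1990): the function names are hypothetical, the exponential and Gaussian families are handled through the power form with exponent q, and the Matérn family is left out because the Python standard library has no modified Bessel function.

```python
import math

def cubic_is_valid(rho, gamma):
    """Constraint on (rho, gamma) that keeps the cubic family positive definite."""
    return rho > (5 * gamma**2 + 8 * gamma - 1) / (gamma**2 + 4 * gamma + 7)

def cubic_corr(d, rho, gamma):
    """Cubic correlation R(d) for d in [0, 1]."""
    if not cubic_is_valid(rho, gamma):
        raise ValueError("(rho, gamma) violates the positive definiteness constraint")
    d = abs(d)
    return (1.0
            - 3.0 * (1.0 - rho) / (2.0 + gamma) * d**2
            + (1.0 - rho) * (1.0 - gamma) / (2.0 + gamma) * d**3)

def power_exp_corr(d, theta, q=2.0):
    """exp(-theta |d|^q): q = 1 is the exponential family, q = 2 the Gaussian."""
    if not (theta > 0 and 0 < q <= 2):
        raise ValueError("need theta > 0 and 0 < q <= 2")
    return math.exp(-theta * abs(d) ** q)

# The four parameter pairs used in Figure 3 all satisfy the constraint.
for rho, gamma in [(0.15, 0.03), (0.45, 0.20), (0.70, 0.50), (0.95, 0.90)]:
    print(rho, gamma, cubic_is_valid(rho, gamma), cubic_corr(1.0, rho, gamma))
```

Evaluating the cubic family at d = 1 returns rho, which agrees with the interpretation of rho as the correlation between endpoint observations.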

4.4. Correlation function estimation - maximum likelihood

The previous subsections of this section presented the Kriging model and families of correlation functions. The families of correlations are all parameterized by one or two parameters which control the range of correlation and the smoothness of the corresponding processes. This model assumes that $\sigma^2$, the family and parameters of $R(\cdot)$ are known. In general, these values are not completely known a priori. The appropriate correlation family might be known from the simulator designer's experience regarding the smoothness of the function. Also, ranges for $\sigma^2$ and the parameters of $R(\cdot)$ might be known if a similar computer experiment has been performed. A pure Bayesian approach is to quantify this knowledge into a prior distribution on $\sigma^2$ and $R(\cdot)$. How to distribute a non-informative prior across the different correlation families and within each family is unclear. Furthermore, the calculation of the posterior distribution is generally intractable. An alternative and more objective method of estimating these parameters is an empirical Bayes approach which finds the parameters which are most consistent with the observed data. This section presents the maximum likelihood method for estimating $\beta$, $\sigma^2$ and the parameters of a fixed correlation family when the underlying distribution of $Z(\cdot)$ is Gaussian. The best parameter set from each correlation family can be evaluated to find the overall "best" $\sigma^2$ and $R(\cdot)$.

Consider the case where the distribution of $Z(\cdot)$ is Gaussian. Then the distribution for the response at the $n$ design points $Y_D$ is multinormal and the likelihood is given by

$$\mathrm{lik}(\beta, \sigma^2, R \mid Y_D) = (2\pi)^{-n/2}\, \sigma^{-n}\, |R_D|^{-1/2} \exp\left\{ -\frac{1}{2\sigma^2} (Y_D - H\beta)' R_D^{-1} (Y_D - H\beta) \right\}$$

where $R_D$ is the design correlation matrix. The log likelihood is

$$l_{ml}(\beta, \sigma^2, R_D \mid Y_D) = -\frac{n}{2} \ln(2\pi) - \frac{n}{2} \ln(\sigma^2) - \frac{1}{2} \ln(|R_D|) - \frac{1}{2\sigma^2} (Y_D - H\beta)' R_D^{-1} (Y_D - H\beta). \qquad (5)$$

Hence

$$\frac{\partial l_{ml}(\beta, \sigma^2, R_D \mid Y_D)}{\partial \beta} = \frac{1}{\sigma^2} \left( H' R_D^{-1} Y_D - H' R_D^{-1} H \beta \right)$$

which when set to zero yields the maximum likelihood estimate of $\beta$ that is the same as the generalized least squares estimate,

$$\hat{\beta}_{ml} = [H' R_D^{-1} H]^{-1} H' R_D^{-1} Y_D. \qquad (6)$$

Similarly,

$$\frac{\partial l_{ml}(\beta, \sigma^2, R_D \mid Y_D)}{\partial \sigma^2} = -\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4} (Y_D - H\beta)' R_D^{-1} (Y_D - H\beta)$$

which when set to zero yields the maximum likelihood estimate of $\sigma^2$

$$\hat{\sigma}^2_{ml} = \frac{1}{n} (Y_D - H\hat{\beta}_{ml})' R_D^{-1} (Y_D - H\hat{\beta}_{ml}). \qquad (7)$$

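Therefore, if $R_D$ is known, the maximum likelihood estimates of $\beta$ and $\sigma^2$ are easily calculated. However, if $R(\cdot)$ is parameterized by $\theta = (\theta_1, \ldots, \theta_s)$,

$$\frac{\partial l_{ml}(\beta, \sigma^2, R_D \mid Y_D)}{\partial \theta_i} = -\frac{1}{2} \frac{\partial \ln|R_D|}{\partial \theta_i} - \frac{1}{2\sigma^2} \frac{\partial \left[ (Y_D - H\beta)' R_D^{-1} (Y_D - H\beta) \right]}{\partial \theta_i}$$

$$= -\frac{1}{2} \mathrm{tr}\left( R_D^{-1} \frac{\partial R_D}{\partial \theta_i} \right) + \frac{1}{2\sigma^2} (Y_D - H\beta)' R_D^{-1} \frac{\partial R_D}{\partial \theta_i} R_D^{-1} (Y_D - H\beta) \qquad (8)$$

does not generally yield an analytic solution for $\theta$ when set to zero for $i = 1, \ldots, s$. (Commonly $s = p$ or $2p$, but this need not be assumed.) An alternative method to estimate $\theta$ is to use a nonlinear optimization routine using equation (5) as the function to be optimized. For a given value of $\theta$, estimates of $\beta$ and $\sigma^2$ are calculated using equations (6) and (7), respectively. Next, equation (8) is used in calculating the partial derivatives of the objective function. See Mardia and Marshall (1984) for an overview of the maximum likelihood procedure.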
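
The profile likelihood procedure just described can be sketched in a few lines. The code below is an illustration under several assumptions that are not in the source: a constant mean (H is a column of ones), one input dimension, the correlation exp(-theta |d|^q), a hand-written Cholesky factorization in place of a linear algebra library, and a coarse grid search over theta in place of a gradient-based optimizer using equation (8). The data values are made up.

```python
import math

def corr_matrix(xs, theta, q=1.0):
    """Design correlation matrix R_D for R(d) = exp(-theta |d|^q)."""
    return [[math.exp(-theta * abs(a - b) ** q) for b in xs] for a in xs]

def cholesky(A):
    """Lower-triangular L with A = L L' for a symmetric positive definite A."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def solve_chol(L, b):
    """Solve (L L') x = b by forward and back substitution."""
    n = len(b)
    z = [0.0] * n
    for i in range(n):
        z[i] = (b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (z[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

def profile_loglik(xs, ys, theta, q=1.0):
    """Equation (5) evaluated at the estimates from equations (6) and (7)."""
    n = len(xs)
    L = cholesky(corr_matrix(xs, theta, q))
    r_inv_one = solve_chol(L, [1.0] * n)
    r_inv_y = solve_chol(L, ys)
    beta = sum(r_inv_y) / sum(r_inv_one)              # equation (6) with H = 1
    resid = [y - beta for y in ys]
    r_inv_resid = solve_chol(L, resid)
    sigma2 = sum(a * b for a, b in zip(resid, r_inv_resid)) / n   # equation (7)
    log_det = 2.0 * sum(math.log(L[i][i]) for i in range(n))
    loglik = (-0.5 * n * math.log(2.0 * math.pi) - 0.5 * n * math.log(sigma2)
              - 0.5 * log_det - 0.5 * n)
    return loglik, beta, sigma2

# Hypothetical design points and simulator responses.
xs = [0.0, 0.25, 0.5, 0.75, 1.0]
ys = [1.0, 1.8, 2.1, 2.9, 4.2]
grid = [0.5, 1.0, 2.0, 5.0, 10.0, 20.0]
best_theta = max(grid, key=lambda t: profile_loglik(xs, ys, t)[0])
print(best_theta, profile_loglik(xs, ys, best_theta))
```

The term -n/2 in the returned log likelihood is what the quadratic form in equation (5) reduces to once the estimate from equation (7) is substituted for the variance.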

4.5. Estimating and using derivatives

In the manufacturing sciences, deterministic simulators help describe how the product design and the manufacturing process determine the product's final characteristics. This allows the product to be designed and manufactured efficiently. Equally important are the effects of uncontrollable variation in the manufacturing parameters on the end product. If the product's characteristics are sensitive to slight variations in the manufacturing process, the yield, or percentage of marketable units produced, may decrease. Furthermore, understanding the sensitivities of the product's characteristics can help design more reliable products and increase the overall quality of the product. Many simulators need to solve differential equations and can provide the gradient of the response at a design point with little or no additional computational cost. However, some simulators require that the gradient be approximated by a difference equation. Then the cost of finding a directional derivative at a point is equal to evaluating an additional point while approximating the total gradient requires $p$ additional runs. Consider Figure 7 for an example in $p = 1$ showing the effects of including gradient information on prediction. The solid lines, $Y$ in Figure 7(a) and $Y'$ in Figure 7(b), are the true function and its derivative, respectively, while the long dashed lines are Kriging predictors $\hat{Y}_3$ and $\hat{Y}'_3$ based on $n = 3$ observations. As expected, $\hat{Y}_3$ goes through the design points, $D = \{.2, .5, .8\}$, but $\hat{Y}'_3$ is a poor predictor of $Y'$. The short dashed lines are the $n = 3$ predictors with derivative information, $\hat{Y}_{3'}$ and $\hat{Y}'_{3'}$. Notice that this predictor now matches $Y'$ and $Y$ at $D$ and the interpolations are overall much better. The addition of gradient information substantially improves the fits of both $Y$ and $Y'$. The dotted lines are the $n = 6$ predictors $\hat{Y}_6$ and $\hat{Y}'_6$, which give a fairer comparison if the derivative costs are equal to the response cost. The predictor $\hat{Y}_6$ is a little better on the interior of $S$ but $\hat{Y}'_6$ is worse at $x = 0$ than $\hat{Y}'_{3'}$.
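
The run count for a difference approximation to the gradient can be made concrete. The sketch below is illustrative only: the "simulator" is a made-up two-input function, the wrapper class and function names are hypothetical, and a forward difference with a fixed step is assumed.

```python
class CountingSimulator:
    """Wrap a function and count how many times it is evaluated."""

    def __init__(self, f):
        self.f = f
        self.runs = 0

    def __call__(self, x):
        self.runs += 1
        return self.f(x)

def forward_difference_gradient(f, x, h=1e-6):
    """Approximate the gradient of f at x using one extra run per input."""
    f0 = f(x)
    grad = []
    for j in range(len(x)):
        shifted = list(x)
        shifted[j] += h          # perturb only the jth input
        grad.append((f(shifted) - f0) / h)
    return f0, grad

# Hypothetical simulator with p = 2 inputs.
sim = CountingSimulator(lambda x: x[0] ** 2 + 3.0 * x[1])
value, gradient = forward_difference_gradient(sim, [1.0, 2.0])
print(value, gradient, sim.runs)
```

One run gives the response and p further runs give the gradient, so a design point with a full difference-approximated gradient costs p + 1 simulator runs.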

Fig. 7. (a) An example of a response ($Y$) and three predictors ($\hat{Y}_3$, $\hat{Y}_{3'}$, $\hat{Y}_6$). (b) An example of a derivative ($Y'$) and three predictors ($\hat{Y}'_3$, $\hat{Y}'_{3'}$, $\hat{Y}'_6$).

The Kriging methodology easily extends to model gradients. To see this for $p = 1$, let $E[Y(\cdot)] = \mu$ and $d = t_2 - t_1$, then

$$\mathrm{Cov}[Y(t_1), Y'(t_2)] = E[Y(t_1) Y'(t_2)] - E[Y(t_1)]\, E[Y'(t_2)].$$

Now due to the stationarity of $Y(\cdot)$, $E[Y'(\cdot)] = 0$ and

$$\begin{aligned}
\mathrm{Cov}[Y(t_1), Y'(t_2)] &= E[Y(t_1) Y'(t_2)] \\
&= E\left[ Y(t_1) \lim_{\delta \to 0} \frac{Y(t_2 + \delta) - Y(t_2)}{\delta} \right] \\
&= E\left[ \lim_{\delta \to 0} \frac{Y(t_1) Y(t_2 + \delta) - Y(t_1) Y(t_2)}{\delta} \right] \\
&= \sigma^2 \lim_{\delta \to 0} \frac{R(d + \delta) - R(d)}{\delta} \\
&= \sigma^2 R'(d)
\end{aligned}$$

for differentiable $R(\cdot)$. Similarly,

$$\mathrm{Cov}[Y'(t_1), Y(t_2)] = -\sigma^2 R'(d)$$

and

$$\mathrm{Cov}[Y'(t_1), Y'(t_2)] = -\sigma^2 R''(d).$$

For more general $p$ and for higher derivatives, following Morris et al. (1993) let

$$Y^{(a_1, \ldots, a_p)}(t) = \frac{\partial^a}{\partial t_1^{a_1} \cdots \partial t_p^{a_p}} Y(t)$$

where $a = \sum_{j=1}^{p} a_j$ and $t_j$ is the $j$th component of $t$. Then $E[Y^{(a_1, \ldots, a_p)}] = 0$ and

$$\mathrm{Cov}\left[ Y^{(a_1, \ldots, a_p)}(t_1), Y^{(b_1, \ldots, b_p)}(t_2) \right] = (-1)^a \sigma^2 \prod_{j=1}^{p} R_j^{(a_j + b_j)}(t_{2j} - t_{1j})$$

for $R(d) = \prod_{j=1}^{p} R_j(d_j)$. Furthermore, for directional derivatives, let $Y'_u(t)$ be the directional derivative of $Y(t)$ in the direction $u = (u_1, \ldots, u_p)'$, $\sum_{j=1}^{p} u_j^2 = 1$,

$$Y'_u(t) = \sum_{j=1}^{p} \frac{\partial Y(t)}{\partial t_j} u_j = \langle \nabla Y(t), u \rangle.$$

Then $E[Y'_u(t)] = 0$ and for $d = t - s$,

$$\begin{aligned}
\mathrm{Cov}[Y(s), Y'_u(t)] &= E[Y(s) Y'_u(t)] \\
&= \sum_{j=1}^{p} E\left[ Y(s) \frac{\partial Y(t)}{\partial t_j} \right] u_j \\
&= \sum_{j=1}^{p} \mathrm{Cov}\left[ Y(s), \frac{\partial Y(t)}{\partial t_j} \right] u_j \\
&= \sigma^2 \sum_{j=1}^{p} \frac{\partial R(d)}{\partial d_j} u_j \\
&= \sigma^2 \langle \dot{R}(d), u \rangle \qquad (9)
\end{aligned}$$

where $\dot{R}(d) = [\partial R(d)/\partial d_1, \ldots, \partial R(d)/\partial d_p]'$. Similarly,

$$\mathrm{Cov}[Y'_u(s), Y(t)] = -\sigma^2 \langle \dot{R}(d), u \rangle \qquad (10)$$

and

$$\mathrm{Cov}[Y'_u(s), Y'_v(t)] = -\sigma^2 u' \ddot{R}(d) v, \qquad (11)$$

where

$$(\ddot{R}(d))_{kl} = \frac{\partial^2 R(d)}{\partial d_k \, \partial d_l}$$

is the matrix of 2nd partial derivatives evaluated at $d$. The Kriging methodology is modified to model gradient information by letting

$$Y_D^* = \left[ Y(x_1), \ldots, Y(x_n), Y'_{u_{11}}(x_1), Y'_{u_{12}}(x_1), \ldots, Y'_{u_{nm}}(x_n) \right]'$$

where $u_{il}$ is the direction of the $l$th directional derivative at $x_i$. Also let

$$\mu^* = (\mu, \mu, \ldots, \mu, 0, 0, \ldots, 0)'$$

with $n$ $\mu$s and $mn$ 0s and let $V^*$ be the combined covariance matrix for the design responses and derivatives with the entries as prescribed above (equations (9), (10), and (11)). Then

$$\hat{Y}(x_0) = \mu + v_{x_0}^{*\prime} V^{*-1} (Y_D^* - \mu^*)$$

and

$$\hat{Y}'_u(x_0) = v_{x_0, u}^{*\prime} V^{*-1} (Y_D^* - \mu^*)$$

where $v_{x_0}^* = \mathrm{Cov}[Y(x_0), Y_D^*]$ and $v_{x_0, u}^* = \mathrm{Cov}[Y'_u(x_0), Y_D^*]$.

Notice that once differentiable random functions need twice differentiable correlation functions. One problem with using the total gradient information is the rapid increase in the size of the covariance matrix. For each additional design point, $V^*$ increases by $p + 1$ rows and columns. Fortunately, these new rows and columns generally have lower correlations than the corresponding rows and columns for an equal number of responses. The inversion of $V^*$ is more computationally stable than for an equally sized $V_D$. More research is needed to provide general guidelines for using gradient information efficiently.

4.6. Complexity of computer experiments

Recent progress in complexity theory, a branch of theoretical computer science, has shed some light on computer experiments. The dissertation of Ritter (1995) contains an excellent summary of this area. Consider the case where $Y(x) = Z(x)$, that is, where there is no regression function. If for $r \geq 1$ all of the $r$th order partial derivatives of $Z(x)$ exist in the mean square sense and obey a Hölder condition of order $\beta$, then it is possible (see Ritter et al., 1993) to approximate $Z(x)$ with an $L^2$ error that decays as $O(n^{-(r+\beta)/p})$. This error is a root mean square average over randomly generated functions $Z$. When the covariance has a tensor product form, like those considered here, one can do even better. Ritter et al. (1995) show that the error rate for approximation in this case is $n^{-r-1/2} (\log n)^{(p-1)(r+1)}$ for products of covariances satisfying Sacks-Ylvisaker conditions of order $r \geq 0$. When $Z$ is a $p$ dimensional Wiener sheet process, for which $r = 0$, the result is $n^{-1/2} (\log n)^{p-1}$, which was first established by Wozniakowski (1991). In the general case, the rate for integration is $n^{-1/2}$ times the rate for approximation. A theorem of Wasilkowski (1994) shows that a rate $n^{-d}$ for approximation can usually be turned into a rate $n^{-d-1/2}$ for integration by the simple device of fitting an approximation with $n/2$ function evaluations, integrating the approximation, and then adjusting the result by the average approximation error on $n/2$ more Monte Carlo function evaluations. For tensor product kernels the rate for integration is $n^{-r-1} (\log n)^{(p-1)/2}$ (see Paskov, 1993), which has a more favorable power of $\log n$ than would arise via Wasilkowski's theorem. The fact that much better rates are possible under tensor product models than for general covariances suggests that the tensor product assumption may be a very strong one. The tensor product assumption is at least strong enough that under it, there is no average case curse of dimensionality for approximation.
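
The device attributed to Wasilkowski (1994) above (approximate with n/2 evaluations, integrate the approximation, then correct by the average approximation error at n/2 Monte Carlo points) can be sketched in one dimension. This is an illustration under stated assumptions and not the construction used in the theorem: the approximation is a piecewise-linear interpolant on an equispaced grid, the integration region is [0, 1], and the function name and the seed are hypothetical.

```python
import random

def integrate_with_correction(f, n, seed=0):
    """Estimate the integral of f over [0, 1] from n evaluations (n even, n >= 4)."""
    half = n // 2

    # Step 1: approximate f from n/2 evaluations on an equispaced grid.
    knots = [i / (half - 1) for i in range(half)]
    values = [f(x) for x in knots]

    def approx(x):
        """Piecewise-linear interpolant through (knots, values)."""
        i = min(int(x * (half - 1)), half - 2)
        t = x * (half - 1) - i
        return (1.0 - t) * values[i] + t * values[i + 1]

    # Step 2: integrate the approximation exactly (trapezoid rule).
    step = 1.0 / (half - 1)
    integral = step * (sum(values) - 0.5 * (values[0] + values[-1]))

    # Step 3: average the approximation error at n/2 random points.
    rng = random.Random(seed)
    points = [rng.random() for _ in range(half)]
    correction = sum(f(x) - approx(x) for x in points) / half

    return integral + correction

print(integrate_with_correction(lambda x: x * x, 40))   # true value is 1/3
```

The Monte Carlo step only has to estimate the integral of the approximation error, which is small when the approximation is good, so its sampling noise is correspondingly small.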

5. Bayesian designs Selecting an experimental design, D, is a key issue in building an efficient and informative Kriging model. Since there is no random error in this model, we wish to find designs that minimize squared-bias. While some experimental design theories (Box and Draper, 1959; Steinberg, 1985) do investigate the case where bias rather than solely variance plays a crucial role in the error of the fitted model, how good these designs are for the pure bias problem of computer experiments is unclear. Box and Draper (1959) studied the effect of scaling factorial designs by using a first order polynomial model when the true function is a quadratic polynomial. Box and Draper (1983) extended the results to using a quadratic polynomial model when the true response surface is a cubic polynomial. They found that mean squared-error optimal designs are close to bias optimal designs. Steinberg (1985) extended these ideas further by using a prior model proposed by Young (1977) that puts prior distributions on the coefficients of a sufficiently large polynomial. However, model (2) is more flexible than high ordered polynomials and therefore better designs are needed. This section introduces four design optimality criteria for use with computer experiments: entropy, mean squared-error, maximin and minimax designs. Entropy designs maximize the amount of information expected for the design while mean squared-error designs minimize the expected mean squared-error. Both these designs require a priori knowledge of the correlation function R(.). The design criteria described below are for the case of fixed design size n. Simple sequential designs, where the location of

Fig. 8(a). Maximum entropy designs for $p = 2$, $n = 1$-$16$, and the Gaussian correlation function with $\theta = (0.5, 0.5)$.

the $n$th design point is determined after the first $n - 1$ points have been evaluated, will not be presented due to their tendencies to replicate (Sacks et al., 1989b). However, sequential block strategies could be used where the above designs could be used as starting blocks. Depending upon the ultimate goal of the computer experiment, the first design block might be utilized to refine the design and reduce the design space.

5.1. Entropy designs

Lindley (1956) introduced a measure, based upon Shannon's entropy (Shannon, 1948), of the amount of information provided by an experiment. This Bayesian measure uses the expected reduction in entropy as a design criterion. This criterion has been used in Box and Hill (1967) and Borth (1975) for model discrimination. Shewry and Wynn (1987) showed that, if the design space is discrete (i.e., a lattice in $[0, 1]^p$), then minimizing the expected posterior entropy is equivalent to maximizing the prior entropy.

Fig. 8(b). Maximum entropy designs for $p = 2$, $n = 1$-$16$, and the Gaussian correlation function with $\theta = (2, 2)$.

[Figure 8(c): 16 panels, one for each design size n = 1, ..., 16; horizontal axis X1.]

Fig. 8(c). Maximum entropy designs for p = 2, n = 1-16, and the Gaussian correlation function with θ = (10, 10).

DEFINITION 1. A design $D_E$ is a Maximum Entropy Design if

$$E_Y\left[-\ln P(Y_{D_E})\right] = \max_D E_Y\left[-\ln P(Y_D)\right]$$

where $P(Y_D)$ is the density of $Y_D$. In the Gaussian case, this is equivalent to finding a design that maximizes the determinant of the variance of $Y_D$. In the Gaussian prior case, where $\beta \sim N_k(b, \tau^2 \Sigma)$, the determinant of the unconditioned covariance matrix is

$$
\begin{aligned}
|V_D + \tau^2 H \Sigma H'|
&= \begin{vmatrix} V_D + \tau^2 H \Sigma H' & H \\ 0 & I \end{vmatrix} \\
&= \left| \begin{pmatrix} V_D & H \\ -\tau^2 \Sigma H' & I \end{pmatrix}
          \begin{pmatrix} I & 0 \\ \tau^2 \Sigma H' & I \end{pmatrix} \right| \\
&= \begin{vmatrix} V_D & H \\ -\tau^2 \Sigma H' & I \end{vmatrix} \\
&= \begin{vmatrix} V_D & H \\ 0 & \tau^2 \Sigma H' V_D^{-1} H + I \end{vmatrix} \\
&= |V_D| \, |\tau^2 \Sigma H' V_D^{-1} H + I| \\
&= |V_D| \, |H' V_D^{-1} H + \tau^{-2} \Sigma^{-1}| \, |\tau^2 \Sigma| .
\end{aligned}
$$

Since $\tau^2 \Sigma$ is fixed, the maximum entropy criterion is equivalent to finding the design $D_E$ that maximizes

$$|V_D| \, |H' V_D^{-1} H + \tau^{-2} \Sigma^{-1}| .$$


If the prior distribution is diffuse, $\tau^2 \to \infty$, the maximum entropy criterion is equivalent to maximizing

$$|V_D| \, |H' V_D^{-1} H|$$

and if $\beta$ is treated as fixed, then the maximum entropy criterion is equivalent to maximizing $|V_D|$. Shewry and Wynn (1987, 1988) applied this measure in designs for spatial models. Currin et al. (1991) and Mitchell and Scott (1987) have applied the entropy measure to finding designs for computer experiments. By this measure, the amount of information in an experimental design depends on the prior knowledge of Z(.) through R(.). In general, R(.) will not be known a priori. Additionally, these optimal designs are difficult to construct because they require an n x p dimensional optimization over the n design point locations. Currin et al. (1991) describe an algorithm adapted from DETMAX (Mitchell, 1974) which successively removes and adds points to improve the design.

Figure 8 shows the optimal entropy designs for p = 2, n = 1, ..., 16 and $R(d) = \exp\{-\theta \sum_{j=1}^{p} d_j^2\}$ with θ = 0.5, 2, 10. The entropy designs tend to spread the points out in the plane and favor the edge of the design space over the interior. For example, the n = 16 designs displayed in Figure 8 have 12 points on the edge and only 4 points in the interior. Furthermore, most of the designs are similar across the different correlation functions, although there are some differences. Generally, the ratio of edge points to interior points is constant. The entropy criterion appears to be insensitive to changes in the location of the interior points. Johnson et al. (1990) indicate that entropy designs for extremely "weak" correlation functions are, in a limiting sense, maximin designs (see Section 5.3).
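As a rough illustration of the fixed-β criterion $|V_D|$, the following sketch compares two four-point designs in $[0,1]^2$ under the Gaussian correlation function. It is only a sketch under stated assumptions: the helper names (`gaussian_corr`, `log_det`, `entropy_criterion`), the two example designs and the choice θ = 2 are hypothetical, and the code evaluates $\log |R_D|$, which differs from $\log |V_D|$ only by a constant when $V_D = \sigma^2 R_D$. It does not attempt the n x p dimensional optimization discussed above.

```python
import math

def gaussian_corr(x, y, theta):
    """Gaussian correlation R(d) = exp(-theta * sum_j d_j^2) with d = x - y."""
    return math.exp(-theta * sum((a - b) ** 2 for a, b in zip(x, y)))

def log_det(matrix):
    """Log-determinant of a symmetric positive definite matrix via Cholesky."""
    n = len(matrix)
    chol = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = matrix[i][j] - sum(chol[i][k] * chol[j][k] for k in range(j))
            chol[i][j] = math.sqrt(s) if i == j else s / chol[j][j]
    return 2.0 * sum(math.log(chol[i][i]) for i in range(n))

def entropy_criterion(design, theta):
    """log |R_D| for the design; larger values are preferred."""
    corr = [[gaussian_corr(x, y, theta) for y in design] for x in design]
    return log_det(corr)

# Two hypothetical designs: points at the corners versus clumped in the middle.
corners = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
clumped = [(0.4, 0.4), (0.4, 0.6), (0.6, 0.4), (0.6, 0.6)]
spread_value = entropy_criterion(corners, theta=2.0)
clumped_value = entropy_criterion(clumped, theta=2.0)
```

The corner design gives the larger value, in line with the observation that entropy designs spread points out and favor the edge of the design space.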

5.2. Mean squared-error designs

Box and Draper (1959) proposed minimizing the normalized integrated mean squared error (IMSE) of $\hat{Y}(x)$ over $[0, 1]^p$. Welch (1983) extended this measure to the case when the bias is more complicated. Sacks and Schiller (1988) and Sacks et al. (1989a) discuss IMSE designs for computer experiments in more detail.

DEFINITION 2. A design $D_I$ is an Integrated Mean Squared-Error (IMSE) design if

$$J(D_I) = \min_D J(D)$$

where

$$J(D) = \frac{1}{\sigma^2} \int_{[0,1]^p} E\left[\hat{Y}(x) - Y(x)\right]^2 dx .$$

J(D) is dependent on R(.) through $\hat{Y}(x)$. For any design, J(D) can be expressed as

$$J(D) = \sigma^2 - \operatorname{trace}\left\{
\begin{pmatrix} 0 & H' \\ H & V_D \end{pmatrix}^{-1}
\int \begin{pmatrix} h(x) h'(x) & h(x) v_x' \\ v_x h'(x) & v_x v_x' \end{pmatrix} dx
\right\}$$


and, as pointed out by Sacks et al. (1989a), if the elements of $h(x)$ and $v_x$ are products of functions of a single input variable, the multidimensional integral simplifies to products of one-dimensional integrals. As in the entropy design criterion, the minimization of J(D) is an optimization in n x p dimensions and is also dependent on R(.). Sacks and Schiller (1988) describe the use of a simulated annealing method for constructing IMSE designs for bounded and discrete design spaces. Sacks et al. (1989b) use a quasi-Newton optimizer on a Cray X-MP48. They found that optimizing an n = 16, p = 6 design with $\theta_1 = \cdots = \theta_6 = 2$ took 11 minutes. The PACE program (Koehler, 1990) uses the optimization program NPSOL (Gill et al., 1986) to solve the IMSE optimization for a continuous design space. For n = 16, p = 6, this optimization requires 13 minutes on a DEC3100, a much less powerful machine than the Cray. Generally, these algorithms can find only local minima and therefore many random

[Figure 9(a): 9 panels, one for each design size n = 1, ..., 9; horizontal axis X1.]

Fig. 9(a). Minimum integrated mean square error designs for p = 2, n = 1-9, and the Gaussian correlation function with θ = (.5, .5).


starts are required. Since J(D) is dependent on R(.), robust designs need to be found for general R(.). Sacks et al. (1989a) found that for n = 9, p = 2 and $R(d) = \exp\{-\theta \sum_{j=1}^{p} d_j^2\}$ (see Section 4.3.4 for details on the Gaussian correlation function) the IMSE design for θ = 1 is robust in terms of relative efficiency. However, this analysis used a quadratic polynomial model and the results may not extend to higher dimensions or to different linear model components. Sacks et al. (1989b) used the optimal design for the Gaussian correlation function with θ = 2 for design efficiency-robustness.

Figure 9 displays IMSE designs for p = 2 and θ = .5, 2, 10 (n = 1, ..., 9 in panel (a) and n = 1, ..., 16 in panels (b) and (c)). The designs, in general, lie in the interior of S. For fixed design size n, the designs usually are similar geometrically for different θ values, with the scale decreasing as θ increases. They have much symmetry for some values of n, particularly n = 12. Notice that when n = 5 the design takes on only three unique values for each of the input variables. These designs tend to have clumped projections onto

[Figure 9(b): 16 panels, one for each design size n = 1, ..., 16; horizontal axis X1.]

Fig. 9(b). Minimum integrated mean square error designs for p = 2, n = 1-16, and the Gaussian correlation function with θ = (2, 2).

[Figure 9(c): 16 panels, one for each design size n = 1, ..., 16; horizontal axis X1.]

Fig. 9(c). Minimum integrated mean square error designs for p = 2, n = 1-16, and the Gaussian correlation function with θ = (10, 10).

lower dimension marginals of the input space. Better projection properties are needed when the true function is only dependent on a subset of the input variables.

5.3. Maximin and minimax designs

Johnson et al. (1990) developed the idea of minimax and maximin designs. These designs are dependent on a distance measure or metric. Let d(., .) be a metric on $[0, 1]^p$. Hence, for all $x_1, x_2, x_3 \in [0, 1]^p$,

$$d(x_1, x_2) = d(x_2, x_1), \qquad d(x_1, x_2) \ge 0, \qquad d(x_1, x_2) = 0 \iff x_1 = x_2,$$

$$d(x_1, x_2) \le d(x_1, x_3) + d(x_3, x_2).$$

[Figure 10: two panels, (a) Minimax and (b) Maximin; horizontal axis X1.]

Fig. 10. (a) Minimax and (b) Maximin designs for n = 6 and p = 2 with Euclidean distance.

DEFINITION 3. Design $D_{MI}$ is a Minimax Distance Design if

$$\min_D \max_x d(x, D) = \max_x d(x, D_{MI})$$

where

$$d(x, D) = \min_{x_0 \in D} d(x, x_0).$$

Minimax distance designs ensure that no point in $[0, 1]^p$ is too far from a design point. Let d(., .) be Euclidean distance and consider placing a p-dimensional sphere with radius r around each design point. The idea of a minimax design is to place the n points so that the design space is covered by the spheres with minimal r. As an illustration, consider the owner of a petroleum corporation who wants to open some franchise gas stations. The gas company would like to locate the stations in the most convenient sites for the customers. A minimax strategy of placing gas stations would ensure that no customer is too far from one of the company's stations. Figure 10(a) shows a minimax design for p = 2 and n = 6 with d(., .) being Euclidean distance. The maximum distance to a design point is .318. For small n, minimax designs will generally lie in the interior of the design space.

DEFINITION 4. A design $D_{MA}$ is a Maximin Distance Design if

$$\max_D \min_{x_1, x_2 \in D} d(x_1, x_2) = \min_{x_1, x_2 \in D_{MA}} d(x_1, x_2).$$


Again, let d(., .) be Euclidean distance. Maximin designs pack the n design points, with their associated spheres, into the design space, S, with maximum radius. Parts of a sphere may lie outside S but the design points must be in S. Analogous to the minimax illustration above is the position of the owners of the gas station franchises. They wish to minimize the competition from each other by locating the stations as far apart as possible. A maximin strategy for placing the franchises would ensure that no two stations are too close to each other. Figure 10(b) shows a maximin design for p = 2, n = 6 and d(., .) Euclidean distance. For small n, maximin designs will generally lie on the exterior of S and fill in the interior as n becomes large.
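The two distance criteria can be evaluated for any candidate design, as in the following sketch. The function names, the two four-point example designs and the grid approximation of the maximum over $[0,1]^2$ are assumptions made for illustration; they are not the designs of Figure 10, and no search for an optimal design is attempted.

```python
import itertools
import math

def maximin_value(design):
    """Smallest pairwise Euclidean distance in the design (to be maximized)."""
    return min(math.dist(a, b) for a, b in itertools.combinations(design, 2))

def minimax_value(design, grid_size=51):
    """Largest distance from a point of [0,1]^2 to its nearest design point,
    approximated on a grid_size x grid_size grid (to be minimized)."""
    ticks = [k / (grid_size - 1) for k in range(grid_size)]
    return max(min(math.dist(x, d) for d in design)
               for x in itertools.product(ticks, repeat=2))

# Hypothetical four-point designs: corners of the square versus interior points.
corners = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
interior = [(0.25, 0.25), (0.25, 0.75), (0.75, 0.25), (0.75, 0.75)]

maximin_corners = maximin_value(corners)
maximin_interior = maximin_value(interior)
minimax_corners = minimax_value(corners)
minimax_interior = minimax_value(interior)
```

In this small example the corner design is better on the maximin criterion while the interior design is better on the minimax criterion, which matches the remarks that maximin designs tend toward the exterior and minimax designs toward the interior for small n.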

5.4. Hyperbolic cross points

Under the tensor product covariance models, it is possible to approximate and integrate functions with greater accuracy than in the general case. One gets the same rates of convergence as in univariate problems, apart from a multiplicative penalty that is some power of log n. Hyperbolic cross point designs, also known as sparse grids, have been shown to achieve optimal rates in these cases; see Ritter (1995). These point sets were first developed by Smolyak (1963). They were used in interpolation by Wahba (1978) and Gordon (1971) and by Paskov (1993) for integration. Chapter 4 of Ritter (1995) gives a good description of the construction of these points and lists other references.

6. Frequentist prediction and inference

The frequentist approach to prediction and inference in computer experiments is based on numerical integration. For a scalar function Y = f(X), consider a regression model of the form

$$Y = f(X) \doteq Z(X)\beta \tag{12}$$

where Z(X) is a row vector of predictor functions and $\beta$ is a vector of parameters. Suitable functions Z might include low order polynomials, trigonometric polynomials, wavelets, or some functions specifically geared to the application. Ordinarily Z(X) includes a component that is always equal to 1 in order to introduce an intercept term into equation (12). It is unrealistic to expect that the function f will be exactly representable as the finite linear combination given by (12), and it is also unrealistic to expect that the residual will be a random variable with mean zero at every fixed $x_0$. This is why we only write $f \doteq Z\beta$. There are many ways to define the best value of $\beta$, but an especially natural approach is to choose $\beta$ to minimize the mean squared error of the approximation, with respect to some distribution F on $[0, 1]^p$. Then the optimal value for $\beta$ is

$$\beta_{LS} = \left( \int Z(X)'Z(X)\, dF \right)^{-1} \int Z(X)'f(X)\, dF .$$


So if one can integrate over the domain of X then one can fit regression approximations there. The quality of the approximation may be assessed globally by the integrated mean squared error

$$\int (Y - Z(X)\beta)^2 \, dF .$$

For simplicity we take the distribution F to be uniform on $[0, 1]^p$. Also for simplicity the integration schemes to be considered usually estimate $\int g(X)\, dF$ by

$$\frac{1}{n} \sum_{i=1}^{n} g(x_i)$$

for well chosen points $x_1, \ldots, x_n$. Then $\beta_{LS}$ may be estimated by linear regression

$$\hat{\beta} = \left( \frac{1}{n} \sum_{i=1}^{n} Z(x_i)'Z(x_i) \right)^{-1} \frac{1}{n} \sum_{i=1}^{n} Z(x_i)'f(x_i),$$

or, when the integrals of squares and cross products of Z's are known, by

$$\tilde{\beta} = \left( \int Z(X)'Z(X)\, dF \right)^{-1} \frac{1}{n} \sum_{i=1}^{n} Z(x_i)'f(x_i). \tag{13}$$

Choosing the components of Z to be an orthogonal basis, such as tensor products of orthogonal polynomials, multivariate Fourier series or wavelets, equation (13) simplifies to

$$\tilde{\beta} = \frac{1}{n} \sum_{i=1}^{n} Z(x_i)'f(x_i) \tag{14}$$

and one can avoid the cost of matrix inversion. The computation required by equation (14) grows proportionally to $nr$, not $n^3$, where r = r(n) is the number of regression variables in Z. If r = O(n) then the computations grow as $n^2$. Then, in the example from Section 3, an hour of function evaluation followed by a minute of algebra would scale into a day of function evaluation followed by 9.6 hours of algebra, instead of the 9.6 days that an $n^3$ algorithm would require. If the $Z(x_i)$ exhibit some sparsity then it may be possible to reduce the algebra to order n or order n log n. Thus the idea of turning the function into data and making exploratory plots can be extended to turning the function into data and applying regression techniques. The theoretically simplest technique is to take $X_i$ iid $U[0, 1]^p$. Then $(X_i, Y_i)$ are iid pairs


with the complication that Y has zero variance given X. The variance matrix of $\tilde{\beta}$ is then

$$\frac{1}{n} \left( \int Z'Z\, dF \right)^{-1} \operatorname{Var}\left( Z(X)'Y(X) \right) \left( \int Z'Z\, dF \right)^{-1} \tag{15}$$

and for orthogonal predictors this simplifies further to

$$\frac{1}{n} \operatorname{Var}\left( Z(X)'Y(X) \right). \tag{16}$$

Thus any integration scheme that allows one to estimate variances and covariances of averages of Y times components of Z allows one to estimate the sampling variance matrix of the regression coefficients $\tilde{\beta}$. For iid sampling one can estimate this variance matrix by

$$\frac{1}{n - r - 1} \sum_{i=1}^{n} \left( Z(x_i)'Y(x_i) - \tilde{\beta} \right) \left( Z(x_i)'Y(x_i) - \tilde{\beta} \right)'$$

when the row vector Z comprises an intercept and r additional regression coefficients. This approach to computer experimentation should improve if more accurate integration techniques are substituted for the iid sampling. Owen (1992a) investigates the case of Latin hypercube sampling, for which a central limit theorem also holds. Clearly more work is needed to make this method practical. For instance, a scheme for deciding how many predictors should be in Z, or otherwise for regularizing $\tilde{\beta}$, is required.
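The orthogonal-basis estimate of equation (14) and a variance estimate in the spirit of equation (16) can be sketched as follows for iid uniform sampling. Everything specific here is an assumption made for illustration: the basis (an intercept plus shifted Legendre terms, orthonormal on $[0,1]^2$), the test function `f`, the sample size and the seed. Only the diagonal of the variance matrix is computed, and the sample variance is divided by n as in equation (16) to give standard errors.

```python
import math
import random

def basis(x):
    """Row vector Z(x): intercept plus sqrt(3)(2 x_j - 1), orthonormal on [0,1]^p."""
    return [1.0] + [math.sqrt(3.0) * (2.0 * xj - 1.0) for xj in x]

def fit_orthonormal(f, points):
    """Equation (14): coefficient estimates are averages of Z(x_i)' f(x_i).
    Also returns per-coefficient standard errors under iid sampling."""
    n = len(points)
    terms = [[z * f(x) for z in basis(x)] for x in points]
    k = len(terms[0])                      # k = r + 1 coefficients
    beta = [sum(t[j] for t in terms) / n for j in range(k)]
    # Sample variances with divisor n - r - 1, then scaled by 1/n.
    var = [sum((t[j] - beta[j]) ** 2 for t in terms) / (n - k) for j in range(k)]
    std_err = [math.sqrt(v / n) for v in var]
    return beta, std_err

def f(x):
    """Hypothetical test function depending on the first input only."""
    return 2.0 + 3.0 * x[0]

rng = random.Random(0)
points = [(rng.random(), rng.random()) for _ in range(20000)]
beta, std_err = fit_orthonormal(f, points)
```

For this test function the population coefficients are 3.5, $\sqrt{3}/2$ and 0, so the estimates can be compared against known values; no matrix inversion is needed at any step.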

7. Frequentist experimental designs

The frequentist approach proposed in the previous section requires a set of points $x_1, \ldots, x_n$ that are good for numerical integration and also allow one to estimate the sampling variance of the corresponding integrals. These two goals are somewhat at odds. Using an iid sample makes variance estimation easier, while the more complicated schemes described below improve accuracy but make variance estimation harder. The more basic goal of getting points $x_i$ into "interesting corners" of the input space, so that important features are likely to be found, is usually well served by point sets that are good for numerical integration. We assume that the region of interest is the unit cube $[0, 1]^p$, and that the integrals of interest are with respect to the uniform distribution over this cube. Other regions of interest can usually be reduced to the unit cube and other distributions can be changed to the uniform by a change of variable that can be subsumed into f. Throughout this section we consider an example with p = 5, and plot the design points $x_i$.

[Figure 11: scatter plot; horizontal axis X1, with ticks from 0.0 to 1.0 on both axes.]

Fig. 11. 25 distinct points among 625 points in a $5^5$ grid.

7.1. Grids

Since varying one coordinate at a time can cause one to miss important aspects of f, it is natural to consider instead sampling f on a regular grid. One chooses k different values for each of $X^1$ through $X^p$ and then runs all $k^p$ combinations. This works well for small values of p, perhaps 2 or 3, but for larger p it becomes completely impractical because the number of runs required grows explosively. Figure 11 shows a projection of $5^5 = 625$ points from a uniform grid in $[0, 1]^5$ onto two of the input variables. Notice that with 625 runs, only 25 distinct values appear in the plane, each representing 25 input settings in the other three variables. Only 5 distinct values appear for each input variable taken singly. In situations where one of the responses $Y^k$ depends very strongly on only one or two of the inputs $X^j$, the grid design leads to much wasteful duplication. The grid design does not lend itself to variance estimation since averages over the grid are not random. The accuracy of a grid based integral is typically that of a univariate integral based on $k = n^{1/p}$ evaluations. (See Davis and Rabinowitz, 1984.) For large p this is a severe disadvantage.

[Figure 12: scatter plot; horizontal axis X1, with ticks from 0.0 to 1.0 on both axes.]

Fig. 12. A 34 point Fibonacci lattice in $[0, 1]^2$.

7.2. Good lattice points

A significant improvement on grids may be obtained in integration by the method of good lattice points. (See Sloan and Joe (1994) and Niederreiter (1992) for background and Fang and Wang (1994) for applications to statistics.) For good lattice points

$$X_i^j = \left\{ \frac{h_j (i - 1) + 0.5}{n} \right\}$$

where {z} is z modulo 1, that is, z minus the greatest integer less than or equal to z, and the $h_j$ are integers with $h_1 = 1$. The points $v_i$ with $v_i^j = i h_j / n$ for integer i form a lattice in $R^p$. The points $x_i$ are versions of these lattice points confined to the unit cube, and the term "good" refers to a careful choice of n and $h_j$, usually based on number theory. Figure 12 shows the Fibonacci lattice for p = 2 and n = 34. For more details see Sloan and Joe (1994). Here $h_1 = 1$ and $h_2 = 21$. The Fibonacci lattice is only available in 2 dimensions. Appendix A of Fang and Wang (1994) lists several other choices for good lattice points, but the smallest value of n there for p = 5 is 1069.
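The lattice formula above translates almost directly into code. The sketch below generates the 34 point Fibonacci lattice with $h = (1, 21)$; the function name `good_lattice_points` is an illustrative choice, and the code makes no attempt to select good values of n and $h_j$.

```python
def good_lattice_points(n, h):
    """Points x_i^j = frac((h_j * (i - 1) + 0.5) / n) for i = 1, ..., n."""
    return [tuple(((hj * (i - 1) + 0.5) / n) % 1.0 for hj in h)
            for i in range(1, n + 1)]

fibonacci = good_lattice_points(34, (1, 21))

# Each coordinate visits every one of the n cells [m/n, (m+1)/n) exactly once,
# because each h_j is relatively prime to n.
cells_x1 = sorted(round(pt[0] * 34 - 0.5) for pt in fibonacci)
cells_x2 = sorted(round(pt[1] * 34 - 0.5) for pt in fibonacci)
```

Unlike the grid, every univariate projection of this point set has n = 34 distinct values.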


Hickernell (1996) discusses greedy algorithms for finding good lattice points with smaller n. The recent text (Sloan and Joe, 1994) discusses lattice rules for integration, which generalize the method of good lattice points. Cranley and Patterson (1976) consider randomly perturbing the good lattice points by adding, modulo 1, a random vector uniform over [0, 1]p to all the xi. Taking r such random offsets for each of the n data points gives nr observations with r - 1 degrees of freedom for estimating variance. Lattice integration rules can be extraordinarily accurate on smooth periodic integrands and thus an approach to computer experiments based on Cranley and Patterson's method might be expected to work well when both f ( x ) and Z ( x ) are smooth and periodic. Bates et al. (1996) have explored the use of lattice rules as designs for computer experiments.
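A random shift modulo 1 in the manner described above can be sketched as follows, with r independent shifts of one lattice giving r estimates and hence r - 1 degrees of freedom for variance estimation. The helper names, the smooth periodic test integrand (whose integral over the unit square is 1), the choice r = 8 and the seed are all assumptions for illustration.

```python
import math
import random

def lattice(n, h):
    """Lattice points frac((h_j * i + 0.5) / n) for i = 0, ..., n - 1."""
    return [tuple(((hj * i + 0.5) / n) % 1.0 for hj in h) for i in range(n)]

def shifted_estimates(f, points, r, rng):
    """Integral estimates from r independent uniform shifts (mod 1) of the points."""
    p = len(points[0])
    estimates = []
    for _ in range(r):
        shift = [rng.random() for _ in range(p)]
        shifted = [tuple((xj + sj) % 1.0 for xj, sj in zip(x, shift)) for x in points]
        estimates.append(sum(f(x) for x in shifted) / len(points))
    return estimates

def f(x):
    """Hypothetical smooth periodic integrand with integral 1 over [0,1]^2."""
    return 1.0 + math.cos(2.0 * math.pi * x[0]) * math.cos(2.0 * math.pi * x[1])

rng = random.Random(2)
estimates = shifted_estimates(f, lattice(34, (1, 21)), r=8, rng=rng)
mean = sum(estimates) / len(estimates)
# Variance of the mean of the r estimates, using r - 1 degrees of freedom.
var_of_mean = (sum((e - mean) ** 2 for e in estimates)
               / (len(estimates) - 1) / len(estimates))
```

For this low-frequency trigonometric integrand the lattice rule is exact up to rounding for every shift, which illustrates the accuracy of lattice rules on smooth periodic functions; less regular integrands would show visible variation between shifts.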

7.3. Latin hypercubes

While good lattice points start by improving the low dimensional projections of grids, Latin hypercube sampling starts with iid samples. A Latin hypercube sample has

$$X_i^j = \frac{\pi_j(i) - U_i^j}{n} \tag{17}$$

where the $\pi_j$ are independent uniform random permutations of the integers 1 through n, and the $U_i^j$ are independent U[0, 1] random variables independent of the $\pi_j$. Latin hypercube sampling was introduced by McKay et al. (1979) in what is widely considered to be the first paper on computer experiments. The sample points are stratified on each of the p input axes. A common variant of Latin hypercube sampling has centered points

$$X_i^j = \frac{\pi_j(i) - 0.5}{n}. \tag{18}$$

Point sets of this type were studied by Patterson (1954), who called them lattice samples. Figure 13 shows a projection of 25 points from a (centered) Latin hypercube sample over 5 variables onto two of the coordinate axes. Each input variable gets explored in each of 25 equally spaced bins. The stratification in Latin hypercube sampling usually reduces the variance of estimated integrals. Stein (1987) finds an expression for the variance of a sample mean under Latin hypercube sampling. Assuming that $\int f(X)^2 \, dF < \infty$, write

$$f(X) = \mu + \sum_{j=1}^{p} \alpha_j(X^j) + e(X) \tag{19}$$

[Figure 13: scatter plot with dotted grid lines; horizontal axis X1, with ticks from 0.0 to 1.0 on both axes.]

Fig. 13. 25 points of a Latin hypercube sample. The range of each input variable may be partitioned into 25 bins of equal width, drawn here with horizontal and vertical dotted lines, and each such bin contains one of the points.

where $\mu = \int f(X)\, dF$ and $\alpha_j(x) = \int_{X : X^j = x} (f(X) - \mu)\, dF_{-j}$, in which $dF_{-j} = \prod_{k \ne j} dX^k$ is the uniform distribution over all input variables except the jth. Equation (19) expresses f as the sum of a grand mean $\mu$, univariate main effects $\alpha_j$ and a residual from additivity e(X). Stein shows that under Latin hypercube sampling

$$\operatorname{Var}\left( \frac{1}{n} \sum_{i=1}^{n} f(X_i) \right) = \frac{1}{n} \int e(X)^2 \, dF + o\left( \frac{1}{n} \right) \tag{20}$$

whereas under iid sampling

$$\operatorname{Var}\left( \frac{1}{n} \sum_{i=1}^{n} f(X_i) \right) = \frac{1}{n} \left( \int e(X)^2 \, dF + \sum_{j=1}^{p} \int \alpha_j(X^j)^2 \, dF \right). \tag{21}$$

By balancing the univariate margins, Latin hypercube sampling has removed the main effects of the function f from the error variance.


Owen (1992a) proves a central limit theorem for Latin hypercube sampling of bounded functions and Loh (1993) proves a central limit theorem under weaker conditions. For variance estimation in Latin hypercube sampling see (Stein, 1987; Owen, 1992a).
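Equations (17) and (18) can be sketched in a few lines. The function name `latin_hypercube`, the sample size, the dimension and the seed are illustrative assumptions; the check at the end confirms the univariate stratification, namely that each input has exactly one point in each of the n equal-width bins.

```python
import random

def latin_hypercube(n, p, rng, centered=False):
    """Equation (17): X_i^j = (pi_j(i) - U_i^j) / n; centered=True gives (18)."""
    columns = []
    for _ in range(p):
        perm = list(range(1, n + 1))
        rng.shuffle(perm)                      # uniform random permutation pi_j
        columns.append([(v - (0.5 if centered else rng.random())) / n
                        for v in perm])
    return [tuple(col[i] for col in columns) for i in range(n)]

rng = random.Random(1)
sample = latin_hypercube(25, 5, rng)
centered_sample = latin_hypercube(25, 5, rng, centered=True)

# Bin index (0..24) of every point on every axis; each bin should occur once.
bins = [sorted(int(x[j] * 25) for x in sample) for j in range(5)]
centered_bins = [sorted(int(x[j] * 25) for x in centered_sample) for j in range(5)]
```

Because every axis is balanced in this way, the main effects $\alpha_j$ in equation (19) are integrated with very little error, which is the source of the variance reduction in equation (20).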

7.4. Better Latin hypercubes

Latin hypercube samples look like random scatter in any bivariate plot, though they are quite regular in each univariate plot. Some effort has been made to find especially good Latin hypercube samples. One approach has been to find Latin hypercube samples in which the input variables have small correlations. Iman and Conover (1982) perturbed Latin hypercube samples in a way that reduces off diagonal correlation. Owen (1994b) showed that the technique in Iman and Conover (1982) typically reduces off diagonal correlations by a factor of 3, and presented a method that empirically seemed to reduce the off diagonal correlations by a factor of order n, from $O(n^{-1/2})$ to $O(n^{-3/2})$. This removes certain bilinear terms from the lead term in the error. Dandekar (1993) found that iterating the method in Iman and Conover (1982) can lead to large improvements. Small correlations are desirable but not sufficient, because one can construct centered Latin hypercube samples with zero correlation (unless n is equal to 2 modulo 4) which are nonetheless highly structured. For example, the points could be arranged in a diamond shape in the plane, thus missing the center and corners of the input space. Some researchers have looked for Latin hypercube samples having good properties when considered as designs for Bayesian prediction. Park (1994) studies the IMSE criterion and Morris and Mitchell (1995) consider entropy.

7.5. Randomized orthogonal arrays

An orthogonal array A is an n by p matrix of integers $0 \le A_i^j \le b - 1$. The array has strength $t \le p$ if in every n by t submatrix of A all of the $b^t$ possible rows appear the same number $\lambda$ of times. Of course $n = \lambda b^t$. Independently, Owen (1992b, 1994a) and Tang (1992, 1993) considered using orthogonal arrays to improve upon Latin hypercube samples. A randomized orthogonal array (Owen, 1992b) has two versions,

$$X_i^j = \frac{\pi_j(A_i^j) + U_i^j}{b} \tag{22}$$

and

$$X_i^j = \frac{\pi_j(A_i^j) + 0.5}{b} \tag{23}$$

just as Latin hypercube sampling has two versions. Indeed Latin hypercube sampling corresponds to strength t = 1, with $\lambda = 1$. Here the $\pi_j$ are independent uniform

[Figure 14: scatter plot; horizontal axis X1, with ticks from 0.0 to 1.0 on both axes.]

Fig. 14. 25 points of a randomly centered randomized orthogonal array. For whichever two (of five) variables that are plotted, there is one point in each reference square.

permutations of $0, \ldots, b - 1$. Patterson (1954) considered some schemes like the centered version. If one were to plot the points of a randomized orthogonal array in t or fewer of the coordinates, the result would be a regular grid. The points of a randomized orthogonal array of strength 2 appear to be randomly scattered in 3 dimensions. Figure 14 shows a projection of 25 points from a randomly centered randomized orthogonal array over 5 variables onto two of the coordinate axes. Each pair of variables gets explored in each of 25 square bins. The plot for the centered version of a randomized orthogonal array is identical to that for a grid as shown in Figure 11. The analysis of variance decomposition used above for Latin hypercube sampling can be extended to include interactions among 2 or more factors. See Efron and Stein (1981), Owen (1992b) and Wahba (1990) for details. Gu and Wahba (1993) describe how to estimate and form confidence intervals for these main effects in noisy data. Owen (1992b) shows that main effects and interactions of t or fewer variables do not contribute to the asymptotic variance of a mean over a randomized orthogonal array, and Owen (1994a) shows that the variance is approximately $n^{-1}$ times the sum of integrals of squares of interactions among more than t inputs.
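Equations (22) and (23) can be illustrated with a small strength 2 array. The sketch below builds a 25 by 5 array with b = 5 levels from a standard linear construction over the integers modulo a prime (columns $a$, $c$, $a + c$, $a + 2c$, $a + 3c$ mod 5); this construction, the function names and the seed are assumptions chosen for illustration and are not taken from the papers cited above.

```python
import itertools
import random

def linear_oa(b, p):
    """Strength 2 array with n = b^2 rows for prime b and 2 <= p <= b + 1:
    columns a, c, a + c, a + 2c, ... (mod b)."""
    return [[a, c] + [(a + m * c) % b for m in range(1, p - 1)]
            for a, c in itertools.product(range(b), repeat=2)]

def randomized_oa(array, b, rng, centered=False):
    """Equation (22): X_i^j = (pi_j(A_i^j) + U_i^j) / b; centered=True gives (23)."""
    p = len(array[0])
    perms = []
    for _ in range(p):
        perm = list(range(b))
        rng.shuffle(perm)                      # uniform random permutation pi_j
        perms.append(perm)
    return [tuple((perms[j][row[j]] + (0.5 if centered else rng.random())) / b
                  for j in range(p)) for row in array]

rng = random.Random(3)
array = linear_oa(5, 5)
sample = randomized_oa(array, 5, rng)

# For every pair of inputs, record which of the 25 reference squares are hit.
squares_hit = {(j, k): {(int(x[j] * 5), int(x[k] * 5)) for x in sample}
               for j in range(5) for k in range(j + 1, 5)}
```

Every pair of inputs places exactly one point in each of the 25 reference squares, as in Figure 14, whereas a Latin hypercube sample only balances the univariate margins.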

[Figure 15: scatter plot with solid and dotted grid lines; horizontal axis X1, with ticks from 0.0 to 1.0 on both axes.]

Fig. 15. 25 points of an orthogonal array based Latin hypercube sample. For whichever two (of five) variables that are plotted, there is one point in each reference square bounded by solid lines. Each variable is sampled once within each of 25 horizontal or vertical bins.

Tang (1993) introduced orthogonal array based Latin hypercube samples. The points of these designs are Latin hypercube samples $X_i^j$ such that $\lfloor b X_i^j \rfloor$ is an orthogonal array. Here b is an integer and $\lfloor z \rfloor$ is the greatest integer less than or equal to z. Tang (1993) shows that for a strength 2 array the main effects and two variable interactions do not contribute to the integration variance. Figure 15 shows a projection of 25 points from an orthogonal array based Latin hypercube sample over 5 variables onto two of the coordinate axes. Each variable individually gets explored in each of 25 equal bins and each pair of variables gets explored in each of 25 squares.
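One way to obtain a point set with both properties is sketched below: within each column, the rows sharing a given level of the array are assigned, in random order, to the fine bins that subdivide that level. This is a sketch under stated assumptions rather than a transcription of Tang's procedure; the function name `oa_based_lhs`, the 25 by 5 array (the same modular construction as a hypothetical example) and the seed are illustrative choices, and the levels themselves are not permuted here.

```python
import random

def oa_based_lhs(array, b, rng):
    """Latin hypercube sample X with floor(b * X_i^j) equal to array[i][j]."""
    n, p = len(array), len(array[0])
    per_level = n // b                         # rows sharing a level in a column
    columns = []
    for j in range(p):
        col = [0.0] * n
        for level in range(b):
            rows = [i for i in range(n) if array[i][j] == level]
            fine = list(range(level * per_level, (level + 1) * per_level))
            rng.shuffle(fine)                  # random order of the fine bins
            for i, cell in zip(rows, fine):
                col[i] = (cell + rng.random()) / n
        columns.append(col)
    return [tuple(col[i] for col in columns) for i in range(n)]

# Hypothetical strength 2 array with b = 5 levels, n = 25 rows, p = 5 columns.
array = [[a, c, (a + c) % 5, (a + 2 * c) % 5, (a + 3 * c) % 5]
         for a in range(5) for c in range(5)]
rng = random.Random(4)
sample = oa_based_lhs(array, 5, rng)

coarse_ok = all(int(5 * sample[i][j]) == array[i][j]
                for i in range(25) for j in range(5))
fine_bins = [sorted(int(25 * x[j]) for x in sample) for j in range(5)]
```

The coarse check recovers the orthogonal array from the sample and the fine check confirms the Latin hypercube property, matching the two kinds of stratification visible in Figure 15.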

7.6. Scrambled nets

Orthogonal arrays were developed to balance discrete experimental factors. As seen above, they can be embedded into the unit cube and randomized, with the result that sampling variance is reduced. But numerical analysts and algebraists have developed some integration techniques directly adapted to balancing in a continuous space. Here we describe (t, m, s)-nets and their randomizations. A full account of (t, m, s)-nets

[Figure 16: scatter plot with grid lines; horizontal axis X1, with ticks from 0.0 to 1.0 on both axes.]

Fig. 16. 25 points of a scrambled (0, 2, 5)-net in base 5. For whichever two (of five) variables that are plotted, there is one point in each reference square. Each variable is sampled once within each of 25 equal bins.

is given by Niederreiter (1992). Their randomization is described by Owen (1995, 1996a). Let p = s >~ 1 and b >~ 2 be integers. An elementary subcube in base b is of the form

E=

~-j,

bkj

j=l

for integers kj, cj with kj > / 0 and 0 ~< cj ~ 0 be an integer. A set of points Xi, i = 1 , . . . , bm, of from [0, 1) s is a (0, m, s)-net in base b if every elementary subcube E in base b of volume b -,'~ has exactly 1 of the points. That is, every cell that "should" have one point of the sequence does have one point of the sequence. This is a very strong form of equidistribution and by weakening it somewhat, constructions for more values of s and b become available. Let ~ ~< m be a nonnegative integer. A finite set of bm points from [0, 1) s is a (t, m, s)-net in base b if every elementary subcube in base b of volume b~-'~ contains exactly bt points of the sequence.
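The definition can be checked by brute force on small point sets. The sketch below is an illustration only: the function names are hypothetical, and the test set (the two-dimensional Hammersley points in base 2, which are known to form a (0, m, 2)-net) is chosen because its coordinates are exact in binary floating point. For other bases, exact integer digits would be safer than floating point.

```python
def radical_inverse(i, b):
    """Reflect the base-b digits of the integer i about the radix point."""
    x, scale = 0.0, 1.0 / b
    while i > 0:
        i, digit = divmod(i, b)
        x += digit * scale
        scale /= b
    return x

def compositions(total, parts):
    """All tuples of `parts` nonnegative integers that sum to `total`."""
    if parts == 1:
        yield (total,)
        return
    for first in range(total + 1):
        for rest in compositions(total - first, parts - 1):
            yield (first,) + rest

def is_net(points, t, m, b):
    """True if `points` satisfy the (t, m, s)-net property in base b."""
    s = len(points[0])
    if len(points) != b ** m:
        return False
    for ks in compositions(m - t, s):          # one exponent vector per subcube shape
        counts = {}
        for p in points:
            cell = tuple(int(x * b ** k) for x, k in zip(p, ks))
            counts[cell] = counts.get(cell, 0) + 1
        # every one of the b^(m-t) cells must hold exactly b^t points
        if len(counts) != b ** (m - t) or any(c != b ** t for c in counts.values()):
            return False
    return True

m = 4
hammersley = [(i / 2 ** m, radical_inverse(i, 2)) for i in range(2 ** m)]
diagonal = [(i / 2 ** m, i / 2 ** m) for i in range(2 ** m)]
print(is_net(hammersley, 0, m, 2), is_net(diagonal, 0, m, 2))
```

The diagonal set is stratified on each axis separately but fails the test for square cells, which is the kind of deficiency the net property rules out.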

Fig. 17. The 125 points of a scrambled (0, 3, 5)-net in base 5. For whichever two (of five) variables that are plotted, the result is a 5 by 5 grid of 5 point Latin hypercube samples. Each variable is sampled once within each of 125 equal bins. Each triple of variables can be partitioned into 125 congruent cubes, each of which has one point.

Cells that "should" have $b^t$ points do have $b^t$ points, though cells that "should" have 1 point might not. By common usage the name (t, m, s)-net assumes that the letter s is used to denote the dimension of the input space, though one could speak of (t, m, p)-nets. Another convention to note is that the subcubes are half-open. This makes it convenient to partition the input space into congruent subcubes.

The balance properties of a (t, m, s)-net are greater than those of an orthogonal array. If $X_i^j$ is a (t, m, s)-net in base b then $\lfloor bX_i^j \rfloor$ is an orthogonal array of strength $\min\{s, m - t\}$. But the net also has balance properties when rounded to different powers of b on all axes, so long as the powers sum to no more than $m - t$. Thus the net combines aspects of orthogonal arrays and multi-level orthogonal arrays all in one point set. In the case of a (0, 4, 5)-net in base 5, one has 625 points in $[0, 1)^5$ and one can count that there are 43750 elementary subcubes of volume 1/625 of varying aspect ratios, each of which has one of the 625 points.
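The count of 43750 can be reproduced directly. An elementary subcube of volume $b^{-m}$ is determined by its shape, that is, by exponents $k_1, \ldots, k_s \geq 0$ summing to m, and by its position among the $b^m$ congruent cells of that shape. The helper below is a hypothetical name introduced only for this check.

```python
from math import comb

def count_elementary_subcubes(b, s, m):
    """Number of base-b elementary subcubes of [0,1)^s with volume b^-m."""
    shapes = comb(m + s - 1, s - 1)   # exponent vectors (k_1, ..., k_s) summing to m
    return shapes * b ** m            # each shape tiles the cube with b^m cells

print(count_elementary_subcubes(5, 5, 4))   # 70 shapes times 625 cells = 43750
```

With s = 2 the same formula gives 5 shapes: the three pairs of side lengths involving both variables, plus the two one-dimensional binnings into 625 intervals.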

Fig. 18. The 625 points of a scrambled (0, 4, 5)-net in base 5. For whichever two (of five) variables that are plotted, the square can be divided into 625 squares of side 1/25 or into 625 rectangles of side 1/5 by 1/125 or into 625 rectangles of side 1/125 by 1/5 and each such rectangle has one of the points. Each variable is sampled once within each of 625 equal bins. Each triple of variables can be partitioned into 625 hyperrectangles in three different ways and each such hyperrectangle has one of the points. Each quadruple of variables can be partitioned into 625 congruent hypercubes of side 1/5, each of which has one point.

For $t \geq 0$, an infinite sequence $(X_i)_{i \geq 1}$ of points from $[0, 1)^s$ is a (t, s)-sequence in base b if for all $k \geq 0$ and $m \geq t$ the finite sequence $(X_i)_{i = kb^m + 1}^{(k+1)b^m}$ is a (t, m, s)-net in base b. The advantage of a (t, s)-sequence is that if one finds that the first $b^m$ points are not sufficient for an integration problem, one can find another $b^m$ points that also form a (t, m, s)-net and tend to fill in places not occupied by the first set. If one continues to the point of having b such (t, m, s)-nets, then the complete set of points comprises a (t, m + 1, s)-net.

The theory of (t, m, s)-nets and (t, s)-sequences is given in Niederreiter (1992). A famous result of the theory is that integration over a (t, m, s)-net can attain an accuracy of order $O(\log(n)^{s-1}/n)$ while restricting to (t, s)-sequences raises this slightly to $O(\log(n)^{s}/n)$. These results require that the integrand be of bounded variation in the sense of Hardy and Krause. For large s, it takes unrealistically large n for these rates to be clearly better than $n^{-1/2}$ but in examples they seem to outperform simple Monte Carlo.

The construction of (t, m, s)-nets and (t, s)-sequences is also described in Niederreiter (1992). Here we remark that for prime numbers s a construction by Faure (1982) gives (0, s)-sequences in base s and Niederreiter extended the method to prime powers s. (See Niederreiter, 1992.) Thus one can choose b to be the smallest prime power greater than or equal to s and use the first s variables of the corresponding (0, b)-sequence in base b.

Owen (1995) describes a scheme to randomize (t, m, s)-nets and (t, s)-sequences. The points are written in a base b expansion and certain random permutations are applied to the coefficients in the expansion. The result is to make each permuted $X_i$ uniformly distributed over $[0, 1)^s$ while preserving the (t, m, s)-net or (t, s)-sequence structure of the ensemble of $X_i$. Thus the sample estimate $n^{-1} \sum_{i=1}^{n} f(X_i)$ is unbiased for $\int f(X)\,dF$ and its variance may be estimated by replication. On some test integrands in Owen (1995) the randomized nets outperformed their unrandomized counterparts. It appears that the unscrambled nets have considerable structure, stemming from the algebra underlying them, and that this structure is a liability in integration.

Figure 16 shows the 25 points of a scrambled (0, 2, 5)-net in base 5 projected onto two of the five input coordinates. These points are the initial 25 points of a (0, 5)-sequence in base 5. This design has the equidistribution properties of an orthogonal array based Latin hypercube sample. Moreover every consecutive 25 points in the sequence $X_{25a+1}, X_{25a+2}, \ldots, X_{25(a+1)}$ has these equidistribution properties. The first 125 points, shown in Figure 17, have still more equidistribution properties: any triple of the input variables can be split into 125 subcubes each with one of the $X_i$, in any pair of variables the points appear as a 5 by 5 grid of 5 point Latin hypercube samples, and each individual input variable can be split into 125 cells each having one point. The first 625 points are shown in Figure 18.

Owen (1996a) finds a variance formula for means over randomized (t, m, s)-nets and (t, s)-sequences. The formula involves a wavelet-like anova combining nested terms on each coordinate, all crossed against each other. It turns out that for any square integrable integrand, the resulting variance is $o(n^{-1})$ and it therefore beats any of the usual variance reduction techniques, which typically only reduce the asymptotic coefficient of $n^{-1}$. For smooth integrands with s = 1, the variance is in fact $O(n^{-3})$ and in the general case Owen (1996b) shows that the variance is $O(n^{-3}(\log n)^{s-1})$.
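The ideas of this section can be combined in a short sketch. It rests on assumptions that go beyond the text: the unscrambled points are generated with powers of the Pascal matrix modulo a prime b, a common description of Faure's construction, and the randomization applies a random permutation to each base b digit, chosen according to the digits that precede it. This is in the spirit of the scheme described above but is not claimed to reproduce it exactly. All function names are hypothetical.

```python
import random
from math import comb

def faure_digits(i, j, b, m):
    """First m base-b digits of coordinate j of the i-th point (b prime, j < b)."""
    a = [(i // b ** c) % b for c in range(m)]      # digits of i, least significant first
    return [sum(comb(c, r) * pow(j, c - r) * a[c] for c in range(r, m)) % b
            for r in range(m)]

def scrambled_faure_net(b, s, m, rng):
    """b^m points in [0,1)^s with nested random digit permutations applied."""
    perms = [dict() for _ in range(s)]             # per coordinate: digit prefix -> permutation
    points = []
    for i in range(b ** m):
        point = []
        for j in range(s):
            x, prefix = 0.0, ()
            for r, d in enumerate(faure_digits(i, j, b, m)):
                if prefix not in perms[j]:
                    perms[j][prefix] = rng.sample(range(b), b)
                x += perms[j][prefix][d] / b ** (r + 1)
                prefix += (d,)
            # digits beyond the m-th are filled in uniformly at random
            point.append(x + rng.random() / b ** m)
        points.append(point)
    return points

def replicated_estimate(f, b, s, m, reps, rng):
    """Mean of f over `reps` independent scramblings, with a variance estimate (reps >= 2)."""
    means = []
    for _ in range(reps):
        pts = scrambled_faure_net(b, s, m, rng)
        means.append(sum(f(p) for p in pts) / len(pts))
    grand = sum(means) / reps
    var = sum((v - grand) ** 2 for v in means) / (reps * (reps - 1))
    return grand, var

rng = random.Random(0)
estimate, variance = replicated_estimate(lambda p: sum(p), 5, 5, 2, 4, rng)
```

Each call to `scrambled_faure_net(5, 5, 2, rng)` gives 25 points with the balance shown in Figure 16: one point per bin of width 1/25 on every axis and one point per 5 by 5 reference square in every pair of axes. The replicates are independent, so the spread of their means gives the variance estimate mentioned in the text.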

8. Selected applications

One of the largest fields using and developing deterministic simulators is the design and manufacture of VLSI circuits. Alvarez et al. (1988) describe the use of SUPREM-III (Ho et al., 1984) and SEDAN-II (Yu et al., 1982) in designing BIMOS devices for manufacturability. Aoki et al. (1987) use CADDETH, a two-dimensional device simulator, for optimizing devices and for accurate prediction of device sensitivities. Sharifzadeh et al. (1989) use SUPREM-III and PISCES-II (Pinto et al., 1984) to compute CMOS device characteristics as a function of the designable technology parameters. Nassif et al. (1984) describe the use of FABRICS-II to estimate circuit delay times in integrated circuits.

The input variables for the above work are generally device sizes, metal concentrations, implant doses and gate oxide temperatures. The multiple responses are threshold voltages, subthreshold slopes, saturation currents and linear transconductance, although the output variables of concern depend on the technology under investigation. The engineers use the physical/numerical simulators to assist them in optimizing process, device, and circuit design before the costly step of building prototype devices. They are also concerned with minimizing transmitted variability, as this can significantly reduce the performance of the devices and hence reduce yield. For example, Welch et al. (1990), Currin et al. (1991) and Sacks et al. (1989b) discuss the use of simulators to investigate the effect of transistor dimensions on the asynchronization of two clocks. They want to find the combination of transistor widths that produces zero clock skew with very small transmitted variability due to uncontrollable manufacturing variability in the transistors.

TIMS, a simulator developed by T. Osswald and C. L. Tucker III, helps in optimizing a compression mold filling process for manufacturing automobiles (Church et al., 1988). In this process a sheet of molding compound is cut and placed in a heated mold. The mold is slowly closed and a constant force is applied during the curing reaction. The controlling variables of the process are the geometry and thickness of the part, the compound viscosity, shape and location within the charge, and the mold closing speed. The simulator then predicts the position of the flow front as a function of time.

Miller and Frenklach (1983) discuss the use of computers to solve systems of differential equations describing chemical kinetic models. In their work, the inputs to the simulator are vectors of possibly unknown combustion rate constants and the outputs are induction-delay times and concentrations of chemical species at specified reaction times. The objectives of their investigations are to find values of the rate constants that agree with experimental data and to find the rate constant most important to the process. Sacks et al. (1989a) explore some of the design issues and applications to this field.

TWOLAYER, a thermal energy storage model developed by Alan Solomon and his colleagues at the Oak Ridge National Laboratory, simulates heat transfer through a wall containing two layers of different phase change material. Currin et al. (1991) utilize TWOLAYER in a computer experiment. The inputs into TWOLAYER are the layer dimensions, the thermal properties of the materials and the characteristics of the heat source. The object of interest was finding the configuration of the input variables that produces the highest value of a heat storage utility index.

FOAM (Bartell et al., 1981) models the transport of polycyclic aromatic hydrocarbon spills in streams using structure activity relationships. Bartell et al. (1983) modified this model to predict the fate of anthracene when introduced into ponds. This model tracks the "evaporation and dissolution of anthracene from a surface slick of synthetic oil, volatilization and photolytic degradation of dissolved anthracene, sorption to suspended particulate matter and sediments and accumulation by pond biota" (Bartell et al., 1983). They used Monte Carlo error analyses to assess the effect of the uncertainty in model parameters on their results.


References

Alvarez, A. R., B. L. Abdi, D. L. Young, H. D. Weed, J. Teplik and E. Herald (1988). Application of statistical design and response surface methods to computer-aided VLSI device design. IEEE Trans. Comput. Aided Design 7(2), 271-288.
Aoki, Y., H. Masuda, S. Shimada and S. Sato (1987). A new design-centering methodology for VLSI device development. IEEE Trans. Comput. Aided Design 6(3), 452-461.
Bartell, S. M., R. H. Gardner, R. V. O'Neill and J. M. Giddings (1983). Error analysis of predicted fate of anthracene in a simulated pond. Environ. Toxicol. Chem. 2, 19-28.
Bartell, S. M., J. P. Landrum, J. P. Giesy and G. J. Leversee (1981). Simulated transport of polycyclic aromatic hydrocarbons in artificial streams. In: W. J. Mitch, R. W. Bosserman and J. M. Klopatek, eds., Energy and Ecological Modelling. Elsevier, New York, 133-143.
Bates, R. A., R. J. Buck, E. Riccomagno and H. P. Wynn (1996). Experimental design and observation for large systems (with discussion). J. Roy. Statist. Soc. Ser. B 58(1), 77-94.
Borth, D. M. (1975). A total entropy criterion for the dual problem of model discrimination and parameter estimation. J. Roy. Statist. Soc. Ser. B 37, 77-87.
Box, G. E. P. and N. R. Draper (1959). A basis for the selection of a response surface design. J. Amer. Statist. Assoc. 54, 622-654.
Box, G. E. P. and N. R. Draper (1963). The choice of a second order rotatable design. Biometrika 50, 335-352.
Box, G. E. P. and W. J. Hill (1967). Discrimination among mechanistic models. Technometrics 9, 57-70.
Church, A., T. Mitchell and D. Fleming (1988). Computer experiments to optimize a compression mold filling process. Talk given at the Workshop on Design for Computer Experiments in Oak Ridge, TN, November.
Cranley, R. and T. N. L. Patterson (1976). Randomization of number theoretic methods for multiple integration. SIAM J. Numer. Anal. 13, 904-914.
Cressie, N. A. C. (1986). Kriging nonstationary data. J. Amer. Statist. Assoc. 81, 625-634.
Cressie, N. A. C. (1993). Statistics for Spatial Data (Revised edition). Wiley, New York.
Currin, C., M. Mitchell, M. Morris and D. Ylvisaker (1991). Bayesian prediction of deterministic functions, with applications to the design and analysis of computer experiments. J. Amer. Statist. Assoc. 86, 953-963.
Dandekar, R. (1993). Performance improvement of restricted pairing algorithm for Latin hypercube sampling. Draft Report, Energy Information Administration, U.S.D.O.E.
Davis, P. J. and P. Rabinowitz (1984). Methods of Numerical Integration, 2nd edn. Academic Press, San Diego.
Diaconis, P. (1988). Bayesian numerical analysis. In: S. S. Gupta and J. O. Berger, eds., Statistical Decision Theory and Related Topics IV, Vol. 1. Springer, New York, 163-176.
Efron, B. and C. Stein (1981). The jackknife estimate of variance. Ann. Statist. 9, 586-596.
Fang, K. T. and Y. Wang (1994). Number-theoretic Methods in Statistics. Chapman and Hall, London.
Faure, H. (1982). Discrépances des suites associées à un système de numération (en dimension s). Acta Arithmetica 41, 337-351.
Friedman, J. H. (1991). Multivariate adaptive regression splines (with Discussion). Ann. Statist. 19, 1-67.
Gill, P. E., W. Murray, M. A. Saunders and M. H. Wright (1986). User's guide for npsol (version 4.0): A Fortran package for nonlinear programming. SOL 86-2, Stanford Optimization Laboratory, Dept. of Operations Research, Stanford University, California, 94305, January.
Gill, P. E., W. Murray and M. H. Wright (1981). Practical Optimization. Academic Press, London.
Gordon, W. J. (1971). Blending function methods of bivariate and multivariate interpolation and approximation. SIAM J. Numer. Anal. 8, 158-177.
Gu, C. and G. Wahba (1993). Smoothing spline ANOVA with component-wise Bayesian "confidence intervals". J. Comp. Graph. Statist. 2, 97-117.
Hickernell, F. J. (1996). Quadrature error bounds with applications to lattice rules. SIAM J. Numer. Anal. 33 (in press).
Ho, S. P., S. E. Hansen and P. M. Fahey (1984). Suprem III - a program for integrated circuit process modeling and simulation. TR-SEL84 1, Stanford Electronics Laboratories.


Iman, R. L. and W. J. Conover (1982). A distribution-free approach to inducing rank correlation among input variables. Comm. Statist. B11(3), 311-334.
Johnson, M. E., L. M. Moore and D. Ylvisaker (1990). Minimax and maximin distance designs. J. Statist. Plann. Inference 26, 131-148.
Journel, A. G. and C. J. Huijbregts (1978). Mining Geostatistics. Academic Press, London.
Koehler, J. R. (1990). Design and estimation issues in computer experiments. Dissertation, Dept. of Statistics, Stanford University.
Lindley, D. V. (1956). On a measure of the information provided by an experiment. Ann. Math. Statist. 27, 986-1005.
Loh, W.-L. (1993). On Latin hypercube sampling. Tech. Report No. 93-52, Dept. of Statistics, Purdue University.
Loh, W.-L. (1994). A combinatorial central limit theorem for randomized orthogonal array sampling designs. Tech. Report No. 94-4, Dept. of Statistics, Purdue University.
Mardia, K. V. and R. J. Marshall (1984). Maximum likelihood estimation of models for residual covariance in spatial regression. Biometrika 71(1), 135-146.
Matérn, B. (1947). Method of estimating the accuracy of line and sample plot surveys. Medd. Skogsforskn Inst. 36(1).
Matheron, G. (1963). Principles of geostatistics. Econom. Geol. 58, 1246-1266.
McKay, M. (1995). Evaluating prediction uncertainty. Report NUREG/CR-6311, Los Alamos National Laboratory.
McKay, M., R. Beckman and W. Conover (1979). A comparison of three methods for selecting values of input variables in the analysis of output from a computer code. Technometrics 21(2), 239-245.
Miller, D. and M. Frenklach (1983). Sensitivity analysis and parameter estimation in dynamic modeling of chemical kinetics. Internat. J. Chem. Kinetics 15, 677-696.
Mitchell, T. J. (1974). An algorithm for the construction of 'D-optimal' experimental designs. Technometrics 16, 203-210.
Mitchell, T., M. Morris and D. Ylvisaker (1990). Existence of smoothed stationary processes on an interval. Stochastic Process. Appl. 35, 109-119.
Mitchell, T., M. Morris and D. Ylvisaker (1995). Two-level fractional factorials and Bayesian prediction. Statist. Sinica 5, 559-573.
Mitchell, T. J. and D. S. Scott (1987). A computer program for the design of group testing experiments. Comm. Statist. Theory Methods 16, 2943-2955.
Morris, M. D. and T. J. Mitchell (1995). Exploratory designs for computational experiments. J. Statist. Plann. Inference 43, 381-402.
Morris, M. D., T. J. Mitchell and D. Ylvisaker (1993). Bayesian design and analysis of computer experiments: Use of derivative in surface prediction. Technometrics 35(3), 243-255.
Nassif, S. R., A. J. Strojwas and S. W. Director (1984). FABRICS II: A statistically based IC fabrication process simulator. IEEE Trans. Comput. Aided Design 3, 40-46.
Niederreiter, H. (1992). Random Number Generation and Quasi-Monte Carlo Methods. SIAM, Philadelphia, PA.
O'Hagan, A. (1989). Comment: Design and analysis of computer experiments. Statist. Sci. 4(4), 430-432.
Owen, A. B. (1992a). A central limit theorem for Latin hypercube sampling. J. Roy. Statist. Soc. Ser. B 54, 541-551.
Owen, A. B. (1992b). Orthogonal arrays for computer experiments, integration and visualization. Statist. Sinica 2, 439-452.
Owen, A. B. (1994a). Lattice sampling revisited: Monte Carlo variance of means over randomized orthogonal arrays. Ann. Statist. 22, 930-945.
Owen, A. B. (1994b). Controlling correlations in Latin hypercube samples. J. Amer. Statist. Assoc. 89, 1517-1522.
Owen, A. B. (1995). Randomly permuted (t, m, s)-nets and (t, s)-sequences. In: H. Niederreiter and P. J.-S. Shiue, eds., Monte Carlo and Quasi-Monte Carlo Methods in Scientific Computing. Springer, New York, 299-317.
Owen, A. B. (1996a). Monte Carlo variance of scrambled net quadrature. SIAM J. Numer. Anal., to appear.


Owen, A. B. (1996b). Scrambled net variance for integrals of smooth functions. Tech. Report Number 493, Department of Statistics, Stanford University.
Paskov, S. H. (1993). Average case complexity of multivariate integration for smooth functions. J. Complexity 9, 291-312.
Park, J.-S. (1994). Optimal Latin-hypercube designs for computer experiments. J. Statist. Plann. Inference 39, 95-111.
Parzen, E. (1962). Stochastic Processes. Holden-Day, San Francisco, CA.
Patterson, H. D. (1954). The errors of lattice sampling. J. Roy. Statist. Soc. Ser. B 16, 140-149.
Phadke, M. (1988). Quality Engineering Using Robust Design. Prentice-Hall, Englewood Cliffs, NJ.
Pinto, M. R., C. S. Rafferty and R. W. Dutton (1984). PISCES-II: Poisson and continuity equation solver. DAGG-29-83-k 0125, Stanford Electron. Lab.
Ripley, B. (1981). Spatial Statistics. Wiley, New York.
Ritter, K. (1995). Average case analysis of numerical problems. Dissertation, University of Erlangen.
Ritter, K., G. Wasilkowski and H. Wozniakowski (1993). On multivariate integration for stochastic processes. In: H. Brass and G. Hämmerlin, eds., Numerical Integration. Birkhäuser, Basel, 331-347.
Ritter, K., G. Wasilkowski and H. Wozniakowski (1995). Multivariate integration and approximation for random fields satisfying Sacks-Ylvisaker conditions. Ann. Appl. Prob. 5, 518-540.
Roosen, C. B. (1995). Visualization and exploration of high-dimensional functions using the functional ANOVA decomposition. Dissertation, Dept. of Statistics, Stanford University.
Sacks, J. and S. Schiller (1988). Spatial designs. In: S. S. Gupta and J. O. Berger, eds., Statistical Decision Theory and Related Topics IV, Vol. 2. Springer, New York, 385-399.
Sacks, J., S. B. Schiller and W. J. Welch (1989). Designs for computer experiments. Technometrics 31(1), 41-47.
Sacks, J., W. J. Welch, T. J. Mitchell and H. P. Wynn (1989). Design and analysis of computer experiments. Statist. Sci. 4(4), 409-423.
Shannon, C. E. (1948). A mathematical theory of communication. Bell Syst. Tech. J. 27, 379-423, 623-656.
Sharifzadeh, S., J. R. Koehler, A. B. Owen and J. D. Shott (1989). Using simulators to model transmitted variability in IC manufacturing. IEEE Trans. Semicond. Manufact. 2(3), 82-93.
Shewry, M. C. and H. P. Wynn (1987). Maximum entropy sampling. J. Appl. Statist. 14, 165-170.
Shewry, M. C. and H. P. Wynn (1988). Maximum entropy sampling and simulation codes. In: Proc. 12th World Congress on Scientific Computation, Vol. 2, IMAC88, 517-519.
Sloan, I. H. and S. Joe (1994). Lattice Methods for Multiple Integration. Oxford Science Publications, Oxford.
Smolyak, S. A. (1963). Quadrature and interpolation formulas for tensor products of certain classes of functions. Soviet Math. Dokl. 4, 240-243.
Stein, M. L. (1987). Large sample properties of simulations using Latin hypercube sampling. Technometrics 29(2), 143-151.
Stein, M. L. (1989). Comment: Design and analysis of computer experiments. Statist. Sci. 4(4), 432-433.
Steinberg, D. M. (1985). Model robust response surface designs: Scaling two-level factorials. Biometrika 72, 513-26.
Tang, B. (1992). Latin hypercubes and supersaturated designs. Dissertation, Dept. of Statistics and Actuarial Science, University of Waterloo.
Tang, B. (1993). Orthogonal array-based Latin hypercubes. J. Amer. Statist. Assoc. 88, 1392-1397.
Wahba, G. (1978). Interpolating surfaces: High order convergence rates and their associated designs, with applications to X-ray image reconstruction. Tech. report 523, Statistics Department, University of Wisconsin, Madison.
Wahba, G. (1990). Spline Models for Observational Data. CBMS-NSF Regional Conference Series in Applied Mathematics, Vol. 59. SIAM, Philadelphia, PA.
Wasilkowski, G. (1993). Integration and approximation of multivariate functions: Average case complexity with Wiener measure. Bull. Amer. Math. Soc. (N. S.) 28, 308-314. Full version J. Approx. Theory 77, 212-227.
Wozniakowski, H. (1991). Average case complexity of multivariate integration. Bull. Amer. Math. Soc. (N. S.) 24, 185-194.


Welch, W. J. (1983). A mean squared error criterion for the design of experiments. Biometrika 70(1), 201-213.
Welch, W. J., T.-K. Yu, S. M. Kang and J. Sacks (1990). Computer experiments for quality control by parameter design. J. Quality Technol. 22, 15-22.
Welch, W. J., J. R. Buck, J. Sacks, H. P. Wynn, T. J. Mitchell and M. D. Morris (1992). Screening, prediction, and computer experiments. Technometrics 34(1), 15-25.
Yaglom, A. M. (1987). Correlation Theory of Stationary and Related Random Functions, Vol. 1. Springer, New York.
Ylvisaker, D. (1975). Designs on random fields. In: J. N. Srivastava, ed., A Survey of Statistical Design and Linear Models. North-Holland, Amsterdam, 593-607.
Young, A. S. (1977). A Bayesian approach to prediction using polynomials. Biometrika 64, 309-317.
Yu, Z., G. G. Y. Chang and R. W. Dutton (1982). Supplementary report on sedan II. TR-G201 12, Stanford Electronics Laboratories.

S. Ghosh and C. R. Rao, eds., Handbook of Statistics, Vol. 13 © 1996 Elsevier Science B.V. All rights reserved.


A Critique of Some Aspects of Experimental Design

J. N. Srivastava

1. Introduction

Comparative experiments form a very large part of the subject of statistical design of scientific experiments. There are v "treatments", and we need to compare their "yields", the terminology being borrowed from agricultural experiments, in the context of which the subject of experimental design initially grew. This essentially subsumes a larger class of experiments where the objective is to study the "response surface", i.e., the general nature of dependence of the yields on the treatments. Als0, the set of treatments (i.e., the "factor space") could be (discretely or continuously) of infinite cardinality, although in most cases, only a "sample" of such points can be included in the experiment. It may be stressed that in the above, the "treatments" are well defined entities, being for example, varieties of wheat, levels of a fertilizer, amounts of a chemical, a set of drugs, various settings of a particular machine, the presence or absence of a particular action, and so on. The experiment is done using "experimental material", this being divided into "units". Usually, on each unit, one of the treatments is applied, and a variable y is measured, which in some sense denotes the "effect" of the treatment applied to it "plus" (i.e., "confounded with") the effect of its own "innate characteristics" (i.e., its "individual effect"). To help "average out" over such individual effects, the principle of "replication" (i.e., try each treatment on several units) was founded. Notice that "plus" does not necessarily mean that the two effects are additive. If t is a treatment, and qSt is its effect, then the effect of a unit u may change qS~ to 9~,(~bt), where 9~ is some function, and where 9~(~bt) is not necessarily of the additive form ( ~ + q~t), where c~ denotes the effect of u. Usually, additivity is assumed, but not because it is established to be valid. The simple fact is that, except for agriculture, it has not been much studied in the various subject matter fields. 
The author feels that the "nature" of the experimental material should be studied in its own right in any given science before major experiments are laid out. Also, we do not have much information on the kind of studies needed on a given experimental material in a given field so as to throw light on the intrinsic usefulness of the material in the study of the phenomena that are of interest.

Again, to avoid biases arising out of a systematic choice of units to which a treatment is applied, the concept of "randomization" was introduced. This, indeed, is valuable and necessary. But this should not be confused with "randomization analysis" of designs. This analysis is based on an averaging over the universe (of placements of treatments onto the units) generated by the act of randomization. It has been shown that, approximately, it gives the same answers concerning (analysis of variance) test statistics as the ones based on linear models and normality. Some people regard this as an added justification of, or even a more desirable basis of, the usual analysis. However, the latter is controversial. Consider two different acts of randomization $A_i$ ($i = 1, 2$) with two different systems of probabilities $P_i$ (of placements of treatments onto units). Since $P_1$ and $P_2$ are different, $A_1$ and $A_2$ will generate two different universes (say, $U_1$ and $U_2$). However, $A_1$ and $A_2$ could both result in the same placement $\Pi$ of treatments on the units. Now, the experiment depends only on $\Pi$, not on $A_1$ and $A_2$ (or $P_1$ and $P_2$). However, the analysis of the experiment using $A_i$ ($i = 1, 2$) would be done by averaging over $U_i$, which would generally lead to two different conclusions. This is a bit discomforting, since there is no reason to accept one analysis over the other (Srivastava, 1975).

A third principle put forth by Fisher is "local control". This refers to the division of units into groups (called "blocks"), such that each block is relatively more homogeneous than the set of units as a whole. Such blocks may correspond to "levels" of a "nuisance" factor (i.e., a factor not of interest in itself, but which enters into the units nevertheless). Note that the levels are either known (like litters of animals), or assumed known (strips of land in an agricultural experiment). In the usual analysis, the differences between the blocks are eliminated. Since the differences between the treatments are compared with the "random fluctuations" on the units (measured by the "error variance" $\sigma^2$), the elimination of blocks leads to the use of a $\sigma^2$ which is smaller than the one that would result if the blocks were ignored.
Thus, the treatments are compared with greater precision by eliminating the effect of the nuisance factor.

Three criteria of goodness of a design have been put forth (Srivastava, 1984). These are (i) variance-optimality, (ii) sensitivity, and (iii) revealing power. For simplicity, assume that the $y$'s are independent with variance $\sigma^2$. Variance-optimality relates to the situation where $\sigma^2$ is fixed (though unknown) and the expected values of the $y$'s are known except for unknown parameters, which need to be determined. Variance-optimality seeks to determine a design which maximizes some function of the information matrix of the estimates of the parameters. However, a little thought would reveal that such a situation (where the model is known except for parameters) would, relatively, arise only toward the end of an investigation. In other words, before seeking an optimal design, one would need to determine, as well as possible, the following: (A1) the phenomenon to be studied, the purpose of such study, the mode of study that is relevant (leading perhaps to a set of well-defined treatments to be compared), the experimental material to be used, and the response variable(s) to be measured; (A2) the nature of this material with respect to heterogeneity, nuisance factors, etc., and the units to be used; and finally (and very importantly) (A3) what model does arise in this situation. Assuming that all this has been satisfactorily done, the optimal design problem is to decide (a) how to sample the factor space to determine which points (i.e., treatments) are to be chosen for the experiment, including the number of units on which each such point is to be tried, and (b) the actual assignment of the treatments to the units. Fisher's principle of replication is thus subsumed under variance-optimality.

Sensitivity is a generalization of the concept of local control. It deals with the determination of a grouping of units such that the effects of known (and, possibly, even of unknown) nuisance factors are eliminated as much as possible. Designs with incomplete blocks, or those where heterogeneity is sought to be eliminated two-way or multi-way, have arisen out of this context. Thus, in a sensitive experiment, $\sigma^2$ is as small as it is possible to have.

Revealing power refers to the ability of the design to find out what the true model might be. The danger of being satisfied with a given model, without knowing how misleading it is (i.e., in which directions, and to what extent), is that the very purpose of conducting the experiment may be defeated. It may indeed be worse than doing no experiment. For example, consider the study of dependence of $y$ on $z$, in the $z$-interval $[0, 1]$. Suppose that it is assumed that a first degree polynomial is adequate, and that four observations will be made. The optimal design will put two observations at each of the points $z = 0$ and $z = 1$. Suppose the experiment is done, and $y$ values are found which are nearly zero for each of the four points. It will be concluded that $z$ has no effect on $y$. But this may happen also if $E(y(z)) = az^2 - az$. A design with at least one point in the interior of $(0, 1)$ may help to reveal this fact, and thus may be said to possess more revealing power.

Above, we mentioned procedural steps (A1)-(A3) for determining a suitable experimental design. Of these, A1 is to be done primarily by the scientist (with enlightened interference by the statistician). Also, A2 relates to sensitivity, and A3 to revealing power. Seeking variance-optimality may be referred to as step A4.
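The straight-line example above can be checked numerically. The following is a minimal sketch, not part of the original text: the curvature value `a = 4.0` and the alternative design with two interior points at $z = 0.5$ are assumptions chosen only for illustration, and the observations are taken noise-free ($\sigma^2 = 0$) so that only the effect of the design is visible.

```python
import numpy as np

def fit_line(z, y):
    """Least-squares fit of y = b0 + b1*z; returns (coefficients, residual sum of squares)."""
    X = np.column_stack([np.ones_like(z), z])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return coef, float(resid @ resid)

a = 4.0  # hypothetical curvature coefficient (an assumption, not from the text)

def truth(z):
    """True response E(y(z)) = a z^2 - a z, which vanishes at z = 0 and z = 1."""
    return a * z**2 - a * z

z_opt = np.array([0.0, 0.0, 1.0, 1.0])  # variance-optimal design for a straight line
z_alt = np.array([0.0, 0.5, 0.5, 1.0])  # assumed alternative with interior points

coef_opt, rss_opt = fit_line(z_opt, truth(z_opt))
coef_alt, rss_alt = fit_line(z_alt, truth(z_alt))

print("end-point design: slope", coef_opt[1], "residual SS", rss_opt)
print("interior design : slope", coef_alt[1], "residual SS", rss_alt)
```

Under the end-point design both the fitted slope and the residual sum of squares are zero, so nothing in the data contradicts "no effect of $z$". Under the assumed interior design the fitted line leaves a clearly nonzero residual sum of squares, which is the signal of lack of fit that the end-point design cannot produce.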
If, on the basis of past knowledge or intuition, the correct model is known, then A3 is already done, and one may proceed to A4. However, in other situations, the correct model may not be knowable a priori. In such cases, we will have to combine A3 and A4 in the light of whatever knowledge is available, and pursue A3 and A4 simultaneously. This last category of cases is expected to be the more common one, and here we shall seek a design with the twofold property that it helps us to determine the true model and also allows accurate estimation of parameters under the model that is the true one.

The four steps A1, ..., A4 are necessary for a good experiment. Of these, only A2 and A4 have generally been pursued. Steps A2, A3, A4 are usually considered to be the main body of the design problem. However, A1 is even more important; we illustrate this fact by some brief remarks on clinical trials, a field which we cannot go into in detail in this paper because of lack of space.

Since we wish to be brief, we shall take up an issue out of A1, namely the question of "treatments" and "responses", the experimental unit being a human being. We shall illustrate by contrasting the situation with an agronomical experiment, where a plot of land is typically the experimental unit. One basic difference between the two units is that while a person can speak out his reaction at any time regarding how he "feels" after taking a particular treatment, the plot cannot do so. In the latter case, the scientist must decide on a set of responses which are to be observed, plus a plan of observing them. This can be done with respect to a man also, but a man is able to feel and communicate responses that the scientist may not have even thought of, and which may indeed provide crucial information. For example, a pneumonia patient, after being treated by a particular broncho-dilator, may suddenly report: "I feel that the inner muscles of my neck around my throat area are rapidly getting paralysed, and I will choke to death in 15 minutes". Now, the scientist-in-charge may not have imagined such a response, because it may not be "common". However, this response is still there, and is a consequence of the total body-condition of the experimental unit (i.e., the patient), of which pneumonia is only one symptom. The fact that this symptom is considered "major" by the scientist does not mean that it is really so all the time. The above example shows that in certain clinical trials, we may run into a problem, namely, how to cope with the multitude of responses possible, which may initially not be even well defined!

What about the concept of "comparing treatments", or "applying a treatment to an experimental unit"? Consider the latter first. In ordinary statistical design, it simply means applying the treatment to the unit; i.e., there is no question of desirability of "continued application". But, in clinical trials, "treatments" (i.e., drugs) are to be taken by the subjects (i.e., the patients) for a relatively long time. If, after taking the drug a few times, the subject feels that it is hurting in some way (i.e., if a response is found whose value is quite negative), then common sense would dictate that the "treatment" be stopped or modified in some way. In other words, the concept of a "treatment", as we know it in agronomic experiments, is not quite valid here. Because of this, "comparing treatments" in the medical field has to be, and should be, a different game from the one that agronomy-born statistical design is used to playing.
In actual practice, the above troubles are further confounded by the current medical philosophy, where the goal appears to be more the suppression of symptoms than understanding and curing the disease. A discussion of these issues will be made elsewhere by the author. Here, the above example is included just to show the nature and importance of A1.

The next section deals with "Search Linear Models", a topic useful in studies on how to reveal the model. This is needed in the later sections, the first two of which deal with designs with nuisance factors, and the third and last one with factorial experiments.

2. Search linear model

In this section, we briefly recall certain elements of the theory of search linear models, since it appears to arise in an intrinsic way when we attempt to identify the correct model. Let $\underline{y}$ $(N \times 1)$ be a vector of observations such that

$$E(\underline{y}) = A_1 \underline{\xi}_1 + A_2 \underline{\xi}_2, \tag{2.1}$$

where $E$ denotes expected value, $A_1$ $(N \times \nu_1)$ and $A_2$ $(N \times \nu_2)$ are known matrices, $\underline{\xi}_1$ $(\nu_1 \times 1)$ is a vector of unknown parameters, and $\underline{\xi}_2$ $(\nu_2 \times 1)$ is a vector of parameters whose values are unknown but about which partial information is available as follows. It is known that there is a number $k$ such that at most $k$ elements of $\underline{\xi}_2$ are nonnegligible; but it is not known which elements of $\underline{\xi}_2$ are nonnegligible, and what the values of the nonnegligible elements are. The problem is to identify the nonnegligible elements of $\underline{\xi}_2$, and to estimate them along with estimating the elements of $\underline{\xi}_1$. The following result is fundamental in this theory (Srivastava, 1975).

THEOREM 2.1. (a) A necessary condition that the above problem (i.e., the identification of the nonnegligible elements of $\underline{\xi}_2$, and the estimation of these and of the nonzero elements of $\underline{\xi}_1$) can be solved (for all values of $\underline{\xi}_1$ and $\underline{\xi}_2$, subject to the above conditions) is that the following rank condition is satisfied:

$$\mathrm{Rank}(A_1 : A_{20}) = \nu_1 + 2k, \tag{2.2}$$

for all submatrices $A_{20}$ $(N \times 2k)$ of $A_2$. (b) If $\sigma^2 = 0$, then the rank condition (2.2) is also sufficient. In this case, the identification can be done with probability one, and the estimation with variance zero. If $\sigma^2 > 0$, this probability is, of course, less than one and the variances are positive.

How are the identification and estimation actually done? Though many methods have been proposed, the following one appears to be most successful. Let $S_e^2(\underline{\xi}_1, \underline{\xi}_{20})$ denote the sum of squares due to error under the model

$$E(\underline{y}) = A_1 \underline{\xi}_1 + A_{20} \underline{\xi}_{20}, \qquad V(\underline{y}) = \sigma^2 I_N, \tag{2.3}$$

where $\underline{\xi}_{20}$ $(k \times 1)$ is a subvector of $\underline{\xi}_2$, and $A_{20}$ is the submatrix of $A_2$ whose columns correspond to $\underline{\xi}_{20}$. Let $\underline{\xi}_{20}^{*}$ be the value of $\underline{\xi}_{20}$ such that $S_e^2(\underline{\xi}_1, \underline{\xi}_{20}^{*})$ is minimum among all values of $S_e^2(\underline{\xi}_1, \underline{\xi}_{20})$ obtained by varying $\underline{\xi}_{20}$ over the set of $\binom{\nu_2}{k}$ possible choices (as a subvector of $\underline{\xi}_2$). Then $\underline{\xi}_{20}^{*}$ is taken to be the (possibly) nonzero set of parameters in $\underline{\xi}_2$, and $\underline{\xi}_1$ and $\underline{\xi}_{20}^{*}$ are estimated as usual, ignoring the other parameters in $\underline{\xi}_2$. If $\sigma^2 = 0$, and (2.2) is satisfied, this procedure will lead to correct identification with probability one, and the parameters will be estimated with variance zero; indeed, in this case, the procedure is equivalent to projecting $\underline{y}$ on the columns of $(A_1 : A_{20})$, for various $A_{20}$, until we find an $A_{20}$ (say $A_{20}^{*}$) for which the projection is perfect. The rank condition (2.2) ensures that this will be possible (for a unique $A_{20}^{*}$).

It is important to study what happens when (2.2) is not satisfied. In this case, there exists a submatrix $A_{21}$ $(N \times 2k)$ of $A_2$, and vectors $\underline{\theta}_1$ $(\nu_1 \times 1)$ and $\underline{\theta}_{21}$ $(2k \times 1)$, not both null, such that

$$A_1 \underline{\theta}_1 + A_{21} \underline{\theta}_{21} = O_{N1}, \tag{2.4}$$

where, throughout this paper, $O_{pq}$, $J_{pq}$ and $I_p$ will respectively denote the $(p \times q)$ zero matrix, the $(p \times q)$ matrix with 1 everywhere, and the $(p \times p)$ identity matrix.
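For small problems the rank condition (2.2) can be checked by brute force over all $2k$-column submatrices of $A_2$. The sketch below is only an illustration under stated assumptions: the function names `rank_condition_holds` and `largest_k` are hypothetical, the toy matrices (a column of ones for $A_1$ and an identity matrix for $A_2$) are not from the text, and the enumeration is practical only when $\binom{\nu_2}{2k}$ is modest.

```python
import itertools
import numpy as np

def rank_condition_holds(A1, A2, k):
    """Check Rank(A1 : A20) = nu1 + 2k for every N x 2k column-submatrix A20 of A2."""
    nu1 = A1.shape[1]
    for cols in itertools.combinations(range(A2.shape[1]), 2 * k):
        stacked = np.hstack([A1, A2[:, list(cols)]])
        if np.linalg.matrix_rank(stacked) < nu1 + 2 * k:
            return False
    return True

def largest_k(A1, A2):
    """Largest k (with 2k <= nu2) for which the rank condition holds; 0 if none does."""
    best = 0
    for k in range(1, A2.shape[1] // 2 + 1):
        if rank_condition_holds(A1, A2, k):
            best = k
    return best

# Toy example (an assumption for illustration): N = 6, A1 = column of ones, A2 = I_6.
A1 = np.ones((6, 1))
A2 = np.eye(6)
print([rank_condition_holds(A1, A2, k) for k in (1, 2, 3)])
print(largest_k(A1, A2))
```

In this toy case the condition holds for $k = 1$ and $k = 2$ but fails for $k = 3$, since seven columns cannot have rank seven when $N = 6$.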


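Now, let the columns of $A_{21}$ $(N \times 2k)$ correspond to the elements of $\underline{\xi}_{21}$ $(2k \times 1)$, where $\underline{\xi}_{21}$ is a subvector of $\underline{\xi}_2$. Furthermore, let $\underline{\xi}_{22}$ $(k \times 1)$ be a subvector of $\underline{\xi}_{21}$, and let $\underline{\xi}_{23}$ $(k \times 1)$ be the vector consisting of the elements of $\underline{\xi}_{21}$ which are not in $\underline{\xi}_{22}$. Also, let $A_{22}$ $(N \times k)$ and $A_{23}$ $(N \times k)$ be the submatrices of $A_{21}$ corresponding to the subvectors $\underline{\xi}_{22}$ and $\underline{\xi}_{23}$. Similarly, let $\underline{\theta}_{22}$ $(k \times 1)$ and $\underline{\theta}_{23}$ $(k \times 1)$ be the subvectors of $\underline{\theta}_{21}$ corresponding to $A_{22}$ and $A_{23}$. Then, for all $\underline{\theta}_1^{*}$, (2.4) gives

$$A_1 \underline{\theta}_1^{*} + A_{22} \underline{\theta}_{22} = A_1 (\underline{\theta}_1^{*} - \underline{\theta}_1) + A_{23} (-\underline{\theta}_{23}). \tag{2.5}$$

This shows that $\underline{y}$ would equal each side of (2.5) under two different situations, where each situation satisfies the condition of the model; these two situations are given by (i) $\underline{\xi}_1 = \underline{\theta}_1^{*}$, and $\underline{\xi}_{22}$ is the nonzero set of parameters, with value $\underline{\theta}_{22}$, and (ii) $\underline{\xi}_1 = (\underline{\theta}_1^{*} - \underline{\theta}_1)$, and $\underline{\xi}_{23}$ is the nonzero set of parameters, with value $(-\underline{\theta}_{23})$. Thus, in this case, there is a confounding between the parameter sets $\underline{\xi}_{22}$ and $\underline{\xi}_{23}$. However, the important point to note is that this confounding is of a limited kind. Indeed, it is not true that for every $\underline{\theta}_{22}^{*}$ $(k \times 1)$ there exist a $\underline{\theta}_{23}^{*}$ $(k \times 1)$ and two $(\nu_1 \times 1)$ vectors $\underline{\theta}_{11}^{*}$ and $\underline{\theta}_{12}^{*}$ such that

$$A_1 \underline{\theta}_{11}^{*} + A_{22} \underline{\theta}_{22}^{*} = A_1 \underline{\theta}_{12}^{*} + A_{23} \underline{\theta}_{23}^{*}. \tag{2.6}$$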

This fact leads to the following important class of observations, whose implication seems to have been missed thus far in much of the literature on Search Linear Models (and Search Designs). Basically, the observation is that even if (2.2) is not satisfied, very often $E(\underline{y})$ will have a unique projection on the columns of $(A_1 : A_{20})$, where $A_{20}$ $(N \times k)$ is some submatrix of $A_2$. To elaborate, consider (2.5). Suppose, for some vectors $\underline{\theta}_{11}^{0}$ $(\nu_1 \times 1)$ and $\underline{\theta}_{22}^{0}$ $(k \times 1)$, we have

$$E(\underline{y}) = A_1 \underline{\theta}_{11}^{0} + A_{22} \underline{\theta}_{22}^{0}. \tag{2.7}$$

Then, it may indeed be true that there does not exist a submatrix $A_{24}$ $(N \times k)$ in $A_2$, such that the columns of $A_{24}$ correspond to a set of parameters $\underline{\xi}_{24}$ $(k \times 1)$ (this being a subvector of $\underline{\xi}_2$) such that $\underline{\xi}_{24} \neq \underline{\xi}_{22}$, and furthermore, such that $E(\underline{y})$ can be expressed as a linear combination of the columns of $(A_1 : A_{24})$. If this happens, the projection of $E(\underline{y})$ in (2.7) is unique. This means that even though (2.2) is not satisfied, in case $\sigma^2 = 0$, the (identification and estimation) problem is still totally solvable (for the value of $\underline{y}$ under consideration), and for $\sigma^2 > 0$, $S_e^2(\underline{\xi}_1, \underline{\xi}_{22})$ would be expected to attain the minimum value (which is the best one could hope for).

The above discussion gives a valuable hint concerning the analysis of the observation vector $\underline{y}$. For example, in the application of the Search Linear Models to scientific investigations, the value of $k$ would usually be unknown. Suppose $k_0$ is the maximum value of $k$ for which (2.2) holds. Then, given $\underline{y}$, one should try projecting it on the columns of $[A_1 : A_{20}]$, where $A_{20}$ $(N \times k)$ are the various submatrices of $A_2$ with $k \geq k_0$. In this way, there is a possibility that one may discover the entire set of nonnegligible parameters. We shall elaborate this remark later on in the paper.
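The projection procedure described in this section (fit $(A_1 : A_{20})$ for every $k$-column submatrix $A_{20}$ of $A_2$ and keep the one with the smallest error sum of squares) can be sketched as follows. This is an illustrative sketch, not the author's implementation: the function names, the randomly generated $A_2$, and the chosen nonzero coefficients are assumptions, and the data are noise-free ($\sigma^2 = 0$).

```python
import itertools
import numpy as np

def error_ss(X, y):
    """Error sum of squares after least-squares projection of y on the columns of X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return float(resid @ resid)

def search(A1, A2, y, k):
    """Try every k-column submatrix A20 of A2 and keep the one with the smallest error SS."""
    results = {
        cols: error_ss(np.hstack([A1, A2[:, list(cols)]]), y)
        for cols in itertools.combinations(range(A2.shape[1]), k)
    }
    best = min(results, key=results.get)
    return best, results[best]

# Hypothetical toy data: N = 8, nu1 = 1, nu2 = 5, and the true nonzero set is columns 1 and 3.
rng = np.random.default_rng(0)
N = 8
A1 = np.ones((N, 1))
A2 = rng.normal(size=(N, 5))
y = 2.0 * A1[:, 0] + 3.0 * A2[:, 1] - 1.5 * A2[:, 3]

best_cols, best_ss = search(A1, A2, y, k=2)
print(best_cols, best_ss)
```

With noise-free data the minimum error sum of squares is zero up to rounding and is attained at the true column set. When $k$ is unknown, the same function can be called for several values of $k$, at a cost that grows with $\binom{\nu_2}{k}$.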


Before closing this section, it is important to note the effect of reparameterisation on the model (2.1). The problem of identification of the nonnegligible elements of $\underline{\xi}_2$ is unique to the model (2.1), and is not invariant if in place of $\underline{\xi}_2$ some other parameter vector is used. To elaborate, note that as an ordinary linear model, (2.1) is equivalent to

$$E(\underline{y}) = A_1 \underline{\xi}_1^{*} + A_3 \underline{\xi}_3, \tag{2.8}$$

where

$$A_3 \underline{\xi}_3 = A_1 (\underline{\xi}_1 - \underline{\xi}_1^{*}) + A_2 \underline{\xi}_2. \tag{2.9}$$

However, in (2.8), the identification is of nonzero elements of $\underline{\xi}_3$, whereas in (2.1) it relates to the elements of $\underline{\xi}_2$, so that as search linear models, (2.1) and (2.8) are not equivalent. Examples of this are given by the two models I and II to be discussed in Section 4.

3. Designs with one nuisance factor

In this paper we shall limit our discussion to one or two nuisance factors, and the usual block designs and their analyses. First, consider one nuisance factor. In the general case, there are $v$ treatments, and $b$ blocks, with sizes say $k_1, \ldots, k_b$ with $k_1 \leq k_2 \leq \cdots \leq k_b$. Let $N$ be the incidence matrix with elements $n_{ij}$, where $n_{ij}$ denotes the number of units in block $i$ to which the $j$th treatment is applied.

In choosing our design, we have a few liberties. Firstly, we do not have to use all the $b$ blocks; if we so wish, we can use only a subset of $b'$ ($< b$) blocks, and ignore the others. Also, if a block has $k$ units in it, we do not have to use all of them; if we so wish, we may use a subset of the available units and discard the rest. Also, if we so wish, then instead of using a binary design (i.e., one in which each treatment occurs in any given block only zero times or once), we may relax this and allow the $n_{ij}$ to take values larger than 1.

In the analysis of designs, what restrictions do we have? The assumptions are that each block is homogeneous, that the variance $\sigma^2$ (of the observations on a unit) depends on block size (but, somehow, is reasonably constant for some range, say $(k_{01}, k_{02})$, of block sizes which we are going to use), that $\sigma^2$ is constant from one treatment to another and from one block to another, that the block effects and treatment effects are mutually additive, and that with respect to the (unknown location) parameters the model is linear.

Now, two situations may arise. One is where the experimental material has been studied thoroughly previous to the experiment being planned, and it has been affirmed that the above assumptions hold to a reasonable degree. The other situation is where the first one is negated. Consider the first situation. The block sizes range from $k_{01}$ to $k_{02}$. If $k_{01} = k_{02}$, then we select a design, hopefully optimal with respect to the criteria set forth. Such topics have been studied extensively in the subject.


Similar studies have been done for the case when $k_{01} < k_{02}$. However, here, a new feature arises. Suppose that, approximately, $n_0$ units are needed for the experiment. The following may happen: (i) there are enough blocks available with size $k_{02}$ so as to accommodate the whole experiment (with $n_0$ units); (ii) condition (i) does not hold, but there exists $k_{01}'$ ($\geq k_{01}$, and $< k_{02}$) such that the experiment can be accommodated in blocks of sizes ranging in the interval $(k_{01}', k_{02})$, but not in the interval $(k_{01}' + 1, k_{02})$.

Now, generally, it is true that if $\sigma^2$ is constant for two block sizes $k'$ and $k''$ with $k' < k''$, then the information matrix is "larger" if blocks of size $k''$ only are used, rather than using some or all blocks of size $k'$. This last statement is subject to many obvious constraints, such as the availability of the same "kind" of design (for example, the same association scheme, in case a PBIBD is used) for the different block sizes, and various other combinatorial restrictions (in case different efficiencies are desired on the various treatment contrasts).

In view of the above, under condition (i), we would generally use a design with fixed block size $k_{02}$. On the other hand, if condition (ii) holds, then we have the following situation: we are given a fixed number $b_k$ of blocks for each size $k$ in the range $k = k_{01}', \ldots, k_{02}$. We have $v$ treatments. We may have variance conditions to satisfy, such as having different variances for various (normalized) treatment contrasts. Subject to all such conditions and constraints, the problem is to obtain an optimal or near-optimal design. Now, the number of combinations of such variance conditions, block size ranges, numbers of blocks of each size, and values of $v$ is potentially very large. Even if a large number of such designs are catalogued, still nothing may be available when the need arises in a particular instance.
It is therefore suggested that instead of developing individual designs of this kind, we produce classes of algorithms which generate them. It is likely that most designs so produced may not be optimal and some may be far from optimal. This will be due to the fact that the algorithm addresses a relatively large group of design situations. To get closer to optimality, each such group may be divided into suitable subgroups, and a series of (second step) algorithms may be developed for each subgroup. Even this may not do the job for all subgroups. Such subgroups may be divided into subsubgroups, each of which has its own third-step algorithm. This process may be continued, if necessary.

It may seem that the above may necessitate the development of too many algorithms. However, the author feels that it should still be manageable. It seems necessary for use in scientific situations where condition (ii) arises frequently. As such algorithms are developed, they should be well publicized through design-oriented journals, so that the users are kept well informed.

Next, we consider situations where the experimental material has not been (and cannot be) studied adequately prior to the intended experiment. Here, depending on the nature of the inadequacy, an appropriate procedure will have to be determined. We discuss a few issues of this kind. Take the case where, in order to accommodate the experiment, we do need to use blocks of unequal sizes (with a given number of blocks of each size), and it is not known (or not expected) that $\sigma^2$ will stay constant with change in block size. We need to decide how to select the design and, when the experiment is done, how to analyze the results.


Let $U_k$ ($k \in K$) denote the set of units in blocks of size $k$, $M_k$ the corresponding information matrix for $\underline{\tau}$ (the $(v \times 1)$ vector of treatment effects), and $\sigma_k^2$ the value of $\sigma^2$, where $K$ is the set of distinct block sizes that need to be used. To select the design, a Bayesian approach is highly justified. Depending upon the knowledge we have of the experimental material, we select a model for the dependence of $c_k$ on $k$, where we assume $\sigma_k^2 = c_k \sigma^2$, $\sigma^2$ being unknown. (The selected model should completely specify $c_k$.) This, in effect, reduces the situation to one where the data from the whole experiment can be expressed as a linear model in which the observations have a variance known up to a constant (unknown) multiplier ($\sigma^2$). The information matrix $M^{*}$ of such an experiment can be written down. Now, assuming that a good optimal design algorithm is available, an appropriate design may be selected, and the experiment conducted. From the data, an estimate $\hat{\sigma}_k^2$ of $\sigma_k^2$ will be available for each $k \in K$. The problem is to combine the $M_k$ ($k \in K$), and obtain a combined estimate of $\underline{\tau}$ for the whole experiment. Some ideas of this kind will be found in Srivastava and Beaver (1986).

For further discussion, it will be instructive to note a result on the comparison of two designs $D_1$ and $D_2$, where $D_1$ is a BIBD with parameters $(v, b, r, k, \lambda)$ with $k = k_0 m$, where $m$ and $k_0$ are positive integers. Also, $D_2$ is a design with $v$ treatments, arranged in $b$ blocks each of size $k$, obtained from a BIBD $D_{20}$ with parameters $(v, b, r_0, k_0, \lambda_0)$ as follows. If a treatment $z$ occurs in a block of $D_{20}$, then $z$ occurs on exactly $m$ units in the corresponding block of $D_2$; if $z$ does not occur in a block of $D_{20}$, then it also does not occur in the corresponding block of $D_2$. (In this paper, designs like $D_2$ will be referred to as $m$-BIBD's.) We assume that $v$, $b$, $k_0$ and $m$ are such that both $D_1$ and $D_{20}$ exist. Let $M_1^{*}$ and $M_2^{*}$ denote the information matrices respectively for $D_1$ and $D_2$. Then, it can be checked that

$$M_2^{*} = \theta M_1^{*}, \tag{3.1}$$

where

$$\theta = m(k_0 - 1)/(k_0 m - 1). \tag{3.2}$$

Thus, the loss of information, which equals $(1 - \theta)$, is given by

$$1 - \theta = (m - 1)/(k_0 m - 1) = (m - 1)/(k - 1). \tag{3.3}$$

This means, for example, that if block size $k$ equals 12, and we allow 0 or 3 repetitions of each treatment in a block, then we lose only $2/11$ ($\approx 18.2\%$) of the information.

We now consider the case where we are not sure whether the block and treatment effects are additive, or whether the variance of an observation depends upon the treatment or block to which it corresponds. Here, the comparison between $D_1$ and $D_2$ in the last paragraph is pertinent. Suppose $k = 12$, $m = 3$, $k_0 = 4$. Assume $v$, $b$ are such that $D_1$ and $D_2$ exist. Then, two avenues are open. We may use $D_1$ or $D_2$. If the assumptions are true, then in using $D_2$ over $D_1$, we lose only about 18.2% of the information. But, if the assumptions are false, and we use $D_1$, our data are subject to unknown biases in the location parameters, and we are not sure which observations are more reliable (i.e., have less variance) and which are not, because $D_1$ does not help in exploring these questions, its revealing power with respect to such issues being zero. Furthermore, its sensitivity will also be negatively affected. On the other hand, each block of $D_2$ provides $k_0$ sets of $(m - 1)$ orthogonal contrasts belonging to the error space. Thus, $D_2$ provides a wealth of information which will throw light on the basic assumptions. The author believes that designs such as $D_2$ are preferable over those like $D_1$ for the above reasons. However, it will be useful to provide more methodology for the full analysis of data from $D_2$ than is currently available.
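The relation between the information matrices of a BIBD and of an $m$-BIBD with the same $v$, $b$ and $k$ can be verified numerically on a small case. The sketch below makes several assumptions that are not in the text: the information matrix is taken to be the usual reduced matrix $R - N' K^{-1} N$ for treatment effects (with $N$ the blocks-by-treatments incidence matrix, $R$ the diagonal matrix of replications and $K$ that of block sizes), and the example uses $v = b = 7$, $k_0 = 3$, $m = 2$, with $D_{20}$ the Fano plane and $D_1$ the BIBD whose blocks each omit one treatment.

```python
import numpy as np

def c_matrix(N):
    """Treatment information matrix R - N' K^{-1} N for a (blocks x treatments) incidence matrix N."""
    r = N.sum(axis=0)  # replications of each treatment
    k = N.sum(axis=1)  # block sizes
    return np.diag(r) - N.T @ np.diag(1.0 / k) @ N

v = 7
# D20: the Fano plane, a BIBD with parameters (7, 7, 3, 3, 1).
fano_blocks = [(0, 1, 2), (0, 3, 4), (0, 5, 6), (1, 3, 5), (1, 4, 6), (2, 3, 6), (2, 4, 5)]
N20 = np.zeros((len(fano_blocks), v))
for i, block in enumerate(fano_blocks):
    N20[i, list(block)] = 1.0

k0, m = 3, 2
N2 = m * N20              # D2: each treatment of a D20 block occupies m units, block size k = 6
N1 = 1.0 - np.eye(v)      # D1: block i contains every treatment except i, a BIBD (7, 7, 6, 6, 5)

C1, C2 = c_matrix(N1), c_matrix(N2)
theta = m * (k0 - 1) / (k0 * m - 1)

print("theta =", theta)
print("C2 == theta * C1:", np.allclose(C2, theta * C1))
```

Here $\theta = 0.8$, so the $m$-BIBD gives up $(m - 1)/(k - 1) = 1/5$ of the information in this small case, in return for $k_0 (m - 1) = 3$ within-block error contrasts per block.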

4. Row-column designs

Consider two nuisance factors, at $p$ and $q$ levels, respectively. We shall assume a $(p \times q)$ "Latin Rectangle Design" defined as follows. In this design, each treatment occurs $(p/v)$ times in each column. Also, with respect to the rows considered as blocks, the design is an $m$-BIBD with $v$ treatments, $p$ blocks and block size $q$. Clearly, $(p/v)$ and $(q/vm)$ are integers. Many well known designs such as the Latin Square, Youden Square, etc. are special cases of this design.

In this section, we shall briefly discuss the problem of nonadditivity (of the two nuisance factors) in relation to the designs and their analysis, and its misleading effect on the inferences drawn concerning treatment effects. We shall largely confine ourselves to the Latin Square design. Detailed studies of this type will be found in Srivastava (1993), Wang (1995), and Srivastava and Wang (1996), where other designs such as Lattice Squares are also covered. Although occasionally the discussion will be in general terms, we shall illustrate mostly through the $4 \times 4$ Latin Square design.

Consider a $(p \times p)$ Latin Square design (so that $v = q = p$, $m = 1$). Let $y_{ij}$ ($i, j = 1, \ldots, p$) denote the observation in the $(i, j)$ cell. The model is

$$y_{ij} = \gamma_{ij} + \tau_k' + e_{ij}, \tag{4.1}$$

where $\gamma_{ij}$ denotes the effect of the $(i, j)$ cell, $\tau_k'$ the effect of the $k$th treatment (assumed to be assigned to the $(i, j)$ cell), and where $e_{ij}$ denotes the random fluctuation on the observations in the $(i, j)$ cell, these being independent from cell to cell, with mean zero and variance $\sigma^2$. Let $\bar{\tau}'$ be the mean of the $\tau_k'$, and let $\bar{\gamma}_{..}$, $\bar{\gamma}_{i.}$, and $\bar{\gamma}_{.j}$ denote respectively the overall mean (of the $\gamma_{ij}$), and the means for the $i$th row and $j$th column. Let

$$\mu = \bar{\gamma}_{..} + \bar{\tau}', \quad \alpha_i = \bar{\gamma}_{i.} - \bar{\gamma}_{..}, \quad \beta_j = \bar{\gamma}_{.j} - \bar{\gamma}_{..}, \quad \delta_{ij} = \gamma_{ij} - \bar{\gamma}_{i.} - \bar{\gamma}_{.j} + \bar{\gamma}_{..}, \quad \tau_k = \tau_k' - \bar{\tau}'. \tag{4.2}$$

Clearly, (4.1) and (4.2) give

$$y_{ij} = \mu + \alpha_i + \beta_j + \delta_{ij} + \tau_k + e_{ij}, \tag{4.3a}$$

$$\sum_i \delta_{ij} = \sum_j \delta_{ij} = 0. \tag{4.3b}$$

The above will be referred to as the "general model"; this model is said to be additive if and only if

$$\delta_{ij} = 0, \quad \text{for all } (i, j); \tag{4.4}$$

otherwise, it is called "nonadditive". The additive model thus has the form

$$y_{ij} = \mu + \alpha_i + \beta_j + \tau_k + e_{ij}. \tag{4.5}$$
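The decomposition of the cell effects in (4.2), and the side conditions (4.3b), can be illustrated with a few lines of code. This is a sketch under assumptions: the function name `decompose`, the numerical row and column effects, and the size of the single disturbed cell are all hypothetical, and the function returns only $\bar{\gamma}_{..}$ (the treatment part $\bar{\tau}'$ of $\mu$ is left out).

```python
import numpy as np

def decompose(gamma):
    """Split cell effects gamma_ij into overall mean, row, column and nonadditive parts as in (4.2)."""
    g = gamma.mean()
    row_mean = gamma.mean(axis=1, keepdims=True)
    col_mean = gamma.mean(axis=0, keepdims=True)
    alpha = row_mean.ravel() - g
    beta = col_mean.ravel() - g
    delta = gamma - row_mean - col_mean + g
    return g, alpha, beta, delta

# Hypothetical additive cell effects for a 4 x 4 layout: gamma_ij = row_i + col_j.
rows = np.array([1.0, -1.0, 2.0, -2.0])
cols = np.array([0.5, -0.5, 1.5, -1.5])
gamma_add = rows[:, None] + cols[None, :]

# The same layout with one cell disturbed by an amount c (an assumed value).
c = 8.0
gamma_na = gamma_add.copy()
gamma_na[0, 0] += c

_, _, _, delta_add = decompose(gamma_add)
_, _, _, delta_na = decompose(gamma_na)

print(np.allclose(delta_add, 0.0))   # additive case: all delta_ij vanish
print(delta_na)                      # nonadditive case: delta_ij nonzero in every cell
```

A point worth noticing in the output is that a disturbance confined to one cell of $\gamma$ does not stay confined in $\delta$: with $p = q = 4$ the disturbed cell receives $c\,(p-1)(q-1)/(pq) = 9c/16$, and every other cell receives a smaller nonzero amount, because of the centering in (4.2).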

Clearly, there is no reason to believe that arbitrary cross-classified nuisance factors arising in nature should be additive. Let $\sum_u$ ($u = 1, \ldots, v$) denote the sum over all cells $(i, j)$ to which treatment $u$ is assigned. Then, it is easy to see that (4.3a, b) implies that there is an intrinsic confounding between the $\delta_{ij}$ and the $\tau_k$; this confounding arises in nature, and cannot be "undone" (for a Latin rectangle design with $v > 1$) even when $\sigma^2 = 0$. Because of this, the usual estimate of an elementary contrast $(\tau_h - \tau_k)$ is biased by an amount $(\delta_h - \delta_k)$, where

$$\delta_h = \sum_h \delta_{ij}, \quad \text{for } h = 1, \ldots, v. \tag{4.6}$$
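The bias caused by nonadditivity can be seen in a noise-free numerical example. The sketch below is an illustration under assumptions not in the text: a cyclic $4 \times 4$ Latin square, hypothetical values of $\mu$, $\alpha_i$, $\beta_j$, $\tau_k$, and a $\delta$ pattern obtained by doubly centering a single disturbed cell. Because the sketch compares treatment means, the per-treatment cell sums of $\delta_{ij}$ appear divided by $p$; that scaling is this sketch's bookkeeping for means versus sums.

```python
import numpy as np

p = 4
# Cyclic Latin square (an assumed layout): the treatment in cell (i, j) is (i + j) mod p.
square = np.add.outer(np.arange(p), np.arange(p)) % p

# Hypothetical parameter values, each set of effects summing to zero.
mu = 10.0
alpha = np.array([1.0, -1.0, 2.0, -2.0])
beta = np.array([0.5, -0.5, 1.5, -1.5])
tau = np.array([3.0, 1.0, -1.0, -3.0])

# Nonadditivity: doubly centred version of a disturbance of size 8 in cell (0, 0),
# so that the row and column sums of delta are zero as required by (4.3b).
bump = np.zeros((p, p))
bump[0, 0] = 8.0
delta = bump - bump.mean(axis=1, keepdims=True) - bump.mean(axis=0, keepdims=True) + bump.mean()

# Noise-free observations (sigma^2 = 0) under the general model (4.3a).
y = mu + alpha[:, None] + beta[None, :] + delta + tau[square]

treat_mean = np.array([y[square == h].mean() for h in range(p)])
delta_sum = np.array([delta[square == h].sum() for h in range(p)])

estimate = treat_mean[0] - treat_mean[1]
truth = tau[0] - tau[1]
bias = estimate - truth
print(estimate, truth, bias, (delta_sum[0] - delta_sum[1]) / p)
```

Even with no random error, the estimated contrast is 4 while the true contrast is 2. The discrepancy equals the difference of the per-treatment sums of the $\delta_{ij}$ divided by $p$, and nothing in the treatment means alone distinguishes it from a genuine treatment difference.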

In the design class under consideration, this bias can not be prevented. Indeed, even if ~r2 equals zero so that eij = 0), and the gij are representable by the additive model (4.5), the condition (4.4) may still not be satisfied, and (4.3a, b) may hold with some 5~j being nonzero. This happens when for all h, 6ij equals a constant 5~ for all cells (i, j) to which treatment h is assigned. It should be added, however that if we take a Bayesian point of view, and assume that the 5ij have a joint a priori distribution which is absolutely continuous, then the probability of the event (in the last sentence) occurring is zero. In future discussion, we shall ignore this event. If the nuisance factors (i.e., their levels, and the nature and amount of nonadditivity) remain relatively constant, a uniformity trial (before the main experiment) would be valuable. Here, we shall have v = 1, the (single) treatment being chosen so as to minimize any interaction it may have with the levels of the nuisance factors. The author believes that, where ever possible, such trials should be considered as a must. We shall now briefly summarize some investigations on nonadditivity started in Srivastava (1993), and further developed in Wang (1995), and Srivastava and Wang (1996). We shall then discuss the situation in the light of this. It is clear that the 5's and e's are confounded, and that small values of the 5's will be indistinguishable from the e's. To get rid of the 5's completely, a design ought to be used in which at least two observations are taken (usually on two different treatments) within each cell (i, j) of nuisance factors, which is included in the experiment. Then the difference between the two observations will provide an unbiased estimate of the

Z N. Srivastava

320

difference between the corresponding two treatment effects. However, such a design may not be possible in many situations. We shall therefore, discuss' the latin rectangle design in detail, because of its current popularity. In view of the remarks above, it is important to identify cells with "large" ~'s. So, we pretend that most of the d's are negligible, except possibly for a few which we wish to identify. (This is somewhat like our attitude in testing of hypothesis in general. For example, when we test the null hypotheses H0: ~-h -- ~-k = 0, we know that it is almost certain that the null hypothesis is false. So, when we test H0, we are really looking for a difference (T h -- Tk) that is "large" in some sense.) So, we assume that the 3's are negligible, except possibly a few. Let m denote the number of nonadditive (NA) cells, i.e., m is the smallest value of m t, such that there exist (pq - m ' ) cells (i, j) for which E(yij) satisfies the model (4.3). Of course, we hope that m is not too large. Let M be a set of m cells ignoring which, the remaining (pq - m ) cells satisfy (4.3). A fundamental question is this: Can there be two distinct sets M1 and M2, each with m cells satisfying the last conditions. The answer is yes, as will be seen later. It is easy to see that, because of the above situation, the model (4.3) is a special case of the Search Linear Model (2.1). Here, we have N :pq,

$$\underline{\xi}_1' = \{\mu, \alpha_1, \ldots, \alpha_{p-1}, \beta_1, \ldots, \beta_{q-1}, \tau_1, \ldots, \tau_{v-1}\}, \qquad \nu_1 = p + q + v - 2. \tag{4.7}$$

Also, $\underline{y}$ is the vector of observations $(y_{11}, y_{12}, \ldots, y_{1q}, y_{21}, \ldots, y_{pq})$, and $A_1$ ($N \times \nu_1$) has 1 everywhere in the first column. The second column of $A_1$ corresponds to $\alpha_1$, and consists of "0" everywhere except for (i) a "1" against the $q$ observations $y_{11}, \ldots, y_{1q}$ which are from the first row of the Latin rectangle and (ii) a "$-1$" against the $q$ observations $y_{p1}, \ldots, y_{pq}$. (The reason for the "minus one" is that the $y_{pj}$ ($j = 1, \ldots, q$) involve $\alpha_p$, which equals $(-\alpha_1 - \alpha_2 - \cdots - \alpha_{p-1})$.) The other columns of $A_1$ can be similarly defined.

There are two models (called I and II) which arise here with respect to the $\delta$'s. Model II corresponds to (4.3); here, $\nu_2 = (p - 1)(q - 1)$, $\underline{\xi}_2$ consists of the $\delta_{ij}$ ($i < p$, $j < q$), and the row for $y_{ij}$ ($i < p$, $j < q$) has "1" in the column for $\delta_{ij}$ and zero elsewhere, the row for $y_{iq}$ ($i < p$) has "$-1$" in the columns for $\delta_{ij}$ ($j < q$) and zero elsewhere, the row for $y_{pj}$ has "$-1$" in the columns for $\delta_{ij}$ ($i < p$) and zero elsewhere, and $y_{pq}$ has 1 in the columns for all $\delta_{ij}$ ($i < p$, $j < q$). (In defining $A_2$ and $\underline{\xi}_2$, we have implicitly assumed that all the nonzero $\delta$'s have indices $(i, j)$ with $i < p$, $j < q$. This can be done when $m$ is reasonably small.) In Model I, we take $\nu_2 = pq$, $A_2 = I_{pq}$, and $\underline{\xi}_2$ has all the $\gamma_{ij}$ as its elements (i.e., without the reduction by (4.2)). Models I and II are different as Search Linear Models (as remarked in Section 2 around (2.8), (2.9)), but are the same considered as Linear Models. Some studies on Model II have been reported in Wang (1995). The discussion here relates to Model I, but also occasionally includes

Model II (as, for example, in the remarks on the probability of correct identification). Notice that Model I can be written as

$$E(\underline{y}) = A_1 \underline{\xi}_1 + \underline{\xi}_2. \tag{4.8}$$

Consider a model of the form (4.8) in general; here we have a Search Model with $\underline{\xi}_2$ ($N \times 1$) having at most $m$ nonzero elements. The idea is that we wish to explain $E(\underline{y})$ as much as possible using $\underline{\xi}_1$ ($\nu_1 \times 1$), and we wish to use $\underline{\xi}_2$ ($\nu_2 \times 1$) as little as possible. Now, if $A_1$ does not have full (column) rank $\nu_1$, even $\underline{\xi}_1$ cannot be uniquely estimated. So, let us assume $\mathrm{Rank}(A_1) = \nu_1$. We now consider confounding of parameters in $\underline{\xi}_2$, i.e., non-unique solution for $\underline{\xi}_2$. Let there be a set of $u$ ($\leq 2m$) rows in $A_1$, such that $\mathrm{Rank}(A_1^*) < \nu_1$, where $A_1^*$ is the matrix obtained from $A_1$ by deleting these $u$ rows. Then there exists a non-null value $\underline{\xi}_1^*$ for $\underline{\xi}_1$ such that $A_1^* \underline{\xi}_1^* = \underline{0}_{N-u,1}$. Hence, if $\underline{\theta}_1$ and $\underline{\theta}_2$ are such that $\underline{\theta}_2 = \underline{\theta}_1 + \underline{\xi}_1^*$, then $A_1^* \underline{\theta}_1 = A_1^* \underline{\theta}_2$. Now, partition $A_1$ as $A_1' = [A_1^{*\prime}, A_{11}', A_{12}']$, where $A_{1j}$ ($j = 1, 2$) is ($N_j \times \nu_1$) and $N_1 + N_2 = u$ with $N_1, N_2$ [...] 0, which is the realistic case.

Since both Models I and II are special cases of the search linear model, the methods of Section 2 based on the computation of $S_e^2$ ($= S_e^2(\underline{\xi}_1, \underline{\xi}_{20})$ under (2.1)) are applicable. For $p \times p$ Latin Squares, with $p = 4, 5, 6$, Wang (1995) studied this method and also various other methods available in the literature for the identification of nonnegligible parameters, and found this method to be the best. For this comparison, simulation was used. For $p = 4$, $m = 1$, Wang also computed the probability theoretically. This shows that the theoretical comparison of the probability is possible. But it is extremely messy at present. The region of integration is very complicated, being the intersection of a number of quadratic regions. Using these ideas, however, it should be possible to develop systematic theoretical procedures using the algebra of Bose and Mesner (and its generalization to the multidimensional partially balanced association scheme, i.e., the algebra of Bose and Srivastava), and the theory of the complex multinormal distribution.
(The complex case arises because of the need to diagonalize the covariance matrix using a Hermitian matrix.) Though this field is very complicated, it has a certain elegance, and many researchers would enjoy developing it. It may be added that the theory so developed would be useful in studies not only of the nonadditivity of nuisance factors, but in other cases as well, where the actual model deviates from the model usually assumed.

Using simulation, the probability was computed (for $p = 4$, $m = 1$, and various values of $\delta$). Such computations were done using increasing sample sizes. Using this, the sample size needed so that the simulation results match the theoretical results to within 0.1 percent was determined. Next, using such (or larger) sample sizes, the probability of correct identification was determined for a few other values of $p$, $m$, and the $\delta$'s. The results (Wang, 1995) obtained are quite discouraging. For $m = 1$ and $\delta/\sigma = 3$, the probability ranged between 0.36 for $p = 4$ and 0.43 for $p = 6$; for $\delta/\sigma = 4$, this range was 0.54 to 0.67. (Recall that, for the normal distribution, $2\sigma$ limits correspond to 95.45% of the area, and $3\sigma$ limits to 99.73% of the area. Thus, the identification probabilities encountered are too small.) For $m = 2$, and $\delta/\sigma = 4$ and 6, the probability was 0.19, 0.56 and 0.64 respectively for $p = 4$, 5, and 6. For $m = 3$, and $\delta/\sigma = 4$, 6, and 8, the values were 0.34 and 0.60, for $p = 5$ and 6.

The above should be an eye opener to all concerned. The probabilities are simply too low to allow the nonnegligible cells to be identified for reasonable values of $\delta/\sigma$. How large should $\delta/\sigma$ be so that the probability is about 99%? Again, the answers are disappointing. For $m = 1$, $p = 4$, $\delta/\sigma$ should be about 9, and for $p = 5$, about 8. For $m = 2$, $p = 5$, one pair of values (of $\delta/\sigma$, so that the probability is about 99%) is (8, 12); for $p = 6$, it is (7, 10.5).

What about uniformity trials (i.e., $v = 1$)? For $m = 1$, $p = 4$, the value of $\delta/\sigma$ (so that the probability is about 99%) is about 7, and the answer is about 6 for $p = 5$ and 6.

The above shows that it is difficult to pinpoint nonadditive cell values unless $\delta/\sigma$ is extremely large. Now, if a nonadditive cell is correctly identified, then it is clear that the observation from this cell must simply be ignored in the analysis, because the most we can get out of the observation is an estimate of the value of the $\delta$ corresponding to that cell. On the other hand, if there is a nonadditive cell which is not ignored in the analysis (irrespective of whether or not it has been identified), then it contributes towards the bias in the estimates of treatment contrasts; the larger the $\delta$, the larger (potentially) the bias. Hence, it is obvious that at least the cells which have large $\delta$'s should be correctly identified, and ignored in the analysis. However, the above discussion shows that the probability that they will get correctly identified is too small to be practical. Thus, except for cells with extremely large $\delta$'s, others will probably not be ignored, and will contribute to a tremendous bias. Thus, the row-column designs (like the LR designs) have a great drawback associated with them. Notice that (from the results for $v = 1$) even a uniformity trial may not help much.

The question arises as to whether the above discussion is merely a theoretical thunderbolt, and in practice nonadditivity might seldom arise. To answer this, Srivastava and Wang (1996) studied examples of real data (for $4 \times 4$ LS) from various books. Nine cases were located, and in a majority of them, there were cells with extremely large values of $\delta$'s. (For example, in an experiment (4 LS of size 4 each) reported in Bliss (1967), we found 3 cells with $\delta$-values whose probability is of the order $3.7 \times 10^{-7}$.) From the above theory, it is clear that cells with even moderately large $\delta$'s are unlikely to be identified.
Thus, if in a given situation few cells were identified, it does not follow that cells with moderately large $\delta$'s are few. This point is also supported from another angle. As stated above, data from published books showed the existence of cells with extremely large $\delta$'s. Hence, from the branch of probability theory known as "Extreme Value Theory", it is clear that if in 50% of the situations there exist cells with extremely high $\delta$'s, then cells with moderately high $\delta$'s must be relatively very common.

It should be useful to explain here the method used in Srivastava and Wang (1996) to analyse real data such as those in the book of Bliss. The method appears to be general, though we used it only for simple designs. Let $D$ be any row-column design and let $\Omega$ be the set of cells used in $D$. Let $\Omega_0$ be a class of subsets of cells in $\Omega$, such that each subset contains $m$ cells. Suppose we wish to investigate whether there exists a subset of $m$ cells in $\Omega_0$ which is nonadditive. Then, a procedure (using simulation) is as follows. Consider the model $M$ under which $D$ is being studied. Under the model $M$, generate a random sample of data (for all cells $\Omega$ under the design $D$). For this data, compute $r$, where $r = S_e^2 / S_e^2(\Omega_0)$,

(4.12a)

$S_e^2$ = sum of squares due to error when data from $D$ (from all cells $\Omega$) is analyzed under $M$,
$S_e^2(\omega)$ = sum of squares due to error when the data is analyzed under model $M$ and design $D$, and the set of $m$ cells $\omega$ is ignored, where $\omega \in \Omega_0$,
$S_e^2(\Omega_0) = \min_{\omega \in \Omega_0} S_e^2(\omega)$.

(4.12b)

We draw $n$ independent samples, and the value of $r$ is computed for each case. The number $n$ is taken to be large, say 10,000, so as to give a fairly good estimate of the distribution of $r$. Of course, the distribution of $r$ could possibly be theoretically determined, but we have not attempted it yet.

The above distribution is used as follows. To analyze a real set of data which is from an experiment under a design $D$ and model $M$, and in which we suspect a set of $m$ cells (belonging to some class $\Omega_0$) to be nonadditive, we compute $r$ for this data. The value of $r$ is then matched against the distribution of $r$ obtained above, and the probability of obtaining a value this large or larger is estimated. (In the example from Bliss, the value of $r$ obtained was so large that it was far outside the range of values found in the simulation distribution. So an estimate of the probability was obtained by using Chebyshev's inequality.)

The question now arises: What should be done in view of the above facts, if it is not possible to take more than one observation per cell? It is not clear what the best solution is. However, one could possibly divide the rectangle into smaller ones and use a Nested Multidimensional Block Design (Srivastava and Beaver, 1986) with blocks of small size. For example, instead of using one $8 \times 8$ Latin Square, we could use 16 squares each of size $2 \times 2$. This should considerably reduce the danger of bias. Clearly, much further research is warranted.
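
The simulation procedure around (4.12a, b) can be sketched in code. What follows is only an illustration under stated assumptions, not the program used by Srivastava and Wang (1996): it assumes a cyclic $p \times p$ Latin square as the design $D$, the additive model with sum-to-zero coding as the model $M$, the class $\Omega_0$ of all subsets of $m$ cells, and standard normal errors. All function names are hypothetical. Since $r$ is a ratio of residual sums of squares, its distribution under the additive model does not depend on the additive parameters or on $\sigma$, so pure noise is simulated.

```python
import itertools
import random


def latin_square(p):
    """Cyclic p x p Latin square (assumed layout): cell (i, j) gets treatment (i + j) mod p."""
    return [[(i + j) % p for j in range(p)] for i in range(p)]


def coded(level, p):
    """Sum-to-zero coding: the last level carries -1 in every column of its factor."""
    if level == p - 1:
        return [-1.0] * (p - 1)
    return [1.0 if k == level else 0.0 for k in range(p - 1)]


def design_rows(p):
    """One row of A1 per cell: general mean, row, column and treatment effects."""
    sq = latin_square(p)
    return [[1.0] + coded(i, p) + coded(j, p) + coded(sq[i][j], p)
            for i in range(p) for j in range(p)]


def error_ss(rows, y):
    """Residual sum of squares of a least-squares fit (normal equations, Gauss-Jordan).
    Assumes the rows supplied still have full column rank."""
    k = len(rows[0])
    a = [[sum(r[s] * r[t] for r in rows) for t in range(k)]
         + [sum(r[s] * v for r, v in zip(rows, y))] for s in range(k)]
    for c in range(k):
        piv = max(range(c, k), key=lambda s: abs(a[s][c]))
        a[c], a[piv] = a[piv], a[c]
        for s in range(k):
            if s != c:
                f = a[s][c] / a[c][c]
                a[s] = [x - f * z for x, z in zip(a[s], a[c])]
    est = [a[s][k] / a[s][s] for s in range(k)]
    return sum((v - sum(b * x for b, x in zip(est, r))) ** 2 for r, v in zip(rows, y))


def r_statistic(y, p, m=1):
    """r = S_e^2 / min over omitted sets omega of S_e^2(omega), as in (4.12a, b)."""
    rows = design_rows(p)
    full = error_ss(rows, y)
    reduced = []
    for omit in itertools.combinations(range(p * p), m):
        keep = [c for c in range(p * p) if c not in omit]
        reduced.append(error_ss([rows[c] for c in keep], [y[c] for c in keep]))
    return full / min(reduced)


def null_distribution(p, n, m=1, seed=0):
    """Sorted values of r from n simulated data sets under the additive model."""
    rng = random.Random(seed)
    return sorted(r_statistic([rng.gauss(0.0, 1.0) for _ in range(p * p)], p, m)
                  for _ in range(n))


def tail_probability(r_obs, dist):
    """Proportion of simulated r values at least as large as the observed one."""
    return sum(v >= r_obs for v in dist) / len(dist)


def chebyshev_bound(r_obs, dist):
    """Rough Chebyshev-type bound for r_obs above the simulated mean (uses sample moments)."""
    mean = sum(dist) / len(dist)
    var = sum((v - mean) ** 2 for v in dist) / (len(dist) - 1)
    return min(1.0, var / (r_obs - mean) ** 2)
```

In use, one would call `null_distribution(p, n)` once for the design at hand, compute `r_statistic` for the real data, and read off `tail_probability`; when the observed value lies beyond every simulated value, `chebyshev_bound` gives a crude substitute, in the spirit of the remark on the Bliss example.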

5. Factorial experiments

For simplicity, we shall restrict to the $2^m$ case. Let $\underline{t} = (t_1, \ldots, t_m)'$ denote a treatment (where $t_i = 0$ or 1, for $i = 1, \ldots, m$), and $\phi(\underline{t})$ its true effect. Let $\underline{\phi}$ ($2^m \times 1$) be the vector of the true effects of all the $2^m$ distinct treatments. Let $H_m$ ($2^m \times 2^m$) denote the symmetric Hadamard matrix of order $2^m$, so that

$$H_m = H_1 \otimes H_1 \otimes \cdots \otimes H_1 \quad (m \text{ times}), \tag{5.1}$$

where $\otimes$ denotes the (left) Kronecker product, and

$$H_1 = \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}. \tag{5.2}$$

Let $A_1^{j_1} \cdots A_m^{j_m}$ ($j_u = 0$ or 1, $u = 1, \ldots, m$) denote an interaction; it is a $k$-factor interaction if $k$ of the $j$'s are nonzero, and equals $\mu$ (the general 'mean') if $k = 0$. Let $\underline{\alpha}(k)$ ($k = 0, 1, \ldots, m$) denote the $\binom{m}{k}$-vector of all $k$-factor interactions. Let $\underline{\alpha}$ ($2^m \times 1$) be the vector of all the $2^m$ distinct interactions, with the elements $A_1^{j_1} \cdots A_m^{j_m}$, so that the $\underline{\alpha}(k)$ are subvectors of $\underline{\alpha}$. If $\underline{\alpha}$ and $\underline{\phi}$ are both arranged in Yates order, then it is well known that

$$\underline{\alpha} = 2^{-m/2} H_m \underline{\phi}, \tag{5.3}$$

where the constant in (5.3) is chosen so that $2^{-m/2} H_m$ is an orthogonal matrix, in which case

$$\underline{\phi} = 2^{-m/2} H_m \underline{\alpha}. \tag{5.4}$$
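
The pair of relations (5.3) and (5.4) can be checked numerically. The sketch below is illustrative only: it builds $H_m$ by repeated Kronecker products as in (5.1), taking $H_1$ to be the symmetric $2 \times 2$ Hadamard matrix with rows $(1, 1)$ and $(1, -1)$, and applies the scaled matrix twice to recover the original vector. The function names and the numerical values of the treatment effects are hypothetical, and the ordering of the entries is simply whatever the Kronecker construction induces.

```python
def kron(a, b):
    """Kronecker product of two matrices given as lists of rows."""
    return [[x * y for x in ra for y in rb] for ra in a for rb in b]


H1 = [[1, 1], [1, -1]]


def hadamard(m):
    """H_m = H_1 (x) H_1 (x) ... (x) H_1 with m factors, as in (5.1)."""
    h = [[1]]
    for _ in range(m):
        h = kron(h, H1)
    return h


def transform(vec):
    """Multiply a vector of length 2^m by 2^(-m/2) H_m; serves for both (5.3) and (5.4)."""
    m = len(vec).bit_length() - 1
    if len(vec) != 2 ** m:
        raise ValueError("length must be a power of 2")
    scale = 2.0 ** (-m / 2)
    return [scale * sum(h * v for h, v in zip(row, vec)) for row in hadamard(m)]


phi = [3.0, 5.0, 4.0, 8.0, 2.0, 6.0, 7.0, 9.0]  # hypothetical true effects, m = 3
alpha = transform(phi)        # interactions, as in (5.3)
phi_back = transform(alpha)   # back to treatment effects, as in (5.4)
print(max(abs(x - y) for x, y in zip(phi, phi_back)) < 1e-9)  # prints True
```

Because $H_m$ is symmetric and $2^{-m/2} H_m$ is orthogonal, the same function serves in both directions, which is why no separate inverse routine is needed.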

It is known from empirical observations in the various sciences that, very often, a large number of elements of $\underline{\alpha}$, particularly interactions of high order, are negligible. (Of course, in many situations, both for symmetrical and asymmetrical experiments, for various reasons, the full experiment may be called for. Indeed, several replications may be attempted. Nuisance factor(s) may be present. This leads to confounded designs, which have a considerable literature. Some of the pertinent problems here are covered under block designs.) Let $\underline{L}$ ($\nu \times 1$) be the vector of nonnegligible effects. Then $\underline{L}$ is a subvector of $\underline{\alpha}$. We shall assume, for simplicity, that $\mu \in \underline{L}$. Now, the set of elements of $\underline{L}$ (i.e., which members of $\underline{\alpha}$ are also in $\underline{L}$) may or may not be known. Several cases arise:

(Q1) The elements of $\underline{L}$ are known without any reasonable doubt, i.e., which elements of $\underline{\alpha}$ are in $\underline{L}$ is known. Note that the value of the elements of $\underline{L}$ is still unknown.
(Q2) It is known beyond any reasonable doubt that $\underline{L}$ is a subvector of $\underline{L}^*$ ($\nu^* \times 1$) (which is a subvector of $\underline{\alpha}$), and the elements of $\underline{L}^*$ are known. Here, $\nu^*$ is not large, but is a reasonably small integer.
(Q3) In Q2, it is very plausible to believe that most elements of $\underline{L}$ are contained in $\underline{L}^*$, but we cannot be certain that $\underline{L}$ is wholly contained in $\underline{L}^*$; in other words, a few elements of $\underline{L}$ are probably not in $\underline{L}^*$, but are in $\underline{L}^{**}$ ($(2^m - \nu^*) \times 1$), where $\underline{L}^{**}$ is the vector of elements of $\underline{\alpha}$ which are not in $\underline{L}^*$.
(Q4) It is very plausible to assume that though a few elements of $\underline{L}$ may be contained in $\underline{L}^*$ (which is known), most elements of $\underline{L}$ are not known and may not be in $\underline{L}^*$.
(Q5) Elements of $\underline{L}$ are unknown, except for $\mu$.

In Q4 and Q5, two subcases arise according as a reasonable estimate $\nu_0$ of $\nu$ (the number of elements in $\underline{L}$) is (a) available, and (b) not available. (By 'reasonable', we mean that we have $(3\nu_0/2) \geq \nu \geq (\nu_0/2)$.)

Consider Q1 and Q2. The author believes Q1 would arise rarely.
The case Q2 is important, however, since at some stage we will have to answer the question as to which design is appropriate for a given $\underline{L}^*$. Obviously, variance-optimality is called for. It is quite nontrivial to obtain an optimal design for arbitrary $\underline{L}^*$ in a given number of assemblies (treatments). This problem, in general, is far from solved. Let $T$ be a design, i.e., a set of assemblies, and let $M$ ($= M_T(\underline{L}^*)$) be the information matrix for $\underline{L}^*$ if $T$ is used. If $M$ is diagonal, $T$ is an "orthogonal design"; such designs enjoy optimality from several angles. Hence, from the beginning, orthogonal designs have attracted considerable attention. If $\underline{L}^*$ consists only of $\underline{\alpha}(k)$, for $k = 0, 1, \ldots, \ell$, so that $\nu = \sum_{k=0}^{\ell} \binom{m}{k}$, and $T$ allows estimation of $\underline{L}^*$, then $T$ is said to be of resolution $(2\ell + 1)$; in this same

situation, if we are interested in estimating all elements of $\underline{L}^*$ except those in $\underline{\alpha}(\ell)$, and $T$ allows this, then $T$ is of resolution $2\ell$. A design of resolution $q$ is orthogonal if and only if it is an orthogonal array (OA) of strength $(q - 1)$ ("OA$(q - 1)$").

The simplest case is $\ell = 1$, and corresponds to "main effect plans" (MEP). Here, OA's of strength two are needed. Orthogonal saturated designs (i.e., those where $N$, the number of assemblies, equals $\nu$, the number of parameters), and also non-saturated ones, with $4t$ (where $t$ is an integer) assemblies (and $m \leq 4t - 1$) can be readily made by taking a Hadamard matrix $H$ of size ($4t \times 4t$) with one row consisting of ones only, and then deleting this row (of ones). It should be stressed that before using MEP's, one should be certain of being under Q2.

It should be recorded that "Addelman's orthogonal main effect plans" (see Srivastava and Ghosh, 1996) are not orthogonal with respect to the regular parameters. Several authors have been misled by these plans. They are orthogonal under a new set of parameters, which are to be obtained by a transformation of the regular parameters, the transformation being dependent on the design. However, if we allow transformation of parameters, allowing the transformation to be dependent upon the design, then every design should be considered orthogonal, because for every design such a transformation can be done. Indeed, such a transformation corresponds to the diagonalization of the information matrix. There are other such loose "concepts" (for example, a "screening design") appearing in less rigorous journals, which need correction (or sharpening).

The case $\ell = 2$ is more reasonable, and arises far more often than $\ell = 1$. But then an orthogonal design needs to be an OA(4). Unfortunately, OA(4)'s require an unreasonably large value of $N$ (the number of assemblies in $T$). In the mid-fifties, this led researchers such as Connor to study "irregular designs" (i.e., those which are not $2^{m-q}$ type of fractions).
One class introduced by him was that of PF (parallel flats) designs, where $T$ consists of a few parallel flats in the finite Euclidean geometry EG$(m, s)$, of $m$ dimensions based on GF$(s)$, the finite field with $s$ elements. Addelman and Patel obtained 2-level, 3-level, and mixed designs of this type. P.W.M. John's three-quarter replicate of the $2^5$ design belongs to this class. The PF designs were theoretically studied by Srivastava, and then by Anderson and Mardekian. These authors (Srivastava et al., 1983), for the $s^m$ experiment ($s$ a prime number), connected the theory with cyclotomic fields, and obtained an efficient reduced representation of the information matrix over such fields which is easy to compute for a given $T$ which is of the PF-type. The theory has been further extended by Srivastava (1987), Buhamra and Anderson (1996); a theory of orthogonal designs of this type (for general $\underline{L}^*$) has been developed by Li (1990), Srivastava and Li (1993), and extensions have been made by Liao (1994), Liao and Iyer (1994), Liao et al. (1996). Much further useful work (for general $\underline{L}^*$) can be done in this area, which is quite promising.

Balanced designs are another class of irregular designs which has been extensively studied. Defined initially by Chakravarti (1956), they were studied in detail principally by Srivastava (and co-authors Bose, Chopra, Anderson), and by Yamamoto (and co-authors Shirakura, Kuwada, and others). For balanced designs, the information matrix is invariant under a permutation of the names of the factors. Let the design $T$ be written as an ($N \times m$) matrix, so that the rows of $T$ represent the $N$ treatments, and

the columns of $T$ represent the $m$ factors. Let $T^*$ be obtained from $T$ by permuting the columns. Then $T$ is a balanced design for estimating $\underline{L}^*$, if $M_T(\underline{L}^*) = M_{T^*}(\underline{L}^*)$, for all $T^*$ obtainable from $T$ in the above way. It is well known that a balanced design of resolution $q$ is also a balanced array (BA) of strength $(q - 1)$ ("BA$(q - 1)$"). For all $q$, an OA$(q)$ is also a BA$(q)$, but not vice versa. Optimal balanced designs of resolution V were obtained by Srivastava and/or Chopra in a series of papers; some of these designs were shown to be optimal in the class of all designs by Cheng (1980). However, some designs in this class have a rather low efficiency, and could be improved. Nguyen and Miller (1993) have obtained (unbalanced) designs which improve upon these. However, optimal designs are not known in general. Indeed, in general, we do not have designs which have been shown to be near-optimal (i.e., whose efficiency is, say, within 95% of that of the (unknown but) optimal design). For the case of the mixed factorials, the situation is even worse than that for the $2^m$ case.

In this connection, mention should be made of the work of Kuwada (1988) on "partially balanced arrays". These arrays should not be confused with those of Chakravarti (1956), which were re-named balanced arrays by the author because they correspond to balanced designs. The information matrix they give rise to belongs to the (noncommutative) algebras of Bose and Srivastava, which arise from the multidimensional partially balanced association scheme (Srivastava, 1961; Bose and Srivastava, 1964a, b). But this is not because the design is less than balanced. Rather, it arises because many sets of interactions (i.e., $\underline{\alpha}(u)$, for several values of $u$) are being considered. For a single value of $u$, the information matrix for $\underline{\alpha}(u)$ belongs to the Bose-Mesner algebra; but the "partial balance" arising here arises because of the structure of $\underline{\alpha}(u)$.
Indeed, if $\theta, \phi_1, \phi_2 \in \underline{\alpha}(u)$, then the number of factors in common between $\theta$ and $\phi_1$ may be different from that for $\theta$ and $\phi_2$. For example, if $u = 2$, $\theta = A_1A_2$, $\phi_1 = A_1A_3$, $\phi_2 = A_3A_4$, then the number of common factors between $\theta$ and $\phi_1$ is 1, but for $\theta$ and $\phi_2$, it is 0. This feature does lead to a partially balanced association scheme, but this scheme arises out of a balanced design. On the other hand, Kuwada's designs are not balanced, to begin with. They are a generalization of B-arrays, in a sense similar to the PBIB designs being a generalization of the BIB designs. The information matrix of Kuwada's PB-arrays is much more complex than the one for B-arrays, since here there is "partial balance" arising both in the design and in the set of parameters $\underline{\alpha}(u)$. For this reason, unfortunately, a lot of symbolism is needed to describe and comprehend their structure. Still, however, the situation is much better than that of unbalanced designs, since the latter have no structure. If their efficiency is close to that of a competing but best known unbalanced design, then they may be preferable. We will return to this point later.

Very little work has been done on balanced designs for the asymmetrical case, or for the symmetrical case with general $\underline{L}^*$. Some cases worthy of attention include those where $\underline{L}^*$ contains $\mu$, main effects, and two-factor interactions of the form $A_iA_j$, where $i \in \theta_1$, $j \in \theta_2$, and $\theta_1$ and $\theta_2$ are two (not necessarily mutually exclusive) groups of factors. Another practical case is where, in the above, both $(i, j)$ belong to $\theta_1$, or both belong to $\theta_2$, but one (of $(i, j)$) from $\theta_1$ and the other from $\theta_2$ is not allowed. Many other similar important cases could be produced.
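
The definition of balance given earlier (invariance of the information matrix under every permutation of the columns of $T$) lends itself to a direct, if brute-force, check for small $m$. The sketch below is only an illustration: it assumes the $2^m$ case with levels 0 and 1, takes $\underline{L}^*$ to be all effects involving at most `max_order` factors, and uses unscaled $\pm 1$ contrast columns (the scaling constant does not affect the invariance). The function names and the two example designs are hypothetical.

```python
from itertools import combinations, permutations, product


def effects(m, max_order):
    """Factor subsets of size 0..max_order: the general mean, main effects, and so on."""
    return [s for k in range(max_order + 1) for s in combinations(range(m), k)]


def information_matrix(design, max_order):
    """X'X for the effects above, where the column for subset s is the product
    of the +1/-1 coded levels of the factors in s."""
    m = len(design[0])
    cols = []
    for s in effects(m, max_order):
        col = []
        for t in design:
            x = 1
            for i in s:
                x *= 2 * t[i] - 1
            col.append(x)
        cols.append(col)
    return [[sum(a * b for a, b in zip(c1, c2)) for c2 in cols] for c1 in cols]


def is_balanced(design, max_order):
    """True if the information matrix is unchanged by every permutation of the factors."""
    m = len(design[0])
    base = information_matrix(design, max_order)
    for perm in permutations(range(m)):
        permuted = [tuple(t[i] for i in perm) for t in design]
        if information_matrix(permuted, max_order) != base:
            return False
    return True


# All treatments with exactly one or exactly three factors at level 1 (m = 4):
# the set of rows is closed under column permutations, so the design is balanced.
balanced = [t for t in product((0, 1), repeat=4) if sum(t) in (1, 3)]
# An arbitrary set of five treatments, which is not.
lopsided = [(0, 0, 0, 0), (1, 0, 0, 0), (1, 1, 0, 0), (1, 1, 1, 0), (0, 1, 0, 1)]
print(is_balanced(balanced, 2), is_balanced(lopsided, 2))  # prints: True False
```

The check enumerates all $m!$ permutations and is therefore only practical for small $m$; it is meant to make the definition concrete, not to serve as a construction method.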

Balanced designs have a certain ease in the analysis and interpretation of results, and so are attractive. It would be useful to first construct an optimal balanced design (say, for a given $\underline{L}^*$ of the type discussed in the last paragraph), and check its efficiency. If it is optimum or near-optimum, then general unbalanced irregular designs may not be attractive. It may be remarked that for general $\underline{L}^*$, orthogonal PF designs may be investigated even before balanced designs, since the author's experience shows that for many types of cases of "intermediate" resolutions, such designs have a high chance of being existent.

A great deal of interesting work has been done lately on OA(2)'s, both for symmetrical and mixed cases. This should be quite interesting from the general combinatorial angle. For statistical applications, before these arrays are used it will be important to ensure that the situation is under Q2, and there too, under the situation where the interactions are all negligible.

It will be useful now to define a property of $\underline{\alpha}$, the vector of factorial effects, known as "tree structure"; this requires that if any $k$-factor interaction ($k \geq 2$) $A_{i_1} A_{i_2} \cdots A_{i_k}$ belongs to $\underline{L}$, then a $(k - 1)$-factor effect $A_{j_1} A_{j_2} \cdots A_{j_{k-1}}$ also belongs to $\underline{L}$, where $(j_1, j_2, \ldots, j_{k-1})$ is a subset of $(i_1, i_2, \ldots, i_k)$. Thus, for example, if $A_2A_3 \in \underline{L}$, then one of the main effects $A_2$ or $A_3$ also belongs to $\underline{L}$. This property was noticed by the author (Srivastava and Srivastava, 1976), in a review of papers published in some journals in social sciences for a period of ten years (ending around 1975). In many of these papers, the set of all significant factorial effects was reported, and in almost all cases, the author noticed that the tree structure was present.

Among designs of (even) resolutions $2\ell$, studies have been made mostly on the case $\ell = 2$, where we wish to estimate $\{\mu, \underline{\alpha}(1)\}$, when elements of $\underline{\alpha}(2)$ may not be negligible.
The foldover method produces such designs, but it is only a sufficient condition. The necessary and sufficient condition is "zero-one symmetry with respect to triplets" (Bose and Srivastava, 1964a); i.e., we need a BA(3) in which, in every 3-rowed submatrix, every ($3 \times 1$) column vector occurs the same number of times as its (0, 1)-complement. Further research on such BA's would enable us to obtain improved designs. However, the basic concept behind such designs requires justification; here we are basically in Q2, with $\underline{L}^* = \{\mu, \underline{\alpha}(1), \underline{\alpha}(2)\}$, but we wish to estimate only $\{\mu, \underline{\alpha}(1)\}$, even though some (unknown) elements of $\underline{\alpha}(2)$ are nonnegligible. Now, it is clear that unless the elements of $\underline{L}$ (and their values) are known, we do not know $\underline{\alpha}$ and hence we cannot estimate $\underline{\phi}$, which is the "response surface" in this discrete case. Unless we know $\underline{\phi}$, it is difficult to find the $\underline{t}$ for which $\phi(\underline{t})$ may be maximal or minimal, a problem which is occasionally of great interest. Thus, the above $\underline{L}^*$ needs to be estimated in any case!

The usual justification of a resolution IV design is that if we do not have money to do a larger (resolution V) experiment, we may first do a resolution IV experiment, to find out which main effects are significant. We can later do a smaller experiment involving principally the factors that turn out to be important. Here, we must notice that this logic is not quite correct, because an interaction may be large even when a main effect involved in it is small. However, such logic does have some support from the tree-structure concept, since such structure may be present, and if it is present then

if both main effects $A_i$ and $A_j$ are insignificant, the interaction $A_iA_j$ is also likely to be so.

We now consider Q3. Two cases arise according as the elements of $\underline{L}$ not in $\underline{L}^*$ are (i) ignored, (ii) not ignored. In case (i), after the experiment, we would have the situation where (a) some elements of $\underline{L}$ are unknown, (b) the estimate of $\underline{L}^*$ is biased, the sources and the amount of bias in the estimates of various elements of $\underline{L}^*$ being unknown, and (c) the estimate of $\underline{\phi}$ is biased similarly, which may in turn lead to a wrong conclusion about the $\underline{t}$ for which $\phi(\underline{t})$ is extremal (in some sense).

The question arises: If case (ii) is chosen, don't we have to do the whole experiment? The answer is no, because then the situation falls under the theory of search linear models. Let $\underline{L}_0^{**}$ be a subvector of $\underline{L}^{**}$ such that we are quite certain that the elements of $\underline{L}$ not in $\underline{L}^*$ are in $\underline{L}_0^{**}$. Then, if $T$ ($N \times m$) is the design chosen, and $\underline{y}(T)$ the corresponding vector of observations, we can express $E[\underline{y}(T)]$ by a model of the form (2.1), where $\underline{\xi}_1$ and $\underline{\xi}_2$ are respectively $\underline{L}^*$ and $\underline{L}_0^{**}$, and where the corresponding matrices $A_1$ and $A_2$ can be easily written down. The number $k$ (of model (2.1)) for this case will not be known if (as will often be the case) we do not know how many elements of $\underline{L}$ are in $\underline{L}^{**}$.

Designs $T$ through which one may tackle the situation in the last paragraph are called Search Designs, on which considerable literature exists. Some important cases considered include search designs of resolution 3.1, 3.2, 5.1 and 5.2, where "resolution $(2\ell + 1).k$" means that the design satisfies the rank condition (2.2), when ($\underline{\xi}_1 =$) $\underline{L}^*$ includes $\underline{\alpha}(k')$, for $k' = 0, 1, \ldots, \ell$, and ($\underline{\xi}_2 =$) $\underline{L}_0^{**}$ equals $\underline{L}^{**}$, and up to $k$ parameters in $\underline{\xi}_2$ could be nonnegligible. Such designs were developed in the work (singly or jointly) of, among others, Srivastava, Gupta, and Ghosh.
Ohnishi and Shirakura (1985) obtained AD-optimal designs in this class, the criterion of AD-optimality having been developed earlier by Srivastava (1977). Another case studied which is combinatorially significant is where, in the above, $\ell = 0$, $k \geq 2$ (Katona and Srivastava, 1983); the cases $k = 3, 4$ have been studied but are unpublished yet. In this connection, the concept of a $q$-covering of EG$(m, s)$ was introduced by Srivastava (1978), and many studies on this line were made in Jain (1980) (under the direction of R.C. Bose) using quadrics and tangent spaces. (A set of $N$ points $T$ in EG$(m, s)$ is said to constitute a $q$-covering of EG$(m, s)$ if and only if every $(m - q)$-dimensional hyperplane of EG$(m, s)$ has a non-empty intersection with $T$.) Anderson and Thomas (1980), and Mukerjee and Chatterjee (1986) have developed search designs for the situations where the number of levels of a factor is not necessarily equal to 2. Studies on minimal designs of resolution 3.1 and 3.2 (for the $2^m$ case) have been made; these include particularly the work of Gupta (1988), and of Srivastava and Arora (1987, 1988, 1991).

Unfortunately, most of the studies on search designs have concentrated on producing designs which satisfy (2.2), thereby guaranteeing that irrespective of which parameters are nonnegligible (and what their values are), the correct identification will be possible (with probability 1 if $\sigma^2 = 0$). But, if we recall the remarks made around the equations (2.6) and (2.7), instead of worrying too much about (2.2), we should try to see what sets of parameters the design can help estimate. This is particularly important, since the value of $k$ would, in most cases, be known only approximately, and

may actually be larger than the value which was guessed initially (and corresponding to which the design is to be made). There is some discussion in Srivastava (1992) in this spirit, but nothing substantial has yet been done.

It should be remarked here that there is another reason why we should emphasize (2.2) less, and the remarks around (2.6) and (2.7) more. It is that in order to satisfy (2.2), even for small $k$, it turns out that, very often, relatively too many treatments are needed. On the other hand, in most situations, the value of $k$ will usually be unknown, and indeed may be more than what we believe it to be. Thus, the matter boils down more to what should be done in order to identify all the nonnegligible parameters, rather than to have a design which will be uniformly good irrespective of which set of $k$ parameters (out of $\underline{L}_0^{**}$) is nonnegligible, where $k$ is some small integer.

This brings us to the concept of "revealing power of a design", introduced in Srivastava (1984), and used in a few contexts since then. Broadly speaking, it refers to the ability of a design to reveal or identify the correct model. To discuss this further, we explain some terminology first. A design is called single-stage if a single experiment is planned, and the data from this experiment is all that we shall have in order to throw light on the model and the phenomenon under study. A multistage design, on the other hand, envisions the possibility of a series of experiments, the second experiment being planned in the light of the results of the first experiment, the third being planned in light of the results of the second, and so on. Such a design may end up having several (but a relatively small number of) experiments, each one being substantial. The term sequential will be used where a large sequence of experiments is envisaged, each experiment being an observation on a small number of treatments, possibly even of a single treatment only.
It is intuitively clear that if (in a discrete factorial experiment) the total number of observations is fixed, then generally, a multistage design will have more revealing power than a single-stage design.

In conjunction with Q3, we shall now consider Q4 and Q5 as well. All of these are characterised by the fact that there are nonnegligible unknown parameters (say $k$ in number) which need to be identified and estimated. Under Q3, $\nu^*$ (the number of elements in $\underline{L}^*$, which we do wish to estimate) is given, and $k$ (whether known or unknown) is relatively small. In Q4, $\nu^*$ is given, but $k$ may be quite large, and in Q5, $\nu^*$ is rather small, and $k$ is nearly equal to $\nu$.

We shall occasionally discuss response surface experiments, which are factorial experiments with continuous factors. From the viewpoint of variance-optimality, recent work of Ghosh and Al-Sabah (1994) has shown that the classical designs in this field could be vastly improved; new designs are hundreds of times more efficient than the old ones! Obviously, this line of work has great potential and promise, and should be vigorously pursued; there is no reason to leave the response surface field underdeveloped.

With respect to the nature of the models arising in statistical design, three situations need to be considered: (M1) the model is (functionally) unknown and not known to be representable as a linear model (with unknown parameters, but known coefficients), (M2) the model is representable as a linear model, but the coefficients of some of the unknown parameters are unknown, and (M3) the model is representable as a linear

A critique of some aspects of experimental design


model with unknown parameters, where the coefficients of the parameters are known. (In general, the coefficients will depend upon the points in the factor space.) Now, usually, the purpose of the experiment is either to study the response surface itself, or to find points in the factor space for which the expected response is extremal (i.e., maximum or minimum), or both. Clearly, in the first case, we need to find the model itself, and determine it completely, i.e., obtain the values of all parameters involved. In case we wish to find the extremal points, a knowledge of the complete model may still be needed except in certain special situations. Even in these special cases, a knowledge of certain crucial features of the model will be necessary. On the other hand, the purpose of the subject of statistical design of scientific experiments must be to do the best (from the statistical viewpoint) to help achieve the purpose of the experiment itself. Hence, from the above discussion, it is clear that the purpose of statistical design must be to completely determine the model (or, in some special cases, at least certain crucial features of the model). This brings us to the cases M1, M2, and M3. Of these, design theory has dealt largely with M3 only, which we consider first. This entails the situations (Q1)-(Q5), of which we have been considering the last three. In these cases, there are a number of parameters that have to be identified. Hence, from the above discussions, it follows that under M3, the purpose of statistical design must be to identify these parameters, and to estimate the values of these and other possible nonnegligible parameters. Thus, under M3, (Q3)-(Q5), one purpose of statistical design must be to help identify the nonnegligible parameters as accurately as possible; the ability of a design to help us do such identification is referred to as the "revealing power" of the design.
It is clear that "revealing power" is a general concept, which will have to be sharply defined in each special situation, if we are interested in a quantitative assessment of the same. Several measures of "revealing power" for the case of a sequential design are introduced in Srivastava and Hveberg (1992). Consider (M3, Q3), where search design theory is pertinent. In a design of resolution {2g + 1, k}, the estimation of $L^{*\prime} = (\mu, \alpha'(1), \ldots, \alpha'(g))$ plus any set of k interactions in L** is guaranteed. Thus, with respect to this L*, and this class of situations in L**, the design's revealing power may be said to be 100%. As a contrast, we may say that the revealing power of a $2^{4t-1}$ orthogonal main effect plan (OMEP) in 4t runs is zero, where L* = {$\mu$, $\alpha(1)$}, L** is the set of all interactions, and the class of situations is where a single element of L** is nonzero! As another instructive class of examples, consider Hadamard matrices of size (N × N), with N not equal to a power of 2. Thus, let m = 5, N = 12, L* = {$\mu$, $\alpha(1)$}. Suppose L** = {$\alpha(2)$, $\alpha(3)$}, and the class of situations of interest is where at most three elements of L** are nonnegligible, and that these elements obey the tree structure among themselves. (For example, the nonnegligible elements could be the sets {$A_1A_2$, $A_1A_2A_4$, $A_1A_2A_5$} or {$A_1A_2$, $A_1A_3$, $A_1A_2A_4$} but not the set {$A_1A_2$, $A_1A_3$, $A_2A_3A_5$}, since the last one does not have tree structure.) An interesting question is: What is the "revealing power" of the (5 × 12) OA under consideration? The author believes that such OA's (which are obtained from Hadamard matrices but are not obtainable as Kronecker products of smaller Hadamard matrices, and are obtained using quadratic residues, or other techniques) should have good "revealing


J. N. Srivastava

power" with respect to certain classes of situations where L* = {$\mu$, $\alpha(1)$}; however, the corresponding L** and the class of situations need to be identified. Minimum aberration designs may also be investigated from this angle. The question still remains: how shall we assess the revealing power of a (single-stage) design? We illustrate the answer by the example in the last paragraph. In this case, since m = 5, both $\alpha(2)$ and $\alpha(3)$ have 10 elements each. Now, we can have (for the nonnegligible elements) one of three situations: (i) all belong to $\alpha(2)$; there are $\binom{10}{3} = 120$ cases here; (ii) two are from $\alpha(2)$ and one is from $\alpha(3)$; there are $\binom{5}{3}\binom{3}{2}\binom{3}{3} + \binom{5}{4} \times 3 \times 6 = 120$ cases here as well; and (iii) one element is from $\alpha(2)$ and two are from $\alpha(3)$; there are 30 cases here. Let $\xi_{20}$ (k × 1) denote the vector of nonnegligible parameters. In the last example, we have k = 3. Then, $\xi_{20}$ is a subvector of $\xi_2$ in (2.2). Let $\xi_{21}$ (k × 1) be any other subvector of $\xi_2$. We will say that $\xi_{20}$ is "clear" if and only if $[A_1 : A_{20} : A_{21}]$ is of rank $(u_1 + k_0)$, for all $\xi_{21}$ in $\xi_2$, where $k_0$ depends upon $\xi_{21}$ (with k 0. Thus, it is important to examine some identification techniques for the case $\sigma^2 = 0$ in more detail. In Srivastava (1992), a case study was made for a $2^6$ experiment, with v ~ 12. Two designs were compared, an OA(5) with N = 32 and a BA(6) with N = 22. The elements of L obeyed a tree structure. It was seen there that the OA(5) failed to reveal L, because some elements of L were mutually confounded. It is easy to see that this indeed would be a difficulty in general with the classical designs obtained using the Bose-Kishen-Fisher theory, since the $2^{m-q}$ type of fractions necessarily give rise to $2^q$ interactions which are mutually confounded. On the other hand, an irregular design does not necessarily have such confounding, and thus may have much more revealing power, as is demonstrated in the above case study.
Even when $\sigma^2 = 0$, and the revealing power of a design is 100% with respect to a set of parameters, it may be quite a difficult task to retrieve the parameters. In this connection, two techniques (called 'temporary elimination principle' (TEP) and 'intersection sieve' (IS)) were introduced in Srivastava (1992) and have been found


useful not only in the factorial design area but also in determining non-additive cells in row-column designs. Let $\theta_i$ (i = 1, 2, 3) be linear functions of parameters, such that the parameters occurring in each $\theta_i$ are all distinct, and such that for i ≠ j (i, j = 1, 2, 3), $\theta_i$ and $\theta_j$ do not involve any common parameters. Then, TEP says that if $\theta_1 = 0$, then "temporarily", we can assume all parameters occurring in $\theta_1$ to be individually zero. Similarly, if $\theta_1 + \theta_2 = \theta_1 + \theta_3$, then IS says that we should "temporarily" regard each parameter in $\theta_2$ and in $\theta_3$ to be zero. Here, 'temporarily' refers to the fact that TEP and IS are "tentative decision rules", i.e., the decision made by using them is made only for a while during the analysis of the data, and at a later stage in this analysis such a decision could possibly be reversed. The purpose of this analysis is to somehow identify the elements of L, and the tentative rules often help simplify the complexity arising out of the presence of a large number of parameters. Many other useful identification techniques were introduced in Srivastava (1987b), of which we discuss one, namely a 'balanced view-field'. A view field is just a fancy name for a bunch of linear functions of parameters which we wish to examine. A 'balanced' view field is the collection of linear functions $\{p(\theta) \mid p \in P\}$, where P is the set of all (m!) permutations (for a $2^m$ experiment) of the factor symbols, and $\theta$ is some linear function of the parameters. For example, for m = 4 and $\theta = A_1 - A_{12}$, the view field is the set of 12 linear functions {$A_1 - A_{12}$, $A_1 - A_{13}$, $A_1 - A_{14}$, $A_2 - A_{12}$, $A_2 - A_{23}$, $A_2 - A_{24}$, $A_3 - A_{13}$, $A_3 - A_{23}$, $A_3 - A_{34}$, $A_4 - A_{41}$, $A_4 - A_{42}$, $A_4 - A_{43}$}; where, for example, $A_2 - A_{24}$ is obtained from $\theta$ by using the permutation (2413) or (2431). Notice that if the view field is balanced, then any linear function in it could generate the rest.
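The construction of a balanced view field can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the representation of a linear function as a dictionary from effects to coefficients, and the names `balanced_view_field` and `show`, are hypothetical choices made here and do not come from Srivastava (1987b).

```python
from itertools import permutations

def balanced_view_field(m, theta):
    """Return the distinct images of theta under all m! factor permutations.

    theta maps an effect (a frozenset of factor indices) to its coefficient;
    for example A1 - A12 is {frozenset({1}): 1, frozenset({1, 2}): -1}.
    """
    field = set()
    for perm in permutations(range(1, m + 1)):
        relabel = dict(zip(range(1, m + 1), perm))  # factor i -> perm[i - 1]
        image = frozenset(
            (frozenset(relabel[f] for f in effect), coeff)
            for effect, coeff in theta.items()
        )
        field.add(image)
    return field

def show(function):
    """Format one linear function with coefficients +1 or -1, e.g. 'A1 - A12'."""
    terms = sorted(function, key=lambda t: (len(t[0]), sorted(t[0])))
    parts = []
    for effect, coeff in terms:
        name = "A" + "".join(str(f) for f in sorted(effect))
        parts.append(("- " if coeff < 0 else "+ ") + name)
    return " ".join(parts).lstrip("+ ")

theta = {frozenset({1}): 1, frozenset({1, 2}): -1}   # theta = A1 - A12
view_field = balanced_view_field(4, theta)
for text in sorted(show(f) for f in view_field):
    print(text)
```

For m = 4 the 24 permutations collapse to 12 distinct functions, in agreement with the list in the text (the sketch prints $A_4 - A_{41}$ as `A4 - A14`, since the order of factor subscripts within an effect is immaterial).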
If the design is a B-array of full strength, and a linear function $\theta$ is estimable, then the whole (balanced) view field generated by $\theta$ is estimable, and is available for inspection. Now, if u is relatively small, and $\theta$ is appropriately selected, then for small $\sigma^2$, patterns would appear in the view field, in the sense that the (estimated) value of $p(\theta)$ would be roughly constant for p ranging over a certain subset $P_1$ of P. In other words, P may break up into disjoint sets $P_1, \ldots, P_q$, such that for $p \in P_j$ (j = 1, ..., q), the value of $p(\theta)$ is some constant $C_j$. Comparing the $p(\theta)$ within the class $p \in P_j$, and using the IS and TEP, often leads to identification of the large nonnegligible parameters. The use of IS and TEP is more convincing and effective when the view field is balanced. Also, because of the presence of symmetry, the view field can be grasped and inspected more easily. Indeed, for many cases, this appears to be the most powerful technique known. No doubt, the method based on the minimum error sum of squares (discussed in Section 2 for identifying parameters under the SLM) is still there, and has a more decisive role to play. However, it suffers from the drawback that such a sum of squares has to be computed for all possible competing L, which would be a bit tiresome even for computers. One needs faster methods, and the balanced view-fields do appear to be useful in at least some situations. Investigations are continuing. It would be useful to describe a few B-arrays which may serve as good first stage designs. Let $\Omega_{mj}$ denote the set of $\binom{m}{j}$ treatments in a $2^m$ experiment in which exactly j factors are at level zero. Let $T_{m0}$ be the design with ($N = 1 + \binom{m}{1} + \binom{m}{2}$)


treatments consisting of $\Omega_{m0}$, $\Omega_{m1}$, and $\Omega_{m,m-2}$. As a resolution V design, it was studied in Srivastava (1961), and shown to be asymptotically orthogonal. We may use $T_{m0}$ as a first stage design, or preferably $T_m$ (which is $T_{m0}$ plus the single treatment $\Omega_{mm}$). For 5 ≤ m 1; (4, 0, $2s_0$, $s_0$), already discussed below (6); (5, 1, (4 + $2s_0$), $s_0$); (7, 1, 4($s_0$ - 2), $s_0$), $s_0 \ge 2$; and (8, 2, $4s_0$, $s_0$), where $s_0$ = 0, 1, 2, ..., unless otherwise specified. (Note that some of these arrangements call for more center points than recommended in the table, an example of how applications of different criteria can produce conflicting conclusions.) Further division of the star will not lead to an orthogonally blocked design. However, it is possible to divide the cube portion into smaller blocks and still maintain orthogonal blocking if k > 2. As long as the pieces that result are fractional factorials of resolution III or more (see Box et al., 1978, p. 385), each piece will be an orthogonal design. All fractional factorial pieces must contain the same number of center points or else (4) cannot be satisfied. Thus $c_0$ must be divisible by the number of blocks.

Replication of point sets

In a composite design, replication of either the cube portion or the star portion, or both, can be chosen if desired. An attractive example of such possibilities is given by Box and Draper (1987, p. 362). This is a 24-run second-order design for three factors that is both rotatable and orthogonally blocked into four blocks of equal size. It consists of a cube (fractionated via $x_1x_2x_3 = \pm 1$) plus replicated (doubled) star plus four center points, two in each $2^{3-1}$ block. This particular design also provides an interesting example of estimating $\sigma^2$ in the situation where center points in different blocks of the design are no longer directly comparable due to possible block effects.

Obtaining the block sum of squares

When a second-order design is orthogonally blocked, one can
1. Estimate the $\beta$ coefficients of the second-order model in the usual way, ignoring blocking.
2. Calculate pure error from repeated points within the same block only, and then combine these contributions in the usual way. Runs in different blocks cannot be considered as repeats.
3. Place an extra term

$$SS(\text{blocks}) = \sum_{w=1}^{m} \frac{B_w^2}{n_w} - \frac{G^2}{n}$$


N. R. Draper and D. K. J. Lin

with (m - 1) degrees of freedom in the analysis of variance table, where $B_w$ is the total of the $n_w$ observations in the wth block and G is the grand total of all the observations in all the m blocks. If a design is not orthogonally blocked, the sum of squares for blocks is conditional on terms taken out before it. An "extra" sum of squares calculation is needed; see Draper and Smith (1981).
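As a small numerical illustration of step 3, the block sum of squares $\sum_w B_w^2/n_w - G^2/n$ can be computed directly from the block totals. The function name and the response values below are hypothetical and serve only to show the arithmetic.

```python
def block_sum_of_squares(blocks):
    """Return (SS(blocks), degrees of freedom) for an orthogonally blocked design.

    blocks is a list of m lists, one list of observed responses per block.
    SS(blocks) = sum_w B_w^2 / n_w - G^2 / n, with m - 1 degrees of freedom.
    """
    n = sum(len(b) for b in blocks)              # total number of observations
    grand_total = sum(sum(b) for b in blocks)    # G
    ss = sum(sum(b) ** 2 / len(b) for b in blocks) - grand_total ** 2 / n
    return ss, len(blocks) - 1

# Hypothetical responses: two blocks of three runs each.
ss_blocks, df_blocks = block_sum_of_squares([[10.0, 12.0, 11.0],
                                             [14.0, 15.0, 13.0]])
print(ss_blocks, df_blocks)
```

Here the block totals are 33 and 42 and the grand total is 75, so SS(blocks) = 363 + 588 - 937.5 = 13.5 on one degree of freedom.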

10. Rotatability

Rotatability is a useful property of an experimental design. Any given design produces an X matrix whose columns are generated by the x-terms in the model to be fitted (e.g., (3)) and whose rows correspond to values from the n given design points. If z' is a vector of the form of a row of X but generated by a selected point x at which a predicted response is required after estimation of the model's coefficients, then it can be shown that the variance of that prediction is

$$V(\hat{y}(x)) = z'(X'X)^{-1}z\,\sigma^2,$$

where $\sigma^2$ is the variance of an observed response value, assumed to be constant. For any given design, contours of $V(\hat{y}(x))$ = constant can be plotted in the k-dimensional x-space. If those contours are spherical, the design is said to be rotatable. In practice, exact rotatability is not important, but it is a plus if the design is at least "close to being rotatable" in the sense that $V\{\hat{y}(x)\}$ changes little for points that are a constant distance from the origin in the region covered by the design points. For more on rotatability, see Box and Draper (1987). To assess how close a design is to being rotatable, we can use a criterion of Draper and Pukelsheim (1990). We describe this in the context of second-order designs, although the concept is completely general for any order. For easy generalization a special expanded notation is needed. Let $x = (x_1, x_2, \ldots, x_k)'$. We shall denote the terms in the second-order model by a vector with elements

$$1;\quad x';\quad x' \otimes x',$$

where the symbol $\otimes$ denotes the Kronecker product. Thus there are $(1 + k + k^2)$ terms,

$$1;\; x_1, x_2, \ldots, x_k;\; x_1^2, x_1x_2, \ldots, x_1x_k;\; x_2x_1, x_2^2, \ldots, x_2x_k;\; \ldots;\; x_kx_1, x_kx_2, \ldots, x_k^2.$$

(An obvious disadvantage of this notation is that all cross-product terms occur twice, so the corresponding X'X matrix is singular. A suitable generalized inverse is obvious, however, and this notation is very easily extended to higher orders. For example, third order is added via $x' \otimes x' \otimes x'$, and so on.) Consider any second-order rotatable design with second-order moments $\lambda_2 = N^{-1}\sum_u x_{iu}^2$ and $\lambda_4 = N^{-1}\sum_u x_{iu}^2x_{ju}^2$, for i, j = 1, 2, ..., k and i ≠ j. We can write its moment matrix V, of order $(1 + k + k^2) \times (1 + k + k^2)$, in the form

$$V = V_0 + \lambda_2(3k)^{1/2}V_2 + \lambda_4[3k(k+2)]^{1/2}V_4, \qquad (9)$$


Response surface designs

where $V_0$ consists of a one in the (1, 1) position and zeros elsewhere, where $V_2$ consists of $(3k)^{-1/2}$ in each of the 3k positions corresponding to pure second-order moments in V and zeros elsewhere, and $V_4$ consists of $3[3k(k+2)]^{-1/2}$ in the k positions corresponding to pure fourth-order moments, $[3k(k+2)]^{-1/2}$ in the 3k(k - 1) positions corresponding to mixed even fourth-order moments in V, and zeros elsewhere. Note that $V_0$, $V_2$, and $V_4$ are symmetric and orthogonal so that $V_iV_j = 0$ for i ≠ j, and also the $V_i$ have norms $\|V_i\| = [\mathrm{tr}(V_iV_i)]^{1/2} = 1$. Suppose we now take an arbitrary design with moment matrix A, say. Draper et al. (1991) showed that, by averaging A over all possible rotations in the x space, we obtain

$$\bar{A} = V_0 + V_2\,\mathrm{tr}(AV_2) + V_4\,\mathrm{tr}(AV_4). \qquad (10)$$

We call $\bar{A}$ the rotatable component of A. The measure of rotatability is

$$Q^* = \|\bar{A} - V_0\|^2 / \|A - V_0\|^2 = \{\mathrm{tr}(\bar{A} - V_0)^2\} / \{\mathrm{tr}(A - V_0)^2\}. \qquad (11)$$

The rotatability measure Q* is essentially an $R^2$ statistic for the regression of the design moments of second and fourth order in A onto the "ideal" design moments represented by V. Such a criterion is easy to compute and is invariant under design rotation. It enables us to say how rotatable a design is, and to improve the design's Q* value by adding new design points. For examples, see Draper and Pukelsheim (1990).
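The computation of Q* can be sketched in plain Python for small designs. This is a minimal illustration, assuming the $(1 + k + k^2)$ term ordering $(1;\, x';\, x' \otimes x')$ and building $V_2$ and $V_4$ from their verbal description: entries $(3k)^{-1/2}$ at the 3k pure second-order positions, $3[3k(k+2)]^{-1/2}$ at the k pure fourth-order positions, and $[3k(k+2)]^{-1/2}$ at the 3k(k - 1) mixed even fourth-order positions. All function names are hypothetical. Because $V_2$ and $V_4$ are orthonormal, the numerator $\|\bar{A} - V_0\|^2$ equals $\mathrm{tr}(AV_2)^2 + \mathrm{tr}(AV_4)^2$.

```python
from itertools import product
from math import sqrt

def z_vector(x):
    """Expanded terms (1; x; x kron x) for one design point x."""
    return [1.0] + list(x) + [a * b for a in x for b in x]

def moment_matrix(design):
    """A = N^-1 * sum_u z_u z_u' in the (1 + k + k^2) notation."""
    rows = [z_vector(x) for x in design]
    n, size = len(rows), len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) / n for j in range(size)]
            for i in range(size)]

def basis_matrices(k):
    """V2 and V4, filled in position by position."""
    size = 1 + k + k * k
    v2 = [[0.0] * size for _ in range(size)]
    v4 = [[0.0] * size for _ in range(size)]
    c2 = 1 / sqrt(3 * k)
    c4 = 1 / sqrt(3 * k * (k + 2))

    def pos(i, j):
        return 1 + k + i * k + j      # index of the term x_i * x_j

    for i in range(k):
        v2[0][pos(i, i)] = v2[pos(i, i)][0] = v2[1 + i][1 + i] = c2
        v4[pos(i, i)][pos(i, i)] = 3 * c4          # pure fourth-order moment
        for j in range(k):
            if i != j:                             # mixed even fourth-order moments
                v4[pos(i, i)][pos(j, j)] = c4
                v4[pos(i, j)][pos(i, j)] = c4
                v4[pos(i, j)][pos(j, i)] = c4
    return v2, v4

def inner(a, b):
    """tr(AB) for symmetric A and B: the sum of elementwise products."""
    return sum(x * y for ra, rb in zip(a, b) for x, y in zip(ra, rb))

def rotatability_q(design):
    k = len(design[0])
    a = moment_matrix(design)
    v2, v4 = basis_matrices(k)
    t2, t4 = inner(a, v2), inner(a, v4)
    a[0][0] -= 1.0                    # form A - V0 (V0 has a single one at (1, 1))
    return (t2 ** 2 + t4 ** 2) / inner(a, a)

alpha = sqrt(2)                       # axial distance (2^k)^(1/4) for k = 2
ccd = ([(a, b) for a in (-1, 1) for b in (-1, 1)]
       + [(alpha, 0), (-alpha, 0), (0, alpha), (0, -alpha)]
       + [(0, 0)] * 4)
grid = list(product((-1, 0, 1), repeat=2))
print(rotatability_q(ccd), rotatability_q(grid))
```

In this sketch the two-factor central composite design with axial distance $\sqrt{2}$ gives Q* equal to 1 up to rounding, as expected for a rotatable design, while the $3^2$ factorial gives a value below 1.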

11. Variance, bias and lack of fit

Suppose that $E(y) = \eta(\xi)$, where $\xi$ is a vector of predictor variables, and let $f(\xi)$ be the vector with polynomial elements used to approximate y. We choose the form of f in the hope that it will provide a good approximation to $\eta$ over some limited region of interest, R say. Two types of errors then need to be considered:
1. Systematic, or bias, errors $\delta(\xi) = \eta(\xi) - f(\xi)$, the difference between the expected value of the response, $E(y) = \eta(\xi)$, and the approximating function $f(\xi)$.
2. Random errors $\varepsilon$.
Although the above implies that systematic errors $\delta(\xi)$ are always to be expected, they are often wrongly ignored. Yet it is only rarely true that bias can be totally ignored. Suppose that $\hat{y}(\xi)$ is the fitted value obtained at a general point $\xi$ in the experimental space, when the function $f(\xi)$ is fitted to available data on y and $\xi$. Then the associated mean square error, standardized for N, the number of observations, and $\sigma^2$, the error variance, is

$$\frac{N}{\sigma^2}E\{\hat{y}(\xi) - \eta(\xi)\}^2 = \frac{N}{\sigma^2}E\{\hat{y}(\xi) - E\hat{y}(\xi)\}^2 + \frac{N}{\sigma^2}\{E\hat{y}(\xi) - \eta(\xi)\}^2$$


after some reduction. We can write this as $M(\xi) = V(\xi) + B(\xi)$ and describe it as "the standardized mean square error at a point $\xi$ is equal to the variance $V(\xi)$ of prediction plus the squared bias $B(\xi)$". We can also make an assessment of variance and bias over any given region of interest R by averaging (and normalizing) $V(\xi)$ and $B(\xi)$ over R. More generally, if $w(\xi)$ is a weight function, we can write

$$V = \int w(\xi)V(\xi)\,d\xi \Big/ \int w(\xi)\,d\xi \quad\text{and}\quad B = \int w(\xi)B(\xi)\,d\xi \Big/ \int w(\xi)\,d\xi$$

and integrate over the entire $\xi$-space. Most often in practice we would have

$$w(\xi) = \begin{cases} 1 & \text{within } R, \\ 0 & \text{outside } R, \end{cases}$$

whereupon V and B would represent integrals taken over R. If we denote the integrated mean squared error by M, we can write M = V + B.

In practice, of course, the true relationship $\eta(\xi)$ would be unknown. To make further progress, we can proceed as follows:
1. Given that we are going to fit a polynomial of degree $d_1$ (say) to represent the function over some interval R, we can suppose that the true function $\eta(\xi)$ is a polynomial of degree $d_2$, greater than $d_1$.
2. We need also to say something about the relative magnitudes of systematic (bias) and random (variance) errors that we could expect to meet in practical cases. An investigator might typically employ a fitted approximating function, such as a straight line, if he believed that the average departure from the truth induced by the approximating function were no worse than that induced by the process of fitting. We shall suppose this to be so, and will assume, therefore, that the experimenter will tend to choose the weight function $w(\xi)$, the size of his region R, and the degree of his approximating function in such a way that the integrated random error and the integrated systematic error are about equal. Thus we shall suppose that the situation of typical interest is that where B is roughly equal to V.

All-bias designs

If the problem of choosing a suitable experiment design is considered in the context described above, a major result can be deduced. An appropriate experimental design for an "average situation" when V and B are roughly equal has size roughly 10%


greater than the all-bias design, appropriate when V = 0. This result is important because the moments of the all-bias design are easily determined. Suppose that we now work in terms of variables x, where the x's are coded forms of the $\xi$'s, centered around the origin, a conventional step. Suppose further that a polynomial model of degree $d_1$

$$\hat{y}(x) = \mathbf{x}_1'b_1$$

is fitted to the data, while the true model is a polynomial of degree $d_2$,

$$\eta(x) = \mathbf{x}_1'\beta_1 + \mathbf{x}_2'\beta_2.$$

Thus, for the complete set of N data points,

$$\hat{y} = X_1b_1, \qquad \eta = X_1\beta_1 + X_2\beta_2.$$

Quite often, it would be reasonable to choose $d_2 = d_1 + 1$. Let us now write

$$M_{11} = N^{-1}X_1'X_1, \qquad M_{12} = N^{-1}X_1'X_2,$$

$$\mu_{11} = \int_O w(x)\,\mathbf{x}_1\mathbf{x}_1'\,dx, \qquad \mu_{12} = \int_O w(x)\,\mathbf{x}_1\mathbf{x}_2'\,dx.$$

It can now be shown that, whatever the values of $\beta_1$ and $\beta_2$, a necessary and sufficient condition for the squared bias B to be minimized is that

$$M_{11}^{-1}M_{12} = \mu_{11}^{-1}\mu_{12}.$$

A sufficient (but not necessary) condition for B to be minimized is that

$$M_{11} = \mu_{11} \quad\text{and}\quad M_{12} = \mu_{12}.$$

Now the elements of $\mu_{11}$ and $\mu_{12}$ are of the form

$$\int_O w(x)\,x_1^{\alpha_1}x_2^{\alpha_2}\cdots x_k^{\alpha_k}\,dx$$

and the elements of $M_{11}$ and $M_{12}$ are of the form

$$N^{-1}\sum_{u=1}^{N} x_{1u}^{\alpha_1}x_{2u}^{\alpha_2}\cdots x_{ku}^{\alpha_k}.$$


These typical elements are, respectively, moments of the weight function and moments of the design points of order $\alpha = \alpha_1 + \alpha_2 + \cdots + \alpha_k$.

Thus, the sufficient condition above states that, up to and including order $d_1 + d_2$, all the moments of the design are equal to all the moments of the weight function.

EXAMPLE 1. Suppose we wish to fit a straight line $y = \beta_0 + \beta_1x + \varepsilon$ to data to be taken over the region R, $-1 \le x \le 1$, where the weight function w(x) is uniform within R and zero outside R. Suppose quadratic bias is slightly feared. Then the all-bias design is obtained when the design moments $m_1$, $m_2$, $m_3$, where

$$m_i = N^{-1}\sum_{u=1}^{N} x_u^i,$$

are chosen to be $m_1 = m_3 = 0$, because $\mu_1 = \mu_3 = 0$, and

$$m_2 = \mu_2 = \int_{-1}^{1} x^2\,dx \Big/ \int_{-1}^{1} dx = \frac{1}{3}.$$

It follows that, if we use a three-site, three-point design at positions x = -a, 0, a, we must choose $2a^2/3 = 1/3$ or $a = 2^{-1/2} = 0.707$. For a typical case where V = B roughly, we could increase a slightly to (say) 0.75 or 0.80, about 10% or so.

EXAMPLE 2. In k dimensions, fitting a plane and fearing a quadratic, with R the unit sphere, an all-bias design is a $2^{k-p}$ design with points $(\pm a, \pm a, \ldots, \pm a)$ such that

$$2^{k-p}a^2/n = k/(k + 2),$$

which implies, if $n_0$ center points are used, that

$$a = \left\{\frac{k(2^{k-p} + n_0)}{(k + 2)\,2^{k-p}}\right\}^{1/2}.$$

Note that the special case k = 1, p = 0, $n_0$ = 1 is Example 1. For k = 4, p = 0, $n_0$ = 2, we have a = 0.866 for the all-bias case. Note that this places the factorial points at distances $r = (4a^2)^{1/2} = 3^{1/2}$ from the origin, that is, outside R.
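The two numerical values quoted in the examples can be checked with a short calculation. The sketch below assumes the scale formula of Example 2 in the form $a = \{k(2^{k-p} + n_0)/[(k + 2)2^{k-p}]\}^{1/2}$, which follows from $2^{k-p}a^2/n = k/(k+2)$ with $n = 2^{k-p} + n_0$; the function name is hypothetical.

```python
from math import sqrt

def all_bias_scale(k, p, n0):
    """Scale a of the all-bias 2^(k-p) design with n0 center points.

    Solves 2^(k-p) * a^2 / n = k / (k + 2) with n = 2^(k-p) + n0.
    """
    cube = 2 ** (k - p)
    return sqrt(k * (cube + n0) / ((k + 2) * cube))

print(round(all_bias_scale(1, 0, 1), 3))   # Example 1: three points -a, 0, a
print(round(all_bias_scale(4, 0, 2), 3))   # k = 4, p = 0, n0 = 2
```

The first call reproduces a = 0.707 from Example 1 and the second reproduces a = 0.866, for which the factorial points lie at distance $(4a^2)^{1/2} = 3^{1/2}$ from the origin.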

Detecting lack of fit

Consider the mechanics of making a test of goodness of fit using the analysis of variance. Suppose we are estimating p parameters, observations are made at p + f distinct points, and repeated observations are made at certain of these points to provide e pure error degrees of freedom, so that the total number of observations is N = p + f + e.


The expectation of the unbiased pure error mean square is $\sigma^2$, the experimental error variance, and the expected value of the lack of fit mean square equals $\sigma^2 + \Lambda^2/f$, where $\Lambda^2$ is a noncentrality parameter. The test for goodness of fit is now made by comparing the mean square for lack of fit against the mean square for pure error, via an F(f, e) test. In general, the noncentrality parameter takes the form

$$\Lambda^2 = \sum_{u=1}^{N}\{E(\hat{y}_u) - \eta_u\}^2 = E(S_L) - f\sigma^2, \qquad (12)$$

where $S_L$ is the lack of fit sum of squares. Thus, good detectability of general lack of fit can be obtained by choosing a design that makes $\Lambda^2$ large. It turns out that this requirement of good detection of model inadequacy can, like the earlier requirement of good estimation, be achieved by certain conditions on the design moments. Thus, under certain sensible assumptions, it can be shown that a ($d_1$ =) dth order design would provide high detectability for terms of order ($d_2$ =) (d + 1) if (1) all odd design moments of order (2d + 1) or less are zero, and (2) the ratio

$$\frac{N^d\sum_{u=1}^{N} r_u^{2(d+1)}}{\left\{\sum_{u=1}^{N} r_u^2\right\}^{d+1}} \qquad (13)$$

is large, where

$$r_u^2 = x_{1u}^2 + x_{2u}^2 + \cdots + x_{ku}^2.$$

In particular, this would require that, for a first-order design (d = 1), the ratio $N\sum r_u^4/\{\sum r_u^2\}^2$ should be large to provide high detectability of quadratic lack of fit; for a second-order design (d = 2), the ratio $N^2\sum r_u^6/\{\sum r_u^2\}^3$ should be large to provide high detectability of cubic lack of fit. Note that, for the $2^{k-p}$ design of Example 2 above, the detectability criterion is independent of the size of a, which cancels out. Increasing the number of center points slightly increases detectability, however, since this is determined by contrasting the factorial-point average response with the center-point average response.
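The closing remarks about the detectability ratio can be illustrated numerically. The sketch below assumes the ratio has the form $N^d\sum_u r_u^{2(d+1)}/\{\sum_u r_u^2\}^{d+1}$ with $r_u^2 = \sum_i x_{iu}^2$, which agrees with the d = 1 and d = 2 special cases quoted above; the function names and the particular designs are hypothetical.

```python
from itertools import product

def detectability_ratio(design, d):
    """N^d * sum_u r_u^(2(d+1)) / (sum_u r_u^2)^(d+1), with r_u^2 = sum_i x_iu^2."""
    n = len(design)
    r2 = [sum(x * x for x in point) for point in design]
    return n ** d * sum(r ** (d + 1) for r in r2) / sum(r2) ** (d + 1)

def two_level_design(k, a, n0):
    """Full 2^k factorial at (+-a, ..., +-a) plus n0 center points."""
    return list(product((-a, a), repeat=k)) + [(0.0,) * k] * n0

# First-order design (d = 1) in k = 2 factors.
print(detectability_ratio(two_level_design(2, 1.0, 1), 1))   # scale a = 1
print(detectability_ratio(two_level_design(2, 0.5, 1), 1))   # scale a = 0.5
print(detectability_ratio(two_level_design(2, 1.0, 3), 1))   # more center points
```

For these designs every factorial point has the same $r_u^2 = ka^2$, so the ratio reduces to N divided by the number of factorial points: a cancels, and adding center points raises the value (5/4 with one center point, 7/4 with three).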

Further reading

For additional commentary, see Chapter 13 of Box and Draper (1987) and Chapter 6 of Khuri and Cornell (1987). Related work includes Draper and Sanders (1988), DuMouchel and Jones (1994), and Wiens (1993).


12. Some other second order designs

The central composite design is an excellent design for fitting a second-order response surface, but there are also other useful designs available. We now mention some of these briefly.

The $3^k$ factorial designs

The $3^k$ factorial design series consists of all possible combinations of three levels of k input variables. These levels are usually coded to -1, 0, and 1. For the k = 2 case, the design matrix is

         x1   x2
         -1   -1
          0   -1
          1   -1
         -1    0
    D =   0    0
          1    0
         -1    1
          0    1
          1    1

Such a design can actually be used to fit a model of form $E(y) = \beta_0 + \beta_1x_1 + \cdots$, although the cubic and quartic terms would usually be associated with "error degrees of freedom". To reduce the total number of experimental design points when k is large, fractional replications $3^{k-p}$ would often be employed if the number of runs were enough to fit the full second-order model. An extended table of fractional $3^{k-p}$ designs is given by Connor and Zelen (1959).
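Enumerating a full three-level factorial is straightforward; the following sketch lists the $3^k$ points with the first factor varying fastest, which is the usual standard order for the k = 2 design. The function name is a hypothetical choice.

```python
from itertools import product

def three_level_factorial(k):
    """All 3^k combinations of the coded levels -1, 0, 1 (x1 varies fastest)."""
    return [tuple(reversed(levels)) for levels in product((-1, 0, 1), repeat=k)]

for x1, x2 in three_level_factorial(2):
    print(f"{x1:>2} {x2:>2}")
```

The run count grows as $3^k$ (9, 27, 81, ... points), which is why the fractional $3^{k-p}$ designs mentioned above become attractive when k is large.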

The Box-Behnken designs

The Box-Behnken (1960) designs were constructed for situations in which it was desired to fit a second-order model (3), but only three levels of each predictor variable $x_1, x_2, \ldots, x_k$, coded to -1, 0, and 1, could be used. The design points are carefully chosen subsets of the points of $3^k$ factorial designs, and are generated through balanced incomplete block designs (BIBD) or partially balanced incomplete block designs (PBIBD). They are available for k = 3-7, 9-12, and 16. (See Table 6 for k = 3-7.) They are either rotatable (for k = 4 and 7) or close to rotatable. Except for the designs for which k = 3 and 11, all can be blocked orthogonally. The designs


Table 6
The Box-Behnken (1960) designs, 3 ≤ k ≤ 7

Number of factors k = 3. No orthogonal blocking. BIB (one associate class).
    Design matrix                         No. of points
    (±1  ±1   0)
    (±1   0  ±1)                          12
    ( 0  ±1  ±1)
    ( 0   0   0)                           3
                                          N = 15

Number of factors k = 4. 3 blocks of 9. BIB (one associate class).
    Design matrix                         No. of points
    (±1  ±1   0   0)
    ( 0   0  ±1  ±1)                       8
    ( 0   0   0   0)                       1
    ------------------
    (±1   0   0  ±1)
    ( 0  ±1  ±1   0)                       8
    ( 0   0   0   0)                       1
    ------------------
    (±1   0  ±1   0)
    ( 0  ±1   0  ±1)                       8
    ( 0   0   0   0)                       1
                                          N = 27

Number of factors k = 5. 2 blocks of 23. BIB (one associate class).
    Design matrix                         No. of points
    (±1  ±1   0   0   0)
    ( 0   0  ±1  ±1   0)
    ( 0  ±1   0   0  ±1)                  20
    (±1   0  ±1   0   0)
    ( 0   0   0  ±1  ±1)
    ( 0   0   0   0   0)                   3
    ----------------------
    ( 0  ±1  ±1   0   0)
    (±1   0   0  ±1   0)
    ( 0   0  ±1   0  ±1)                  20
    (±1   0   0   0  ±1)
    ( 0  ±1   0  ±1   0)
    ( 0   0   0   0   0)                   3
                                          N = 46

Number of factors k = 6. 2 blocks of 27. First associates: (1, 4); (2, 5); (3, 6).
    Design matrix                         No. of points
    (±1  ±1   0  ±1   0   0)
    ( 0  ±1  ±1   0  ±1   0)
    ( 0   0  ±1  ±1   0  ±1)              48
    (±1   0   0  ±1  ±1   0)
    ( 0  ±1   0   0  ±1  ±1)
    (±1   0  ±1   0   0  ±1)
    ( 0   0   0   0   0   0)               6
                                          N = 54

Number of factors k = 7. 2 blocks of 31. BIB (one associate class).
    Design matrix                         No. of points
    ( 0   0   0  ±1  ±1  ±1   0)
    (±1   0   0   0   0  ±1  ±1)
    ( 0  ±1   0   0  ±1   0  ±1)
    (±1  ±1   0  ±1   0   0   0)          56
    ( 0   0  ±1  ±1   0   0  ±1)
    (±1   0  ±1   0  ±1   0   0)
    ( 0  ±1  ±1   0   0  ±1   0)
    ( 0   0   0   0   0   0   0)           6
                                          N = 62


Table 7
Rechtschaffner's (1967) point sets

    Number   Points        Design generator (point set)              Typical point
    I        1             (+1, +1, ..., +1) or (-1, -1, ..., -1)    (+1, +1, ..., +1)
    II       k             One +1 and all other -1                   (+1, -1, ..., -1)
    III      k(k - 1)/2    Two +1 and all other -1                   (+1, +1, -1, ..., -1)
    IV       k             One +1 and all other 0                    (+1, 0, ..., 0)

Table 8
Point sets of Box and Draper (1972, 1974)

    Number   Points        Design generator (point set)              Typical point
    I        1             (+1, +1, ..., +1) or (-1, -1, ..., -1)    (-1, -1, ..., -1)
    II       k             One +1 and all other -1                   (+1, -1, ..., -1)
    III      k(k - 1)/2    Two λ and all other -1                    (λ, λ, -1, ..., -1)
    IV       k             One μ and all other 1                     (μ, 1, ..., 1)

have a relatively modest number of runs compared to the number of parameters in the corresponding second-order models. For additional appreciation of the usefulness of these designs, see Draper, Davis, Pozueta and Grove (1994).

Some minimal-point second-order designs

Lucas (1974) gave minimal-point designs not of composite type that he called "smallest symmetric composite designs," which consist of one center point, 2k star points, and $\binom{k}{2}$ "edge points." An edge point is a k × 1 vector having ones in the ith and jth locations and zeros elsewhere. Note that the edge point designs do not contain any two-level factorial points. Rechtschaffner (1967) used four different so-called design generators (actually point sets) to construct minimal-point designs for estimating a second-order surface (see Table 7). The signs of design generators I, II, and III can be varied (e.g., we may have one -1 and all other +1 in design generator II, say). Rechtschaffner's designs are available for k = 2, 3, 4, ..., but, as pointed out by Notz (1982), they have an asymptotic D-efficiency of 0 as k → ∞ with respect to the class of saturated designs. Box and Draper (1971, 1974) provided other minimal-point designs for k = 2, 3, 4, and 5, made up from the design generators (point sets) shown in Table 8. Values for λ and μ were tabulated in the 1974 article. Kiefer, in unpublished correspondence, established, via an existence result, that this type of design cannot be optimal for k ≥ 7, however. Box and Draper's designs were given for k ≤ 5, though they can be generated for any k. Mitchell and Bayne (1976) used a computer algorithm called DETMAX that Mitchell (1974) developed earlier to find an n-run design that maximizes |X'X|,


given n, a specified model, and a set of "candidate" design points. For each value of k = 2, 3, 4, and 5, they ran the algorithm 10 times, each time starting with a different randomly selected initial n-run design. The algorithm then improved the starting design by adding or removing points according to a so-called "excursion" scheme until no further improvement was possible. Notz (1982) studied designs for which p = n. He partitioned X so that

$$X = \begin{bmatrix} Z_1 \\ Z_2 \end{bmatrix} = \begin{bmatrix} Y_{11} & Y_{12} \\ Y_{21} & Y_{22} \end{bmatrix},$$

where $Z_1$ is (p - k) × p and $Z_2$ is k × p. Note that $Y_{11}$ is (p - k) × (p - k), $Y_{12}$ is (p - k) × k, $Y_{21}$ is k × (p - k), and $Y_{22}$ is k × k, and we can think of $Z_1$ as representing the cube points and $Z_2$ the star points; $Y_{12}$ over $Y_{22}$ consists of the columns $(x_1^2, x_2^2, \ldots, x_k^2)$. Thus (a) all elements in $Y_{11}$ are either +1 or -1, (b) all elements in $Y_{21}$ are either 1 or 0, and, more important, (c) all elements in $Y_{12}$ are +1. It follows that $|X| = |X'X|^{1/2} = |Y_{11}| \cdot |Y_{22} - J_{k,k}|$, where $J_{k,k}$ is a k × k matrix with all of its elements equal to 1. Maximization of |X'X| is now equivalent to maximization of $|Y_{11}|$ and $|Y_{22} - J_{k,k}|$ separately. Notz found new saturated designs for k ≤ 5 and extended his result to the k = 6 case. Most of the minimal-point designs available for k ≥ 7 comprise the extensions of Lucas's (1974) or Rechtschaffner's (1967) or Box and Draper's (1971, 1974) designs. Minimal-point designs can also be obtained by using the methods given in Draper (1985) and Draper and Lin (1990a), employing projections of Plackett and Burman designs for k = 3, 5, 6, 7, and 10. See Section 8. Their main virtues are that they are easy to construct and of composite form, providing orthogonal or near orthogonal designs and including other previously known small composite designs as special cases. For other designs and related considerations, see Khuri and Cornell (1987). A comparison of all the designs we have discussed in this section is made in Draper and Lin (1990a).
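To make the "minimal-point" property concrete, the four point sets of Table 7 can be generated and counted. The sketch below is illustrative only: it takes the all -1 option for point set I (one of the two choices the table allows) and does not attempt to reproduce the sign variations Rechtschaffner considered; the function name is hypothetical.

```python
from itertools import combinations

def rechtschaffner_points(k):
    """Points from the four point sets of Table 7, with set I taken as all -1."""
    points = [(-1,) * k]                                          # set I: 1 point
    points += [tuple(1 if i == j else -1 for i in range(k))       # set II: k points
               for j in range(k)]
    points += [tuple(1 if i in pair else -1 for i in range(k))    # set III: k(k-1)/2
               for pair in combinations(range(k), 2)]
    points += [tuple(1 if i == j else 0 for i in range(k))        # set IV: k points
               for j in range(k)]
    return points

for k in (2, 3, 4, 5):
    n_runs = len(rechtschaffner_points(k))
    n_params = (k + 1) * (k + 2) // 2      # terms in the full second-order model
    print(k, n_runs, n_params)
```

For every k the run count 1 + k + k(k - 1)/2 + k equals (k + 1)(k + 2)/2, the number of coefficients in the second-order model, so the design is saturated.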

References

Bates, D. M. and D. G. Watts (1988). Nonlinear Regression Analysis and Its Applications. Wiley, New York.
Box, G. E. P. (1952). Multifactor designs of first order. Biometrika 39, 40-57. (See also p. 189, note by Tocher.)
Box, G. E. P. (1959). Answer to query: Replication of non-center points in the rotatable and near-rotatable central composite design. Biometrics 15, 133-135.
Box, G. E. P. and D. W. Behnken (1960). Some new three level designs for the study of quantitative variables. Technometrics 2, 455-475.
Box, G. E. P. and S. Bisgaard (1993). What can you find out from 12 experimental runs. Quality Eng. 4.
Box, G. E. P. and N. R. Draper (1959). A basis for the selection of a response surface design. J. Amer. Statist. Assoc. 54, 622-654.
Box, G. E. P. and N. R. Draper (1963). The choice of a second order rotatable design. Biometrika 50, 335-352.


Box, G. E. P. and N. R. Draper (1987). Empirical Model-Building and Response Surfaces. Wiley, New York.
Box, G. E. P. and J. S. Hunter (1957). Multifactor experimental designs for exploring response surfaces. Ann. Math. Statist. 28, 195-241.
Box, G. E. P. and J. S. Hunter (1961). The $2^{k-p}$ fractional factorial designs, Parts I and II. Technometrics 3, 311-351 and 449-458.
Box, G. E. P., W. G. Hunter and J. S. Hunter (1978). Statistics for Experimenters. Wiley, New York.
Box, G. E. P. and K. B. Wilson (1951). On the experimental attainment of optimum conditions. J. Roy. Statist. Soc. Ser. B 13, 1-38, discussion 38-45.
Box, M. J. and N. R. Draper (1971). Factorial designs, the $|X'X|$ criterion and some related matters. Technometrics 13, 731-742. Corrections 14 (1972), 511 and 15 (1973), 430.
Box, M. J. and N. R. Draper (1974). Some minimum point designs for second order response surfaces. Technometrics 16, 613-616.
Connor, W. S. and M. Zelen (1959). Fractional factorial experimental designs for factors at three levels. U.S. Dept. of Commerce, National Bureau of Standards, Applied Math. Series No. 54.
Davies, O. L., ed. (1978). Design and Analysis of Industrial Experiments. Longman Group, New York.
De Baun, R. M. (1956). Block effects in the determination of optimum conditions. Biometrics 12, 20-22.
De Baun, R. M. (1959). Response surface designs for three factors at three levels. Technometrics 1, 1-8.
Draper, N. R. (1982). Center points in response surface designs. Technometrics 24, 127-133.
Draper, N. R. (1984). Schläflian rotatability. J. Roy. Statist. Soc. Ser. B 46, 406-411.
Draper, N. R. (1985). Small composite designs. Technometrics 27, 173-180.
Draper, N. R., T. P. Davis, L. Pozueta and D. M. Grove (1994). Isolation of degrees of freedom for Box-Behnken designs. Technometrics 36, 283-291.
Draper, N. R. and H. Smith (1981). Applied Regression Analysis. Wiley, New York.
Draper, N. R. and D. K. J. Lin (1990a). Small response surface designs. Technometrics 32, 187-194.
Draper, N. R. and D. K. J. Lin (1990b). Connections between two-level designs of resolution III* and V. Technometrics 32, 283-288.
Draper, N. R., N. Gaffke and F. Pukelsheim (1991). First and second-order rotatability of experimental designs, moment matrices, and information functions. Metrika 38, 129-161.
Draper, N. R. and F. Pukelsheim (1990). Another look at rotatability. Technometrics 32, 195-202.
Draper, N. R. and E. R. Sanders (1988). Designs for minimum bias estimation. Technometrics 30, 319-325.
DuMouchel, W. and B. Jones (1994). A simple Bayesian modification of D-optimal designs to reduce dependence on an assumed model. Technometrics 36, 37-47.
Dykstra, O. (1959). Partial duplication of factorial experiments. Technometrics 1, 63-75.
Dykstra, O. (1960). Partial duplication of response surface designs. Technometrics 2, 185-195.
Hall, M. J. (1961). Hadamard matrix of Order 16. Jet Propulsion Laboratory Summary 1, 21-26.
Hall, M. J. (1965). Hadamard matrix of Order 20. Jet Propulsion Laboratory Technical Report 1, 32-76.
Hartley, H. O. (1959). Small composite designs for quadratic response surfaces. Biometrics 15, 611-624.
Herzberg, A. M. (1982). The robust design of experiments: A review. Serdica Bulgaricae Math. Publ. 8, 223-228.
Khuri, A. I. and J. A. Cornell (1987). Response Surfaces, Designs and Analyses. Marcel Dekker/ASQC Quality Press, New York/Milwaukee.
Lin, D. K. J. and N. R. Draper (1991). Projection properties of Plackett and Burman designs. Tech. Report 885, Department of Statistics, University of Wisconsin.
Lin, D. K. J. and N. R. Draper (1992). Projection properties of Plackett and Burman designs. Technometrics 34, 423-428.
Lin, D. K. J. and N. R. Draper (1995). Screening properties of certain two level designs. Metrika 42, 99-118.
Lucas, J. M. (1974). Optimum composite designs. Technometrics 16, 561-567.
Mitchell, T. J. (1974). An algorithm for the construction of D-optimal experimental designs. Technometrics 16, 203-210.
Mitchell, T. J. and C. K. Bayne (1978). D-optimal fractions of three-level fractional designs. Technometrics 20, 369-380, discussion 381-383.
Notz, W. (1982). Minimal point second order designs. J. Statist. Plann. Inference 6, 47-58.


Plackett, R. L. and J. P. Burman (1946). The design of optimum multifactorial experiments. Biometrika 33, 305-325.
Rechtschaffner, R. L. (1967). Saturated fractions of $2^n$ and $3^n$ fractional designs. Technometrics 9, 569-575.
Seber, G. A. F. and C. J. Wild (1989). Nonlinear Regression. Wiley, New York.
Wang, J. C. and C. F. J. Wu (1995). A hidden projection property of Plackett-Burman and related designs. Statist. Sinica 5, 235-250.
Welch, W. J. (1984). Computer-aided design of experiments for response estimation. Technometrics 26, 217-224.
Westlake, W. J. (1965). Composite designs based on irregular fractions of factorials. Biometrics 21, 324-336.
Wiens, D. P. (1993). Designs for approximately linear regression: Maximizing the minimum coverage probability of confidence ellipsoids. Canad. J. Statist. 21, 59-70.

S. Ghosh and C. R. Rao, eds., Handbook of Statistics, Vol. 13 © 1996 Elsevier Science B.V. All rights reserved.


Multiresponse Surface Methodology

André I. Khuri

1. Introduction

The formal development of response surface methodology (RSM) was initiated by the work of Box and Wilson (1951), which introduced the sequential approach in an experimental investigation. This particular approach became the cornerstone and one of the characteristic trademarks of RSM. It was effectively utilized in many applications, particularly in the chemical industry. The article by Myers et al. (1989) provides a broad review of RSM. Earlier, Hill and Hunter (1966) emphasized practical applications of RSM in the chemical industry. This was followed by another review article by Mead and Pike (1975) where the emphasis was on biological applications of RSM. In addition to these review articles, the four books by Box and Draper (1987), Khuri and Cornell (1996), Myers (1976), and Myers and Montgomery (1995) give a comprehensive coverage of the various techniques used in RSM.

In the early development of RSM, only single-response variables were considered. Quite often, however, several response variables may be of interest in an experimental situation. By definition, an experiment in which a number of response variables are measured for each setting of a group of input (control) variables is called a multiresponse experiment. The analysis of such experiments did not receive much attention until the publication of Zellner (1962) and Box and Draper (1965). Both articles addressed the problem of estimation of parameters for several response models. This, however, does not mean that no multiresponse experiments were performed prior to 1962. Lind et al. (1960), for example, considered a two-response experiment in an attempt to determine conditions that were favorable to both responses.

Data obtained in a multiresponse experiment (multiresponse data) are multivariate in character. It is therefore necessary that multivariate techniques be deployed for the analysis of such data. This way, information from several response variables can be combined. As we shall see later in this chapter, the multivariate approach has several advantages over the univariate approach. The latter is based on a one-response-at-a-time treatment of a multiresponse system. This amounts to a total disregard of the multivariate nature of the experiment. On the other hand, the former enables us to gain a better understanding of the underlying mechanism as well as acquire information about any relationships that may exist among the response variables. This added information can be instrumental in providing more precise estimates of optimal process conditions, for example, and more accurate analyses of the multiresponse data.


The purpose of this chapter is to provide a thorough coverage of the basic methods used in RSM for the design and analysis of multiresponse experiments. An earlier review of this subject was given in Khuri (1990a). The present chapter is more up-to-date and more extensive in scope and coverage. It should therefore be useful to those who have an interest in RSM, and would like to explore extending its applicability to multiresponse situations. It is hoped that the information given in this chapter will provide a stimulus to the reader to pursue further research in the multiresponse area. The main topics covered in this chapter include the following:

• Plotting of multiresponse data.
• Estimation of parameters of a multiresponse model.
• Inference for multiresponse models.
• Designs for multiresponse models.
• Multiresponse optimization.

2. Plotting of multiresponse data

A graphical display of a data set provides a visual perception of its structure. It can convey a variety of information and may reveal some salient features of the data. Plotting is therefore an effective tool in data analysis. This is evidently true in two-dimensional and, to a lesser extent, three-dimensional plots. For example, in regression analysis, a plot of residuals is a useful diagnostic tool for checking the validity of assumptions made about the data and the fitted model. While two-dimensional and three-dimensional plots are easy to visualize and interpret, the same is not true in higher dimensions. For example, it is difficult to form a mental image of a four-dimensional scatter plot. Yet, if we are to acquire better insight into a multiresponse data set and extract more information from it, an efficient scheme for producing a multidimensional plot is essential (see Tukey, 1962, Section 4).

There are several techniques for graphing multiresponse data. Some techniques plot projections of the data on subspaces of dimensions three or fewer. For example, if variables are considered two at a time, pairs of values can be obtained and used to produce two-dimensional scatter plots. This generates an array of plots that can be arranged in a scatter plot matrix (see Cleveland, 1993, Chapter 5), or in a so-called generalized draftsman's display (see Chambers et al., 1983, Section 5.4).

Other techniques use symbolic coding schemes. For example, Anderson (1960) proposed using small circles of fixed radius, which he called glyphs, instead of points in a two-dimensional scatter plot. Several rays of various lengths emanate from each glyph representing values of different variables, which may be quantitative or qualitative (e.g., low, medium, high). Anderson extended the use of glyphs to situations other than scatter diagrams and subsequently used the more general term metroglyphs. Friedman et al. (1972) introduced a similar coding scheme called stars where the rays of a glyph are joined up to form polygons. Several other variants of the glyph technique are reported in Fienberg (1979). Chernoff (1973) used cartoon faces to represent data values by means of different facial features such as the length of a nose, shape of a smile, and size of eyes. This technique is not easy to use or interpret and has several major problems that were pointed out by Fienberg (1979). Kleiner and Hartigan (1981) introduced a method in which each point is represented by a tree. Variables are assigned in a particular manner to the branches of a tree whose lengths determine the values of the variables. Additional examples of symbolic plotting can be found in Everitt (1978), Tukey and Tukey (1981), and Chambers et al. (1983).

Other techniques for viewing multidimensional data are more mathematical in nature. They are based on projecting the data orthogonally onto low-dimensional subspaces, for example, of dimension two. Prominent among such techniques are projection pursuit by Friedman and Tukey (1974) and Huber (1985) (see also Jones and Sibson, 1987), and the grand tour by Asimov (1985). In the latter, a sequence of projections, chosen to be dense in the set of all projections, is selected for viewing. These techniques can lead to important insights concerning data. It is difficult, however, to interpret and draw conclusions regarding the structure of data from knowledge of projections only. A great deal of experience is therefore needed in order to extract meaningful information from the projections.

Another scheme for multidimensional plotting is based on representing each point by means of a curve or a broken line in a two-dimensional space. Andrews (1972) proposed a method for representing a point $x = (x_1, x_2, \ldots, x_k)'$ in a $k$-dimensional space by a finite series of the form

$$f_x(t) = \frac{1}{\sqrt{2}}\, x_1 + x_2 \sin t + x_3 \cos t + x_4 \sin 2t + x_5 \cos 2t + \cdots. \tag{2.1}$$

Thus $x$ is mapped into the function $f_x(t)$, which is then plotted over the interval $-\pi < t < \pi$. We note that the $k$ terms in the sequence of functions,

$$\{g_1(t), g_2(t), \ldots, g_k(t)\} = \left\{ \frac{1}{\sqrt{2}},\ \sin t,\ \cos t,\ \sin 2t,\ \cos 2t,\ \ldots \right\} \tag{2.2}$$

are orthonormal on $(-\pi, \pi)$ in the sense that

$$\int_{-\pi}^{\pi} g_i(t)\, g_j(t)\, dt = 0, \quad i \ne j; \qquad \frac{1}{\pi} \int_{-\pi}^{\pi} g_i^2(t)\, dt = 1, \quad i = 1, 2, \ldots, k.$$
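As a numerical illustration of (2.1) and (2.2), the following sketch evaluates an Andrews curve for one point and approximates the inner products of the basis functions by a midpoint rule. It is a minimal sketch, not code from Andrews (1972): the function names, the example point, and the grid sizes are assumptions made here.

```python
import math

def andrews_basis(k, t):
    """Return [g_1(t), ..., g_k(t)] from (2.2): 1/sqrt(2), sin t, cos t, sin 2t, ..."""
    terms = [1.0 / math.sqrt(2.0)]
    for j in range(2, k + 1):
        harmonic = j // 2                      # j = 2, 3 -> 1; j = 4, 5 -> 2; ...
        trig = math.sin if j % 2 == 0 else math.cos
        terms.append(trig(harmonic * t))
    return terms[:k]

def andrews_curve(x, t):
    """Evaluate f_x(t) = x' g(t) as in (2.1)."""
    return sum(xi * gi for xi, gi in zip(x, andrews_basis(len(x), t)))

def gram_matrix(k, n_grid=2000):
    """Approximate (1/pi) * integral of g_i g_j over (-pi, pi) by the midpoint rule."""
    h = 2.0 * math.pi / n_grid
    gram = [[0.0] * k for _ in range(k)]
    for step in range(n_grid):
        g = andrews_basis(k, -math.pi + (step + 0.5) * h)
        for i in range(k):
            for j in range(k):
                gram[i][j] += g[i] * g[j] * h / math.pi
    return gram

point = [1.0, 0.5, -2.0, 0.25, 3.0]   # hypothetical 5-dimensional observation
curve = [andrews_curve(point, -math.pi + 2.0 * math.pi * s / 100) for s in range(101)]
gram = gram_matrix(len(point))        # should be close to the identity matrix
```

Plotting `curve` against the corresponding values of $t$ for each data point would give the Andrews plot; the near-identity `gram` matrix reflects the orthonormality stated above.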

We also note that $f_x(t)$ is the dot product $x'g(t)$, where $g(t)$ is a vector consisting of the terms of the sequence in (2.2). Thus for a specified value of $t$, say $t_0$,

$$f_x(t_0) = \|g(t_0)\|\, \|x\| \cos \phi_{x,t_0},$$

where $\|\cdot\|$ denotes the Euclidean norm of a vector and $\phi_{x,t_0}$ is the angle between the vectors $x$ and $g(t_0)$. Hence, the absolute value of $f_x(t_0)$ is proportional to the length of the orthogonal projection of $x$ on the vector $g(t_0)$. It follows that if we have a data set consisting of $n$ points, $x_1, x_2, \ldots, x_n$, in a $k$-dimensional space, then the values of $|f_{x_i}(t_0)|$ are proportional to the lengths of the orthogonal projections of the data points on a one-dimensional subspace determined by $g(t_0)$. Several such one-dimensional views of the data can be obtained by varying the value of $t_0$ over the interval $(-\pi, \pi)$. Andrews' plots can therefore be classified along with the grand tour and the projection pursuit techniques.

Andrews (1972) outlined several useful statistical properties of the function $f_x(t)$. These properties are also described in Gnanadesikan (1977, Section 6.2), which gives a detailed discussion and several illustrations of this class of plots. It should be noted that Andrews' plots depend on the order of the variables in formula (2.1). They also depend on the scale of the variables since large values of $x_1$ can mask the effects of other smaller values. Embrechts and Herzberg (1991) investigated the effects of reordering and rescaling of the variables on the plots.

Another related plotting technique is the method of parallel coordinates, which was suggested by Wegman (1990). He proposed to draw the coordinate axes in a $k$-dimensional Euclidean space as parallel. Using this scheme, a point such as $x = (x_1, x_2, \ldots, x_k)'$ is represented by plotting $x_1$ on the first axis, $x_2$ on the second axis, ..., $x_k$ on the $k$th axis. The resulting points on the $k$ axes are then connected by a broken line. Thus each point in a data set is mapped into a broken line in a two-dimensional space. Gennings et al. (1990) used this method to plot the dose-response surface and its contours of constant response for a combination of drugs.

It can be seen that the available techniques for plotting multidimensional data are, in general, not easy to implement and interpret, even by an expert user. Advances in computer technology should make this task easier. In this respect, it can be stated that Andrews' plots are considered to be easier to use and interpret than most existing techniques.

Plotting of multiresponse data serves as an exploratory tool in a response surface investigation. It is not, however, a substitute for a formal analysis of the data. The next four sections provide a broad coverage of methods available for the design and analysis of multiresponse experiments. More specifically, Section 3 addresses the fitting of a multiresponse model. Section 4 discusses inference making procedures, mainly for linear multiresponse models. Section 5 describes several criteria that can be used for the choice of a multiresponse design. Finally, Section 6 is concerned with the problem of multiresponse optimization. Some concluding remarks are given in Section 7.

3. Estimation of parameters of a multiresponse model

Suppose that a number of response variables are measured for each setting of a group of $k$ input variables denoted by $x_1, x_2, \ldots, x_k$. Let $\{y_{ui}\}$ ($i = 1, 2, \ldots, r$; $u = 1, 2, \ldots, n$) represent $n$ sets of observations on each of $r$ response variables denoted by $y_1, y_2, \ldots, y_r$. The setting of $x_i$ at the $u$th experimental run is denoted by $x_{ui}$ ($i = 1, 2, \ldots, k$; $u = 1, 2, \ldots, n$). The relationship between the response variables and the input variables is given by the model

$$y_{ui} = f_i(x_u, \theta) + \varepsilon_{ui}, \quad i = 1, 2, \ldots, r; \quad u = 1, 2, \ldots, n, \tag{3.1}$$


where $x_u = (x_{u1}, x_{u2}, \ldots, x_{uk})'$, $\theta = (\theta_1, \theta_2, \ldots, \theta_p)'$ is a vector of $p$ unknown parameters, $\varepsilon_{ui}$ is a random error, and $f_i$ is a response function of known form. Model (3.1) can be represented in matrix form as

$$Y = F(\mathcal{D}, \theta) + \varepsilon, \tag{3.2}$$

where $Y = [y_1 : y_2 : \cdots : y_r]$ is an $n \times r$ matrix whose $i$th column is $y_i$, the vector of observations from the $i$th response ($i = 1, 2, \ldots, r$), $\varepsilon = [\varepsilon_1 : \varepsilon_2 : \cdots : \varepsilon_r]$ is the corresponding matrix of random errors, $\mathcal{D}$ denotes the design matrix, which is of order $n \times k$ with rows equal to $x_1', x_2', \ldots, x_n'$, and $F(\mathcal{D}, \theta)$ is a matrix of order $n \times r$ whose $i$th column consists of the values of $f_i(x_u, \theta)$ for $u = 1, 2, \ldots, n$ ($i = 1, 2, \ldots, r$). It is assumed that the rows of $\varepsilon$ are independently and identically distributed as $N(0, \Sigma)$, where $\Sigma$ is an unknown variance-covariance matrix for the $r$ response variables.

3.1. The Box-Draper determinant criterion

Estimates of the elements of $\theta$ can be obtained by using the method of maximum likelihood. This, however, requires knowledge of the value of $\Sigma$, which, in general, is unknown. Box and Draper (1965) proposed a Bayesian approach for estimating $\theta$ without knowing $\Sigma$. A summary of their approach follows. The likelihood function for the $n \times r$ data matrix is of the form

$$(2\pi)^{-nr/2}\, |\Sigma|^{-n/2} \exp\left\{ -\tfrac{1}{2} \operatorname{tr}\left[ \Sigma^{-1} V(\theta) \right] \right\}, \tag{3.3}$$

where $\operatorname{tr}(\cdot)$ denotes the trace of a square matrix and $V(\theta)$ is the $r \times r$ matrix

$$V(\theta) = [Y - F(\mathcal{D}, \theta)]'\,[Y - F(\mathcal{D}, \theta)]. \tag{3.4}$$

Box and Draper (1965) assumed noninformative prior distributions for $\theta$ and $\Sigma$. By combining these distributions with the likelihood function in formula (3.3) they obtained the posterior density function $\pi(\theta, \Sigma \mid Y)$. The marginal posterior density function for $\theta$ can be obtained from $\pi(\theta, \Sigma \mid Y)$ by integrating the elements of $\Sigma$ out. An estimate of $\theta$ can then be obtained by maximizing this marginal posterior density. Box and Draper (1965) showed that this process is equivalent to finding the value of $\theta$ that minimizes the determinant

$$h(\theta) = |V(\theta)|. \tag{3.5}$$

This estimation rule is called the Box-Draper determinant criterion. Bates and Watts (1984, 1987) presented a method for the minimization of h(O) along with a computer algorithm for its implementation. Several variants of the Box-Draper determinant criterion were considered. Box et al. (1970) discussed the implementation of this criterion when the multiresponse data


are incomplete because only a few values are missing. The missing values were treated as additional parameters to be estimated. An alternative procedure was proposed by Box (1971) for the case of a large number of missing values. Stewart and Sorensen (1981) also addressed the missing data problem. Their approach is based on using the posterior density function $\pi(\theta, \Sigma \mid Y)$ in which only genuine multiresponse values appear. Box (1970) considered a situation in which the variance-covariance matrix $\Sigma$ is not constant for all the experimental runs.
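The determinant criterion (3.5) can be sketched on a toy problem. The example below assumes two responses that share a single parameter, $f_1(x, \theta) = \theta x$ and $f_2(x, \theta) = \theta x^2$, with made-up data and fixed error values; it minimizes $h(\theta)$ by a plain grid search. The grid search is a stand-in chosen for simplicity and is not the algorithm of Bates and Watts (1984, 1987).

```python
# Sketch of the Box-Draper determinant criterion (3.5) for r = 2 responses
# that share one parameter theta. The response functions, the data, and the
# grid search are all assumptions made for this illustration.

x = [1.0, 2.0, 3.0, 4.0, 5.0]
noise1 = [0.10, -0.10, 0.05, -0.05, 0.00]
noise2 = [-0.20, 0.10, 0.10, -0.10, 0.10]
theta_true = 2.0

def f1(xu, theta):
    return theta * xu

def f2(xu, theta):
    return theta * xu ** 2

y1 = [f1(xu, theta_true) + e for xu, e in zip(x, noise1)]
y2 = [f2(xu, theta_true) + e for xu, e in zip(x, noise2)]

def h(theta):
    """Determinant of the 2 x 2 matrix V(theta) in (3.4)."""
    res1 = [y - f1(xu, theta) for y, xu in zip(y1, x)]
    res2 = [y - f2(xu, theta) for y, xu in zip(y2, x)]
    v11 = sum(a * a for a in res1)
    v22 = sum(b * b for b in res2)
    v12 = sum(a * b for a, b in zip(res1, res2))
    return v11 * v22 - v12 * v12

grid = [1.5 + 0.001 * i for i in range(1001)]   # candidate values of theta
theta_hat = min(grid, key=h)
print(theta_hat, h(theta_hat))
```

With these data the minimizer lands close to the value used to generate the responses. Note that the two residual vectors must not be collinear, otherwise $V(\theta)$ is singular.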

3.2. The problem of linear dependencies among the responses

Box et al. (1973) cautioned that the use of the Box-Draper determinant criterion can lead to meaningless results when exact linear relationships exist among the responses. The reason for this is the following: Suppose that there are $m$ linearly independent relationships among the responses of the form

$$A Y' = C, \tag{3.6}$$

where $A$ is a constant $m \times r$ matrix of rank $m$ ($< r$) and $C$ is another constant matrix of order $m \times n$. In this case, since $E(Y) = F(\mathcal{D}, \theta)$,

$$A F'(\mathcal{D}, \theta) = C. \tag{3.7}$$

From (3.6) and (3.7) it follows that

$$A\,[Y' - F'(\mathcal{D}, \theta)] = 0.$$

This implies the existence of $m$ linearly independent relationships among the rows of the matrix $Y' - F'(\mathcal{D}, \theta)$. Consequently, the matrix $V(\theta)$ in formula (3.4) is singular, that is, its determinant is equal to zero for any value of $\theta$. Since this determinant is nonnegative by the fact that $V(\theta)$ is positive semidefinite, minimizing $|V(\theta)|$ would be meaningless in this case.

Box et al. (1973) developed a technique for identifying linear dependencies among the responses. It is based on an examination of the eigenvalues of the matrix $DD'$, where

$$D = Y'\left(I_n - \frac{1}{n} J_n\right) \tag{3.8}$$

and $I_n$, $J_n$ denote the identity matrix and the matrix of ones of order $n \times n$. If there are linear relationships of the type given in formula (3.6), and if the columns of $C$ are identical, then

$$AD = AY'\left(I_n - \frac{1}{n} J_n\right) = C\left(I_n - \frac{1}{n} J_n\right) = 0. \tag{3.9}$$
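Formulas (3.8) and (3.9) can be illustrated with a small made-up data set. The sketch below assumes $r = 3$ responses that sum to 100 in every run (for example, percentages), so that $A = [1, 1, 1]$ and $C$ has identical columns. It forms $D$ by centering each response, checks that $AD = 0$, and checks that $A$ is an eigenvector of $DD'$ with eigenvalue zero. The data and variable names are hypothetical.

```python
# Illustration of (3.8)-(3.9) with r = 3 responses that sum to 100 in every run,
# so A = [1, 1, 1] and C has identical columns. The data are made up.

Y = [
    [20.0, 30.0, 50.0],
    [25.0, 35.0, 40.0],
    [10.0, 60.0, 30.0],
    [40.0, 40.0, 20.0],
    [33.0, 33.0, 34.0],
]
n, r = len(Y), len(Y[0])
A = [1.0, 1.0, 1.0]

# D = Y'(I_n - J_n / n): each response (row of Y') is centered at its mean.
means = [sum(row[i] for row in Y) / n for i in range(r)]
D = [[Y[u][i] - means[i] for u in range(n)] for i in range(r)]

# A D should be a zero row vector of length n.
AD = [sum(A[i] * D[i][u] for i in range(r)) for u in range(n)]

# DD' is r x r; A should be an eigenvector with eigenvalue zero.
DDt = [[sum(D[i][u] * D[j][u] for u in range(n)) for j in range(r)] for i in range(r)]
DDt_A = [sum(DDt[i][j] * A[j] for j in range(r)) for i in range(r)]
rayleigh = sum(A[i] * DDt_A[i] for i in range(r)) / sum(a * a for a in A)
print(AD, DDt_A, rayleigh)
```

All printed quantities should be zero up to floating-point error. With rounded data the Rayleigh quotient would be small but nonzero.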


Formula (3.9) indicates that there are $m$ linearly independent relationships among the rows of $D$. Hence, the rank of $D$ is $r - m$ and the matrix $DD'$ has a zero eigenvalue of multiplicity $m$. Conversely, if $DD'$ has a zero eigenvalue of multiplicity $m$, then formula (3.9) is true for some constant matrix $A$ of order $m \times r$ and rank $m$. Thus,

$$AY' = \frac{1}{n} A Y' J_n, \tag{3.10}$$

which looks like formula (3.6). Note that the matrix $\frac{1}{n} A Y' J_n$ has identical columns. We conclude that $m$ linearly independent relationships, of the type given in (3.6) with $C$ having identical columns, exist among the responses if and only if $DD'$ has a zero eigenvalue of multiplicity $m$. It is easy to see that such linear relationships can be defined in terms of $m$ orthonormal eigenvectors of $DD'$ corresponding to its zero eigenvalue.

McLean et al. (1979) noted that if the columns of $C$ in (3.6) are not identical, then $AD \ne 0$; consequently, no linear relationships exist among the rows of $D$. In this case, $DD'$ does not have zero eigenvalues, even though the matrix $V(\theta)$ in (3.4) is singular with $m$ zero eigenvalues.

In practice, the observed values of the responses are rounded off to a certain number of decimal places. In this case, it is possible that none of the eigenvalues of $DD'$ is exactly zero, even if the responses are linearly related. Let us therefore suppose that $\lambda$ is a "small" eigenvalue of $DD'$, which, if it were not for the rounding errors, would be equal to zero. If the rounding errors are distributed independently and uniformly over the interval $(-\delta, \delta)$, where $\delta$ is small enough, then the expected value of $\lambda$ is approximately equal to

$$E(\lambda) = (n - 1)\,\sigma_e^2, \tag{3.11}$$

where $\sigma_e^2 = \delta^2/3$ is the rounding error variance. Here, $\delta$ is equal to one half of the unit of the last digit reported when the multiresponse values are rounded off to the same number of decimal places. Formula (3.11) was given by Box et al. (1973). Khuri and Conlon (1981) showed that an upper bound on the variance of $\lambda$ is approximately given by

$\operatorname{Var}(\lambda)$ ... $\tau\}$, where $T$ is a known positive definite matrix chosen appropriately, and $\tau$ is some positive constant. One possible choice for $T$ is based on using region moment matrices corresponding to a certain region of interest (see Wijesinha and Khuri, 1987b, p. 182). The quantity $\gamma' T \gamma$ provides a measure of inadequacy of the fitted model. A positive value of this quantity is an indication that $\gamma \ne 0$ and, therefore, the model is inadequate.

Under Strategy 1, a design is chosen so as to maximize the quantity $\Lambda_1$ given by

$$\Lambda_1 = \inf_{\gamma \in \Pi} \{\gamma' H (I_r \otimes S) H' \gamma\}.$$

This is a multiresponse extension of the $\Lambda_1$-optimality criterion used by Jones and Mitchell (1978). It can be shown that

$$\Lambda_1 = \tau\, e_{\min}\{T^{-1} H (I_r \otimes S) H'\}.$$

A design that maximizes $e_{\min}\{T^{-1} H (I_r \otimes S) H'\}$ is called a $\Lambda_1$-optimal design. It should be noted that this criterion is not meaningful if $\Lambda_1$ is zero for any design. This occurs, for example, when $r(n - p) < \sum_{i=1}^{r} q_i$, where $p$ is the rank of $X$. In this case, the matrix $H (I_r \otimes S) H'$, which is of order $\sum_{i=1}^{r} q_i \times \sum_{i=1}^{r} q_i$ and rank not exceeding $r(n - p)$, will be singular, causing $\Lambda_1$ to have the value zero no matter what design is used.

Strategy 2. An alternative design strategy is to maximize the quantity $\Lambda_2$, the average of $\gamma' H (I_r \otimes S) H' \gamma$ over the boundary of the region $\Pi$, that is,

$$\Lambda_2 = \frac{1}{\int_{\Pi_0} d\sigma} \int_{\Pi_0} \gamma' H (I_r \otimes S) H' \gamma \, d\sigma,$$

where $d\sigma$ is the differential of the surface area of the ellipsoid $\Pi_0 = \{\gamma : \gamma' T \gamma = \tau\}$. Using an identity given in Jones and Mitchell (1978, p. 544), we can write $\Lambda_2$ as

$$\Lambda_2 = \frac{\tau}{\sum_{i=1}^{r} q_i} \operatorname{tr}\left[T^{-1} H (I_r \otimes S) H'\right].$$

A design that maximizes $\Lambda_2$ is called a $\Lambda_2$-optimal design. Equivalently, such a design maximizes $\Lambda_2^*$, where

$$\Lambda_2^* = \operatorname{tr}\left[T^{-1} H (I_r \otimes S) H'\right].$$

This criterion can be applied even when the matrix $H (I_r \otimes S) H'$ is singular.


Wijesinha and Khuri (1987b) presented a sequential procedure for the generation of $\Lambda_2$-optimal designs. As in Section 5.2.1, an initial design $\mathcal{D}_0$ is chosen. Henceforth, design points are selected one at a time and added to the previous design. The number of points of $\mathcal{D}_0$ must be at least equal to the rank of the matrix $X$ (see model 4.1). The stopping rule for this sequential procedure is based on making the so-called Fréchet derivative of $\Lambda_2^*$ arbitrarily close to zero (for a definition of this derivative, see Wijesinha and Khuri, 1987b, p. 184, and Theorem 2, p. 186).

EXAMPLE (Wijesinha and Khuri, 1987b, pp. 188-191). Consider a multiresponse experiment with three responses, $y_1, y_2, y_3$, and three input variables, $x_1, x_2, x_3$, coded so that $-1 \le$ ...

... 2) factors and the number of levels not necessarily two.

I. Consider the search linear model where $\beta_1$ consists of the general mean and main effects and $\beta_2$ consists of 2-factor interactions.

1.1. Shirakura plan (k = 1)

For a $2^m$ factorial experiment with $m = 2^h - 1$ and $h$ ($\ge 3$) being an integer, the $(m + 1)$ runs in an $(m + 1) \times m$ matrix $T_{1m}$ are obtained by filling in the entries of the first $h$ columns and $2^h$ rows with elements '+' and '-', and then the remaining $(2^h - h - 1)$ columns as the interaction columns for a $2^h$ factorial. Note that $T_{1m}$ for $h = 3$ and $m = 7$ is in fact $T_1$ in Section 3. Consider a BIB design $(v, b, r, k, \lambda)$ with $v = m = 2^h - 1$, $b = \frac{1}{3}(2^{h-1} - 1)(2^h - 1)$, $r = 2^{h-1} - 1$, $k = 3$, and $\lambda = 1$. Let $T_{2m}$ ($b \times m$) be a matrix such that the element in the $j$th row and $i$th column is '+' if the $i$th treatment does not occur in the $j$th block and is '-' if the $i$th treatment does occur in the $j$th block. The rows of $T_{2m}$ represent $b$ runs. Note that $T_{2m}$ is the complement of the transpose of the incidence matrix for the BIB design. In Shirakura (1991), it is shown that the sequential assembly of $T_{1m}$ and $T_{2m}$ results in a search design with $(m + 1 + b)$ runs and $k = 1$.

1.2. Ghosh-Zhang plan (k = 1)

For a $3^m$ factorial experiment, with $m \ge 3$, let $T_{3m}$ ($(1 + 2m) \times m$) be a matrix with the first row (run) having all elements '+' and the remaining $2m$ rows (runs) having the $i$th element ($i = 1, \ldots, m$) equal to '-' or '0' and all other elements equal to '+'. The set of $(1 + 2m)$ runs in $T_{3m}$ is a Resolution III plan. Let $T_{4m}$ ($m \times m$) be a matrix where the $i$th row (run) has the $i$th element '-' and the remaining elements 0. It can be seen in Ghosh and Zhang (1987) that the sequential assembly of $T_{3m}$ and $T_{4m}$ is a search design with $(1 + 3m)$ runs and $k = 1$.
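The constructions in 1.1 and 1.2 can be sketched in a few lines. The code below builds $T_{1m}$ and $T_{2m}$ for $h = 3$ ($m = 7$) and $T_{3m}$, $T_{4m}$ for $m = 3$. The function names are hypothetical, the BIB design is assumed to be the Fano plane (one standard design with $v = b = 7$, $r = k = 3$, $\lambda = 1$), and the code only assembles the runs; it does not verify the search property established in the cited papers.

```python
from itertools import combinations, product

def shirakura_t1(h):
    """T1m: full 2^h factorial in h columns plus all interaction columns (m = 2^h - 1)."""
    rows = []
    for base in product([1, -1], repeat=h):
        row = list(base)
        for order in range(2, h + 1):
            for subset in combinations(range(h), order):
                value = 1
                for idx in subset:
                    value *= base[idx]
                row.append(value)
        rows.append(row)
    return rows

def shirakura_t2(blocks, m):
    """T2m: entry is -1 if treatment i occurs in block j, +1 otherwise."""
    return [[-1 if i in block else 1 for i in range(1, m + 1)] for block in blocks]

def ghosh_zhang(m):
    """T3m followed by T4m, with levels coded as '+', '-', '0'."""
    t3 = [['+'] * m]
    for i in range(m):
        for level in ('-', '0'):
            row = ['+'] * m
            row[i] = level
            t3.append(row)
    t4 = [['-' if j == i else '0' for j in range(m)] for i in range(m)]
    return t3 + t4

# Assumed BIB design (7, 7, 3, 3, 1): the blocks of the Fano plane.
fano_blocks = [{1, 2, 3}, {1, 4, 5}, {1, 6, 7}, {2, 4, 6}, {2, 5, 7}, {3, 4, 7}, {3, 5, 6}]

t1 = shirakura_t1(3)
t2 = shirakura_t2(fano_blocks, 7)
shirakura_plan = t1 + t2            # m + 1 + b runs
ghosh_zhang_plan = ghosh_zhang(3)   # 1 + 3m runs
print(len(shirakura_plan), len(ghosh_zhang_plan))
```

The run counts agree with the formulas $(m + 1 + b)$ and $(1 + 3m)$ given above.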


1.3. Anderson-Thomas plans

Anderson and Thomas (1980) presented search designs for general symmetrical factorial experiments where the number of levels is a prime or a prime power. These designs are able to search any combination of three or fewer pairs of factors that interact.

1.4. Chatterjee-Mukerjee and Chatterjee plans

Chatterjee and Mukerjee (1986) and Chatterjee (1989) presented search designs with $k = 1$ in general symmetrical and asymmetrical factorial experiments. These designs are able to search one set of nonzero interactions between two factors.

II. Consider the search linear model where $\beta_1$ consists of the general mean and main effects and $\beta_2$ consists of 2-factor and 3-factor interactions.

II.1. Ohnishi-Shirakura plans (k = 1)

Ohnishi and Shirakura (1985) presented search designs for $3 \le m$ ...

... 1) observations for the $u$th run and

$$\sum_{u=1}^{t} n_u = N.$$

Let $y_{uv}$ be the observation for the $v$th replication of the $u$th run and $\bar{y}_u$ be the mean of all observations for the $u$th run. Suppose that $k_1$ is an initial guess on $k$. There are three possibilities: $k_1 > k$, $k_1 = k$, and $k_1 < k$. Consider $\binom{p_2}{k_1}$ models,

$$E(y) = X_1 \beta_1 + X_{2j} \beta_{2j}, \quad j = 1, \ldots, \binom{p_2}{k_1}, \qquad V(y) = \sigma^2 I, \qquad \operatorname{Rank}[X_1, X_{2j}] = p_1 + k_1.$$

Let $\hat{\beta}_{1j}$ and $\hat{\beta}_{2j}$ be the least squares estimators of $\beta_1$ and $\beta_{2j}$ for the $j$th model,

$$\hat{y}_j = X_1 \hat{\beta}_{1j} + X_{2j} \hat{\beta}_{2j}, \qquad R_j = y - \hat{y}_j,$$

$F_j$ be the F-statistic for testing $H_0$: $\beta_{2j} = 0$ under the normality assumption, $F_j^{LOF}$ be the F-statistic for testing $H_0$: No Lack of Fit, $SSE_j$ be the sum of squares due to error, $SSLOF_j$ be the sum of squares due to lack of fit, $SSPE$ be the sum of squares due to pure error, $\hat{y}_0$ be the fitted values when $\beta_{2j} = 0$, $R_0$ be the residuals when $\beta_{2j} = 0$, $SSE_{ij}$ be the sum of squares due to error when the $i$th element of $\beta_{2j}$, denoted by $\beta_{2j}^{(i)}$, is zero, and $t_{ij}$ be the t-statistic for testing $H_0$: $\beta_{2j}^{(i)} = 0$. The following results are given in Ghosh (1987).

I. For $\ell \in \{1, \ldots, \binom{p_2}{k_1}\}$, the following statements are equivalent.

(a) $SSE_\ell$ is a minimum,
(b) $F_\ell$ is a maximum,
(c) $SSLOF_\ell$ is a minimum,
(d) $F_\ell^{LOF}$ is a minimum,
(e) the Euclidean distance between $\hat{y}_\ell$ and $\hat{y}_0$ is a maximum,
(f) the square of the simple correlation coefficient between the elements of $R_\ell$ and $R_0$ is a minimum.

II. For $\ell \in \{1, \ldots, \binom{p_2}{k_1}\}$, $q \in \{1, \ldots, k_1\}$, the following statements are equivalent.

(a) $SSE_{\ell q}$ is a minimum,
(b) $|t_{\ell q}|$ is a maximum.
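Statement I(a) suggests a direct way to search: fit each of the $\binom{p_2}{k_1}$ candidate models by least squares and keep the one with the smallest $SSE_j$. The sketch below does this in a hypothetical setting that is not taken from Ghosh (1987): a full $2^3$ factorial, $\beta_1$ containing the mean and the three main effects, $\beta_2$ containing the three 2-factor interactions, $k_1 = 1$, and simulated responses with fixed error values. The helper names `solve` and `sse` are assumptions.

```python
from itertools import combinations

def solve(a, b):
    """Solve the square system a z = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    aug = [list(map(float, a[i])) + [float(b[i])] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    z = [0.0] * n
    for i in reversed(range(n)):
        z[i] = (aug[i][n] - sum(aug[i][c] * z[c] for c in range(i + 1, n))) / aug[i][i]
    return z

def sse(design, y):
    """Residual sum of squares from least squares via the normal equations."""
    p = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(design, y)) for i in range(p)]
    beta = solve(xtx, xty)
    return sum((yi - sum(b * v for b, v in zip(beta, row))) ** 2 for row, yi in zip(design, y))

# Hypothetical setting: full 2^3 factorial, beta_1 = mean + main effects (p1 = 4),
# beta_2 = the three 2-factor interactions (p2 = 3), initial guess k1 = 1.
runs = [(a, b, c) for a in (-1, 1) for b in (-1, 1) for c in (-1, 1)]
noise = [0.10, -0.20, 0.05, 0.00, -0.10, 0.15, -0.05, 0.05]
y = [10 + 2 * r[0] - r[1] + 0.5 * r[2] + 3 * r[0] * r[2] + e for r, e in zip(runs, noise)]

pairs = list(combinations(range(3), 2))      # candidate interactions (0,1), (0,2), (1,2)
sse_by_model = {}
for pair in pairs:
    design = [[1, r[0], r[1], r[2], r[pair[0]] * r[pair[1]]] for r in runs]
    sse_by_model[pair] = sse(design, y)

influential = min(sse_by_model, key=sse_by_model.get)
print(influential, sse_by_model)
```

In this simulated case the model containing the interaction of the first and third factors, which was used to generate the data, has by far the smallest error sum of squares.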

A set of nonnegligible parameters $\beta_{2\ell}$ is said to be influential if the sum of squares due to error ($SSE_\ell$) is minimum for the model with those parameters. A nonnegligible parameter $\beta_{2\ell q}$ is said to be significant if the value $|t_{\ell q}|$ is large. Under the assumption of normality and the null hypothesis $H_0$: $\beta_{2j} = 0$, $F_j$ has the central F distribution with $(k_1, N - p_1 - k_1)$ d.f., and under the null hypothesis $H_0$: $\beta_{2j}^{(i)} = 0$, $t_{ij}$ has the central t distribution with $(N - p_1 - k_1)$ d.f. The following method presented in Ghosh (1987) is due to J. N. Srivastava.

Srivastava method

Case I. If $\max_j F_j$ ... $F_{\alpha;\, k_1,\, N - p_1 - k_1}$. For $w = 1, \ldots, p_2$,

$\delta_w$ = the number of $j$ in $\{j_1, \ldots, j_s\}$ for which $|t_{wj}| > t_{\alpha/2,\, N - p_1 - k_1}$.

Clearly, $0 \le \delta_w \le s$. The $\delta_w$'s are now arranged in decreasing order of magnitude, written as $\delta_{(1)} \ge \delta_{(2)} \ge \cdots \ge \delta_{(p_2)}$. If there are $k$ ($\ge k_1$) nonzero $\delta_{(w)}$'s, the influential significant parameters are those corresponding to $\delta_{(1)}, \ldots, \delta_{(k)}$; otherwise the influential significant $\beta_{(w)}$'s correspond to nonzero $\delta_{(w)}$'s and the number of influential parameters is then less than $k_1$. In case of a tie, the values of $|t_{wj}|$ may be used as guidelines. The parameter $\beta_{(1)}$ is the most influential significant nonnegligible parameter. An estimator of the unknown $k$ is

$\hat{k}$ = the number of nonzero $\delta_w$'s, $w = 1, \ldots, p_2$.

The probability of correct search by the above method is very high, in fact close to one, as found in Monte Carlo studies done by Professor Srivastava and his students. In a special situation of pure search (i.e., $\beta_1 = 0$), Srivastava and Mallenby (1985) presented a method of search with the minimum amount of computation and a method for computing the probability that the correct parameter ($k = 1$) is identified. Professor T. Shirakura and his coworkers have recently made some important contributions in this area.

7. Parallel and intersecting flats fractions Sequential assembly of fractions occurs naturally in parallel and intersecting flats fractions. The purpose of this section is to present developments for these types of fractions. Fractional factorial plans of parallel and intersecting flats types appeared in Connor (1960), Connor and Young (1961), Patel (1963), Daniel (1962), John (1962), and Addelman (1961, 1963, 1969). In recent years, a lot of work has been done in this area by J. N. Srivastava, D. A. Anderson and their coauthors. Some plans and discussions are given below to illustrate the ideas.

S. Ghosh


7.1. John plans Consider the following parallel flat fraction of Resolution IV with 12 runs for a $2^4$ factorial experiment. Three sets of 4 runs $(x_1, x_2, x_3, x_4)$ with $x_i = +$ or $-$, $i = 1, 2, 3, 4$, satisfying

a: $+ = -x_1x_2 = -x_1x_3 = x_2x_3$,
b: $+ = -x_1x_2 = x_1x_3 = -x_2x_3$,
c: $+ = x_1x_2 = -x_1x_3 = -x_2x_3$.

Each set represents a flat and there is no common run between any two flats. From the above plan, a parallel flat fraction of Resolution IV with 12 runs is constructed for a $2^6$ factorial experiment by taking $x_5 = x_1x_2x_4$ and $x_6 = x_1x_3x_4$. The 4 runs $(x_1, \ldots, x_6)$ in sets a, b, and c are then

Set a: (+ - - + - -), (+ - - - + +), (- + + + - -), (- + + - + +)
Set b: (+ - + + - +), (+ - + - + -), (- + - + - +), (- + - - + -)
Set c: (+ + - + + -), (+ + - - - +), (- - + + + -), (- - + - - +)

Several such plans of Resolutions III, IV and V are given in John (1962).
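The construction in Section 7.1 can be sketched in a few lines. This is only an illustrative enumeration, assuming the levels + and - are coded as +1 and -1; the names `FLATS` and `john_runs` are hypothetical and do not come from John (1962).

```python
from itertools import product

# Required signs of (x1*x2, x1*x3) in flats a, b, c; x2*x3 is then their product.
FLATS = {"a": (-1, -1), "b": (-1, +1), "c": (+1, -1)}

def john_runs(flat):
    """Return the 4 runs (x1..x6) of one flat, with x5 = x1*x2*x4 and x6 = x1*x3*x4."""
    s12, s13 = FLATS[flat]
    runs = []
    for x1, x2, x3, x4 in product((+1, -1), repeat=4):
        if x1 * x2 == s12 and x1 * x3 == s13:
            runs.append((x1, x2, x3, x4, x1 * x2 * x4, x1 * x3 * x4))
    return runs

plan = {name: john_runs(name) for name in FLATS}
all_runs = [run for runs in plan.values() for run in runs]

for name, runs in plan.items():
    # Print each run as a string of +/- signs.
    print(name, ["".join("+" if x > 0 else "-" for x in run) for run in runs])
```

Filtering the full $2^4$ factorial by the defining relations keeps the link between the flats and the runs explicit, at the cost of enumerating all 16 candidate runs.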

7.2. Anderson-Thomas plans Consider the following intersecting flat fraction of Resolution IV with 30 runs for a $3^5$ factorial experiment. Five sets of 9 runs $(x_1, x_2, x_3, x_4, x_5)$ with $x_i = +, 0, -$, $i = 1, \ldots, 5$, are as follows.

[Table: the 9 runs of each of Sets a, b, c, d and e; the individual entries are not legible in this copy.]

Note that the sets a-e represent 5 flats given below.

Set a: $x_2 = x_3 = x_4 = x_5$,
Set b: $1 + x_1 = x_3 = x_4 = x_5$,
Set c: $1 + x_1 = 1 + x_2 = x_4 = x_5$,
Set d: $1 + x_1 = 1 + x_2 = 1 + x_3 = x_5$,
Set e: $x_1 = x_2 = x_3 = x_4$.

The above 5 flats are not parallel. The last three runs of the sets a, b, c, d, and e are identical to the first three runs of the sets b, c, d, e, and a, respectively. There are 30 distinct runs in the sets a-e and they form a minimal Resolution IV plan. Anderson and Thomas (1979) gave a series of Resolution IV plans for the $s^n$ factorial, where $s$ is a prime power, in $s(s-1)n$ runs and also a series of generalized foldover designs with $s \ge 3$ and $n \ge 3$, in $s(s-1)n + s$ runs.
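The run counts quoted for the five flats can be checked by direct enumeration. The sketch below assumes the three levels are coded as 0, 1, 2 with arithmetic modulo 3; how the symbols +, 0, - map to these codes is an assumption and does not affect the counts. The names `FLATS` and `flat_runs` are hypothetical.

```python
from itertools import product

# For a run x = (x1, ..., x5), each flat requires all listed expressions to be equal (mod 3).
FLATS = {
    "a": lambda x: {x[1], x[2], x[3], x[4]},
    "b": lambda x: {(1 + x[0]) % 3, x[2], x[3], x[4]},
    "c": lambda x: {(1 + x[0]) % 3, (1 + x[1]) % 3, x[3], x[4]},
    "d": lambda x: {(1 + x[0]) % 3, (1 + x[1]) % 3, (1 + x[2]) % 3, x[4]},
    "e": lambda x: {x[0], x[1], x[2], x[3]},
}

def flat_runs(name):
    """All runs of the 3^5 factorial lying on the named flat."""
    return [x for x in product(range(3), repeat=5) if len(FLATS[name](x)) == 1]

sets = {name: set(flat_runs(name)) for name in FLATS}
distinct = set().union(*sets.values())
# Overlap of each flat with the next one in the cycle a -> b -> c -> d -> e -> a.
overlaps = {(p, q): len(sets[p] & sets[q]) for p, q in zip("abcde", "bcdea")}

print(len(distinct), overlaps)
```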


7.3. Srivastava-Li plans Consider the following parallel flat fraction with 16 runs for a $2^6$ factorial experiment. There are 4 sets (flats) with 4 runs each. Each of Sets a, b, c and d is defined by fixing $x_1$, $x_2$ and $x_3$ at $+$ or $-$ together with the sign of the product $x_4x_5x_6$; the individual sign assignments are not legible in this copy.

The 16 runs allow us to orthogonally estimate the general mean, all the main effects and the two-factor interactions between the subgroup of factors (1, 2, 3) and the subgroup (4, 5). Varieties of orthogonal plans of parallel flats type are presented in Srivastava and Li (1994) for an $s^n$ factorial experiment, where $s$ is a prime or prime power.

7.4. Krehbiel-Anderson plans Krehbiel and Anderson (1991) presented a series of determinant optimal designs within the class of parallel flats fractions, for the $3^m$ factorial, $m = 6, \ldots, 14$. These designs permit us to estimate all main effects and the interactions of one factor with all others.

7.5. Patel plans Patel (1963) gave fractional factorial plans for $2^m$ factorial experiments, $m = 5, 6, 8, 9$, and 10, using the parallel flats and then replicating just one flat. The corresponding runs are called Partially Duplicated. Each flat represented an Orthogonal Resolution III plan and the sequential assembly of plans resulted in a Resolution V plan which even permits the estimation of some three-factor and higher order interactions.

7.6. Addelman plans Addelman (1961, 1963) presented parallel flat fractions for $2^m$ factorial experiments, $m = 3, \ldots, 9$. These nonorthogonal plans allow us to estimate the general mean, main effects, and some two- and three-factor interactions, assuming the remaining interactions to be zero. Addelman (1969) presented sequential assemblies of fractional factorial plans for the $2^m$ factorial, $m = 3, \ldots, 11$. For example, the following parallel flat fraction of a $2^5$ factorial experiment permits us to estimate the general mean, main effects, and the 2-factor interactions between the first factor and the remaining factors. Two sets of 8 runs $(x_1, \ldots, x_5)$ with $x_i = +$ or $-$, $i = 1, \ldots, 5$, satisfying

a: $+ = -x_1x_2x_3 = -x_1x_4x_5 = x_2x_3x_4x_5$,
b: $+ = x_1x_2x_3 = x_1x_4x_5 = x_2x_3x_4x_5$.
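The estimability claim for this 16-run example can be checked numerically. The sketch below is illustrative only: it codes the levels as +1 and -1, builds the model columns for the mean, the five main effects and the four interactions of factor 1 with the other factors, and forms their cross-product matrix. The names `flat` and `model_row` are hypothetical.

```python
from itertools import product

def flat(sign):
    """The 8 runs of the 2^5 factorial with x1x2x3 = x1x4x5 = sign (so x2x3x4x5 = +)."""
    return [x for x in product((+1, -1), repeat=5)
            if x[0] * x[1] * x[2] == sign and x[0] * x[3] * x[4] == sign]

runs = flat(-1) + flat(+1)  # set a followed by set b

def model_row(x):
    # Mean, 5 main effects, and the interactions of factor 1 with factors 2..5.
    return [1, *x] + [x[0] * x[j] for j in range(1, 5)]

X = [model_row(x) for x in runs]
p = len(X[0])
# Cross-product (Gram) matrix X'X computed without external libraries.
gram = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]

for row in gram:
    print(row)
```

For this particular example the ten columns turn out to be mutually orthogonal, so all ten effects are estimable from the 16 runs.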

Sequential assembly of fractions in factorial experiments


7.7. Daniel plans Daniel (1962) gave the sequential assemblies of fractional factorial plans for $2^m$ factorial experiments, $m = 4, 7, 8, 16$. The construction of fractions for the $2^{m+a}$ factorial from the $2^m$ factorial is used in this context.

7.8. Dykstra plans Dykstra (1959) presented the sequential assemblies of fractional factorial plans with partial duplication of flats for $2^m$ factorial experiments, $m = 6, \ldots, 11$. For example, consider the following parallel flat fraction of a $2^6$ factorial experiment with 48 runs. Two sets of 16 runs $(x_1, \ldots, x_6)$ with $x_i = +$ or $-$, $i = 1, \ldots, 6$, satisfying

a: $+ = -x_1x_2x_3 = -x_4x_5x_6 = x_1x_2x_3x_4x_5x_6$,
b: $+ = x_1x_2x_3 = x_4x_5x_6 = x_1x_2x_3x_4x_5x_6$.

The 16 runs of set a are replicated twice.

7.9. Connor plans and Connor-Young plans Connor (1960) and Connor and Young (1961) gave the sequential assemblies of parallel flat fractions for $2^m \times 3^n$ factorial experiments, taking $m$ and $n$ within practical ranges.

7.10. Ghosh plans and Ghosh-Lagergren plans Ghosh (1987) and Ghosh and Lagergren (1991) gave the sequential assemblies of fractional factorial plans for $2^m$ factorial experiments in estimating dispersion effects. Detailed theoretical developments in the area of parallel flat fractions are available in Srivastava and Chopra (1973), Srivastava et al. (1984), Srivastava (1987, 1990), and the unpublished work of Anderson with his students Mardekian (1979), Bu Hamra (1987), and Hussain (1986).

8. Composite designs Sequential assembly of fractions is very common in response surface designs. In the study of the dependence of a response variable $y$ on $k$ explanatory variables, coded as $x_1, \ldots, x_k$, the unknown response surface is approximated by a first or a second order polynomial in a small region whose center is the point of maximum interest. The first order model is tried first with a small number of runs. If the first order model shows a significant lack of fit, indicating the presence of surface curvature, a second order response surface model is then fitted by augmenting the first set of runs with another set of runs (see Box and Draper, 1987; Khuri and Cornell, 1987). One of the most useful second order designs, called a composite design (CD), consists of $F$ factorial points (FP's) which are a fraction of the $2^k$ points $(\pm 1, \ldots, \pm 1)$, $2k$ axial points (AP's) $(\pm\alpha, 0, \ldots, 0), \ldots, (0, \ldots, 0, \pm\alpha)$ where $\alpha$ is a given constant, and $n_0$ ($\ge 0$) center points (CP's) $(0, \ldots, 0)$. The total number of points is $N = F + 2k + n_0$. Note that the points are in fact the runs. Box and Wilson (1951) introduced such designs, also known as central composite designs. A lot of research has been done on the choice of the $F$ FP's in composite designs. The purpose of this section is to present developments in this area of research. For $N$ points in the design $(x_{u1}, \ldots, x_{uk})$, the observations are

$y(x_{u1}, \ldots, x_{uk}) = y_u$, $u = 1, \ldots, N$.
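As a small illustration of the point count $N = F + 2k + n_0$, the sketch below assembles a composite design. It makes two assumptions not taken from the text: the factorial portion is the full $2^k$ factorial (so $F = 2^k$) rather than a fraction, and the value of `alpha` is an arbitrary illustrative choice. The function name `composite_design` is hypothetical.

```python
from itertools import product

def composite_design(k, alpha, n0):
    """Full 2^k factorial points, 2k axial points at distance alpha, n0 centre points."""
    factorial = [tuple(float(s) for s in pt) for pt in product((-1, 1), repeat=k)]
    axial = []
    for i in range(k):
        for s in (-alpha, alpha):
            pt = [0.0] * k
            pt[i] = s  # only coordinate i is non-zero
            axial.append(tuple(pt))
    centre = [(0.0,) * k] * n0
    return factorial + axial + centre

# Illustrative values only: k = 3 factors, alpha = 1.682, two centre points.
design = composite_design(k=3, alpha=1.682, n0=2)
print(len(design))  # N = F + 2k + n0
```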

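The expectation of $y_u$ under the second order model is

$$E(y_u) = \beta_0 + \sum_{i=1}^{k} \beta_i x_{ui} + \sum_{i=1}^{k} \beta_{ii} x_{ui}^2 + \sum_{i=1}^{k} \sum_{\substack{j=1 \\ i<j}}^{k} \beta_{ij} x_{ui} x_{uj},$$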

… $\ge 0$, $\theta_2 > \theta_1$, (2)

which arises from such a model. For $\theta_3$ known and equal to one, the model simplifies to the difference of two exponentials,

$\eta(z, \theta) = e^{-\theta_1 z} - e^{-\theta_2 z}$, (3)

and this latter two-parameter form is investigated in the present example. The three sets of values for $\theta_1$ and $\theta_2$ given in Table 1 are introduced in order to illustrate the properties and scope of the model and its associated locally optimum designs, and Figure 2 shows plots of the response $\eta(z, \theta)$ against time, $z$, for each of these sets.


A. C. Atkinson and L. M. Haines


… 0, so that $H = I_{k-d}$. It is a special case of an ARIMA(0, d, d). The extreme $\psi = 0$ corresponds to the undifferenced data being independent, and so is appropriate if there is a fixed polynomial trend of degree $d - 1$. The extreme $\psi = \infty$ corresponds to the differenced data being independent, the ARIMA(0, d, 0). The CG(1) model is often known as the linear variance (LV) model. Simple special cases that have been used for prior models in choosing designs are the MA(q) and the AR(p) processes. For the MA(q), $V$ is banded (Toeplitz), and is 0 except for the central $2q + 1$ diagonals (upper-left to lower-right). For the AR(p), $V^{-1}$ is approximately banded, and is 0 except for the central $2p + 1$ diagonals. Stationary two-dimensional spatial models for dependence can be specified in several ways. Simple models that have been used for design purposes can essentially be specified by a sparse form for $V$ (a finite-lag process, with only a finite set of non-zero correlations), or a sparse form for $V^{-1}$ (a conditional autoregression, CAR, with only a finite set of non-zero inverse correlations). For most CAR processes, the precise form of the elements of $V^{-1}$ for boundary sites is intractable, so that a boundary-corrected (non-stationary) version of the process is usually used; see, for example, Cressie (1991, § 6.6). A model that is being used for analysis is the separable AR(1) * AR(1) with independent white noise (Cullis and Gleeson, 1991). For further details on proposed methods of analysis, see the reviews of Azaïs et al. (1990), and Cressie (1991, § 5.7); the bibliography of Gill (1991); and the recent publications of Martin (1990b), Cullis and Gleeson (1991), Zimmerman and Harville (1991), Besag and Higdon (1993), Taplin and Raftery (1994), Besag et al. (1995).

3.2. Efficiency measures and optimality A good design estimates as well as possible those contrasts $\{c_j'\tau\}$ of interest, where $c_j'1_t = 0$. Thus, the efficiencies of designs are usually compared through their values for some combination of the variances of the estimated contrasts, $\mathrm{var}(c_j'\hat\tau)/\sigma^2 = c_j'D^+c_j$, where $D^+$ denotes $\mathrm{var}(\hat\tau)/\sigma^2$, which is given by equations (8) or (9). Often, functions of the matrix $D = (D^+)^+$ can be used, where $D = C$ for gls estimation. When all contrasts are of equal interest, there are three commonly used measures, the A-, D- and E-values, which can be defined in various ways. They can be regarded as special cases ($p = 1, 0, \infty$ respectively) of the $\Phi_p$-value. Assume that $\mathrm{rank}(D) = t - 1$, so that all contrasts are estimable. Let $\xi_1, \ldots, \xi_{t-1}$ be the non-zero eigenvalues of $D$. Then, for $0 < p < \infty$, the $\Phi_p$-value is defined as $\{(t-1)^{-1}\sum \xi_i^{-p}\}^{1/p}$. The values for $p = 0, \infty$ are taken as the appropriate limits, $\{\prod \xi_i^{-1}\}^{1/(t-1)}$ and $\max \xi_i^{-1}$, respectively. Unless the continuity of $p$ is important, the $\Phi_p$-value can be taken as $\sum \xi_i^{-p}$ for 0 … 0, if it has the smallest $\Phi_p$-value. Note that D-optimality is often defined by maximising $\prod \xi_i$ or $\{\prod \xi_i\}^{1/(t-1)}$ or $\sum \ln \xi_i$, and E-optimality can be defined by maximising $\min \xi_i$. The D-value is related to the volume of the confidence ellipsoid for $\tau$ under normality. The A-value $\sum \xi_i^{-1}$ is $\mathrm{tr}(D^+)$, and is proportional to the average variance of an estimated pairwise contrast, 2 … 0. Kiefer essentially showed that, under gls estimation, the design is u.o. if the C-matrix in (9) has maximal trace and is completely-symmetric, that is all the diagonal elements are equal, and all the off-diagonal elements are equal (and, for $C$, all $t - 1$ non-zero eigenvalues are equal). For cases when $D \ne C$, as when using ordinary least-squares estimation, Kiefer and Wynn (1981) introduced weak universal optimality (w.u.o.).
This includes $\Phi_p$-optimality for all $p \ge 1$, but does not include $0 \le p < 1$, and hence excludes D-optimality. They showed that if $\mathrm{var}(\hat\tau)$ is completely-symmetric with minimal trace then the design is w.u.o. Extensions to Kiefer's result on universal optimality are possible; for example, Cheng (1987) considered the case of two distinct non-zero eigenvalues. Specific techniques may also be available in some cases, in particular with E-optimality. For further details on optimality criteria, see Chapter 1 of Shah and Sinha (1989). Bounds and approximations to these $\Phi_p$-values can be obtained. For a given $\mathrm{tr}(D)$, a simple lower bound for the $\Phi_p$-value in the general formula is $\bar\xi^{-1}$, where $\bar\xi = (t-1)^{-1}\sum \xi_i = (t-1)^{-1}\mathrm{tr}(D)$. The bound is attained if $D$ is completely symmetric. A global bound is obtained if the maximum, over competing designs, of $\mathrm{tr}(D)$ is used. In either case, the $\Phi_p$-efficiency of a design can be defined by comparing the $\Phi_p$-value to this, usually unattainable, bound. Provided $\xi_i < 2\bar\xi$ $\forall i$ (so that $D$ is not too 'far' from complete-symmetry), the series expansion about $\bar\xi$ for the $\Phi_p$-value can be used (Martin, 1990a). Thus, reasonable approximations are often given by

$$\sum \xi_i^{-p} \approx \bar\xi^{-p}\{(t-1) + p(p+1)S_{\xi\xi}/(2\bar\xi^2)\}$$

for $p > 0$ and not too large, and

$$\prod \xi_i^{-1} \approx \bar\xi^{-(t-1)}\exp\{S_{\xi\xi}/(2\bar\xi^2)\}$$

($p = 0$, the D-value), where $S_{\xi\xi} = \sum(\xi_i - \bar\xi)^2 = \mathrm{tr}(D^2) - \{\mathrm{tr}(D)\}^2/(t-1)$. The approximation to the A-value is just $\mathrm{tr}(D^2)/\bar\xi^3 = (t-1)^3\mathrm{tr}(D^2)/\{\mathrm{tr}(D)\}^3$. A small $S_{\xi\xi}$ is associated with $D$ being close to complete-symmetry. These approximations show that when $\mathrm{tr}(D)$ can vary over competing designs, and universally optimal designs do not exist, efficiency is a compromise between $D$ having a high trace and being close to complete-symmetry. The weight given to the two components depends on $p$, with most weight going to a high trace if $p$ is small. The conclusion that efficiency depends on a compromise between a high $\mathrm{tr}(D)$ and $D$ close to complete-symmetry still holds when the expansion is not valid. Example 5.2 in Section 5.3 illustrates this compromise. Note that

$$A'(I_b \otimes \Gamma)A = \sum_{i=1}^{b} A_i'\Gamma A_i,$$

where $A' = (A_1', \ldots, A_b')$ and $A_i$ is the $k$ by $t$ treatment design matrix for block $i$. Under gls, using (9) gives $C = \sum A_i'\Lambda^*A_i$, which shows how the elements of $\Lambda^*$ affect the efficiency of designs. It shows that $\mathrm{tr}(C)$ is constant for all binary designs, and is maximal over all designs if $(\Lambda^*)_{i,j} \le 0$ $\forall i \ne j$. If $(\Lambda^*)_{i,j} > 0$ for some $i \ne j$, then, for example, $\mathrm{tr}(C)$ is increased for a non-binary design with a treatment occurring twice in a block on units $i$ and $j$. Under ols, if the design is variance-balanced, $C$ in (7) is proportional to $E_t$, and $\mathrm{var}(\hat\tau)$ in (8) is proportional to $A'(I_b \otimes E_k\Lambda E_k)A = \sum A_i'E_k\Lambda E_kA_i$. Thus the corresponding matrix to consider for comparing design efficiencies is $E_k\Lambda E_k$. Two other cases, with different contrasts of interest, will be considered. In the test-control case, for the usual $A_{tc}$-criterion a design is optimal if it minimises the average variance of estimated pairwise contrasts with the control (0), $(t-1)^{-1}\sum_j \mathrm{var}(\hat\tau_0 - \hat\tau_j)$. If $\tau = (\tau_0, \tau_1, \ldots)'$, and $C$ is partitioned in the form

$$\begin{pmatrix} c_{ss} & c_{ns}' \\ c_{ns} & C_{nn} \end{pmatrix}$$

where $c_{ss}$ is a scalar, $c_{ns}$ is a $(t-1)$-vector, and $C_{nn}$ is $(t-1) \times (t-1)$, the $A_{tc}$-criterion can be taken as $\mathrm{tr}(C_{nn}^{-1})$. Optimality is much harder to show in this situation. In some cases the optimal design has supplemented balance, that is $c_{ns} = -(t-1)^{-1}c_{ss}1_{t-1}$ is a constant vector, and $C_{nn} = a_0I_{t-1} - a_1J_{t-1}$ is completely symmetric, with $a_0 = (t-1)a_1 + (t-1)^{-1}c_{ss}$. Under ordinary least-squares and independence, this form results if an augmented design is used in which the original design is balanced. It also occurs for the more general balanced treatment incomplete block (BTIB) designs. In these cases, the test treatments are equally-replicated, but the control occurs more often. See § 7.4 of Shah and Sinha (1989) for further details of the usual case. For factorial experiments, contrasts of interest are usually grouped into main effects, two-factor interactions, etc. In the case of 2-level factorial experiments, all contrasts can be represented by a vector with $n/2$ elements of $+1$, and $n/2$ elements of $-1$. An interaction contrast is the componentwise product of the contrasts for its constituent main effects. If the chosen $t$ contrasts are all of equal interest, the $\Phi_p$-value can be used as above. The mean can be included, but it is more usual for the $C$ matrix to be only for the $t$ contrasts. Then $C$ will be of rank $t$, and the formulae for the $\Phi_p$-value are modified accordingly.
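The series approximations of Section 3.2 to the A-value ($p = 1$) and the D-value ($p = 0$) can be compared with the exact values numerically. The sketch below works directly with a set of non-zero eigenvalues; the eigenvalues chosen are arbitrary illustrative numbers satisfying $\xi_i < 2\bar\xi$, not taken from any design in the text, and the function names are hypothetical.

```python
import math

def exact_values(xi):
    """Exact A-value sum(1/xi) and D-value prod(1/xi) from the non-zero eigenvalues."""
    a_val = sum(1.0 / x for x in xi)
    d_val = math.prod(1.0 / x for x in xi)
    return a_val, d_val

def approx_values(xi):
    """Series approximations about the mean eigenvalue, for p = 1 and p = 0."""
    m = len(xi)                                   # m plays the role of t - 1
    mean = sum(xi) / m
    s = sum((x - mean) ** 2 for x in xi)          # S_xixi
    a_approx = mean ** -1 * (m + 1 * 2 * s / (2 * mean ** 2))  # p(p+1) = 2 when p = 1
    d_approx = mean ** -m * math.exp(s / (2 * mean ** 2))
    return a_approx, d_approx

xi = [3.8, 4.0, 4.1, 4.1]  # illustrative eigenvalues only
a_exact, d_exact = exact_values(xi)
a_approx, d_approx = approx_values(xi)
print(a_exact, a_approx)
print(d_exact, d_approx)
```

With eigenvalues this close to their mean, the approximate and exact values agree to several significant figures; the agreement worsens as $S_{\xi\xi}$ grows.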

Spatial experimental design


4. Designs with neighbour balance 4.1. Definitions There are several situations when it is important that there is some sort of neighbour balance. In the simplest case, neighbour balance means that each treatment occurs the same number of times next to every other treatment. Like pairs of treatments can also be included. All such designs will be called neighbour designs here. They can be regarded as an extension of a block design, using overlapping blocks of size 2. Neighbour designs arise in (at least) three different cases. The first is in plant breeding trials, where it is desirable that each genotype can be pollinated by every available genotype, with each pair of genotypes having an equal chance of crossing. Airborne pollen may travel predominantly in one direction (or more), or all directions may be equally likely. The type of design is chosen for prior balance, rather than for efficiency under a model. The second case is where the balance is needed to avoid possible bias arising because of interference or competition, where the treatment applied to one unit (interference) or the response on that unit (competition) may affect the response on neighbouring units. For interference, there may be a prior model including fixed neighbour effects, but the design is usually chosen on intuitive grounds (except in the case of repeated measurements designs, where design optimality has been considered). Although there has been some work on including spatial dependence in the analysis of experiments with competition, these models have not yet been used to suggest optimal designs. Again, designs are usually chosen on intuitive grounds. The third case is when neighbour designs are used for efficiency under spatial dependence, which is usually modelled by a covariance structure. This case will be covered in more detail in Section 5, but it will be seen then that designs similar or identical to neighbour designs may arise. 
Some of the neighbour designs with exact balance have received considerable attention in the Combinatorial Design literature, and the term neighbour design is sometimes used for a particular design (the circular block design of Section 4.2.5). Two earlier reviews of neighbour designs are Street and Street (1987, Chapter 14) and Afsarinejad and Seeger (1988). See also § 5 of Street (1996). Clearly exact neighbour balance restricts the possible values of t, b and k, and approximate balance may be adequate in practice. There are several distinctions to be made in discussing neighbour designs and neighbour balance:
(i) Directional or non-directional neighbour balance. Directional balance is needed when left to right, say, differs from right to left. It requires equal occurrences of ordered pairs of treatments. Non-directional balance is needed when left to right, say, is equivalent to right to left. It requires equal occurrences of unordered pairs of treatments. A design with directional neighbour balance automatically has non-directional balance.
(ii) Distinct pairs only or Like pairs included. Only the t(t - 1) ordered {or t(t - 1)/2 unordered} distinct pairs may be considered. However, the t like pairs AA, etc., may also be included (self adjacencies). It may then be required that each of the t like pairs occur equally often, and as often as the unlike pairs. Note that AA is counted as 2 adjacencies between A and itself, one from left to right, and one from right to left.
(iii) Blocked or single-block design. For a blocked design, the end design is the design for the first and last plots of each block.
(iiia) If blocked, the blocks may be contiguous or distinct. When there is no blocking structure, the design can be regarded as having one distinct block. Blocks may be contiguous, when the last unit in block i is adjacent to the first unit in block i + 1, i = 1, ..., b - 1. Otherwise, they are disjoint. In the designs below, contiguous blocks are denoted ( ), while disjoint blocks are denoted [ ].
(iiib) If blocked and k ≤ t, the design may be binary or non-binary. If blocked and binary with k | t, the design may be resolvable, when the b blocks can be grouped into non-overlapping sets of t/k blocks each of which forms a complete replicate.
(iv) A one-dimensional design or block may be linear or circular. Units may be arranged in a linear sequence, or may form (an annulus of) a circle, where the units are sectors, with the last adjoining the first.
(iva) If linear, there may or may not be border plots (at both ends). If there are border plots, they are denoted { } in the designs below, and are always contiguous to the next unit. Border plots are often included in neighbour designs to ensure balance, but may be necessary in practice to avoid edge effects, such as unshielded exposure of the edge plots to the weather. The responses on border plots are not regarded as part of the experimental results (except, where used, for adjusting responses on neighbouring plots), and a treatment occurrence on a border plot is not counted as an additional replicate. Neighbour counts involving border plots are from an interior plot to a border plot.
A neighbour-balanced circular design can be used to obtain a neighbour-balanced linearly-arranged design with border plots by cutting the circular design between two units, and adding sufficient border plots (but not all neighbour-balanced linearly-arranged designs can be obtained this way). For adjacent neighbour balance, the first border plot contains the treatment on the last unit, and the second border plot has the treatment on the first unit. For example, Figure 1 shows the first circular block of a neighbour balanced design (see Section 4.2.5). The corresponding linear block with border plots (cut between the units with treatments 4 and 1, and read anticlockwise) is: [{4} (1 2 4) {1}]. The first border plot gives a (1, 4) adjacency, and the second gives a (4, 1) adjacency. A second method for obtaining a linear neighbour-balanced design from a circular one can also be used. In this case, extra units are added at one end of the linear sequence to keep the balance. The treatments on the extra units are those at the other end of the sequence. The responses on these units are regarded as part of the experimental results, so that an equireplicate circular design may lead to an unequally replicated linear design. However, the end units may be downweighted, so that statistically the design is essentially equireplicate; see Sections 4.2.7, 5.1. In the example, the corresponding linear block would be: [1 2 4 1] or [4 1 2 4].

Fig. 1. A circular block.
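The two ways of turning a circular block into a linear one can be sketched as follows. This is an illustration of the description in Section 4.1 for adjacent (level-1) neighbours only; the function names are hypothetical.

```python
def with_border_plots(block):
    """First method: border plots carry the last treatment (left) and the first (right)."""
    block = list(block)
    return [block[-1]] + block + [block[0]]

def with_extra_unit(block, at_end=True):
    """Second method: one extra analysed unit repeating the treatment from the other end."""
    block = list(block)
    return block + [block[0]] if at_end else [block[-1]] + block

def adjacent_pairs(seq):
    """Unordered adjacent pairs along a linear sequence."""
    return {frozenset(pair) for pair in zip(seq, seq[1:])}

circular = (1, 2, 4)
print(with_border_plots(circular))              # border, interior units, border
print(with_extra_unit(circular))                # extra unit on the right
print(with_extra_unit(circular, at_end=False))  # extra unit on the left
```

Either linear version reproduces the three adjacencies of the circular block, which is why the balance is preserved.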

In two dimensions, the different types of neighbour balance and design can hold or not hold in each of the two directions. If the two directions are regarded as equivalent, neighbour balance may only be required over both directions (either directed or undirected). Designs may be planar, regarded as on a cylinder if the ends of one dimension are joined, or regarded as on a torus if the ends of both dimensions are joined. Cylindrical designs can be used to obtain planar designs with border plots along 2 parallel sides (side-bordered designs), while toroidal designs can lead to planar designs with border plots along all sides (fully-bordered designs). Row and/or column blocking can be used. More general definitions of neighbour can also be used. For example, in one dimension, it may be convenient to also consider second-neighbours (units with one unit between them), or to pool first- and second-neighbours. In general, units are neighbours at level g, or lag g-neighbours, if they are g units apart, or equivalently, if there are g - 1 units between them. Consecutive triples of treatments may also be of interest. In two dimensions, neighbours at lags g (horizontal) and h (vertical) can be considered. Along with adjacent neighbours, (g, h) = (1, 0) or (0, 1), it is often desirable to include diagonal neighbours, (g, h) = (1, 1) or (1, -1). Note that equivalence of the two diagonal directions is a separate assumption from equivalence of the two axial directions; either can hold without the other holding. If both these equivalences hold, the neighbours can be grouped into orders. For example, the order could increase with $g^2 + h^2$. Then, the first-order neighbours are the adjacent units, the second-order neighbours are the diagonally adjacent units, etc. The diagram below shows this order of neighbours up to 5 from the central point x.

5 4 3 4 5
4 2 1 2 4
3 1 x 1 3
4 2 1 2 4
5 4 3 4 5
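The diagram can be regenerated by ranking the lags (g, h) by $g^2 + h^2$. This is a minimal sketch; the function name `neighbour_orders` and the `radius` parameter are hypothetical.

```python
def neighbour_orders(radius=2):
    """Rank lags (g, h) by g^2 + h^2: rank 1 = adjacent units, rank 2 = diagonal, etc."""
    span = range(-radius, radius + 1)
    dists = sorted({g * g + h * h for g in span for h in span} - {0})
    rank = {d: i + 1 for i, d in enumerate(dists)}
    return [["x" if (g, h) == (0, 0) else str(rank[g * g + h * h])
             for g in span]
            for h in range(radius, -radius - 1, -1)]

for row in neighbour_orders():
    print(" ".join(row))
```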

4.2. Neighbour designs 4.2.1. One-dimensional linearly-arranged distinct-block distinct-directional-pair designs E. J. Williams and, separately, B. R. Bugelski (see § 14.7 of Street and Street, 1987), considered row-complete Latin squares. A Latin square is row-complete if each ordered pair of distinct treatments occurs equally often (once) in adjacent positions in a row. The rows of these can be regarded as forming one-dimensional linearly-arranged distinct-block distinct-directional-pair designs for complete blocks of size t, and b a multiple of t (an even multiple if t is odd). These designs are cross-over designs balanced for residual effects. Extensions to other b and k require t(t - 1) | b(k - 1), and are discussed in § 4.2 of Street (1996) and other chapters in this Handbook. An example for t = 4, b = 4 is:

[1 2 3 4]; [2 4 1 3]; [3 1 4 2]; [4 3 2 1].
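Row-completeness of the t = 4 example can be confirmed by counting ordered adjacent pairs within each block. The helper name `ordered_neighbour_counts` is hypothetical.

```python
from collections import Counter

def ordered_neighbour_counts(blocks):
    """Count ordered adjacent pairs (left, right) within each linear, disjoint block."""
    counts = Counter()
    for block in blocks:
        counts.update(zip(block, block[1:]))
    return counts

square = [(1, 2, 3, 4), (2, 4, 1, 3), (3, 1, 4, 2), (4, 3, 2, 1)]
counts = ordered_neighbour_counts(square)
print(sorted(counts.items()))
```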

4.2.2. One-dimensional linearly-arranged distinct-block distinct-non-directional-pair (binary) designs These designs are the non-directional versions of the designs in Section 4.2.1, and include those in Section 4.2.1. The extension to quasi-complete Latin squares (see Section 4.3), which exist for all t, allows non-directional designs corresponding to the complete block designs in Section 4.2.1. An example for t = 5, b = 5 is:

[1 2 3 4 5]; [2 4 5 3 1]; [3 5 2 1 4]; [5 1 4 2 3]; [4 3 1 5 2].

These designs can be extended to other b and k, where it is necessary that t(t - 1) | 2b(k - 1). In particular, the designs are complete-block examples of equineighboured (balanced incomplete) block designs (EBIBD), which were introduced by Kiefer and Wynn (1981). These were considered in Combinatorial Design as handcuffed designs (introduced by Hell and Rosa, 1972). For a discussion of the existence and construction of these designs, see Street and Street (1987, § 14.6). Further constructions, and an extension to 2-class PBIBDs, are in Morgan (1988b). An example of an EBIBD with t = 6, b = 15, k = 4 is:

[5 2 4 3]; [6 2 4 3]; [6 1 4 3]; [6 1 5 3]; [6 1 5 2]; [6 3 5 4];
[1 3 5 4]; [1 2 5 4]; [1 2 6 4]; [1 2 6 3]; [4 1 3 2]; [5 1 3 2];
[5 6 3 2]; [5 6 4 2]; [5 6 4 1].
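Non-directional neighbour balance of the two examples in this subsection (the t = 5 quasi-complete Latin square and the t = 6 EBIBD) can be checked by counting unordered adjacent pairs. The helper name and the compact string encoding of the blocks are illustrative choices only.

```python
from collections import Counter

def unordered_neighbour_counts(blocks, lag=1):
    """Count unordered pairs of treatments that are `lag` units apart within a block."""
    counts = Counter()
    for block in blocks:
        for u, v in zip(block, block[lag:]):
            counts[frozenset((u, v))] += 1
    return counts

def parse(text):
    # Each word is one block; each character is a single-digit treatment label.
    return [tuple(int(c) for c in word) for word in text.split()]

quasi = parse("12345 24531 35214 51423 43152")
ebibd = parse("5243 6243 6143 6153 6152 6354 1354 1254 "
              "1264 1263 4132 5132 5632 5642 5641")

print(set(unordered_neighbour_counts(quasi).values()))
print(set(unordered_neighbour_counts(ebibd).values()))
```

A single value in each printed set means that every unordered pair of distinct treatments is adjacent equally often.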

4.2.3. Designs of the type in Section 4.2.2 with higher-order neighbour balance Morgan and Chakravarti (1988) gave constructions for complete block designs with first- and second-order neighbour balance, and also considered BIBDs with this balance. Ipinyomi (1986) generalized EBIBDs to equineighboured designs. These are binary incomplete or complete block designs, which are neighbour balanced at each level g, g = 1, ..., k - 1. These must have b a multiple of t(t - 1)/2. Designs which satisfy these conditions are semi-balanced arrays of strength 2 (henceforth semi-balanced arrays), also known as orthogonal arrays of type II of strength 2 (introduced by Rao, 1961). These designs actually have a much stronger neighbour balance in that each treatment occurs equally often in each position in a block, and each unordered pair of treatments occurs equally often within the same block in each unordered pair of plot positions. They were used by Morgan and Chakravarti for the balance they needed, and implicitly by Ipinyomi. Further types of designs derived from the semi-balanced array are presented in Section 5.2. An example of an equineighboured design with t = 5, b = 20, k = 4, which is not a semi-balanced array, is:

[1 2 4 3]; [1 2 5 4]; [1 2 4 5]; [1 3 2 4]; [1 3 2 5]; [1 4 5 2];
[1 5 2 3]; [1 5 3 2]; [2 1 3 5]; [2 1 4 3]; [2 1 5 4]; [2 3 5 4];
[2 4 3 5]; [2 4 1 3]; [3 1 4 5]; [3 4 2 5]; [3 5 1 4]; [3 5 1 4];
[4 3 1 5]; [4 3 2 5].

An example of a semi-balanced array with t = 5, b = 10, k = 4 is:

[1 2 3 4]; [2 3 4 5]; [3 4 5 1]; [4 5 1 2]; [5 1 2 3];
[1 3 5 2]; [2 4 1 3]; [3 5 2 4]; [4 1 3 5]; [5 2 4 1].

4.2.4. Partially neighbour balanced designs Because the conditions for exact neighbour balance are very restrictive, Wilkinson et al. (1983) suggested partially neighbour balanced designs for use with a large t and small r. For a given level of neighbours, there should be no self adjacencies up to that level, and each unordered pair should occur at most once as neighbours. The latter can be generalized to the requirement that the number of distinct pairwise occurrences up to the given level should be m or m + 1 for some m. An example with t = 10, b = 5, k = 4 and level-1 partial balance is:

[1 2 3 4]; [5 6 7 8]; [7 1 0 9]; [0 4 5 2]; [6 9 8 3].

R.J. Martin


An example with t = 10, b = 5, k = 4 and level-3 partial balance is:

[1 2 3 4]; [2 5 6 7]; [3 7 8 9]; [5 0 4 8]; [6 1 9 0].

The last example still holds under any within-block permutations, since the design is a (0, 1)-design (all treatment concurrences are 0 or 1). Clearly, any (0, 1)-design has partial neighbour balance to level k - 1. Methods of construction, and a valid randomization, are discussed in Azaïs et al. (1993).

4.2.5. One-dimensional circular (and linear-bordered) distinct-block distinct-non-directional-pair designs D. H. Rees considered the circular form of the designs in Section 4.2.2, called circular block designs (see § 14.7 of Street and Street, 1987). In Combinatorial Design, these are called balanced cycle designs. The blocks need not be binary. It is necessary that (t - 1) | 2r. For a discussion of the existence and construction of these designs, and an example of a non-binary design, see Street and Street (1987, Table 14.1). An example of a binary circular block design with t = 7, b = 7, k = 3 is:

[1 2 4]; [2 3 5]; [3 4 6]; [4 5 7]; [5 6 1]; [6 7 2]; [7 1 3].

These designs can be adapted to become linearly-arranged neighbour balanced designs with border plots at each end of each block, by using on the left (right) border plot the treatment on the last (first) unit in the block (see Section 4.1).

4.2.6. One-dimensional circular (and linear-bordered) distinct-block distinct-directional-pair designs

The directional form of the circular block design in Section 4.2.5 (or the circular form of the designs in Section 4.2.1) is called a balanced circuit or Mendelsohn design in Combinatorial Design. It is necessary that (t - 1) | r. An example of a binary balanced circuit design with t = 7, b = 14, k = 3 is obtained by using the seven blocks of the first circular block design example (t = 7, b = 7, k = 3) above twice, once as they are and once with the within-block order reversed (so the eighth block is [421], etc.). Linear bordered designs which result from circular BIBD designs, have a valid randomization, and are efficient under competition models, are given in the catalogue of Azaïs et al. (1993). Some of their designs have (linear) level-2 neighbour balance also. An example for t = 5, b = 5, k = 4 with level-1 and level-2 balance is:

[{3} (1243) {1}]; [{4} (2354) {2}]; [{5} (3415) {3}]; [{1} (4521) {4}]; [{2} (5132) {5}].

4.2.7. One-dimensional linearly-arranged contiguous-block non-directional designs

R. M. Williams (1952) introduced Williams designs and gave a method of construction using circular designs. Note that 'Williams designs' (from E. J. Williams) has also been used for row-complete designs with an extra property (a balanced end design). The designs are essentially neighbour-balanced circular designs, from which linear designs are obtained by adding an extra unit at one end which has the treatment that is on the other end (see Section 4.1). Thus, these designs are not equally replicated: t - 1 treatments occur r times and one occurs r + 1 times (and n = rt + 1). However, in the gls analysis for which these designs were proposed (see Section 5.1), the extra unit has a reduced weight, so that the two end units in the linear sequence have a combined weight equal to that of any interior unit. If the responses are uncorrelated, the extra unit has zero weight. Thus, in the gls analysis the design is effectively equally-replicated. Williams considered both the distinct-pair (his II(a) design), and the like-pair (his II(b) design) designs for r complete blocks of size t (with one extra unit at an end). For the like-pair designs, the like pairs occur as often as the distinct ones, and straddle adjacent blocks. It is necessary that (t - 1) | 2r (distinct pairs) or t | 2r (like pairs). An example of a distinct-pair Williams design with t = 4, b = 3 is:

(1234) (2314) (3142) (1).

An example of a like-pair Williams design with t = 3, b = 3 is:

(123) (312) (231) (1).
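Both Williams examples can be checked by building a neighbour-count matrix for the linear sequence, extra end unit included. The sketch below is illustrative only and not from the source. It assumes one particular counting convention, labelled in the code: every adjacency is counted once from each of its two units, so a like pair contributes 2 to a diagonal entry. Under that convention "the like pairs occur as often as the distinct ones" means that all entries of the matrix are equal.

```python
def neighbour_matrix(sequence, t):
    """Symmetric t x t matrix of non-directional neighbour counts.

    Assumed convention: each adjacency (a, b) adds 1 to entry (a, b) and
    1 to entry (b, a), so a like pair (a, a) adds 2 to the diagonal.
    """
    counts = [[0] * t for _ in range(t)]
    for a, b in zip(sequence, sequence[1:]):
        counts[a - 1][b - 1] += 1
        counts[b - 1][a - 1] += 1
    return counts

def digits(text):
    return [int(ch) for ch in text]

# Blocks written contiguously, followed by the extra end unit.
distinct_pair = digits("1234" "2314" "3142" "1")
like_pair = digits("123" "312" "231" "1")

for row in neighbour_matrix(distinct_pair, 4):
    print(row)
for row in neighbour_matrix(like_pair, 3):
    print(row)
```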

Generalized Williams designs, which are as above but not blocked, were considered by Kunert and Martin (1987a). Examples of generalized Williams designs are:

(121323) (1) for t = 3, r = 2 (distinct-pair);

and

(112213323) (1) for t = 3, r = 3 (like-pair).

Williams additionally suggested designs (type III) which extend the distinct-pair Williams designs to have level-2 balance for distinct pairs also. These designs require two extra units at one end (and n = rt + 2), although again in Williams' gls analysis these extra units combine in weight with the other two end units to give an effectively equally-replicated design. An example of a type III design for t = 4, b = 3 is:

(1234) (2134) (1324) (12).

Butcher (1956) considered an extension of the Williams designs to level-p balance, but the designs are unappealing, with blocks of size pt containing strings of p repeated treatments. For example, the design with t = 3, p = 2, b = 2 is:

(112233) (221133) (11).


4.2.8. One-dimensional linearly-arranged contiguous-block directional designs

The directional forms of the Williams designs in Section 4.2.7 are the serially balanced sequences due to D. J. Finney and A. D. Outhwaite. They are again essentially circular designs adapted to be linear, as in Section 4.2.7, and have n = rt + 1. It is necessary that (t - 1) | r (distinct pairs) or t | r (like pairs). Clearly a serially balanced sequence is a Williams design of the same type. The designs were intended for use with repeated measurements when there are residual effects. For a discussion of the existence and construction of these designs, see Street and Street (1987, § 14.5). An example of a distinct-pair serially balanced sequence with t = 5, b = 4 is:

(12345) (24135) (43215) (31425) (1).

An example of a like-pair serially balanced sequence with t = 3, b = 6 is:

(123) (312) (231) (132) (213) (321) (1).
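Serial balance is a statement about ordered neighbour pairs, so it can be checked by counting consecutive pairs along the sequence. A minimal sketch follows (hypothetical function names, not from the source), applied to the two serially balanced sequences above with the extra end unit included.

```python
from collections import Counter

def ordered_pair_counts(sequence):
    """Count ordered pairs (left unit, right unit) along a linear sequence."""
    return Counter(zip(sequence, sequence[1:]))

def digits(text):
    return [int(ch) for ch in text]

distinct_pair = digits("12345" "24135" "43215" "31425" "1")
like_pair = digits("123" "312" "231" "132" "213" "321" "1")

distinct_counts = ordered_pair_counts(distinct_pair)
like_counts = ordered_pair_counts(like_pair)
print(len(distinct_counts), set(distinct_counts.values()))
print(len(like_counts), set(like_counts.values()))
```

In the distinct-pair case r = 4 = t - 1, so each of the t(t - 1) = 20 ordered distinct pairs should appear once; in the like-pair case r = 6 = 2t, so each of the t^2 = 9 ordered pairs should appear twice.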

The serially balanced sequences were extended by Nair (1967) to Nair designs for which all t(t - 1)(t - 2) distinct ordered triples occur equally often. They are again essentially circular designs. They require (t - 1)(t - 2) | r, and have two extra plots at one end (so that n = rt + 2). The designs need not be blocked. Nair gave some constructions. Examples of Nair designs for t = 4, rt = 24 are:

(2431) (4321) (3421) (4231) (2341) (3241) (24) (complete blocks);

(123412432413213421423143) (12) (single block).
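The defining property of a Nair design, that every ordered triple of distinct treatments occurs equally often among consecutive units, can be verified directly. The sketch below is illustrative and not from the source; the function names are hypothetical, and the complete-block sequence is taken with its blocks in the order (2431), (4321), (3421), (4231), (2341), (3241) followed by the two extra units.

```python
from collections import Counter
from itertools import permutations

def triple_counts(sequence):
    """Count ordered triples of consecutive units."""
    return Counter(zip(sequence, sequence[1:], sequence[2:]))

def is_nair(sequence, t):
    """True if all distinct ordered triples occur equally often and no other triple occurs."""
    counts = triple_counts(sequence)
    distinct = list(permutations(range(1, t + 1), 3))
    values = {counts[triple] for triple in distinct}
    only_distinct = sum(counts.values()) == sum(counts[triple] for triple in distinct)
    return len(values) == 1 and 0 not in values and only_distinct

def digits(text):
    return [int(ch) for ch in text]

complete_blocks = digits("2431" "4321" "3421" "4231" "2341" "3241" "24")
single_block = digits("123412432413213421423143" "12")

print(is_nair(complete_blocks, 4), is_nair(single_block, 4))
```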

If only the triples are relevant, one extra plot can be at each end. The complete-block example becomes:

(1) (2431) (4321) (3421) (4231) (2341) (3241) (2).

Dyke and Shelley (1976) sought extensions of the complete-block Nair designs in which the t(t - 1) triples ABA, etc., occur equally often, and as often as the distinct triples. They put one extra unit at each end, and regard these extra units as border plots. An example with t = 4, r = 9, found by computer search, is: {1}

(2134)

(3124)

(1234)

(2143)

(2413)(2314)(1432)

(4312)

{2}.

(2431)


4.3. Two-dimensional designs

All the one-dimensional designs in Section 4.2 can be extended in an obvious way to two dimensions, but few have been explicitly examined. Other types of designs can arise if the two directions are regarded as equivalent. Some of the designs that have been proposed are discussed here. The two-dimensional extension of the row-complete Latin square is the complete Latin square. This is both row- and column-complete, and thus has distinct-pair directional neighbour balance in both directions. Directional row- and column-neighbour-balanced designs are examples of 'directional' rectangular lattice polycross designs (note that in the context of polycross designs, 'directional' has a different meaning from that used here - only some directions are relevant). See Street and Street (1987, § 14.4) for some 'directional' polycross designs, in which only some directions are important. An example of a complete Latin square with t = 4 is

1234
3142
2413
4321

The non-directional form of the complete Latin square is the quasi-complete Latin square (Freeman, 1981), in which each of the unordered pairs of distinct treatments are adjacent equally often (twice) in the rows and in the columns. Clearly, a complete Latin square is also quasi-complete. For a discussion of the existence and construction of complete and quasi-complete designs see Street and Street (1987, § 14.3). Bailey (1984) has shown that a valid randomization exists for Abelian group-based quasi-complete Latin squares. An example of a quasi-complete (but not complete) Latin square is any t = 3 Latin square.
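Row- and column-completeness of the t = 4 complete Latin square given above can be confirmed with a short check. This is a sketch under the definition stated in the text (each ordered pair of distinct treatments adjacent exactly once along the rows, and exactly once along the columns); the function name is hypothetical.

```python
def is_complete_latin_square(square):
    """Check the Latin property and directional neighbour balance in rows and columns."""
    t = len(square)
    columns = [list(col) for col in zip(*square)]
    symbols = set(square[0])
    lines = [list(row) for row in square] + columns
    if any(len(line) != t or set(line) != symbols for line in lines):
        return False  # not a Latin square
    for direction in (square, columns):
        pairs = [(a, b) for line in direction for a, b in zip(line, line[1:])]
        # t(t - 1) adjacencies per direction: all distinct means each ordered pair once.
        if len(set(pairs)) != t * (t - 1):
            return False
    return True

complete_square = [[1, 2, 3, 4],
                   [3, 1, 4, 2],
                   [2, 4, 1, 3],
                   [4, 3, 2, 1]]

print(is_complete_latin_square(complete_square))
```

A cyclic t = 3 Latin square fails this check, consistent with the remark that t = 3 Latin squares are quasi-complete but not complete.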

If the two directions are equivalent, it is not necessary to have neighbour balance in each direction. A Latin square is (nearest-) neighbour balanced (Freeman, 1979) if each of the unordered pairs of distinct treatments are adjacent equally often (four times) over the rows and columns. Clearly, a quasi-complete Latin square is also neighbour balanced. These designs are examples of non-directional polycross designs, which can treat the first- and second-order neighbours jointly (row, column, and diagonal). Some constructions for polycross designs with like-pair diagonal neighbours are given by Morgan (1988a, 1988b). An example of a neighbour balanced (but not quasi-complete) Latin square with t = 4 is

1234
3412
2143
4321

Neighbour-balanced nested row-column designs which are unbordered, side-bordered or fully bordered, and with or without balanced diagonal neighbours have been considered. For some constructions, see Morgan and Uddin (1991) and the references therein. A non-binary example of t = 9 in 9 blocks of 3 by 3 on a torus (giving a fully-bordered planar design), which has combined row and column (distinct pairs) neighbour balance and combined like-pairs diagonal neighbour balance is:

112

1384 1916

1749 1865

29 384

Street and Street (1987, Table 14.6) give a fully-bordered design for t = 7 in an (interior) 3 by 7 array with combined row and column (distinct pairs) neighbour balance, which does not come from a torus design. Single-block torus designs that are neighbour balanced for all neighbours up to a given spatial order were considered by Martin (1982). Fully-bordered planar designs can be obtained from these. An example which is neighbour-balanced at all orders if the two axial directions and the two diagonal directions are both equivalent is the torus t = 5 Knight's Move Latin square:

12345
45123
23451
51234
34512

5. Efficient designs for spatially dependent observations

5.1. Introduction

As discussed in Section 2, spatial information was used in early experiments in the separation principle, and subsequently in choosing a blocking structure and forming blocks. More formal methods for choosing designs that are efficient under spatial dependence began in 1952, but for many years only simple situations and simple models were considered. It is only quite recently that computing power has allowed realistic models to be fitted, and that the need for efficient designs under these models has arisen. It is often assumed that the neighbour-balanced designs of Section 4 must be efficient for spatial dependence, but the situation is usually more complicated. It will be shown in this section that it is rarely easy to obtain designs that are highly efficient under a spatial dependence model, but that the factors affecting efficiency are the positional balance of the treatments, the (non-directional) neighbour balance, and edge effects. Of these, low-order neighbour balance is often the most important, so there is indeed a link with the neighbour designs of Section 4. In the following, the model given by equations (5) and (6) is assumed to hold after sufficient differencing.

The first investigation of design for a particular intended generalized least-squares analysis appears to be that of R. M. Williams (1952). It is instructive to discuss Williams' investigation, and subsequent work on it, in more detail.

EXAMPLE 5.1 (Designs for a linearly-arranged sequence of units under an AR(1) or AR(2)). Williams was interested in complete-block designs for linearly-arranged units in contiguous blocks when (a variant of) an AR(1) or AR(2) is assumed, and gls is used. The grouping into complete blocks was not used in the analysis. He began by considering two systematic designs and the AR(1). His I(a) design regularly (with the same treatment order) repeated the complete block (1 2 ... t); whilst the I(b) (r even) regularly repeated the two complete blocks (1 2 ... t) (t ... 2 1), where the second block is the reflection of the first. He essentially noted that I(a) is poor under a linear trend, but I(b) is linear trend-free. However, his major objection to both designs was that the variances of estimated treatment differences were unequal. The next designs he proposed, the II(a) and II(b) (see Section 4.2.7), were chosen to have equal variances of estimated treatment differences under an AR(1). For an AR(1), V^{-1} is tridiagonal, with constant elements on the diagonals, apart from the first and last diagonal elements. Williams in fact used a variant on the stationary AR(1) which gave less weight (λ^2, where λ = ρ_1) to the extra unit, so that the combined weight of the two end units (1 + λ^2) is the same as the weight of an interior unit. Some of the possible AR(1) variants are discussed in Kunert and Martin (1987a). It is the facts that the leading off-diagonal of V^{-1} is constant, and all other off-diagonal terms are 0, that lead to the need for the first-order (and only first-order) neighbour balance. Note that equal variances of estimated treatment differences implies that the C matrix (9) is completely-symmetric (in this case B = 1_n).
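The tridiagonal structure of the inverse is easy to confirm numerically. The sketch below is a minimal pure-Python illustration and is not from the source. It assumes the stationary AR(1) with lag-one correlation lam, so that the correlation between units i and j is lam^|i - j|, and uses the standard closed form for the inverse: up to the factor 1/(1 - lam^2), the diagonal is 1 at the two ends and 1 + lam^2 in the interior, and the leading off-diagonal is -lam. All function names are hypothetical.

```python
def ar1_correlation(n, lam):
    """Correlation matrix V of a stationary AR(1): V[i][j] = lam ** |i - j|."""
    return [[lam ** abs(i - j) for j in range(n)] for i in range(n)]

def ar1_inverse(n, lam):
    """Closed-form tridiagonal inverse of the AR(1) correlation matrix (n >= 2)."""
    scale = 1.0 / (1.0 - lam * lam)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        end_unit = i in (0, n - 1)
        w[i][i] = scale * (1.0 if end_unit else 1.0 + lam * lam)
        if i + 1 < n:
            w[i][i + 1] = w[i + 1][i] = -scale * lam
    return w

def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

n, lam = 6, 0.5
product = matmul(ar1_correlation(n, lam), ar1_inverse(n, lam))
max_error = max(abs(product[i][j] - (1.0 if i == j else 0.0))
                for i in range(n) for j in range(n))
print(max_error < 1e-12)
```

The end diagonal weight 1 and interior weight 1 + lam^2 are what make an extra end unit of weight lam^2 bring the combined end weight up to that of an interior unit.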


The type III design (see Section 4.2.7) was chosen for the same reason for a similar variant of an AR(2). The form of V^{-1} leads to the need for the first- and second-order neighbour balance. Although he noted that the II(a) design was efficient for an AR(1) with positive λ (positive decaying correlations), and the II(b) design was efficient for a negative λ (alternating decaying correlations), Williams made no claims about optimality of the designs. Cox (1952) did conjecture optimality of the II(a) designs.

Kiefer (1961) investigated asymptotically optimal designs, which need not be in complete blocks, for a sequence under an AR(1) or AR(2). He proved that the II(a) (and the generalized II(a)) designs are asymptotically universally optimal (u.o.) for an AR(1) with positive dependence. Exact optimality under the stationary AR(1) is difficult to show, but D- and A-optimality was proved by Kunert and Martin (1987b). Kiefer also noted that the II(b) designs are not optimal for an AR(1) with a negative λ. In this case, the asymptotically u.o. designs consist of t adjoining sets of r repeated treatments

(1 1 ... 1) (2 2 ... 2) ... (t t ... t)

- a design that would never be used in practice. He showed that the Williams type II(b) (and the generalized II(b)) designs have asymptotic universal minimax optimality if the sign of λ is unknown. A design is minimax optimal under the Φ_p-criterion, say, if its largest Φ_p-value as λ varies is the smallest over all competing designs. A point not noted by Kiefer is that the II(b) designs are asymptotically u.o. complete block designs for the AR(1) with λ < 0, and so are then optimal among the class of designs considered by Williams.

Kiefer then gave a thorough (asymptotic) investigation of the case of AR(2) dependence, and showed that the Williams type III (and the generalized type III) designs are asymptotically u.o. under certain conditions (corresponding to positive, but non-monotonically decreasing, correlations). Apart from boundaries in the (λ_1, λ_2) region, there are four different types of (asymptotic) optimal design. Which is optimal does not simply depend on the signs of the AR(2) parameters λ_1, λ_2 nor on the signs of the correlations ρ_1, ρ_2 (note that Kiefer's ρ_1, ρ_2 are not correlations, but AR(2) model parameters). The above case is the only one for which a sequence of complete blocks can be optimal (Type III designs). The asymptotic u.o. designs for decreasing positive correlations have first-order neighbour balance for distinct pairs, but also have a maximal number of lag-2 like pairs. Kiefer also found the (asymptotic) universally minimax designs, which have first- and second-order like-pair neighbour balance. Designs that have asymptotic minimax optimality for higher order autoregressions were investigated by Kiefer and Wynn (1984). As an aside on sequences of complete blocks, Martin et al. (1996) have shown that with n = rt, ols estimation and the LV process, any sequence of complete blocks is A-optimal.

Example 5.1 illustrates some of the difficulties in finding efficient designs under dependence, and some of the methods that can be used. In particular:

I. Designs that are intuitively appealing can be constructed, and their efficiency evaluated.
II. Designs can be sought for which the C-matrix is (close to being) completely-symmetric, and their efficiency evaluated.
III. Designs which are (weakly) universally optimal can be sought.
IV. Optimality may be very difficult to prove.
V. Optimality will usually depend on the model that is assumed for the dependence, and which estimator is used (usually ols, or gls with the assumed Λ).
VI. Optimality may depend not just on the model assumed, and whether the dependence is positive or not, but on actual values of the parameter λ.
VII. The structure of Λ0 of (8) for ols estimation or Λ* of (9) for gls estimation can be used to suggest what design features lead to efficiency.
VIII. If k > t, extended complete block designs may not be optimal; and if k ≤ t, binary designs may not be optimal.

Two other points that did not arise in Williams' problem are:

IX. If there are few competing designs, a complete enumeration and evaluation may be possible.
X. Well-structured searches, or other algorithmic methods, can be used.

Some of these are related to problems with design optimality under independence, where for many (t, b, k, r) the optimal design is not known even when all contrasts are of equal interest.

In the rest of this section, some results on efficient spatial design are discussed. Section 5.2 considers the case where results in the theory of optimal design can be used to show that optimal designs exist for some (t, b, k), or that there are designs with criterion values very close to the best possible. In Section 5.3, the case where theory may be useful in showing what sort of approximate properties a design should have, and in furnishing efficiency bounds, is discussed. Sections 5.4 and 5.5 briefly consider test-treatment designs and factorial designs, respectively. For some other discussions on optimal design under dependence, see Shah and Sinha (1989, § 7.2) and Cressie (1991, § 5.6.2); and references in the bibliography of Gill (1991).

5.2. Optimal designs

In the following discussion, 'optimal' on its own means universally optimal for gls estimation (with the postulated Λ assumed correct) or weakly universally optimal for ols estimation (see Section 3.2). In most cases, optimal designs have only been sought for positive dependence. This is because negative dependence is unlikely in practice, and leads to (near-) optimal designs which would not be used in practice. These contain groups (runs in one dimension) of like treatments, or treatments only occurring in like pairs, e.g., 1 1 2 2 .... Similarly, minimax optimality is only useful if there is no prior information on the parameters of the dependence structure. If n is large, asymptotic optimality can be sought (as for the Williams II(a) and III designs, and the EBIBDs - see below).

In determining the optimal design, the elements of Λ0 of (8) are relevant for ols estimation, and those of Λ* of (9) for gls (see Section 3.1). These matrices are symmetric, but in general are not banded (even for a stationary process). If with gls estimation an off-diagonal element of Λ* is positive and k ≤ t, tr(C) can be increased by using a non-binary design. In some cases of strong positive dependence, lag two like neighbours are desirable, and the optimal designs are non-binary. If k > t, the design must be non-binary, but the optimal design may not be an extended complete block design. With a one dimensional layout of units, the optimal extended block design may not use a sequence of complete blocks. When k > t, tr(C) is maximised if within-block replicates are positioned to take advantage of the largest positive or least negative sums of off-diagonal elements of Λ*. If enough off-diagonal elements of Λ* are positive, tr(C) can be increased by having additional within-block replication. Similar results can occur with ols estimation. In practice, it is likely that only binary incomplete designs or extended complete block designs would be used.

Apart from those arising in repeated-measurements, cases where optimal designs are known to exist include the nine cases given below. The first eight cases use the Kiefer, and Kiefer and Wynn results on optimality - see Section 3.2. Then the number of blocks must be such that it is possible for the appropriate positional and/or neighbour balance to ensure matrix D of Section 3.2 is completely-symmetric.

(1) One-dimensional linearly-arranged contiguous-block designs under an AR(1) or AR(2), and gls estimation. The Williams type II(a) design (Section 4.2.7) is asymptotically optimal under an AR(1) with λ > 0, and the Williams type III design (Section 4.2.7) is asymptotically optimal under an AR(2) with λ_1, λ_2 > 0 - see Section 5.1. See Kiefer (1961) for other cases, and minimax optimality. The optimality is exact for the circular designs with the circular stationary process. It is also exact for the variant used by Williams. In this case, only low-order neighbour balance is important. The extension to minimax optimal designs under an AR(p) process was given by Kiefer and Wynn (1984).

(2) One-dimensional trend-free designs. There is a relatively large literature on trend-free and nearly-trend-free (trend-resistant) designs under independence in various situations - see, for example, Bradley and Yeh (1988), and Lin and Dean (1991), and the references therein. A design is trend-free if the trend is orthogonal (given blocks) to the treatments, so that, with V = I, the C-matrix from model (1) is the same as that from model (2), i.e., T'A = T'B(B'B)^{-1}B'A. If a trend-free design is optimal for model (2) under independence, as for a BIBD, it is optimal under model (1) with the trend. Simple necessary conditions for a design to be trend-free can be found. For example, a block design in which each treatment occupies each plot position equally often must be trend-free under a straight line trend. Recall that having a (within block) polynomial trend of order d in model (1) in one dimension is equivalent to taking (within block) differences of order d in model (2) with V = I, which is equivalent to using a particular Λ* in (5) and (6). Thus, trend-free designs are a limiting case of optimal designs under dependence when differencing is used. In particular, model (2) with the CG(d) process and ψ = 0 is equivalent to model (1) and independence with a polynomial trend of order d. Note that in this case, positional balance is important, and neighbour balance plays no part.

(3) One-dimensional linearly-arranged distinct-block binary designs (k 0, a neighbour-balanced design with a balanced end-design (see Section 4.1) is optimal. A neighbour-balanced design in which each treatment occurs equally often on an end unit (1 or k) will be optimal under gls estimation for the first-difference ARIMA(0, 1, 0) model. Ipinyomi (1986) proposed his equineighboured designs (see Section 4.2.3) believing they have a completely-symmetric C-matrix for ols estimation under a stationary process (which would imply they were optimal binary designs). However, this is false in general for equineighboured designs since E_k Λ E_k is not, in general, banded. It has been shown that, provided they exist, the semi-balanced arrays (see Section 4.2.3) are optimal for all stationary processes among BIBDs for ols estimation (Kunert, 1985), and among all binary designs for gls estimation (Cheng, 1988). This follows from the pairwise positional balance of the treatments in the semi-balanced array.

Martin and Eccleston (1991) noted that most one-dimensional dependence structures that are proposed for use in design are time-reversible (left to right is equivalent to right to left), so that Λ and Λ* are centro-symmetric (symmetric under a half-turn). They also noted that the arguments for optimality of the semi-balanced array are not affected by the dependence process being non-stationary (either because the variances differ, or because differencing is required). They generalized the equineighboured designs to strongly equineighboured (SEN) designs. The SEN designs include the semi-balanced arrays. They have the properties that (i) each treatment occurs equally often in each position in a block, except that position i is equivalent to position k - i + 1, and (ii) each unordered pair of treatments occurs equally often (m = 2b/{t(t - 1)} or 2m times) within the same block in each unordered pair of plot positions j, j', j ≠ j', except that a (k + 1 - j, k + 1 - j') plot pair is equivalent to a (j, j') plot pair. The number of times is m if j + j' = k + 1, and 2m otherwise. In this case, m can be odd if k is even. The optimality properties of the semi-balanced array then hold for the SEN design if Λ and Λ* are centro-symmetric. Some constructions of SEN designs are given by Street (1992). An example of a SEN design (m = 1) with t = 4, b = 6, k = 4, which is half of a semi-balanced array is:

[1234]; [2143]; [1342]; [3124]; [1423]; [4132].
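Properties (i) and (ii) of a SEN design can be tabulated for the t = 4, b = 6, k = 4 design just listed. The following is a hedged Python sketch, not from the source; the function name is hypothetical, and plot positions are numbered 1 to k with position j identified with its reflection k + 1 - j, as in the definition.

```python
from collections import Counter
from itertools import combinations

def sen_counts(blocks):
    """Tabulate positional and pairwise-positional occurrences up to reflection."""
    k = len(blocks[0])
    position_counts = Counter()  # key: (treatment, position class)
    pair_counts = Counter()      # key: (treatment pair, plot-pair class)
    for block in blocks:
        for j, treatment in enumerate(block, start=1):
            position_counts[(treatment, min(j, k + 1 - j))] += 1
        for j, jp in combinations(range(1, k + 1), 2):
            reflected = (k + 1 - jp, k + 1 - j)  # already in increasing order
            plot_pair = min((j, jp), reflected)
            treatments = frozenset((block[j - 1], block[jp - 1]))
            pair_counts[(treatments, plot_pair)] += 1
    return position_counts, pair_counts

sen_design = [[1, 2, 3, 4], [2, 1, 4, 3], [1, 3, 4, 2],
              [3, 1, 2, 4], [1, 4, 2, 3], [4, 1, 3, 2]]

position_counts, pair_counts = sen_counts(sen_design)
print(set(position_counts.values()))
for plot_pair in [(1, 2), (1, 3), (1, 4), (2, 3)]:
    print(plot_pair, {c for (pair, pp), c in pair_counts.items() if pp == plot_pair})
```

With m = 2b/{t(t - 1)} = 1, the self-reflecting plot pairs (those with j + j' = k + 1) should show every treatment pair m times and the remaining plot-pair classes 2m times.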

In this example, each treatment occurs three times on an end plot (1 or 4), and three times in an interior plot (2 or 3). Each pair of treatments occurs twice as an end pair ((1, 2) or (3, 4)), once as an interior pair (2, 3), twice two apart ((1, 3) or (2, 4)), and once three apart (1, 4). Note that Kunert (1987b, § 3.1) showed that there may be non-BIBD binary unequally-replicated designs that are better than the semi-balanced arrays for some criteria with ols estimation. (4) Two-dimensional planar distinct-block binary designs (k 0. However, a simulated annealing algorithm (Martin and Eccleston, 1996) has found many designs

(with little obvious symmetry) that are more efficient than D4.21. The best found so far is:

1234
4123
2431
4213

It has complete blocks in the rows, and 9 diagonal self-adjacencies. It does not have positional balance (e.g., 4 is in two corners), nor does it have first-order neighbour balance (the adjacencies are 4, 5, 3, 4, 5, 3). The design appears to be more efficient than D4.21 for all λ > 0. For example, it has, when λ = 0.5, a relative efficiency with respect to D4.21 of 1.039, and an efficiency of 0.934 compared with an unattainable lower bound. Another design found which is almost as good has positional balance and 10 diagonal self-adjacencies, but more unequal neighbour adjacencies (5, 5, 2, 2, 5, 5).

The above approaches can be unsatisfactory because, in many cases, the design obtained may be very specific to the assumptions made. Even a seemingly small change in parameter values may mean the chosen design is less efficient than another. This suggests another approach, which is to seek designs that are not necessarily optimal under any assumptions, but perform well under a range of likely conditions. This approach was followed by Martin et al. (1993) for agricultural variety trials which use block designs in which the plots are arranged linearly, or, if arranged spatially, the predominant dependence is in one direction. There is empirical evidence from Australia that the LV (or CG(1) - see Section 3.1) model is acceptable for a large proportion of field experiments, with the parameter ψ usually between 0 and 5. The approach was therefore to assume that the LV model will usually be reasonable, with ψ in the range [0, 5], and to seek designs that are efficient in this range under gls estimation, but also efficient under some reasonable alternative models. These included the CG(2) and low-order stationary ARMA models. Looking at the form of the Λ* matrix for the LV model and other models suggested some general principles:

(a) The design should be binary. Both the end design (plots 1 and k) and the interior design (plots 2 to k - 1) should approximately contain each treatment equally often.

(b) If possible, the design should have the low order (usually to at most lag 3) neighbour adjacency matrices N_g close to complete symmetry, and the end to next-to-end neighbours should also be as balanced as possible. If r is too small for this to be possible, try to get a weighted combination of the N_g, where the weights depend on the likely size of ψ, close to complete symmetry.

Examples of designs that are efficient and robust are, for t = 6, k = 6, b = 3:

[123456]; [362514]; [246135];

and for t = 10, k = 4, b = 5:

[1234]; [2567]; [3798]; [0485]; [6019].
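The descriptive statistics quoted earlier in this section for the 4 by 4 design found by simulated annealing (rows 1234, 4123, 2431, 4213) can be recomputed from the layout. The sketch below is illustrative only and not from the source: the function name is hypothetical, "adjacencies" is read as the number of row or column neighbour pairs for each unordered pair of treatments, listed in the order 12, 13, 14, 23, 24, 34, and a "diagonal self-adjacency" as a treatment meeting itself on a diagonally neighbouring plot.

```python
from collections import Counter

def grid_adjacencies(grid):
    """Return (row/column neighbour counts per treatment pair, diagonal self-adjacencies)."""
    rows, cols = len(grid), len(grid[0])
    axial = Counter()
    diagonal_self = 0
    for i in range(rows):
        for j in range(cols):
            if j + 1 < cols:  # neighbour to the right
                axial[frozenset((grid[i][j], grid[i][j + 1]))] += 1
            if i + 1 < rows:
                # neighbour below
                axial[frozenset((grid[i][j], grid[i + 1][j]))] += 1
                # the two diagonal neighbours in the next row
                for dj in (-1, 1):
                    if 0 <= j + dj < cols and grid[i][j] == grid[i + 1][j + dj]:
                        diagonal_self += 1
    return axial, diagonal_self

design = [[1, 2, 3, 4],
          [4, 1, 2, 3],
          [2, 4, 3, 1],
          [4, 2, 1, 3]]

axial, diagonal_self = grid_adjacencies(design)
pairs = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]
print([axial[frozenset(p)] for p in pairs], diagonal_self)
```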

5.4. Treatment-control (test-control) designs

The optimal design theory for this case is considerably more complicated than when the treatments are of equal status, even with independent observations. It has received little attention so far for dependent observations. Martin and Eccleston (1993) briefly compared some designs using their SEN designs. A more thorough investigation was carried out by Cutler (1993). Cutler looked mainly at one-dimensional circular designs, and an assumed gls analysis under an AR(1). Exact A_tc-optimality (see Section 3.2) results are hard to obtain, but Cutler gave conditions under which a (circular) neighbour-balanced BTIB design (see Section 3.2) is optimal for some λ. Neighbour balance here means that the test treatments are neighbour balanced, and that, separately, the control treatment is neighbour balanced with the test treatments. Note that the optimal design may depend on the value of λ.

Cutler briefly considered linearly-arranged designs with the AR(1). He proposed augmented neighbour balanced incomplete block designs as building blocks which may lead to efficient designs. These will have the C-matrix in the desired supplemented balance form (see Section 3.2), and can be constructed from semi-balanced arrays in the test treatments by inserting the control between fixed units in each block. If the control occurs more than once in each block, the number of adjacent occurrences should be minimized. Designs constructed this way will have the much stronger neighbour balance needed for the C-matrix to have the supplemented balance form for any dependence structure. However, the optimal design of this type still needs to be found. Cutler's Example 5.1 compared two designs of this type with t = 4, k = 3, b = 6 (0 is the control):

D1: [102]; [203]; [301]; [012]; [023]; [031];

D2: [102]; [203]; [301]; [102]; [203]; [301].

D1 and D2 are equivalent and A_tc-optimal under independence. Cutler claims that D1 is A_tc-optimal for small λ = ρ_1 (≤ 0.23), while D2 is optimal for large λ (≥ 0.72). In fact, these are the wrong way round (and Cutler's paragraph above his Example 5.1 is in error). For stationary processes, D1 is better than D2 if 1 - 4ρ_1 + 3ρ_2 < 0, which for the AR(1), where ρ_2 = λ^2 and 1 - 4λ + 3λ^2 = (1 - λ)(1 - 3λ), gives λ > 1/3. Of all designs formed from D2 by within-block permutation, it appears that D2 is A_tc-optimal for λ ≤ 0.3185, while D1 is optimal for λ ≥ 0.3652. Between these values, the A_tc-optimal design appears to be the design obtained from D1 by replacing the fourth block by [1 0 2] (Martin and Eccleston, 1996).

Martin and Eccleston (1993) essentially looked at the spatial version of Cutler's linear designs which use semi-balanced arrays (they use a SSEN design, or a reduced form of Section 5.2, for the test treatments). As an example, consider a 3 by 3 square with t = 6, b = 5, and a completely-symmetric dependence structure. There are two ways of separating the controls in each block. Using a cross for the test treatments gives:

1345

1451

1512

Iolo IO2O Io3o

Taking a diagonal cross for the test treatments gives:

i040 [o5o I010 i020 1501 II02 1203 r304

If the dependence between horizontal and vertical neighbours is largest and positive, then usually the first design is better for comparing the test treatments. It appears that the second design is A_tc-better if the dependence is not strong, but that the first is better for stronger dependence (this corrects the remark in Martin and Eccleston, 1993). The change-over occurs at ρ_{1,0} ≈ 0.253 for the AR(1).AR(1). In this example, the control could be replaced by four different controls, one in each position. Each comparison between a control and the new treatments would be equally accurate, and comparisons between test treatments would be as accurate as before. The comparisons between the controls would not be equally accurate, but would probably be of little interest.

5.5. Factorial designs

Run orders for factorial designs have previously been considered to minimise the cost of changing levels, and, more recently, for the design to be trend-free. Only very recently have run orders been considered which are efficient under dependence. Cheng and Steinberg (1991) considered the run order of two-level main-effects single-block factorial experiments under one-dimensional dependence. Their reverse-foldover algorithm leads to run orders with a maximal number of level changes. These designs are usually very efficient under positive dependence, but the individual numbers of sign changes can vary markedly. For example, the minimum aberration resolution IV $2^{7-2}$ reverse-foldover design has 183 level changes, distributed as 31, 30, 29, 28, 27, 23, 15. In some cases, designs with the same total number of level changes, but with them spread more equally, can be found. These will be only marginally more efficient under the D- and A-criteria, but may be much more desirable in practice. As examples, consider a $2^4$ design. The maximal number of sign changes is 53. The reverse-foldover algorithm gives

1, abcd, a, bcd, ab, cd, b, acd, bc, ad, abc, d, ac, bd, c, abd;


with individual sign changes of 15, 14, 13, 11, whilst the following design has individual sign changes of 15, 13, 13, 12 and is slightly more efficient (see Table 1 of Cheng and Steinberg, 1991):

1, abcd, a, bcd, ab, cd, abc, d, ac, bd, c, abd, bc, ad, b, acd.

Saunders et al. (1995) show that there is no design with individual sign changes of 14, 13, 13, 13, but that there are two designs with individual sign changes of 14, 14, 13, 12. These are:

1, abcd, a, bcd, ab, cd, abc, d, ac, bd, acd, b, ad, bc, abd, c;

1, abcd, a, bcd, ad, bc, acd, b, cd, ab, d, abc, bd, ac, abd, c.

In the multi-level case with qualitative factors, and interest in main effects (all contrasts of equal interest), the principles of Section 5.3 hold for each factor. Efficient orthogonal run orders which are robust under dependence are those in which each factor is in complete blocks, has good neighbour balance, and has as few self-adjacencies as possible. An example of such a run order for a $5^2$ design (Martin et al., 1995) is:

1, $ab$, $a^2b^2$, $a^3b^3$, $a^4b^4$, $a^2b^3$, $b^4$, $a^4b$, $ab^2$, $a^3$, $ab^4$, $a^3b^2$, $b^3$, $a^4$, $a^2b$, $a^4b^2$, $a^2$, $a^3b^4$, $b$, $ab^3$, $a^3b$, $a^4b^3$, $a$, $a^2b^4$, $b^2$.

Acknowledgement

I would like to thank all those who have helped and stimulated me over the years on experimental design and analysis. I would particularly like to thank Dr. B. J. N. Blight, who introduced me to the problem of design with dependent errors, and Professors R. A. Bailey, J. A. Eccleston, J. Kunert for their help and support.
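The level-change counts quoted in Section 5.5 for the $2^4$ run orders are easy to check by machine. The following is a minimal sketch, not code from any of the cited papers: the helper names `parse_run` and `level_changes` are hypothetical, and the run orders are transcribed so that each of the 16 treatment combinations appears exactly once.

```python
def parse_run(label, factors="abcd"):
    """Map a run label such as 'abd' to 0/1 factor levels ('1' means all factors low)."""
    return tuple(int(f in label) for f in factors)


def level_changes(run_order, factors="abcd"):
    """Count, for each factor, how often its level changes between consecutive runs."""
    levels = [parse_run(label, factors) for label in run_order]
    return [
        sum(prev[j] != curr[j] for prev, curr in zip(levels, levels[1:]))
        for j in range(len(factors))
    ]


# Run orders from Section 5.5 (counts are returned in the factor order a, b, c, d).
reverse_foldover = "1 abcd a bcd ab cd b acd bc ad abc d ac bd c abd".split()
more_equal = "1 abcd a bcd ab cd abc d ac bd c abd bc ad b acd".split()
saunders_1 = "1 abcd a bcd ab cd abc d ac bd acd b ad bc abd c".split()
saunders_2 = "1 abcd a bcd ad bc acd b cd ab d abc bd ac abd c".split()

for name, order in [("reverse foldover", reverse_foldover),
                    ("more equal spread", more_equal),
                    ("Saunders et al. (i)", saunders_1),
                    ("Saunders et al. (ii)", saunders_2)]:
    counts = level_changes(order)
    # Print per-factor counts in decreasing order, followed by the total.
    print(name, sorted(counts, reverse=True), sum(counts))
```

All four orders give a total of 53 level changes; only the spread across factors differs, which is the point made in the text.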

References

Afsarinejad, K. and P. Seeger (1988). Nearest neighbour designs. In: Y. Dodge, V. V. Fedorov and H. P. Wynn, eds., Optimal Design and Analysis of Experiments. North-Holland, Amsterdam, 99-113.
Azaïs, J. M., R. A. Bailey and H. Monod (1993). A catalogue of efficient neighbour-designs with border plots. Biometrics 49, 1252-1261.
Azaïs, J. M., J.-B. Denis, T. Dhorne and A. Kobilinsky (1990). Neighbour analysis of plot experiments: A review of the different approaches. Biom. Prax. 30, 15-39.
Bagchi, S., A. C. Mukhopadhyay and B. K. Sinha (1990). A search for optimal nested row-column designs. Sankhyā Ser. B 52, 93-104.
Bailey, R. A. (1984). Quasi-complete Latin squares: Construction and randomization. J. Roy. Statist. Soc. Ser. B 46, 323-334.
Bailey, R. A. (1986). Randomization, constrained. In: S. Kotz and N. L. Johnson, eds., Encyclopedia of Statistical Sciences, Vol. 7. Wiley, New York, 524-530.
Bailey, R. A. (1987). Restricted randomization: A practical example. J. Amer. Statist. Assoc. 82, 712-719.


Bailey, R. A., J. Kunert and R. J. Martin (1990). Some comments on gerechte designs. I. Analysis for uncorrelated errors. J. Agron. Crop Sci. 165, 121-130.
Bailey, R. A., J. Kunert and R. J. Martin (1991). Some comments on gerechte designs. II. Randomization analysis, and other methods that allow for inter-plot dependence. J. Agron. Crop Sci. 166, 101-111.
Bailey, R. A. and C. A. Rowley (1987). Valid randomization. Proc. Roy. Soc. Lond. Ser. A 410, 105-124.
Bartlett, M. S. (1978). Nearest neighbour models in the analysis of field experiments (with discussion). J. Roy. Statist. Soc. Ser. B 40, 147-174.
Becher, H. (1988). On optimal experimental design under spatial correlation structures for square and nonsquare plot designs. Comm. Statist. Simulation Comput. 17, 771-780.
Besag, J., P. Green, D. Higdon, C. Kooperberg and K. Mengersen (1995). Spatial statistics, image analysis and Bayesian inference (with discussion). Statist. Sci. 10, 3-66.
Besag, J. and D. Higdon (1993). Bayesian inference for agricultural field experiments. Bull. Int. Statist. Inst. 55(1), 121-137.
Besag, J. and R. Kempton (1986). Statistical analysis of field experiments using neighbouring plots. Biometrics 42, 231-251.
Bradley, R. A. and C.-M. Yeh (1988). Trend-free block designs. In: S. Kotz and N. L. Johnson, eds., Encyclopedia of Statistical Sciences, Vol. 9. Wiley, New York, 324-328.
Butcher, J. C. (1956). Treatment variances for experimental designs with serially correlated observations. Biometrika 43, 208-212.
Cheng, C.-S. (1987). An optimization problem with applications to optimal design theory. Ann. Statist. 15, 712-723.
Cheng, C.-S. (1988). A note on the optimality of semibalanced arrays. In: Y. Dodge, V. V. Fedorov and H. P. Wynn, eds., Optimal Design and Analysis of Experiments. North-Holland, Amsterdam, 115-122.
Cheng, C.-S. and D. M. Steinberg (1991). Trend robust two-level factorial designs. Biometrika 78, 325-336.
Cochran, W. G. (1976). Early development of techniques in comparative experimentation. In: D. B. Owen, ed., On the History of Statistics and Probability. Dekker, New York, 1-26.
Correll, R. L. and R. B. Anderson (1983). Removal of intervarietal competition effects in forestry varietal trials. Silvae Genetica 32, 162-165.
Cox, D. R. (1951). Some systematic experimental designs. Biometrika 38, 312-323.
Cox, D. R. (1952). Some recent work on systematic experimental designs. J. Roy. Statist. Soc. Ser. B 14, 211-219.
Cox, G. M. (1950). A survey of types of experimental designs. Biometrics 6, 305-306. Discussion 317-318.
Cressie, N. A. C. (1991). Statistics for Spatial Data. Wiley, New York.
Cullis, B. R. and A. C. Gleeson (1991). Spatial analysis of field experiments - an extension to two dimensions. Biometrics 47, 1449-1460.
Cullis, B. R., W. J. Lill, J. A. Fisher, B. J. Read and A. C. Gleeson (1989). A new procedure for the analysis of early generation variety trials. Appl. Statist. 38, 361-375.
Cutler, D. R. (1993). Efficient block designs for comparing test treatments to a control when the errors are correlated. J. Statist. Plann. Inference 36, 107-125 (and 37 (1993), 393-412).
Dagnelie, P. (1987). La méthode de Papadakis en expérimentation agronomique: Considérations historiques et bibliographiques. Biom. Prax. 27, 49-64.
Dyke, G. V. and C. F. Shelley (1976). Serial designs balanced for effects of neighbours on both sides. J. Agric. Sci. 87, 303-305.
Edmondson, R. N. (1993). Systematic row-and-column designs balanced for low order polynomial interactions between rows and columns. J. Roy. Statist. Soc. Ser. B 55, 707-723.
Federer, W. T. (1955). Experimental Design - Theory and Applications. Macmillan, New York.
Federer, W. T. and C. S. Schlottfeldt (1954). The use of covariance to control gradients in experiments. Biometrics 10, 282-290.
Fedorov, V. V. (1996). Design of spatial experiments. In: S. Ghosh and C. R. Rao, eds., Handbook of Statistics, Vol. 13, Design and Analysis of Experiments. North-Holland, Amsterdam, 515-553.
Fisher, R. A. (1932). Statistical Methods for Research Workers, 4th edn. Oliver and Boyd, Edinburgh.
Fisher, R. A. (1950). Statistical Methods for Research Workers, 11th edn. Oliver and Boyd, Edinburgh.
Fisher, R. A. and F. Yates (1963). Statistical Tables for Biological, Agricultural and Medical Research, 6th edn. Oliver and Boyd, Edinburgh.


Freeman, G. H. (1979). Complete Latin squares and related experimental designs. J. Roy. Statist. Soc. Ser. B 41, 253-262.
Freeman, G. H. (1981). Further results on quasi-complete Latin squares. J. Roy. Statist. Soc. Ser. B 43, 314-320.
Freeman, G. H. (1988). Systematic designs. In: S. Kotz and N. L. Johnson, eds., Encyclopedia of Statistical Sciences, Vol. 8. Wiley, New York, 143-147.
Gill, P. S. (1991). A bibliography of nearest neighbour methods in design and analysis. Biom. J. 4, 455-459.
Gleeson, A. C. and B. R. Cullis (1987). Residual maximum likelihood (REML) estimation of a neighbour model for field experiments. Biometrics 43, 277-287.
Hell, P. and A. Rosa (1972). Graph decomposition, handcuffed prisoners and balanced P-designs. Discrete Math. 2, 229-252.
Ipinyomi, R. A. (1986). Equineighboured experimental designs. Austral. J. Statist. 28, 79-88.
Kempthorne, O. (1986). Randomization-II. In: S. Kotz and N. L. Johnson, eds., Encyclopedia of Statistical Sciences, Vol. 7. Wiley, New York, 519-524.
Kiefer, J. (1961). Optimum experimental designs, V, with applications to systematic and rotatable designs. In: J. Neyman, ed., Proc. 4th Berkeley Symposium, Vol. 1. Univ. of California Press, Berkeley, CA, 381-405.
Kiefer, J. (1975). Construction and optimality of generalized Youden designs. In: J. N. Srivastava, ed., A Survey of Statistical Design and Linear Models. North-Holland, Amsterdam, 333-353.
Kiefer, J. and H. P. Wynn (1981). Optimum balanced block and Latin square designs for correlated observations. Ann. Statist. 9, 737-757.
Kiefer, J. and H. P. Wynn (1984). Optimum and minimax exact treatment designs for one-dimensional autoregressive processes. Ann. Statist. 12, 431-450.
Kunert, J. (1985). Optimal repeated measurements designs for correlated observations and analysis by weighted least squares. Biometrika 72, 375-389.
Kunert, J. (1987a). Neighbour balanced block designs for correlated errors. Biometrika 74, 717-724.
Kunert, J. (1987b). Recent results on optimal designs for correlated observations. Arbeitsberichte, Universität Trier.
Kunert, J. (1988). Considerations on optimal design for correlations in the plane. In: Y. Dodge, V. V. Fedorov and H. P. Wynn, eds., Optimal Design and Analysis of Experiments. North-Holland, Amsterdam, 123-131.
Kunert, J. and R. J. Martin (1987a). Some results on optimal design under a first-order autoregression and on finite Williams' type II designs. Comm. Statist. Theory Methods 16, 1901-1922.
Kunert, J. and R. J. Martin (1987b). On the optimality of finite Williams II(a) designs. Ann. Statist. 15, 1604-1628.
Lin, M. and A. M. Dean (1991). Trend-free block designs for varietal and factorial experiments. Ann. Statist. 19, 1582-1596.
Martin, R. J. (1982). Some aspects of experimental design and analysis when errors are correlated. Biometrika 69, 597-612.
Martin, R. J. (1985). Papadakis method. In: S. Kotz and N. L. Johnson, eds., Encyclopedia of Statistical Sciences, Vol. 6. Wiley, New York, 564-568.
Martin, R. J. (1986). On the design of experiments under spatial correlation. Biometrika 73, 247-277. Correction (1988) 75, 396.
Martin, R. J. (1990a). Some results on the ~p-Value when errors are correlated. Comput. Statist. Data Anal. 9, 113-121.
Martin, R. J. (1990b). The use of time series models and methods in the analysis of agricultural field trials. Comm. Statist. Theory Methods 19, 55-81.
Martin, R. J. (1996a). Low-order spatially distinct Latin squares. Comm. Statist. Theory Methods, to appear.
Martin, R. J. (1996b). Optimal small-sized block designs under dependence. Preprint.
Martin, R. J. and J. A. Eccleston (1991). Optimal incomplete block designs for general dependence structures. J. Statist. Plann. Inference 28, 67-81.
Martin, R. J. and J. A. Eccleston (1993). Incomplete block designs with spatial layouts when observations are dependent. J. Statist. Plann. Inference 35, 77-91.
Martin, R. J. and J. A. Eccleston (1996). Construction of optimal and near-optimal designs for dependent observations using simulated annealing. Preprint.


Martin, R. J., J. A. Eccleston and A. C. Gleeson (1993). Robust designs when observations within a block are correlated. J. Statist. Plann. Inference 34, 433-450.
Martin, R. J., J. A. Eccleston and G. Jones (1995). Some results on multi-level factorial designs with dependent observations. Preprint.
Morgan, J. P. (1988a). Terrace constructions for bordered, two-dimensional neighbor designs. Ars Combinatoria 26, 123-140.
Morgan, J. P. (1988b). Balanced polycross designs. J. Roy. Statist. Soc. Ser. B 50, 93-104.
Morgan, J. P. and I. M. Chakravarti (1988). Block designs for first and second order neighbor correlations. Ann. Statist. 16, 1206-1224.
Morgan, J. P. and N. Uddin (1991). Two-dimensional design for correlated errors. Ann. Statist. 19, 2160-2182.
Morris, M. D. and T. J. Mitchell (1995). Exploratory designs for computational experiments. J. Statist. Plann. Inference 43, 381-402.
Nair, C. R. (1967). Sequences balanced for pairs of residual effects. J. Amer. Statist. Assoc. 62, 205-225.
Pearce, S. C. (1983). The Agricultural Field Experiment. Wiley, Chichester.
Pearson, E. S. (1938). "Student" as statistician. Biometrika 30, 210-250.
Pithuncharurnlap, M., K. E. Basford and W. T. Federer (1993). Neighbour analysis with adjustment for interplot competition. Austral. J. Statist. 35, 263-270.
Rao, C. R. (1961). Combinatorial arrangements analogous to orthogonal arrays. Sankhyā A 23, 283-286.
Reddy, M. N. and C. K. R. Chetty (1982). Effect of plot shape on variability in Smith's variance law. Experimental Agriculture 18, 333-338.
Russell, K. G. and J. Eccleston (1987a). The construction of optimal balanced incomplete block designs when adjacent observations are correlated. Austral. J. Statist. 29, 84-90.
Russell, K. G. and J. Eccleston (1987b). The construction of optimal incomplete block designs when observations within a block are correlated. Austral. J. Statist. 29, 293-302.
Saunders, I. W., J. A. Eccleston and R. J. Martin (1995). An algorithm for the design of $2^p$ factorial experiments on continuous processes. Austral. J. Statist. 37, 353-365.
Shah, K. R. and B. K. Sinha (1989). Theory of Optimal Designs. Springer, New York.
Steinberg, D. M. (1988). Factorial experiments with time trends. Technometrics 30, 259-269.
Street, A. P. and D. J. Street (1987). Combinatorics of Experimental Design. Clarendon Press, Oxford.
Street, D. J. (1992). A note on strongly equineighboured designs. J. Statist. Plann. Inference 30, 99-105.
Street, D. J. (1996). Block and other designs used in agriculture. In: S. Ghosh and C. R. Rao, eds., Handbook of Statistics, Vol. 13, Design and Analysis of Experiments. North-Holland, Amsterdam, 759-808.
Taplin, R. and A. E. Raftery (1994). Analysis of agricultural field trials in the presence of outliers and fertility jumps. Biometrics 50, 764-781.
Uddin, N. and J. P. Morgan (1991). Optimal and near optimal sets of Latin squares for correlated errors. J. Statist. Plann. Inference 29, 279-290.
Uddin, N. and J. P. Morgan (1996). Universally optimal two-dimensional block designs for correlated observations. Preprint.
Wild, P. R. and E. R. Williams (1987). The construction of neighbour designs. Biometrika 74, 871-876.
Wilkinson, G. N., S. R. Eckert, T. W. Hancock and O. Mayo (1983). Nearest neighbour (NN) analysis of field experiments (with discussion). J. Roy. Statist. Soc. Ser. B 45, 151-211.
Williams, R. M. (1952). Experimental designs for serially correlated observations. Biometrika 39, 151-167.
Yates, F. (1948). Contribution to the discussion of a paper by F. J. Anscombe. J. Roy. Statist. Soc. Ser. A 111, 204-205.
Yates, F. (1967). A fresh look at the basic principles of the design and analysis of experiments. In: L. M. Le Cam and J. Neyman, eds., Proc. 5th Berkeley Symposium, Vol. 4. Univ. of California Press, Berkeley, CA, 777-790.
Zimmerman, D. L. and D. A. Harville (1991). A random field approach to the analysis of field-plot experiments and other spatial experiments. Biometrics 47, 223-239.

S. Ghosh and C. R. Rao, eds., Handbook of Statistics, Vol. 13 1996 Elsevier Science B.V.


Design of Spatial Experiments: Model Fitting and Prediction*

Valerii Fedorov 1

1. Introduction

Since the earliest days of the experimental design theory, a number of concepts like split plots, strips, blocks, Latin squares, etc. (see Fisher, 1947), were strongly related to experiments with spatially distributed or allocated treatments and observations. In this survey we confine ourselves to what can be considered as an intersection of ideas developed in the areas of response surface design of experiments and spatial statistics. The results which we are going to consider are also related to the results developed by Cambanis (1985), Cambanis and Su (1993), Matern (1986), Micchelli and Wahba (1981), Sacks and Ylvisaker (1966, 1968, 1970) and Ylvisaker (1975, 1987).

What differs in the approach of this paper from those cited? We intend to use the techniques which are based on the concept of regression models while the cited studies are based on the ideas developed in the theory of stochastic processes and the theory of integral approximation. If this survey were to be written for a very applied audience, the title "optimal allocations of sensors" or "optimal allocation of observing stations" could be more appropriate. Environmental monitoring, meteorology, surveillance, some industrial experiments and seismology are the most typical areas in which the considered results may be applied.

What are the most common features of the experiments to be discussed?

1. There are variables $x \in X \subset R^k$, which can be controlled. Usually $k = 2$, and in the observing station problem, $x_1$ and $x_2$ are coordinates of stations and $X$ is a region where those stations may be allocated.
2. There exists a model describing the observed response(s) or dependent variable(s) $y$. More specifically $y$ and $x$ are linked together by a model, which may contain some stochastic components.
3. An experimenter or a practitioner can formulate the quantitative objective function.
4. Once a station or a sensor is allocated a response $y$ can be observed either continuously or according to any given time schedule without any additional significant expense.
5. Observations made at different sites may be correlated.

Assumptions 1-5 are very loosely formulated and they will be justified when needed. In the subsequent sections the term "sensor" stands for what could be an observing station, meteorological station, radiosonde or well in the particular applied problem.

*Research sponsored by the Applied Mathematical Sciences Research Program, Office of Energy Research, U.S. Department of Energy under contract number DE-AC05-96OR22464 with Lockheed Martin Energy Systems Corporation.

¹This submitted manuscript has been authored by a contractor of the U.S. Government under contract No. DE-AC05-96OR22464. Accordingly, the U.S. Government retains a nonexclusive, royalty-free license to publish or reproduce the published form of this contribution, or allow others to do so, for U.S. Government purposes.

2. Standard design problem

In what follows we will mostly refer to experiments which are typical in an environmental monitoring setting as a background for the exposition of the main results. We hope that the reader will be able to apply the ideas and techniques to other types of experiments. When Assumptions 4 and 5 are not considered we have what will be called the "standard design problem". The problem was extensively discussed (see for instance, Atkinson and Donev, 1992; Fedorov, 1972; Pazman, 1986; Pukelsheim, 1994; Silvey, 1980), and it is difficult to add anything new in this area of experimental design theory. Theorem 1, which follows, is a generalized version of the Kiefer-Wolfowitz equivalence theorem (see Kiefer, 1959) and is stated here for the reader's convenience. It also serves as an opportunity to introduce the notation, which is sometimes different from that used in other articles of this volume. Let

$$ y_{ij} = \eta(x_i, \theta) + \varepsilon_{ij}, \quad i = 1, \ldots, n, \quad j = 1, \ldots, r_i, \quad \sum_{i=1}^{n} r_i = N, \tag{2.1} $$

$$ \eta(x, \theta) = \theta^T f(x), $$

where $\theta \in R^m$ are unknown parameters, $f^T(x) = (f_1(x), \ldots, f_m(x))$ are given functions, supporting points $x_i$ are chosen from some set $X$, and the $\varepsilon_{ij}$ are uncorrelated random errors with zero means and variances equal to one. We do not make distinctions in notation for random variables and their realizations when it is not confusing. For the best linear unbiased estimator of unknown parameters the accumulated "precision" is described by the information matrix:

$$ M(\xi) = \sum_{i=1}^{n} p_i f(x_i) f^T(x_i), \quad p_i = r_i / N, \tag{2.2} $$


which is completely defined by the design $\xi = \{x_i, p_i\}_1^n$. In the context of the standard design theory

$$ M(\xi) = \int_X m(x)\, \xi(dx), \tag{2.3} $$

where $\xi(dx)$ is a probability measure with the supporting set belonging to $X$: $\operatorname{supp} \xi \subset X$, and

$$ m(x) = f(x) f^T(x) $$

is the information matrix of an observation made at point $x$. Regression model (2.1) and the subsequent comments do satisfy Assumptions 1 and 2 from the previous section. To be consistent with Assumption 3 let us introduce a function $\Psi(M)$, which is called the "criterion of optimality" in experimental design literature. A design

$$ \xi^* = \arg\min_{\xi} \Psi[M(\xi)] \tag{2.4} $$

is called ($\Psi$-)optimal. Minimization must be over the set of all possible probability measures $\xi$ with supporting sets belonging to $X$. Now let us assume that:

(a) $X$ is compact;
(b) $f(x)$ are continuous functions in $X$, $f \in R^m$;
(c) $\Psi(M)$ is a convex function and [...] i.e., matrices $M$ and $A$ are nonnegative definite;
(d) there exists a real number $q$ such that [...] where $r(\alpha, \xi, \bar{\xi}) = o(\alpha)$.

Here and in what follows we use $\Psi(\xi)$ for $\Psi[M(\xi)]$, $\Psi^*$ for $\Psi(\xi^*)$, and $\min_x$, $\min_\xi$, $\int$, and so on, instead of $\min_{x \in X}$, $\min_{\xi \in \Xi}$, $\int_X$, respectively, if it does not lead to ambiguity.

THEOREM 1. If (a)-(e) hold, then
(1) For any optimal design there exists a design with the same information matrix which contains no more than $n = m(m + 1)/2$ supporting points.


(2) A necessary and sufficient condition for a design $\xi^*$ to be optimal is fulfillment of the inequality $\min_x \psi(x, \xi^*) \ge 0$.
(3) The set of optimal designs is convex.
(4) $\psi(x, \xi^*)$ achieves zero almost everywhere in $\operatorname{supp} \xi^*$, where $\operatorname{supp} \xi$ stands for the supporting set of the design (measure) $\xi$.

Functions $\psi(x, \xi)$ for the most popular criteria of optimality may be found, for instance, in Atkinson and Fedorov (1984). Theorem 1 provides a starting point for analytical exercises with various relatively simple regression problems and makes possible the development of a number of simple numerical procedures for the optimal design construction in more complicated and more realistic situations. Most of these procedures are based on the following iterative scheme:

• (a) There is a design $\xi_s \in \Xi(q)$. Find

$$ x_s = \arg\min \{\psi(x^+, \xi_s),\ -\psi(x^-, \xi_s)\}, \tag{2.5} $$

$$ x^+ = \arg\min_{x \in X} \psi(x, \xi_s), \qquad x^- = \arg\max_{x \in X_s} \psi(x, \xi_s), $$

where Xs = supp ~8. • (b) Choose 0 0 and any ~N the rate of convergence for either QI(~N) or for Q2(~N) will not be generally better than O(N-1). This is slower than for any continuous covariance kernel. Thus, for large N the Sacks-Ylvisaker approach and results from Sections 5 and 6 may lead to the different asymptotically optimal designs. If one believes that there is no instrument or any other observation error, then the Sacks-Ylvisaker approach leads to the better limit designs. When the contribution of observation errors is significant then approximation (7.8) becomes very realistic and allows the use of methods from Sections 5 and 6, which usually produce optimal designs with very moderate numbers of supporting points. Usually these designs have about n supporting points. The existence of well developed numerical procedures and software allows the construction of optimal designs for any reasonable covariance function V(x, x r) and various design regions X, including two and three dimension cases. Let us notice that the function Cov(x, x r I 4) used in Theorem 4 may be presented in the following form: cr2N-1Cov(x, x t I ~ )

:

- yf(

, ¢)

+ y.(¢))-'

(7.10)

P = 5ijri, ri = piN. This formula is convenient for some theoretical exercises. For more applied objectives and for development of numerical algorithms based on the iterative procedures from Sections 2 and 3, the direct use of eigenfunctions f~(x) is more convenient.
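The iterative procedures from Section 2 mentioned here all revolve around two computations: forming the information matrix $M(\xi) = \sum_i p_i f(x_i) f^T(x_i)$ and evaluating the sensitivity function $\psi(x, \xi)$. The following is a minimal sketch under stated assumptions, not the author's software: it uses the D-criterion, for which the standard sensitivity function is $\psi(x, \xi) = m - f^T(x) M^{-1}(\xi) f(x)$, and a textbook quadratic regression on $[-1, 1]$ that does not come from this chapter. The function names are hypothetical.

```python
def info_matrix(points, weights, f):
    """M(xi) = sum_i p_i f(x_i) f(x_i)^T for a design xi = {x_i, p_i}."""
    m = len(f(points[0]))
    M = [[0.0] * m for _ in range(m)]
    for x, p in zip(points, weights):
        fx = f(x)
        for a in range(m):
            for b in range(m):
                M[a][b] += p * fx[a] * fx[b]
    return M


def solve(M, v):
    """Solve M z = v by Gaussian elimination with partial pivoting."""
    n = len(v)
    A = [row[:] + [v[i]] for i, row in enumerate(M)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(A[r][c]))
        A[c], A[piv] = A[piv], A[c]
        for r in range(c + 1, n):
            factor = A[r][c] / A[c][c]
            for k in range(c, n + 1):
                A[r][k] -= factor * A[c][k]
    z = [0.0] * n
    for r in reversed(range(n)):
        z[r] = (A[r][n] - sum(A[r][k] * z[k] for k in range(r + 1, n))) / A[r][r]
    return z


def psi_D(x, M, f):
    """Assumed D-criterion sensitivity: psi(x, xi) = m - f(x)^T M^{-1} f(x)."""
    fx = f(x)
    z = solve(M, fx)
    return len(fx) - sum(a * b for a, b in zip(fx, z))


def quadratic(x):
    """Basis functions f(x) = (1, x, x^2); an illustrative choice."""
    return [1.0, x, x * x]


# Candidate design: equal weights at -1, 0, 1.
M = info_matrix([-1.0, 0.0, 1.0], [1 / 3, 1 / 3, 1 / 3], quadratic)
grid = [-1 + 0.01 * i for i in range(201)]
min_psi = min(psi_D(x, M, quadratic) for x in grid)
print("min of psi over the grid:", min_psi)
```

For this design $\psi(x, \xi) = 4.5\,x^2(1 - x^2)$, which is nonnegative on $[-1, 1]$ and vanishes at the three support points, so the design passes the optimality check of Theorem 1 (parts 2 and 4) for the D-criterion.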

Popular kernels.

There are several show-case processes and design regions for which analytic expressions for the covariance kernel exist, and the corresponding eigenvalues and eigenfunctions are known: For the Brownian motion the kernel is

V(x,x')=min(x,x'),

0 ≤ [...]

[...] $\ge \delta^*$, where $\delta^*$ is a positive constant. Here $\delta^*$ and $P^*$ are specified in advance by the experimenter, and $P^*$ is chosen greater than $1/k$ because otherwise we can make a no-data decision by selecting one of the treatments randomly as the best. When $\mu_{[k]} - \mu_{[k-1]} < \delta^*$, two or more treatments including the best are sufficiently close and the experimenter is assumed to be indifferent as to setting a probability requirement in this case. The region $\Omega_{\delta^*} = \{(\mu, \sigma^2) \mid \mu = (\mu_1, \ldots, \mu_k),\ \sigma^2 > 0,\ \mu_{[k]} - \mu_{[k-1]} \ge \delta^*\}$ is called the preference-zone and its complement w.r.t. the entire parameter space $\Omega = \{(\mu, \sigma^2) \mid -\infty < \mu_i < \infty,\ i = 1, \ldots, k,\ \sigma^2 > 0\}$ is the indifference-zone. Denoting the probability of a correct selection (PCS) using the rule $R$ by $P(CS \mid R)$, it is required that any valid rule $R$ satisfy the condition:

$$ P(CS \mid R) \ge P^* \quad \text{whenever } (\mu, \sigma^2) \in \Omega_{\delta^*}. \tag{1.2} $$

The design aspect of this basic setup is the determination of the minimum (common) sample size $n$ so that the probability requirement (1.2) is satisfied. In the subset selection approach developed by Gupta (1956), the goal is to select a nonempty subset of the $k$ treatments so that the best treatment will be included in the selected subset with a guaranteed minimum probability $P^*$. The subset size is not specified in advance; it is random and determined by the data. Formally, any valid rule should satisfy the condition:

$$ P(CS \mid R) \ge P^* \quad \text{for all } (\mu, \sigma^2) \in \Omega. \tag{1.3} $$

It is obvious that the requirement (1.3) can always be met by including all the treatments in the selected subset. So the performance of a rule is studied usually in terms of the expected size of the selected subset. It is expected that a reasonable procedure will tend to select only one treatment when $\mu_{[k]} - \mu_{[k-1]}$ gets large. Besides being a goal in itself, selecting a subset containing the best can also be considered as the first-stage screening in a two-stage procedure designed to select one treatment as the best; see, for example, Tamhane and Bechhofer (1977, 1979). The probability requirements (1.2) and (1.3) are also known as the $P^*$-conditions. An important step in obtaining the constant(s) associated with a proposed rule $R$ so that the $P^*$-condition is satisfied is to evaluate the infimum of the PCS over $\Omega$ or $\Omega_{\delta^*}$, depending on the approach. Any configuration of $(\mu, \sigma^2)$ for which the infimum is attained is called a least favorable configuration (LFC). Although we have discussed the selection problem in terms of the normal means, the problem in general is to select from $k$ populations $\Pi_1, \ldots, \Pi_k$ characterized by

Design of experiments with selection and ranking goals


the distribution functions $F_{\theta_i}$, $i = 1, \ldots, k$, respectively, where the $\theta_i$ are unknown parameters taking values in the set $\Theta$. The populations are ranked in terms of the $\theta_i$ (there may be other nuisance parameters). The ordered $\theta_i$ are denoted by $\theta_{[1]} \le \cdots$ [...] $\ge \delta^* > 0\}$.

There are several variations and generalizations of the basic goal in both indifference-zone and subset selection formulations. One can generalize the goal to select at least $s$ of the $t$ best populations with $1 \le s \le t \le k - 1$. In the subset selection approach, the size of the selected subset can be random subject to a specified maximum $m$ ($1 \le m \le k$). This approach of restricted subset selection studied by Gupta and Santner (1973) and Santner (1975) combines the features of the indifference-zone and subset selection formulations. There have been other attempts in this direction of integrated formulations; for example, see Sobel (1969), Chen and Sobel (1987a, b). An important modification of the goal of selecting the best population is selecting a good population or a subset containing only good populations or containing all the good populations. A good population is defined as one which is close enough to the best within a specified threshold value.

There is now a vast literature on selection and ranking procedures. Several aspects of the theory and associated methodology of these and related procedures have been dealt with in the books by Bechhofer et al. (1968), Büringer et al. (1980), Gibbons et al. (1977), Gupta and Huang (1981), Gupta and Panchapakesan (1979) and Mukhopadhyay and Solanky (1994). A very recent book is by Bechhofer et al. (1995). A categorical bibliography is provided by Dudewicz and Koo (1982). Besides these books, there have been published review articles dealing with several specific aspects of selection and ranking. The reader is especially referred to Gupta and Panchapakesan (1985, 1988, 1991, 1993), Panchapakesan (1992, 1995a, b), and Van der Laan and Verdooren (1989).

In spite of the vast published literature, there have been only a few papers until recent years devoted to design models beyond single-factor experiments. In this paper, besides the most common single-factor experiments involving mainly normal distributions, we review significant results involving blocking and factorial designs. The emphasis is not on a total coverage but enough to provide a focus on these problems to help assess the current status and potential for applications and further investigations. Sections 2 through 5 discuss selection procedures and simultaneous confidence intervals under the assumption that the observed responses for treatments are normally distributed. Selection procedures under both the indifference-zone and the subset formulations are discussed in each section. Section 2 deals with selecting the best treatment and simultaneous confidence statements for comparisons with the best in single-factor experiments. Section 3 considers these procedures in experiments with blocking while Section 4 deals with these procedures in factorial experiments. Selection with respect to a standard or control is discussed in Section 5. Selection in experiments involving other models is briefly discussed in Section 6. The models discussed are: Bernoulli, multinomial and restricted families such as IFR and IFRA.


S. S. Gupta and S. Panchapakesan

2. Selecting the best treatment in single-factor experiments: Normal theory

Consider $k \ge 2$ treatments $\Pi_1, \ldots, \Pi_k$, where $\Pi_i$ represents a normal population with mean $\mu_i$ and variance $\sigma_i^2$. The means $\mu_i$ are unknown. Different assumptions can be made about the $\sigma_i^2$ depending on the context of the experiment. As before, the ordered $\mu_i$ are denoted by $\mu_{[1]} \le \cdots \le \mu_{[k]}$ and no prior information is available regarding the true pairing of the ordered and unordered $\mu_i$.

2.1. Indifference-zone approach

As described in Section 1, the goal is to identify one of the $k$ treatments as the best (the one associated with $\mu_{[k]}$) with a guaranteed minimum probability $P^*$ of a correct selection whenever $\mu_{[k]} - \mu_{[k-1]} \ge \delta^*$, where $\delta^* > 0$ and $1/k < P^* < 1$ are specified in advance. Under the assumption that $\sigma_1^2 = \cdots = \sigma_k^2 = \sigma^2$ (known), Bechhofer (1954) proposed the following single-stage procedure based on samples of common size $n$. Let $\bar{Y}_i$ denote the mean of the sample responses $Y_{ij}$, $j = 1, \ldots, n$, from $\Pi_i$, $i = 1, \ldots, k$. His rule is

$R_1$: Select the treatment $\Pi_i$ that yields the largest $\bar{Y}_i$.   (2.1)

The LFC for this rule is given by $\mu_{[1]} = \cdots = \mu_{[k-1]} = \mu_{[k]} - \delta^*$. For given $(k, \delta^*/\sigma, P^*)$, the minimum sample size $n$ required to meet the $P^*$-condition is given by

$$n = \left\langle 2\left(\frac{\sigma H}{\delta^*}\right)^2 \right\rangle, \tag{2.2}$$

where $\langle x \rangle$ denotes the smallest integer $\ge x$, $H$ satisfies

$$\Pr\{Z_1 \le H, \ldots, Z_{k-1} \le H\} = P^*, \tag{2.3}$$

and the $Z_i$ are standard normal variates with equal correlation $\rho = 1/2$. Values of $H$ can be obtained for several selected values of $k$ and $P^*$ from Bechhofer (1954), Gibbons et al. (1977), Gupta (1963), Gupta et al. (1973) and Milton (1963). Hall (1959) and Eaton (1967) have shown that the rule $R_1$ in (2.1) is the most economical in the sense of requiring the fewest observations per treatment among all single-stage location-invariant procedures satisfying the $P^*$-condition.

Two-stage procedures for the problem of selecting the normal treatment with the largest mean, assuming a common known variance $\sigma^2$, have been studied by Cohen (1959), Alam (1970) and Tamhane and Bechhofer (1977, 1979). These procedures use the subset selection procedure of Gupta (1956, 1965) to eliminate inferior treatments at the first stage and select the best from among the remaining ones at the second stage. We describe the Tamhane-Bechhofer procedure $R_2$ below.
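Before the description of the two-stage procedure, the single-stage constants in (2.2) and (2.3) can be illustrated numerically. The following is a minimal sketch, not the method used to build the published tables: it assumes the representation $Z_i = (W_i - W_0)/\sqrt{2}$ with iid standard normal $W_0, \ldots, W_{k-1}$ (which gives correlation 1/2), evaluates the resulting one-dimensional integral by Simpson's rule, and solves for $H$ by bisection. The function names, integration limits and step counts are illustrative choices.

```python
from math import ceil, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, sd 1


def equicoordinate_prob(h, m, steps=2000, limit=8.0):
    """Pr{Z_1 <= h, ..., Z_m <= h} for standard normals with common correlation 1/2.

    With Z_i = (W_i - W_0)/sqrt(2) and W_0, ..., W_m iid N(0, 1), the probability
    equals the integral of Phi(w + sqrt(2) h)^m phi(w) dw, evaluated here by
    composite Simpson's rule on [-limit, limit] (steps must be even).
    """
    dx = 2.0 * limit / steps
    total = 0.0
    for i in range(steps + 1):
        w = -limit + i * dx
        weight = 1 if i in (0, steps) else (4 if i % 2 else 2)
        total += weight * STD_NORMAL.cdf(w + sqrt(2.0) * h) ** m * STD_NORMAL.pdf(w)
    return total * dx / 3.0


def solve_H(k, p_star, tol=1e-9):
    """Solve (2.3) for H by bisection; assumes 1/k < p_star < 1."""
    lo, hi = 0.0, 10.0  # at h = 0 the probability is 1/k, which is below p_star
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if equicoordinate_prob(mid, k - 1) < p_star:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def sample_size(k, delta_over_sigma, p_star):
    """Smallest common n from (2.2): n = <2 (sigma H / delta*)^2>."""
    H = solve_H(k, p_star)
    return ceil(2.0 * (H / delta_over_sigma) ** 2)
```

For $k = 2$ the constant $H$ reduces to the ordinary upper normal quantile, so `solve_H(2, 0.95)` is close to 1.645 and `sample_size(2, 0.5, 0.95)` gives 22 observations per treatment.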

Design of experiments with selection and ranking goals


$R_2$: Take a random sample of $n_1$ observations from each $\Pi_i$, $i = 1, \ldots, k$. Eliminate from further consideration all treatments $\Pi_i$ for which $\bar{Y}_i < \bar{Y}_{[k]} - h\sigma/\sqrt{n_1}$, where $\bar{Y}_{[1]} \le \cdots \le \bar{Y}_{[k]}$ are the ordered sample means $\bar{Y}_i$, and $h$ is a constant to be determined. If only one treatment remains, then it is selected as the best. If more than one treatment remains, then proceed to the second stage by taking an additional random sample of size $n_2$ from each of the remaining treatments. Select the treatment that yields the largest sample mean based on the combined sample of $n_1 + n_2$ observations.

The above procedure $R_2$ of Tamhane and Bechhofer (1977, 1979) involves constants $(n_1, n_2, h)$ to be determined in order to satisfy the $P^*$-condition. These constants are determined by using a minimax criterion (in addition to the $P^*$-condition) which minimizes the maximum over the entire parameter space $\Omega$ of the expected total sample size required by the procedure. The LFC for this procedure was first established only for $k = 2$. The constants $(n_1, n_2, h)$ tabulated by Tamhane and Bechhofer (1979) for selected values of $k$, $P^*$, and $\delta^*/\sigma$ are conservative since they are based on the LFC for a lower bound of the PCS. The fact that the LFC for the PCS is $\mu_{[1]} = \cdots = \mu_{[k-1]} = \mu_{[k]} - \delta^*$ was proved by Sehr (1988) and Bhandari and Chaudhuri (1990).

A truncated sequential procedure for this problem has been investigated by Bechhofer and Goldsman (1987, 1989). It is designed to have improved performance over an earlier procedure of Bechhofer et al. (1968), which is an open non-eliminating sequential procedure, whereas the Bechhofer-Goldsman procedure is closed but also non-eliminating. A multi-stage or sequential procedure is called open if, in advance of the experiment, no fixed upper bound is set on the number of observations to be taken from each treatment; otherwise, it is called closed.
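Returning to the two-stage rule $R_2$ described above, its screening and selection steps can be sketched as follows. This is only an illustration under stated assumptions: the constants `h` and `n2` would in practice come from the Tamhane-Bechhofer tables, and `draw_more` is a hypothetical callback standing in for whatever mechanism supplies second-stage observations.

```python
from math import sqrt
from statistics import fmean


def two_stage_select(stage1, sigma, h, draw_more, n2):
    """Sketch of the two-stage rule R2 with known common sigma.

    stage1    : list of k first-stage samples, each a list of n1 observations
    h         : screening constant (taken from tables in practice)
    draw_more : hypothetical callback draw_more(i, n2) returning n2 new
                observations from treatment i
    Returns (index of the selected treatment, indices retained after stage 1).
    """
    n1 = len(stage1[0])
    means1 = [fmean(sample) for sample in stage1]
    # Stage 1: drop every treatment whose mean falls below the screening cutoff.
    cutoff = max(means1) - h * sigma / sqrt(n1)
    retained = [i for i, m in enumerate(means1) if m >= cutoff]
    if len(retained) == 1:
        return retained[0], retained
    # Stage 2: n2 further observations from each retained treatment, then
    # compare the means of the combined n1 + n2 observations.
    combined = {i: fmean(stage1[i] + list(draw_more(i, n2))) for i in retained}
    best = max(retained, key=lambda i: combined[i])
    return best, retained
```

Because eliminated treatments are never passed to `draw_more`, the sketch reflects the eliminating character of the procedure: sampling stops for any treatment screened out at the first stage.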
An eliminating procedure is one which excludes treatments from further sampling if they are removed from further consideration at any stage prior to taking the terminal decision. A non-eliminating procedure, on the other hand, samples from each treatment at each stage whether or not any treatment is removed from the final consideration of selection. Another well-known procedure for the problem under discussion is that of Paulson (1964), which is a closed procedure with elimination. This procedure was successively improved (by changing the choices for certain constants) by Fabian (1974) and Hartman (1988). For further details regarding various procedures and their performance, see Gupta and Panchapakesan (1991).

We now consider the case of unknown common variance $\sigma^2$. This is the classical problem of the one-way ANOVA model. If one chooses to define the preference-zone as $\Omega'_{\delta^*} = \{(\mu, \sigma) \mid \mu_{[k]} - \mu_{[k-1]} \ge \delta^*\sigma\}$, then the single-stage procedure $R_1$ in (2.1) can still be used with the minimum required sample size $n$ given by (2.2) with $\sigma = 1$. If we continue with the preference-zone $\Omega_{\delta^*} = \{(\mu, \sigma) \mid \mu_{[k]} - \mu_{[k-1]} \ge \delta^*\}$ as before, it is not possible to devise a single-stage procedure that satisfies the $P^*$-condition. This is intuitively clear from the fact that the determination of the minimum sample size required depends on the knowledge of $\sigma$. In this case of unknown $\sigma^2$, Bechhofer et al. (1954) proposed an open two-stage non-eliminating procedure for selecting the best treatment. The first stage is used to estimate $\sigma^2$ and determine the total sample size needed to guarantee the probability requirement. The second stage, if necessary, is used to make the terminal decision. Using in addition the idea of screening, Tamhane


(1976) and Hochberg and Marcus (1981) have studied three-stage procedures where the first stage is utilized to determine the additional sample sizes necessary in the subsequent stages, the second stage is used to eliminate inferior populations by a subset rule, and the third stage (if necessary) is used to make the final decision. Tamhane (1976) also considered a two-stage eliminating procedure which was found to be inferior to the non-eliminating procedure of Bechhofer et al. (1954). Later, Gupta and Kim (1984) proposed a two-stage eliminating procedure with a new design criterion and obtained a sharp lower bound on the PCS. Gupta and Miescke (1984) studied two-stage eliminating procedures using a Bayes approach. Here we will describe the procedure of Gupta and Kim (1984).

$R_3$: Take a random sample of size $n_1$ ($\ge 2$) from each treatment. Let $\bar{X}_i$ be the sample mean associated with treatment $\Pi_i$, $i = 1, \ldots, k$, and let $S_\nu^2$ denote the usual pooled sample variance based on $\nu = k(n_1 - 1)$ degrees of freedom. Determine the subset $I$ of $\{\Pi_1, \ldots, \Pi_k\}$ given by $I =$

$$\left\{\Pi_i : \bar{X}_i \ge \bar{X}_{[k]} - \left(dS_\nu/\sqrt{n_1} - \delta^*\right)^+\right\},$$

where $a^+ = \max(a, 0)$ and $d$ is a constant to be chosen to satisfy the $P^*$-condition. If $I$ consists of only one treatment, then select it as the best; otherwise, take an additional sample of size $N - n_1$ from each treatment in $I$, where $N = \max\{n_1, \langle (hS_\nu/\delta^*)^2 \rangle\}$, $\langle y \rangle$ denotes the smallest integer $\ge y$, and $h$ is a positive constant to be suitably chosen to satisfy the $P^*$-condition. Now, select as the best the population in $I$ which yields the largest sample mean based on the combined sample of size $N$.

There are several possible choices for $(n_1, d, h)$ to satisfy the $P^*$-condition. Gupta and Kim (1984) used the requirement that

$$\Pr\{\text{the best population is included in the subset } I\} \ge P_1^*, \tag{2.4}$$

where $P_1^*$ ($P^* < P_1^* < 1$) is pre-assigned. Evaluation of these constants is based on a lower bound for the PCS. The Monte Carlo study of Gupta and Kim (1984) shows that their procedure $R_3$ performs much better than that of Bechhofer et al. (1954) in terms of the expected total sample size.

Recently, there has been a series of papers regarding the conjecture of the LFC for the Tamhane-Bechhofer procedure $R_2$ and some other related procedures for selecting the best normal treatment when the common variance $\sigma^2$ is known. As mentioned earlier, the conjecture that the LFC is $\mu_{[1]} = \cdots = \mu_{[k-1]} = \mu_{[k]} - \delta^*$ has been proved by Sehr (1988) and Bhandari and Chaudhuri (1990). The LFC's for two-stage procedures for more generalized goals have been established by Santner and Hayter (1993) and Hayter (1994). It will be interesting to reexamine the performance of the concerned procedures by using the exact infimum of the PCS.
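The data-dependent total sample size $N = \max\{n_1, \langle (hS_\nu/\delta^*)^2 \rangle\}$ used in the second stage of $R_3$ can be sketched as below. The sketch assumes equal first-stage sample sizes; the constant `h` is treated as given (it must be chosen to satisfy the $P^*$-condition), and the function names are illustrative.

```python
from math import ceil
from statistics import fmean


def pooled_variance(samples):
    """Pooled sample variance S_nu^2 and its degrees of freedom nu = sum(n_i - 1)."""
    ss = 0.0
    nu = 0
    for sample in samples:
        m = fmean(sample)
        ss += sum((x - m) ** 2 for x in sample)
        nu += len(sample) - 1
    return ss / nu, nu


def total_sample_size(samples, h, delta_star):
    """N = max{n1, <(h S_nu / delta*)^2>} for equal first-stage sizes n1."""
    n1 = len(samples[0])
    s2, _ = pooled_variance(samples)
    return max(n1, ceil(h * h * s2 / delta_star ** 2))
```

A larger first-stage variance estimate or a smaller $\delta^*$ increases $N$, which is how the procedure adapts the sampling effort to the unknown $\sigma^2$.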


2.2. Subset selection approach

As in Section 2.1, we are still interested in selecting the best treatment. However, we do not set in advance the number of treatments to be included in the selected subset. It is expected that a good rule will tend to select only one population as $\mu_{[k]} - \mu_{[k-1]}$ gets sufficiently large. Gupta (1956) considered the case of known as well as unknown $\sigma^2$. Based on samples of size $n$ from each population, his rule, in the case of known $\sigma^2$, is

$R_4$: Select the treatment $\Pi_i$ if and only if $\bar{X}_i \ge \max_{1 \le j \le k} \bar{X}_j - d\sigma/\sqrt{n}$,   (2.5)

where $\bar{X}_i$ is the sample mean from $\Pi_i$ and $d$ is the smallest positive constant for which the PCS $\ge P^*$ for all $(\mu, \sigma) \in \Omega$. (Any larger $d$ would obviously satisfy the $P^*$-condition but would, if anything, only increase the size of the selected subset.) This smallest $d$ is given by $\sqrt{2}\,H$, where $H$ is the solution of (2.3). Thus $d$ can be obtained from the tables mentioned previously.

When $\sigma^2$ is unknown, Gupta (1956) proposed the rule $R_5$, which is $R_4$ with $\sigma$ replaced by $S_\nu$, where $S_\nu^2$ is the usual pooled estimator of $\sigma^2$ with $\nu = k(n - 1)$ degrees of freedom. To keep the distinction between the two cases, we use $d'$ in place of $d$. The smallest $d'$ needed to satisfy the $P^*$-condition is the one-sided upper $(1 - P^*)$ equicoordinate point of the equicorrelated $(k - 1)$-variate central $t$-distribution with the equal correlation $\rho = 0.5$ and the associated degrees of freedom $\nu = k(n - 1)$. The values of $d'$ have been tabulated by Gupta and Sobel (1957) for selected values of $k$, $n$, and $P^*$. They are also available from the tables of Gupta et al. (1985) corresponding to correlation $\rho = 0.5$. It should be noted that, unlike in the case of the indifference-zone approach, we do have a single-stage procedure for any specified $n$ when $\sigma^2$ is unknown.

We may not have a common variance $\sigma^2$ (the heteroscedastic case) or a common sample size (unbalanced design). These cases have been studied by Gupta and Huang (1976), Chen et al. (1976) and Gupta and Wong (1982). In all these cases, the authors have used lower bounds for the infimum of the PCS to meet the $P^*$-condition. When the variances are unknown and unequal, and the sample sizes are unequal, Dudewicz and Dalal (1975) proposed a two-stage procedure using both the indifference-zone and subset selection approaches. Sequential subset selection procedures applicable to the normal model have also been studied. For a review of these, the reader is referred to Gupta and Panchapakesan (1991).
As we have pointed out previously, several modifications of the basic goal have been investigated. In particular, we mention here the restricted subset selection approach which includes a specified upper bound for the expected subset size which is otherwise random. Procedures of this type have been proposed by Gupta and Santner (1973) and Santner (1975). Several authors have also studied the modified goal of selecting good populations. Reference can be made to Gupta and Panchapakesan (1985) and Gupta and Panchapakesan (1991).
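As an illustration of the basic subset rule $R_4$ in (2.5) for known $\sigma$, the following minimal sketch returns the indices of the retained treatments. It assumes a common sample size and takes the constant `d` as an input (in practice $d = \sqrt{2}\,H$ with $H$ read from the tables cited above); the function name is illustrative.

```python
from math import sqrt
from statistics import fmean


def gupta_subset(samples, sigma, d):
    """Indices i with mean_i >= max_j mean_j - d * sigma / sqrt(n), as in (2.5)."""
    n = len(samples[0])  # common sample size assumed
    means = [fmean(s) for s in samples]
    threshold = max(means) - d * sigma / sqrt(n)
    return [i for i, m in enumerate(means) if m >= threshold]
```

The selected subset is never empty, since the treatment with the largest sample mean always meets the threshold; a larger $\sigma$ or a smaller $n$ widens the allowance and tends to enlarge the subset.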


2.3. Simultaneous confidence intervals for comparisons with the best

Related to the selection and ranking objectives is the multiple comparison approach in which one seeks simultaneous confidence sets for meaningful contrasts among a set of given treatments. A comprehensive treatment of this topic can be found in the text by Hochberg and Tamhane (1987). Our main interest here is simultaneous comparisons of all treatments with the best among them. In other words, we are interested in simultaneous confidence intervals for $\mu_i - \max_{j \ne i} \mu_j$, taking a larger treatment effect to imply a better treatment. If $\mu_i - \max_{j \ne i} \mu_j < 0$, then treatment $i$ is not the best and the difference represents the amount by which treatment $i$ is inferior to the best. On the other hand, if the difference is positive, then treatment $i$ is the best and the difference is the amount by which it is better than the second best.

Assume that all (normal) treatments $\Pi_i$ have a common unknown variance $\sigma^2$. Let $\bar{Y}_i$ denote the mean of $n$ independent responses on treatment $\Pi_i$, $i = 1, \ldots, k$. Let $S_\nu^2$ denote the usual pooled (unbiased) estimator of $\sigma^2$ based on $\nu = k(n - 1)$ degrees of freedom. Hsu (1984) showed that the intervals

$$\left[ -\left(\bar{Y}_i - \max_{j \ne i} \bar{Y}_j - C\right)^-, \; \left(\bar{Y}_i - \max_{j \ne i} \bar{Y}_j + C\right)^+ \right] \tag{2.6}$$

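form $100(1 - \alpha)\%$ simultaneous confidence intervals for $\mu_i - \max_{j \ne i} \mu_j$, $i = 1, \ldots, k$. Here $-x^- = \min(x, 0)$, $x^+ = \max(x, 0)$, and $C = cS_\nu/\sqrt{n}$, where $c$ satisfies

$$\Pr\left\{\bar{Y}_k - \max_{j \ne k} \bar{Y}_j \ge -C\right\} = 1 - \alpha \tag{2.7}$$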

under the assumption that $\mu_1 = \cdots = \mu_k = 0$. The intervals in (2.6) are closely related to the selection and ranking methods discussed previously. It was shown by Hsu (1984) that the upper bounds of these intervals imply the subset selection inference of Gupta (1956) and the lower bounds imply the indifference-zone selection inference of Bechhofer (1954). Any treatment $i$ for which the upper bound of $\mu_i - \max_{j \ne i} \mu_j$ is zero can be inferred to be not the best. Similarly, any treatment $i$ for which the lower bound of $\mu_i - \max_{j \ne i} \mu_j$ is zero can be inferred to be the best.

The intervals in (2.6) are "constrained" in the sense that the lower bounds are nonpositive and the upper bounds are nonnegative. Removing these constraints would require an increase in the critical value. The nonnegativity constraint on the upper bounds does not present a great disadvantage, as one will not normally be interested in knowing how bad a treatment is once it is rejected as not the best. On the other hand, it will be of interest to assess how much better than the others a treatment inferred to be the best is. Motivated by these considerations, Hsu (1985) provided a method of unconstrained multiple comparisons with the best which removes the nonpositivity constraint on the lower bounds in (2.6) by increasing the critical value slightly. We describe these simultaneous intervals below.
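Before the unconstrained version, the constrained intervals of Hsu (1984) for $\mu_i - \max_{j \ne i} \mu_j$ can be sketched in a few lines. The sketch takes the allowance $C = cS_\nu/\sqrt{n}$ as a given number (the critical value $c$ is not computed here) and assumes at least two treatments; the function name is illustrative.

```python
def mcb_intervals(means, C):
    """Constrained intervals for mu_i - max_{j != i} mu_j.

    The lower bound is min(diff - C, 0) and the upper bound is max(diff + C, 0),
    where diff = mean_i - max_{j != i} mean_j. Requires at least two treatments.
    """
    intervals = []
    for i, m in enumerate(means):
        best_other = max(x for j, x in enumerate(means) if j != i)
        diff = m - best_other
        intervals.append((min(diff - C, 0.0), max(diff + C, 0.0)))
    return intervals
```

With sample means 10, 12 and 15 and $C = 2$, the first two treatments receive upper bounds of zero (inferred not to be the best), while the third receives a lower bound of zero (inferred to be the best), matching the interpretation given in the text.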


Let $D = dS_\nu/\sqrt{n}$, where $d$ is the solution of $\Pr\{Z$ [...] 2, $\bar{Y}_{[k]}$ is highly unsatisfactory. It is highly positively biased when the $\mu_i$ are equal or close. The bias becomes more severe as $k$ increases and, in fact, it tends to infinity as $k \to \infty$. Sarkadi (1967) and Dahiya (1974) have studied this problem for $k = 2$ and known common variance $\sigma^2$. Hsieh (1981) also discussed the $k = 2$ case but with unknown $\sigma^2$. Cohen and Sackrowitz (1982) considered the case $k \ge 3$ with known $\sigma^2$. They have given an estimator which is a convex weighted combination of the ordered sample means $\bar{Y}_{[i]}$, where the weights depend on the adjacent differences in the ordered means. Jeyaratnam and Panchapakesan (1984) discussed estimation after selection associated with the subset selection rule $R_4$ defined in (2.5) for selecting a subset containing the best treatment. They considered estimating the average worth of the selected subset


defined by $M = \sum_{i \in S} \mu_i / |S|$, where $S$ denotes the selected subset and $|S|$ denotes the size of $S$. For the case of $k = 2$ and known $\sigma^2$, Jeyaratnam and Panchapakesan (1984) considered the natural estimator, which is positively biased, and some modified estimators with reduced bias.

Cohen and Sackrowitz (1988) have presented a decision-theoretic framework for the combined decision problem of selecting the best treatment and estimating the mean of the selected treatment, and derived results for the case of $k = 2$ and known $\sigma^2$ with common sample size. Gupta and Miescke (1990) extended this study in several directions. They have considered $k > 2$ treatments, different loss components, and both equal and unequal sample sizes. As pointed out by Cohen and Sackrowitz (1988), the decision-theoretic treatment of the combined selection-estimation problem leads to "selecting after estimation" rather than "estimating after selection".

Estimation after selection is a meaningful problem which needs further study. The earlier papers of Dahiya (1974), Hsieh (1981), and Jeyaratnam and Panchapakesan (1984) dealt with several modified estimators in the case of $k = 2$ treatments. These have not been studied in detail as regards their desirable properties. The decision-theoretic results require too many details to provide a comprehensive view. For a list of references, see Gupta and Miescke (1990).

2.5. Estimation of PCS

Consider the selection rule $R_1$ of Bechhofer (1954) defined in (2.1) for selecting the treatment with the largest mean $\mu_i$. This rule is designed to guarantee that PCS $\ge P^*$ whenever $\mu_{[k]} - \mu_{[k-1]} \ge \delta^*$. However, the true parametric configuration is unknown. If $\mu_{[k]} - \mu_{[k-1]} < \delta^*$, the minimum PCS cannot be guaranteed. Thus a retrospective analysis regarding the PCS is of importance. For any configuration of $\mu$,

$$\mathrm{PCS} = \int_{-\infty}^{\infty} \prod_{i=1}^{k-1} \Phi\left(t + \frac{\sqrt{n}\,(\mu_{[k]} - \mu_{[i]})}{\sigma}\right) \phi(t)\,dt, \tag{2.10}$$

where $\Phi$ and $\phi$ are the standard normal cdf and density function, respectively. Olkin et al. (1976, 1982) considered the estimator $\hat{P}$ obtained by replacing the $\mu_{[i]}$ by $\bar{Y}_{[i]}$ in (2.10). This estimator $\hat{P}$ is consistent, but its evaluation is not easy. Olkin, Sobel and Tong (1976) have given upper and lower bounds for the PCS that hold for any true configuration, with no regard to any least favorable configuration. They have also obtained the asymptotic distribution of $\hat{P}$, which is a function of $\hat{\delta}_i = \bar{Y}_{[k]} - \bar{Y}_{[i]}$, $i = 1, \ldots, k$; however, the expression for the variance of the asymptotic distribution is complicated. Faltin and McCulloch (1983) have studied the small-sample performance of the Olkin-Sobel-Tong estimator $\hat{P}$ of the PCS in (2.10), analytically for $k = 2$ populations and via Monte Carlo simulation for $k \ge 2$. They have found that the estimator tends to overestimate PCS (getting worse when $k > 2$) when the means are close together and tends to underestimate when $\sqrt{n}\,\delta/\sigma$ is large.
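The integral in (2.10) is one-dimensional and can be evaluated by elementary quadrature. The sketch below is an illustration only, using Simpson's rule on a truncated range with illustrative step settings; passing the true means gives the PCS for that configuration, while passing the sample means gives the plug-in estimate $\hat{P}$ discussed above.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, sd 1


def pcs(means, n, sigma, steps=2000, limit=8.0):
    """Evaluate (2.10) for the given means, common sample size n and sigma.

    Composite Simpson's rule on [-limit, limit]; steps must be even.
    """
    ordered = sorted(means)
    top = ordered[-1]
    # Standardized gaps sqrt(n) (mu_[k] - mu_[i]) / sigma for i = 1, ..., k-1.
    shifts = [sqrt(n) * (top - m) / sigma for m in ordered[:-1]]
    dx = 2.0 * limit / steps
    total = 0.0
    for i in range(steps + 1):
        t = -limit + i * dx
        weight = 1 if i in (0, steps) else (4 if i % 2 else 2)
        prod = 1.0
        for s in shifts:
            prod *= STD_NORMAL.cdf(t + s)
        total += weight * prod * STD_NORMAL.pdf(t)
    return total * dx / 3.0
```

Two quick checks follow from (2.10): with all means equal the value is $1/k$, and for $k = 2$ it reduces to $\Phi\big(\sqrt{n}\,(\mu_{[2]} - \mu_{[1]})/(\sigma\sqrt{2})\big)$.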


Anderson et al. (1977) first gave a lower confidence bound for PCS in the case of the selection rule $R_1$ of Bechhofer (1954) defined in (2.1). Faltin (1980) provided, in the case of $k = 2$ treatments, a quantile unbiased estimator of PCS which can be regarded as a lower confidence bound for PCS. Later Kim (1986) obtained a lower confidence bound on PCS which is sharper than that of Anderson et al. (1977) and reduces to that of Faltin (1980) in the special case of $k = 2$ treatments. Recently, Gupta et al. (1994), using a new approach, derived a confidence region for the differences $\mu_{[k-i+1]} - \mu_{[k-i]}$, $i = 1, \ldots, k - 1$, and then obtained a lower bound for PCS which is sharper than that of Kim (1986). They also derived some practical lower bounds by reducing the dimensionality of $\delta = (\delta_1, \ldots, \delta_{k-1})$, where $\delta_i = \mu_{[k-i+1]} - \mu_{[k-i]}$, $i = 1, \ldots, k - 1$. The lower bound improves as this free-to-choose dimensionality $q$ ($1 \le q \le k - 1$) increases, and the result for $q = 1$ coincides with that of Kim (1986). Gupta and Liang (1991) obtained a lower bound for PCS by deriving simultaneous lower confidence bounds on $\mu_{[k]} - \mu_{[i]}$, $i = 1, \ldots, k - 1$, where a range statistic was used. Of the two methods of Gupta and Liang (1991) and Gupta et al. (1994), neither dominates the other in the sense of providing larger PCS values. Generally speaking, for moderate $k$ (say, $k \ge 5$), the Gupta-Liang method tends to underestimate PCS. Finally, Gupta and Liang (1991) obtained a lower bound for PCS also in the case of the two-stage procedure of Bechhofer et al. (1954) for selecting the treatment with the largest mean when the common variance $\sigma^2$ is unknown.

2.6. Notes and comments

For the rule $R_1$ of Bechhofer (1954) defined in (2.1), Fabian (1962) has shown that a stronger assertion can be made without decreasing the infimum of the PCS. We define a treatment $\Pi_i$ to be good if $\mu_i \ge \mu_{[k]} - \delta^*$ and modify the goal to be selection of a good treatment. Now a CS occurs if the selected treatment is a good treatment. Then the procedure $R_1$ guarantees with a minimum probability $P^*$ that the selected treatment is good no matter what the configuration of the $\mu_i$ is.

We have not discussed sequential procedures for selecting the best treatment. There is a vast literature available in this regard. Reference can be made to Gupta and Panchapakesan (1991) besides the books mentioned in Section 1.

For the problem of estimating the PCS, Olkin et al. (1976), Kim (1986), Gupta and Liang (1991), and Gupta et al. (1994) have obtained their general results for location parameters with special discussion of the normal means case. Gupta et al. (1990) have discussed the case of truncated location parameter models.

Finally, robustness of selection procedures is an important aspect. This has been examined in the past by a few authors and there has been renewed interest in recent years. A survey of these studies is provided by Panchapakesan (1995a).

3. Selection in experiments with blocking: Normal theory

There may not always be sufficient quantities of homogeneous experimental material available for an experiment using a completely randomized design. However, it may be


possible to group experimental units into blocks of homogeneous material. Then one can employ a traditional blocking design which minimizes possible bias and reduces the error variance.

3.1. Indifference-zone approach

Assume that there are sufficient experimental units so that each treatment can be used at least once in each block. Consider the randomized complete block design with fixed treatment effects, namely,

$$Y_{ij\ell} = \mu + \tau_i + \beta_j + \varepsilon_{ij\ell}, \tag{3.1}$$

where $Y_{ij\ell}$ is the $\ell$th observation ($1 \le \ell \le n$) on treatment $i$ ($1 \le i \le k$) in block $j$ ($1 \le j \le b$). Here $\mu$ is the over-all mean, the $\tau_i$ are the treatment effects, the $\beta_j$ are the block effects, and the errors $\varepsilon_{ij\ell}$ are assumed to be iid $N(0, \sigma^2)$. It is assumed without loss of generality that

$$\sum_{i=1}^{k} \tau_i = \sum_{j=1}^{b} \beta_j = 0.$$

There is no interaction between blocks and treatments. Let $\tau_{[1]} \le \cdots \le \tau_{[k]}$ denote the ordered $\tau_i$. The goal is to select the best treatment, namely, the one associated with $\tau_{[k]}$. Let us assume that $\sigma^2$ is known. Then the procedure $R_1$ of Bechhofer defined in (2.1) for the completely randomized design can easily be adapted here. We take $n$ independent observations $Y_{ij\ell}$ ($1 \le \ell \le n$) on treatment $\Pi_i$ (1 [...]

(3.5)

where $\hat{\tau}_i = \bar{Y}_{i\cdot} - \bar{Y}_{\cdot\cdot}$ (1 [...] $P^*$. This constant is $d = \sqrt{2}\,H$, where $H$ is the solution of (2.3).

When $\sigma^2$ is unknown, we use the procedure $R_8$, which is $R_7$ with $\sigma$ replaced by $S_\nu$, where $S_\nu^2$ is given by

$$S_\nu^2 = \frac{1}{(k-1)(b-1)} \sum_{i=1}^{k} \sum_{j=1}^{b} \left(Y_{ij} - \bar{Y}_{i\cdot} - \bar{Y}_{\cdot j} + \bar{Y}_{\cdot\cdot}\right)^2 \tag{3.6}$$

based on $\nu = (k - 1)(b - 1)$ degrees of freedom. To keep the distinction between $R_7$ and $R_8$, we denote the constant needed by $d'$ instead of $d$. The values of $d'$ (as mentioned in the case of $R_5$ of Section 2.2) have been tabulated for selected values of $k$, $b$, and $P^*$ by Gupta and Sobel (1957). They can also be obtained from the tables of Gupta et al. (1985) corresponding to correlation coefficient $\rho = 0.5$.

Gupta and Hsu (1980) have applied the procedure $R_8$ and its usual analogue for selecting the treatment associated with $\tau_{[1]}$ to a data set relating to motor vehicle traffic fatality rates (MFR) for the forty-eight contiguous states and the District of Columbia for the years 1960 to 1976. Their goal is to select a subset of best (worst) states in terms of MFR.

As in the case of the indifference-zone approach, the basic procedures $R_4$ and $R_5$ of Gupta (1956) discussed in Section 2.2 can be adapted for other designs such as the BIBD and Latin Square. Driessen (1992) and Dourleijn (1993) have discussed in detail subset selection in experiments involving connected designs.

3.3. Simultaneous inference with respect to the best

In the model (3.4), let us assume that the error variance $\sigma^2$ is unknown. Hsu (1982) gave a procedure for selecting a subset $C$ of the $k$ treatments that includes the treatment associated with $\tau_{[k]}$ and at the same time provides simultaneous upper confidence


bounds $D_1, \ldots, D_k$ for $\tau_{[k]} - \tau_1, \ldots, \tau_{[k]} - \tau_k$. His procedure is based on the sample treatment means $\bar{Y}_{i\cdot}$ and $S_\nu^2$ given in (3.6). The procedure $R_9$ of Hsu (1982) defines

$$C = \left\{\Pi_i : \bar{Y}_{i\cdot} \ge \max_{j \ne i} \bar{Y}_{j\cdot} - dS_\nu/\sqrt{b}\right\} \tag{3.7}$$

and

$$D_i = \max\left\{\max_{j \ne i} \bar{Y}_{j\cdot} - \bar{Y}_{i\cdot} + dS_\nu/\sqrt{b},\; 0\right\}, \quad i = 1, \ldots, k, \tag{3.8}$$

where the constant $d = d(k, b, P^*)$ is to be chosen so that

$$\Pr\{\Pi_{(k)} \in C \text{ and } \tau_{[k]} - \tau_i \le D_i \text{ for } i = 1, \ldots, k\} = P^*$$

and $\Pi_{(k)}$ is the treatment associated with $\tau_{[k]}$. The constant $d = d(k, b, P^*)$ turns out to be the constant $d'$ of the procedure $R_5$ and it can be obtained for selected values of $k$, $b$, and $P^*$ from the tables of Gupta and Sobel (1957) and Gupta et al. (1985). For further detailed treatment of multiple comparisons with and selection of the best treatment, the reader is referred to Driessen (1991, 1992).

3.4. Notes and comments

Rasch (1978) has discussed selection problems in balanced designs. Wu and Cheung (1994) have considered subset selection for normal means in a two-way design. Given $b$ groups, each containing the same $k$ treatments, their goal is to select a non-empty subset from each group so that the probability of simultaneous correct selection is at least $P^*$. Dourleijn and Driessen (1993) have discussed four different subset selection procedures for randomized designs with illustrated applications to a plant breeding variety trial.

Gupta and Leu (1987) have investigated an asymptotic distribution-free subset selection procedure for the two-way model (3.4) with the assumption that the $\varepsilon_i = (\varepsilon_{i1}, \ldots, \varepsilon_{ik})$ are iid with cdf $F(\varepsilon)$ symmetric in its arguments. Their procedure is based on the Hodges-Lehmann estimators of location parameters. For a Bayesian treatment of ranking and selection problems in two-way models, reference should be made to Fong (1990) and Fong and Berger (1993).

Hsu (1982) has discussed simultaneous inference with respect to the best treatment in block designs in more generality than what was described previously. He assumes that the $\varepsilon_{ij}$ are iid with an absolutely continuous cdf $F$ with some regularity conditions. Besides the procedure $R_9$ based on sample means discussed previously, he considered two procedures based on signed ranks. Finally, Hsu and Nelson (1993) have surveyed, unified and extended multiple comparisons for the General Linear Model.


4. Selection in factorial experiments: Normal theory

Factorial experimentation, when employed in ranking and selection problems, can produce considerable savings in total sample size relative to independent single-factor experimentation when the probability requirements are comparable in both cases. This was in fact pointed out by Bechhofer (1954), who proposed a single-stage procedure for ranking normal means when no interaction is present between factor-level effects and a common known variance is assumed. In this section, we will be mainly concerned with the two-factor model.

Consider a two-factor experiment involving factors $A$ and $B$ at $a$ and $b$ levels, respectively. The treatment means $\mu_{ij}$ ($1 \le i \le a$, $1 \le j \le b$) are defined by

$$\mu_{ij} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij}, \tag{4.1}$$

where $\mu$ is the over-all mean, the $\alpha_i$ ($1 \le i \le a$) are the so-called row factor main effects, the $\beta_j$ ($1 \le j \le b$) are the column factor main effects, and the $(\alpha\beta)_{ij}$ are the two-way interactions, subject to the conditions

$$\sum_{i=1}^{a} \alpha_i = 0, \quad \sum_{j=1}^{b} \beta_j = 0, \quad \sum_{j=1}^{b} (\alpha\beta)_{ij} = 0 \ \text{for all } i, \quad \text{and} \quad \sum_{i=1}^{a} (\alpha\beta)_{ij} = 0 \ \text{for all } j.$$

The factors $A$ and $B$ are said to be additive if $(\alpha\beta)_{ij} = 0$ for all $i$ and $j$, and to interact otherwise. In this section, we discuss selection problems under the indifference-zone as well as the subset selection approach, both when the factors $A$ and $B$ are additive and when they interact. Deciding whether or not the additive model holds is an important problem to be handled with caution. We will be content with just referring to Fabian (1991).

4.1. Indifference-zone approach

We will first assume an additive model. Independent random samples $Y_{ijm}$, $m = 1, 2, \ldots$, are taken from normal treatments $\Pi_{ij}$ (1 [...] $\ge 2$ and $b \ge 2$. The goal is to select the treatment combination associated with $\alpha_{[a]}$ and $\beta_{[b]}$; in other words, we seek simultaneously the best levels of both factors. The probability requirement is:

$$P\{\mathrm{CS} \mid \mu\} \ge P^* \quad \text{whenever } \mu \in \Omega_{\alpha,\beta}, \tag{4.2}$$

where $\Omega_{\alpha,\beta} = \{\mu \mid \alpha_{[a]} - \alpha_{[a-1]} \ge \delta_\alpha^*,\ \beta_{[b]} - \beta_{[b-1]} \ge \delta_\beta^*\}$.


For the common known $\sigma^2$ case, Bechhofer (1954) proposed a single-stage procedure based on $n$ independent observations from each $\Pi_{ij}$. Let $\bar{Y}_{i\cdot\cdot}$ and $\bar{Y}_{\cdot j\cdot}$ denote the means of the observations corresponding to the levels $i$ and $j$ of the factors $A$ and $B$, respectively. Then the procedure of Bechhofer (1954) is

$R_{10}$: Select the treatment combination of levels associated with the largest $\bar{Y}_{i\cdot\cdot}$ and the largest $\bar{Y}_{\cdot j\cdot}$.   (4.3)

The LFC for this procedure is given by

$$\alpha_{[1]} = \cdots = \alpha_{[a-1]} = \alpha_{[a]} - \delta_\alpha^*; \quad \beta_{[1]} = \cdots = \beta_{[b-1]} = \beta_{[b]} - \delta_\beta^*. \tag{4.4}$$

The PCS for the rule $R_{10}$ at the LFC can be written as a product of the PCS's at the LFC's when the rule $R_1$ in (2.1) is applied marginally to each factor. This fact enables one to determine the smallest $n$ to guarantee the minimum PCS. Bechhofer et al. (1993) have studied the performances of the single-stage procedure $R_{10}$ of Bechhofer (1954) described previously and two other sequential procedures (not discussed here). One of these is a truncated sequential procedure of Bechhofer and Goldsman (1988b) and the other is a closed sequential procedure with elimination by Hartmann (1993).

The procedure $R_{10}$ can be easily generalized to the case of $r$ factors with levels $k_1, \ldots, k_r$. One is naturally interested in examining the efficiency of an $r$-factor experiment relative to that of $r$ independent single-factor experiments in the absence of interaction when both guarantee the same minimum PCS $P^*$. Let $n_F$ and $n_I$ denote the total numbers of observations required for the $r$-factor experiment and the $r$ single-factor experiments, respectively. Then Bawa (1972) showed that the asymptotic relative efficiency (ARE) of the $r$-factor experiment, defined by $\mathrm{ARE} = \lim_{P^* \to 1}(n_F/n_I)$, is given by

$\mathrm{ARE} = \max$ [...] $P^*$

whenever $\gamma_{[ab]} \ge \Delta^*$ and $\gamma_{[ab]} - \gamma_{[ab-1]} \ge \delta^*$,   (4.7)

where the event [CS] occurs if and only if the treatment combination corresponding to $\gamma_{[ab]}$ is selected, and the constants $\delta^*$, $\Delta^*$ and $P^*$ satisfy $\delta^* > 0$,

( a - 1 ) ( b - 1) A* ( a - i-)(-~-- ]-) ~ 1 ~*
max

Y~..- CAS~/v~n

l #[k] + 3~B

(5.1)

and

$$\Pr\{\Pi_{(k)} \text{ is selected}\} \ge P_1^* \quad \text{whenever } \mu_{[k]} \ge \mu_0 + \delta_1^* \text{ and } \mu_{[k]} \ge \mu_{[k-1]} + \delta_2^*, \tag{5.2}$$

where $\delta_0^*$, $\delta_1^*$, $\delta_2^*$, $P_0^*$, and $P_1^*$ are constants with $0 < \{\delta_1^*, \delta_2^*\} < \infty$, $-\delta_1^* < \delta_0^* < \infty$, $2^{-k} < P_0^* < 1$, $(1 - 2^{-k})/k < P_1^* < 1$, and $\Pi_{(k)}$ denotes the treatment associated with $\mu_{[k]}$. We assume that the treatments have a common known variance $\sigma^2$. Let $\mu_0$ be the specified standard. In this case, Bechhofer and Turnbull (1978) proposed a single-stage procedure based on $\bar{Y}_i$, $i = 1, \ldots, k$, the means of samples of size $n$ from each treatment. Their procedure is

$R_{15}$: Choose $\Pi_0$ if $\bar{Y}_{[k]}$ [...] for $1 \le i \le k$. In this case, the constant $d$ is given by

$$\int_0^\infty \Phi^k(yd)\, q_\nu(y)\, dy = P^*, \tag{5.8}$$

where $q_\nu(y)$ is the density of $Y = S_\nu/\sigma$. The $d$-values satisfying (5.8) can be obtained from the tables of Dunnett (1955) for selected values of $k$, $\nu$, and $P^*$. It should be noted that $d$ is the one-sided upper-$(1 - P^*)$ equicoordinate point of the equicorrelated $k$-variate central $t$-distribution with the equal correlation $\rho = 0$ and $\nu$ degrees of freedom.

Now, let $\mu_0$ be the unknown mean of the control treatment. We assume that all treatments have a common variance $\sigma^2$. Let $\bar{Y}_i$, $i = 0, 1, \ldots, k$, be the means of random samples of size $n_0$ from the control treatment and of size $n$ from each of the experimental treatments. When $\sigma^2$ is known, the Gupta-Sobel procedure is

$R_{17}$: Include $\Pi_i$ in the selected subset if and only if

$$\bar{Y}_i \ge \bar{Y}_0 - d\sigma\sqrt{\frac{1}{n} + \frac{1}{n_0}}, \tag{5.9}$$

where the smallest $d > 0$ for which the minimum PCS is guaranteed to be $P^*$ is the solution of

$$\Pr\{Z_1 \le d, \ldots, Z_k \le d\} = P^*, \tag{5.10}$$

and the $Z_i$ are equicorrelated standard normal variables with equal correlation $\rho = n/(n + n_0)$. The $d$-values are tabulated by Gupta et al. (1973) for selected values of


$k$, $P^*$ and $\rho$. When $n = n_0$, then $d = H$, given by (2.3) with $k - 1$ replaced by $k$, and thus can be obtained from the tables mentioned in that case. When $\sigma^2$ is unknown, we use rule $R_{17}$ with $\sigma$ replaced by $S_\nu$, where $S_\nu^2$ is the usual pooled estimator of $\sigma^2$ based on $\nu = k(n - 1) + (n_0 - 1)$ degrees of freedom. In this case, $d$ is the one-sided upper-$(1 - P^*)$ equicoordinate point of the equicorrelated $k$-variate central $t$-distribution with the equal correlation $\rho = n/(n + n_0)$ and $\nu$ degrees of freedom. Values of $d$ are tabulated by Gupta et al. (1985) and Bechhofer and Dunnett (1988) for selected values of $k$, $P^*$, $\nu$ and $\rho$.

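5.3. Simultaneous confidence intervals

In some applications, the experimenter may be interested in the differences between the means of the experimental treatments and $\mu_0$, which is either a specified standard or the unknown mean of a control treatment. Let us first consider the case of comparisons with a standard $\mu_0$. Assume that the treatments have a common known variance $\sigma^2$. Let $\bar{Y}_i$ ($i = 1, \ldots, k$) denote the mean of a random sample of size $n_i$ from $\Pi_i$. Define

$$I_i = \left(\bar{Y}_i - \mu_0 - \frac{d\sigma}{\sqrt{n_i}},\ \bar{Y}_i - \mu_0 + \frac{d\sigma}{\sqrt{n_i}}\right). \tag{5.11}$$

Then

$$\Pr\{\mu_i - \mu_0 \in I_i,\ i = 1, \ldots, k\} = P^* \tag{5.12}$$

when $d$ is chosen so that $[2\Phi(d) - 1]^k = P^*$.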
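For the known-variance comparison with a standard, the sample means are independent, so the joint coverage requirement factors into $k$ identical one-dimensional requirements. The sketch below assumes intervals of the form $\bar{Y}_i - \mu_0 \pm d\sigma/\sqrt{n_i}$ and therefore solves $[2\Phi(d) - 1]^k = P^*$ in closed form; the function names are hypothetical and the data in the usage lines are made up for illustration.

```python
import math
from statistics import NormalDist


def two_sided_constant(k, p_star):
    """Solve [2*Phi(d) - 1]**k = p_star for d (k independent intervals)."""
    per_interval = p_star ** (1.0 / k)  # coverage each single interval needs
    return NormalDist().inv_cdf((1.0 + per_interval) / 2.0)


def intervals_vs_standard(means, sizes, mu0, sigma, p_star):
    """Joint intervals for mu_i - mu0 with known common sigma."""
    d = two_sided_constant(len(means), p_star)
    result = []
    for ybar, n in zip(means, sizes):
        half_width = d * sigma / math.sqrt(n)
        result.append((ybar - mu0 - half_width, ybar - mu0 + half_width))
    return result


if __name__ == "__main__":
    # Illustrative numbers only.
    for lower, upper in intervals_vs_standard([10.0, 12.0], [4, 16], 9.0, 2.0, 0.90):
        print(round(lower, 3), round(upper, 3))
```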

When $\sigma^2$ is unknown, let $I_i'$ be the interval obtained by replacing $\sigma$ in (5.11) with $S_\nu$, where $S_\nu^2$ is the pooled estimator of $\sigma^2$ with $\nu = \sum_{i=1}^{k}(n_i - 1)$ degrees of freedom. In this case, the joint confidence statement (5.12) holds by taking $d$ as the two-sided upper-$(1 - P^*)$ equicoordinate point of the equicorrelated $k$-variate central $t$-distribution with the equal correlation $\rho = 0$ and $\nu$ degrees of freedom.

When $\mu_0$ is the unknown mean of a control treatment $\Pi_0$, let $\bar{Y}_0$ be the mean of a random sample of size $n_0$ from $\Pi_0$. When the common variance $\sigma^2$ is unknown, Dunnett (1955) obtained one-sided and two-sided confidence intervals for $\mu_i - \mu_0$, $i = 1, \ldots, k$, with joint confidence coefficient $P^*$. The lower joint confidence limits are given by

$$\bar{Y}_i - \bar{Y}_0 - d_i S_\nu \sqrt{\frac{1}{n_i} + \frac{1}{n_0}}, \quad i = 1, \ldots, k, \tag{5.13}$$

where $S_\nu^2$ is the pooled estimator of $\sigma^2$ based on $\nu = \sum_{i=0}^{k}(n_i - 1)$ degrees of freedom and the constants $d_i$ are chosen such that

$$\Pr\{t_1 < d_1, \ldots, t_k < d_k\} = P^* \tag{5.14}$$

where the joint distribution of the $t_i$ is the multivariate $t$. If $n_1 = \cdots = n_k = n$, then the $t_i$ are equicorrelated with correlation $\rho = n/(n + n_0)$. In this case, $d_1 = \cdots = d_k = d$. For selected values of $k$, $P^*$, $\nu$ and $\rho$, the value of $d$ can be obtained from the tables of Gupta et al. (1985). Dunnett (1955) has tabulated $d$-values in the case of $n_0 = n_1 = \cdots = n_k$ (i.e., $\rho = 0.5$). Similar to (5.13), we can write upper confidence limits and two-sided limits. In the equal sample sizes case, Dunnett (1964) has tabulated the constant needed. When the $n_i$ are unequal, there arises a problem of optimal allocation of observations between the control and the experimental treatments. The optimality is in the sense of maximizing the confidence coefficient for fixed $N = \sum_{i=0}^{k} n_i$. This problem has been studied by several authors. For a detailed discussion, see Gupta and Panchapakesan (1979, Chapter 20, Section 10).
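Once a constant $d$ has been read from the tables, the lower limits for the treatment versus control differences are a direct computation. The following sketch assumes a single common constant $d$ (the equal treatment sample size case) that the caller supplies; it does not compute $d$, and the function names and the numerical value of $d$ in the usage lines are placeholders rather than tabulated values.

```python
import math


def pooled_variance(samples):
    """Pooled variance estimate and its degrees of freedom nu."""
    ss = 0.0
    nu = 0
    for s in samples:
        m = sum(s) / len(s)
        ss += sum((x - m) ** 2 for x in s)
        nu += len(s) - 1
    return ss / nu, nu


def lower_limits(control, treatments, d):
    """Lower joint confidence limits for mu_i - mu_0, i = 1, ..., k.

    `d` must come from the appropriate multivariate t tables for the
    given k, P*, nu and rho; it is an input here, not computed.
    """
    s2, _ = pooled_variance([control] + list(treatments))
    s = math.sqrt(s2)
    n0 = len(control)
    ybar0 = sum(control) / n0
    limits = []
    for t in treatments:
        ybar = sum(t) / len(t)
        limits.append(ybar - ybar0 - d * s * math.sqrt(1.0 / len(t) + 1.0 / n0))
    return limits


if __name__ == "__main__":
    control = [1.0, 2.0, 3.0]
    treatments = [[2.0, 3.0, 4.0], [4.0, 5.0, 6.0]]
    # d = 2.0 is a placeholder, not a table value.
    print(lower_limits(control, treatments, d=2.0))
```

A treatment whose lower limit exceeds zero can be declared better than the control at joint confidence level $P^*$, provided $d$ was taken from the correct table.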

5.4. Notes and comments

Chen and Hsu (1992) proposed a two-stage procedure which involves selecting in the first stage the best treatment provided it is better than a control and testing a hypothesis in the second stage between the best treatment selected (if any) at the first stage and the control. There are studies in which several treatments and a control are administered to the same individuals (experimental units) at different times. The observations collected from the same unit under these treatments are no longer independent. This type of design is called a repeated measurements design. Chen (1984) has considered selecting treatments better than a control under such a design. Bechhofer et al. (1989) have studied two-stage procedures for comparing treatments with a control. In the first stage, they employ the subset selection procedure of Gupta and Sobel (1958) to eliminate "inferior" treatments. In the second stage, a joint confidence statement is made for the treatment versus control differences (for those treatments retained after the first stage) using Dunnett's (1955) procedure. Bechhofer and Tamhane (1981) developed a theory of optimal incomplete block designs for comparing several treatments with a control. They proposed a general class of designs that are balanced with respect to test treatments (BTIB). Bofinger and Lewis (1992) considered simultaneous confidence intervals for normal treatments versus control differences allowing unknown and unequal treatment variances. Gupta and Kim (1980) and Gupta and Hsiao (1981) have studied subset selection with respect to a standard or control using decision-theoretic and Bayesian formulations. Hoover (1991) generalized the procedure of Dunnett (1955) to comparisons with respect to two controls.

In our discussion of selecting treatments that are better than a standard or control, we assumed that there was no information about the ordering of the treatment means $\mu_i$. In some situations, we may have partial prior information in the form of a simple or partial order relationship among the unknown means $\mu_i$ of the experimental treatments. For example, in experiments involving different dose levels of a drug, the treatment effects will have a known ordering. In other words, we know that $\mu_1 \le \mu_2 \le \cdots \le \mu_k$ even though the $\mu_i$ are unknown. For the goal of selecting all populations for which $\mu_i \ge \mu_0$, we would expect any reasonable procedure $R$ to have the property: If $R$ selects $\Pi_i$, then it selects all treatments $\Pi_j$ with $j > i$. This is the isotonic behavior of $R$. Naturally, such a procedure will be based on the isotonic estimators of the $\mu_i$. Such procedures have been investigated by Gupta and Yang (1984) in the case of normal treatment means allowing the common variance $\sigma^2$ to be known or unknown.
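Isotonic estimators under the simple order $\mu_1 \le \cdots \le \mu_k$ are commonly computed with the pool-adjacent-violators algorithm. The sketch below is offered as one standard way to obtain such estimators from the sample means, weighted by sample sizes; it is not claimed to be the exact form used by Gupta and Yang (1984), and the function name `isotonic_means` is illustrative.

```python
def isotonic_means(means, weights):
    """Weighted isotonic (nondecreasing) regression of the sample means.

    Pool-adjacent-violators: scan left to right and merge neighbouring
    blocks whenever an earlier block mean exceeds a later one.
    """
    blocks = []  # each block is [weighted mean, total weight, number of items]
    for m, w in zip(means, weights):
        blocks.append([m, w, 1])
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, c2 = blocks.pop()
            m1, w1, c1 = blocks.pop()
            total = w1 + w2
            blocks.append([(m1 * w1 + m2 * w2) / total, total, c1 + c2])
    fitted = []
    for m, _, count in blocks:
        fitted.extend([m] * count)
    return fitted


if __name__ == "__main__":
    # The second and third means violate the assumed order and are pooled.
    print(isotonic_means([1.0, 3.0, 2.0, 4.0], [1, 1, 1, 1]))
```

Because the fitted values are nondecreasing in the treatment index, any rule that selects $\Pi_i$ when its fitted value exceeds a threshold automatically selects every $\Pi_j$ with $j > i$.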

6. Selection in experiments involving other models

Thus far we have discussed selection procedures and simultaneous confidence intervals under the assumption that the treatment responses are normally distributed. In this section, we briefly mention some other models for which these problems have been investigated.

6.1. Single-factor Bernoulli models

The Bernoulli distribution serves as an appropriate model in experiments involving manufacturing processes and clinical trials. In these experiments, response variables are qualitative, giving rise to dichotomous data such as defective-nondefective or success-failure. Thus we are interested in comparing Bernoulli populations in terms of their success probabilities. The initial and basic contributions to this problem were made by Sobel and Huyett (1957) under the indifference-zone formulation and by Gupta and Sobel (1960) under the subset selection formulation. There are many interesting aspects of the Bernoulli selection problems. For specification of the preference-zone one can use different measures for the separation between the best and the next best population, namely, $p_{[k]} - p_{[k-1]}$, $p_{[k]}/p_{[k-1]}$ and $[p_{[k]}(1 - p_{[k-1]})]/[(1 - p_{[k]})p_{[k-1]}]$. The last measure is the odds ratio used in biomedical studies. Besides the usual fixed sample size procedures and purely sequential procedures, the literature includes inverse sampling procedures and so-called Play-the-Winner sampling rules. For a detailed review of these procedures, reference may be made to Gupta and Panchapakesan (1979, 1985).
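The three separation measures between the best and the next best Bernoulli population can be compared on a given parameter configuration as follows. This is a small illustrative sketch; the function name and the dictionary keys are hypothetical, and it assumes all success probabilities lie strictly between 0 and 1.

```python
def separation_measures(probs):
    """Difference, ratio and odds ratio between the two largest success probabilities."""
    ordered = sorted(probs)
    p_next, p_best = ordered[-2], ordered[-1]
    return {
        "difference": p_best - p_next,
        "ratio": p_best / p_next,
        "odds_ratio": (p_best * (1.0 - p_next)) / ((1.0 - p_best) * p_next),
    }


if __name__ == "__main__":
    # The same configuration looks quite different under the three measures.
    print(separation_measures([0.2, 0.5, 0.6]))
```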

6.2. Multinomial models

The multinomial distribution, as a prototype for many practical problems, is a very useful model. When observations from a population are classified into a certain number of categories, it is natural to look for categories that occur very often or rarely. Consider a multinomial distribution on $m$ cells with probabilities $p_1, \ldots, p_m$. Selecting the most and the least probable cells are two common goals. The early investigations of Bechhofer et al. (1959), Gupta and Nagel (1967), and Cacoullos and Sobel (1966) set the pace for a considerable number of papers that followed. The investigations of multinomial selection problems reveal an interesting picture regarding the structure of the LFC which, it turns out, is not similar for the two common goals mentioned previously and also depends on whether a ratio or a difference is used to define the preference-zone. For further discussion and additional references, see Gupta and Panchapakesan (1993).


Although selecting the best cell from a single multinomial population has been investigated over a period of close to forty years, selecting the best of several multinomial populations has not received enough attention until recently except for the paper by Gupta and Wong (1977). For ranking multinomial populations, we need a measure of diversity within a population. Selection procedures have been studied in terms of diversity measures such as Shannon's entropy and the Gini-Simpson index. An account of these procedures is given in Gupta and Panchapakesan (1993).
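Both diversity measures named above are simple functions of the cell probabilities. The following sketch computes them for a single multinomial population; it uses natural logarithms for Shannon's entropy (an assumption, since the base is not specified here) and hypothetical function names.

```python
import math


def shannon_entropy(probs):
    """Shannon's entropy -sum p_j log p_j, with 0 log 0 taken as 0."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def gini_simpson(probs):
    """Gini-Simpson index 1 - sum p_j**2."""
    return 1.0 - sum(p * p for p in probs)


if __name__ == "__main__":
    populations = {
        "uniform": [0.25, 0.25, 0.25, 0.25],
        "skewed": [0.7, 0.1, 0.1, 0.1],
    }
    # Rank populations by diversity; both measures are largest for the uniform case.
    for name, p in populations.items():
        print(name, round(shannon_entropy(p), 4), round(gini_simpson(p), 4))
```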

6.3. Reliability models

In experiments involving life-length distributions, many specific distributions such as the exponential, Weibull and gamma have been used to characterize the life-length. Panchapakesan (1995b) provides a review of selection procedures for the one- and two-parameter exponential distributions. Alternatively, the life-length distribution may be described only as a member of a family characterized in terms of failure rate properties. The IFR (increasing failure rate) and IFRA (increasing failure rate on the average) families are well-known examples of such families. Selection procedures for distributions belonging to such families have been investigated substantially by several authors. A review of these investigations is provided by Gupta and Panchapakesan (1988).

7. Concluding remarks

As we have pointed out in Section 1, our review of design of experiments with selection and ranking goals covers mainly basic normal theory for single-factor experiments with and without blocking and 2-factor experiments with and without interaction. We have referred to a few authors who have studied the problem using a Bayesian approach. There have also been a number of investigations under an empirical Bayes approach. Some useful additional references in this connection are: Berger and Deely (1988), Fong (1992), Gupta and Liang (1987) and Gupta et al. (1994).

In Section 2.4, we referred to a SAS computer package of Aubuchon et al. (1986) for implementing simultaneous confidence intervals for the difference between each treatment mean and the best of the other treatment means. This package can also be used for selecting the best treatment using the indifference-zone and the subset selection approaches. There are a few other statistical packages such as CADEMO and MINITAB which contain modules for selection procedures. A commercially distributed package exclusively devoted to selection procedures is RANKSEL; see Edwards (1985, 1986) for details. There are also programs developed by several researchers in the course of their investigations. Rasch (1995) has given a summary of available software for selection procedures with a specific description of each. Several FORTRAN programs needed for investigation and implementation of selection procedures are given in the recent book by Bechhofer et al. (1995).

In the foregoing sections, we have discussed, as alternatives to tests of hypotheses among treatment means, three types of formulations: the indifference-zone approach, the subset selection approach, and the multiple comparisons approach. We have mainly considered single-stage fixed sample size procedures. In some cases we have described two-stage procedures. Sequential procedures have only been referred to. In all these cases, we have not described every available procedure for a given goal. As such we have not gone into efficiency comparisons of competing procedures. However, brief comments have been made in certain cases where a procedure was improved upon or bested by another at a later date. It should however be emphasized that the procedures we have described are still viable.

Acknowledgement

This research was supported in part by US Army Research Office Grant DAAH04-95-1-0165.

References

Alam, K. (1970). A two-sample procedure for selecting the populations with the largest mean from k normal populations. Ann. Inst. Statist. Math. 22, 127-136.
Anderson, P. O., T. A. Bishop and E. J. Dudewicz (1977). Indifference-zone ranking and selection: Confidence intervals for true achieved P(CD). Comm. Statist. Theory Methods 6, 1121-1132.
Aubuchon, J. C., S. S. Gupta and J. C. Hsu (1986). PROC RSMCB: A procedure for ranking, selection, and multiple comparisons with the best. Proc. SAS Users Group Internat. Conf. 11, 761-765.
Bawa, V. S. (1972). Asymptotic efficiency of one R-factor experiment relative to R one-factor experiments for selecting the best normal population. J. Amer. Statist. Assoc. 67, 660-661.
Bechhofer, R. E. (1954). A single-sample multiple decision procedure for ranking means of normal populations with known variances. Ann. Math. Statist. 25, 16-39.
Bechhofer, R. E. (1977). Selection in factorial experiments. In: H. J. Highland, R. G. Sargent and J. W. Schmidt, eds., Proc. 1977 Winter Simulation Conf. National Bureau of Standards, Gaithersburg, MD, 65-77.
Bechhofer, R. E. and C. W. Dunnett (1986). Two-stage selection of the best factor-level combination in multi-factor experiments: Common known variance. In: C. E. McCulloch, S. J. Schwager, G. Casella and S. R. Searle, eds., Statistical Design: Theory and Practice. Biometrics Unit, Cornell University, Ithaca, NY, 3-16.
Bechhofer, R. E. and C. W. Dunnett (1987). Subset selection for normal means in multifactor experiments. Comm. Statist. Theory Methods 16, 2277-2286.
Bechhofer, R. E. and C. W. Dunnett (1988). Percentage points of multivariate Student t distributions. In: Selected Tables in Mathematical Statistics, Vol. 11. Amer. Mathematical Soc., Providence, RI.
Bechhofer, R. E., C. W. Dunnett and M. Sobel (1954). A two-sample multiple-decision procedure for ranking means of normal populations with a common known variance. Biometrika 41, 170-176.
Bechhofer, R. E., C. W. Dunnett and A. C. Tamhane (1989). Two-stage procedures for comparing treatments with a control: Elimination at the first stage and estimation at the second stage. Biometr. J. 5, 545-561.
Bechhofer, R. E., S. Elmaghraby and N. Morse (1959). A single-sample multiple decision procedure for selecting the multinomial event which has the highest probability. Ann. Math. Statist. 30, 102-119.
Bechhofer, R. E. and D. M. Goldsman (1987). Truncation of the Bechhofer-Kiefer-Sobel sequential procedure for selecting the normal population which has the largest mean. Comm. Statist. Simulation Comput. 16, 1067-1092.
Bechhofer, R. E. and D. M. Goldsman (1988a). Sequential selection procedures for multi-factor experiments involving Koopman-Darmois populations with additivity. In: S. S. Gupta and J. O. Berger, eds., Statistical Decision Theory and Related Topics IV, Vol. 2. Springer, New York, 3-21.


Bechhofer, R. E. and D. M. Goldsman (1988b). Truncation of the Bechhofer-Kiefer-Sobel sequential procedure for selecting the normal population which has the largest mean (II): 2-factor experiments with no interaction. Comm. Statist. Simulation Comput. 17, 103-128.
Bechhofer, R. E. and D. M. Goldsman (1989). A comparison of the performances of procedures for selecting the normal population having the largest mean when the variances are known and equal. In: L. J. Gleser et al., eds., Contributions to Probability and Statistics: Essays in Honor of Ingram Olkin. Springer, New York, 303-317.
Bechhofer, R. E., D. M. Goldsman and M. Hartmann (1993). Performances of selection procedures for 2-factor additive normal populations with common known variance. In: F. M. Hoppe, ed., Multiple Comparisons, Selection, and Applications in Biometry. Dekker, New York, 209-224.
Bechhofer, R. E., J. Kiefer and M. Sobel (1968). Sequential Identification and Ranking Procedures (with Special Reference to Koopman-Darmois Populations). Univ. of Chicago Press, Chicago, IL.
Bechhofer, R. E., T. J. Santner and D. M. Goldsman (1995). Design and Analysis of Experiments for Statistical Selection, Screening and Multiple Comparisons. Wiley, New York.
Bechhofer, R. E., T. J. Santner and B. W. Turnbull (1977). Selecting the largest interaction in a two-factor experiment. In: S. S. Gupta and D. S. Moore, eds., Statistical Decision Theory and Related Topics II. Academic Press, New York, 1-18.
Bechhofer, R. E. and A. C. Tamhane (1981). Incomplete block designs for comparing treatments with a control: General theory. Technometrics 23, 45-57.
Bechhofer, R. E. and B. W. Turnbull (1978). Two (k + 1)-decision selection procedures for comparing k normal means with a specified standard. J. Amer. Statist. Assoc. 73, 385-392.
Berger, J. and J. J. Deely (1988). A Bayesian approach to ranking and selection of related means with alternatives to analysis-of-variance methodology. J. Amer. Statist. Assoc. 83, 364-373.
Bhandari, S. K. and A. R. Chaudhuri (1990). On two conjectures about two-stage selection problems. Sankhyā Ser. A 52, 131-141.
Bofinger, E. and G. J. Lewis (1992). Simultaneous comparisons with a control and with the best: Two stage procedures (with discussion). In: E. Bofinger et al., eds., The Frontiers of Modern Statistical Inference Procedures. American Sciences Press, Columbus, OH, 25-45.
Borowiak, D. S. and J. P. De Los Reyes (1992). Selection of the best in 2 x 2 factorial designs. Comm. Statist. Theory Methods 21, 2493-2500.
Büringer, H., H. Martin and K.-H. Schriever (1980). Nonparametric Sequential Selection Procedures. Birkhäuser, Boston, MA.
Cacoullos, T. and M. Sobel (1966). An inverse-sampling procedure for selecting the most probable event in multinomial distribution. In: P. R. Krishnaiah, ed., Multivariate Analysis. Academic Press, New York, 423-455.
Chen, H. J. (1984). Selecting a group of treatments better than a standard in repeated measurements design. Statist. Decisions 2, 63-74.
Chen, H. J., E. J. Dudewicz and Y. J. Lee (1976). Subset selection procedures for normal means under unequal sample sizes. Sankhyā Ser. B 38, 249-255.
Chen, P. and L. Hsu (1992). A two-stage design for comparing clinical trials. Biometr. J. 34, 29-35.
Chen, P. and M. Sobel (1987a). An integrated formulation for selecting the t best of k normal populations. Comm. Statist. Theory Methods 16, 121-146.
Chen, P. and M. Sobel (1987b). A new formulation for the multinomial selection problem. Comm. Statist. Theory Methods 16, 147-180.
Cohen, A. and H. B. Sackrowitz (1982). Estimating the mean of the selected population. In: S. S. Gupta and J. O. Berger, eds., Statistical Decision Theory and Related Topics III, Vol. 1. Academic Press, New York, 243-270.
Cohen, A. and H. B. Sackrowitz (1988). A decision theory formulation for population selection followed by estimating the mean of the selected population. In: S. S. Gupta and J. O. Berger, eds., Statistical Decision Theory and Related Topics IV, Vol. 2. Springer, New York, 33-36.
Cohen, D. D. (1959). A two-sample decision procedure for ranking means of normal populations with a common known variance. M.S. Thesis. Dept. of Operations Research, Cornell University, Ithaca, NY.
Dahiya, R. C. (1974). Estimation of the mean of the selected population. J. Amer. Statist. Assoc. 69, 226-230.


Dourleijn, C. J. (1993). On statistical selection in plant breeding. Ph.D. Dissertation. Agricultural University, Wageningen, The Netherlands.
Dourleijn, C. J. and S. G. A. J. Driessen (1993). Subset selection procedures for randomized designs. Biometr. J. 35, 267-282.
Driessen, S. G. A. J. (1991). Multiple comparisons with and selection of the best treatment in (incomplete) block designs. Comm. Statist. Theory Methods 20, 179-217.
Driessen, S. G. A. J. (1992). Statistical selection: Multiple comparison approach. Ph.D. Dissertation. Eindhoven University of Technology, Eindhoven, The Netherlands.
Dudewicz, E. J. and S. R. Dalal (1975). Allocation of observations in ranking and selection with unequal variances. Sankhyā Ser. B 37, 28-78.
Dudewicz, E. J. and J. O. Koo (1982). The Complete Categorized Guide to Statistical Selection and Ranking Procedures. Series in Mathematical and Management Sciences, Vol. 6. American Sciences Press, Columbus, OH.
Dudewicz, E. J. and B. K. Taneja (1982). Ranking and selection in designed experiments: Complete factorial experiments. J. Japan Statist. Soc. 12, 51-62.
Dunnett, C. W. (1955). A multiple comparison procedure for comparing several treatments with a control. J. Amer. Statist. Assoc. 50, 1096-1121.
Dunnett, C. W. (1964). New tables for multiple comparisons with a control. Biometrics 20, 482-491.
Eaton, M. L. (1967). Some optimum properties of ranking procedures. Ann. Math. Statist. 38, 124-137.
Edwards, H. P. (1985). RANKSEL - An interactive computer package of ranking and selection procedures (with discussion). In: E. J. Dudewicz, ed., The Frontiers of Modern Statistical Inference Procedures. American Sciences Press, Columbus, OH, 169-184.
Edwards, H. P. (1986). The ranking and selection computer package RANKSEL. Amer. J. Math. Management Sci. 6, 143-167.
Fabian, V. (1962). On multiple decision methods for ranking population means. Ann. Math. Statist. 33, 248-254.
Fabian, V. (1974). Note on Anderson's sequential procedures with triangular boundary. Ann. Statist. 2, 170-176.
Fabian, V. (1991). On the problem of interactions in the analysis of variance (with discussion). J. Amer. Statist. Assoc. 86, 362-375.
Faltin, F. W. (1980). A quantile unbiased estimator of the probability of correct selection achieved by Bechhofer's single-stage procedure for the two population normal means problem. Abstract. IMS Bull. 9, 180-181.
Faltin, F. W. and C. E. McCulloch (1983). On the small-sample properties of the Olkin-Sobel-Tong estimator of the probability of correct selection. J. Amer. Statist. Assoc. 78, 464-467.
Federer, W. T. and C. E. McCulloch (1984). Multiple comparison procedures for some split plot and split block designs. In: T. J. Santner and A. C. Tamhane, eds., Design of Experiments: Ranking and Selection. Dekker, New York, 7-22.
Federer, W. T. and C. E. McCulloch (1993). Multiple comparisons in split block and split-split plot designs. In: F. M. Hoppe, ed., Multiple Comparisons, Selection, and Applications in Biometry. Dekker, New York, 47-62.
Fong, D. K. H. (1990). Ranking and estimation of related means in two-way models - A Bayesian approach. J. Statist. Comput. Simulation 34, 107-117.
Fong, D. K. H. (1992). Ranking and estimation of related means in the presence of a covariance - A Bayesian approach. J. Amer. Statist. Assoc. 87, 1128-1135.
Fong, D. K. H. and J. O. Berger (1993). Ranking, estimation and hypothesis testing in unbalanced two-way additive models - A Bayesian approach. Statist. Decisions 11, 1-24.
Gibbons, J. D., I. Olkin and M. Sobel (1977). Selecting and Ordering Populations: A New Statistical Methodology. Wiley, New York.
Gupta, S. S. (1956). On a decision rule for a problem in ranking means. Mimeo, Ser. 150, Institute of Statistics, University of North Carolina, Chapel Hill, NC.
Gupta, S. S. (1963). Probability integrals of the multivariate normal and multivariate t. Ann. Math. Statist. 34, 792-828.
Gupta, S. S. (1965). On some multiple decision (selection and ranking) rules. Technometrics 7, 225-245.


Gupta, S. S. and P. Hsiao (1981). On gamma-minimax, minimax, and Bayes procedures for selecting populations close to a control. Sankhyā Ser. B 43, 291-318.
Gupta, S. S. and J. C. Hsu (1980). Subset selection procedures with application to motor vehicle fatality data in a two-way layout. Technometrics 22, 543-546.
Gupta, S. S. and J. C. Hsu (1984). A computer package for ranking, selection, and multiple comparisons with the best. In: S. Sheppard, U. Pooch and C. D. Pegden, eds., Proc. 1984 Winter Simulation Conf. Institute of Electrical and Electronics Engineers, Piscataway, NJ, 251-257.
Gupta, S. S. and J. C. Hsu (1985). RS-MCB: Ranking, selection, and multiple comparisons with the best. Amer. Statist. 39, 313-314.
Gupta, S. S. and D.-Y. Huang (1976). On subset selection procedures for the means and variances of normal populations: Unequal sample sizes case. Sankhyā Ser. A 38, 153-173.
Gupta, S. S. and D.-Y. Huang (1981). Multiple Decision Theory: Recent Developments, Lecture Notes in Statistics, Vol. 6. Springer, New York.
Gupta, S. S. and W.-C. Kim (1980). Gamma-minimax and minimax decision rules for comparison of treatments with a control. In: K. Matusita, ed., Recent Developments in Statistical Inference and Data Analysis. North-Holland, Amsterdam, 55-71.
Gupta, S. S. and W.-C. Kim (1984). A two-stage elimination type procedure for selecting the largest of several normal means with a common unknown variance. In: T. J. Santner and A. C. Tamhane, eds., Design of Experiments: Ranking and Selection. Dekker, New York, 77-93.
Gupta, S. S. and L.-Y. Leu (1987). An asymptotic distribution-free selection procedure for a two-way layout problem. Comm. Statist. Theory Methods 16, 2313-2325.
Gupta, S. S., L.-Y. Leu and T. Liang (1990). On lower confidence bounds for PCS in truncated location parameter models. Comm. Statist. Theory Methods 19, 527-546.
Gupta, S. S. and T. Liang (1987). On some Bayes and empirical Bayes selection procedures. In: R. Viertl, ed., Probability and Bayesian Statistics. Plenum Press, New York, 233-246.
Gupta, S. S. and T. Liang (1991). On a lower confidence bound for the probability of a correct selection: Analytical and simulation studies. In: Aydin Öztürk and E. C. van der Meulen, eds., The Frontiers of Statistical Scientific Theory and Industrial Applications. American Sciences Press, Columbus, OH, 77-95.
Gupta, S. S., T. Liang and R.-B. Rau (1994). Bayes and empirical Bayes rules for selecting the best normal population compared with a control. Statist. Decisions 12, 125-147.
Gupta, S. S., Y. Liao, C. Qiu and J. Wang (1994). A new technique for improved confidence bounds for the probability of correct selection. Statist. Sinica 4, 715-727.
Gupta, S. S. and K. J. Miescke (1984). On two-stage Bayes selection procedures. Sankhyā Ser. B 46, 123-134.
Gupta, S. S. and K. J. Miescke (1990). On finding the largest mean and estimating the selected mean. Sankhyā Ser. B 52, 144-157.
Gupta, S. S. and K. Nagel (1967). On selection and ranking procedures and order statistics from the multinomial distribution. Sankhyā Ser. B 29, 1-34.
Gupta, S. S., K. Nagel and S. Panchapakesan (1973). On the order statistics from equally correlated normal random variables. Biometrika 60, 403-413.
Gupta, S. S. and S. Panchapakesan (1979). Multiple Decision Procedures: Theory and Methodology of Selecting and Ranking Populations. Wiley, New York.
Gupta, S. S. and S. Panchapakesan (1985). Subset selection procedures: Review and assessment. Amer. J. Math. Management Sci. 5, 235-311.
Gupta, S. S. and S. Panchapakesan (1988). Selection and ranking procedures in reliability models. In: P. R. Krishnaiah and C. R. Rao, eds., Handbook of Statistics, Vol. 7. Quality Control and Reliability. North-Holland, Amsterdam, 131-156.
Gupta, S. S. and S. Panchapakesan (1991). On sequential ranking and selection procedures. In: B. K. Ghosh and P. K. Sen, eds., Handbook of Sequential Analysis. Dekker, New York, 363-380.
Gupta, S. S. and S. Panchapakesan (1993). Selection and screening procedures in multivariate analysis. In: C. R. Rao, ed., Multivariate Analysis: Future Directions, North-Holland Series in Statistics and Probability, Vol. 5. Elsevier, Amsterdam, 223-262.


Gupta, S. S., S. Panchapakesan and J. K. Sohn (1985). On the distribution of the studentized maximum of equally correlated normal random variables. Comm. Statist. Simulation Comput. 14, 103-135.
Gupta, S. S. and T. J. Santner (1973). On selection and ranking procedures - A restricted subset selection rule. In: Proc. 39th Session of the Internat. Statist. Institute 45, Book I, 478-486.
Gupta, S. S. and M. Sobel (1957). On a statistic which arises in selection and ranking problems. Ann. Math. Statist. 28, 957-967.
Gupta, S. S. and M. Sobel (1958). On selecting a subset which contains all populations better than a standard. Ann. Math. Statist. 29, 235-244.
Gupta, S. S. and M. Sobel (1960). Selecting a subset containing the best of several binomial populations. In: I. Olkin et al., eds., Contributions to Probability and Statistics. Stanford Univ. Press, Stanford, CA, 224-248.
Gupta, S. S. and W.-Y. Wong (1977). Subset selection for finite schemes in information theory. In: I. Csiszár and P. Elias, eds., Colloquia Mathematica Societatis János Bolyai, 16. Topics in Information Theory, 279-291.
Gupta, S. S. and W.-Y. Wong (1982). Subset selection procedures for the means of normal populations with unequal variances: Unequal sample sizes case. SeL Statist. Canad. 6, 109-149.
Gupta, S. S. and H.-M. Yang (1984). Isotonic procedures for selecting populations better than a control under ordering prior. In: J. K. Ghosh and J. Roy, eds., Statistics: Applications and New Directions. Indian Statistical Soc., Calcutta, 279-312.
Hall, W. J. (1959). The most economical character of Bechhofer and Sobel decision rules. Ann. Math. Statist. 30, 964-969.
Hartmann, M. (1988). An improvement on Paulson's sequential ranking procedure. Sequential Anal. 7, 363-372.
Hartmann, M. (1993). Multi-factor extensions of Paulson's procedures for selecting the best normal population. In: F. M. Hoppe, ed., Multiple Comparisons, Selection, and Applications in Biometry. Dekker, New York, 225-245.
Hayter, A. J. (1994). On the selection probabilities of two-stage decision procedures. J. Statist. Plann. Inference 38, 223-236.
Hochberg, Y. and R. Marcus (1981). Three stage elimination type procedures for selecting the best normal population when variances are unknown. Comm. Statist. Theory Methods 10, 597-612.
Hochberg, Y. and A. C. Tamhane (1987). Multiple Comparison Procedures. Wiley, New York.
Hoover, D. R. (1991). Simultaneous comparisons of multiple treatments to two (or more) controls. Biometr. J. 33, 913-921.
Hsieh, H.-K. (1981). On estimating the mean of the selected population with unknown variance. Comm. Statist. Theory Methods 10, 1869-1878.
Hsu, J. C. (1982). Simultaneous inference with respect to the best treatment in block designs. J. Amer. Statist. Assoc. 77, 461-467.
Hsu, J. C. (1984). Constrained simultaneous confidence intervals for multiple comparisons with the best. Ann. Statist. 12, 1136-1144.
Hsu, J. C. (1985). A method of unconstrained multiple comparisons with the best. Comm. Statist. Theory Methods 14, 2009-2028.
Hsu, J. C. and B. Nelson (1993). Multiple comparisons in the general linear models. Unpublished Report.
Jeyaratnam, S. and S. Panchapakesan (1984). An estimation problem relating to subset selection for normal populations. In: T. J. Santner and A. C. Tamhane, eds., Design of Experiments: Ranking and Selection. Dekker, New York, 287-302.
Kim, W.-C. (1986). A lower confidence bound on the probability of a correct selection. J. Amer. Statist. Assoc. 81, 1012-1017.
Milton, R. C. (1963). Tables of equally correlated multivariate normal probability integral. Tech. Report No. 27, Department of Statistics, University of Minnesota, Minneapolis, MN.
Mukhopadhyay, N. and T. K. S. Solanky (1994). Multistage Selection and Ranking Procedures: Second-Order Asymptotics. Dekker, New York.
Olkin, I., M. Sobel and Y. L. Tong (1976). Estimating the true probability of correct selection for location and scale parameter families. Tech. Report No. 174, Department of Operations Research and Department of Statistics, Stanford University, Stanford, CA.


Olkin, I., M. Sobel and Y. L. Tong (1982). Bounds for a k-fold integral for location and scale parameter models with applications to statistical ranking and selection problems. In: S. S. Gupta and J. O. Berger, eds., Statistical Decision Theory and Related Topics III, Vol. 2. Academic Press, New York, 193-212.
Pan, G. and T. J. Santner (1993). Selection and screening in additive two-factor experiments using randomization restricted designs. Tech. Report 523, Department of Statistics, The Ohio State University, Columbus, OH.
Panchapakesan, S. (1992). Ranking and selection procedures. In: N. Balakrishnan, ed., Handbook of the Logistic Distribution. Dekker, New York, 145-167.
Panchapakesan, S. (1995a). Robustness of selection procedures. J. Statist. Plann. Inference, to appear.
Panchapakesan, S. (1995b). Selection and ranking procedures. In: N. Balakrishnan and A. P. Basu, eds., The Exponential Distribution: Method, Theory and Applications. Gordon and Breach, New York, to appear.
Paulson, E. (1952). On the comparison of several experimental categories with a control. Ann. Math. Statist. 23, 239-246.
Rasch, D. (1978). Selection problems in balanced designs. Biometr. J. 20, 275-278.
Rasch, D. (1995). Software for selection procedures. J. Statist. Plann. Inference, to appear.
Santner, T. J. (1975). A restricted subset selection approach to ranking and selection problem. Ann. Statist. 3, 334-349.
Santner, T. J. (1981). Designing two factor experiments for selecting interactions. J. Statist. Plann. Inference 5, 45-55.
Santner, T. J. and A. J. Hayter (1993). The least favorable configuration of a two-stage procedure for selecting the largest normal mean. In: F. M. Hoppe, ed., Multiple Comparisons, Selection and Applications in Biometry. Dekker, New York, 247-265.
Sarkadi, K. (1967). Estimation after selection. Studia Sci. Math. Hungar. 2, 341-350.
Sehr, J. (1988). On a conjecture concerning the least favorable configuration of a two-stage selection procedure. Comm. Statist. Theory Methods 17, 3221-3233.
Sobel, M. (1969). Selecting a subset containing at least one of the t best populations. In: P. R. Krishnaiah, ed., Multivariate Analysis II. Academic Press, New York, 515-540.
Sobel, M. and M. J. Huyett (1957). Selecting the best one of several binomial populations. Bell Syst. Tech. J. 36, 537-576.
Tamhane, A. C. (1976). A three-stage elimination type procedure for selecting the largest normal mean (common unknown variance case). Sankhyā Ser. B 38, 339-349.
Tamhane, A. C. and R. E. Bechhofer (1977). A two-stage minimax procedure with screening for selecting the largest normal mean. Comm. Statist. Theory Methods 6, 1003-1033.
Tamhane, A. C. and R. E. Bechhofer (1979). A two-stage minimax procedure with screening for selecting the largest normal mean (II): An improved PCS lower bound and associated tables. Comm. Statist. Theory Methods 8, 337-358.
Taneja, B. K. (1986). Selection of the best normal mean in complete factorial experiments with interaction and with common unknown variance. J. Japanese Statist. Soc. 16, 55~55.
Taneja, B. K. (1987). Nonparametric selection procedures in complete factorial experiments. In: W. Sendler, ed., Contributions to Statistics. Physica-Verlag, Heidelberg, 214-235.
Taneja, B. K. and E. J. Dudewicz (1987). Selection in factorial experiments with interaction, especially the 2 × 2 case. Acta Math. Sinica, New Series 3, 191-203.
Van der Laan, P. and L. B. Verdooren (1989). Selection of populations: An overview and some recent results. Biometr. J. 31, 383-420.
Wu, K. H. and S. H. Cheung (1994). Subset selection for normal means in a two-way design. Biometr. J. 36, 165-175.

S. Ghosh and C. R. Rao, eds., Handbook of Statistics, Vol. 13 © 1996 Elsevier Science B.V. All rights reserved.

18

Multiple Comparisons

Ajit C. Tamhane

1. Introduction

The pitfalls inherent in making multiple inferences based on the same data are well known, and have concerned statisticians for a long time. This problem was noted by Fisher (1935) in the context of making pairwise comparisons using multiple $\alpha$-level two-sample $t$ tests in a one-way ANOVA setting. To alleviate the resulting inflated type I error probability of falsely declaring at least one pairwise difference significant, he suggested the Bonferroni and the protected least significant difference (LSD) methods as simple ad-hoc solutions. Multiple comparisons as a separate field started in the late 40's and early 50's with the fundamental works of Duncan (1951, 1955), Dunnett (1955), Roy and Bose (1953), Scheffé (1953) and Tukey (1949). Since then the field has made tremendous progress and is still extremely active with research motivated by many real life problems stemming mainly from applications in medical, psychological and educational research.

A number of books and monographs have been written on the subject beginning with the mimeographed notes of Tukey (1953) (recently published by Braun, 1994) followed by Miller (1966, 1981), Hochberg and Tamhane (1987), Toothaker (1991, 1993) and Westfall and Young (1989). Recent review articles are by Bauer (1991) and Shaffer (1994); see Hochberg and Tamhane (1987) (hereafter referred to as HT) for references to earlier review articles. The aim of the present article is to give an overview of the subject, focusing more on important developments since the publication of HT. All mathematical proofs are omitted, references being given wherever needed.

The outline of the paper is as follows: Section 2 discusses the basic concepts of multiple comparisons. Section 3 gives some methods for constructing multiple testing procedures. Section 4 gives p-value based procedures, which are modifications of the simple Bonferroni procedure. Section 5 covers classical normal theory procedures for inter-treatment comparisons with emphasis on two families: comparisons with a control and pairwise comparisons. Section 6 is devoted to the problem of multiple endpoints; continuous and discrete endpoints are covered in two separate subsections. Section 7 reviews several miscellaneous problems. The paper ends with some concluding remarks in Section 8.


2. Basic concepts

2.1. Family

A family is a set of contextually related inferences (comparisons) from which some common conclusions are drawn or decisions are made. Often we refer to the collection of parameters on which hypotheses are tested or confidence statements are made as a family. Some examples of families are pairwise comparisons between a set of treatments, comparisons of test treatments with a control, comparison of two treatments based on multiple endpoints, and significance tests on pairwise correlations among a set of variables. These are all finite families. An example of an infinite family is the collection of all contrasts among a set of treatment means.

Which inferences to include in a family can be a difficult question. The following guidelines are useful to resolve this question:

• Contextual relatedness (not statistical dependence) is a primary consideration for grouping a set of inferences into a single family. Generally, not all inferences made in a single experiment may constitute a family and a single experiment may involve more than one family. For an example from pharmaceutical industry, see Dunnett and Tamhane (1992b).

• On the other hand, an individual experiment should be considered as a unit for forming a family. In other words, a family should not extend over inferences made in several different experiments.

• A family should include not just the inferences actually made, but also all other similar inferences that potentially could have been made had the data turned out differently. This is especially necessary if an MCP is used for data-snooping. For example, suppose that a particular pairwise contrast among the treatment means is tested not because it was of a priori interest, but rather because it turned out to be the largest. Since this difference is selected by data-snooping from the set of all pairwise comparisons, this latter set constitutes the appropriate family. Note that although only one test is conducted explicitly, all pairwise tests are conducted implicitly, the most significant one (or possibly more than one) being reported. Thus pre-hoc multiple inferences and post-hoc selective inferences are two sides of the same coin. In the latter case, in order to specify a family, one must be able to delineate the set of potential comparisons from which the ones actually reported are selected.

• To achieve reasonable power, it is advisable that the search for interesting contrasts should be narrowed down to some finite family based on substantive questions of interest. The larger the family, the less power does the MCP have. An infinite family such as the family of all contrasts or all linear combinations of the treatment means should generally be avoided.

2.2. Error rates and their control

An error rate is a probabilistic measure of erroneous inferences (restricted here only to type I errors) in a family. A multiple comparison procedure (MCP) is a statistical procedure for making all or selected inferences from a given family while controlling or adjusting for the increased incidence of type I errors due to multiplicity of inferences. For this purpose, usually the familywise error rate (FWE) (also called the experimentwise error rate) is used, where

$$\mathrm{FWE} = P\{\text{At least one wrong inference}\}.$$

Control of the FWE for an MCP used for multiple null hypotheses testing (called a multiple test procedure (MTP)) means that

$$P\{\text{At least one true null hypothesis is rejected}\} \le \alpha \tag{2.1}$$

for a stated $\alpha$. For an MCP used for confidence estimation (called a simultaneous confidence procedure (SCP)), the corresponding requirement is

$$P\{\text{All parameters are included in their confidence intervals}\} \ge 1 - \alpha. \tag{2.2}$$

Both MTP's and SCP's will be referred to as MCP's unless a distinction must be made between the two. Two main reasons for controlling the FWE are:

• Control of the FWE means control of the probability of an error in any subset of inferences, however selected (pre-hoc or post-hoc) from the specified family. This is useful in exploratory studies since it permits data-snooping.

• Control of the FWE guarantees that all inferences in the family are correct with probability $\ge 1 - \alpha$. This is useful in confirmatory studies where the correctness of an overall decision depends on the simultaneous correctness of all individual inferences, e.g., in selection type problems.

The FWE can be controlled strongly or weakly. To explain this, consider the problem of testing a family of null hypotheses $H_i$ for $i \in K$, where $K$ is a finite or an infinite index set. By $H = \bigcap_{i \in K} H_i$ we denote the overall null hypothesis. If any hypothesis in this family implies any other hypothesis then the family is said to be hierarchical; otherwise it is said to be non-hierarchical. If the family includes the overall null hypothesis $H$ then it is clearly hierarchical since $H$ implies all the $H_i$'s. Strong control of the FWE means that (2.1) is satisfied under all partial null hypotheses of the type $H_I = \bigcap_{i \in I} H_i$ where $I \subseteq K$. Weak control of the FWE means that (2.1) is satisfied only under some $H_I$, typically only under $H = H_K$. Note that since the simultaneous confidence statement (2.2) is satisfied under all configurations, an MTP derived by inverting an SCP controls the FWE strongly. A majority of applications require strong control of the FWE. Henceforth, by control of the FWE we will always mean strong control unless specified otherwise.

An alternative error rate used is the per-family error rate (PFE) (also called the per-experiment error rate):

$$\mathrm{PFE} = E\{\text{No. of wrong inferences}\},$$

which is defined only for finite families. If $E_i$ is the event that the $i$th inference is wrong, $i \in K$, then using the Bonferroni inequality we obtain

$$\mathrm{FWE} = P\Big(\bigcup_{i \in K} E_i\Big) \le \sum_{i \in K} P(E_i) = \mathrm{PFE}. \tag{2.3}$$

Therefore control of the PFE implies conservative control of the FWE. This is the basis of the Bonferroni method of multiple comparisons. If the events $E_i$ are positively dependent (Tong, 1980) then a sharper upper bound on the FWE is provided by the following multiplicative inequality:

$$\mathrm{FWE} = P\Big(\bigcup_{i \in K} E_i\Big) \le 1 - \prod_{i \in K} \{1 - P(E_i)\}. \tag{2.4}$$

[...] 0) for the rejected $H_i$'s. The directional decisions are made depending on the signs of the test statistics. The new feature of this problem is that one must consider type III errors, which are the errors of misclassifying the signs of the nonnull $\theta_i$'s. To avoid the type III error probability from approaching 1/2 for any $H_i$ as $\theta_i$ approaches zero, it is necessary to allow a third decision for each hypothesis, namely that the data are "inconclusive". Subject to control of a suitable error rate, we need to maximize the number of correct directional decisions (referred to as confident directions by Tukey (1991)).

Two approaches to error rate control are generally adopted. One approach is to control the type III FWE, which is the probability of making any type III error. This approach ignores type I errors altogether, which is justified if there is very little loss associated with them or because the null values $\theta_i = 0$ are a priori unlikely. As an example of this approach, see Bofinger (1985). The second approach is to control the type I and III FWE, which is the probability of making any type I or type III error. An MTP obtained by inverting simultaneous confidence intervals (SCI's) controls this latter FWE. As we shall see in the sequel, for the common types of SCI's, such MTP's are of the single-step type. It is not known whether the MTP's that do tests in a stepwise fashion (e.g., Fisher's protected LSD) have this property in general.

2.4. Adjusted p-values

As in the case of a single hypothesis test, it is more informative to report a p-value for each hypothesis instead of simply stating whether the hypothesis can be rejected or not at some preassigned $\alpha$. This also has the advantage that table look-up is avoided. A complicating factor for multiple hypotheses is that each p-value must be adjusted for multiplicity of tests. The adjusted p-value of a hypothesis $H_i$ (denoted by $p_{ai}$) is defined as the smallest value of the familywise level $\alpha$ at which $H_i$ can be rejected by a given MCP. Once the $p_{ai}$'s are computed, tests can be conducted at any desired familywise level $\alpha$ by simply rejecting any $H_i$ if $p_{ai} < \alpha$ for $i \in K$. The idea of adjusted p-values has been around for a long time, but the terminology and their use have been made more popular recently by Westfall and Young (1992) and Wright (1992). Formulas for calculating the adjusted p-values for different MCP's are given when those MCP's are discussed in the sequel; for example, see (4.1). A resampling approach to obtain simulation estimates of the adjusted p-values is the main theme of Westfall and Young (1993).
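As a small illustration of how adjusted p-values are used, the following Python sketch takes the single-step Bonferroni procedure as the MCP, for which the adjusted p-value of $H_i$ is the standard $\min(1, k p_i)$ for a family of size $k$. The function names and the raw p-values are hypothetical and chosen only for illustration.

```python
def bonferroni_adjusted(p_values):
    """Bonferroni adjusted p-values: min(1, k * p_i) for a family of size k."""
    k = len(p_values)
    return [min(1.0, k * p) for p in p_values]


def reject_at(adjusted, alpha):
    """Reject H_i whenever its adjusted p-value falls below alpha."""
    return [p_adj < alpha for p_adj in adjusted]


# Hypothetical raw p-values for a family of four hypotheses.
raw = [0.004, 0.020, 0.030, 0.600]
adjusted = bonferroni_adjusted(raw)
decisions = reject_at(adjusted, alpha=0.05)
```

Once `adjusted` is available, the same list can be compared with any other familywise level without recomputation, which is the practical appeal of reporting adjusted p-values.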

2.5. Coherence and consonance

In a hierarchical family the following logical requirement must be satisfied: If any hypothesis is accepted then all hypotheses implied by it must also be accepted. This requirement is called coherence (Gabriel, 1969). Another requirement that is sometimes desirable, but not absolutely required, is that if any hypothesis is rejected then at least one hypothesis implied by it must also be rejected. This requirement is called consonance (Gabriel, 1969). For example, if in a one-way layout the family consists of the overall null hypothesis $H$ that all treatment means are equal and hypotheses of pairwise equality of treatment means, and if $H$ is rejected, then consonance requires that at least one pair of treatment means must be declared significantly different. Not all MCP's satisfy this requirement. For example, Fisher's LSD may find the $F$ test of $H$ significant, but none of the pairwise $t$ tests to be significant.

3. Types of MCP's and methods of their construction

3.1. Types of MCP's

MCP's can be classified as shown in Figure 1. A single-step MCP tests the hypotheses $H_i$'s simultaneously without reference to one another, typically using a common critical constant. An example of a single-step MCP is the Bonferroni procedure, which tests each $H_i$ at level $\alpha/k$. A single-step MCP can be readily inverted to obtain an SCP. Such is generally not the case with stepwise MCP's.

Fig. 1. Types of MCP's: single-step and stepwise, with stepwise MCP's divided into step-down and step-up.


A step-down MCP tests the $H_i$'s in the decreasing order of their significance or in the hierarchical order of implication with the most significant or the most implying hypotheses (e.g., the one that implies all other hypotheses in the family) being tested first, using a monotone set of critical constants. Testing continues until a hypothesis is accepted; the remaining hypotheses are accepted by implication without further tests. The protected LSD provides an example of a step-down MCP.

A step-up MCP tests the $H_i$'s in the increasing order of their significance or in the hierarchical order of implication with the least significant or the least implying hypotheses (e.g., the set of hypotheses that do not imply each other, but are implied by one or more hypotheses in the family that are outside this set) being tested first, using a monotone set of critical constants. Testing continues until a hypothesis is rejected; the remaining hypotheses are rejected by implication without further tests. Hochberg's (1988) procedure discussed in Section 4.2.2 and Welsch's (1977) procedure discussed in Section 5.2.3 are examples of step-up MCP's.

3.2. Union-intersection method

The union-intersection (UI) method was proposed by Roy (1953) as a heuristic method to test a single hypothesis $H$, which can be expressed as an intersection of some component hypotheses $H_i$, $i \in K$. Suppose that tests are available for testing $H_i$ versus $A_i$ for $i \in K$. Then the UI test of

$$H = \bigcap_{i \in K} H_i \quad \text{versus} \quad A = \bigcup_{i \in K} A_i \tag{3.1}$$

rejects if at least one of the component hypotheses $H_i$ is rejected. Roy and Bose (1953) showed how a single-step MTP can be derived from a UI test. To make the ideas concrete, let $t_i$ be a test statistic for $H_i$ with the corresponding rejection region being $t_i > c$, where $c$ is a critical constant. Then the UI test rejects if $t_{\max} = \max_{i \in K} t_i > c$. The critical constant $c$ is determined so that the UI test has a designated level $\alpha$, i.e., $c$ is the $(1 - \alpha)$-quantile (the upper $\alpha$ point) of the distribution of $t_{\max}$ under $H$. The corresponding MTP uses the same constant $c$ to test the individual $H_i$ by rejecting $H_i$ if $t_i > c$. This MTP controls the FWE strongly if the FWE is maximum under $H$. The well-known procedures of Dunnett (1955), Scheffé (1953) and Tukey (1949) which control the FWE strongly are based on the UI method. Krishnaiah (1979) gave a general UI method for deriving MTP's for finite families which he called finite intersection tests.
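The construction of the common critical constant can be sketched in a deliberately simple special case. The Python fragment below assumes that the $t_i$ are $k$ independent standard normal statistics under $H$ (an assumption made here only for illustration, not one used in the chapter), so that $P(t_{\max} \le c) = \Phi(c)^k$ and $c$ solves $\Phi(c)^k = 1 - \alpha$. The function names and data are hypothetical.

```python
from statistics import NormalDist


def ui_critical_constant(k, alpha):
    """Upper-alpha point of max(t_1..t_k) for k independent N(0,1) statistics.

    Under H, P(max t_i <= c) = Phi(c)**k, so c solves Phi(c)**k = 1 - alpha.
    """
    return NormalDist().inv_cdf((1.0 - alpha) ** (1.0 / k))


def ui_single_step(t_stats, alpha):
    """Return (overall rejection, per-hypothesis rejections) using one constant c."""
    c = ui_critical_constant(len(t_stats), alpha)
    rejections = [t > c for t in t_stats]
    # The UI test of H rejects iff at least one component hypothesis is rejected.
    return any(rejections), rejections


# Hypothetical test statistics for three component hypotheses.
t_stats = [0.8, 2.9, 1.7]
overall, individual = ui_single_step(t_stats, alpha=0.05)
```

With dependent statistics or an estimated variance, the distribution of $t_{\max}$ changes and $c$ must be taken from the appropriate multivariate distribution instead.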

3.3. Intersection-union method

In some problems the overall null hypothesis can be expressed as a union rather than as an intersection of hypotheses, i.e., the testing problem can be formulated as

$$H = \bigcup_{i \in K} H_i \quad \text{versus} \quad A = \bigcap_{i \in K} A_i. \tag{3.2}$$

This formulation is applicable when all component hypotheses, $H_i$, must be rejected in order to reject $H$ and take an action appropriate to the alternative hypothesis $A$. For example, a combination drug must be shown to be superior to all of its subcombinations for it to be acceptable. It can be shown (Berger, 1982) that an $\alpha$-level test of $H$ rejects iff all tests of component hypotheses reject at level $\alpha$. If the test statistics $t_i$ have the same marginal distribution under $H_i$ and $c$ is the $(1 - \alpha)$-quantile of that distribution, then $H$ is rejected at level $\alpha$ if $t_{\min} = \min_{i \in K} t_i > c$. This is the MIN test of Laska and Meisner (1989). For example, if each $t_i$ has marginally Student's $t$-distribution with $\nu$ degrees of freedom (d.f.) then $c = t_\nu^{(\alpha)}$, the $(1 - \alpha)$-quantile of that distribution. It is interesting to note that although this problem appears to be a multiple testing problem, no multiplicity adjustment is needed for doing individual tests, which are all done at PCE $= \alpha$. The IU nature of the problem can be easily overlooked and one of the standard MCP's might be recommended when all that is needed is separate $\alpha$-level tests; see, e.g., the problem considered in D'Agostino and Heeren (1991) for which a correct solution was given in Dunnett and Tamhane (1992b).
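A minimal Python sketch of the MIN test follows. To stay within the standard library it assumes that each $t_i$ is marginally standard normal under $H_i$ (roughly the large-$\nu$ limit of the Student's $t$ case mentioned above); the function name and the statistics are hypothetical.

```python
from statistics import NormalDist


def min_test(t_stats, alpha):
    """IU (MIN) test: reject H = union of H_i iff every t_i exceeds c.

    c is the (1 - alpha)-quantile of the common marginal null distribution,
    assumed here to be N(0, 1); no multiplicity adjustment is applied.
    """
    c = NormalDist().inv_cdf(1.0 - alpha)
    return min(t_stats) > c


# Hypothetical statistics comparing a combination with two subcombinations.
beats_both = min_test([2.4, 1.9], alpha=0.05)
beats_only_one = min_test([2.4, 1.2], alpha=0.05)
```

Note that `c` does not depend on the number of component hypotheses, which is the point made above about separate $\alpha$-level tests.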

3.4. Closure method

Marcus et al. (1976) gave a general formulation of the closure method originally proposed by Peritz (1970). Given a finite family of $k$ null hypotheses $\{H_i\ (1 \le i \le k)\}$, first form a closure family by including in it all intersections $H_I = \bigcap_{i \in I} H_i$ for $I \subseteq K = \{1, 2, \ldots, k\}$. Reject any null hypothesis $H_I$ (including any elementary hypothesis $H_i$) at level $\alpha$ iff all null hypotheses $H_{I'}$ implying it, i.e., all $H_{I'}$ for $I' \supseteq I$, are rejected by their individual $\alpha$-level tests. This method controls the FWE strongly (see Theorem 4.1 in Chapter 2 of HT) and is also coherent since if any null hypothesis $H_J$ is accepted then all $H_{J'}$ for $J' \subseteq J$ are accepted without further tests.

A practical implementation of the closure method generally requires a step-down algorithm. It begins by testing the overall null hypothesis $H = H_K$; if $H$ is accepted by its $\alpha$-level test then all null hypotheses are accepted without further tests. Otherwise one tests $H_I$ for the next lower (i.e., $k - 1$) dimensional subsets $I \subseteq K$ by their $\alpha$-level tests, and so on. Sometimes shortcut methods can be used to reduce the number of tests as we shall see in the sequel.
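The closure principle can be sketched by brute force for a small family. The Python fragment below uses a Bonferroni test as the $\alpha$-level test of each intersection hypothesis (reject $H_I$ when $\min_{i \in I} p_i < \alpha/|I|$); this choice of local test, the function name and the p-values are assumptions made for illustration. It enumerates all subsets, so it is only practical for small $k$.

```python
from itertools import combinations


def closure_bonferroni(p_values, alpha):
    """Closure method with a Bonferroni alpha-level test of each intersection.

    H_I is rejected locally when min_{i in I} p_i < alpha / |I|.  An elementary
    H_i is rejected iff every H_I with i in I is rejected locally.
    """
    k = len(p_values)
    rejected = []
    for i in range(k):
        others = [j for j in range(k) if j != i]
        all_rejected = True
        for size in range(k):
            for rest in combinations(others, size):
                subset = (i,) + rest
                local_p = min(p_values[j] for j in subset)
                if local_p >= alpha / len(subset):
                    all_rejected = False
                    break
            if not all_rejected:
                break
        rejected.append(all_rejected)
    return rejected


# Hypothetical raw p-values.
closed = closure_bonferroni([0.010, 0.020, 0.040], alpha=0.05)
```

In this example the single-step Bonferroni procedure would reject only the first hypothesis (0.010 < 0.05/3), whereas the closed procedure rejects all three.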

3.5. Projection method

Let $\theta = (\theta_1, \theta_2, \ldots, \theta_k)'$ be a vector of parameters and let $C$ be a $(1 - \alpha)$-level confidence region for $\theta$. SCI's with joint confidence level $1 - \alpha$ for all linear combinations $a'\theta$ where $a = (a_1, a_2, \ldots, a_k)' \in \mathbb{R}^k$ with $\|a\| = 1$ can be obtained by projecting $C$ onto the line spanned by $a$. If $C$ is convex then the CI for each $a'\theta$ corresponds to the line segment intercepted by the two supporting hyperplanes orthogonal to the line spanned by $a$. The SCI's can be used to perform hypotheses tests on $a'\theta$ in the usual manner with FWE controlled at level $\alpha$.

If SCI's are desired for linear combinations $\ell'\theta$ for $\ell = (\ell_1, \ell_2, \ldots, \ell_k)'$ belonging to some subspace $\mathcal{L}$ of $\mathbb{R}^k$ of dimension $m < k$ (e.g., the contrast space $\mathcal{C}^k = \{c = (c_1, c_2, \ldots, c_k)' : \sum_{i=1}^k c_i = 0\}$ of dimension $k - 1$), then the above procedure can be applied by starting with a $(1 - \alpha)$-level confidence region for $\gamma = L\theta$ where $L$ is an $m \times k$ matrix whose rows form a basis for $\mathcal{L}$. Then SCI's for all linear combinations $b'\gamma$ (where $b = (b_1, b_2, \ldots, b_m)' \in \mathbb{R}^m$) are equivalent to those for all $\ell'\theta$, $\ell \in \mathcal{L}$. The Scheffé procedure discussed in Section 5.2.1 is a classic example of the projection method. A related method is the confidence region tests of Aitchison (1964). Refer to HT, Chapter 2, Section 2.2 and Section 3.3.2 for additional details.
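For an ellipsoidal confidence region the projection has a closed form, which the following Python sketch evaluates. It assumes a region of the form $\{\theta : (\theta - \hat\theta)' V^{-1} (\theta - \hat\theta) \le c^2\}$ with a known positive definite matrix $V$; by the Cauchy-Schwarz inequality its projection onto the direction $a$ is $a'\hat\theta \pm c\,(a'Va)^{1/2}$. The function name, the matrix `V`, the constant `c` and the data are hypothetical; in the Scheffé procedure $c$ would come from an $F$ quantile.

```python
import math


def projected_interval(theta_hat, V, a, c):
    """Project the ellipsoid {(t - theta_hat)' V^{-1} (t - theta_hat) <= c**2}
    onto direction a, giving a' theta_hat -/+ c * sqrt(a' V a)."""
    k = len(a)
    centre = sum(a[i] * theta_hat[i] for i in range(k))
    quad_form = sum(a[i] * V[i][j] * a[j] for i in range(k) for j in range(k))
    half_width = c * math.sqrt(quad_form)
    return centre - half_width, centre + half_width


# Hypothetical estimates, covariance-type matrix and constant.
theta_hat = [1.0, 3.0]
V = [[0.5, 0.0], [0.0, 0.5]]
lower, upper = projected_interval(theta_hat, V, a=[1.0, -1.0], c=2.0)
```

The formula holds for any nonzero `a`; rescaling `a` simply rescales both the centre and the half-width, so the normalisation $\|a\| = 1$ is a convention rather than a requirement.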

4. Modified Bonferroni procedures based on p-values

Reporting p-values for individual hypotheses is a standard practice. These p-values may be obtained by using different tests for different hypotheses, e.g., $t$, $\chi^2$, Wilcoxon, logrank, etc. Therefore MCP's based on p-values are especially useful. They also are very simple (involving minimal computation other than what is required by a package to produce the p-values). Throughout this section we assume a non-hierarchical finite family, $\{H_i\ (1 \le i \le k)\}$, and denote the unadjusted (raw) p-values of the hypotheses by $p_i$ ($1 \le i \le k$).

[...] $p_{(1)} \ge p_{(2)} \ge \cdots \ge p_{(k)}$, and denote the corresponding hypotheses by $H_{(1)}, H_{(2)}, \ldots, H_{(k)}$. Begin by testing the most significant hypothesis $H_{(k)}$ by comparing $p_{(k)}$ with $\alpha/k$. If $p_{(k)} < \alpha/k$ then reject $H_{(k)}$ and test $H_{(k-1)}$ by comparing $p_{(k-1)}$ with $\alpha/(k - 1)$; otherwise accept all null hypotheses and stop testing. In general, reject $H_{(i)}$ iff $p_{(j)} < \alpha/j$ for $j = k, k - 1, \ldots, i$. It is clear that the Holm procedure rejects the same null hypotheses rejected by the Bonferroni procedure, and possibly more. Thus it is at least as powerful as the Bonferroni procedure. The adjusted p-values for the Holm procedure are given by

$$p_{a(i)} = \min\big[1, \max\{k p_{(k)}, (k - 1) p_{(k-1)}, \ldots, i p_{(i)}\}\big] \quad (1 \le i \le k).$$

These adjusted p-values are never larger than the adjusted p-values for the Bonferroni procedure. Holland and Copenhaver (1987) have given a slightly improved Holm procedure based on the sharper multiplicative Bonferroni inequality (2.4).
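The Holm adjusted p-values can be computed with a single pass over the sorted raw p-values. The Python sketch below follows the formula above but returns the results in the order in which the raw p-values were supplied; the function name and the example data are hypothetical.

```python
def holm_adjusted(p_values):
    """Holm step-down adjusted p-values, returned in the input order."""
    k = len(p_values)
    # Visit hypotheses from most to least significant (ascending raw p-value).
    order = sorted(range(k), key=lambda i: p_values[i])
    adjusted = [0.0] * k
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The multiplier is k for the smallest p-value, then k - 1, ..., 1.
        running_max = max(running_max, (k - rank) * p_values[idx])
        adjusted[idx] = min(1.0, running_max)
    return adjusted


# Hypothetical raw p-values.
raw = [0.020, 0.010, 0.040]
holm = holm_adjusted(raw)
```

The running maximum enforces the step-down logic: a hypothesis cannot obtain a smaller adjusted p-value than any more significant hypothesis tested before it.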

4.2.2. Step-up procedures

The use of the Simes procedure instead of the Bonferroni procedure to provide $\alpha$-level tests of intersection hypotheses in the closure method results in a somewhat complicated procedure derived by Hommel (1988). Hommel's procedure finds the largest value of $m$ ($1 \le m \le k$) such that

$$p_{(j)} > (m - j + 1)\alpha/m, \quad j = 1, 2, \ldots, m. \tag{4.3}$$

If there is no such $m$ then it rejects all hypotheses; else it rejects those $H_i$ with $p_i < \alpha/m$.

A simpler closed procedure was derived by Hochberg (1988) which is slightly conservative (Dunnett and Tamhane, 1993; Hommel, 1989). It operates in a step-up manner as follows: Order the p-values and the hypotheses as in the Holm procedure. Begin by testing the least significant hypothesis $H_{(1)}$ by comparing $p_{(1)}$ with $\alpha/1$. If $p_{(1)} \ge \alpha/1$ then accept $H_{(1)}$ and test $H_{(2)}$ by comparing $p_{(2)}$ with $\alpha/2$; otherwise reject all hypotheses and stop testing. In general, accept $H_{(i)}$ iff $p_{(j)} \ge \alpha/j$ for $j = 1, 2, \ldots, i$.


Both Holm's and Hochberg's procedures use the same critical constants, but the former is step-down while the latter is step-up. It is easy to see that Hochberg's procedure offers more opportunities for rejecting the hypotheses than does Holm's. Thus Hochberg's procedure is at least as powerful as Holm's, which in turn dominates the Bonferroni procedure. This is also clear from the adjusted p-values for the Hochberg procedure given by

$$p_{a(i)} = \min\big[1, \min\{p_{(1)}, 2 p_{(2)}, \ldots, i p_{(i)}\}\big] \quad (1 \le i \le k).$$
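The Hochberg adjusted p-values can be computed in the same way as Holm's, with a running minimum taken from the least significant hypothesis upward. The Python sketch below returns the results in the input order; the function name and the example data are hypothetical.

```python
def hochberg_adjusted(p_values):
    """Hochberg step-up adjusted p-values, returned in the input order."""
    k = len(p_values)
    # Visit hypotheses from least to most significant (descending raw p-value).
    order = sorted(range(k), key=lambda i: p_values[i], reverse=True)
    adjusted = [0.0] * k
    running_min = 1.0
    for rank, idx in enumerate(order, start=1):
        # The multiplier is 1 for the largest p-value, then 2, ..., k.
        running_min = min(running_min, rank * p_values[idx])
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p-values, all slightly below 0.05.
raw = [0.030, 0.045, 0.040]
hochberg = hochberg_adjusted(raw)
```

In this example every adjusted p-value equals 0.045, so all three hypotheses are rejected at $\alpha = 0.05$. Holm's procedure rejects none of them, because its first comparison is $0.030$ against $0.05/3$.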

Recently, Liu (1996) has given a general closure method of multiple testing of which the Hommel, Hochberg and Holm procedures are special cases. In this general framework it is easy to see by comparing the critical constants used by these procedures that Hommel's procedure is more powerful than Hochberg's, which in turn is more powerful than Holm's. However, it must be remembered that both the Hommel and Hochberg procedures rest on the same assumption as does the Simes procedure, which is that the $P_i$ are independent. On the other hand, Holm's procedure needs no such assumption.

Rom (1990) provided a method for computing sharper critical constants, $c_i$, in place of the $\alpha/i$ used in Hochberg's procedure. Let the $P_i$ be as defined in the Simes procedure. Let $P_{1,m} \ge P_{2,m} \ge \cdots \ge P_{m,m}$ be the ordered values of $P_1, P_2, \ldots, P_m$ ($1 \le m \le k$). [...]

$$P\{P_{1,m} \ge c_1, \ldots, P_{m,m} \ge c_m\} = 1 - \alpha \quad (1 \le m \le k).$$

[...] ($w_i > 0$ and $\sum_{i=1}^k w_i = k$) and using weighted p-values, $p_{wi} = p_i/w_i$, in place of the $p_i$ in the above procedures (Hochberg and Liberman, 1994).

5. Normal theory procedures for inter-treatment comparisons

5.1. Comparisons with a control and orthogonal contrasts

We consider the following distributional setting. For $i = 1, 2, \ldots, k$, let the $\theta_i$ be contrasts of interest in a general linear model and let the $\hat\theta_i$ be their least squares (LS) estimators. We assume that the $\hat\theta_i$ are jointly normally distributed with means $\theta_i$, variances $\sigma^2\tau_i^2$ and $\mathrm{corr}(\hat\theta_i, \hat\theta_j) = \rho_{ij}$. The $\tau_i^2$ and $\rho_{ij}$ are known design-dependent constants, while $\sigma^2$ is an unknown scalar. We assume that $S^2$ is an unbiased mean square error estimator of $\sigma^2$ with $\nu$ d.f. and $\nu S^2/\sigma^2 \sim \chi^2_\nu$ independent of the $\hat\theta_i$. Simultaneous inferences (confidence intervals and/or hypothesis tests) are desired on the $\theta_i$. Two examples of this setting are:

EXAMPLE 5.1 (Comparisons with a control in a one-way layout). Let the treatments be labelled $0, 1, 2, \ldots, k$ where 0 is a control treatment and $1, 2, \ldots, k$ are $k \ge 2$ test treatments. Denote by $\bar Y_i$ the sample mean based on $n_i$ observations from the $i$th treatment ($0 \le i \le k$). The $\bar Y_i$ are assumed to be independent $N(\mu_i, \sigma^2/n_i)$ random variables and $\nu S^2/\sigma^2 \sim \chi^2_\nu$ independently of the $\bar Y_i$ where $\nu = \sum_{i=0}^k n_i - (k + 1)$. The contrasts of interest are $\theta_i = \mu_i - \mu_0$, their LS estimators being $\hat\theta_i = \bar Y_i - \bar Y_0$ ($1 \le i \le k$). In this case we have

$$\tau_i^2 = \frac{1}{n_i} + \frac{1}{n_0} \quad \text{and} \quad \rho_{ij} = \lambda_i \lambda_j \quad \text{where } \lambda_i = \sqrt{\frac{n_i}{n_i + n_0}} \quad (1 \le i \le k).$$
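The design constants of Example 5.1 are easy to evaluate for given sample sizes. The Python sketch below computes $\tau_i^2$ and the correlation matrix $\{\rho_{ij}\}$ from the formulas above; the function name and the sample sizes are hypothetical.

```python
import math


def control_comparison_constants(n0, n_test):
    """tau_i^2 = 1/n_i + 1/n_0 and rho_ij = lambda_i * lambda_j,
    with lambda_i = sqrt(n_i / (n_i + n_0)), for comparisons with a control."""
    tau_sq = [1.0 / n + 1.0 / n0 for n in n_test]
    lam = [math.sqrt(n / (n + n0)) for n in n_test]
    k = len(n_test)
    rho = [[1.0 if i == j else lam[i] * lam[j] for j in range(k)]
           for i in range(k)]
    return tau_sq, rho


# Hypothetical balanced design: 10 observations on the control and on each
# of three test treatments.
tau_sq, rho = control_comparison_constants(n0=10, n_test=[10, 10, 10])
```

In the balanced case all off-diagonal correlations equal 1/2, reflecting the control mean shared by every contrast.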

[...] and at level $\alpha$ if $r = 1$. Thus Peritz's procedure can be implemented by testing $H_P$ iff it is not accepted earlier by implication and rejecting it iff at least one of the non-vacuous $H_{P_i}$ is significant at level $\alpha_{p_i}$ given by the above formula.

Begun and Gabriel (1981) answered the second question by giving an algorithm for its implementation, thus making Peritz's procedure practicable. This algorithm uses the general step-down testing scheme of Section 5.2.2.1 for testing subset hypotheses $H_P$ with two different choices of $\alpha_p$ in tandem, the Newman-Keuls (NK) choice (5.11) and the REGW choice (5.14). The algorithm tests the hypotheses $H_P$ using the NK and REGW step-down MTP's. It accepts all $H_P$'s that are accepted by the NK procedure and rejects all $H_P$'s that are rejected by the REGW procedure. It makes decisions on the remaining (contentious) hypotheses as follows: a hypothesis $H_P$ is accepted if (i) a hypothesis $H_{P'}$, which implies $H_P$ (i.e., $P \subseteq P'$), is accepted, or (ii) $H_P$ is nonsignificant at level $\alpha_p = 1 - (1 - \alpha)^{p/k}$ and for some $P'$ in the complement of $P$ with $\mathrm{card}(P') = p' \ge 2$, $H_{P'}$ is nonsignificant at level $\alpha_{p'} = 1 - (1 - \alpha)^{p'/k}$; otherwise $H_P$ is rejected. Once the decisions on the individual subset hypotheses are available, decisions on multiple subset hypotheses can be made using the UI method as indicated above. See HT, Chapter 4, Section 2.2 for further details.

Either $Q$ or $F$ statistics can be used in the above procedure. A Fortran program for the Peritz procedure using the range statistics is given in Appendix B of Toothaker (1991). Competitors to Peritz's procedure have been proposed by Ramsey (1981), and Braun and Tukey (1983).

5.2.3. A step-up procedure

Welsch (1977) proposed a step-up procedure based on range statistics in the case of a balanced one-way layout with $n$ observations per treatment. The procedure uses critical constants $c_2 \le c_3 \le \cdots \le c_k$, which are determined (Welsch used Monte Carlo simulation for this purpose) to control the FWE. First, the treatment means are ordered: $\bar Y_{(1)} \le \bar Y_{(2)} \le \cdots \le \bar Y_{(k)}$. The procedure begins by testing "gaps" or "2-ranges", $(\bar Y_{(i+1)} - \bar Y_{(i)})$ for $1 \le i \le k - 1$. If any gap exceeds the critical value $c_2 S/\sqrt{n}$ then the corresponding treatment means are declared significantly different. Furthermore, all subsets $P$ containing that pair of treatments are declared heterogeneous (i.e., the hypotheses $H_P$ are rejected) and the corresponding $p$-ranges significant by implication.


In general, a $p$-range, $(\bar Y_{(p+i-1)} - \bar Y_{(i)})$, is tested iff it is not already declared significant by implication. If that $p$-range is tested and exceeds the critical value $c_p S/\sqrt{n}$ then it is declared significant.
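The testing order of the step-up range procedure can be sketched as follows in Python. The critical constants are supplied by the caller; the values used in the example are hypothetical placeholders chosen only to exercise the logic and are not Welsch's constants. The function name and the data are likewise assumptions.

```python
import math


def step_up_ranges(means, s, n, crit):
    """Step-up range testing on ordered treatment means.

    crit[p] is the critical constant c_p for p-ranges (p = 2..k), supplied by
    the user.  Returns the set of (i, j) index pairs into the sorted means
    (0-based, i < j) whose range is declared significant.
    """
    y = sorted(means)
    k = len(y)
    significant = set()
    for p in range(2, k + 1):
        for i in range(k - p + 1):
            j = i + p - 1
            # Significant by implication if it contains a significant sub-range.
            implied = any(i <= a and b <= j for (a, b) in significant)
            if implied or (y[j] - y[i]) > crit[p] * s / math.sqrt(n):
                significant.add((i, j))
    return significant


# Hypothetical data and hypothetical nondecreasing constants c_2 <= c_3 <= c_4.
means = [10.0, 10.5, 14.0, 14.2]
crit = {2: 3.0, 3: 3.4, 4: 3.6}
result = step_up_ranges(means, s=1.0, n=4, crit=crit)
```

Here only the middle gap (10.5 to 14.0) exceeds its critical value; every longer range containing it is then declared significant by implication without being tested.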

5.3. Comparisons of MCP's and recommendations for use

In this subsection, we provide guidelines for selecting from the wide array of MCP's discussed above when the assumptions of normality and homogeneous variances are satisfied approximately in practice. See the next subsection for the performance of these MCP's and modifications needed when these assumptions are not satisfied. Comparisons are confined to MCP's that strongly control the FWE. The two major considerations when selecting an MCP are:

1. What is the family of comparisons?
2. Are only multiple tests needed or are simultaneous confidence intervals also needed?

If the family consists of comparisons with a control or orthogonal contrasts then an MCP from Section 5.1 should be used; if the family consists of pairwise comparisons or subset hypotheses then an MCP from Section 5.2 should be used. MCP's for other special families can be constructed using the UI method; see, e.g., Hochberg and Rodriguez (1977). The Bonferroni method is always available as a general omnibus method. If only multiple tests are needed then one of the stepwise MTP's should be used since they are more powerful than the corresponding single-step MTP's. If SCI's are needed (based on which tests can always be made) then single-step SCP's must be used. Two other advantages of the MTP's corresponding to these SCP's are that (i) they permit directional decisions in cases of rejections of null hypotheses in favor of two-sided alternatives, and (ii) they are simpler to apply than stepwise MTP's.

We now focus on MCP's of Section 5.2. If SCI's for pairwise comparisons are needed then the clear choice is the T procedure for pairwise balanced designs (e.g., balanced one-way layouts) and the TK procedure for unbalanced designs. It should be noted, however, that there is no analytical proof yet to show that the latter controls the FWE in all cases. The S procedure is too conservative in most applications, and should be used only when it is desired to select any contrast that seems interesting in light of the data, the price paid being the lack of power for simple comparisons. In other cases, the Bonferroni method may be used as long as the family of comparisons is prespecified and the number of comparisons is not unduly large.

For multiple testing, the criterion for the choice of an MTP is usually power. At least three different types of power have been studied in the literature for pairwise comparisons using Monte Carlo simulation; see Einot and Gabriel (1975) and Ramsey (1978, 1981). They are: all pairs power, any pair power and per pair power. The all pairs power is the probability of detecting all true pairwise differences, $\mu_i \ne \mu_j$. The any pair power is the probability of detecting at least one true pairwise difference. The per pair power is the average probability of detecting each true pairwise difference, averaged over all differences. The all pairs power has the drawback that it is vitiated by low power for even one pair even if all the pairs have high power (Tukey, 1995).


The any pair and per pair powers are more stable measures of power and hence may be preferred. Another difficulty in comparing the powers of the MTP's is that, for a given procedure, any type of power depends on the true means configuration, and one procedure may be more powerful than another for certain configurations, but less powerful for others. In other words, except for some obvious cases (e.g., the step-down REGWQ procedure versus the single-step T procedure), rarely does one procedure uniformly dominate another. The following recommendations must be taken with these provisos. Using the criterion of the all pairs power, Ramsey (1978) found that Peritz's procedure based on F statistics is the best choice. Peritz's procedure based on Q statistics and the REGWQ procedure are also good choices. Welsch's step-up procedure also has good power, but its applicability is limited to balanced one-way layouts; also, its critical constants are available only for a few cases thus further limiting its use. Another procedure whose power has not been studied in detail, but which is likely to have good power and which is very easy to apply even by hand is Hayter's (1986) modification of Fisher's LSD. For implementing other stepwise procedures, a computer program is almost always a must.
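The three types of power defined above can be estimated by Monte Carlo simulation. The following Python sketch is only an illustration under stated assumptions: a balanced one-way layout with normal errors and a common variance, and Bonferroni-adjusted two-sided t tests as the MTP (chosen for simplicity, not because it is one of the procedures recommended here). The function name `simulate_powers` and all of its parameters are hypothetical.

```python
import itertools

import numpy as np
from scipy import stats


def simulate_powers(means, n=10, sigma=1.0, alpha=0.05, reps=2000, seed=0):
    """Estimate all pairs, any pair and per pair power by simulation.

    Assumes a balanced one-way layout and uses Bonferroni-adjusted
    pooled-variance t tests for all pairwise comparisons (illustrative).
    """
    rng = np.random.default_rng(seed)
    means = np.asarray(means, dtype=float)
    k = len(means)
    pairs = list(itertools.combinations(range(k), 2))
    # Only pairs whose true means differ enter the power definitions.
    true_diff = [(i, j) for (i, j) in pairs if means[i] != means[j]]
    if not true_diff:
        raise ValueError("at least one pair of means must differ")
    df = k * (n - 1)
    crit = stats.t.ppf(1 - alpha / (2 * len(pairs)), df)

    all_hits = 0
    any_hits = 0
    per_pair_sum = 0.0
    for _ in range(reps):
        y = rng.normal(means[:, None], sigma, size=(k, n))
        ybar = y.mean(axis=1)
        s2 = y.var(axis=1, ddof=1).mean()  # pooled variance (equal n)
        se = np.sqrt(2 * s2 / n)
        detected = [abs(ybar[i] - ybar[j]) / se > crit for (i, j) in true_diff]
        all_hits += int(all(detected))
        any_hits += int(any(detected))
        per_pair_sum += float(np.mean(detected))
    return {
        "all_pairs": all_hits / reps,
        "any_pair": any_hits / reps,
        "per_pair": per_pair_sum / reps,
    }


if __name__ == "__main__":
    print(simulate_powers([0.0, 0.0, 1.0, 2.0]))
```

For any configuration the estimates satisfy all pairs power ≤ per pair power ≤ any pair power, since in each replication the indicator of detecting all true differences cannot exceed the proportion detected, which in turn cannot exceed the indicator of detecting at least one.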

5.4. Violations of homoscedasticity and normality assumptions

Homoscedasticity refers to the assumption that the treatment variances, $\sigma_i^2$, are equal. When this assumption is violated, the performances (the FWE and power) of the MCP's for pairwise comparisons, which all assume a common $\sigma^2$, are not severely affected if the sample sizes, $n_i$, are equal for all treatments. (However, the performances of the MCP's for comparisons with a control could be seriously affected if the control variance is different from the treatment variance.) The lack of robustness of the MCP's becomes more evident when the $n_i$'s are unequal. Generally, the MCP's are conservative (FWE < $\alpha$) when the $n_i$'s are directly paired with the $\sigma_i^2$ (e.g., when the $n_i$ and $\sigma_i^2$ are highly positively correlated), and are liberal (FWE > $\alpha$) when they are inversely paired (e.g., when the $n_i$ and $\sigma_i^2$ are highly negatively correlated). A common solution to deal with unequal $\sigma_i^2$ is to use separate variance estimates, $S_i^2$, in conjunction with the Welch-Satterthwaite formula for the d.f. separately for each pairwise comparison:

$$\hat{\nu}_{ij} = \frac{\left(S_i^2/n_i + S_j^2/n_j\right)^2}{\left(S_i^2/n_i\right)^2/(n_i - 1) + \left(S_j^2/n_j\right)^2/(n_j - 1)} \qquad (1 \le i < j \le k).$$
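The Welch-Satterthwaite d.f. for a single pair of treatments depends only on the two sample variances and the two sample sizes. A minimal Python sketch follows; the function name `welch_satterthwaite_df` and its argument names are hypothetical and chosen only for illustration.

```python
def welch_satterthwaite_df(s2_i, n_i, s2_j, n_j):
    """Approximate d.f. for comparing treatments i and j with separate variances.

    s2_i, s2_j: sample variances; n_i, n_j: sample sizes (each at least 2).
    """
    a = s2_i / n_i  # estimated variance of the i-th sample mean
    b = s2_j / n_j  # estimated variance of the j-th sample mean
    return (a + b) ** 2 / (a ** 2 / (n_i - 1) + b ** 2 / (n_j - 1))


if __name__ == "__main__":
    # Equal variances and equal sample sizes give n_i + n_j - 2 = 18 d.f.
    print(welch_satterthwaite_df(1.0, 10, 1.0, 10))
    # A small, highly variable sample pulls the d.f. toward n_i - 1 = 4.
    print(welch_satterthwaite_df(4.0, 5, 1.0, 20))
```

The result always lies between min(n_i, n_j) - 1 and n_i + n_j - 2, so the pair with the smaller and noisier sample gets the more cautious critical value.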

[...] $p_{(k)}$ be the ordered p-values and let $P_{(i)}$ be the r.v. corresponding to $p_{(i)}$. Then Pa is the probability of the event that ($P_{(k)}$ [...] $\mu_0$. This problem could be formulated as an estimation problem or as a multiple testing problem. In the latter case the identified MED is not really an estimate of the true MED; rather it is the lowest dose that is statistically significantly different from the zero dose. For the multiple testing formulation, Williams (1971, 1972) proposed a step-down procedure which compares isotonic estimates of the $\mu_i$ ($1 \le i \le k$) with the zero dose sample mean. Ruberg (1989) considered step-down and single-step procedures based on selected contrasts among sample means. Rom et al. (1994) gave closed procedures, also based on selected contrasts among sample means, that allow for additional comparisons among sets of successive dose levels. Tamhane et al. (1996) have compared these and other procedures using Monte Carlo simulation. For other multiple comparison problems involving ordered means, see HT, Chapter 10, Section 4. Hayter (1990) has given a one-sided analog of Tukey's Studentized range test for comparing ordered means, $\mu_j$ versus $\mu_i$ ($1 \le i < j \le k$).

A third area of interest is the problem of designing experiments for multiple comparisons. The problem of sample size determination is addressed in various papers; a summary of the work can be found in HT, Chapter 6. Some recent papers are Bristol (1989), Hsu (1988), and Hayter and Liu (1992). There has also been a large body of work on block designs for comparing treatments with a control, beginning with a paper by Bechhofer and Tamhane (1981). This work is summarized in the article by Hedayat et al. (1988) and a review article by Majumdar (1996) in the present volume.

8. Concluding remarks

As stated at the beginning of this paper, research in multiple comparisons is still progressing at full steam even though the field has certainly matured since its inception more than forty years ago. This is a good point to pause and ask the question "Where should multiple comparisons go next?" This is precisely the title of a recent paper by John Tukey (Tukey, 1993), one of the founding fathers of the field. Many interesting avenues of research and problems are outlined in this paper. A theme that Tukey emphasizes is the use of graphical methods (see also Tukey, 1991). These methods are discussed in HT, Chapter 10, Section 3, but more work is needed. In this connection, see Hochberg and Benjamini (1990), who use the graphical method of Schweder and Spjotvoll (1982) to improve the power of some p-value based MCP's. Another promising line of work is the choice of alternative error rates to control with the goal of developing more powerful MCP's. The false discovery rate (FDR) proposed by Benjamini and Hochberg (1995) was mentioned in Section 2.2. Halperin et al. (1988) made another proposal that has not been followed up. Finally, it is important to remember that the field of multiple comparisons owes its origin to real problems arising in statistical practice. The various MCP's will remain unused unless they are made available in user-friendly software. Although many statistical packages do have the options to carry out some of the MCP's discussed here,


until recently there was not a single package that did a majority of the most useful ones. This gap has been filled by a SAS application package called MultComp by Rom (1994). In conclusion, we have attempted to present a comprehensive overview of selected developments in multiple comparisons since the publication of HT and also of much background material (discussed in more detail in HT). Our hope is that researchers will find this paper a useful springboard for pursuing new and relevant practical problems, thus maintaining the vitality of the field and helping it to grow into a healthy and essential subdiscipline of statistics.

Acknowledgement

For comments on the first draft I am indebted to Charlie Dunnett, Tony Hayter, Ludwig Hothorn, a referee, and especially to John Tukey whose comments led me to revise several statements and opinions expressed in that draft. Over about the last seven years, it has been my privilege and good fortune to have collaborated closely with Charlie Dunnett, from whom I have learned a great deal. As a token of my gratitude, I dedicate this paper to him.

References

Aitchison, J. (1964). Confidence-region tests. J. Roy. Statist. Soc. Ser. B 26, 462-476.
Armitage, P. and M. Parmar (1986). Some approaches to the problem of multiplicity in clinical trials. In: Proc. 13th Internat. Biometric Conf. Biometric Society, Seattle.
Bauer, P. (1991). Multiple testing in clinical trials. Statist. Medicine 10, 871-890.
Bechhofer, R. E. and C. W. Dunnett (1988). Tables of percentage points of multivariate Student t distributions. In: Selected Tables in Mathematical Statistics, Vol. 11, 1-371.
Bechhofer, R. E. and A. C. Tamhane (1981). Incomplete block designs for comparing treatments with a control. Technometrics 23, 45-57.
Begun, J. and K. R. Gabriel (1981). Closure of the Newman-Keuls multiple comparison procedure. J. Amer. Statist. Assoc. 76, 241-245.
Benjamini, Y. and Y. Hochberg (1995). Controlling the false discovery rate: A practical and powerful approach to multiple testing. J. Roy. Statist. Soc. Ser. B 57, 289-300.
Berger, R. L. (1982). Multiparameter hypothesis testing and acceptance sampling. Technometrics 24, 295-300.
Bofinger, E. (1985). Multiple comparisons and type III errors. J. Amer. Statist. Assoc. 80, 433-437.
Bofinger, E. (1987). Step-down procedures for comparison with a control. Austral. J. Statist. 29, 348-364.
Braun, H. I. (1994). The Collected Works of John W. Tukey, Vol. VIII. Multiple Comparisons: 1948-1983. Chapman and Hall, New York.
Braun, H. I. and J. W. Tukey (1983). Multiple comparisons through orderly partitions: The maximum subrange procedure. In: H. Wainer and S. Messick, eds., Principles of Modern Psychological Measurement: A Festschrift in Honor of Frederick M. Lord. Lawrence Erlbaum Associates, Hillsdale, NJ, 55-65.
Bristol, D. R. (1989). Designing clinical trials for two-sided multiple comparisons with a control. Controlled Clinical Trials 10, 142-152.
Brown, C. C. and T. R. Fears (1981). Exact significance levels for multiple binomial testing with application to carcinogenicity screens. Biometrics 37, 763-774.
Carmer, S. G. and W. M. Walker (1982). Baby bear's dilemma: A statistical tale. Agronomy J. 74, 122-124.
Chang, C.-K. and D. M. Rom (1994). On the analysis of multiple correlated binary endpoints in medical studies. Unpublished manuscript.


D'Agostino, R. B. and T. C. Heeren (1991). Multiple comparisons in over-the-counter drug clinical trials with both positive and placebo controls (with comments and rejoinder). Statist. Medicine 10, 1-31.
Dubey, S. D. (1985). On the adjustment of P-values for the multiplicities of the intercorrelated symptoms. Presented at the Sixth Annual Meeting of the International Society for Clinical Biostatistics, Dusseldorf, W. Germany.
Duncan, D. B. (1951). A significance test for differences between ranked treatments in an analysis of variance. Virginia J. Sci. 2, 171-189.
Duncan, D. B. (1955). Multiple range and multiple F tests. Biometrics 11, 1-42.
Duncan, D. B. (1957). Multiple range tests for correlated and heteroscedastic means. Biometrics 13, 164-176.
Duncan, D. B. (1965). A Bayesian approach to multiple comparisons. Technometrics 7, 171-222.
Duncan, D. B. and D. O. Dixon (1983). k-ratio t-tests, t intervals and point estimates for multiple comparisons. In: S. Kotz and N. L. Johnson, eds., Encyclopedia of Statistical Sciences, Vol. 4. Wiley, New York, 403-410.
Dunn, O. J. (1958). Estimation of means of dependent variables. Ann. Math. Statist. 29, 1095-1111.
Dunn, O. J. (1961). Multiple comparisons among means. J. Amer. Statist. Assoc. 56, 52-64.
Dunnett, C. W. (1955). A multiple comparison procedure for comparing several treatments with a control. J. Amer. Statist. Assoc. 50, 1096-1121.
Dunnett, C. W. (1980a). Pairwise multiple comparisons in the homogeneous variances, unequal sample size case. J. Amer. Statist. Assoc. 75, 789-795.
Dunnett, C. W. (1980b). Pairwise multiple comparisons in the unequal variance case. J. Amer. Statist. Assoc. 75, 796-800.
Dunnett, C. W. (1989). Multivariate normal probability integrals with product correlation structure, Algorithm AS251. Appl. Statist. 38, 564-579; Correction note 42, 709.
Dunnett, C. W. and A. C. Tamhane (1991). Step-down multiple tests for comparing treatments with a control in unbalanced one-way layouts. Statist. Medicine 11, 1057-1063.
Dunnett, C. W. and A. C. Tamhane (1992a). A step-up multiple test procedure. J. Amer. Statist. Assoc. 87, 162-170.
Dunnett, C. W. and A. C. Tamhane (1992b). Comparisons between a new drug and placebo controls in an efficacy trial. Statist. Medicine 11, 1057-1063.
Dunnett, C. W. and A. C. Tamhane (1993). Power comparisons of some step-up multiple test procedures. Statist. Probab. Lett. 16, 55-58.
Dunnett, C. W. and A. C. Tamhane (1995). Step-up multiple testing of parameters with unequally correlated estimates. Biometrics 51, 217-227.
Edwards, D. G. and J. C. Hsu (1983). Multiple comparisons with the best treatment. J. Amer. Statist. Assoc. 78, 965-971.
Einot, I. and K. R. Gabriel (1975). A study of the powers of several methods in multiple comparisons. J. Amer. Statist. Assoc. 70, 574-583.
Farrar, D. B. and K. S. Crump (1988). Exact statistical tests for any carcinogenic effect in animal bioassays. Fund. Appl. Toxicol. 11, 652-663.
Finner, H. (1990). On the modified S-method and directional errors. Comm. Statist. Ser. A 19, 41-53.
Fisher, R. A. (1935). The Design of Experiments. Oliver and Boyd, Edinburgh, UK.
Gabriel, K. R. (1969). Simultaneous test procedures - Some theory of multiple comparisons. Ann. Math. Statist. 40, 224-250.
Gabriel, K. R. (1970). On the relationship between union-intersection tests. In: R. C. Bose et al., eds., Essays in Probability and Statistics. Univ. of North Carolina Press, Chapel Hill, NC, 251-266.
Games, P. A. and J. F. Howell (1976). Pairwise multiple comparison procedures with unequal N's and/or variances: A Monte Carlo study. J. Educ. Statist. 1, 113-125.
Gupta, S. S. and S. Panchapakesan (1979). Multiple Decision Procedures. Wiley, New York.
Halperin, M., K. K. G. Lan and M. I. Hamdy (1988). Some implications of an alternative definition of the multiple comparisons problem. Biometrika 75, 773-778.
Hayter, A. J. (1984). A proof of the conjecture that the Tukey-Kramer multiple comparisons procedure is conservative. Ann. Statist. 12, 61-75.


Hayter, A. J. (1986). The maximum familywise error rate of Fisher's least significant difference test. J. Amer. Statist. Assoc. 81, 1000-1004.
Hayter, A. J. (1989). Pairwise comparisons of generally correlated means. J. Amer. Statist. Assoc. 84, 208-213.
Hayter, A. J. (1990). A one-sided Studentized range test for testing against a simple ordered alternative. J. Amer. Statist. Assoc. 85, 778-785.
Hayter, A. J. and J. C. Hsu (1994). On the relationship between stepwise decision procedures and confidence sets. J. Amer. Statist. Assoc. 89, 128-137.
Hayter, A. J. and W. Liu (1992). A method of power assessment for comparing several treatments with a control. Comm. Statist. Theory Methods 21, 1871-1889.
Hayter, A. J. and A. C. Tamhane (1990). Sample size determination for step-down multiple comparison procedures: Orthogonal contrasts and comparisons with a control. J. Statist. Plann. Inference 27, 271-290.
Hedayat, A. S., M. Jacroux and D. Majumdar (1988). Optimal designs for comparing test treatments with a control (with discussion). Statist. Sci. 4, 462-491.
Heyse, J. and D. Rom (1988). Adjusting for multiplicity of statistical tests in the analysis of carcinogenicity studies. Biometr. J. 8, 883-896.
Hochberg, Y. (1974a). The distribution of the range in general unbalanced models. Amer. Statist. 28, 137-138.
Hochberg, Y. (1974b). Some generalizations of the T-method in simultaneous inference. J. Multivariate Anal. 4, 224-234.
Hochberg, Y. (1988). A sharper Bonferroni procedure for multiple tests of significance. Biometrika 75, 800-802.
Hochberg, Y. and Y. Benjamini (1990). More powerful procedures for multiple significance testing. Statist. Medicine 9, 811-818.
Hochberg, Y. and U. Liberman (1994). An extended Simes test. Statist. Probab. Lett. 21, 101-105.
Hochberg, Y. and G. Rodriguez (1977). Intermediate simultaneous inference procedures. J. Amer. Statist. Assoc. 72, 220-225.
Hochberg, Y. and D. Rom (1996). Extensions of multiple testing procedures based on Simes' test. J. Statist. Plann. Inference 48, 141-152.
Hochberg, Y. and A. C. Tamhane (1987). Multiple Comparison Procedures. Wiley, New York.
Holland, B. S. and M. D. Copenhaver (1987). An improved sequentially rejective Bonferroni test procedure. Biometrics 43, 417-424.
Holm, S. (1979a). A simple sequentially rejective multiple test procedure. Scand. J. Statist. 6, 65-70.
Holm, S. (1979b). A stagewise directional test based on t statistic. Unpublished manuscript.
Hommel, G. (1986). Multiple test procedures for arbitrary dependence structures. Metrika 33, 321-336.
Hommel, G. (1988). A stagewise rejective multiple test procedure based on a modified Bonferroni test. Biometrika 75, 383-386.
Hommel, G. (1989). A comparison of two modified Bonferroni procedures. Biometrika 76, 624-625.
Hsu, J. C. (1981). Simultaneous confidence intervals for all distances from the 'best'. Ann. Statist. 9, 1026-1034.
Hsu, J. C. (1982). Simultaneous inference with respect to the best treatment in block designs. J. Amer. Statist. Assoc. 77, 461-467.
Hsu, J. C. (1984). Ranking and selection and multiple comparisons with the best. In: T. J. Santner and A. C. Tamhane, eds., Design of Experiments: Ranking and Selection (Essays in Honor of Robert E. Bechhofer). Marcel Dekker, New York, 23-33.
Hsu, J. C. (1988). Sample size computation for designing multiple comparison experiments. Comput. Statist. Data Anal. 7, 79-91.
James, S. (1991). Approximate multinormal probabilities applied to correlated multiple endpoints. Statist. Medicine 10, 1123-1135.
Krishnaiah, P. R. (1979). Some developments on simultaneous test procedures. In: P. R. Krishnaiah, ed., Developments in Statistics, Vol. 2. North-Holland, Amsterdam, 157-201.
Keuls, M. (1952). The use of the 'Studentized range' in connection with an analysis of variance. Euphytica 1, 112-122.


Kimball, A. W. (1951). On dependent tests of significance in the analysis of variance. Ann. Math. Statist. 22, 600-602.
Kramer, C. Y. (1956). Extension of multiple range tests to group means with unequal number of replications. Biometrics 12, 307-310.
Laska, E. M. and M. J. Meisner (1989). Testing whether an identified treatment is best. Biometrics 45, 1139-1151.
Lehmacher, W., G. Wassmer and P. Reitmeir (1991). Procedures for two-sample comparisons with multiple endpoints controlling the experimentwise error rate. Biometrics 47, 511-521.
Lehmann, E. L. and J. P. Shaffer (1977). On a fundamental theorem in multiple comparisons. J. Amer. Statist. Assoc. 72, 576-578.
Lehmann, E. L. and J. P. Shaffer (1979). Optimum significance levels for multistage comparison procedures. Ann. Statist. 7, 27-45.
Liu, W. (1996). Multiple tests of a nonhierarchical finite family of hypotheses. J. Roy. Statist. Soc. Ser. B 58, 455-461.
Majumdar, D. (1996). Treatment-control designs. Chapter 27 in this Handbook.
Mantel, N. (1980). Assessing laboratory evidence for neoplastic activity. Biometrics 36, 381-399; Corrig., Biometrics 37, 875.
Marcus, R., K. R. Gabriel and E. Peritz (1976). On closed testing procedures with special reference to ordered analysis of variance. Biometrika 63, 655-660.
Miller, R. G., Jr. (1966). Simultaneous Statistical Inference. McGraw-Hill, New York.
Miller, R. G., Jr. (1981). Simultaneous Statistical Inference. 2nd edn. Springer, New York.
Mosier, M., Y. Hochberg and S. Ruberg (1995). Simple multiplicity adjustments to correlated tests. In preparation.
Naik, U. D. (1975). Some selection rules for comparing p processes with a standard. Comm. Statist. Ser. A 4, 519-535.
Newman, D. (1939). The distribution of the range in samples from a normal population, expressed in terms of an independent estimate of the standard deviation. Biometrika 31, 20-30.
O'Brien, P. C. (1983). The appropriateness of analysis of variance and multiple comparison procedures. Biometrics 39, 787-788.
O'Brien, P. C. (1984). Procedures for comparing samples with multiple endpoints. Biometrics 40, 1079-1087.
O'Neill, R. T. and G. B. Wetherill (1971). The present state of multiple comparisons methods (with discussion). J. Roy. Statist. Soc. Ser. B 33, 218-241.
Peritz, E. (1970). A note on multiple comparisons. Unpublished manuscript, Hebrew University.
Perry, J. N. (1986). Multiple comparison procedures: A dissenting view. J. Econ. Entom. 79, 1149-1155.
Pocock, S. J., N. L. Geller and A. A. Tsiatis (1987). The analysis of multiple endpoints in clinical trials. Biometrics 43, 487-498.
Ramsey, P. H. (1978). Power differences between pairwise multiple comparison procedures. J. Amer. Statist. Assoc. 73, 479-485.
Ramsey, P. H. (1981). Power of univariate pairwise multiple comparison procedures. Psychol. Bull. 90, 352-366.
Rom, D. (1990). A sequentially rejective test procedure based on a modified Bonferroni inequality. Biometrika 77, 663-665.
Rom, D. (1992). Strengthening some common multiple test procedures for discrete data. Statist. Medicine 11, 511-514.
Rom, D. (1994). MultComp 2.0 for PC, User's manual. ProSoft, Philadelphia, PA.
Rom, D. and B. W. Holland (1994). A new closed multiple testing procedure for hierarchical family of hypotheses. J. Statist. Plann. Inference, to appear.
Rom, D., R. J. Costello and L. T. Connell (1994). On closed test procedures for dose-response analysis. Statist. Medicine 13, 1583-1596.
Roth, A. J. (1996). Multiple comparison procedures for discrete test statistics. Talk presented at the International Conference on Multiple Comparisons. Tel Aviv, Israel.
Rothman, K. J. (1990). No adjustments are needed for multiple comparisons. Epidemiology 1, 43-46.


Roy, S. N. (1953). On a heuristic method of test construction and its use in multivariate analysis. Ann. Math. Statist. 24, 220-238.
Roy, S. N. and R. C. Bose (1953). Simultaneous confidence interval estimation. Ann. Math. Statist. 24, 513-536.
Ruberg, S. J. (1989). Contrasts for identifying the minimum effective dose. J. Amer. Statist. Assoc. 84, 816-822.
Ryan, T. A. (1960). Significance tests for multiple comparison of proportions, variances and other statistics. Psychol. Bull. 57, 318-328.
Saville, D. J. (1990). Multiple comparison procedures: The practical solution. Amer. Statist. 44, 174-180.
Scheffé, H. (1953). A method for judging all contrasts in the analysis of variance. Biometrika 40, 87-104.
Schervish, M. (1984). Multivariate normal probabilities with error bound, Algorithm AS 195. Appl. Statist. 33, 89-94; Corrig., Appl. Statist. 34, 103-104.
Schweder, T. and E. Spjotvoll (1982). Plots of p-values to evaluate many tests simultaneously. Biometrika 69, 493-502.
Shaffer, J. P. (1979). Comparison of means: An F test followed by a modified multiple range procedure. J. Educ. Statist. 4, 14-23.
Shaffer, J. P. (1980). Control of directional errors with stagewise multiple test procedures. Ann. Statist. 8, 1342-1348.
Shaffer, J. P. (1986). Modified sequentially rejective multiple test procedures. J. Amer. Statist. Assoc. 81, 826-831.
Shaffer, J. P. (1994). Multiple hypothesis testing: A review. Tech. Report 23, National Inst. Statist. Sci., Research Triangle Park, NC.
Šidák, Z. (1967). Rectangular confidence regions for the means of multivariate normal distributions. J. Amer. Statist. Assoc. 62, 626-633.
Simes, R. J. (1986). An improved Bonferroni procedure for multiple tests of significance. Biometrika 73, 751-754.
Sorić, B. (1989). Statistical 'discoveries' and effect size estimation. J. Amer. Statist. Assoc. 84, 608-610.
Stefánsson, G., W. C. Kim and J. C. Hsu (1988). On confidence sets in multiple comparisons. In: S. S. Gupta and J. Berger, eds., Statistical Decision Theory and Related Topics IV, Vol. 2. Springer, New York, 89-104.
Stepanavage, M., H. Quan, J. Ng and J. Zhang (1994). A review of statistical methods for multiple endpoints in clinical trials. Unpublished manuscript.
Tamhane, A. C. (1977). Multiple comparisons in model I one-way ANOVA with unequal variances. Comm. Statist. Ser. A 6, 15-32.
Tamhane, A. C. (1979). A comparison of procedures for multiple comparisons of means with unequal variances. J. Amer. Statist. Assoc. 74, 471-480.
Tamhane, A. C., Y. Hochberg and C. W. Dunnett (1996). Multiple test procedures for dose finding. Biometrics 52, 21-37.
Tang, D. I. and S. Lin (1994). On improving some methods for multiple endpoints. Unpublished manuscript.
Tang, D. I., N. L. Geller and S. J. Pocock (1993). On the design and analysis of randomized clinical trials with multiple endpoints. Biometrics 49, 23-30.
Tang, D. I., C. Gnecco and N. L. Geller (1989a). Design of group sequential clinical trials with multiple endpoints. J. Amer. Statist. Assoc. 84, 776-779.
Tang, D. I., C. Gnecco and N. L. Geller (1989b). An approximate likelihood ratio test for a normal mean vector with nonnegative components with application to clinical trials. Biometrika 76, 577-583.
Tarone, R. E. (1990). A modified Bonferroni method for discrete data. Biometrics 46, 515-522.
Tong, Y. L. (1980). Probability Inequalities in Multivariate Distributions. Academic Press, New York.
Toothaker, S. E. (1991). Multiple Comparisons for Researchers. Sage, Newbury Park, CA.
Toothaker, S. E. (1993). Multiple Comparison Procedures. Sage, Newbury Park, CA.
Troendle, J. F. (1995a). A stepwise resampling method of multiple hypothesis testing. J. Amer. Statist. Assoc. 90, 370-378.
Troendle, J. F. (1995b). A permutational step-up method of testing multiple outcomes. Preprint.
Tukey, J. W. (1949). Comparing individual means in the analysis of variance. Biometrics 5, 99-114.


Tukey, J. W. (1953). The Problem of Multiple Comparisons. Mimeographed Notes, Princeton University, Princeton, NJ.
Tukey, J. W. (1991). The philosophy of multiple comparisons. Statist. Sci. 6, 100-116.
Tukey, J. W. (1993). Where should multiple comparisons go next? In: F. M. Hoppe, ed., Multiple Comparisons, Selection and Biometry. Marcel Dekker, New York, 187-207.
Tukey, J. W. (1995). Personal communication.
Tukey, J. W., J. L. Ciminera and J. F. Heyse (1985). Testing the statistical certainty of a response to increasing doses of a drug. Biometrics 41, 295-301.
Waller, R. A. and D. B. Duncan (1969). A Bayes rule for symmetric multiple comparisons. J. Amer. Statist. Assoc. 64, 1484-1503; Corrig., J. Amer. Statist. Assoc. 67, 253-255.
Welsch, R. E. (1972). A modification of the Newman-Keuls procedure for multiple comparisons. Working Paper 612-72, Sloan School of Management, M.I.T., Boston, MA.
Welsch, R. E. (1977). Stepwise multiple comparison procedures. J. Amer. Statist. Assoc. 72, 566-575.
Westfall, P. H. and S. S. Young (1989). p value adjustments for multiple tests in multivariate binomial models. J. Amer. Statist. Assoc. 84, 780-786.
Westfall, P. H. and S. S. Young (1993). Resampling Based Multiple Testing. Wiley, New York.
Williams, D. A. (1971). A test for differences between treatment means when several dose levels are compared with a zero dose control. Biometrics 27, 103-117.
Williams, D. A. (1972). The comparison of several dose levels with a zero dose control. Biometrics 28, 519-531.
Wright, S. P. (1992). Adjusted P-values for simultaneous inference. Biometrics 48, 1005-1014.

S. Ghosh and C. R. Rao, eds., Handbook of Statistics, Vol. 13 © 1996 Elsevier Science B.V. All rights reserved.


Nonparametric Methods in Design and Analysis of Experiments

Edgar Brunner and Madan L. Puri

1. Introduction and notations

1.1. Historical background

The two main assumptions underlying the classical analysis of variance (ANOVA) models are the linear model and the normal distribution of the error term. One of the first attempts to relax these assumptions was made by Friedman (1937), where in a two-way layout with one observation per cell the observations are 'replaced' by their 'place numbers', called 'ranks', within the block factor. The next step was taken by Kruskal and Wallis (1952) in the one-way layout, where the observed random variables are replaced by their ranks. Designs in practical data analysis, however, are more complex than these two simple designs, and the demand for the analysis of such designs without the restrictive assumptions of the ANOVA was one of the most burning problems in applied statistics. The similarity of both the Friedman statistic and the Kruskal-Wallis statistic to their parametric counterparts gave rise to the hope that nonparametric statistics for more complex designs would also look similar to their parametric counterparts. One difficulty, however, is that the variance of a rank statistic depends on the alternative, and the covariance matrix of the vector of rank means in the one-way layout has a rather difficult form (see, e.g., Puri, 1964), so that it seems prohibitive to estimate the matrix for two- or higher-way layouts. Thus, the way for the development of nonparametric procedures in the two-way layout was determined by the need to circumvent this problem. Lemmer and Stoker (1967) assumed that all distribution functions in the design were identical and proposed statistics for main effects and interactions which had a form similar to the statistics for the two-way ANOVA with fixed effects. However, it seems to be rather unrealistic to develop a statistic for the interaction, e.g., under the assumption that all distribution functions are identical. So it was natural to remove first the nuisance parameters from the model by subtracting consistent estimates of them from the data and then replacing the residuals by their ranks. This concept of 'ranking after alignment' (RAA) was introduced by Hodges and Lehmann (1962) and further developed by Sen (1968), Puri and Sen (1969, 1973), Sen and Puri (1977) and Adichie (1978), among others.


For a comprehensive treatment of aligned rank tests in the context of time series analysis, the reader is referred to Hallin and Puri (1994) and the references cited therein. The main effect of one factor may also be removed by ranking within the levels of this factor, as in the Friedman test. The remaining nuisance parameters, the interactions, either must be excluded from the model or a 'joint hypothesis' should be tested, i.e., the interaction and the main effect should be tested together. Both concepts were used by Koch (1969, 1970) in a split-plot and in a complex split-plot design, and Mehra and Sen (1969) used the RAA-technique to develop a test for the interaction in a two-factor block design. Parallel to this development, first steps were made by Sen (1967) and by Koch and Sen (1968) to formulate hypotheses for treatment effects and to derive statistics in a nonparametric mixed model, and the ideas of these papers were later applied by Koch (1969, 1970) in partially nested designs. All procedures using the RAA-techniques are clearly restricted to linear models and are not applicable to pure ordinal data, where it is not reasonable to compute sums or differences of the observations. Moreover, only one of the main assumptions of ANOVA, the normality of the error distribution, is relaxed with this approach. Although some hypotheses in the papers of Koch and Sen (1968) and Koch (1969, 1970) are formulated in a nonparametric setup, no unified theory for nonparametric procedures in two- or higher-way fixed models or in the mixed model was developed. Patel and Hoel (1973) seem to be the first to leave the framework of the linear model in a two-way layout; they defined a nonparametric interaction in this design by the difference of the probabilities underlying the two Wilcoxon-Mann-Whitney statistics in a 2 × 2 design. It seems to be the first time that in a two-way layout a nonparametric effect was defined and a consistent estimator of this effect was used as a basis for the test statistic rather than replacing observations by ranks. The ranks of the observations came out as a convenient tool to estimate the nonparametric effect. This concept was further developed for main effects and interactions by Hilgers (1979) and Brunner and Neumann (1984, 1986a) for fixed and mixed 2 × 2 designs, and by Boos and Brownie (1992) and Brunner, Puri and Sun (1995) for fixed and mixed 2 × b designs. The next step in the development of the nonparametric mixed model was taken by Hollander, Pledger and Lin (1974), who considered the robustness of the Wilcoxon-Mann-Whitney statistic in the paired two-sample design. Govindarajulu (1975b) derived the asymptotic distribution of the Wilcoxon rank sum in this design and indicated how to estimate the asymptotic variance of the statistic under the hypothesis. A more general result in the mixed model with one fixed factor, one random factor and an equal number of replications per cell was given by Brunner and Neumann (1982), where random interactions were also included in the model. The general form of the covariance matrix of the rank means and explicit estimators for the unknown variance terms under the hypothesis were given in this paper, and the results were applied to different mixed models. In all three aforementioned papers, only the ranks over all observations were used. During the same period, McKean and Hettmansperger (1976) and Hettmansperger and McKean (1977) developed asymptotically distribution free statistics in general fixed

Nonparametric methods in design and analysis of experiments

633

models based on minimizing Jaeckel's dispersion function (Jaeckel, 1972). However the proposed statistics are not pure rank statistics. For a description of this method, we refer to Hettmansperger and McKean (1983). Parallel to this development, the idea of the 'rank transform' (RT) was born (Conover and Iman, 1976, 1981) and some empirical results with this method were given (Lemmer, 1980). The simple technique to replace the observations in the respective parametric statistic by their ranks and then assume that the asymptotic distributions of both statistics are same, has been criticized by Fligner (1981), Brunner and Neumann (1986a), Blair, Sawilowski and Higgens (1987), Akritas (1990), Thompson and Ammann (1989, 1990), Akritas (1990, 1991, 1993) and Thompson (1991a). Analytic counter examples showing that RT-statistics may become degenerate under the hypothesis, have been given by Brunner and Neumann (1986a). Blair, Sawilowski and Higgens (1987) showed by a simulation study that RT-statistics may have undesirable properties. Hora and Conover (1984), Iman, Hora and Conover (1984) and Hora and Iman (1988) derived RT-statistics in the two-way layout without interactions and with equal number of replications where the assumption of no interactions was made for the formulation of the hypotheses as well as for the derivation of the asymptotic distributions of the statistics. Akritas (1990) showed that the RT was not 'valid' to test a main effect or the interaction in a linear model since it is a nonlinear transformation of the data. He also showed that the homoscedasticity of the error terms was not transferred to the ranks in general unless all distribution functions in the design are identical. Thus, the RT is valid in the two-way linear model with crossed fixed factors to test the joint hypothesis of no main effect and no interaction together. It is also valid in the two-way hierarchical design to test the hypothesis of no nested effect. 
Akritas (1991, 1993) derived similar results for repeated measurements models. Kepner and Robinson (1988) derived a rank test for the mixed model with one observation per cell. Thompson (1990) generalized the results of Brunner and Neumann (1982) and derived the asymptotic distribution of a linear rank statistic for independent vectors of equal fixed length; she applied the result to various balanced mixed models. The assumption of equal fixed length of the vectors was used in the proofs of the asymptotic normality of the statistics in both papers, and no general theoretical result for the asymptotic normality of a linear rank statistic in an unbalanced mixed model was available. Akritas (1993) derived a rank test in a special unbalanced mixed model. A general result was derived by Brunner and Denker (1994) for vectors of unequal length and was applied to various unbalanced mixed models. It is rather astonishing that most of the theoretical results as well as the derivations of rank statistics were based on the assumption of the continuity of the underlying distribution functions, and that ties were excluded, although some theoretical results regarding ties (e.g., Conover, 1973; Lehmann, 1975; Behnen, 1976) were available in the literature for linear rank statistics of independent observations. Koch and Sen (1968) and Koch (1970) recommended breaking ties by using the 'mid-rank method'. This method was used by Boos and Brownie (1992) to formulate hypotheses and to derive estimators of the asymptotic variances of rank statistics using the U-statistic representation. Munzel (1994) generalized the results of Brunner and Denker (1994) to the case of ties, and the same method was used by Brunner, Puri and Sun (1995) to derive the asymptotic normality of linear rank statistics in mixed models and consistent estimators of the asymptotic variances for fixed and mixed 2 × b designs. The problem of deriving (pure) rank tests for interactions and main effects separately in two- or higher-way layouts remained open until Akritas and Arnold (1994) had the simple idea of formulating the hypotheses in a two-way repeated measurements model in terms of the distribution functions, simply by replacing the vector of the expectations in the linear model by the vector of the distribution functions. They showed that the covariance matrix has a simple form under these hypotheses. It is easy to see that the hypotheses in the linear model are implied by these hypotheses. The acceptance and interpretation of such hypotheses by applied researchers remains to be seen. The idea of formulating the hypotheses in two- and higher-way layouts in terms of the distribution functions is investigated in the general model with fixed factors (Akritas et al., 1994) and in the general mixed model (Akritas and Brunner, 1995). The purpose of this paper is to provide a unified theory of rank tests in some aspects of the design and analysis of experiments, exploiting the ideas of Akritas and Arnold (1994) and Akritas and Brunner (1995). For lack of space, we mainly restrict ourselves to discussing the test procedures and the rationale underlying the hypotheses.

1.2. Aim of the paper

In this paper, we consider only pure rank statistics in factorial designs. On the one hand, such pure rank statistics are invariant under any strictly monotone transformation of the data; on the other hand, they are robust against outliers. Moreover, they are applicable to ordinal data such as scores in psychological tests or grading scales used to describe the degree of damage of plants or trees in ecological or environmental studies. With such data, ties must also be taken into account. Thus we shall not consider procedures which need sums or differences of the original data and which are therefore restricted to linear models. Our aim is to generalize the classical models of ANOVA in such a way that not only the assumption of normality of the error terms is relaxed but also the structure of the designs is introduced in a broader framework. In addition, the concept of treatment effects is redefined within this framework. We formulate hypotheses in a nonparametric setup for generalized treatment effects in various designs and derive nonparametric tests for these hypotheses in such a way that the common rank tests existing in the literature come out as special cases. Moreover, new procedures are presented within this unified approach. In order to identify the testing problem underlying the different rank procedures, the relations between the hypotheses in the general model and in the standard linear model are investigated. We do not assume the continuity of the underlying distribution functions, so that data with ties can be evaluated with the procedures given in this paper. In the next section, we give some general notations which are used throughout the paper in order to avoid unnecessary repetitions. In Section 2, models with fixed factors are considered and the concept of nonparametric hypotheses introduced by Akritas and Arnold (1994) is explained.
Also in this section, following Fligner and Policello (1981), Brunner and Neumann (1982, 1986a, b) and Brunner et al. (1995), rank procedures for heteroscedastic models (nonparametric Behrens-Fisher problem) for two-sample designs and for the stratified two-sample design (fixed-factor model) are discussed. In Section 3, the random-factor model is briefly discussed, and in Section 4, the mixed model is considered. All procedures are derived from the general theorems given in Section 4.2. Rank procedures for heteroscedastic mixed models are considered separately for the two-sample design and the stratified two-sample designs, following the ideas in Brunner et al. (1995). Procedures for ordered alternatives, many-one and multiple comparisons follow from the general approach presented in the paper. However, they are not considered in this paper. For reasons of brevity and following the editor's suggestions, the proofs of the results are either given very briefly or are omitted. They will appear elsewhere.
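The two properties claimed above for pure rank statistics, invariance under strictly monotone transformations and applicability to tied ordinal data, can be illustrated with a minimal Python sketch. The function name `midranks` and the example scores are illustrative assumptions and are not taken from the paper; ties are handled by mid-ranks, i.e., tied observations receive the average of the ranks they occupy.

```python
import math


def midranks(values):
    """Return mid-ranks: tied observations get the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the block of observations tied with the one at position pos.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        avg = (pos + end) / 2.0 + 1.0  # average of the ranks pos+1, ..., end+1
        for k in range(pos, end + 1):
            ranks[order[k]] = avg
        pos = end + 1
    return ranks


# Hypothetical ordinal scores with ties.
scores = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3]
# A strictly increasing transformation keeps the order and the ties.
transformed = [math.exp(x) for x in scores]

print(midranks(scores))
print(midranks(scores) == midranks(transformed))
```

Any statistic that is a function of the mid-ranks only therefore takes the same value on the original and on the transformed data.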

1.3. Notations used throughout the paper

Distribution functions. The distribution function of a random variable $X_i$ is denoted by

$$F_i(x) = \tfrac{1}{2}\left[F_i^{+}(x) + F_i^{-}(x)\right], \qquad (1.1)$$

where F ~ ( x ) = P ( X i H~(A). (2) In the linear model without interactions (2.10), the following holds:

HoQA I B)

H (A)

H (A I B)

2, b > 2) derived by Akritas (1990) and Thompson (1991a). For the 2 × 2 design and the 2 × b design, we derive statistics based on ranks within the levels of factor B for the hypotheses $H_0^F(A \mid B)$, $H_0^w(AB)$ and $H_0^w(A)$. In all cases, unequal sample sizes are allowed. The implications for the linear model (2.11) can easily be seen from Proposition 2.9. We also consider models without interaction where the number of replications per factor level combination $(i, j)$ is $n = 1$ (complete designs) and either the number of levels of factor A or of factor B is large, i.e., $a \to \infty$ and $b$ fixed or vice versa. These are the so-called 'block designs' where the block effect is assumed to be fixed. In addition, incomplete balanced designs are briefly considered.

2.2.2. Models with interactions

2.2.2.1. A test for $H_0^F(A \mid B)$ in the $a \times b$ design. The two-factor design with crossed fixed factors A and B is symmetric in A and B. For the analysis of the simple factor effects, it therefore suffices to consider only the hypothesis $H_0^F(A \mid B)$. The random variables $X_{ijk} \sim F_{ij}(x)$, $i = 1,\ldots,a$, $j = 1,\ldots,b$ and $k = 1,\ldots,n_{ij}$, in this design are assumed to be independent. We will use the notation of Section 2.1.1 and estimate $p = \int H\,dF$ by $\hat p = \int \hat H\,d\hat F$ where $H = N^{-1}\sum_{i=1}^{a}\sum_{j=1}^{b} n_{ij}F_{ij}$, $\hat H = N^{-1}\sum_{i=1}^{a}\sum_{j=1}^{b} n_{ij}\hat F_{ij}$, $\hat F = (\hat F_{11},\ldots,\hat F_{ab})'$ and $\hat F_{ij}$ is the empirical distribution function of the observations $X_{ij1},\ldots,X_{ijn_{ij}}$. Let $R_{ijk}$ be the rank of $X_{ijk}$ among all the $N = \sum_{i=1}^{a}\sum_{j=1}^{b} n_{ij}$ observations and let $\bar R_{ij\cdot} = n_{ij}^{-1}\sum_{k=1}^{n_{ij}} R_{ijk}$ and $\bar R_{\cdot j\cdot} = N_j^{-1}\sum_{i=1}^{a}\sum_{k=1}^{n_{ij}} R_{ijk}$ where $N_j = \sum_{i=1}^{a} n_{ij}$. To derive a rank test for $H_0^F(A \mid B)$: $(P_a \otimes I_b)F = 0$, we consider the asymptotic distribution of the estimator $(P_a \otimes I_b)\hat p = (P_a \otimes I_b)\int \hat H\,d\hat F$. It follows from Proposition 2.2 that the estimator

$$(P_a \otimes I_b)\int \hat H\,d\hat F = (\hat p_{11} - \hat p_{\cdot 1}, \ldots, \hat p_{ab} - \hat p_{\cdot b})' = \frac{1}{N}\left(\bar R_{11\cdot} - \bar R_{\cdot 1\cdot}, \ldots, \bar R_{ab\cdot} - \bar R_{\cdot b\cdot}\right)'$$

is unbiased and consistent for $(P_a \otimes I_b)p$. Under $H_0^F(A \mid B)$, the statistic is the same as b times the statistic for $H_0^F(A)$ in the one-factor design based on the ranks over all observations $R_{ijk}$, since $H_0^F(A \mid B)$ in a two-factor design is equivalent to $H_0^F(A)$ in a one-factor design which is replicated b times. The variances of the statistics within each level of factor B are estimated separately by $\hat\sigma_j^2 = [N^2(N_j - a)]^{-1}S_j^2$ where $S_j^2 = \sum_{i=1}^{a}\sum_{k=1}^{n_{ij}}(R_{ijk} - \bar R_{ij\cdot})^2$ (see Theorem 2.5). Then under $H_0^F(A \mid B)$, the quadratic form

$$Q(A \mid B) = \sum_{j=1}^{b}\frac{N_j - a}{S_j^2}\sum_{i=1}^{a} n_{ij}\left(\bar R_{ij\cdot} - \bar R_{\cdot j\cdot}\right)^2 \qquad (2.13)$$

has asymptotically a central $\chi^2_f$-distribution with $f = (a - 1)b$. (For a derivation in the context of a linear model with equal sample sizes, see Akritas (1990).)
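The computation behind the quadratic form (2.13) can be sketched in a few lines of Python. This is only a minimal sketch under a stated assumption: we read (2.13) as the sum over the levels $j$ of factor B of $(N_j - a)\sum_i n_{ij}(\bar R_{ij\cdot} - \bar R_{\cdot j\cdot})^2 / S_j^2$, with $S_j^2 = \sum_i\sum_k (R_{ijk} - \bar R_{ij\cdot})^2$ and $R_{ijk}$ the mid-ranks over all $N$ observations. The function names and the nested-list data layout are hypothetical.

```python
def midranks(values):
    """Mid-ranks over all values (ties share the average rank)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        avg = (pos + end) / 2.0 + 1.0
        for k in range(pos, end + 1):
            ranks[order[k]] = avg
        pos = end + 1
    return ranks


def q_a_given_b(cells):
    """cells[i][j]: observations at level i of factor A and level j of factor B.

    Returns the value of the quadratic form and its degrees of freedom (a-1)*b.
    Every level j needs some within-cell variation of the ranks (S_j^2 > 0).
    """
    a, b = len(cells), len(cells[0])
    flat = [x for row in cells for cell in row for x in cell]
    it = iter(midranks(flat))
    # Overall mid-ranks, arranged in the same nested layout as the data.
    R = [[[next(it) for _ in cell] for cell in row] for row in cells]
    q = 0.0
    for j in range(b):
        n = [len(R[i][j]) for i in range(a)]
        N_j = sum(n)
        cell_mean = [sum(R[i][j]) / n[i] for i in range(a)]
        level_mean = sum(sum(R[i][j]) for i in range(a)) / N_j
        S2_j = sum((r - cell_mean[i]) ** 2 for i in range(a) for r in R[i][j])
        between = sum(n[i] * (cell_mean[i] - level_mean) ** 2 for i in range(a))
        q += (N_j - a) * between / S2_j
    return q, (a - 1) * b


# Hypothetical 2 x 2 data with unequal cell sizes.
data = [[[1.2, 3.4, 2.2], [5.1, 4.8]],
        [[2.9, 3.8], [6.0, 5.5, 7.1]]]
print(q_a_given_b(data))
```

The value returned would be compared with a quantile of the central chi-square distribution with $(a - 1)b$ degrees of freedom.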

2.2.2.2. Tests for $H_0^w(A)$ and $H_0^w(AB)$ in the 2 × 2 design. The 2 × 2 design is, in a certain sense, the natural extension of the one-factor two-sample design to the case of two-factor designs. It shall be considered separately since in the 2 × 2 design the hypotheses for the main effects $H_0^w(A)$ and $H_0^w(B)$ as well as for the interaction $H_0^w(AB)$ in the general model (2.12) are identical to the parametric hypotheses and, in such cases, heteroscedastic distributions may be assumed. At the present state, this seems to be the only way to provide rank tests for heteroscedastic models. We consider the general model $X_{ijk} \sim F_{ij}(x)$, $i = 1, 2$, $j = 1, 2$, $k = 1,\ldots,n_{ij}$, where the $X_{ijk}$'s are independent random variables. The linear model simplifies to $\mu_{11} = \alpha + \tau + \theta$, $\mu_{12} = \alpha - \tau - \theta$, $\mu_{21} = -\alpha + \tau - \theta$ and $\mu_{22} = -\alpha - \tau + \theta$. Let $w_j = \int F_{1j}\,dF_{2j}$, $j = 1, 2$. Then the hypothesis for the main effect A is formulated as $H_0^w(A)$: $w^A = (w_1 + w_2)/2 = 1/2$ and the hypothesis for the interaction is formulated as $H_0^w(AB)$: $w^{AB} = w_1 - w_2 = 0$. The hypothesis $H_0^w(B)$ follows by the symmetry of the design. The generalized conditional mean $w^A = (w_1 + w_2)/2$ is estimated by $\hat w^A = (\hat w_1 + \hat w_2)/2$ where

$$\hat w_j = \frac{1}{n_{1j}}\left(\bar R^B_{2j\cdot} - \frac{n_{2j} + 1}{2}\right) \qquad (2.14)$$

and $\bar R^B_{2j\cdot} = n_{2j}^{-1}\sum_{k=1}^{n_{2j}} R^B_{2jk}$ as in the one-factor design. Here, $R^B_{ijk}$ is the rank of $X_{ijk}$ among all the observations within level $j$ of factor B. In the same way, $w^{AB}$ is estimated by $\hat w^{AB} = \hat w_1 - \hat w_2$. Let $T^A = \hat w^A - 1/2$, $T^{AB} = \hat w^{AB}$ and $N_j = n_{1j} + n_{2j}$, $j = 1, 2$, and let $R^{(ij)}_{ijk}$ be the rank of $X_{ijk}$ within level $i$ of factor A and within level $j$ of factor B. The variance estimator $\hat\sigma^2$ for the variances of the statistics $T^A$ under $H_0^w(A)$ and $T^{AB}$ under $H_0^w(AB)$ is easily derived from Theorem 2.8 as $\hat\sigma^2 = \sum_{j=1}^{2}\hat\sigma_j^2$ where

J --

2 ~ nljn2j

ni~ ,=

nij

--

/~B k _ p(ij) _ ~ B

1

~ijk

=

ij. -~ ( n i j -~

1)/2

2

(2.15)

Then under H~'(A): w A = 1/2, the statistic

~

j=l nlj

"

2

(2.16)

has asymptotically (min~,j(n~j) --+ oo) a standard normal distribution, Under H y ( A B ) : w AB = 0, the statistic (2.17) has asymptotically (min 1 as b --+ 00. Combining these results, it follows under H~(A) that the statistic

$$K^A = \frac{b^2(a - 1)}{S^2}\sum_{i=1}^{a}\left(\bar R_{i\cdot} - \frac{N + 1}{2}\right)^2 \qquad (2.27)$$

has asymptotically a central $\chi^2_f$-distribution with $f = a - 1$ d.f. We note that the same statistic has been given by Brunner and Neumann (1982) and by Kepner and Robinson (1988) for the case where the factor B is random. Conover and Iman (1976) proposed an RT-version of the ANOVA F-statistic for this design, namely

$$F_R = \frac{b(b - 1)\sum_{i=1}^{a}\left(\bar R_{i\cdot} - (N + 1)/2\right)^2}{\sum_{i=1}^{a}\sum_{j=1}^{b}\left(R_{ij} - \bar R_{i\cdot} - \bar R_{\cdot j} + (N + 1)/2\right)^2}. \qquad (2.28)$$

Iman et al. (1984) showed for the case of no ties that $(a - 1)F_R$ has a central $\chi^2_{a-1}$-distribution under $H_0^F(A \mid B)$. They also showed by a simulation study that the power of the $F_R$-statistic is higher than the power of the Friedman statistic given in (2.23), which is based on ranks within the levels of factor B. There is only a slight loss in power when a normal distribution is assumed. If the underlying distribution function is log-normal or Cauchy, this simulation study showed a considerable gain in power compared with the results of the ANOVA. Hora and Iman (1988) derived asymptotic relative efficiencies of the RT-procedure for this design. For details, we refer to these articles.
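For illustration, the rank transform statistic of Conover and Iman can be computed as follows. The sketch assumes that (2.28) is the usual randomized block F-statistic applied to the overall ranks $R_{ij}$, i.e., $F_R = b(b-1)\sum_i(\bar R_{i\cdot} - (N+1)/2)^2 / \sum_i\sum_j(R_{ij} - \bar R_{i\cdot} - \bar R_{\cdot j} + (N+1)/2)^2$ with $N = ab$ and one observation per cell. The function names and the example data are hypothetical.

```python
def midranks(values):
    """Mid-ranks over all values (ties share the average rank)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        avg = (pos + end) / 2.0 + 1.0
        for k in range(pos, end + 1):
            ranks[order[k]] = avg
        pos = end + 1
    return ranks


def rank_transform_f(x):
    """x[i][j]: the single observation for treatment i (factor A) in block j (factor B).

    The denominator must be positive, i.e., the ranks must not be exactly additive.
    """
    a, b = len(x), len(x[0])
    N = a * b
    r = midranks([x[i][j] for i in range(a) for j in range(b)])
    R = [[r[i * b + j] for j in range(b)] for i in range(a)]  # ranks over all N cells
    grand = (N + 1) / 2.0
    row = [sum(R[i]) / b for i in range(a)]                        # treatment rank means
    col = [sum(R[i][j] for i in range(a)) / a for j in range(b)]   # block rank means
    num = b * (b - 1) * sum((row[i] - grand) ** 2 for i in range(a))
    den = sum((R[i][j] - row[i] - col[j] + grand) ** 2
              for i in range(a) for j in range(b))
    return num / den


# Hypothetical data: a = 3 treatments observed in b = 4 blocks.
x = [[10.1, 12.3, 9.8, 11.0],
     [11.5, 13.0, 10.9, 12.2],
     [9.7, 12.9, 11.4, 10.5]]
f_r = rank_transform_f(x)
print(f_r, (len(x) - 1) * f_r)  # F_R and (a - 1) * F_R
```

The second printed value, $(a - 1)F_R$, is the quantity whose distribution under the hypothesis is described above.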

2.3. Special procedures for linear models

In this paper, we consider only pure rank statistics, which are invariant under monotone transformations of the data and which may also be applied to purely ordinal data. We do not wish to restrict our considerations to linear models since in many experiments, shift effects are not realistic. The nonparametric analysis of multi-factor experiments in the literature is mainly related to linear models. Based on the idea of Hodges and Lehmann (1962), first to estimate the nuisance parameters and then to rank the residuals, Sen (1968) developed a class of aligned rank order tests in two-way layouts without interactions, and Puri and Sen (1969, 1973), Sen and Puri (1977) and Adichie (1978) developed aligned rank tests for general linear hypotheses. For a unified description of this method, see Puri and Sen (1985). Aligned rank tests for linear models with autocorrelated errors are considered in Hallin and Puri (1994). Another approach is based on minimizing Jaeckel's dispersion function (see Jaeckel, 1972) and has been developed by McKean and Hettmansperger (1976) and Hettmansperger and McKean (1977). Note that the statistics given there are not pure rank statistics. For a further approach and an excellent description and comparison of these methods, see Hettmansperger and McKean (1983).


3. Random models

3.1. One-factor random models

3.1.1. Models and hypotheses

Linear model. The one-way layout linear model for random effects is

$$X_{ij} = \mu + A_i + \epsilon_{ij}, \qquad i = 1,\ldots,a;\ j = 1,\ldots,n_i, \qquad (3.1)$$

where $\mu$ is the overall mean, the $A_i$ are i.i.d. $N(0, \sigma_A^2)$ random variables and the $\epsilon_{ij}$ are i.i.d. $N(0, \sigma_\epsilon^2)$ random variables independent of the $A_i$, $i = 1,\ldots,a$ and $j = 1,\ldots,n_i$. By assumption, $\mathrm{Var}(X_{11}) = \sigma_A^2 + \sigma_\epsilon^2$, $\mathrm{Cov}(X_{11}, X_{21}) = 0$ and $\mathrm{Cov}(X_{11}, X_{12}) = \sigma_A^2$.

General model. Let $X = (X_1', \ldots, X_a')'$ where the $X_i = (X_{i1}, \ldots, X_{in_i})'$ are independent random vectors with common distribution functions $G_i(x)$ and identical marginal distribution functions $F(x)$, $i = 1,\ldots,a$, $j = 1,\ldots,n_i$, and $N = \sum_{i=1}^{a} n_i$. It is reasonable to assume that the random variables within each vector $X_i$ are interchangeable. Thus, the pairs $(X_{i1}, X_{i2})$ have identical bivariate marginal distribution functions $F_i^{**}(x, y)$, and it is assumed that $F_i^{**}(x, y)$ does not depend on $i$, i.e., $F_i^{**}(x, y) = F^{**}(x, y)$, $i = 1,\ldots,a$. The bivariate marginal distribution function of $(X_{i1}, X_{i'1})$, $i \neq i' = 1,\ldots,a$, is denoted by $F^{*}(x, y)$ and, by independence, $F^{*}(x, y) = F(x)F(y)$. Thus, the covariance matrices of $X_i$ and $X$ are $S_i = \mathrm{Cov}(X_i) = (\sigma^2 - c)I_{n_i} + cJ_{n_i}$ and

$$S = \mathrm{Cov}(X) = \bigoplus_{i=1}^{a} S_i$$

respectively, where $\sigma^2 = \mathrm{Var}(X_{11})$ and $c = \mathrm{Cov}(X_{11}, X_{12})$. The common distribution function $G_i(\cdot)$ is called 'compound symmetric', and the covariance matrix of a random vector with a 'compound symmetric' common distribution function has the structure of the matrix $S_i$.

Hypothesis. In the linear model, the hypothesis of no effect of the random factor A is usually stated as $H_0$: $\sigma_A^2 = 0$. In the general model, by assumption, the random variables $X_{ij}$ are interchangeable within each level $i$ of factor A. Intuitively, the hypothesis of no effect of the random factor A is equivalent to the condition that all random variables $X_{ij}$, $i = 1,\ldots,a$, $j = 1,\ldots,n_i$, are interchangeable. In turn, they then have the same bivariate marginal distribution functions, and the hypothesis of no random effect in the general model is formulated as $H_0^F$: $F^{**}(x, y) = F^{*}(x, y) = F(x)F(y)$. It is easy to see that $H_0^F \Leftrightarrow H_0$: $\sigma_A^2 = 0$ if the linear model (3.1) is assumed. Note that, by assumption, $P_aF = 0$ since $F = F1_a$, and that $P_ap = 0$ where $p = \int F\,dF = \tfrac{1}{2}1_a$.
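The compound symmetric covariance structure $S_i = (\sigma^2 - c)I_{n_i} + cJ_{n_i}$ and the block-diagonal matrix $S = \bigoplus_i S_i$ can be made concrete with a minimal Python sketch. The function names and the numerical variance components are hypothetical; for the linear model (3.1) we assume $\sigma^2 = \sigma_A^2 + \sigma_\epsilon^2$ and $c = \sigma_A^2$, as follows from the variance and covariance stated there.

```python
def compound_symmetric(n, sigma2, c):
    """S_i = (sigma2 - c) * I_n + c * J_n: sigma2 on the diagonal, c elsewhere."""
    return [[sigma2 if s == t else c for t in range(n)] for s in range(n)]


def direct_sum(blocks):
    """Block-diagonal matrix with the given square blocks on the diagonal."""
    size = sum(len(B) for B in blocks)
    S = [[0.0] * size for _ in range(size)]
    offset = 0
    for B in blocks:
        for s, row in enumerate(B):
            for t, value in enumerate(row):
                S[offset + s][offset + t] = value
        offset += len(B)
    return S


# Hypothetical variance components of the linear model (3.1).
sigma_A2, sigma_e2 = 1.0, 2.0
n = [2, 3]  # n_1 = 2 and n_2 = 3 observations for a = 2 levels of factor A
S = direct_sum([compound_symmetric(n_i, sigma_A2 + sigma_e2, sigma_A2) for n_i in n])
for row in S:
    print(row)
```

Observations from the same level of the random factor share the covariance $c$, while observations from different levels are uncorrelated, which produces the zero off-diagonal blocks.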


3.2. Test for $H_0^F$

We proceed as in the linear model and consider the asymptotic distribution of $\sqrt{N}\,\hat p = \sqrt{N}\int \hat H\,d\hat F$ under the assumption that factor A is random. Let $Y = (Y_{11}, \ldots, Y_{an_a})'$ where $Y_{ij} = H(X_{ij})$ and let

$$\bar Y_{\cdot} = \left(\frac{1}{n_1}\sum_{j=1}^{n_1} Y_{1j}, \ldots, \frac{1}{n_a}\sum_{j=1}^{n_a} Y_{aj}\right)', \qquad \sigma^2 = \mathrm{Var}(Y_{11}) = \int H^2\,dH - \frac{1}{4} \quad (= 1/12 \text{ if } F \text{ is continuous}).$$

Then by Theorem 2.4, the statistic $\sqrt{N}\,\bar Y_{\cdot} = \sqrt{N}\int H\,d\hat F$ has asymptotically a multivariate normal distribution with mean 0 and covariance matrix $V_a = N\sigma^2\,\mathrm{diag}\{n_1^{-1}, \ldots, n_a^{-1}\}$. Let $W_a = V_a^{-1}[I_a - J_aV_a^{-1}/(1_a'V_a^{-1}1_a)]$ be the contrast matrix defined in (1.5) and let $\hat W_a = \hat V_a^{-1}[I_a - J_a\hat V_a^{-1}/(1_a'\hat V_a^{-1}1_a)]$ where $\hat V_a = N\hat\sigma^2\,\mathrm{diag}\{n_1^{-1}, \ldots, n_a^{-1}\}$ and $\hat\sigma^2 = N^{-2}(N - a)^{-1}\sum_{i=1}^{a}\sum_{j=1}^{n_i}(R_{ij} - \bar R_{i\cdot})^2$. In the same way as for the one-factor fixed design in Section 2.1.3, it follows that the quadratic form

$$Q_H = \frac{N - a}{\sum_{i=1}^{a}\sum_{j=1}^{n_i}(R_{ij} - \bar R_{i\cdot})^2}\sum_{i=1}^{a} n_i\left(\bar R_{i\cdot} - \frac{N + 1}{2}\right)^2 \qquad (3.2)$$

has asymptotically a central $\chi^2_f$-distribution with $f = a - 1$ under $H_0^F$. The above derivation of the asymptotic distribution of $Q_H$ under $H_0^F$ in the general model is simple. Under the alternative $H_1^F$: $F^{**}(x, y) \neq F(x)F(y)$, however, additional assumptions or restrictions of the model are necessary to derive the asymptotic distribution of $Q_H$ if $a$ is fixed and $n_i \to \infty$. In the literature, nonparametric procedures for the one-way random effects model have been given for the linear model $X_{ij} = \mu + \theta Y_i + \epsilon_{ij}$, $i = 1,\ldots,a$, $j = 1,\ldots,n_i$, where $\mu$ is the overall mean, $Y_i$ and $\epsilon_{ij}$ are mutually independent random variables and the constant $\theta \geq 0$ represents the degree of the random treatment effect. The hypothesis of 'no random effect' is expressed in this model as $H_0$: $\theta = 0$. Greenberg (1964) considered the case where $Y_i$ is normally distributed and $\epsilon_{ij}$ has an arbitrary continuous distribution. Govindarajulu and Deshpande (1972) and Govindarajulu (1975a) relaxed the assumption that $Y_i$ is normally distributed and derived locally most powerful tests for this hypothesis. Shetty and Govindarajulu (1988) studied the asymptotic distribution of these statistics under local alternatives, and the power properties were later studied by Clemmens and Govindarajulu (1990). Shirahata (1985) derived the asymptotic distribution of the Kruskal-Wallis statistic in the random effects model when $a$ is fixed and $n_i \to \infty$. The case of $n_i$ fixed and $a \to \infty$ was considered earlier by Shirahata (1982), where the statistic is based on some measures of intraclass correlation.
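A minimal Python sketch of the quadratic form (3.2) is given below, assuming the reading $Q_H = (N - a)\sum_i n_i(\bar R_{i\cdot} - (N+1)/2)^2 / \sum_i\sum_j(R_{ij} - \bar R_{i\cdot})^2$ with mid-ranks $R_{ij}$ over all $N$ observations. The function names and the example data are hypothetical.

```python
def midranks(values):
    """Mid-ranks over all values (ties share the average rank)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        avg = (pos + end) / 2.0 + 1.0
        for k in range(pos, end + 1):
            ranks[order[k]] = avg
        pos = end + 1
    return ranks


def q_h(groups):
    """groups[i]: observations X_i1, ..., X_in_i at level i of the random factor A.

    Returns the value of the quadratic form and its degrees of freedom a - 1.
    The ranks must vary within at least one group (positive denominator).
    """
    a = len(groups)
    r = midranks([x for g in groups for x in g])
    N = len(r)
    R, start = [], 0
    for g in groups:  # split the overall ranks back into the groups
        R.append(r[start:start + len(g)])
        start += len(g)
    means = [sum(Ri) / len(Ri) for Ri in R]
    within = sum((rij - means[i]) ** 2 for i in range(a) for rij in R[i])
    between = sum(len(R[i]) * (means[i] - (N + 1) / 2.0) ** 2 for i in range(a))
    return (N - a) * between / within, a - 1


# Hypothetical data with ties: a = 3 levels with unequal numbers of replications.
groups = [[4, 5, 5, 7], [2, 3, 3], [6, 8, 8, 9, 7]]
print(q_h(groups))
```

Because only mid-ranks enter the computation, the sketch applies unchanged to tied or purely ordinal data.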


A two-way layout with random effects and without interaction, $X_{ij} = \mu + Y_i + \theta Z_j + \epsilon_{ij}$, $i = 1,\ldots,n$, $n \to \infty$, $j = 1,\ldots,b$, is considered by Shirahata (1985), where the asymptotic distribution of the Friedman statistic is studied. The nonparametric treatment of the random model needs further research to develop a unified theory. On the one hand, procedures for two- and higher-way layouts including interaction should be investigated for linear models. On the other hand, it is necessary to examine to what extent the assumption of the additivity of the random effects can be relaxed.

4. Mixed models

4.1. Background and examples

In a mixed model, randomly chosen subjects are observed repeatedly under the same or under different treatments. Such designs occur in many biological experiments and in medical or psychological studies. The subjects are the levels of the random factor(s) and the subject effects are regarded as unobservable random variables. Here, we shall consider four different designs.
(1) Two-factor mixed models: (a) random factor and fixed factor crossed, (b) random factor nested under the fixed factor.
(2) Three-factor mixed models with two factors fixed: (a) repeated measurements on one fixed factor, where the random factor is crossed with the fixed factor B and is nested under the fixed factor A, (b) repeated measurements on both crossed fixed factors.
The hypotheses in these designs are formulated in the same way as for the fixed models. The statistics are based on consistent estimators for the generalized means. These estimators are vectors whose components are linear rank statistics. We distinguish two models:
(I) The repeated measurements model, where in the case of 'no treatment effect' the common distribution function of the observed random variables on subject i is invariant under the numbering of the treatment levels. This means that the random variables within one subject are interchangeable and the common distribution function is compound symmetric. Note that in the linear mixed model with independent random effects, the compound symmetry of the common distribution functions of the subjects is preserved under the alternative. In general models, however (to be stated in the following sections), it seems to be unrealistic to assume that compound symmetry is preserved if treatment effects are present.
(II) The multivariate model, which allows arbitrary dependencies between the observed random variables within one subject. This is typically the case with longitudinal data or inhomogeneous materials. A multivariate model is also assumed if a treatment effect is present in a repeated measurements model. We do not consider special patterns of dependencies such as an autocorrelation structure.


Example for (1, a)/compound symmetry. A cell culture, some tissue or a blood sample is split into $j = 1,\ldots,b$ homogeneous parts and each part receives one of the b levels of the treatment. This experiment is repeated for $i = 1,\ldots,n$ independent randomly chosen subjects (cultures, tissues, blood samples etc.). For this design, we will also consider the case where $m_{ij} \geq 1$ repeated observations are taken for each subject i and for each treatment j.

Example for (1, a)/multivariate model. Growth curves of subjects, or any observations taken at different (closely spaced) timepoints such that observations taken at close timepoints are 'more dependent' than observations from timepoints which are far apart.

Example for (1, b). The level i of the fixed treatment is applied to $n_i$ subjects which are observed repeatedly under the same treatment.

Example for (2, a)/compound symmetry. In the experiment described in example (1, a), there exist $i = 1,\ldots,a$ different groups of subjects where all the homogeneous parts of the subjects in group i are treated with level i of the fixed factor A and the homogeneous part j of a subject within group i is treated with treatment j. The subjects are nested under factor A while they are crossed with factor B. Therefore, this design is sometimes called 'partially nested'.

Example for (2, a)/multivariate model. Two groups of subjects are given different treatments ($i = 1, 2$) and the outcome $X_{ijk}$ is observed at b fixed timepoints $j = 1,\ldots,b$ for the subjects $k = 1,\ldots,n_i$, where the observations for one subject may be arbitrarily dependent.

Example for (2, b)/multivariate model. Each subject receives all $i = 1,\ldots,a$ levels of the fixed treatment A and the outcome is observed at $j = 1,\ldots,b$ fixed timepoints for each subject.

Some historical remarks. Nonparametric hypotheses and tests for the mixed model have already been considered by Sen (1967), Koch and Sen (1968) and by Koch (1969, 1970). In the latter article, a complex split-plot design is considered, different types of ranks are given to aligned and original observations, and the asymptotic distributions of univariate and multivariate rank statistics are given. Mainly joint hypotheses in the linear model are considered, i.e., main effects and certain interactions are tested together. However, no unified theory for the derivation of rank tests in mixed models is presented. Moreover, some of the statistics are not pure rank statistics but aligned rank statistics, and therefore they are restricted to linear models. For the simple mixed model with two treatments for paired observations, rank tests using overall ranks on the original observations have been considered by Hollander et al. (1974) and Govindarajulu (1975b). In the former paper, the robustness of the Wilcoxon-Mann-Whitney statistic with respect to deviation from independence is studied. In the latter paper, a rank statistic is derived and it is indicated how the unknown variance may be estimated. Lam and Longnecker (1983) derived an estimator

for the unknown variance based on Spearman's rank correlation. Brunner and Neumann (1982, 1984, 1986a, b) derived the asymptotic distribution of rank statistics in two-factor mixed models with an equal number of replications and applied the results to different mixed models. The asymptotic variances and covariance matrices of the statistics were estimated using ranks over all the observations and ranks within the treatments. Rank tests for the mixed model with m = 1 replication were considered by Kepner and Robinson (1988). Thompson (1990) considered the asymptotic distribution of linear rank statistics in mixed models with an equal number of replications and applied the results to different balanced repeated measurements designs (Thompson and Ammann, 1989, 1990; Thompson, 1991a) for joint hypotheses where main effects and interactions are tested together. The asymptotic distribution of linear rank statistics for vectors of different lengths was derived by Brunner and Denker (1994) and the results were applied to rank tests for nonparametric hypotheses in unbalanced mixed models. Akritas (1991, 1993) considered rank tests for joint hypotheses in the linear model for balanced repeated measurements designs and in an unbalanced design. Nonparametric hypotheses in mixed models based on the generalized mean vectors have been considered by Brunner and Neumann (1986a, b) for paired observations and in 2 × 2 designs and are further developed for 2 x b designs by Boos and Brownie (1992) and Brunner et al. (1995). We note that in the last two papers no continuity of the distribution functions has been assumed. The results of Brunner and Denker (1994) which were derived under the assumption of continuous distribution functions are generalized to the case of ties by Munzel (1994) where the score function is assumed to have a bounded second derivative. For simplicity, we consider here only the Wilcoxon scores. 
In mixed models with two factors fixed, nonparametric hypotheses based on the distribution functions have been introduced by Akritas and Arnold (1994) and were used to provide a unified approach to rank tests for mixed models by Akritas and Brunner (1995). In the next sections we discuss rank procedures for these hypotheses.

4.2. General asymptotic results

Here we give the general asymptotic results for mixed models and we will show how to apply these results to the different designs. The general mixed model can be formulated by independent random vectors

$$X_{ik} = (X_{i1k}', \ldots, X_{ick}')', \qquad i = 1,\ldots,r \text{ and } k = 1,\ldots,n_i, \qquad (4.1)$$

where $X_{ijk} = (X_{ijk1}, \ldots, X_{ijkm_{ijk}})'$, $j = 1,\ldots,c$, and $X_{ijks} \sim F_{ij}$, $k = 1,\ldots,n_i$ and $s = 1,\ldots,m_{ijk}$. The row-factor with r levels is applied to all c parts of one subject and the subjects are nested under this factor. For each level i, there are $n_i$ independent subjects (replications). If more than one factor is applied to the subjects, then the r levels may be regarded as a lexicographic ordering of all factor level combinations of the factors. The column-factor with c levels is applied to all subjects. However, the level j of this factor is applied only to the jth part of the subject (which is split into c homogeneous parts). This factor is crossed with the subjects. If more than one factor is applied to each subject, then the c levels may be regarded as a lexicographic ordering of all factor level combinations of the factors.

Examples In the matched pairs design, we have n independent random vectors X i = ( X i l , Xi2) t where X~j ~ F j , i = 1 , . . . , n and j = 1,2. This design is derived from the general mixed model (4.1) by letting r = 1, n l = n, c = 2 and m i j k = 1. The one-factor hierarchical design is a special case of (4.1) if r = a, c = 1, m i j k = m i k and X i j k s = X i k s ~ F~, i = 1 , . . . , a , k = 1 , . . . , n i and s = 1 , . . . , m ~ k . For the onefactor block design with b treatment levels and with n blocks, we choose r = 1, c = b, rnijk = 1, Xijk~ = X j k ~ Fj, j = 1 , . . . , b and k = 1 , . . . , n . The a x b split-plot design is derived from (4.1) by letting r = a, c = b, m i j k = 1, X i j k s = X i j k ~ Fij, i = 1 , . . . , a, j = 1 , . . . , b and k = 1 , . . . , ni. For the two-factor block design with n blocks and where factor A has a levels and factor B has b levels, we choose r = 1, nl = n, c = ab and rnijk = 1. The index j is split into u = 1 , . . . , a and v = 1 , . . . , b. Then X ~ k ~ Fur, k = 1 , . . . , n . To state the asymptotic results, we introduce some notations. The vector of the distribution functions is denoted by F = ( F l a , . . . , Fl~, • • •, F~l, • • •, F ~ ) ~ and we define

= (?11,""" ,?rc) ! where ? i j = ni-'Ek=l'~' ff'ijk and F i j k ( x ) = rn~j~ E~=l'~'Jk c ( x Xi/k~) is the empirical distribution function within the cell (i, j, k). Let c

ni

i=1 j = l k = l

and

i=1 j = l k = l

where N = • i

2 j ~ k m i j k . The vector of the generalized means p = f H d F is

estimated by ~ = f H d F . ! / PROPOSITION 4.1. Let X i k = ( S i l Ik , . . . , S i c k) be independent random vectors as defined in (4.1) and assume that the number of replications m i j k f o r a subject k

is uniformly bounded, i.e., m i j k ko > o and I ~ 1 / > ko > o, ~ = 1 , . . . ,~. If (1) $m_{ijk} \le M < \infty$, (2) $0 < \lambda_0 \le n_i/N \le 1 - \lambda_0 < 1$ and (3) $\min n_i \to \infty$, $i = 1, \ldots, r$, then under the hypothesis $H_0^F(C): CF = 0$, (i) the statistic $\sqrt{N}\, C \hat p = \sqrt{N}\, C \int \hat H \, d\hat F$ has asymptotically a multivariate normal distribution with mean $0$ and covariance matrix $C V C'$, (ii) the quadratic form $Q(C) = N \hat p' C' [C V C']^{-} C \hat p$ has asymptotically a central $\chi_f^2$-distribution with $f = \mathrm{rank}(C)$ and where $[C V C']^{-}$ denotes a generalized inverse of $[C V C']$. (iii) If $C$ is of full row rank, then $\hat Q(C) = N \hat p' C' [C \hat V C']^{-1} C \hat p$ has asymptotically a central $\chi_f^2$-distribution with $f = \mathrm{rank}(C)$ and where $\hat V$ is given in Theorem 4.3.


E. Brunner and M. L. Puri

(iv) Let $W = V^{-1}[I - J V^{-1} / 1' V^{-1} 1]$ and let $\hat W = \hat V^{-1}[I - J \hat V^{-1} / 1' \hat V^{-1} 1]$ where $\hat V$ is given in Theorem 4.3. Then $\hat Q(W) = N \hat p' \hat W \hat p$ has asymptotically a central $\chi_f^2$-distribution with $f = \mathrm{rank}(W)$.

The results for the heteroscedastic models for $c = 2$ treatments are given separately for the different models and hypotheses. In the next section, the general results given here will be applied to the two-factor mixed model with one factor fixed.

4.3. Two-factor mixed models

4.3.1. Cross-classified designs

4.3.1.1. Models and hypotheses

Models. In a cross-classified mixed model, the random variables $X_{ijk}$ are observed on the $i$th randomly chosen subject (or block), $i = 1, \ldots, n$, which is repeatedly observed (or measured) under treatment $j = 1, \ldots, b$, and $k = 1, \ldots, m_{ij}$ repeated observations are made on the same subject $i$ under treatment $j$. In the classical linear model theory, this is described as

$$X_{ijk} = \mu_j + A_i + W_{ij} + \epsilon_{ijk} \tag{4.5}$$

where $\mu_j = E(X_{ijk})$, the $A_i$'s are i.i.d. random variables with $E(A_1) = 0$ and $\mathrm{Var}(A_1) = \sigma_A^2$, the $W_{ij}$ are i.i.d. random variables independent of $A_i$ with $E(W_{11}) = 0$ and $\mathrm{Var}(W_{11}) = \sigma_B^2$, and the $\epsilon_{ijk}$ are i.i.d. $N(0, \sigma^2)$ random variables independent of $A_i$ and $W_{ij}$, $i = 1, \ldots, n$, $j = 1, \ldots, b$ and $k = 1, \ldots, m_{ij}$. (For a discussion of different assumptions on $A_i$ and $W_{ij}$, see Hocking (1973).) Let $X_i = (X_{i11}, \ldots, X_{ibm_{ib}})'$ be the vector of observations for block $i$. Then

$$\mathrm{Cov}(X_i) = \Sigma_i = \bigoplus_{j=1}^{b} \left[ (\sigma_x^2 - c^{**}) I_{m_{ij}} + (c^{**} - c^{*}) J_{m_{ij}} \right] + c^{*} J_{M_i} \tag{4.6}$$

where $\sigma_x^2 = \mathrm{Var}(X_{111})$, $c^{*} = \mathrm{Cov}(X_{111}, X_{121})$, $c^{**} = \mathrm{Cov}(X_{111}, X_{112})$ and $M_i = \sum_{j=1}^{b} m_{ij}$.
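The covariance structure of one block under the linear model (4.5) can be checked numerically. The following is a minimal sketch, assuming the variance components are known; the function names and the input `m` (the list of replications $m_{i1}, \ldots, m_{ib}$ for one block) are illustrative choices, not part of the text. It builds the matrix as a direct sum of within-treatment blocks plus a constant between-treatment covariance, and cross-checks it against a construction from the random effects themselves.

```python
import numpy as np
from scipy.linalg import block_diag


def block_covariance(m, var_a, var_w, var_e):
    """Covariance matrix of one block under model (4.5).

    m     : replications per treatment for this block (illustrative input)
    var_a : Var(A_i), var_w : Var(W_ij), var_e : Var(e_ijk)
    """
    var_x = var_a + var_w + var_e   # variance of a single observation
    c_star = var_a                  # covariance between two different treatments
    c_2star = var_a + var_w         # covariance of two replicates, same treatment
    blocks = [(var_x - c_2star) * np.eye(mj)
              + (c_2star - c_star) * np.ones((mj, mj)) for mj in m]
    M = sum(m)
    # direct sum of the within-treatment blocks plus the common covariance
    return block_diag(*blocks) + c_star * np.ones((M, M))


def block_covariance_direct(m, var_a, var_w, var_e):
    """Same matrix built from the random effects, used as a cross-check."""
    M = sum(m)
    Z = np.repeat(np.eye(len(m)), m, axis=0)  # observation-to-treatment indicator
    return var_a * np.ones((M, M)) + var_w * (Z @ Z.T) + var_e * np.eye(M)


sigma = block_covariance([2, 3], var_a=1.0, var_w=0.5, var_e=0.25)
print(sigma)
```

With two replicates under the first treatment and three under the second, the diagonal carries the total variance, entries within a treatment carry $c^{**}$, and entries across treatments carry $c^{*}$.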

This is the usual linear block (or repeated measurements) model which is appropriate if a subject is split into $b$ homogeneous parts and each part is observed repeatedly $m_{ij}$ times. In a multivariate model where, e.g., the observations are taken at different time points (not necessarily equidistant), $\Sigma_i$ is an arbitrary positive definite covariance matrix. We note that the terminology is not unique in the literature for these two models. We will use the term repeated measurements design if $\Sigma_i$ has the compound symmetry form given in (4.6) and multivariate design if this is not the case. In this setup, the random variables $X_{ijk}$ and $X_{i'jk'}$ are identically distributed according to a distribution function $F_j(x)$, $j = 1, \ldots, b$, and they are assumed to be independent for $i \ne i'$ but they may be dependent for $i = i'$ since they are observed on the same (random) subject and the random variables $X_{ijk} \sim F_j(x)$ and


$X_{ij'k'} \sim F_{j'}(x)$ may also be dependent. In a general model, we need also the bivariate common distribution functions of two random variables within treatment $j$ (denoted by $F_j^{**}(x, y)$) and of two random variables between two treatments $j$ and $j'$ (denoted by $F_{jj'}^{*}(x, y)$). Thus, the general two-factor mixed model can be described by independent random vectors $X_i = (X_{i1}', \ldots, X_{ib}')'$, $i = 1, \ldots, n$, with common distribution functions $G_i(x)$,

$$
\begin{aligned}
X_{ij} &= (X_{ij1}, \ldots, X_{ijm_{ij}})', & i &= 1, \ldots, n; \; j = 1, \ldots, b, \\
X_{ijk} &\sim F_j(x), & i &= 1, \ldots, n; \; k = 1, \ldots, m_{ij}, \\
(X_{ijk}, X_{ijk'})' &\sim F_j^{**}(x, y), & i &= 1, \ldots, n; \; k \ne k' = 1, \ldots, m_{ij}, \\
(X_{ij1}, X_{ij'1})' &\sim F_{jj'}^{*}(x, y), & i &= 1, \ldots, n; \; j \ne j' = 1, \ldots, b.
\end{aligned}
\tag{4.7}
$$

It is reasonable to assume in this general model the compound symmetry if a subject is split into homogeneous parts and if there is no treatment effect. This means that the parts of each subject are 'interchangeable' under the hypothesis. However, under a treatment effect, the compound symmetry structure may not be preserved in the general model. The property of interchangeable parts of the subjects is reflected by the interchangeability of the random variables $X_{ijk}$ and $X_{ij'k'}$ for $j \ne j' = 1, \ldots, b$ and all $k, k'$ under the hypothesis of no treatment effect. The random variables $X_{ijk}$ and $X_{ijk'}$, $k, k' = 1, \ldots, m_{ij}$, are always interchangeable since they describe replications of the same experiment under the same treatment. Thus it follows for this model that $\mathrm{Var}(X_{1j1}) = \sigma_{x,j}^2 = \sigma_x^2$, $F_j^{**} = F^{**}$, and $F_{jj'}^{*} = F^{*}$, $j, j' = 1, \ldots, b$. For the multivariate model, no special assumptions on the bivariate distribution functions are made. Treatment effects in the general model (4.7) are described by the generalized means $p_j = \int H \, dF_j$, $j = 1, \ldots, b$, where $H = N^{-1} \sum_{j=1}^{b} N_j F_j$, $N = \sum_{j=1}^{b} N_j$ and $N_j = \sum_{i=1}^{n} m_{ij}$.

Hypotheses. We are mainly interested in analysing the fixed treatment effect which is defined for the linear model as in the one-factor fixed model. The hypothesis of no treatment effect is written as $H_0^{\mu}: P_b \mu = 0$ where $\mu = (\mu_1, \ldots, \mu_b)'$. In the general model (4.7), we consider two hypotheses. For the case of equal cell frequencies ($m_{ij} \equiv m$), the common distribution function $G_i(x_i) = G(x_i)$ is assumed to be independent of $i$ and the hypothesis is formulated as $H_0^{G}: G(x_i) = G(\pi(x_i))$ where $\pi(x_i) = (x_{i\pi_1}', \ldots, x_{i\pi_b}')'$ and $\pi_1, \ldots, \pi_b$ is any permutation of the first $b$ positive integers. 'No treatment effect' means that the outcome of the experiment is independent of the numbering of the treatments. This hypothesis is used for small samples where the permutation distribution of the test statistic is computed. The other hypothesis $H_0^F: P_b F = 0$ is the same as in the one-factor fixed model and is only related to the one-dimensional marginal distributions. Therefore, this hypothesis is appropriate for the compound symmetry model as well as for the multivariate model. The relations between the hypotheses are stated in the next Proposition.

PROPOSITION 4.6. (1) In the general two-factor mixed model (4.7), $H_0^{G} \Rightarrow H_0^F$. (2) In the linear two-factor mixed model (4.5), $H_0^{G} \Leftrightarrow H_0^F \Leftrightarrow H_0^{\mu}$.


PROOF. (1) is evident and (2) follows by the additivity, the independence and the identical distribution functions of the random variables in (4.5). □

For $b \ge 2$ treatments, we derive tests for $H_0^F$ in the compound symmetry model as well as in the multivariate model. For the case of $b = 2$ treatments, which is the most important one for applications, tests for the nonparametric hypotheses $H_0^{G}$, $H_0^F$ and $H_0^p: \int F_1 \, dF_2 = 1/2$ are given where an unequal number $m_{ij}$ of replications is allowed for $H_0^F$ and $H_0^p$. We need only the assumption of the independence of the vectors $X_i = (X_{i1}', X_{i2}')'$. The so-called 'matched-pairs-design' ($m = 1$), which is a special case of this model, is considered separately.

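4.3.1.2. Tests for $b \ge 2$ samples

Notations. The statistics are based on a consistent estimator of the vector of the generalized means $p = \int H \, dF$ where $F = (F_1, \ldots, F_b)'$, $H = N^{-1} \sum_{j=1}^{b} N_j F_j$ and $N = \sum_{j=1}^{b} N_j = \sum_{j=1}^{b} \sum_{i=1}^{n} m_{ij}$. The vector $p$ is estimated by an unweighted mean of the cell means. Let $\hat F = (\hat F_1, \ldots, \hat F_b)'$ and

$$\hat F_j(x) = \frac{1}{n} \sum_{i=1}^{n} \frac{1}{m_{ij}} \sum_{k=1}^{m_{ij}} c(x - X_{ijk}), \qquad \hat H(x) = \frac{1}{N} \sum_{i=1}^{n} \sum_{j=1}^{b} \sum_{k=1}^{m_{ij}} c(x - X_{ijk}).$$

Then $\hat p = (\hat p_1, \ldots, \hat p_b)' = \int \hat H \, d\hat F$ is consistent for $p$ (see Proposition 4.1). The estimators for the components $\hat p_j$ are computed from the ranks $R_{ijk}$ of $X_{ijk}$ among all the $N$ random variables

$$\hat p_j = \int \hat H \, d\hat F_j = \frac{1}{n} \sum_{i=1}^{n} \frac{1}{m_{ij}} \sum_{k=1}^{m_{ij}} \frac{1}{N} \left( R_{ijk} - \frac{1}{2} \right). \tag{4.8}$$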

Denote by $\bar R_i = (\bar R_{i1\cdot}, \ldots, \bar R_{ib\cdot})'$, $i = 1, \ldots, n$, the vectors of the rank means for subject $i$ where $\bar R_{ij\cdot} = m_{ij}^{-1} \sum_{k=1}^{m_{ij}} R_{ijk}$. Let further $\tilde R_{\cdot} = n^{-1} \sum_{i=1}^{n} \bar R_i$ denote the unweighted mean of the vectors $\bar R_i$. It follows from Theorem 4.2 that the statistics $\sqrt{n} \int \hat H \, d(\hat F - F)$ and $\sqrt{n} \int H \, d(\hat F - F) = \sqrt{n}(\bar Y_{\cdot} - p)$ are asymptotically equivalent. Moreover, under $H_0^F: CF = 0$, it follows that $\sqrt{n}\, C \hat p$ and $\sqrt{n}\, C \bar Y_{\cdot}$ are asymptotically equivalent. Here $\bar Y_{\cdot} = n^{-1} \sum_{i=1}^{n} Y_i$ is the mean of $Y_i = (\bar Y_{i1\cdot}, \ldots, \bar Y_{ib\cdot})'$ where $\bar Y_{ij\cdot} = m_{ij}^{-1} \sum_{k=1}^{m_{ij}} Y_{ijk}$ and $Y_{ijk} = H(X_{ijk})$. Note that the result of Theorem 4.2 remains true if the statistic


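is multiplied by $\sqrt{n}$ instead of $\sqrt{N}$. Let $S_i = \mathrm{Cov}(Y_i)$. Then $V = \mathrm{Cov}(\sqrt{n}\, \bar Y_{\cdot}) = n^{-1} \sum_{i=1}^{n} S_i$ since the $Y_i$ are independent random vectors. A consistent estimate

$$\hat V_n = \frac{1}{N^2 (n - 1)} \sum_{i=1}^{n} (\bar R_i - \tilde R_{\cdot})(\bar R_i - \tilde R_{\cdot})' \tag{4.9}$$

for $V$ follows directly from Theorem 4.3.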

Statistic for $H_0^F: P_b F = 0$; multivariate model. Tests for this model have been considered by Thompson (1991a) and Akritas and Arnold (1994). In both these papers, ties are excluded. Thompson derived a statistic using a generalized inverse while Akritas and Arnold used a contrast matrix of full row rank, and the quadratic form for the statistic is written in terms of an inverse containing the contrast matrix. In what follows, these results are generalized to an unequal number of replications $m_{ij}$ per treatment $j$ and block $i$. Moreover, ties are allowed. We choose the contrast matrix $W = V^{-1}[I_b - J_b V^{-1} / 1_b' V^{-1} 1_b]$ and we note that $WF = 0$ iff $P_b F = 0$ and that $WVW = W$. The statistic $\sqrt{n}\, W(\hat p - p)$ is asymptotically equivalent to $\sqrt{n}\, W \bar Y_{\cdot}$ under $H_0^F: P_b F = 0$. Let $\hat W = \hat V_n^{-1}[I_b - J_b \hat V_n^{-1} / 1_b' \hat V_n^{-1} 1_b]$ where $\hat V_n$ is given in (4.9). Then $\hat Q(W) = n (\hat W \hat p)' (\hat W \hat V_n \hat W)^{-} (\hat W \hat p) = n \hat p' \hat W \hat p$. Denote the $(i, j)$-element of $\hat V_n^{-1}$ by $\hat s_{ij}$, $i, j = 1, \ldots, b$, let $\hat s_{\cdot j} = \sum_{i=1}^{b} \hat s_{ij}$ and $\hat s_{\cdot\cdot} = \sum_{j=1}^{b} \hat s_{\cdot j}$, and let $\tilde R_{\cdot j \cdot}$ denote the $j$th component of $\tilde R_{\cdot}$. Then it follows from Theorem 4.5 that the statistic

$$Q_n^M(B) = \frac{n}{N^2} \left[ \sum_{i=1}^{b} \sum_{j=1}^{b} \tilde R_{\cdot i \cdot} \tilde R_{\cdot j \cdot} \hat s_{ij} - \frac{1}{\hat s_{\cdot\cdot}} \left( \sum_{j=1}^{b} \tilde R_{\cdot j \cdot} \hat s_{\cdot j} \right)^2 \right] \tag{4.10}$$

has asymptotically ($n \to \infty$) a central $\chi_f^2$-distribution with $f = \mathrm{rank}(W) = b - 1$ under $H_0^F: WF = 0$ which is equivalent to $P_b F = 0$. (Note that $Q_n^M(B)$ has the RT-property with respect to a parametric statistic for a repeated measurements model with an unspecified structure of the covariance matrix $V$.)
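The chain from ranks to the quadratic form can be sketched in a few lines. The sketch below assumes the data are supplied as a nested list `cells[i][j]` holding the replicates of subject $i$ under treatment $j$ (an illustrative layout, not one prescribed by the text) and that the number of subjects exceeds $b$, so that the estimated covariance matrix is invertible. It computes the unweighted rank means, the estimate (4.9), and the statistic both as $n \hat p' \hat W \hat p$ and in the explicit form (4.10), so that the two can be compared.

```python
import numpy as np
from scipy.stats import rankdata


def rank_means(cells):
    """Mean midrank per (subject, treatment) cell and total sample size N."""
    flat = np.concatenate([np.asarray(c, float) for row in cells for c in row])
    ranks = rankdata(flat)              # midranks, so ties are allowed
    n, b = len(cells), len(cells[0])
    Rbar = np.empty((n, b))
    pos = 0
    for i, row in enumerate(cells):
        for j, c in enumerate(row):
            m = len(c)
            Rbar[i, j] = ranks[pos:pos + m].mean()
            pos += m
    return Rbar, flat.size


def q_multivariate(cells):
    Rbar, N = rank_means(cells)
    n, b = Rbar.shape
    Rt = Rbar.mean(axis=0)                        # unweighted means over subjects
    p_hat = (Rt - 0.5) / N                        # estimator (4.8)
    dev = Rbar - Rt
    V_hat = dev.T @ dev / (N**2 * (n - 1))        # estimator (4.9)
    S = np.linalg.inv(V_hat)
    one = np.ones(b)
    W_hat = S @ (np.eye(b) - np.outer(one, one) @ S / (one @ S @ one))
    Q_matrix = n * p_hat @ W_hat @ p_hat          # n * p' W p
    s_col = S.sum(axis=0)
    Q_explicit = n / N**2 * (Rt @ S @ Rt - (s_col @ Rt) ** 2 / S.sum())  # (4.10)
    return p_hat, V_hat, Q_matrix, Q_explicit


rng = np.random.default_rng(1)
cells = [[rng.normal(size=rng.integers(1, 4)) for _ in range(3)] for _ in range(12)]
p_hat, V_hat, Q_matrix, Q_explicit = q_multivariate(cells)
print(Q_matrix, Q_explicit)
```

Because only ranks enter, the statistic is unchanged under any strictly increasing transformation of the data, which is a convenient sanity check for an implementation.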

Statistic for $H_0^F: P_b F = 0$; compound symmetry model. We have only to apply Theorem 4.4 to the special design considered in this subsection by letting $r = 1$, $n_1 = n$, $m_{ijk} = m_{ij}$ and $c = b$. Under $H_0^F$, it follows from (4.4) and Theorem 4.4 that $V_n = \mathrm{Cov}(\sqrt{n}\, \bar Y_{\cdot}) = D + c^{*} J_b$ where $D = \mathrm{diag}\{\tau_1, \ldots, \tau_b\}$, $\tau_j = n^{-1} \sum_{i=1}^{n} \tau_{ij}$ and $\tau_{ij} = m_{ij}^{-1}(\sigma_x^2 + (m_{ij} - 1) c^{**}) - c^{*}$. The estimators for $\tau_j$ given in Theorem 4.4 simplify to $\hat\tau_j = n^{-1} \sum_{i=1}^{n} \hat\tau_{ij}$ where

^ | 1 "rij = mij

[

M ~ ( b - l) M~ -- E t =b l ?Tilt 2

+

b

b rnij

= E

E

j = l k=l

-

,

-- E

j=l

b -

=

j=l


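Let

$$\tilde R_{\cdot j \cdot} = \frac{1}{n} \sum_{i=1}^{n} \bar R_{ij\cdot} \quad \text{and} \quad \tilde R = \left( \sum_{j=1}^{b} \frac{1}{\hat\tau_j} \right)^{-1} \sum_{j=1}^{b} \frac{\tilde R_{\cdot j \cdot}}{\hat\tau_j}.$$

Then it follows from Theorem 4.5 that the quadratic form

$$Q_n^{CS}(B) = n \sum_{j=1}^{b} \frac{1}{\hat\tau_j} \left( \tilde R_{\cdot j \cdot} - \tilde R \right)^2 \tag{4.11}$$

has asymptotically a central $\chi_f^2$-distribution with $f = b - 1$ under $H_0^F$. In case of equal cell frequencies $m_{ij} \equiv m$,

$$\hat\tau_j \equiv \hat\tau = \frac{1}{n(b - 1)} \sum_{i=1}^{n} \sum_{j=1}^{b} \left( \bar R_{ij\cdot} - \bar R_{i\cdot\cdot} - \bar R_{\cdot j \cdot} + \bar R_{\cdot\cdot\cdot} \right)^2, \qquad \tilde R_{\cdot j \cdot} = \bar R_{\cdot j \cdot}, \qquad \tilde R = \frac{nmb + 1}{2} \tag{4.12}$$

and $Q_n^{CS}(B)$ given in (4.11) simplifies to

$$Q_n^{CS}(B) = \frac{n^2 (b - 1) \sum_{j=1}^{b} \left( \bar R_{\cdot j \cdot} - \frac{nmb + 1}{2} \right)^2}{\sum_{i=1}^{n} \sum_{j=1}^{b} \left( \bar R_{ij\cdot} - \bar R_{i\cdot\cdot} - \bar R_{\cdot j \cdot} + \bar R_{\cdot\cdot\cdot} \right)^2} \tag{4.13}$$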

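which has been given by Brunner and Neumann (1982). A special case of this, a model without random interaction and $m = 1$, has been considered by Kepner and Robinson (1988) for the hypothesis $H_0^{G}$ which implies $H_0^F$. Two consistent estimators for the unknown variances $\tau_j$ ($\equiv \tau$ under $H_0^F$) are given there, namely

$$\hat\tau_{n,1}^2 = \frac{1}{n(b - 1)} \sum_{i=1}^{n} \sum_{j=1}^{b} \left( R_{ij} - \bar R_{i\cdot} - \bar R_{\cdot j} + \bar R_{\cdot\cdot} \right)^2$$

and

$$\hat\tau_{n,2}^2 = \frac{1}{(n - 1)(b - 1)} \sum_{i=1}^{n} \sum_{j=1}^{b} \left( R_{ij} - \bar R_{i\cdot} - \bar R_{\cdot j} + \bar R_{\cdot\cdot} \right)^2.$$

The estimator $\hat\tau_{n,1}^2$ is identical to the estimator used in (4.12) when specialized to $m = 1$. The second estimator $\hat\tau_{n,2}^2$ has the RT-property with regard to the variance estimator of the ANOVA F-test.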


Table 4.1
Ranks and rank means of the observations 'ratio'

           Type of cell
Pair       N        MIT      MIC      S
1          17       4        30       14.5
2          25.5     29       27.5     19
3          6        8.5      10       8.5
4          12       18       13       16
5          11       32       27.5     23
6          31       7        5        14.5
7          20       21       24       25.5
8          22       3        1        2
Means      18.06    15.31    17.25    15.38

EXAMPLE 2. In this example, we re-analyse the data given by Koch (1969), Example 1. For the description of the experiment and the data set, we refer to this article. The two factors 'Pair' (random, with $n = 8$ levels) and 'Cell' (fixed, with $b = 4$ levels: N, MIT, MIC, and S) are crossed and $m_{ij} = 1$ replication is observed for the variable 'ratio'. The ranks $R_{ij}$ of the observations and the means are given in Table 4.1. For the multivariate model, the estimated covariance matrix is

$$\hat V_n = \frac{1}{1024} \begin{pmatrix} 67.89 & -11.88 & -15.13 & 1.26 \\ \cdot & 128.64 & 77.91 & 66.40 \\ \cdot & \cdot & 129.00 & 63.96 \\ \cdot & \cdot & \cdot & 57.41 \end{pmatrix}$$

and the statistic $Q_n^M(B)$ given in (4.10) is $Q_n^M(B) = 1.33$ ($p = 0.722$), when compared with the $\chi_3^2$-distribution. Inspection of the estimated covariance matrix $\hat V_n$ suggests the multivariate model rather than the compound symmetry model for this experiment. Because the sample size is small, $(n - b + 1) Q_n^M(B) / [(b - 1)(n - 1)]$ is compared with the $F$-distribution with $f_1 = b - 1 = 3$ and $f_2 = n - b + 1 = 5$, resulting in a p-value of $p = 0.8138$. The small sample distribution is motivated by the distribution of Hotelling's $T^2$ under the assumption of multivariate normality where the hypothesis $H_0^{\mu}: \mu_1 = \cdots = \mu_b$ is tested. Since the statistic $Q_n^M(B)$ has the RT-property, the results given here can be computed by an appropriate statistical software package.
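The figures reported in Example 2 can be retraced from the ranks in Table 4.1. The sketch below is an illustration rather than the authors' computation: it takes one replication per cell, uses the estimate (4.9) and the quadratic form $n \hat p' \hat W \hat p$, and then applies the chi-square and the small-sample $F$ approximation described above. The MIC entry for pair 5 is taken as 27.5, the value consistent with the reported column mean 17.25 and the reported variance entry 129.00; the function name is a hypothetical choice.

```python
import numpy as np
from scipy.stats import chi2, f

# Ranks from Table 4.1 (rows: pairs 1..8, columns: N, MIT, MIC, S).
ranks = np.array([
    [17.0,  4.0, 30.0, 14.5],
    [25.5, 29.0, 27.5, 19.0],
    [ 6.0,  8.5, 10.0,  8.5],
    [12.0, 18.0, 13.0, 16.0],
    [11.0, 32.0, 27.5, 23.0],
    [31.0,  7.0,  5.0, 14.5],
    [20.0, 21.0, 24.0, 25.5],
    [22.0,  3.0,  1.0,  2.0],
])


def multivariate_test(R):
    """Rank statistic for the multivariate model with m_ij = 1."""
    n, b = R.shape
    N = n * b
    p_hat = (R.mean(axis=0) - 0.5) / N
    V_hat = np.cov(R, rowvar=False) / N**2       # (4.9); np.cov divides by n - 1
    S = np.linalg.inv(V_hat)
    one = np.ones(b)
    W_hat = S @ (np.eye(b) - np.outer(one, one) @ S / (one @ S @ one))
    Q = n * p_hat @ W_hat @ p_hat
    p_chi2 = chi2.sf(Q, b - 1)                   # large-sample approximation
    F_stat = (n - b + 1) * Q / ((b - 1) * (n - 1))
    p_F = f.sf(F_stat, b - 1, n - b + 1)         # small-sample approximation
    return Q, p_chi2, F_stat, p_F, V_hat


Q, p_chi2, F_stat, p_F, V_hat = multivariate_test(ranks)
print(round(Q, 2), round(p_chi2, 3), round(p_F, 4))
```

Multiplying `V_hat` by $N^2 = 1024$ gives the entries of the matrix displayed in the example, which is a quick way to confirm that the table has been read in the right orientation.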

Tests for $b = 2$ samples

Statistics for $H_0^F$ and $H_0^p$. Here we consider the special case of $b = 2$ treatments.

In this case, explicit statistics can be given for the hypotheses $H_0^F$ and $H_0^p$. For $b = 2$ samples, the hypotheses $H_0^F$ and $H_0^p$ are formulated as $H_0^F: F_1 = F_2$ and $H_0^p: p = \int F_1 \, dF_2 = 1/2$. Therefore, a consistent estimator $\hat p$ for $p$ can be


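written as a linear rank statistic which does not require the inverse of a covariance matrix. Let $X_i = (X_{i1}', X_{i2}')'$, $i = 1, \ldots, n$, be independent random vectors, $X_{ij} = (X_{ij1}, \ldots, X_{ijm_{ij}})'$, $j = 1, 2$, and assume that $X_{ijk} \sim F_j(x)$, $i = 1, \ldots, n$, $k = 1, \ldots, m_{ij}$, $j = 1, 2$. Let $\hat F_j(x) = N_j^{-1} \sum_{i=1}^{n} \sum_{k=1}^{m_{ij}} c(x - X_{ijk})$, $j = 1, 2$, where $N_j = \sum_{i=1}^{n} m_{ij}$ and let $N = N_1 + N_2$. Then an asymptotically unbiased and consistent estimator of $p$ is

$$\hat p = \int \hat F_1 \, d\hat F_2 = \frac{1}{N_1} \left( \bar R_{\cdot 2 \cdot} - \frac{N_2 + 1}{2} \right),$$

where $\bar R_{\cdot 2 \cdot} = N_2^{-1} R_{\cdot 2 \cdot}$, $R_{\cdot 2 \cdot} = \sum_{i=1}^{n} \sum_{k=1}^{m_{i2}} R_{i2k}$, and $R_{ijk}$ is the rank of $X_{ijk}$ among all $N$ observations. Let $s_{N,0}^2$ denote the variance of $N_1 N_2 \hat p / N$ under $H_0^F: F_1 = F_2$ and let

$$\hat s_{N,0}^2 = \frac{n}{N^4 (n - 1)} \sum_{i=1}^{n} \left[ N_1 (R_{i2\cdot} - m_{i2} \bar R_{\cdot 2 \cdot}) - N_2 (R_{i1\cdot} - m_{i1} \bar R_{\cdot 1 \cdot}) \right]^2$$

where $R_{ij\cdot} = \sum_{k=1}^{m_{ij}} R_{ijk}$ and $\bar R_{\cdot j \cdot} = N_j^{-1} R_{\cdot j \cdot}$, $j = 1, 2$. Then under $H_0^F$, $\hat s_{N,0}^2$ is a consistent estimator of $s_{N,0}^2$ in the sense that $E[\hat s_{N,0}^2 / s_{N,0}^2 - 1]^2 \to 0$ and the statistic

$$T_n^F = \frac{N_1 N_2}{N \hat s_{N,0}} \left( \hat p - \frac{1}{2} \right) = \frac{R_{\cdot 2 \cdot} - N_2 (N + 1)/2}{N \hat s_{N,0}} \tag{4.14}$$

has asymptotically ($n \to \infty$) a standard normal distribution. For small samples, the distribution of $T_n^F$ may be approximated by the central $t_f$-distribution with $f = n - 1$. For testing the hypothesis $H_0^p: p = 1/2$ in the heteroscedastic case, the estimator $\hat s_{N,0}^2$ is replaced by

$$\hat s_N^2 = \frac{n}{N^2 (n - 1)} \sum_{i=1}^{n} \left[ \left( R_{i2\cdot} - R_{i2\cdot}^{(2)} \right) - \left( R_{i1\cdot} - R_{i1\cdot}^{(1)} \right) - m_{i2} \left( \bar R_{\cdot 2 \cdot} - \frac{N_2 + 1}{2} \right) + m_{i1} \left( \bar R_{\cdot 1 \cdot} - \frac{N_1 + 1}{2} \right) \right]^2$$

where $R_{ij\cdot}^{(j)} = \sum_{k=1}^{m_{ij}} R_{ijk}^{(j)}$ and $R_{ijk}^{(j)}$ is the rank of $X_{ijk}$ among all the $N_j$ observations under treatment $j$. Under $H_0^p$, the statistic

$$T_n^p = \frac{N_1 N_2}{N \hat s_N} \left( \hat p - \frac{1}{2} \right) = \frac{R_{\cdot 2 \cdot} - N_2 (N + 1)/2}{N \hat s_N} \tag{4.15}$$

has asymptotically ($n \to \infty$) a standard normal distribution. For small samples, the distribution of $T_n^p$ may be approximated by the central $t_f$-distribution with $f = n - 1$. For details, see Brunner et al. (1995).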
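The two statistics (4.14) and (4.15) are easy to evaluate once overall and within-treatment ranks are available. The following sketch is one possible implementation under the formulas above, with a hypothetical function name and input layout (one sequence of replicates per subject and treatment). As input it uses the ranks of the sunburn data listed in Table 4.2 ('Old' as treatment 1, 'New' as treatment 2); ranking these ranks again reproduces them, so they can stand in for the raw data.

```python
import numpy as np
from scipy.stats import rankdata, t


def paired_rank_tests(x1, x2):
    """T_n^F (4.14) and T_n^p (4.15); x1[i], x2[i] hold the replicates of subject i."""
    x1 = [np.atleast_1d(np.asarray(v, float)) for v in x1]
    x2 = [np.atleast_1d(np.asarray(v, float)) for v in x2]
    n = len(x1)
    m1 = np.array([v.size for v in x1])
    m2 = np.array([v.size for v in x2])
    N1, N2 = int(m1.sum()), int(m2.sum())
    N = N1 + N2
    all1, all2 = np.concatenate(x1), np.concatenate(x2)
    r = rankdata(np.concatenate([all1, all2]))      # overall midranks
    idx1 = np.repeat(np.arange(n), m1)
    idx2 = np.repeat(np.arange(n), m2)
    R1 = np.bincount(idx1, weights=r[:N1], minlength=n)           # R_{i1.}
    R2 = np.bincount(idx2, weights=r[N1:], minlength=n)           # R_{i2.}
    I1 = np.bincount(idx1, weights=rankdata(all1), minlength=n)   # internal ranks
    I2 = np.bincount(idx2, weights=rankdata(all2), minlength=n)
    Rbar1, Rbar2 = R1.sum() / N1, R2.sum() / N2
    p_hat = (Rbar2 - (N2 + 1) / 2) / N1
    num = R2.sum() - N2 * (N + 1) / 2
    d0 = N1 * (R2 - m2 * Rbar2) - N2 * (R1 - m1 * Rbar1)
    s0 = np.sqrt(n / (N**4 * (n - 1)) * np.sum(d0**2))
    z = ((R2 - I2) - (R1 - I1)
         - m2 * (Rbar2 - (N2 + 1) / 2) + m1 * (Rbar1 - (N1 + 1) / 2))
    s = np.sqrt(n / (N**2 * (n - 1)) * np.sum(z**2))
    T_F, T_p = num / (N * s0), num / (N * s)
    return {"p_hat": p_hat,
            "T_F": T_F, "p_F": 2 * t.sf(abs(T_F), n - 1),
            "T_p": T_p, "p_p": 2 * t.sf(abs(T_p), n - 1)}


old = [16, 17, 20, 12.5, 12.5, 18, 4, 6, 7, 2]   # Table 4.2, lotion 'Old'
new = [11, 14.5, 9, 14.5, 10, 19, 3, 8, 5, 1]    # Table 4.2, lotion 'New'
result = paired_rank_tests(old, new)
print(result)
```

The two-sided p-values use the $t$-approximation with $n - 1$ degrees of freedom mentioned above.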


Table 4.2
Ranks and means for the sunburn data

          Ranks of the sunburn degree for subject
Lotion    1     2      3     4      5      6     7    8    9    10    Mean
Old       16    17     20    12.5   12.5   18    4    6    7    2     11.5
New       11    14.5   9     14.5   10     19    3    8    5    1     9.5

To derive the exact (conditional) distribution for small samples, we have to restrict the considerations to an equal number of replications $m_{ij} \equiv m$ in order to apply a permutation argument. For testing the null hypothesis of no treatment effect, either the linear model (4.5) is assumed and the hypothesis $H_0^{\mu}: P_b \mu = 0$ is considered, or in the general model (4.7) the hypothesis $H_0^{G}$ is tested. In both cases under the hypothesis, the common distribution function of $X = (X_1', \ldots, X_n')'$ is invariant under all $2^n$ equally likely permutations of the vectors $X_{i1} = (X_{i11}, \ldots, X_{i1m})'$ and $X_{i2} = (X_{i21}, \ldots, X_{i2m})'$, $i = 1, \ldots, n$. Therefore, the null distribution of the difference of the rank sums $R_{\cdot 2 \cdot} - R_{\cdot 1 \cdot}$ can easily be computed by a shift algorithm based on a recurrence relation identical to the one for the Wilcoxon signed rank statistic if the integers $1, \ldots, n$ in the recursion formula for the latter are replaced by $A_i = |R_{i2\cdot} - R_{i1\cdot}|$, $i = 1, \ldots, n$. In case of ties, the statistic $2(R_{\cdot 2 \cdot} - R_{\cdot 1 \cdot})$ is used in order to have integers $2A_i$ for the shift algorithm. For details, see Brunner and Compagnone (1988) and Zimmermann (1985a).

EXAMPLE 3. In this example, we analyse the data given by Gibbons and Chakraborti (1992), Problem 6.18, where the degree of sunburn after the application of two suntan lotions is measured for 10 randomly chosen subjects. For the description of the data set, we refer to Gibbons and Chakraborti (loc. cit.). The $N = 20$ ranks of the 10 paired observations are listed in Table 4.2. The results for testing $H_0^F$ and $H_0^p$ are $T_n^F = -1.642$ ($p = 0.135$) and $T_n^p = -1.43$ ($p = 0.186$). Both p-values are obtained from the approximation by the $t_f$-distribution with $f = n - 1 = 9$.
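The permutation argument above says that, conditionally on the data, $R_{\cdot 2 \cdot} - R_{\cdot 1 \cdot}$ is distributed as $\sum_i \pm A_i$ with independent, equally likely signs. A minimal sketch of a recursion in the spirit of the shift algorithm follows; it is not the implementation of the cited references, the function names are hypothetical, and integer weights are assumed (so $2A_i$ is used when ties produce half-integers). The demonstration uses the doubled absolute rank differences of the sunburn data in Table 4.2.

```python
from collections import Counter
from itertools import product


def signed_sum_distribution(weights):
    """Exact distribution of sum_i s_i * w_i for independent random signs s_i."""
    counts = Counter({0: 1})
    for w in weights:                      # add one subject at a time
        shifted = Counter()
        for value, c in counts.items():
            shifted[value + w] += c        # pair kept in observed order
            shifted[value - w] += c        # the two treatments swapped
        counts = shifted
    total = 2 ** len(weights)
    return {v: c / total for v, c in sorted(counts.items())}


def exact_two_sided_p(weights, observed):
    """P(|statistic| >= |observed|) under the 2^n equally likely sign patterns."""
    dist = signed_sum_distribution(weights)
    return sum(p for v, p in dist.items() if abs(v) >= abs(observed))


# 2 * |R_i2 - R_i1| for the ten subjects of Table 4.2 (doubled because of ties).
two_A = [10, 5, 22, 4, 5, 2, 2, 4, 4, 2]
observed = 2 * (95 - 115)                  # 2 * (R_.2. - R_.1.)
p_exact = exact_two_sided_p(two_A, observed)
print(p_exact)
```

The recursion visits each subject once and keeps only the distinct attainable sums, which is what makes it cheaper than enumerating all $2^n$ sign patterns.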

The matched pairs design with missing observations. The so-called 'matched-pairs-design' ($m = 1$) is a special case of the model considered in the previous paragraph. In this case we have independent vectors $X_i = (X_{i1}, X_{i2})'$, $i = 1, \ldots, n$, and the statistics for testing $H_0^F$ or $H_0^p$ are easily derived from (4.14) and (4.15). However, we shall consider separately the case of missing observations which is of some importance in practice. We denote by $X_{1i} = (X_{1i1}, X_{1i2})'$, $i = 1, \ldots, n_c$, the $n_c$ completely observed vectors ($k = 1$). Let $X_{2ij}$, $i = 1, \ldots, u_j$; $j = 1, 2$, denote the $u_j$ incomplete observations where the matched pair has only been observed under treatment $j$, $j = 1, 2$, and the paired observation is missing. In order to test the hypothesis $H_0^F$ or $H_0^p$, these incomplete observations can also be used. In total, there are $N = N_1 + N_2$ observations where $N_j = n_c + u_j$, $j = 1, 2$, is the total number of observations under treatment $j$. Let $R_{kij}$ be the rank of $X_{kij}$ among all


the $N$ observations, let $R_{kij}^{(j)}$ be the rank of $X_{kij}$ among all the $N_j$ observations under treatment $j$ and let $R_{\cdot\cdot 2} = \sum_{i=1}^{n_c} R_{1i2} + \sum_{i=1}^{u_2} R_{2i2}$ and let

g2N , 0 -

1 [~2~ ([NIRli2 - N2Rlil] - [N1RI.2 - N2RI.1]) -2

t.~=l 2

ud

+ Z(N-Nj)2~-~ j=l

] (R2ij - R2.y) 2

i=l

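where

$$\bar R_{1 \cdot j} = \frac{1}{n_c} \sum_{i=1}^{n_c} R_{1ij} \quad \text{and} \quad \bar R_{2 \cdot j} = \frac{1}{u_j} \sum_{i=1}^{u_j} R_{2ij}, \qquad j = 1, 2.$$

Then, under $H_0^F$, the statistic

$$\frac{R_{\cdot\cdot 2} - N_2 (N + 1)/2}{N \hat s_{N,0}} \tag{4.16}$$

has asymptotically ($\min(n_c + u_1, n_c + u_2) \to \infty$) a standard normal distribution. In the heteroscedastic case, let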

S~¢ . ~ .

. ([f~li2-~I:~(2) . ~'Ii2Jl

[~I.2--(2) RI.2 ]

i=l --(1) 2 -- [ R I i l - ~liljl~(l)]÷ [RI-I -- RI.I])

__(j) +

[R2

,

-

-

2

-

j=l i=l where Rlij,

nc i=1

?~c i=1

ud --

1 Z

R2.j = u~- ~=1

R2ij,

~(j)

uj

_1 V" #5)

-~z.j ~-- Uj ~ 2 i j ~

j = 1,2.

i=1

Then, under $H_0^p$, the statistic

$$\frac{R_{\cdot\cdot 2} - N_2 (N + 1)/2}{N \hat s_N} \tag{4.17}$$

has asymptotically (as $\min(n_c + u_1, n_c + u_2) \to \infty$) a standard normal distribution.


For small samples, the null distribution of $R_{1 \cdot 2}$ (within the complete observations) is computed for the ranks of the complete observations $R_{111}, \ldots, R_{1 n_c 2}$ as described in case (1). The null distribution of $R_{\cdot\cdot 2}$ (within the incomplete observations) is computed as for the Wilcoxon-Mann-Whitney statistic under $H_0^F$ where the integers $1, \ldots, N$ are replaced by the ranks of the incomplete observations $R_{2ij}$, $i = 1, \ldots, u_j$, $j = 1, 2$. Since the incomplete observations are independent of the complete observations, the desired null distribution of $R_{\cdot\cdot 2}$ is the convolution of the two null distributions.

4.3.2. Nested designs

4.3.2.1. Models and hypotheses

Linear model. The random variables $X_{ijk}$, $i = 1, \ldots, a$; $j = 1, \ldots, n_i$; $k = 1, \ldots, m_{ij}$, are written as

$$X_{ijk} = \mu_i + B_{j(i)} + \epsilon_{ijk} \tag{4.18}$$

where $\mu_i = E(X_{ijk})$ are unknown parameters, $B_{j(i)} \sim N(0, \sigma_B^2)$ are i.i.d. random variables and $\epsilon_{ijk} \sim N(0, \sigma^2)$ are i.i.d. random variables independent of $B_{j(i)}$. The notation $j(i)$ means that the random subjects $B_{1(i)}, \ldots, B_{n_i(i)}$ are nested under treatment $i$. Such designs are typically used when $N = \sum_{i=1}^{a} n_i$ randomly chosen subjects are repeatedly ($k = 1, \ldots, m_{ij}$) observed or measured under the same treatment. In the linear model (4.18), the random variables $X_{ijk}$ and $X_{i'j'k'}$ are independent if $i \ne i'$ or if $j \ne j'$, where $X_{ijk}$ and $X_{ij'k'}$ are identically distributed according to $F_i(x)$. Note that the random variables $X_{ijk}$ and $X_{ijk'}$ may be dependent.

General model. Model (4.18) is generalized in the following way: Let

$$X_{ij} = (X_{ij1}, \ldots, X_{ijm_{ij}})', \qquad i = 1, \ldots, a; \; j = 1, \ldots, n_i, \tag{4.19}$$

be independent random vectors of the observations for block $j$ under treatment $i$ where $X_{ijk} \sim F_i(x)$, $j = 1, \ldots, n_i$; $k = 1, \ldots, m_{ij}$. In the linear model (4.18), $S_{ij} = \mathrm{Cov}(X_{ij}) = (\sigma_x^2 - c_x) I_{m_{ij}} + c_x J_{m_{ij}}$, where $\sigma_x^2 = \mathrm{Var}(X_{111})$ and $c_x = \mathrm{Cov}(X_{111}, X_{112})$. In the general model (4.19), the assumption of a special structure of the covariance matrix of $X_{ij}$ in case of no treatment effect is not necessary. This will be explained in Section 4.3.2.2 when the test statistic is derived. The shift model $F_i(x) = F(x - \mu_i)$ is easily derived from the general model (4.19).

Hypotheses. The definition of the treatment effects and the hypotheses are the same as for the model with independent observations considered in Section 2.1.1. The hypothesis for the linear model is $H_0^{\mu}: P_a \mu = 0$ where $\mu = (\mu_1, \ldots, \mu_a)'$. In the general model, we consider the hypothesis $H_0^F: P_a F = 0$ where $F = (F_1, \ldots, F_a)'$. The statistics are based on a consistent estimator of $p = (p_1, \ldots, p_a)' = \int H \, dF$ where $H(x) = N^{-1} \sum_{i=1}^{a} \sum_{j=1}^{n_i} m_{ij} F_i(x)$ and $N = \sum_{i=1}^{a} \sum_{j=1}^{n_i} m_{ij}$. We restrict the consideration of the hypothesis $H_0^p$ to the case of $a = 2$ samples where the hypothesis is formulated as $H_0^p: p = \int F_1 \, dF_2 = 1/2$.


It is obvious that the same relations between the hypotheses in the semiparametric models and the general model as given in Proposition 2.1 are also valid in the hierarchical model.

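4.3.2.2. Tests for $a \ge 2$ samples.

Here, as in the case of the cross-classification, we admit unequal numbers $m_{ij}$ of replications for block $j$ under treatment $i$. Therefore, the components $p_i$ of the generalized mean vector $p = (p_1, \ldots, p_a)'$ are estimated by an unweighted sum of cell means. We will use the notation of Subsection 4.3.1.2. Let $F = (F_1, \ldots, F_a)'$, then $p_i = \int H \, dF_i$ is estimated by

$$\hat p_i = \frac{1}{N} \left( \tilde R_{i \cdot\cdot} - \frac{1}{2} \right)$$

where $\tilde R_{i \cdot\cdot} = n_i^{-1} \sum_{j=1}^{n_i} \bar R_{ij\cdot}$ and $\bar R_{ij\cdot} = m_{ij}^{-1} \sum_{k=1}^{m_{ij}} R_{ijk}$. Here, $R_{ijk}$ is the rank of $X_{ijk}$ among all the $N = \sum_{i=1}^{a} \sum_{j=1}^{n_i} m_{ij}$ observations. Let $\bar Y_{\cdot} = (\bar Y_{1 \cdot\cdot}, \ldots, \bar Y_{a \cdot\cdot})'$ where $\bar Y_{i \cdot\cdot} = n_i^{-1} \sum_{j=1}^{n_i} \bar Y_{ij\cdot}$, then

$$V_a = \mathrm{Cov}(\sqrt{N}\, \bar Y_{\cdot}) = \bigoplus_{i=1}^{a} \frac{N}{n_i} \sigma_i^2$$

where $\sigma_i^2 = n_i^{-1} \sum_{j=1}^{n_i} \sigma_{ij}^2$ and $\sigma_{ij}^2 = \mathrm{Var}(\bar Y_{ij\cdot})$. It is not necessary to make special assumptions on the bivariate distribution functions of $X_{ij}$ under the hypothesis since the random variables $\bar Y_{ij\cdot}$ are independent and only the variances $\sigma_i^2$ have to be estimated. The equality of certain two-dimensional marginal distribution functions (which generates a convenient form of the covariance matrix) is only needed if different treatments are applied to the same subject as in the cross-classification. Let $S_i^2 = \sum_{j=1}^{n_i} (\bar R_{ij\cdot} - \tilde R_{i \cdot\cdot})^2$ and $\hat\sigma_i^2 = [N^2 (n_i - 1)]^{-1} S_i^2$. Then $\hat\sigma_i^2$ is consistent for $\sigma_i^2$ in the sense that $\hat\sigma_i^2 / \sigma_i^2 \to 1$ in probability. Let $W = V_a^{-1}(I_a - J_a V_a^{-1} / \mathrm{trace}(V_a^{-1}))$ be the contrast matrix defined in (1.5) and let

$$\hat V_a = \bigoplus_{i=1}^{a} \frac{N}{n_i} \hat\sigma_i^2$$

and $\hat W = \hat V_a^{-1}(I_a - J_a \hat V_a^{-1} / \mathrm{trace}(\hat V_a^{-1}))$; then $W V_a W = W$ and $WF = 0$ iff $P_a F = 0$. It follows from Theorem 4.5 that the quadratic form

$$Q_N^H = N \hat p' \hat W \hat p = \sum_{i=1}^{a} \frac{n_i}{\hat\sigma_i^2} (\hat p_i - \tilde p)^2, \qquad \tilde p = \left( \sum_{i=1}^{a} \frac{n_i}{\hat\sigma_i^2} \right)^{-1} \sum_{i=1}^{a} \frac{n_i \hat p_i}{\hat\sigma_i^2}, \tag{4.20}$$

has asymptotically ($n_i \to \infty$) a $\chi_f^2$-distribution with $f = a - 1$ under $H_0^F: P_a F = 0$.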
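For the nested design, the quadratic form built from the estimators $\hat p_i$, $\hat\sigma_i^2$ and the contrast matrix $\hat W$ defined above can be sketched as follows. The sketch assumes the data come as `groups[i][j]`, the replicates of subject $j$ nested under treatment $i$ (an illustrative layout), with at least two subjects per treatment; the function name is hypothetical. It evaluates $N \hat p' \hat W \hat p$ directly from the matrix definition and returns the ingredients so that the equivalent weighted-sum form can be checked.

```python
import numpy as np
from scipy.stats import rankdata, chi2


def q_nested(groups):
    """Rank statistic N * p' W p for a nested design with a treatments."""
    a = len(groups)
    flat = np.concatenate([np.asarray(obs, float) for g in groups for obs in g])
    ranks = rankdata(flat)                 # midranks over all N observations
    N = flat.size
    p_hat, sig2, n_i = np.empty(a), np.empty(a), np.empty(a)
    pos = 0
    for i, g in enumerate(groups):
        cell_means = []
        for obs in g:                      # one cell per nested subject
            m = len(obs)
            cell_means.append(ranks[pos:pos + m].mean())
            pos += m
        cell_means = np.array(cell_means)
        n_i[i] = len(g)
        Rt = cell_means.mean()             # unweighted mean of the cell means
        p_hat[i] = (Rt - 0.5) / N
        sig2[i] = np.sum((cell_means - Rt) ** 2) / (N**2 * (n_i[i] - 1))
    V_inv = np.diag(n_i / (N * sig2))      # inverse of the diagonal matrix V_a
    W_hat = V_inv @ (np.eye(a) - np.ones((a, a)) @ V_inv / np.trace(V_inv))
    Q = N * p_hat @ W_hat @ p_hat
    return Q, chi2.sf(Q, a - 1), p_hat, sig2, n_i


rng = np.random.default_rng(7)
groups = [[rng.normal(loc=0.3 * i, size=rng.integers(1, 4)) for _ in range(6 + i)]
          for i in range(3)]
Q, p_value, p_hat, sig2, n_i = q_nested(groups)
print(Q, p_value)
```

Since the covariance estimate is diagonal, the statistic is a weighted sum of squared deviations of the $\hat p_i$ from their weighted mean, with weights $n_i / \hat\sigma_i^2$.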


In case of an equal number of replications $m_{ij} \equiv m$, $\bar R_{ij\cdot} = m^{-1} \sum_{k=1}^{m} R_{ijk}$, $\tilde R_{i \cdot\cdot} = (m n_i)^{-1} R_{i \cdot\cdot} = \bar R_{i \cdot\cdot}$, $\tilde R = \bar R_{\cdot\cdot\cdot} = (N + 1)/2$ and

$$\hat\sigma^2 = \frac{1}{N^2 (n_{\cdot} - a)} \sum_{i=1}^{a} \sum_{j=1}^{n_i} \left( \bar R_{ij\cdot} - \bar R_{i \cdot\cdot} \right)^2$$

where $n_{\cdot} = \sum_{i=1}^{a} n_i$. The quadratic form $Q_N^H$ given in (4.20) reduces to

$$Q_N^H = (n_{\cdot} - a) \, \frac{\sum_{i=1}^{a} n_i \left( \bar R_{i \cdot\cdot} - (N + 1)/2 \right)^2}{\sum_{i=1}^{a} \sum_{j=1}^{n_i} \left( \bar R_{ij\cdot} - \bar R_{i \cdot\cdot} \right)^2}.$$

This statistic has been given by Brunner and Neumann (1982) for the case of no ties. For small sample sizes (and an equal number of replications), the exact permutation distribution of $\sum_{i=1}^{a} R_{i \cdot\cdot}^2$ can be computed. Under $H_0^F: P_a F = 0$, the marginal distribution functions of all vectors $X_{ij}$ are identical and, since the vectors are independent, the common distribution function of $X = (X_{11}', \ldots, X_{a n_a}')'$ remains invariant under any permutation of the vectors $X_{ij}$, $i = 1, \ldots, a$, $j = 1, \ldots, n_i$. Therefore, the exact permutation distribution of $\sum_{i=1}^{a} R_{i \cdot\cdot}^2$ can be quickly computed by a multivariate shift algorithm (Streitberg and Roehmel, 1986) as in the case of the statistic $Q$ given in Section 2.1.3. However, the integers $1, \ldots, N$ are replaced by the rank sums $R_{11\cdot}, \ldots, R_{a n_a \cdot}$. In case of ties, the same remark applies as in Section 2.1.3.

4.3.2.3. Tests for $a = 2$ samples. In the case of two samples, we give test statistics for both hypotheses $H_0^F: F_1 = F_2$ and $H_0^p: p = 1/2$. Here we have independent random vectors $X_{ij} = (X_{ij1}, \ldots, X_{ijm_{ij}})'$, $i = 1, 2$, $j = 1, \ldots, n_i$, with uniformly bounded length $m_{ij} \le M < \infty$. For the case that the number of replications $m_{ij}$ also tends to $\infty$ (however at a certain rate depending on $n_{\cdot} = n_1 + n_2$), we refer to Brunner and Denker (1994). The other assumptions on $X_{ij}$ and the marginal distribution functions $F_1(x)$ and $F_2(x)$ are the same as in the previous paragraph. As in Section 2.1.3.3, the treatment effect for $a = 2$ samples is $p = \int F_1 \, dF_2$ and an asymptotically unbiased and consistent estimator for $p$ is given by $\hat p = (N_1 N_2)^{-1}(R_{2 \cdot\cdot} - N_2 (N_2 + 1)/2)$ where $N_i = \sum_{j=1}^{n_i} m_{ij}$ and $N = N_1 + N_2$. Under $H_0^F: F_1 = F_2$, the statistic

=

J~2.. - N2 N : I

has asymptotically a standard normal distribution if min(nl,n2) ROk is the rank of X o k among all the N observations, Ri.. = ~j=ln~ ~=lm~ Rij~, RO" = mo'lRij • and Ri.. = N~-lRi.., i = l, 2.

(4.2])

-+ cx~. Here ~jn~l Rij. :


Under Hff: f F1 dF2 = 1/2, the statistic R2.. - N2

H~r= i

~i=l 2j=l mij

N2+'

(4.22)

N.

2

*j.

has asymptotically a standard normal distribution if min(nl ' n2) -+ oo. Here ~(i) ~ ~ij k is the rank of Xijk among all the Ni observations under treatment i and ~!9"z3- = mij I ~ - ~

~r~(i)~ijk'For details, we refer to Brunner and Denker (1994).

4.4. Three-factor mixed models

4.4.1. Partially nested designs / repeated measurements on one fixed factor

4.4.1.1. Models and hypotheses. In this section, we consider two-factor repeated measurements designs where the repeated measurements are only taken on one factor, factor B say, and the subjects are nested within the levels of factor A. Therefore, this design is called a 'partially nested' design. In medical and psychological studies it appears either when different groups of subjects are observed under the same treatments for each subject or when subjects are divided randomly into several treatment groups and the outcomes are observed consecutively at several time points. Nonparametric procedures for this design have been considered by Brunner and Neumann (1984, 1986a) for the $2 \times 2$ design with heteroscedastic distributions. Designs with $a, b \ge 2$ levels have been considered by Thompson and Ammann (1990), Thompson (1991a) and Akritas (1993). Rank tests for the hypotheses considered in these papers as well as rank tests for other hypotheses are derived below. First we state the models.

Models. The linear model is commonly written as

$$X_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + S_{k(i)} + (BS)_{kj(i)} + \epsilon_{ijk} \tag{4.23}$$

where $\mu$ is the overall mean, $\alpha_i$ is the treatment effect of level $i$ of the fixed factor A, $\beta_j$ is the treatment effect of level $j$ of the fixed factor B on which the repeated measurements are taken, $(\alpha\beta)_{ij}$ is the effect of combination $(i, j)$ of the fixed factors (interaction); $S_{k(i)}$ are i.i.d. $N(0, \sigma_S^2)$ random variables representing the random subject effect which is nested under the (group) factor A; $(BS)_{kj(i)}$ are i.i.d. $N(0, \sigma_{BS}^2)$ random variables representing the random interaction between subjects and factor B, and the error terms $\epsilon_{ijk}$ are i.i.d. $N(0, \sigma_{\epsilon}^2)$. All random effects are assumed to be independent of all other random variables. The compound symmetry of the covariance matrix of $X_{ik} = (X_{i1k}, \ldots, X_{ibk})'$ has already been discussed in Section 4.1. The general model is derived from the general mixed model (4.1) by letting $r = a$ and $c = b$. For simplicity, we consider only designs with $m_{ij} = 1$ replication for subject $k$ and treatment $j$. However, the formulas given in this section can be generalized easily to the case of $m_{ij} \ge 1$ replications using the results from Section 4.2.


Thus, we consider independent vectors

$$X_{ik} = (X_{i1k}, \ldots, X_{ibk})', \qquad i = 1, \ldots, a, \; k = 1, \ldots, n_i, \tag{4.24}$$

where $X_{ijk} \sim F_{ij}(x)$, $i = 1, \ldots, a$, $j = 1, \ldots, b$. In a general model it is not appropriate to assume that the covariance matrix $S_{ik} = \mathrm{Cov}(X_{ik})$ is not changed under the treatment. Therefore, it is only reasonable to assume a compound symmetry of $S_{ik}$ if there is no treatment effect within group $i$, i.e., under $H_0^F(B \mid A): F_{i1} = \cdots = F_{ib}$, $i = 1, \ldots, a$.

Hypotheses. The treatment effect is formulated in terms of the marginal distribution functions and no special structure of $S_{ik}$ is assumed. The hypotheses are therefore essentially the same as in the two-way layout with independent observations (see Section 2.2.1). Let $F = (F_{11}, \ldots, F_{ab})'$. Then the hypotheses are formulated as

$H_0^F(A): (P_a \otimes \frac{1}{b} 1_b') F = 0$ (no main group effect),
$H_0^F(B): (\frac{1}{a} 1_a' \otimes P_b) F = 0$ (no main treatment effect),
$H_0^F(AB): (P_a \otimes P_b) F = 0$ (no interaction),
$H_0^F(A \mid B): (P_a \otimes I_b) F = 0$ (no simple group effect within the treatments),
$H_0^F(B \mid A): (I_a \otimes P_b) F = 0$ (no simple treatment effect within the groups).

Note that the model is not symmetric in the factors A and B since the subjects are nested under the groups. Therefore, two hypotheses for the simple factor effects are stated. We would like to point out that for the general model, the covariances between the observations $X_{ijk}$ and $X_{ij'k}$ within one subject $k$ may be arbitrary. It is only assumed that they do not depend on $k$ (independent replications) and that the covariance matrix is non-singular. Thus, $S_{ik} = S_i$, $k = 1, \ldots, n_i$, and $|S_i| \ne 0$, $i = 1, \ldots, a$. The implications of the hypotheses stated above for the general model (4.24) will follow from Proposition 2.9. Rank tests for the simple factor B effect $H_0^F(B \mid A)$ for the compound symmetry model have been considered by Thompson and Ammann (1990) under the assumption of absolutely continuous distribution functions. For the general multivariate model, Thompson (1991a) derived a rank test for the same hypothesis, and Akritas (1993) derived rank tests for both simple factor effects, i.e., for $H_0^F(A \mid B)$ and $H_0^F(B \mid A)$. In both papers, ties were excluded. Here we shall derive nonparametric tests for the partially nested design from the unified approach given in Section 4.2 for all the five hypotheses stated above and we do not assume that the distribution functions are continuous. For the $a \times 2$ design, we also consider the hypotheses $H_0^p(B)$, $H_0^p(B \mid A)$ and $H_0^p(AB)$ including heteroscedastic models. In the $2 \times 2$ design, we especially concentrate on the two-period cross-over design, a design which is frequently used in psychological and medical studies.


E. Brunner and M. L. Puri

4.4.1.2. Derivations of the statistics, $a, b \ge 2$

Estimators and notations. The statistics for the repeated measurements design are based on a consistent estimator of the vector of the generalized means $p = (p_{11}, \ldots, p_{ab})' = \int H \, dF$, where $H = N^{-1} \sum_{i=1}^{a} \sum_{j=1}^{b} n_i F_{ij}$ and $F = (F_{11}, \ldots, F_{ab})'$ is the vector of all the distribution functions. Let $\hat F = (\hat F_{11}, \ldots, \hat F_{ab})'$, where $\hat F_{ij}(x) = n_i^{-1} \sum_{k=1}^{n_i} c(x - X_{ijk})$ is the empirical distribution function of $F_{ij}$. Let $R_{ijk}$ be the rank of $X_{ijk}$ among all the $N = b \sum_{i=1}^{a} n_i$ observations, and let $R_{ik} = (R_{i1k}, \ldots, R_{ibk})'$ be the vector of the ranks for subject k within group i. Let $\bar R_{ij\cdot} = n_i^{-1} \sum_{k=1}^{n_i} R_{ijk}$, let $\bar R_{i\cdot} = (\bar R_{i1\cdot}, \ldots, \bar R_{ib\cdot})'$ be the vector of the rank means within group $i = 1, \ldots, a$, and let $\bar R_{\cdot} = (\bar R_{1\cdot}', \ldots, \bar R_{a\cdot}')'$. Furthermore, let $\bar R_{\cdot\cdot} = (\bar R_{\cdot 1 \cdot}, \ldots, \bar R_{\cdot b \cdot})'$, where $\bar R_{\cdot j \cdot} = n_{\cdot}^{-1} \sum_{i=1}^{a} \sum_{k=1}^{n_i} R_{ijk}$ and $n_{\cdot} = \sum_{i=1}^{a} n_i = N/b$.

Let C be any suitable contrast matrix. Then, under $H_0^F$: $CF = 0$, the statistics $\sqrt{N}\, C \int \hat H \, d\hat F = N^{-1/2} C \bar R_{\cdot}$ and $\sqrt{N}\, C \int H \, d\hat F = \sqrt{N}\, C \bar Y_{\cdot}$ are asymptotically equivalent. (This follows from Theorem 4.2.) Here, $\bar Y_{\cdot} = (\bar Y_{1\cdot}', \ldots, \bar Y_{a\cdot}')'$, where $\bar Y_{i\cdot} = (\bar Y_{i1\cdot}, \ldots, \bar Y_{ib\cdot})'$, $i = 1, \ldots, a$, and $\bar Y_{ij\cdot} = n_i^{-1} \sum_{k=1}^{n_i} Y_{ijk}$ with $Y_{ijk} = H(X_{ijk})$. Note that the vectors $Y_{ik} = (Y_{i1k}, \ldots, Y_{ibk})'$ are independent by assumption. We need to estimate the covariance matrix $V_i = \mathrm{Cov}(\sqrt{N}\, \bar Y_{i\cdot})$. A consistent estimator

$$\hat V_i = \frac{1}{N n_i (n_i - 1)} \sum_{k=1}^{n_i} (R_{ik} - \bar R_{i\cdot})(R_{ik} - \bar R_{i\cdot})' \qquad (4.25)$$

of $V_i$ follows from Theorem 4.3 by letting $m_{ij} = 1$. We denote by

$$W_i = V_i^{-1} \left( I_b - J_b V_i^{-1} / (1_b' V_i^{-1} 1_b) \right) \qquad (4.26)$$

the contrast matrix defined in 1.5. Note that $W_i V_i W_i = W_i$ and $W_i F_i = 0$ iff $P_b F_i = 0$, $i = 1, \ldots, a$, where $F_i = (F_{i1}, \ldots, F_{ib})'$. An estimator of $W_i$, obtained by replacing $V_i^{-1}$ by $\hat V_i^{-1}$, is given by

$$\hat W_i = \hat V_i^{-1} \left( I_b - J_b \hat V_i^{-1} / \hat s^{(i)}_{\cdot\cdot} \right) \qquad (4.27)$$

where $\hat s^{(i)}_{\cdot\cdot} = \sum_{j=1}^{b} \sum_{j'=1}^{b} \hat s^{(i)}_{jj'}$ and $\hat s^{(i)}_{jj'}$ is the $(j, j')$-element of $\hat V_i^{-1}$. Further notation will be explained when it is needed for the derivation of the statistics.

Statistics

Test for the treatment effect B. A test statistic for $H_0^F(B)$: $(\frac{1}{a} 1_a' \otimes P_b) F = 0$ is derived here. It follows from Theorem 4.2 that the statistics $\sqrt{N} (\frac{1}{a} 1_a' \otimes I_b) \hat p$ and

Nonparametric methods in design and analysis of experiments


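$\sqrt{N} (\frac{1}{a} 1_a' \otimes I_b) \int H \, d\hat F = \sqrt{N} (\frac{1}{a} 1_a' \otimes I_b) \bar Y_{\cdot}$ are asymptotically equivalent under $H_0^F$. The covariance matrix of $\sqrt{N} (\frac{1}{a} 1_a' \otimes I_b) \bar Y_{\cdot}$ is

$$V = \mathrm{Cov}(\sqrt{N}\, \bar Y_{\cdot\cdot}) = \frac{1}{a^2} \sum_{i=1}^{a} \mathrm{Cov}(\sqrt{N}\, \bar Y_{i\cdot}) = \frac{1}{a^2} \sum_{i=1}^{a} V_i.$$

Let $W = V^{-1} ( I_b - J_b V^{-1} / (1_b' V^{-1} 1_b) )$. Then $W V W = W$ and $(\frac{1}{a} 1_a' \otimes P_b) F = 0$ iff $(\frac{1}{a} 1_a' \otimes W) F = 0$. Note that $\mathrm{rank}(W) = b - 1$. A consistent estimator of V follows from (4.25), namely

$$\hat V = \frac{1}{a^2} \sum_{i=1}^{a} \frac{1}{N n_i (n_i - 1)} \sum_{k=1}^{n_i} (R_{ik} - \bar R_{i\cdot})(R_{ik} - \bar R_{i\cdot})'$$

since the vectors $Y_{ik}$ are independent. Let $\hat W = \hat V^{-1} ( I_b - J_b \hat V^{-1} / \hat s_{\cdot\cdot} )$ be an estimator of W, where $V^{-1}$ is replaced by $\hat V^{-1}$ and $\hat s_{\cdot\cdot} = \sum_{j=1}^{b} \sum_{j'=1}^{b} \hat s_{jj'}$, where $\hat s_{jj'}$ is the $(j, j')$-element of $\hat V^{-1}$. The statistic for testing $H_0^F(B)$ is based on the vector $N^{-1/2} (\frac{1}{a} 1_a' \otimes \hat W) \bar R_{\cdot} = N^{-1/2} \hat W \bar R_{\cdot\cdot}$. Note that under $H_0^F(B)$, $N^{-1/2} \hat W \bar R_{\cdot\cdot}$ is asymptotically equivalent to $\sqrt{N}\, W \bar Y_{\cdot\cdot}$. Thus, under $H_0^F(B)$, the quadratic form

$$Q_N(B) = \frac{1}{N} \bar R_{\cdot\cdot}' \hat W \bar R_{\cdot\cdot} = \frac{1}{N} \left[ \sum_{j=1}^{b} \sum_{j'=1}^{b} \bar R_{\cdot j \cdot}\, \hat s_{jj'}\, \bar R_{\cdot j' \cdot} - \frac{1}{\hat s_{\cdot\cdot}} \left( \sum_{j=1}^{b} \bar R_{\cdot j \cdot}\, \hat s_{j \cdot} \right)^2 \right], \qquad \hat s_{j\cdot} = \sum_{j'=1}^{b} \hat s_{jj'}, \qquad (4.28)$$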

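has asymptotically a $\chi_f^2$-distribution with $f = b - 1$ d.f., and $Q_N(B)$ has the RT-property with respect to a multivariate parametric statistic with covariance matrix V.

Test for the group effect A. The hypothesis of no main effect A is formulated as $H_0^F(A)$: $(P_a \otimes \frac{1}{b} 1_b') F = 0$, and the asymptotic distribution of $\sqrt{N} (I_a \otimes \frac{1}{b} 1_b') \int \hat H \, d\hat F$ is considered under $H_0^F(A)$. Note that the statistics $\sqrt{N} (I_a \otimes \frac{1}{b} 1_b') \hat p$ and $\sqrt{N} (I_a \otimes \frac{1}{b} 1_b') \int H \, d\hat F = \sqrt{N} (I_a \otimes \frac{1}{b} 1_b') \bar Y_{\cdot}$ are asymptotically equivalent under $H_0^F$. Let $\sigma_i^2 = \mathrm{Var}(\bar Y_{i \cdot\cdot})$, where $\bar Y_{i\cdot\cdot} = n_i^{-1} \sum_{k=1}^{n_i} \bar Y_{i \cdot k}$, and note that the random variables $\bar Y_{i \cdot k} = b^{-1} \sum_{j=1}^{b} Y_{ijk}$ are independent. Then $V = \mathrm{Cov}(\sqrt{N} (I_a \otimes \frac{1}{b} 1_b') \bar Y_{\cdot}) = \mathrm{diag}\{\tau_1^2, \ldots, \tau_a^2\}$, where $\tau_i^2 = N \sigma_i^2$, and $\sigma_i^2$ is estimated consistently by $\hat\sigma_i^2 = [N^2 n_i (n_i - 1)]^{-1} S_i^2$, where $S_i^2 = \sum_{k=1}^{n_i} (\bar R_{i \cdot k} - \bar R_{i \cdot\cdot})^2$. Thus, $\hat\tau_i^2 = [N n_i (n_i - 1)]^{-1} S_i^2$ is consistent for $\tau_i^2$. Let

$$W = V^{-1} \left( I_a - J_a V^{-1} \Big/ \sum_{i=1}^{a} [1/\tau_i^2] \right)$$

and let $\hat W$ be the matrix corresponding to W where $\tau_i^2$ is replaced by $\hat\tau_i^2$. Let $\bar R_{i \cdot\cdot} = (b n_i)^{-1} \sum_{k=1}^{n_i} \sum_{j=1}^{b} R_{ijk}$. By the same arguments as in the previous paragraphs, it follows under $H_0^F$ that the quadratic form

$$Q_N(A) = \frac{1}{N} (\bar R_{1\cdot\cdot}, \ldots, \bar R_{a\cdot\cdot})\, \hat W\, (\bar R_{1\cdot\cdot}, \ldots, \bar R_{a\cdot\cdot})' = \sum_{i=1}^{a} \frac{n_i (n_i - 1)}{S_i^2} \left( \bar R_{i\cdot\cdot} - \left[ \sum_{r=1}^{a} \frac{n_r (n_r - 1)}{S_r^2} \right]^{-1} \sum_{r=1}^{a} \frac{n_r (n_r - 1)}{S_r^2} \bar R_{r\cdot\cdot} \right)^2 \qquad (4.29)$$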

has asymptotically a $\chi_f^2$-distribution with $f = a - 1$ d.f., and $Q_N(A)$ has the RT-property with respect to a parametric statistic with heteroscedastic errors.

Test for the interaction AB. For the hypothesis of no interaction, $H_0^F(AB)$: $F_{ij} = F_{i\cdot} + F_{\cdot j} - F_{\cdot\cdot}$, we use the contrast matrix $C = C_A \otimes C_B$, where

$$C_A = (1_{a-1} \,\vdots\, -I_{a-1}) \quad \text{and} \quad C_B = (1_{b-1} \,\vdots\, -I_{b-1}).$$

Both contrast matrices are of full row rank. Thus, the hypothesis is written as $H_0^F(AB)$: $(C_A \otimes C_B) F = 0$, and we consider the asymptotic distribution of $\sqrt{N} (C_A \otimes C_B) \hat p$, which is asymptotically equivalent (see Theorem 4.2) to the statistic $\sqrt{N} (C_A \otimes C_B) \int H \, d\hat F = \sqrt{N} (C_A \otimes C_B) \bar Y_{\cdot}$ under $H_0^F(AB)$. The covariance matrix

$$V = \mathrm{Cov}(\sqrt{N}\, \bar Y_{\cdot}) = \bigoplus_{i=1}^{a} V_i$$

is estimated consistently by

$$\hat V = \bigoplus_{i=1}^{a} \hat V_i,$$

where $\hat V_i$ is given in (4.25). Let $\bar R_{\cdot} = (\bar R_{11\cdot}, \ldots, \bar R_{ab\cdot})'$. Then the quadratic form

$$Q_N(AB) = \frac{1}{N} \bar R_{\cdot}' (C_A' \otimes C_B') \left[ (C_A \otimes C_B) \hat V (C_A' \otimes C_B') \right]^{-1} (C_A \otimes C_B) \bar R_{\cdot}$$

[...]

[...] 2 subgroups of experimental units are covered by this model. The well known two-period cross-over design (which is frequently used in pharmaceutical studies with volunteers or in psychological learning experiments) is studied separately in the last subsection. To define treatment effects and to formulate hypotheses, we use the conditional generalized means $w_i = \int F_{i1} \, dF_{i2}$, which are generalized treatment effects within group $i = 1, \ldots, a$. For the case of $a = b = 2$, we also consider the group effect (main effect of factor A), and we define the generalized group effect as $g = g_1 + g_2$, where $g_j = \int F_{1j} \, dF_{2j}$, $j = 1, 2$. In what follows, we consider the hypotheses

1. $H_0^w(B)$: $\bar w_{\cdot} = \frac{1}{a} \sum_{i=1}^{a} w_i = 1/2$, no treatment effect (main effect B),
2. $H_0^w(AB)$: $\rho = \sum_{i=1}^{a} (w_i - \bar w_{\cdot})^2 = 0$, no AB interaction,
3. $H_0^w(B \mid A)$: $\nu = \sum_{i=1}^{a} (w_i - 1/2)^2 = 0$, no simple factor B effect,
4. $H_0^g(A)$: $g = 1$, no group effect (main effect A), if $a = 2$.

The relations to the hypotheses in the linear model follow in the same way as in the 2 x 2 design (Proposition 2.9) in Section 2.2.1.

Estimators and notations. The quantities $w_i$ are estimated consistently by

[...]

[...] $t_\alpha(m, b, n, \ldots, n)$, (59)

where $t_\alpha(m, b, n, \ldots, n)$ is the upper $\alpha$th percentile of the null distribution of $T_2$. The percentiles are tabulated by De Kroon and Van der Laan (1981) for various values of m, b and n in the range $2 \le m \le 4$, $2 \le b \le 10$, and $2 \le n \le 4$. These authors state that the asymptotic distribution [as $n \to \infty$] of $T_2$ is chi-square with $(b - 1)(m - 1)$ degrees of freedom and, consequently, the large sample approximation to procedure (59) is

reject the null hypothesis if and only if $T_2 \ge \chi_\alpha^2((m - 1)(b - 1))$, (60)

where $\chi_\alpha^2(q)$ is the upper $\alpha$th percentile of the chi-square distribution with q degrees of freedom. When a null hypothesis of additivity or of no rank interaction is rejected, the individual cells can be compared using the techniques of the one-way layout. If the null hypothesis is not rejected, then analysis of the main effects is of interest (Sections 3.2.3 and 3.2.4).

3.2.2. Example - testing for non-additivity, multiple observations per cell

We use the welding data in Table 3 of Experiment 1.4 to illustrate the analysis of a two-way layout with an equal number of observations per cell. The hypothesis of additivity $H_0^{add}$ (54) can be tested using the RGLM method. Since this method can also be used for higher-way layouts, we have postponed the illustration until Section 4.3.2. As an alternative, we may wish to test the hypothesis (56) of no rank interaction. Ranking the observations in Table 3 within the b = 3 levels of the gage bar settings (and using average ranks to break the ties), we obtain the rank vectors

$R_1$ = (1, 2, 3, 5.5, 8.5, 10, 7, 4, 5.5, 8.5),
$R_2$ = (6.5, 8, 4.5, 2.5, 9, 10, 6.5, 1, 4.5, 2.5),
$R_3$ = (4.5, 2, 7, 3, 4.5, 1, 8, 9, 10, 6),

giving

$$T_2 = \frac{12}{(4)(5)(11)} \left\{ (3^2 + 14.5^2 + \cdots + 16^2) - \frac{1}{3} (24^2 + 25.5^2 + \cdots + 37^2) \right\} = 15.38.$$

No tables are given by De Kroon and Van der Laan (1981) for m = 5, b = 3, and n = 2. Here, n is not large, and therefore we should not expect the chi-square distribution with $(b - 1)(m - 1) = 8$ degrees of freedom to be an accurate approximation to the true distribution of $T_2$. However, if we do compare the value $T_2 = 15.38$ with the

A. M. Dean and D. A. Wolfe


percentiles of the $\chi^2(8)$ distribution, we see that 15.38 is around the 95th percentile. Table 5.3.6 of De Kroon and Van der Laan (1981) suggests that the percentile of the exact distribution is likely to be slightly lower, so there is some evidence, at a significance level of just under $\alpha = 0.05$, to reject the hypothesis (56) of no rank interaction and to conclude that there is rank interaction of U with B.

3.2.3. Main effect tests designed for general alternatives - at least one observation per cell

In this subsection, we consider two-way layout settings with at least one observation per cell (that is, $n_{ij} \ge 1$ for every $i = 1, \ldots, b$ and $j = 1, \ldots, m$). We assume that the additive model (50) holds and we wish to test the null hypothesis of no effect of factor U,

$$H_0^\tau: [\tau_j \text{ are all equal}, \; j = 1, \ldots, m] \qquad (61)$$

against the general class of alternatives

$$H_1^\tau: [\text{not all } \tau_j\text{'s equal}]. \qquad (62)$$

Let $R_{ijk}$ be the rank of $Y_{ijk}$ within the $n_{i\cdot} = \sum_{j=1}^{m} n_{ij}$ observations on the ith level of B, for $j = 1, \ldots, m$ and $k = 1, \ldots, n_{ij}$. For each level of U ($j = 1, \ldots, m$), compute the sum of 'cell-wise weighted' average ranks, given by

$$S_j = \sum_{i=1}^{b} \left[ \frac{R_{ij\cdot}}{n_{i\cdot}} \right], \qquad (63)$$

where $R_{ij\cdot} = \sum_{k=1}^{n_{ij}} R_{ijk}$. Define the vector R to be

$$R = (S_1 - E_0[S_1], \ldots, S_{m-1} - E_0[S_{m-1}])' = \left( S_1 - \sum_{i=1}^{b} [n_{i1}(n_{i\cdot} + 1)/2n_{i\cdot}], \; \ldots, \; S_{m-1} - \sum_{i=1}^{b} [n_{i,m-1}(n_{i\cdot} + 1)/2n_{i\cdot}] \right)'. \qquad (64)$$

Note that we have chosen to define R without a term for the mth level of U. The $S_j$'s are linearly dependent, since a weighted linear combination of all m of them is a constant. We could omit any one of them in the definition of R and the test would be exactly the same no matter which one was chosen for omission. The covariance matrix for R under $H_0^\tau$ (61) has the form $\Sigma_0 = ((\sigma_{st}))$, where

$$\sigma_{st} = \sum_{i=1}^{b} [n_{is}(n_{i\cdot} - n_{is})(n_{i\cdot} + 1)/12 n_{i\cdot}^2], \quad \text{for } s = t = 1, \ldots, m - 1,$$

$$\sigma_{st} = -\sum_{i=1}^{b} [n_{is} n_{it} (n_{i\cdot} + 1)/12 n_{i\cdot}^2], \quad \text{for } s \ne t = 1, \ldots, m - 1. \qquad (65)$$

Letting $\Sigma_0^{-1}$ denote the inverse of $\Sigma_0$, a general test statistic proposed by Mack and Skillings (1980) is

$$MS = R' \Sigma_0^{-1} R, \qquad (66)$$

and the associated level $\alpha$ test of $H_0^\tau$ (61) versus $H_1^\tau$ (62) is

reject $H_0^\tau$ if and only if $MS \ge w_\alpha(b, m, n_{11}, \ldots, n_{bm})$, (67)

where $w_\alpha(b, m, n_{11}, \ldots, n_{bm})$ is the upper $\alpha$th percentile for the null sampling distribution of MS (66). Values of $w_\alpha(b, m, n_{11}, \ldots, n_{bm})$ are available in the literature for equal $n_{ij}$ (see the subsection below) but not for arbitrary replications. However, the asymptotic distribution [as $N \to \infty$] of MS under $H_0^\tau$ is chi-square with $m - 1$ degrees of freedom, where $N = \sum_{i=1}^{b} \sum_{j=1}^{m} n_{ij}$. Thus, when N is large, the approximation to procedure (67) is

reject $H_0^\tau$ if and only if $MS \ge \chi_\alpha^2(m - 1)$, (68)

where $\chi_\alpha^2(m - 1)$ is the upper $\alpha$th percentile of the chi-square distribution with $m - 1$ degrees of freedom. Mack and Skillings (1980) note that this chi-square approximation tends to be conservative, especially for small levels of $\alpha$.
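The construction in (63) to (66) can be followed step by step in code. The block below is a minimal sketch in plain Python, not the authors' implementation: the function names (`average_ranks`, `solve`, `mack_skillings`) and the input layout `cells[i][j]` (a list of observations for level i of B and level j of U) are assumptions made for illustration, and the linear system is solved by a small hand-written Gaussian elimination rather than a library routine.

```python
def average_ranks(values):
    """Ranks 1..len(values); tied values receive the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        avg = (pos + end) / 2 + 1  # average of the 1-based positions pos+1..end+1
        for t in range(pos, end + 1):
            ranks[order[t]] = avg
        pos = end + 1
    return ranks


def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x


def mack_skillings(cells):
    """Return (MS, S) where MS follows (66) and S is the list of S_j from (63)."""
    b, m = len(cells), len(cells[0])
    n = [[len(cells[i][j]) for j in range(m)] for i in range(b)]
    n_row = [sum(n[i]) for i in range(b)]  # n_i.
    S = [0.0] * m
    for i in range(b):
        # rank all observations on level i of B together
        ranks = average_ranks([y for j in range(m) for y in cells[i][j]])
        start = 0
        for j in range(m):
            S[j] += sum(ranks[start:start + n[i][j]]) / n_row[i]
            start += n[i][j]
    # centred vector R of (64), omitting the m-th level of U
    R = [S[j] - sum(n[i][j] * (n_row[i] + 1) / (2 * n_row[i]) for i in range(b))
         for j in range(m - 1)]
    # covariance matrix Sigma_0 of (65)
    sigma = [[0.0] * (m - 1) for _ in range(m - 1)]
    for s in range(m - 1):
        for t in range(m - 1):
            for i in range(b):
                w = (n_row[i] + 1) / (12 * n_row[i] ** 2)
                if s == t:
                    sigma[s][t] += n[i][s] * (n_row[i] - n[i][s]) * w
                else:
                    sigma[s][t] -= n[i][s] * n[i][t] * w
    x = solve(sigma, R)  # x = Sigma_0^{-1} R
    return sum(r * xi for r, xi in zip(R, x)), S
```

Solving $\Sigma_0 x = R$ instead of forming $\Sigma_0^{-1}$ explicitly keeps the sketch short; for equal cell sizes the result should agree with the closed form given in the next subsection.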

Equal cell replications

In the special case where we have an equal number of replications for each of the bm combinations of levels of factors U and B, the statistic MS (66) permits a closed form expression. Let $n_{11} = n_{12} = \cdots = n_{bm} = n$. Then $N = bmn$ and $n_{i\cdot} = mn$ for $i = 1, \ldots, b$, and the Mack-Skillings statistic MS (66) can be written in closed form as

$$MS = [12/m(N + b)] \sum_{j=1}^{m} (m S_j - (N + b)/2)^2. \qquad (69)$$

The associated level $\alpha$ test of $H_0^\tau$ versus $H_1^\tau$ is, then,

reject $H_0^\tau$ if and only if $MS \ge w_\alpha(b, m, n, \ldots, n)$. (70)

The exact critical values $w_\alpha(b, m, n, \ldots, n)$, some obtained via simulation of the exact null distribution, can be found in Mack and Skillings (1980) for selected significance levels $\alpha$ and b = 2(1)5, m = 2(1)5, and common number of replications n = 2(1)5.


3.2.4. Two-sided all treatments multiple comparisons - equal numbers of observations per cell

In this subsection, we discuss a multiple comparison procedure that is designed to make two-sided decisions about all $m(m - 1)/2$ pairs of treatment effects when we have an equal number n of replications from each combination of factors B and U in a two-way layout. It is appropriate as a follow-up procedure to the equal-replication Mack-Skillings test in the previous subsection and was proposed by Mack and Skillings (1980). Let $S_j$ be as given in (63) with $n_{i\cdot} = mn$, for $j = 1, \ldots, m$. At an experimentwise error rate no greater than $\alpha$, the Mack-Skillings two-sided all-treatments multiple comparison procedure reaches its $m(m - 1)/2$ decisions through the criterion

decide $\tau_u \ne \tau_v$ if and only if $|S_u - S_v| \ge [w_\alpha (N + b)/6]^{1/2}$, (71)

where $N = bmn$ is the total number of observations and $w_\alpha = w_\alpha(b, m, n, n, \ldots, n)$ is the upper $\alpha$th percentile for the null sampling distribution of the Mack-Skillings statistic MS (69). The multiple comparison procedure in (71) guarantees that the probability is at least $1 - \alpha$ that we will make all of the correct decisions under the strict null hypothesis that all of the treatment effects are equal, and it controls the experimentwise error rate over the entire class of continuous distributions for the error terms. The procedure (71) often requires an experimentwise error rate higher than the p-value associated with any previous test in order to find the most important differences between the various treatments.

When the total number of observations is large, the critical value $[w_\alpha (N + b)/6]^{1/2}$ can be approximated by $[(N + b)/12]^{1/2} q_\alpha$, where $q_\alpha$ is the upper $\alpha$th percentile for the distribution of the range of m independent N(0, 1) variables. Thus, the approximation to procedure (71) for N large is

decide $\tau_u \ne \tau_v$ if and only if $|S_u - S_v| \ge q_\alpha [(N + b)/12]^{1/2}$. (72)

Values of $q_\alpha$ ($\alpha$ = .0001, .0005, .001, .005, .01, .025, .05, .10, .20; m = 3(1)20(2)40(10)100) can be found, for example, in Hollander and Wolfe (1973, Table A.10). Mack and Skillings (1980) note that the procedure (71) is rather conservative; that is, the true experimentwise error rate might be a good deal smaller than the bound $\alpha$ provided by (71). As a result, they recommend using the approximation (72) whenever the number of observations is reasonably large.

3.2.5. Example - main effects tests and multiple comparisons, two-way layout, multiple observations per cell

We use the rocket thrust duration data in Table 4 of Experiment 1.5 to illustrate the Mack-Skillings test of the hypothesis $H_0^\tau$ (61) against the general alternative hypothesis $H_1^\tau$ (62). For these data, the RGLM test (Section 4.3) would indicate no interaction between the factors. Similarly, the test (59) would indicate no rank interaction. Consequently, the additive model (50) is an adequate representation of the response.

Nonparametric analysis of experiments


Table 8
Ranks within levels of altitude cycling of the rocket thrust duration data

Altitude    Temperature
cycling     1                     2             3                   4
1           15.5, 15.5, 13, 11    3, 1, 5, 4    9, 14, 10, 12       7, 2, 8, 6
2           12, 14.5, 16, 14.5    4, 1, 3, 2    11, 13, 9.5, 9.5    5, 8, 6, 7

We let factor B denote the b = 2 levels of altitude cycling and let the levels of factor U denote the m = 4 levels of temperature. The ranks of the data in Table 4 within levels of B (with average ranks used to break the ties) are shown in Table 8. To test the hypothesis $H_0^\tau$ (61) against the alternative hypothesis $H_1^\tau$ (62), we first form the sum of cell-wise weighted average ranks (63), with $n_{i\cdot} = mn = 16$, and obtain

$S_1 = 7$, $S_2 = 1.4375$, $S_3 = 5.5$, $S_4 = 3.0625$.

Since we have an equal number n = 4 of observations per cell, we use the closed form Mack-Skillings statistic (69) with $N = bmn = 32$, giving

$$MS = [12/4(32 + 2)] \{ (4 \times 7 - 17)^2 + \cdots + (4 \times 3.0625 - 17)^2 \} = 26.04.$$

Comparing this value with the exact null distribution tables for MS in Mack and Skillings (1980) with m = 4, b = 2 and n = 4, we find that the p-value for these data is considerably smaller than 0.01. Thus there is strong evidence to suggest that temperature has an effect on the rocket thrust duration.

In order to determine which temperatures differ in their effects on the thrust duration, we use the Mack-Skillings multiple comparison procedure (71) and, selecting a significance level of $\alpha = 0.0994$, we

decide $\tau_u \ne \tau_v$ if and only if $|S_u - S_v| \ge [6.243(32 + 2)/6]^{1/2} = 5.948$.

Thus, using the above values of $S_j$ (j = 1, 2, 3, 4), our 6 decisions, at an experimentwise error rate of 0.0994, are that no two treatments differ in their effect on the rocket thrust duration (although at a slightly larger experimentwise error rate, we would conclude a difference between the effects of the first two levels of temperature). This indicates the conservative nature of the decision procedure, as pointed out by Mack and Skillings (1980).
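The arithmetic of this example can be checked with a few lines of Python. The sketch below simply evaluates the closed form (69) and the critical difference in (71) for the values of $S_j$ quoted above; the function names are illustrative, and the percentile 6.243 is taken from the text rather than computed.

```python
from itertools import combinations
from math import sqrt


def mack_skillings_equal(S, b, n):
    """Closed-form Mack-Skillings statistic (69) for equal cell sizes n."""
    m = len(S)
    N = b * m * n
    return 12.0 / (m * (N + b)) * sum((m * s - (N + b) / 2.0) ** 2 for s in S)


def mack_skillings_pairs(S, b, n, w_alpha):
    """Critical difference and pairwise decisions of procedure (71)."""
    m = len(S)
    N = b * m * n
    crit = sqrt(w_alpha * (N + b) / 6.0)
    decisions = {(u + 1, v + 1): abs(S[u] - S[v]) >= crit
                 for u, v in combinations(range(m), 2)}
    return crit, decisions


# Values quoted in the rocket thrust example (b = 2, m = 4, n = 4).
S = [7.0, 1.4375, 5.5, 3.0625]
ms = mack_skillings_equal(S, b=2, n=4)
crit, decisions = mack_skillings_pairs(S, b=2, n=4, w_alpha=6.243)

print(round(ms, 2))               # 26.04
print(round(crit, 3))             # 5.948
print(any(decisions.values()))    # False: no pair exceeds the critical difference
```

The largest observed difference, $|S_1 - S_2| = 5.5625$, falls just short of 5.948, which is why a slightly larger experimentwise error rate would separate the first two temperature levels.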


3.3. Two-way layout - one observation per cell

3.3.1. Testing for non-additivity - one observation per cell

The test statistic MCRA was proposed by Hartlaub et al. (1993) for testing $H_0^{add}$ (54) under the two-way model (49) with one observation per cell. The procedure is as follows. Arrange the data in a two-way table, with the levels of factor B defining the rows and those of factor U defining the columns. Subtract the jth column average from the data values in the jth column ($j = 1, \ldots, m$) to give $Y_{ij} - \bar Y_{\cdot j}$ in the (ij)th cell ($i = 1, \ldots, b$; $j = 1, \ldots, m$). This is called aligning in the columns and removes the effect of factor U from the data in the table. (The column median can be subtracted instead, but this does not completely remove the effect of factor U from the test statistic.) Then, rank the aligned data values $Y_{i1} - \bar Y_{\cdot 1}, Y_{i2} - \bar Y_{\cdot 2}, \ldots, Y_{im} - \bar Y_{\cdot m}$ in the ith row to give the rank vector $R_i = (R_{i1}, R_{i2}, \ldots, R_{im})$, for each $i = 1, \ldots, b$. This removes the effect of factor B from the aligned data values in the cells. Any remaining variation is due to non-additivity. The $m(m - 1)/2$ statistics $W_{jj'}$ are computed as

$$W_{jj'} = \sum_{1 \le i < i' \le b} (R_{ij} - R_{i'j} - R_{ij'} + R_{i'j'})^2 \qquad (73)$$

[...]

[...] $l_\alpha(m, b)$, (84)

where $l_\alpha(m, b)$ is the upper $\alpha$th percentile for the null distribution of the statistic L (83). The critical values $l_\alpha(m, b)$ are given for $\alpha$ = .05, .01 and .001 by Page (1963), and also by Hollander and Wolfe (1973, Table A.16), for m = 3, b = 2(1)20 and m = 4(1)8, b = 2(1)12. Additional critical values are given by Odeh (1977b) for $\alpha$ = .2, .1, .025, .005 and m = 3(1)8, b = 2(1)10. When the number of levels of B is large, the appropriate asymptotic distribution of a standardized form of L can be used to provide approximate critical values for the test procedure. When the null hypothesis $H_0^\tau$ (61) is true, the standardized statistic

$$L^* = [L - E_0(L)]/[\mathrm{var}_0(L)]^{1/2} \qquad (85)$$

has an asymptotic distribution [as $b \to \infty$] that is standard normal, where

$$E_0(L) = bm(m + 1)^2/4 \quad \text{and} \quad \mathrm{var}_0(L) = bm^2(m + 1)^2(m - 1)/144 \qquad (86)$$

are the expected value and variance, respectively, of L under the null hypothesis $H_0^\tau$. Thus the approximation to procedure (84) for b large is

reject $H_0^\tau$ if and only if $L^* \ge z_\alpha$, (87)

where $z_\alpha$ is the upper $\alpha$th percentile of the standard normal distribution.

3.3.6. One-sided treatment versus control multiple comparisons - one observation per cell

In this subsection, we present a multiple comparison procedure that is designed for the two-way layout to make one-sided decisions about individual differences in the effects of levels $2, \ldots, m$ of factor U relative to the effect of a single control population corresponding to level 1 of U. In the context of this paper, it is to be viewed as an appropriate follow-up procedure to rejection of $H_0^\tau$ (61) with either the Friedman procedure of Section 3.3.2 or the Page ordered alternatives procedure of Section 3.3.5 when one of the populations corresponds to a control population. Let $R_{i1}, \ldots, R_{im}$ be the ranks of the data within the ith level of factor B, $i = 1, \ldots, b$, and let $R_{\cdot j}$ be the sum of these ranks assigned to the jth level of U, for $j = 1, \ldots, m$. At an experimentwise error rate of $\alpha$, the procedure described by Nemenyi (1963), Wilcoxon and Wilcox (1964), and Miller (1966) reaches its $m - 1$ decisions through the criterion

decide $\tau_u > \tau_1$ if and only if $(R_{\cdot u} - R_{\cdot 1}) \ge r_\alpha^*(m, b)$, (88)

where the critical value $r_\alpha^*(m, b)$ satisfies the probability restriction

$$P_0([R_{\cdot u} - R_{\cdot 1}] < r_\alpha^*(m, b), \; u = 2, \ldots, m) = 1 - \alpha, \qquad (89)$$

with the probability $P_0(\cdot)$ computed under $H_0^\tau$ (61). Values of $r_\alpha^*(m, b)$ can be found in Hollander and Wolfe (1973, Table A.18) for m = 3, b = 2(1)18 and m = 4, b = 2(1)5, and additional tables for m = 2(1)5, b = 2(1)8 and m = 6, b = 2(1)6 in Odeh (1977c). When the number of levels of factor B is large, the critical value $r_\alpha^*(m, b)$ can be approximated by the constant $q_{\alpha,1/2}^* [bm(m + 1)/6]^{1/2}$, where $q_{\alpha,1/2}^*$ is the upper $\alpha$th percentile for the distribution of the maximum of $m - 1$ N(0, 1) variables with common correlation $\rho = 1/2$. Thus the approximation to procedure (88) for b large is

decide $\tau_u > \tau_1$ if and only if $(R_{\cdot u} - R_{\cdot 1}) \ge q_{\alpha,1/2}^* [bm(m + 1)/6]^{1/2}$. (90)

Selected values of $q_{\alpha,1/2}^*$ for m = 3(1)13 can be found in Hollander and Wolfe (1973, Table A.13).


3.3.7. Example - two-way layout, no replications, ordered alternatives

To illustrate the test of $H_0^\tau$ (61) against the ordered alternative (82), we use the air velocity data from Experiment 1.6. We are interested in testing for possible differences in effects among the m = 6 Reynolds numbers, which we regard as levels of factor U. In this setting, it is reasonable to expect that, if there is a significant treatment effect associated with the Reynolds number, this effect will be a monotonically increasing function of the number. Therefore, we are interested in testing $H_0^\tau$: $[\tau_1 = \tau_2 = \cdots = \tau_6]$ versus the ordered alternative $[\tau_1 \le \tau_2 \le \cdots \le \tau_6$, with at least one strict inequality]. The b = 3 rib heights are the levels of factor B. We rank the m = 6 observations from least to greatest within each of the rib height levels (using average ranks to break the one tie). This gives the following rank sums for each of the six Reynolds numbers:

$R_{\cdot 1} = 4$, $R_{\cdot 2} = 5.5$, $R_{\cdot 3} = 8.5$, $R_{\cdot 4} = 12$, $R_{\cdot 5} = 16$, and $R_{\cdot 6} = 17$.

Page's test statistic (83) is then L = 270.5. If we compare this value with the exact tables for L (Table A.16 in Hollander and Wolfe (1973), for example) with m = 6 and b = 3, we see that $270.5 > l_{.001}(6, 3) = 260$, so that the p-value for these data is less than .001. Thus, there is strong evidence that the distance (from the center of the rod) of maximum air velocity is a monotonically increasing function of the Reynolds number.

In order to detect which of the Reynolds numbers yield maximum air velocities further from the center of the rod than for a baseline Reynolds number (designated as level 1 of U), we apply the Nemenyi-Wilcoxon-Wilcox-Miller one-sided treatment versus control procedure (88). Taking our experimentwise error rate to be $\alpha = .03475$, we obtain $r_{.03475}^*(6, 3) = 11$ from Table I in Odeh (1977c), with m = 6 and b = 3. Using the above values of the within-B rank sums, we see that

$(R_{\cdot 2} - R_{\cdot 1}) = 1.5$, $(R_{\cdot 3} - R_{\cdot 1}) = 4.5$, $(R_{\cdot 4} - R_{\cdot 1}) = 8$, $(R_{\cdot 5} - R_{\cdot 1}) = 12$, $(R_{\cdot 6} - R_{\cdot 1}) = 13$.

If we compare these observed differences in the rank sums with the critical value 11, we see that our one-sided treatment versus control decisions are $\tau_2 = \tau_1$, $\tau_3 = \tau_1$, $\tau_4 = \tau_1$, $\tau_5 > \tau_1$, and $\tau_6 > \tau_1$. Thus there is a statistically significant increase (over the effect of the baseline control Reynolds number corresponding to level 1 of U) in the position of maximum air velocity for the Reynolds numbers corresponding to levels 5 and 6 of U.
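The numbers in this example can be reproduced with a short script. The sketch below assumes the usual form of Page's statistic, $L = \sum_{j=1}^{m} j R_{\cdot j}$, for (83), which is not restated in this section but does return the quoted value 270.5; the standardization uses (85) and (86), and the treatment versus control step applies criterion (88) with the tabulated critical value 11. Function names are illustrative only, and with b = 3 the standardized value is shown purely as arithmetic, since the normal approximation is intended for large b.

```python
from math import sqrt


def page_statistic(rank_sums):
    """Page's L, assuming L = sum_j j * R_.j with levels in hypothesized order."""
    return sum(j * r for j, r in enumerate(rank_sums, start=1))


def page_standardized(L, m, b):
    """Standardized statistic L* of (85) using E_0(L) and var_0(L) from (86)."""
    mean = b * m * (m + 1) ** 2 / 4.0
    var = b * m ** 2 * (m + 1) ** 2 * (m - 1) / 144.0
    return (L - mean) / sqrt(var)


def treatment_vs_control(rank_sums, critical_value):
    """Levels u >= 2 with R_.u - R_.1 >= critical value, as in criterion (88)."""
    control = rank_sums[0]
    return [u for u, r in enumerate(rank_sums[1:], start=2)
            if r - control >= critical_value]


# Rank sums quoted in the air velocity example (m = 6, b = 3).
rank_sums = [4, 5.5, 8.5, 12, 16, 17]
L = page_statistic(rank_sums)
L_star = page_standardized(L, m=6, b=3)
flagged = treatment_vs_control(rank_sums, critical_value=11)

print(L)        # 270.5
print(flagged)  # [5, 6]
```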

3.4. Two-way incomplete layout

3.4.1. Test for general alternatives in a balanced incomplete block design

In this subsection, we discuss a distribution-free procedure that can be used to analyze data that arise from a balanced incomplete block design. A balanced incomplete block design has b blocks with s (< m) treatments observed per block and no treatment observed more than once per block. Every pair of treatments occurs together in $\lambda$ blocks, and each of the m treatments is observed p times. The parameters of a balanced incomplete block design satisfy $p(s - 1) = \lambda(m - 1)$. We are interested in testing $H_0^\tau$ (61) against the general alternatives $H_1^\tau$ (62), under the assumption of no treatment-block interaction. An appropriate nonparametric test statistic for a balanced incomplete block design was first proposed by Durbin (1951), with Skillings and Mack (1981) providing additional critical values. For this setting, we rank the observations within each block from 1 to s. Let $R_{\cdot 1}, \ldots, R_{\cdot m}$ denote the sums of the within-blocks ranks for treatments $1, \ldots, m$, respectively. The Durbin statistic for testing $H_0^\tau$ (61) versus the general alternative $H_1^\tau$ (62) is defined to be

$$T = [12/\lambda m(s + 1)] \sum_{j=1}^{m} (R_{\cdot j} - p(s + 1)/2)^2, \qquad (91)$$

and the associated level $\alpha$ test of $H_0^\tau$ versus $H_1^\tau$ is

reject $H_0^\tau$ if and only if $T \ge t_\alpha(m, s, \lambda, p)$, (92)

where $t_\alpha(m, s, \lambda, p)$ is the upper $\alpha$th percentile for the null sampling distribution of the statistic T (91). Values (some of them obtained via simulation) of $t_\alpha(m, s, \lambda, p)$ for a variety of balanced incomplete block designs and significance levels closest to .10, .05, and .01 have been tabulated by Skillings and Mack (1981). When the number of blocks is large, the statistic T has an asymptotic distribution [as $b \to \infty$] under $H_0^\tau$ (61) that is chi-square with $m - 1$ degrees of freedom. Thus the approximation to procedure (92) for a large number of blocks is

reject $H_0^\tau$ if and only if $T \ge \chi_\alpha^2(m - 1)$, (93)

where $\chi_\alpha^2(m - 1)$ is the upper $\alpha$th percentile of the chi-square distribution with $m - 1$ degrees of freedom. Skillings and Mack (1981) have noted that the chi-square approximation (93) can be quite conservative when $\alpha = .01$ and either b or $\lambda$ is small. In particular, they suggest that the approximation is not adequate when $\lambda$ is either 1 or 2. In such cases, they strongly recommend the use of the exact tabulated values of $t_\alpha(m, s, \lambda, p)$ or the generation of 'exact' values via simulation in lieu of the chi-square approximation.

3.4.2. Main effect tests for general alternatives in arbitrary two-way incomplete layouts - no replications

Not all incomplete block designs satisfy the necessary constraints to be balanced incomplete block designs. In this subsection, we present a procedure for dealing with data from a general two-way layout with at most one observation per cell. Let $q_i$ denote the number of treatments observed in block i, for $i = 1, \ldots, b$. (If $q_i = 1$ for any block i, remove that block from the analysis and let b correspond to the


number of blocks for which $q_i > 1$.) We are interested in testing $H_0^\tau$ (61) against the general alternatives $H_1^\tau$ (62), under the assumption of no treatment-block interaction. Skillings and Mack (1981) proposed a test procedure that is appropriate for any incomplete two-way layout. We rank the data within block i from 1 to $q_i$. For $i = 1, \ldots, b$ and $j = 1, \ldots, m$, let

$R_{ij}$ = rank of $Y_{ij}$ among the observations present in block i, if $n_{ij} = 1$,
$R_{ij} = (q_i + 1)/2$, if $n_{ij} = 0$.

Next, we compute the adjusted sum of ranks for each of the m treatments as follows:

$$A_j = \sum_{i=1}^{b} [12/(q_i + 1)]^{1/2} [R_{ij} - (q_i + 1)/2], \quad j = 1, \ldots, m. \qquad (94)$$

Set

$$A = [A_1, \ldots, A_{m-1}]'. \qquad (95)$$

Without loss of generality, we have chosen to omit $A_m$ from the vector A. The $A_j$'s are linearly dependent, since a weighted linear combination of all m of them is a constant. We could omit any one of the $A_j$'s and the approach we now discuss would lead to the same test statistic. Now, the covariance matrix for A under $H_0^\tau$ (61) is given by

$$\Sigma_0 = \begin{pmatrix} \sum_{t=2}^{m} \lambda_{1t} & -\lambda_{12} & -\lambda_{13} & \cdots & -\lambda_{1,m-1} \\ -\lambda_{12} & \sum_{t \ne 2} \lambda_{2t} & -\lambda_{23} & \cdots & -\lambda_{2,m-1} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ -\lambda_{1,m-1} & -\lambda_{2,m-1} & -\lambda_{3,m-1} & \cdots & \sum_{t \ne m-1} \lambda_{m-1,t} \end{pmatrix}, \qquad (96)$$

where, for $t \ne j = 1, \ldots, m$,

$\lambda_{jt}$ = [number of blocks in which both treatments j and t are observed]. (97)

Let $\Sigma_0^{-}$ be any generalized inverse for $\Sigma_0$. The Skillings-Mack statistic SM for testing $H_0^\tau$ (61) versus the general alternatives $H_1^\tau$ (62) is given by

$$SM = A' \Sigma_0^{-} A. \qquad (98)$$

If $\lambda_{jt} > 0$ for all $j \ne t$, then the covariance matrix $\Sigma_0$ (96) has full rank and we can use the ordinary inverse $\Sigma_0^{-1}$ in the definition of SM (98). The level $\alpha$ test of $H_0^\tau$ (61) versus $H_1^\tau$ (62) is

reject $H_0^\tau$ if and only if $SM \ge s_\alpha^*(b, m, n_{11}, \ldots, n_{bm})$, (99)


where $s_\alpha^*(b, m, n_{11}, \ldots, n_{bm})$ is the upper $\alpha$th percentile for the null sampling distribution of the statistic SM (98). Values of $s_\alpha^*(b, m, n_{11}, \ldots, n_{bm})$ are not available in the literature for arbitrary incomplete block configurations. However, when every pair of treatments occurs in at least one block and the number of blocks is large, the statistic SM has an asymptotic distribution [as $b \to \infty$] under $H_0^\tau$ that is chi-square with $m - 1$ degrees of freedom. Thus, when $\lambda_{jt} > 0$ for all $j \ne t = 1, \ldots, m$, the approximation to procedure (99) for large b is

reject $H_0^\tau$ if and only if $SM \ge \chi_\alpha^2(m - 1)$, (100)

where $\chi_\alpha^2(m - 1)$ is the upper $\alpha$th percentile of the chi-square distribution with $m - 1$ degrees of freedom. Skillings and Mack (1981) have pointed out that the chi-square approximation (100) can be quite conservative when $\alpha$ is smaller than .01 and, in such cases, they recommend generation of 'exact' critical values $s_\alpha^*(b, m, n_{11}, \ldots, n_{bm})$ via simulation in lieu of the chi-square approximation. Note that if $\lambda_{jt} = 0$ for a particular pair of treatments j and t, so that j and t never appear together in a block, then $H_0^\tau$ (61) could fail to be rejected even when $\tau_j$ and $\tau_t$ are quite different. Consequently, we recommend removing any such pairs of treatments from the analysis. If the sample size configuration satisfies the constraints for a balanced incomplete block design, then the Skillings-Mack statistic SM (98) is identical to the Durbin statistic T (91). If the sample sizes are all 1, so that we have a randomized complete block design, the Skillings-Mack statistic SM (98) is identical to the Friedman statistic S (76).
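The steps (94) to (98) translate fairly directly into code. The following is a minimal sketch for the full-rank case ($\lambda_{jt} > 0$ for all pairs), written in plain Python; the function names and the input convention (`data[i][j]` holds the observation for block i and treatment j, with `None` marking an empty cell) are assumptions made for illustration and are not part of the original procedure description.

```python
from math import sqrt


def average_ranks(values):
    """Ranks 1..len(values); tied values receive the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        avg = (pos + end) / 2 + 1
        for t in range(pos, end + 1):
            ranks[order[t]] = avg
        pos = end + 1
    return ranks


def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x


def skillings_mack(data):
    """Skillings-Mack statistic (98), full-rank case only."""
    m = len(data[0])
    # blocks with a single observation carry no ranking information
    blocks = [row for row in data if sum(y is not None for y in row) > 1]
    A = [0.0] * m
    lam = [[0] * m for _ in range(m)]
    for row in blocks:
        present = [j for j in range(m) if row[j] is not None]
        q = len(present)
        ranks = average_ranks([row[j] for j in present])
        scale = sqrt(12.0 / (q + 1))
        for j, r in zip(present, ranks):
            A[j] += scale * (r - (q + 1) / 2.0)  # empty cells contribute zero, as in (94)
        for j in present:
            for t in present:
                if j != t:
                    lam[j][t] += 1  # lambda_jt of (97)
    if any(lam[j][t] == 0 for j in range(m) for t in range(m) if j != t):
        raise ValueError("some pair of treatments never shares a block")
    # Sigma_0 of (96): diagonal sum_{t != j} lambda_jt, off-diagonal -lambda_jt
    sigma = [[sum(lam[j]) if j == t else -lam[j][t] for t in range(m - 1)]
             for j in range(m - 1)]
    x = solve(sigma, A[:m - 1])
    return sum(a * xi for a, xi in zip(A[:m - 1], x))
```

A generalized inverse would be needed to cover configurations in which some $\lambda_{jt} = 0$; the sketch raises an error in that case instead, in line with the recommendation above to remove such pairs of treatments from the analysis.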

3.4.3. Two-sided all treatments multiple comparisons for data from a balanced incomplete block design

In this subsection, we discuss a multiple comparison procedure that is designed to make two-sided decisions about all $m(m - 1)/2$ pairs of treatment effects when we have data from a balanced incomplete block design. It is appropriate as a follow-up procedure to the general alternatives Durbin-Skillings-Mack test for equality of treatment effects in a balanced incomplete block design, as discussed in Section 3.4.1. Let $R_{\cdot 1}, \ldots, R_{\cdot m}$ be the sums of the within-blocks ranks for treatments $1, \ldots, m$, respectively. Let s denote the number of observations present in each of the blocks, let $\lambda$ denote the number of blocks in which each pair of treatments occurs together, and let p denote the total number of observations on each of the m treatments. At an experimentwise error rate no greater than $\alpha$, the Skillings-Mack two-sided all-treatments multiple comparison procedure reaches its $m(m - 1)/2$ decisions through the criterion

decide $\tau_u \ne \tau_v$ if and only if $|R_{\cdot u} - R_{\cdot v}| \ge [t_\alpha \lambda m(s + 1)/6]^{1/2}$, (101)

where $t_\alpha = t_\alpha(m, s, \lambda, p)$ is the upper $\alpha$th percentile for the null sampling distribution of the Durbin test statistic T (91), as discussed in Section 3.4.1. The associated multiple


A. M. Dean and D. A. Wolfe

comparison procedure (101) controls the experimentwise error rate over the entire class of continuous distributions for the error terms. Values of $t_\alpha = t_\alpha(m, s, \lambda, p)$ for a variety of balanced incomplete block designs and experimentwise error rates closest to .10, .05, and .01 are available in Skillings and Mack (1981). When the number of blocks is large, the critical value $[t_\alpha \lambda m(s+1)/6]^{1/2}$ can be approximated by $[(s+1)(ps - p + \lambda)/12]^{1/2} q_\alpha$, where $q_\alpha$ is the upper $\alpha$th percentile for the distribution of the range of $m$ independent $N(0, 1)$ variables. Thus, the approximation to procedure (101) for $b$ large is

decide $\tau_u \ne \tau_v$ if and only if
$$|R_{\cdot u} - R_{\cdot v}| \ge q_\alpha [(s+1)(ps - p + \lambda)/12]^{1/2}. \tag{102}$$

Values of $q_\alpha$ for $\alpha$ = .0001, .0005, .001, .005, .01, .025, .05, .10, and .20 and $m = 3(1)20(2)40(10)100$ can be found, for example, in Hollander and Wolfe (1973, Table A.10). Skillings and Mack (1981) note that the procedure (101) is rather conservative; that is, the true experimentwise error rate might be a good deal smaller than the bound $\alpha$ provided by (101). As a result, they recommend using the approximation (102) whenever the number of blocks is reasonably large.

3.4.4. Example - incomplete two-way layout

We apply the Durbin procedure (92) for testing $H_0^{\tau}$ (61) against a general alternative $H_1^{\tau}$ (62) to the monovinyl isomers data in the balanced incomplete block design of Experiment 1.7. The blocks (levels of $B$) are shown as columns in Table 6, and the levels of treatment factor $U$ ("pressure" measured in pounds per square inch) indicate the rows. If we rank the data within each of the ten blocks (using average ranks to break the one tie), we obtain the following treatment rank sums:

$R_{\cdot 1} = 7.5$, $R_{\cdot 2} = 7.5$, $R_{\cdot 3} = 13$, $R_{\cdot 4} = 15$, $R_{\cdot 5} = 17$.

Using these treatment rank sums in expression (91), we obtain $T = 15.1$. If we compare this value with the null distribution for $T$ (in Table 2 of Skillings and Mack, 1981) with $m = 5$, $b = 10$, $p = 6$, $s = 3$ and $\lambda = 3$, we find that the p-value for these data is less than .0105. This indicates strong evidence that there is a significant difference in the effects of the various pressure levels on the percent conversion of methyl glucoside to monovinyl isomers. In order to detect which of the five pressure levels differ, we apply the Skillings-Mack multiple comparison procedure, as given in (101). Taking our experimentwise error rate to be $\alpha = .0499$, we see from Table 2 in Skillings and Mack (1981), with $m = 5$, $b = 10$, $\lambda = 3$, $p = 6$, and $s = 3$, that $t_{.0499} = 9.200$. Thus, our decision criterion is

decide $\tau_u \ne \tau_v$ if and only if $|R_{\cdot u} - R_{\cdot v}| \ge [(9.20)(3)(5)(4)/6]^{1/2} = 9.59$.

Nonparametric analysis of experiments


Comparing the differences in the above treatment rank sums with the critical value 9.59, we see that, with an experimentwise error rate of $\alpha = .0499$, we cannot find any pairwise differences between the treatment effects, despite the fact that the p-value for the Durbin hypothesis test was less than .0105. This clearly illustrates the conservative nature of procedure (101), as noted by Skillings and Mack (1981). If we choose instead to use approximation (102), taking $b = 10$ to be 'large', our critical value for approximate experimentwise error rate $\alpha = .05$ would be lower; that is, $q_{.05}[4\{6(3) - 6 + 3\}/12]^{1/2} = 3.858(2.236) = 8.627$, where $q_{.05} = 3.858$ is obtained from Hollander and Wolfe (1973, Table A.10) with $m = 5$. The decisions associated with this approximate procedure (102) at $\alpha = .05$ are that $\tau_1 = \tau_2$, $\tau_1 = \tau_3$, $\tau_1 = \tau_4$, $\tau_1 \ne \tau_5$, $\tau_2 = \tau_3$, $\tau_2 = \tau_4$, $\tau_2 \ne \tau_5$, $\tau_3 = \tau_4$, $\tau_3 = \tau_5$, and $\tau_4 = \tau_5$. Thus, with the approximate procedure (102), we are able to distinguish that pressures of 475 and 550 psi each lead to different percent conversions of methyl glucoside to monovinyl isomers than does the pressure 250 psi.
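The arithmetic of this example can be retraced with a short script. The sketch below is illustrative only: it assumes that the Durbin statistic (91) has its usual rank-sum form, $T = \frac{12(m-1)}{pm(s^2-1)}\sum_j R_{\cdot j}^2 - \frac{3p(m-1)(s+1)}{s-1}$, and it takes the tabled constants $t_{.0499} = 9.200$ and $q_{.05} = 3.858$ directly from the text rather than computing them. All function and variable names are hypothetical.

```python
from itertools import combinations
from math import sqrt

# Design constants and treatment rank sums quoted in the example.
m, b, p, s, lam = 5, 10, 6, 3, 3
rank_sums = {1: 7.5, 2: 7.5, 3: 13.0, 4: 15.0, 5: 17.0}

def durbin_statistic(rank_sums, m, p, s):
    """Durbin statistic in its usual rank-sum form (assumed to agree with (91))."""
    sum_sq = sum(r * r for r in rank_sums.values())
    scale = 12.0 * (m - 1) / (p * m * (s * s - 1))
    centre = 3.0 * p * (m - 1) * (s + 1) / (s - 1)
    return scale * sum_sq - centre

def differing_pairs(rank_sums, critical_value):
    """Pairs (u, v) declared different: |R.u - R.v| >= critical value."""
    return [(u, v) for u, v in combinations(sorted(rank_sums), 2)
            if abs(rank_sums[u] - rank_sums[v]) >= critical_value]

T = durbin_statistic(rank_sums, m, p, s)

# Critical value of procedure (101), with the tabled t_alpha from the text.
t_alpha = 9.200
crit_101 = sqrt(t_alpha * lam * m * (s + 1) / 6.0)

# Critical value of approximation (102), with the tabled q_alpha from the text.
q_alpha = 3.858
crit_102 = q_alpha * sqrt((s + 1) * (p * s - p + lam) / 12.0)

print(round(T, 2), round(crit_101, 2), round(crit_102, 3))
print(differing_pairs(rank_sums, crit_101))   # conservative procedure (101)
print(differing_pairs(rank_sums, crit_102))   # large-b approximation (102)
```

Under these assumptions the script gives the same statistic and critical values as the text, and only the pairs (1, 5) and (2, 5) exceed the approximate critical value.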

3.5. Other tests in the two-way layout

Mack (1981) describes a test of $H_0^{\tau}$ (61) against the general class of alternatives $H_1^{\tau}$ (62) for the incomplete two-way layout with some cells empty and other cells with more than one observation per cell. A test against the class of ordered alternatives (83) for this same setting is given by Skillings and Wolfe (1977, 1978). For equal sample sizes, Mack's statistic is identical to a statistic proposed by De Kroon and Van der Laan (1981) for testing the global hypothesis $H_0^{T}$ (53) of no main effect of factor $U$ and no interaction between factors $B$ and $U$. We refer the reader to the original papers for details of each of these tests.

4. Higher way layouts

4.1. Introduction

Specialized procedures for the two-way layout were discussed in Section 3. In this section, we consider general procedures for experiments with two or more factors. Most of the techniques for the one-way layout, as described in Section 2, can be used for comparing the effects on the response of the various treatment combinations. However, some specialized nonparametric techniques are available for investigating the effects of the factors separately. We discuss two such procedures in this section. In Section 4.2, we discuss a technique for analyzing a factorial experiment with any number of factors when there is at least one observation per cell and all interactions among the factors are expected to be negligible. The method investigates the main effects, one factor at a time, grouping the levels of the other factors together and using the techniques of the two-way layout.


A robust version of the general linear model least squares analysis (RGLM) was developed by McKean and Hettmansperger (1976). This procedure, which is described in Section 4.3, is very general, but requires a computer algorithm for its implementation. Hettmansperger and McKean (1983), Aubuchon and Hettmansperger (1984), and Draper (1988) describe and discuss other methods available in the literature for testing parameters in the general linear model. The results of a small power study run by Hettmansperger and McKean (1983) suggest that the RGLM method, with the traditional type of test statistic based on the difference of error sums of squares for the reduced and full models, gives tests at least as powerful as the competitors. In addition, the RGLM procedure has most of the features of least squares analysis and allows estimation of parameters and multiple comparison procedures to be implemented. Consequently, we only discuss the RGLM method of handling the general linear model in this article, and refer the reader to the above mentioned papers for aligned rank and other related procedures.

4.2. Using two-way layout techniques

4.2.1. Main effect tests - multi-factor experiment

Groggel and Skillings (1986) suggested a method for testing individually the hypotheses of no main effects in a multi-factor experiment which utilizes the two-way analysis of Mack and Skillings (see Section 3.2.3). The method requires that there are no interactions between the factors and that all treatment combinations are observed at least once. First of all, focus on just one of the factors and label it factor $U$ with $m$ levels, and regard the combinations of the levels of the other factors as the $b$ levels of a combined factor $B$. If there are $n_{ij}$ observations in a particular cell in the original experiment, then there are still $n_{ij}$ observations in an analogous cell in the experiment involving factors $B$ and $U$. We write the model as an additive two-way layout model (50) and we wish to test the hypothesis $H_0^{\tau}$: [$\tau_j$ are all equal, $j = 1, \ldots, m$], where $\tau_j$ is the effect of the $j$th level of factor $U$, against a general alternative hypothesis that the effects of factor $U$ are not all equal. The general formula for the Mack-Skillings statistic is given in (66). For factorial experiments with a large number of factors, it is usual that a small equal number $n$ of observations are taken per treatment factor combination, in which case the test statistic reduces to (69). We take this equal number $n$ of cell observations to be the case throughout the rest of this section. Let $R_{ijk}$ be the rank of $Y_{ijk}$ within the $mn$ observations on the $i$th level of the combined factor $B$. Let

$$S_j = \frac{1}{mn}\sum_{i=1}^{b}\sum_{k=1}^{n} R_{ijk}, \qquad j = 1, \ldots, m.$$

The decision rule for testing $H_0^{\tau}$ against a general alternative hypothesis, as given by (70), is

reject $H_0^{\tau}$ if and only if
$$\frac{12}{m(N+b)}\sum_{j=1}^{m}\left[mS_j - \frac{N+b}{2}\right]^2 \ge w_\alpha(b, m, n, \ldots, n), \tag{103}$$

where $N = bmn$ and $w_\alpha(b, m, n, \ldots, n)$ can be obtained from the table in Mack and Skillings (1980) for small values of $b$, $m$, and $n$, and can be approximated by $\chi^2_\alpha(m-1)$ if either $n$ or $b$ is large (see Groggel and Skillings, 1986). The test is applied to each factor in turn, the remaining factors forming the combined factor $B$.

4.2.2. Example - multi-factor experiment, main effect tests

We use factors C, D and E from Experiment 1.2 as an illustration of Groggel and Skillings' two-way procedure, and we ignore the other five factors as though they had not been part of the experiment. (Since this method is not appropriate for fractional factorial experiments, it cannot be used to test the significance of the eight factors in the entire experiment.) We use the mean thickness response (averaged over seventy levels of noise variables) for this illustration. The levels of the three factors C, D and E and the observed mean thicknesses are shown in Table 10. (Note that the treatment combinations have been reordered from Table 1.)

Table 10
Ranks of mean thickness of epitaxial layer - factors C, D and E

               Mean thickness         R_ijk            R_ij.
  D    E     C = -1    C = 1     C = -1  C = 1    C = -1  C = 1
 -1   -1     14.821   14.757       2       1
             14.878   14.843       4       3         6       4
 -1    1     14.888   14.921       2       3
             14.932   14.415       4       1         6       4
  1   -1     14.165   14.037       4       3
             13.972   13.907       2       1         6       4
  1    1     13.860   13.880       1       2
             14.032   13.914       4       3         5       5

First, let $U$ be factor C with $m = 2$ levels, and take the $b = 4$ levels of factor $B$ to be the four levels $(-1, -1)$, $(-1, 1)$, $(1, -1)$ and $(1, 1)$ of the combined factors D and E. Then the ranks assigned to the $m = 2$ levels of factor $U$ (C) at each of the $b = 4$ levels of the combined factor $B$ (D and E) are as shown in Table 10. The corresponding cell-wise weighted average ranks (63) for the two levels of $U$ are $S_1^C = 23/4 = 5.75$ and $S_2^C = 17/4 = 4.25$. The decision rule (103) for testing $H_0^{\tau}$ against a general alternative hypothesis is to reject $H_0^{\tau}$ if and only if

$$\frac{12}{2(16+4)}\left[(11.5 - 10)^2 + (8.5 - 10)^2\right] = 1.35 \ge w_\alpha(4, 2, 2, \ldots, 2).$$


From the table of Mack and Skillings (1980) for $m = 2$, $b = 4$, and $n = 2$, we obtain $w_{.0077}(4, 2, 2, \ldots, 2) = 7.35$. Thus at level $\alpha = .0077$ there is not sufficient evidence to reject the null hypothesis of no effect of the levels of C on the average thickness of the epitaxial layer. On the other hand, if we let $U$ be factor D and let $B$ represent the combined factors C and E, similar calculations lead to the cell-wise weighted average ranks for the two levels of $U$ being $S_1^D = 28/4 = 7$ and $S_2^D = 12/4 = 3$. Using the decision rule (103) for testing $H_0^{\tau}$ against a general alternative hypothesis for factor D, we reject $H_0^{\tau}$ since

$$\frac{12}{2(16+4)}\left[(14 - 10)^2 + (6 - 10)^2\right] = 9.6 > 7.35 = w_{.0077}(4, 2, 2, \ldots, 2).$$

The hypothesis that the levels of E have no effect on the average thickness of the epitaxial layer would not be rejected since the value of the associated test statistic is 0.
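The three main effect tests of this example can be reproduced from the mean thicknesses of Table 10. The following is a minimal sketch, not the authors' software: it assumes equal cell sizes and no tied observations, keys each cell by its (C, D, E) levels as read from Table 10, and computes the left-hand side of (103) with $S_j$ taken as the rank sum for level $j$ divided by $mn$. The function name `mack_skillings` and the data layout are hypothetical.

```python
from itertools import product

# Mean thicknesses from Table 10, keyed by the levels (C, D, E);
# each cell holds its n = 2 observations.
data = {
    (-1, -1, -1): (14.821, 14.878),
    (1, -1, -1): (14.757, 14.843),
    (-1, -1, 1): (14.888, 14.932),
    (1, -1, 1): (14.921, 14.415),
    (-1, 1, -1): (14.165, 13.972),
    (1, 1, -1): (14.037, 13.907),
    (-1, 1, 1): (13.860, 14.032),
    (1, 1, 1): (13.880, 13.914),
}

def mack_skillings(data, factor):
    """Rank sums and statistic (103) for one factor (index 0, 1 or 2).

    The other two factors are combined into the blocking factor B.
    Assumes two-level factors, equal cell sizes and no ties.
    """
    levels = (-1, 1)
    m = len(levels)
    others = [idx for idx in range(3) if idx != factor]
    rank_sum = {lev: 0 for lev in levels}
    b, n = 0, 0
    for combo in product(levels, repeat=2):        # levels of combined factor B
        b += 1
        block = []                                  # (observation, level of U)
        for lev in levels:
            key = [0, 0, 0]
            key[factor] = lev
            key[others[0]], key[others[1]] = combo
            cell = data[tuple(key)]
            n = len(cell)
            block.extend((y, lev) for y in cell)
        block.sort()                                # rank within the block
        for rank, (_, lev) in enumerate(block, start=1):
            rank_sum[lev] += rank
    N = b * m * n
    S = {lev: rank_sum[lev] / (m * n) for lev in levels}
    stat = 12.0 / (m * (N + b)) * sum(
        (m * S[lev] - (N + b) / 2.0) ** 2 for lev in levels)
    return rank_sum, stat

for name, idx in (("C", 0), ("D", 1), ("E", 2)):
    sums, stat = mack_skillings(data, idx)
    print(name, sums, round(stat, 2))
```

With these data the sketch gives rank sums 23 and 17 for C, 28 and 12 for D, and 20 and 20 for E, which correspond to the statistics 1.35, 9.6 and 0 quoted above.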

4.3. Robust general linear model

4.3.1. Tests and confidence intervals - multi-factor experiments

Hettmansperger and McKean (1977, 1978) describe a robust alternative (RGLM) to the method of least squares which gives a general theory for estimation and testing of parameters in the linear model without requiring assumptions about the common distribution of the independent error variables. Recent discussion articles on the method are given by Draper (1988), Aubuchon and Hettmansperger (1984) and McKean and Vidmar (1994). Let $\gamma$ be the vector of parameters corresponding to a complete set of linearly independent contrasts in all treatment and block effects (but excluding the general mean) and let $X_c$ be the corresponding design matrix, whose columns contain the contrast coefficients (so that the column sums of the $X_c$ matrix are zero). We can then write the general linear model in matrix terms as

$$Y = \mathbf{1}\theta + X_c\gamma + E$$

where $Y' = [Y_1, \ldots, Y_N]$ and $E' = [E_1, \ldots, E_N]$ are the vectors of response and error variables, respectively, $\mathbf{1}$ is a vector of 1's, and $\gamma$ is the vector of location parameters. Temporarily, we assume that the common distribution of the errors is symmetric around zero. We write the $q$th row of the matrix model as $Y_q = \theta + x_q'\gamma + E_q$ with $x_q'$ representing the $q$th row of the design (contrast) matrix $X_c$. We let $y$ and $e$ represent the "observed" values of $Y$ and $E$, respectively. The traditional method of least squares selects an estimator $\hat\gamma$ of $\gamma$ to minimize the sum of squares of the errors; that is, to minimize

$$e'e = \sum_{q=1}^{N} e_q^2 = \sum_{q=1}^{N} (y_q - \theta - x_q'\gamma)^2.$$

In a similar fashion, the RGLM method selects an estimator $\hat\gamma$ of $\gamma$ to minimize the sum of the weighted errors,

$$D(\gamma) = \sum_{q=1}^{N} [a(R(e_q))]\,e_q = \sum_{q=1}^{N} a(R_q)\,e_q, \tag{104}$$

where, for $q = 1, \ldots, N$, $R_q = R(e_q)$ is the rank of $e_q$ among all $N$ $e_q$'s and $a(R_q)$ is some function of $R_q$, called the score. The most commonly used scores are the Wilcoxon scores, defined as

$$a(R_q) = \sqrt{12}\left(\frac{R_q}{N+1} - \frac{1}{2}\right). \tag{105}$$

These scores ensure that negative and positive residuals of the same magnitude have equal weight in the minimization and that larger residuals have more weight than smaller ones (in the same way that a large squared residual has more weight than a small one in the method of least squares). It is possible to modify the scores so that residuals corresponding to outlying observations play little or no part in the minimization, as discussed by Hettmansperger and McKean (1977) and by Draper (1988), who also lists several other types of scores. Using the Wilcoxon scores, the RGLM estimate of $\gamma$ is then the value $\hat\gamma$ that minimizes

$$D(\gamma) = \sum_{q=1}^{N} \sqrt{12}\left(\frac{R_q}{N+1} - \frac{1}{2}\right) e_q. \tag{106}$$
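To make the dispersion function concrete, the sketch below evaluates a sum of score-weighted residuals for a given residual vector. It is an illustration under stated assumptions rather than the rreg algorithm: it assumes the standard Wilcoxon score $\sqrt{12}\,(R_q/(N+1) - 1/2)$, assumes no tied residuals, and only evaluates the function at fixed residuals without minimizing over $\gamma$. The function name is hypothetical.

```python
from math import sqrt

def wilcoxon_dispersion(residuals):
    """Return (D, scores) where D = sum of a(R_q) * e_q.

    Uses the standard Wilcoxon scores a(R) = sqrt(12) * (R / (N + 1) - 1/2)
    and assumes that no two residuals are tied.
    """
    N = len(residuals)
    order = sorted(range(N), key=lambda q: residuals[q])
    ranks = [0] * N
    for rank, q in enumerate(order, start=1):
        ranks[q] = rank
    scores = [sqrt(12.0) * (r / (N + 1) - 0.5) for r in ranks]
    dispersion = sum(a * e for a, e in zip(scores, residuals))
    return dispersion, scores

d, scores = wilcoxon_dispersion([-1.0, 0.0, 2.0])
d_shifted, _ = wilcoxon_dispersion([4.0, 5.0, 7.0])
print(round(d, 4), round(d_shifted, 4), round(sum(scores), 12))
```

Because these scores sum to zero, adding a constant to every residual leaves the dispersion unchanged, which is consistent with $\gamma$ excluding the general mean.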

An experimental version of an algorithm which minimizes this function is available in the rreg routine of all versions of MINITAB since version 8. McKean and Vidmar (1994) state that a SAS algorithm is currently under development and that a PC algorithm can be obtained from those authors. For a single replicate or fractional factorial experiment, the robust contrast estimates coincide with the estimates obtained from the method of least squares, although the test statistic, discussed below, differs. The same is true of any fully parameterized model with two observations per cell. The test was originally developed as an asymptotic test and, as such, is best with large numbers of observations per cell. In factorial experiments, large sample sizes rarely occur, and an approximation for small samples will be discussed below. In such cases, stated significance levels should be interpreted with caution. Suppose that the null hypothesis $H_0^{\gamma}$: [$H'\gamma = 0$] is to be tested against the general alternative hypothesis $H_1^{\gamma}$: [$H'\gamma \ne 0$], where $H'\gamma$ represents a vector of $h$ linearly independent estimable functions of the parameters. Then, as in the general linear model approach, a test statistic can be obtained by comparing the value $D(\hat\gamma_R)$ of $D(\gamma)$ (106) for the reduced model under $H_0^{\gamma}$ with $D(\hat\gamma_F)$ for the full model. The test statistic is

$$F_R = \frac{[D(\hat\gamma_R) - D(\hat\gamma_F)]/h}{\hat\delta/2}, \tag{107}$$

where $\hat\delta$ is an estimator of $\delta$. For symmetric error distributions, $\hat\delta$ is taken by Hettmansperger and McKean (1977) to be

where $W_1, \ldots, W_{N(N+1)/2}$ are the pairwise Walsh averages $(\hat e_q + \hat e_p)/2$, with $\hat e_i = (y_i - x_i'\hat\gamma)$. Also, $W_{(i)}$ is the $i$th largest among $W_1, \ldots, W_{N(N+1)/2}$, and $t$ is the lower critical point of a two-sided level $\alpha$ Wilcoxon signed-rank test, which is approximated by

The estimator $\hat\delta$ is called the Hodges-Lehmann estimator and is calculated in the MINITAB rreg routine using $\alpha = 0.10$ in (108). When the error distribution is not symmetric, the Wilcoxon scores can be replaced by the sign scores and the estimator of $\delta$ by a kernel density estimator or a two-sample Hodges-Lehmann estimator (see Hettmansperger, 1984, 244-250 and Draper, 1988, 253). For a large number of observations per cell, the test statistic $F_R$ (107) has an approximate chi-square distribution divided by its degrees of freedom, $h$. However, for small numbers of observations per cell, the true distribution of $F_R$ is better approximated by an $F(h, N - h - 1)$ distribution. The test of the null hypothesis $H_0^{\gamma}$ against the general alternative hypothesis $H_1^{\gamma}$ at significance level $\alpha$ is then

reject $H_0^{\gamma}$ if and only if $F_R \ge F_\alpha(h, N - h - 1)$,

where $F_\alpha(h, N - h - 1)$ is the upper $\alpha$th percentile of an F-distribution with $h$ and $N - h - 1$ degrees of freedom. The sample estimates $\hat\gamma$ of a set of linearly independent contrasts $\gamma$ and their corresponding estimated standard errors (given by the diagonal elements of $\hat\delta\sqrt{(X_c'X_c)^{-1}}$) can be obtained from the MINITAB rreg routine. A standard set of linearly independent contrasts that can be used in a model with two factors are the treatment-control contrasts $\beta_1 - \beta_i$, $\tau_1 - \tau_j$ and the interaction contrasts $(\beta\tau)_{11} - (\beta\tau)_{i1} - (\beta\tau)_{1j} + (\beta\tau)_{ij}$ ($i = 2, \ldots, b$; $j = 2, \ldots, m$). If there are more than two factors, then the standard set of contrasts would be extended by including similar sets of contrasts involving the extra factors, together with higher order interaction contrasts. Any other contrast can be written as a linear combination, $l'\gamma$, of the standard contrasts. The contrast estimator

Table 11
Design matrix $X_c$ for the welding experiment

  1   1   1   1   1   1   2  -2  -2   2
  1   1  -1   0   0   0   1   1  -1  -1
  1   1   0  -1   0   0   0   2   0  -2
  1   1   0   0  -1   0  -1   1   1  -1
  1   1   0   0   0  -1  -2  -2   2   2
 -1   0   1   1   1   1   0   0   4  -4
 -1   0  -1   0   0   0   0   0   2   2
 -1   0   0  -1   0   0   0   0   0   4
 -1   0   0   0  -1   0   0   0  -2   2
 -1   0   0   0   0  -1   0   0  -4  -4
  0  -1   1   1   1   1  -2   2  -2   2
  0  -1  -1   0   0   0  -1  -1  -1  -1
  0  -1   0  -1   0   0   0  -2   0  -2
  0  -1   0   0  -1   0   1  -1   1  -1
  0  -1   0   0   0  -1   2   2   2   2

would then be $l'\hat\gamma$, with the estimated standard error $\hat\delta\sqrt{l'(X_c'X_c)^{-1}l}$. Confidence intervals for a set of contrasts can be calculated using the usual Scheffé method of multiple comparisons; that is, a set of $100(1 - \alpha)\%$ simultaneous confidence intervals for any number of contrasts of the form $l'\gamma$ is given by

$$l'\hat\gamma \pm \sqrt{hF_\alpha(h, N - h - 1)}\;\hat\delta\sqrt{l'(X_c'X_c)^{-1}l} \tag{111}$$

(see Hettmansperger and McKean, 1978).
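As a small numerical illustration of the test, the sketch below computes the drop-in-dispersion ratio from two minimized dispersions and a scale estimate. It assumes that the statistic (107) has the form $[(D_R - D_F)/h]/(\hat\delta/2)$, which is the form used in the arithmetic of the welding example in Section 4.3.2, and it takes the dispersions, the scale estimate and the critical value 3.06 from that example rather than computing them. The function name is hypothetical.

```python
def robust_f_statistic(d_reduced, d_full, h, delta_hat):
    """Drop-in-dispersion ratio [(D_R - D_F) / h] / (delta_hat / 2)."""
    return ((d_reduced - d_full) / h) / (delta_hat / 2.0)

# Values reported for the welding example (Section 4.3.2).
d_reduced = 136.44      # minimized dispersion, reduced model
d_full = 118.06         # minimized dispersion, full model
h = 4                   # number of interaction trend contrasts tested
delta_hat = 5.359       # scale estimate reported by the rreg routine

f_r = robust_f_statistic(d_reduced, d_full, h, delta_hat)
critical_value = 3.06   # critical value quoted in the example for alpha = 0.05
reject = f_r >= critical_value
print(round(f_r, 2), reject)
```

With these inputs the ratio is about 1.71, below the quoted critical value, so the hypothesis is not rejected.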

4.3.2. Example - RGLM method, multi-factor experiment

We use the welding data of Experiment 1.4 to illustrate the RGLM approach. There are two treatment factors with 3 and 5 levels, respectively, and an interaction is expected. A fully parameterized model with two observations per cell will lead to contrast estimates identical to the least squares estimates. For illustration purposes we include in the matrix model just the treatment-control contrasts $\beta_1 - \beta_i$ and $\tau_1 - \tau_j$, and the linearly independent trend contrasts representing the linear x linear, linear x quadratic, quadratic x linear and quadratic x quadratic components of the interaction (obtained from a table of orthogonal polynomials). The $X_c$ matrix is then as shown in Table 11, but with each row repeated twice, corresponding to the two observations per cell. If the columns of $X_c$ are entered into the MINITAB rreg command, the minimum value of $D(\gamma)$ (106) for the full model (with main effects and the four interaction trend contrasts) is obtained as $D(\hat\gamma_F) = 118.06$. Dropping the last 4 columns of the $X_c$ matrix (which correspond to the four interaction contrasts) gives the minimum value of $D(\gamma)$ for the reduced model under the hypothesis of negligible interaction contrasts as $D(\hat\gamma_R) = 136.44$. These values are obtained, together with the value of $\hat\delta = 5.359$, from MINITAB rreg by specifying that the hypothesis of interest is that the last 4 parameters (contrasts) are all zero. The statistic $F_R$ for testing the null hypothesis (54) against an alternative hypothesis that the linear x linear, linear x quadratic, quadratic x linear and quadratic x quadratic components of the interaction are not all zero is then given by (107) as

$$F_R = \frac{(136.44 - 118.06)/4}{(5.359/2)} = 1.71.$$

Since $F_{0.05}(4, 25) = 3.06$, the hypothesis that the low order interaction trend contrasts are negligible is not rejected at significance level $\alpha = 0.05$. In Section 3.2.2, the hypothesis of no rank interaction was rejected at significance level just under 0.05. The reason for the apparent contradiction is that the strongest interaction trend contrasts are the quartic contrasts, and these were assumed negligible in the above model (as is often done in practice when low order trends are fitted). The main effect treatment-control contrasts are estimated by the MINITAB rreg routine as

$\widehat{\beta_1 - \beta_2} = -2.11$, $\widehat{\tau_1 - \tau_2} = 2.73$,

$\widehat{\beta_1 - \beta_3} = 3.82$, $\tau_1 - \tau_3$.

(Recall that a construction for MOLS of prime and prime-power order is given in Subsection 3.4.)

CONSTRUCTION 4.2 (Sen and Mukerjee, 1987). Let $L$ and $N$ be two MOLS of order $t$. Let the $i$th column of $L$ be $L_i$ and the $i$th column of $N$ be $N_i$ and let $G_h = [L_h\ N_h\ hj]$, $1 \le h \le t$. Let $G = [G_1, G_2, \ldots, G_t]$ and $H_i = G + i \pmod{t}$ where this addition is performed on each entry of the matrix. Then the strongly balanced, uniform COD is

The Latin squares of order 3 in Table 2.1.6(a) give the arrays in Table 4.2.3. Balanced uniform designs can be smaller than strongly balanced uniform designs and these designs are universally optimal for the estimation of direct treatment effects when no treatment may be applied in successive periods and the designs are uniform on the units and the last period. The classic example is the balanced Latin square; it is usually called a column-complete Latin square in the combinatorial literature. If each treatment need only be adjacent to every other equally often (without

Block and other designs used in agriculture


regard to order), as might be the case in a field trial, for instance, then the squares are said to be column-quasi-complete (or partially balanced). A similar nomenclature is used for row-column designs. (Extensions to non-square arrays are described in Subsection 5.2.) Constructions of column-complete and column-quasi-complete Latin squares have been given by E. J. Williams (1949); extensions to row-column designs have been given by Freeman (1979) and Street (1986). E. J. Williams' (1949) construction is particularly easy to describe.

CONSTRUCTION 4.3 (E. J. Williams, 1949). (a) There is a column-complete Latin square of order $2m$ for every integer $m \ge 1$. (b) There is a column-quasi-complete Latin square of order $2m + 1$ for every integer $m \ge 1$ and a pair of Latin squares of order $2m + 1$ which are together column-complete for every integer $m \ge 1$.

PROOF. (a) Let the Latin square be $L = (l_{ij})$, $1 \le i, j \le 2m$. Let the first row and first column of $L$ be $1, 2, 2m, 3, 2m-1, \ldots, m, m+2, m+1$ and, for $i, j > 1$, let $l_{ij} = l_{i1} + l_{1j} - 1 \pmod{2m}$. Clearly $L$ is a Latin square. The sequence of differences from adjacent positions is

$$\{2-1,\ 3-2m,\ 2m-2,\ 4-(2m-1),\ \ldots,\ (m+1)-(m+2),\ (2m-1)-3,\ \ldots,\ (m+2)-m\} = \{1, 3, 5, \ldots, 2m-1, 2m-2, 2m-4, \ldots, 2\}$$

which contains each of the non-zero elements modulo $2m$ precisely once. Thus every ordered pair of distinct treatments appears in adjacent positions exactly once in the columns of $L$ (and, as it happens, exactly once in the rows of $L$). (b) If the order of the square is odd, then it is possible to construct a column-quasi-complete Latin square using the sequence $1, 2, 2m+1, 3, 2m, 4, \ldots, m, m+3, m+1, m+2$. To construct a pair of squares which are together column-complete use the sequence above and the sequence

$1, 2m+1, 2, 2m, 3, \ldots, m+3, m, m+2, m+1$.

These sequences can be checked as in part (a). []
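Part (a) of the construction can be checked numerically for small orders. The sketch below is an illustration under an explicit assumption: the first row and first column are taken to be the sequence $1, 2, 2m, 3, 2m-1, \ldots, m+2, m+1$, which is the sequence implied by the differences listed in the proof, and the remaining entries are filled in by $l_{ij} = l_{i1} + l_{1j} - 1 \pmod{2m}$ with residue 0 written as $2m$. All function names are hypothetical.

```python
def williams_sequence(m):
    """Sequence 1, 2, 2m, 3, 2m-1, ..., m+2, m+1 (assumed first row/column)."""
    n = 2 * m
    low = list(range(2, m + 2))         # 2, 3, ..., m+1
    high = list(range(n, m + 1, -1))    # 2m, 2m-1, ..., m+2
    seq = [1]
    for idx in range(m):
        seq.append(low[idx])
        if idx != m - 1:
            seq.append(high[idx])
    return seq

def williams_square(m):
    """Square with l_ij = l_i1 + l_1j - 1 (mod 2m), symbols 1, ..., 2m."""
    n = 2 * m
    seq = williams_sequence(m)
    return [[(seq[i] + seq[j] - 2) % n + 1 for j in range(n)] for i in range(n)]

def is_latin(square):
    """Every symbol occurs once in each row and once in each column."""
    n = len(square)
    symbols = set(range(1, n + 1))
    rows_ok = all(set(row) == symbols for row in square)
    cols_ok = all({square[i][j] for i in range(n)} == symbols for j in range(n))
    return rows_ok and cols_ok

def is_column_complete(square):
    """Every ordered pair of distinct symbols is vertically adjacent exactly once."""
    n = len(square)
    pairs = [(square[i][j], square[i + 1][j])
             for i in range(n - 1) for j in range(n)]
    return len(set(pairs)) == n * (n - 1) and all(a != c for a, c in pairs)

for m in range(1, 6):
    sq = williams_square(m)
    print(2 * m, is_latin(sq), is_column_complete(sq))
```

For order 4, for instance, the sketch produces the rows 1 2 4 3, 2 3 1 4, 4 1 3 2 and 3 4 2 1.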

Table 4.2.4 contains a pair of squares of order 5 constructed in this way. Variants of Williams designs are optimal designs for the second model above, the one with correlated errors. In particular let $w_{ij}$ be the number of times that treatments $i$ and $j$ are adjacent in a uniform COD with $t = p$. If the $w_{ij}$ ($i \ne j$) are all equal then the COD is said to be a Williams design. Suppose that $d = (d_{ij})$ is a Williams design. Let $B$ be a block design with blocks $(d_{1j}, d_{pj})$, $j = 1, 2, \ldots, n$. $B$ is called the

D. J. Street


Table 4.2.4
A pair of column-quasi-complete Latin squares of order 5

1 2 5 3 4      1 5 2 4 3
2 3 1 4 5      5 4 1 3 2
5 1 4 2 3      2 1 3 5 4
3 4 2 5 1      4 3 5 2 1
4 5 3 1 2      3 2 4 1 5

end-pair design. If $t = n$ and if the end-pair design is connected then $d$ is a Williams design with circular structure. Either of the squares in Table 4.2.4 is an example of a Williams design with circular structure. If $B$ is a BIBD then $d$ is a Williams design with balanced end-pairs. Kunert (1985) has established various results about the optimality of Williams designs. For example, a Williams design with balanced end-pairs is universally optimal for the estimation of treatment effects over the class of uniform CODs with $t = p$ and is universally optimal over all CODs if $t \ge 4$ and p > / ( t - 2 - ~ ) / ( 2 ( t - 3 ) ) and for all $p$ if $t = 3$. Other results are summarised in Street (1989). Gill (1992) has found the optimal designs when the observations are correlated and $t$ divides both $n$ and $p$ if the model is $Y_{ij} = \alpha_i + \beta_j + \tau_{d[i,j]} + \theta Y_{i,j-1} + E_{ij}$, and $\theta$ and $\rho$ are both known. For some cautionary comments about change-over designs see Senn (1988, 1993). For change-over designs when the treatments have a factorial structure see Shing and Hinkelmann (1992) (for $n$ factors all with a prime-power number of levels), Fletcher and John (1985) (who argue that a modification of the usual definition of balance is appropriate to allow for the fact that interaction effects are not as important as main effects) and Fletcher (1987) (who gives generalised cyclic designs for two and three factor experiments with each factor having up to four levels).

5. Neighbour designs for field trials

Designs may need to be balanced for neighbouring treatments in several different applications. We have seen that this is an important consideration in the design of change-over designs (see Subsection 4.2). It is also important in other areas such as in field trials, in designs for studying intergenotypic competition, in bioassay work when using Ouchterlony gel diffusion tests and when designing the collection of seed from seed orchards. The nearest-neighbour method for the analysis of field trials was originally proposed by Papadakis (1937) as a way of making allowance for local variation in factors such as soil fertility. Because of this variation, responses on adjacent plots tend to be more alike than do responses on plots which are well-separated. Papadakis proposed that, for each response, the response of neighbouring plots be used as a covariate. Variations of this general idea have been proposed and investigated; a thorough survey of the area has been given in Martin (1985).


One-dimensional trials are ones in which the plots tend to be long and narrow and so each plot may be regarded as having only two neighbours, those adjacent on the long sides. Blocks in such designs are often called linear blocks. Gleeson and Eccleston (1992) have considered the design of field experiments for one-dimensional field trials and have found that partial neighbour balance (where any two treatments are neighbours zero or one times) is more important than having an incomplete block structure. Two-dimensional trials, in which plots are more square and so have four neighbouring plots, have not been investigated in this way but it is assumed that similar results will apply. Gleeson and Cullis (1991) describe over 1000 trials which have been conducted and analysed using the method of Papadakis or a variant and for which neighbour balanced designs are appropriate. Issues relating to the randomisation of neighbour designs are discussed by Monod and Bailey (1993). We consider both block designs, and row-column designs, with neighbour balance below.

5.1. Circular block designs

In gel diffusion tests the treatments appear in wells around a circle on an agar plate and so every treatment has two neighbours and the blocks may be written circularly. However circular blocks may be used as a way of representing linear block designs with edge effects where the edge effect can either be allowed for directly in the model or can be eliminated by having border plots (Martin, 1985). A neighbour design (or circular block design) for $v$ treatments is a layout arranged in $b$ blocks of size $k$ such that each treatment appears $r$ times and such that any two distinct treatments appear as neighbours (that is, in adjacent positions) $\lambda'$ times. We do not require that the entries in a block be distinct, merely that a treatment never appears as its own neighbour. If a block in a neighbour design is written as $(a_1, a_2, \ldots, a_k)$ then we say that the neighbours of $a_i$ are $a_{i-1}$ and $a_{i+1}$, where the subscripts are calculated modulo $k$. We will denote a neighbour design by $N[k, \lambda'; v]$. If the design is to be used in a field trial then, to preserve neighbour balance, the linear block used must be $a_k, a_1, a_2, \ldots, a_k, a_1$. An example of a neighbour design with $v = 11$, $k = 5$ and $\lambda' = 1$ is given in Table 5.1.1. If the blocks are of size two then we say that each block has two pairs of neighbours. Thus by counting plots and pairs of treatments we get that $vr = bk$ and $\lambda'(v - 1) = 2r$ for any neighbour design.

Table 5.1.1
An N[5, 1; 11]

(1,2,0,3,7)  (2,3,1,4,8)  (3,4,2,5,9)  (4,5,3,6,10)  (5,6,4,7,0)  (6,7,5,8,1)
(7,8,6,9,2)  (8,9,7,10,3)  (9,10,8,0,4)  (10,0,9,1,5)  (0,1,10,2,6)
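The neighbour balance of the design in Table 5.1.1 can be confirmed by direct counting. The sketch below is illustrative: it assumes that the eleven blocks are the cyclic shifts modulo 11 of the first block (1,2,0,3,7), which reproduces the blocks listed in the table, and it treats each block as circular so that the last entry neighbours the first. The names used are hypothetical.

```python
from collections import Counter

v, k = 11, 5
base_block = (1, 2, 0, 3, 7)

# Develop the base block cyclically modulo v to obtain the b = 11 blocks.
blocks = [tuple((x + shift) % v for x in base_block) for shift in range(v)]

def neighbour_counts(blocks):
    """Count how often each unordered pair of treatments is adjacent (circularly)."""
    counts = Counter()
    for block in blocks:
        size = len(block)
        for pos in range(size):
            first, second = block[pos], block[(pos + 1) % size]
            counts[frozenset((first, second))] += 1
    return counts

counts = neighbour_counts(blocks)
replication = Counter(x for block in blocks for x in block)

b = len(blocks)
r = replication[0]
lam = counts[frozenset((0, 1))]
print(b, r, lam)
print(v * r == b * k, lam * (v - 1) == 2 * r)   # the two counting identities
```

Every one of the 55 unordered pairs of treatments is counted exactly once, and each treatment is replicated five times, in agreement with $vr = bk$ and $\lambda'(v-1) = 2r$.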


In a series of papers (see Hwang and Lin (1978) and references therein) the necessary conditions were shown to be sufficient for the existence of a neighbour design for all values of $v$ and $k$ by the actual construction of the designs concerned. These have been summarised in Street and Street (1987) and further constructions appear in Preece (1994). Azais et al. (1993) give a catalogue of neighbour designs with border plots. For designs with relatively few replicates, Wilkinson et al. (1983) suggested that a good layout would have each pair of treatments on adjacent plots at most once and each pair of treatments on plots of distance two apart at most once also. Such layouts also have some desirable properties in terms of the intersections of the neighbour lists of the treatments. Designs which satisfy these properties have been given by Street and Street (1985) and Azais et al. (1993).

5.2. Row-column designs

The properties of complete and quasi-complete Latin squares, defined in Subsection 4.2, have been extended to larger, not necessarily square, arrays by Freeman (1979). He considers $p \times q$ arrays of $v$ treatments in which: (i) each treatment appears equally often in the array, perhaps further restricted to appear equally often in rows, or columns or both; (ii) no treatment appears adjacent to itself in either rows or columns; (iii) the array has either directional balance (each ordered pair of distinct treatments appears adjacent equally often in rows and in columns) or non-directional balance (each pair of distinct treatments appears adjacent equally often in rows and columns considered together). Thus a complete Latin square has $p = q = v$ and has directional balance while a quasi-complete Latin square has non-directional balance. Table 5.2.1 gives another example of an array with non-directional balance. For instance, the neighbours of treatment 1, in row 1, are 2, 2 and 4. Note that each treatment appears 9 times in the array but that the treatments cannot appear equally often in each row (or column) of the array. On occasion balance is obtained by including some border plots, where varieties are grown but the responses are only used as 'neighbour plots'. Constructions for (non-) directionally balanced designs, with and without borders, have been given by Freeman (1979) and Street (1986).

Table 5.2.1
A non-directionally balanced design with p = q = 6 and v = 4

1 2 1 4 3 2
3 1 2 3 4 1
1 3 1 4 2 4
4 2 4 1 3 2
2 4 3 2 1 3
3 1 4 3 2 4

Freeman (1988a) has studied nearest neighbour row-column designs for three and four treatments in detail. Morgan (1988, 1990) gives designs with directional balance and also balances for diagonal neighbours. This is important in polycross experimentation for adequate mixing of the genotypes. Other designs with neighbour balance exist, for instance the serially balanced sequences of R. M. Williams (1956), described in Street and Street (1987), and extended by Monod (1991). Wild and E. R. Williams (1987) give an algorithm for making neighbour designs from $\alpha$-arrays.
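The balance claimed for Table 5.2.1 can be checked by direct counting. The following is a minimal sketch; the names `ARRAY` and `neighbour_counts` are illustrative only. The array is entered so that row 1 reads 1 2 1 4 3 2, the orientation used by the example in the text; transposing the array does not change the counts for rows and columns taken together.

```python
from collections import Counter

# Table 5.2.1, entered so that row 1 reads 1 2 1 4 3 2.
ARRAY = [
    [1, 2, 1, 4, 3, 2],
    [3, 1, 2, 3, 4, 1],
    [1, 3, 1, 4, 2, 4],
    [4, 2, 4, 1, 3, 2],
    [2, 4, 3, 2, 1, 3],
    [3, 1, 4, 3, 2, 4],
]


def neighbour_counts(array):
    """Count unordered adjacent pairs over rows and columns considered together."""
    counts = Counter()
    n_rows, n_cols = len(array), len(array[0])
    for i in range(n_rows):
        for j in range(n_cols):
            if j + 1 < n_cols:  # neighbour to the right, same row
                counts[tuple(sorted((array[i][j], array[i][j + 1])))] += 1
            if i + 1 < n_rows:  # neighbour below, same column
                counts[tuple(sorted((array[i][j], array[i + 1][j])))] += 1
    return counts


replications = Counter(t for row in ARRAY for t in row)
pair_counts = neighbour_counts(ARRAY)

# Row neighbours of treatment 1 within row 1.
first_row = ARRAY[0]
neighbours_of_1 = sorted(
    first_row[j + d]
    for j, t in enumerate(first_row) if t == 1
    for d in (-1, 1) if 0 <= j + d < len(first_row)
)

print(dict(replications))
print(dict(pair_counts))
print(neighbours_of_1)
```

Every unordered pair of distinct treatments should appear with the same count and no pair of the form (t, t) should occur, which is the non-directional balance of condition (iii) together with condition (ii).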

6. Factorial designs and fractional factorial designs

Factorial designs have been much used in agriculture since their introduction by Fisher in 1923. One recent example may be found in Iremiren, Osara and Okiy (1991). We consider designs with $k$ treatment factors where factor $i$ has $s_i$ levels, $1 \le i \le k$. We call this design an $s_1 \times s_2 \times \cdots \times s_k$ factorial design. If $s_1 = s_2$, say, then we write $s_1^2 \times \cdots \times s_k$. We call any combination of treatment factor levels a treatment combination and write it as a $k$-tuple with the $i$th entry being from a set of $s_i$ symbols. If all the factors have the same number of levels we say that the factorial design is symmetrical (or sometimes pure); otherwise we say it is asymmetrical (or sometimes mixed). A complete factorial design has each treatment combination appearing an equal number of times; an incomplete factorial design (sometimes called a fractional factorial design) has only some of the treatment combinations appearing in it. An effect is said to be confounded if it cannot be distinguished from the effect of blocks, and two treatment effects that cannot be distinguished are said to be aliased. The defining contrasts subgroup is the set of all contrasts which are confounded with blocks in a factorial design. The choice of the fraction to use is non-trivial, as it is necessary to be sure that no effects of importance are confounded. It is possible to block both complete and fractional factorial designs. Again the allocation of treatment combinations to blocks needs to be done with a view to the confounding, and aliasing, that will result. The construction of fractional factorial designs, and incomplete blocks for complete or fractional factorial designs, can be done either by using the theory that has been developed by various authors or by making use of published tables and algorithms (see Subsection 6.4).

6.1. Symmetrical prime-power factorials

There is an extensive theoretical development of these designs available. We begin by considering a small example. Consider a $3^2$ factorial design with factors $A$ and $B$. Then the treatment levels for each factor are represented by the elements of $\{0, 1, 2\}$. Hence we get the 9 treatment combinations $\{(00), (01), (02), (10), (11), (12), (20), (21), (22)\}$. We would like to subdivide the treatment sum of squares into three parts, corresponding to the main effect of each of the factors and the interaction effect of the factors. To calculate the sum of squares of the main effect of $A$ we need to compare the responses to the three levels of $A$, independently of the level of $B$ applied. Hence we want to compare the responses of the treatment combinations in the three sets $\{(00), (01), (02)\}$, $\{(10), (11), (12)\}$ and $\{(20), (21), (22)\}$. In each of these sets the level of $A$ is constant. We write $P(1, 0)$ or $A$ to represent these three sets. To calculate the $A$ sum of squares we choose any two orthogonal contrasts on these sets and calculate the corresponding sum of squares. The sum of these two contrast sums of squares is the $A$ sum of squares and it is independent of the particular two orthogonal contrasts chosen. For example we might use the orthogonal contrasts $(-1, 0, 1)$ and $(-1, 2, -1)$. Then, if $y_{ij}$ is the response observed for treatment combination $(i, j)$, each squared contrast of the set totals is divided by three times the sum of its squared coefficients (each set total being a sum of three responses), and the observed $A$ sum of squares is

$$\frac{\bigl(y_{20} + y_{21} + y_{22} - (y_{00} + y_{01} + y_{02})\bigr)^2}{6} + \frac{\bigl(2(y_{10} + y_{11} + y_{12}) - (y_{20} + y_{21} + y_{22} + y_{00} + y_{01} + y_{02})\bigr)^2}{18}.$$

We define the main effect of $B$ similarly using the sets $\{(00), (10), (20)\}$, $\{(01), (11), (21)\}$ and $\{(02), (12), (22)\}$, the entries of $P(0, 1)$ or $B$. To calculate the interaction sum of squares we must first define the sets of treatment combinations which are to be regarded as the same. We would like to compare the responses in sets in which the pair of levels of $A$ and $B$ are in some sense constant. For instance, one set of three sets could be $\{(00), (12), (21)\}$, $\{(01), (10), (22)\}$, $\{(02), (11), (20)\}$. In each of these sets the sum of the levels of $A$ and $B$ is constant modulo 3. We represent these sets as $P(1, 1)$ or $AB$. Then the other set of sets has $\{(00), (11), (22)\}$, $\{(01), (12), (20)\}$ and $\{(02), (10), (21)\}$. In these sets the sum of the level of $A$ and twice the level of $B$ is a constant, so we represent these by $P(1, 2)$ or $AB^2$. Again we calculate the sum of squares of each partition using a pair of orthogonal contrasts; the interaction sum of squares is the sum of these two partition sums of squares. Observe also that any two sets taken from two different partitions among $P(1, 0)$, $P(0, 1)$, $P(1, 1)$ and $P(1, 2)$ have exactly one treatment combination in common.

We now formalise the ideas of the preceding example. To do this we let $GF[s]$ denote the Galois field of order $s$, where $s$ must be a prime or a prime power. If $s$ is a prime then $GF[s]$ is $Z_s$, the integers modulo $s$. If $s$ is a prime power then a more complicated set must be used; see Street and Street (1987), for example, for an elementary account of finite fields. Consider an $s^k$ factorial design where $s$ is a prime or a prime power. We define the partition $P(1, 0, \ldots, 0) = \{\{(x_1, \ldots, x_k) \mid x_1 = \theta\},\ \theta \in GF[s]\}$. Sometimes we write $A$ for $P(1, 0, \ldots, 0)$. Each set in the partition has $s^{k-1}$ elements in it and the sum of squares of the main effect of the first factor is calculated by using a set of $s - 1$ orthogonal contrasts on the sets in the partition.
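The partitions of the $3^2$ example, and the way their sums of squares add up, can be illustrated with a short sketch. The function names `partition` and `partition_ss` and the response values are hypothetical, chosen only for illustration. The sum of squares between the sets of a partition is computed directly from the set totals rather than from a particular pair of orthogonal contrasts, and only a prime number of levels is handled.

```python
from fractions import Fraction
from itertools import product


def partition(coeffs, s):
    """Sets of P(coeffs): combinations grouped by sum(c_i * x_i) mod s (s prime)."""
    sets = {theta: [] for theta in range(s)}
    for x in product(range(s), repeat=len(coeffs)):
        theta = sum(c * xi for c, xi in zip(coeffs, x)) % s
        sets[theta].append(x)
    return [sets[theta] for theta in range(s)]


def partition_ss(coeffs, s, y):
    """Sum of squares between the sets of P(coeffs), one response per combination."""
    grand_total = sum(y.values())
    between = sum(
        Fraction(sum(y[x] for x in block) ** 2, len(block))
        for block in partition(coeffs, s)
    )
    return between - Fraction(grand_total ** 2, len(y))


# Hypothetical responses for the 9 treatment combinations (00), (01), ..., (22).
COMBOS = list(product(range(3), repeat=2))
y = dict(zip(COMBOS, [12, 15, 11, 14, 19, 13, 10, 17, 16]))

effects = {"A": (1, 0), "B": (0, 1), "AB": (1, 1), "AB^2": (1, 2)}
ss = {name: partition_ss(c, 3, y) for name, c in effects.items()}
total_ss = sum(v * v for v in y.values()) - Fraction(sum(y.values()) ** 2, len(y))

for name, value in ss.items():
    print(name, value)
print("total", total_ss)
```

With exact rational arithmetic the four partition sums of squares (2 degrees of freedom each) add up to the total treatment sum of squares on 8 degrees of freedom, which is why the interaction sum of squares can be taken as the sum of the $AB$ and $AB^2$ components.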
For the interaction effect of the first two factors, say, we need to calculate $(s - 1)$ partitions, $P(1, \alpha, 0, \ldots, 0) = AB^{\alpha}$, $\alpha \in GF[s]$, $\alpha \ne 0$, where $P(1, \alpha, 0, \ldots, 0) = \{\{(x_1, \ldots, x_k) \mid x_1 + \alpha x_2 = \theta\},\ \theta \in GF[s]\}$. The sum of squares of the interaction effect is calculated by using a set of $s - 1$ orthogonal contrasts on the sets in each of the partitions. Higher order interactions are defined similarly. For example, to estimate the interaction effect of the first three factors we need to calculate the $(s - 1)^2$ partitions, $P(1, \alpha, \beta, 0, \ldots, 0) = AB^{\alpha}C^{\beta}$, $\alpha, \beta \in GF[s]$, $\alpha \ne 0$, $\beta \ne 0$, where $P(1, \alpha, \beta, 0, \ldots, 0) = \{\{(x_1, \ldots, x_k) \mid x_1 + \alpha x_2 + \beta x_3 = \theta\},\ \theta \in GF[s]\}$. (Note that $P(\alpha, \beta, \gamma, 0, \ldots, 0) = P(1, \alpha^{-1}\beta, \alpha^{-1}\gamma, 0, \ldots, 0)$, as $\alpha x_1 + \beta x_2 + \gamma x_3 = \tau$ if and only if $x_1 + \alpha^{-1}\beta x_2 + \alpha^{-1}\gamma x_3 = \alpha^{-1}\tau$. Hence we may assume without loss of generality that any partition has a 1 as the first non-zero entry.)

When it is necessary to block the treatment combinations in an $s^k$ factorial design, the sensible approach is to use as blocks the sets of one of the partitions corresponding to part of a high order interaction. This would give blocks of size $s^{k-1}$ and the component of the effect which provided the blocks could not be estimated. Hence we say that the effect and the blocks are confounded. If the partition corresponding to a main effect is chosen to provide the blocks then that main effect cannot be estimated. If one of the partitions corresponding to an interaction effect is chosen to provide the blocks then an estimate of the interaction effect can be obtained, but only from the other partitions. In the $3^2$ example, for instance, use the partition $P(1, 1) = AB$ to give the three blocks $\{(00), (12), (21)\}$, $\{(01), (10), (22)\}$, $\{(02), (11), (20)\}$, each of size 3. Then the interaction effect can only be estimated from $P(1, 2) = AB^2$ and so it is estimated with 50% efficiency. More generally if an interaction effect of $h$ factors is chosen to provide the blocks then that interaction effect can be estimated with efficiency $1 - (s - 1)/(s - 1)^h$ (Yates, 1937). If smaller blocks are required then the usual approach is to use the intersections of sets from two, or more, partitions, as blocks. For example, in a $3^3$ design suppose we want to use blocks of size 3.
There are 13 partitions associated with this design; one for each of the three main effects, two for each of the three two-factor interactions and four for the three-factor interaction. Suppose we decide to use the intersections of the sets in $P(1, 1, 1) = ABC$ and $P(1, 1, 2) = ABC^2$ to be the blocks. Now $P(1, 1, 1) = \{\{(x_1, x_2, x_3) \mid x_1 + x_2 + x_3 = \theta\},\ \theta = 0, 1, 2\}$ and $P(1, 1, 2) = \{\{(x_1, x_2, x_3) \mid x_1 + x_2 + 2x_3 = \theta\},\ \theta = 0, 1, 2\}$. Thus the sets in $P(1, 1, 1)$ are $\{000, 012, 021, 102, 201, 120, 210, 111, 222\}$, $\{001, 010, 022, 100, 202, 121, 211, 112, 220\}$ and $\{002, 011, 020, 101, 200, 122, 212, 110, 221\}$. The three sets in $P(1, 1, 2)$ are $\{000, 120, 210, 101, 202, 112, 221, 011, 022\}$, $\{001, 121, 211, 102, 200, 110, 222, 012, 020\}$ and $\{002, 122, 212, 100, 201, 111, 220, 010, 021\}$. We get the blocks in the design by intersecting, in turn, each set from $P(1, 1, 1)$ with each set from $P(1, 1, 2)$. Thus the blocks obtained by intersecting the first set of $P(1, 1, 1)$ with each of the sets in $P(1, 1, 2)$ in turn are $\{000, 120, 210\}$, $\{102, 222, 012\}$ and $\{201, 111, 021\}$. The remaining 6 blocks are obtained in the same way from the other two sets of $P(1, 1, 1)$. The blocks are $\{202, 112, 022\}$, $\{001, 121, 211\}$, $\{100, 220, 010\}$, $\{101, 221, 011\}$, $\{200, 110, 020\}$ and $\{002, 122, 212\}$. Observe that the intersection always gives a set of size 3. Also note that we have 9 blocks and so the block sum of squares has 8 degrees of freedom. There are 2 degrees of freedom associated with each of $P(1, 1, 1)$ and $P(1, 1, 2)$. To identify the other degrees of freedom suppose that $(x_1, x_2, x_3)$ satisfies $x_1 + x_2 + x_3 = \alpha$ (corresponding to a set in $P(1, 1, 1)$) and $x_1 + x_2 + 2x_3 = \beta$ (corresponding to a set in $P(1, 1, 2)$). Then we know that $x_3 = \beta - \alpha$ and $x_1 + x_2 = \alpha - \beta + \alpha = 2\alpha - \beta$. Hence all the treatment combinations in a block have a given level of factor $C$ and the sum of the levels of factors $A$ and $B$ is a constant in each block. Hence the main effect of the third factor and the two-factor interaction of the first two factors have also been confounded with blocks. We call these effects, that have been confounded as a consequence of the two effects that we have chosen to confound, generalised interactions. Note that had we chosen to use the partitions $P(0, 0, 1)$ and $P(1, 1, 1)$, say, to obtain the blocks then the generalised interactions would be $x_1 + x_2 + 2x_3 = \alpha + \beta$ and $x_1 + x_2 = \alpha - \beta$, since we know that $x_1 + x_2 + x_3 = \alpha$ and $x_3 = \beta$, say. Three effects are independent of each other if the third effect is not the generalised interaction of the first two. (We have seen that the ordering of the effects is not important.)

When constructing an $s^k$ factorial design in blocks of size $s^m$ there are $s^{k-m}$ blocks, so we need to choose $k - m$ independent effects to confound. We then need to calculate the $(s^{k-m} - 1)/(s - 1) - (k - m)$ generalised interactions of these effects to be sure that we have not inadvertently confounded an effect of interest. (In the $3^3$ example with blocks of size 3 we have $k - m = 2$ and $(9 - 1)/2 - 2 = 2$ generalised interactions.) Unfortunately there is no fast way of doing this in general, although for some small designs tables exist; see Subsection 6.4. The set of the $k - m$ independent effects and their generalised interactions together form the defining contrasts subgroup mentioned earlier.

It is possible to get the blocks directly from the equations of the confounded effects. Consider the $3^3$ example again. The block $B_0 = \{000, 120, 210\}$ has the treatment combinations which satisfy $x_1 + x_2 + x_3 = 0$ and $x_1 + x_2 + 2x_3 = 0$. Suppose that $x$ and $y$ are both in $B_0$. Then $x + y = (x_1 + y_1, x_2 + y_2, x_3 + y_3)$ is also in $B_0$ since $x_1 + y_1 + x_2 + y_2 + x_3 + y_3 = 0$ and $x_1 + y_1 + x_2 + y_2 + 2(x_3 + y_3) = 0$. We say that $B_0$ is closed under component-wise addition and forms a subgroup of the set of all treatment combinations. Now consider $B_0 + (001) = \{001, 121, 211\}$, say.
This set consists of all the treatment combinations in which $x_1 + x_2 + x_3 = 1$ and $x_1 + x_2 + 2x_3 = 2$ and so is the intersection of the set of $P(1, 1, 1)$ with $\theta = 1$ and the set of $P(1, 1, 2)$ with $\theta = 2$. All the other blocks can be obtained from $B_0$ by adding another treatment combination that has not already appeared in the design. The other blocks are called the cosets of $B_0$. In general the block in which the treatment combination $(00 \cdots 0)$ occurs is called the principal block. All other blocks are obtained from it by the addition of a treatment combination which has not yet appeared in the design.

On occasion it is not possible to find enough experimental units to allow every treatment combination to appear in the experiment. Hence a fractional factorial experiment, in which only a subset of the treatment combinations appear, is conducted. Most commonly the fraction that is used is obtained by using one block chosen at random from a confounded factorial with blocks of the right size. Now there will be some interaction effects which cannot be distinguished from each other. Such effects are said to be aliased. Suppose that for a $3^3$ factorial design we could only find three experimental units. Using the principal block of the design we discussed above, the treatment combinations that appear are 000, 120 and 210. We know that the effects confounded with blocks in the original design have partitions $P(1, 1, 1)$, $P(1, 1, 2)$, $P(1, 1, 0)$ and $P(0, 0, 1)$. Hence for these effects we have no information at all from this design since the three treatment combinations appear in the same set in each of these partitions. These effects are said to form the defining contrast. For every other partition we have one treatment combination from each set in the partition. For instance, for $P(1, 0, 0)$ we get the sets $\{000\}$, $\{120\}$ and $\{210\}$. This is the same as the partition for $P(0, 1, 0)$, for $P(1, 2, 0)$ and so on for each of the effects not in the defining contrast. Hence all the effects not in the defining contrast are aliased with each other.

Suppose that for a $3^3$ factorial design we could find nine experimental units. If we use the partition $P(1, 1, 1)$ as the defining contrast and again choose the principal block, then we have the treatment combinations 000, 012, 021, 102, 201, 120, 210, 111 and 222 in the experiment. To determine the aliasing here we note that every treatment combination in the experiment satisfies $x_1 + x_2 + x_3 = 0$. Suppose that we want to determine the sets in $P(1, 0, 0)$. Then we also require that $x_1 = \theta$, $\theta = 0, 1, 2$. Hence we see that we also know that $x_2 + x_3 = -\theta = 2\theta$ and that $2x_1 + x_2 + x_3 = \theta$. But this second equation is just saying that $x_1 + 2x_2 + 2x_3 = 2\theta$. Thus the sets in $P(1, 0, 0)$, $P(0, 1, 1)$ and $P(1, 2, 2)$ are the same in this fraction and so these effects are aliased. Similarly the sets in $P(0, 1, 0)$, $P(1, 2, 1)$ and $P(1, 0, 1)$ are the same, as are the sets in $P(0, 0, 1)$, $P(1, 1, 0)$ and $P(1, 1, 2)$ and the sets in $P(1, 2, 0)$, $P(1, 0, 2)$ and $P(0, 1, 2)$. There are 9 treatment combinations in the experiment. There are therefore 8 degrees of freedom available and every partition corresponds to two degrees of freedom. Hence we expect that there will be 4 independent partitions. As there are $(3^3 - 1)/2 = 13$ partitions and one is used as the defining contrast, there will be four sets, each with three aliased partitions. Thus we have determined the aliasing scheme for this design.
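Both the blocking of the $3^3$ design and the aliasing in its one-third fraction can be reproduced by brute force. The sketch below is illustrative only; the helper names (`label`, `split`, `normalised`) are not from the text, and the approach simply enumerates all 13 partitions rather than using any of the shortcuts discussed above.

```python
from itertools import product

S = 3  # number of levels; prime, so arithmetic is modulo 3
POINTS = list(product(range(S), repeat=3))


def label(coeffs, x):
    """Value of the linear form sum(c_i * x_i) modulo 3."""
    return sum(c * xi for c, xi in zip(coeffs, x)) % S


def split(coeffs, points):
    """The sets into which P(coeffs) divides `points`."""
    groups = {}
    for x in points:
        groups.setdefault(label(coeffs, x), set()).add(x)
    return frozenset(frozenset(g) for g in groups.values())


def normalised(c):
    """Rescale c so that its first non-zero entry is 1."""
    first = next(v for v in c if v)
    inverse = pow(first, S - 2, S)  # Fermat inverse, valid because S is prime
    return tuple(v * inverse % S for v in c)


PARTITIONS = sorted({normalised(c) for c in POINTS if any(c)})  # 13 partitions

# Blocks of size 3: intersect the sets of P(1,1,1) with those of P(1,1,2).
blocks = {}
for x in POINTS:
    blocks.setdefault((label((1, 1, 1), x), label((1, 1, 2), x)), []).append(x)

# A partition is confounded when every block lies inside one of its sets.
confounded = [c for c in PARTITIONS
              if all(len(split(c, b)) == 1 for b in blocks.values())]

# One-third fraction on nine units: the set of P(1,1,1) that contains 000.
fraction = [x for x in POINTS if label((1, 1, 1), x) == 0]
alias = {}
for c in PARTITIONS:
    if c != (1, 1, 1):
        alias.setdefault(split(c, fraction), []).append(c)
alias_classes = sorted(alias.values())

print(blocks[(0, 0)])
print(confounded)
for group in alias_classes:
    print(group)
```

The confounded list contains the two chosen partitions together with their two generalised interactions, and the alias classes group the remaining twelve partitions into four sets of three, in agreement with the degrees-of-freedom count given in the text.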
In general one determines the aliasing scheme by first determining the confounded effects, including the generalised interactions, for the corresponding blocked complete factorial and then systematically calculating all the partitions which are the same in the fractional design. Note that the aliasing does not depend on the particular block chosen from the original, blocked design.

We close this subsection with an example of constructing a fractional factorial design in which there are four factors, each with 7 levels. Suppose that 49 units are available for the experiment. We decide that the partitions $P(a, b, c, d)$ and $P(w, x, y, z)$ will correspond to the defining contrast. The generalised interactions that are also confounded are given by $\alpha(a, b, c, d) + \beta(w, x, y, z)$, $\alpha, \beta \in GF[7]$, $\alpha, \beta$ not both 0. Ideally we would like all of the defining contrasts and generalised interactions to involve four factors. However this is not possible since if $w$, say, is non-zero then as $\beta$ takes each of the values in $GF[7]$ so does $\beta w$. Hence for any choice of $\alpha$, there will be a value of $\beta$ such that $\alpha a + \beta w = 0$. So assume that $d = 0$ and that $a = b = c = 1$. Then the generalised interactions are of the form $(\alpha + \beta w, \alpha + \beta x, \alpha + \beta y, \beta z)$. We want $z$ to be non-zero. Then we do not want all of $w$, $x$ and $y$ to be zero, otherwise we will be confounding the main effect of the fourth factor. Try $w = 1$. Then $\alpha + \beta w = \alpha + \beta$. If $\alpha + \beta = 0$ then we want $\alpha + \beta x$ and $\alpha + \beta y$ to be non-zero (so that the generalised interaction involves three factors). Hence we want that $\beta(x - 1) \ne 0$ and $\beta(y - 1) \ne 0$. Thus $x \ne 1$ and $y \ne 1$. Try $z = 1$, $x = y = 2$. Then the defining contrasts are $(1, 1, 1, 0)$ and $(1, 2, 2, 1)$. The generalised interactions include $(0, 1, 1, 1) = 6(1, 1, 1, 0) + (1, 2, 2, 1)$ and $(1, 0, 0, 6) = 2(1, 1, 1, 0) + 6(1, 2, 2, 1)$. This confounds part of a two-factor interaction. Can we do better? Try $x = 2$ and $y = 3$. Then the generalised interactions are of the form $(\alpha + \beta, \alpha + 2\beta, \alpha + 3\beta, \beta)$. If $\alpha + \beta = 0$ then this is $(0, \beta, 2\beta, \beta)$ and if $\alpha + \beta = 1$ then this is $(1, 1 + \beta, 1 + 2\beta, \beta)$. These always involve at least three factors. Hence the treatment combinations to be used in the experiment are those in one block of the blocked design with principal block $\{(x_1, x_2, x_3, x_4) \mid x_1 + x_2 + x_3 = 0,\ x_1 + 2x_2 + 3x_3 + x_4 = 0\}$.
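The two candidate choices for the $7^4$ example can be compared by enumerating every non-zero combination of the two defining contrasts and counting how many factors each one involves. This is a minimal sketch; `generated` and `min_factors` are illustrative names.

```python
from itertools import product

S = 7  # prime, so arithmetic modulo 7 is the field GF[7]


def generated(g1, g2):
    """All non-zero combinations alpha*g1 + beta*g2 over GF[7]."""
    combos = set()
    for alpha, beta in product(range(S), repeat=2):
        if alpha or beta:
            combos.add(tuple((alpha * u + beta * v) % S for u, v in zip(g1, g2)))
    return combos


def min_factors(g1, g2):
    """Smallest number of factors involved in any confounded contrast."""
    return min(sum(1 for v in c if v) for c in generated(g1, g2))


first_try = min_factors((1, 1, 1, 0), (1, 2, 2, 1))
second_try = min_factors((1, 1, 1, 0), (1, 2, 3, 1))

# Principal block for the second choice: 49 of the 7^4 treatment combinations.
principal_block = [
    x for x in product(range(S), repeat=4)
    if (x[0] + x[1] + x[2]) % S == 0
    and (x[0] + 2 * x[1] + 3 * x[2] + x[3]) % S == 0
]

print(first_try, second_try, len(principal_block))
```

The first choice produces a contrast involving only two factors, while the second never involves fewer than three, which is the improvement described in the text.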

6.2. Single replicate generalised cyclic designs

There are various families of asymmetrical factorial designs available in the literature. One of the largest and easiest to construct is the family of generalised cyclic designs. Let $Z_{s_i}$ be the integers modulo $s_i$. In an $s_1 \times s_2 \times \cdots \times s_k$ factorial design use the integers modulo $s_i$ to represent the $s_i$ levels of factor $i$, $1 \le i \le k$. Let $G_t$ be the abelian group of treatment combinations with addition defined component-wise. A single replicate factorial design in incomplete blocks is said to be a generalised cyclic design if one block, $B_0$, say, is a subgroup of $G_t$ and the other blocks are the cosets of $B_0$ in $G_t$. $B_0$ is called the principal block of the design. If $B_0$ can be generated by $g$ treatment combinations then the generalised cyclic design is called a $g$-generator design. We see that the blocked factorial designs of the previous subsection are all examples of generalised cyclic designs. For instance consider a $3^3$ factorial design in which $B_0$ has the treatment combinations 000, 012, 021, 102, 201, 120, 210, 111 and 222. So $B_0$ is one set in $P(1, 1, 1)$. Then $B_0$ is also the principal block of a 2-generator design, as each treatment combination in $B_0$ is a linear combination of 012 and 102. We write $B_0 = \langle 012, 102 \rangle$. In a $2^3 \times 3^3$ design, where by convention the first three factors have 2 levels and the next three factors have 3 levels, with $B_0 = \langle 011120, 110120 \rangle = \{000000, 000120, 000210, 011000, 011120, 011210, 110000, 110120, 110210, 101000, 101120, 101210\}$, another block of the design is $B_0 + 000001 = \{000001, 000121, 000211, 011001, 011121, 011211, 110001, 110121, 110211, 101001, 101121, 101211\}$. We need to be able to calculate the defining contrasts subgroup of any generalised cyclic design. To do this we let $a$ be the least common multiple of $s_1, s_2, \ldots, s_k$, written $a = \mathrm{lcm}(s_1, s_2, \ldots, s_k)$. Let $G_c = \{(c_1, \ldots, c_k) \mid c_i \in Z_{s_i}\}$ be an abelian group with addition defined component-wise.
For any treatment combination $x$ and any $c \in G_c$, define $[c, x] = \sum_i c_i x_i a / s_i \pmod{a}$. Then the annihilator of $B_0$, $(B_0)^0$, is $(B_0)^0 = C = \{c \mid c \in G_c,\ [c, x] = 0\ \forall x \in B_0\}$. The confounding scheme of the design constructed from $B_0$ is completely specified by the elements in $C$ as each non-zero element $c \in C$ represents a single degree of freedom from the interaction of the factors corresponding to the non-zero elements of $c$. $C$ is called the defining contrasts subgroup. Consider the $3^3$ design with $B_0 = \langle 012, 102 \rangle$. Then $C = (B_0)^0 = \{000, 111, 222\}$. This is what we expect since $B_0$, the principal block, is one of the sets in the partition $P(1, 1, 1)$. Various results about the calculation of the defining contrasts subgroup for a generalised cyclic design have been obtained by Voss and Dean (1988). We summarise these below and refer the reader to Voss and Dean (1988) for the proofs.
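The definitions of the principal block, its cosets and the annihilator translate directly into a brute-force computation. The sketch below is an illustration under that reading; the function names `span`, `annihilator` and `show` are hypothetical, and the search over all of $G_c$ is only practical for small designs.

```python
from itertools import product
from math import gcd


def add(x, y, levels):
    """Component-wise addition, component i taken modulo levels[i]."""
    return tuple((u + v) % s for u, v, s in zip(x, y, levels))


def span(generators, levels):
    """Subgroup of G_t generated by `generators`."""
    zero = tuple(0 for _ in levels)
    group, frontier = {zero}, [zero]
    while frontier:
        x = frontier.pop()
        for g in generators:
            y = add(x, g, levels)
            if y not in group:
                group.add(y)
                frontier.append(y)
    return group


def annihilator(block, levels):
    """All c in G_c with [c, x] = 0 (mod a) for every x in the block."""
    a = 1
    for s in levels:
        a = a * s // gcd(a, s)  # a = lcm(s_1, ..., s_k)
    return {
        c for c in product(*(range(s) for s in levels))
        if all(sum(ci * xi * (a // s) for ci, xi, s in zip(c, x, levels)) % a == 0
               for x in block)
    }


def show(x):
    return "".join(str(v) for v in x)


# The 3^3 example with B0 = <012, 102>.
b0_cubic = span([(0, 1, 2), (1, 0, 2)], (3, 3, 3))
c_cubic = annihilator(b0_cubic, (3, 3, 3))

# The 2^3 x 3^3 example with B0 = <011120, 110120> and the coset B0 + 000001.
mixed = (2, 2, 2, 3, 3, 3)
b0_mixed = span([(0, 1, 1, 1, 2, 0), (1, 1, 0, 1, 2, 0)], mixed)
coset = {add(x, (0, 0, 0, 0, 0, 1), mixed) for x in b0_mixed}

print(sorted(show(c) for c in c_cubic))
print(sorted(show(x) for x in coset))
```

Because the groups are finite, repeatedly adding generators is enough to reach every element of the generated subgroup, including inverses.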


The first result is for symmetrical one-generator generalised cyclic designs.

THEOREM 1 (Voss and Dean, 1988). Let $B_0$ be the principal block of a one-generator $s^k$ factorial design with generator $x = (x_1, \ldots, x_k)$. Define the following three subsets of $G_c$ by
(i) $C_1 = \bigcup_{i:\, x_i = 0} \{c \mid c_i = 1,\ c_j = 0,\ i \ne j\}$,
(ii) $C_2 = \bigcup_{i:\, x_i \ne 0} \{c \mid c_i = s/\gcd(x_i, s),\ c_j = 0,\ i \ne j\}$,
(iii) $C_3 = \bigcup_{i, j:\, x_i \ne 0,\, x_j \ne 0} \{c \mid c_i = s - (x_j/g),\ c_j = x_i/g,\ c_m = 0 \text{ otherwise, where } g = \gcd(x_i, x_j)\}$.
Then $C = (B_0)^0$ is generated by $C_1 \cup C_2 \cup C_3$.

Consider the principal block $B_0 = \langle 120 \rangle$ of a $3^3$ factorial design. Then $C_1 = \{001\}$, $C_2 = \emptyset$ and $C_3 = \{110\}$. Thus $C = \{000, 001, 002, 110, 220, 111, 112, 221, 222\}$. Hence we see that we have confounded two degrees of freedom of the main effect of the third factor, two degrees of freedom of the two-factor interaction between factors 1 and 2, and four degrees of freedom of the three-factor interaction. This is the same as we found in terms of the partitions of the previous subsection.

The next result extends Theorem 1 to symmetrical $g$-generator generalised cyclic designs.

THEOREM 2 (Voss and Dean, 1988). Let $B_0$ be the principal block of a $g$-generator $s^k$ factorial design with generators $x_1, \ldots, x_g$. Let $\langle x_i \rangle$ be the block generated by $x_i$, $i = 1, \ldots, g$. Let $D_i = \langle x_i \rangle^0$. Then $C = (B_0)^0 = D_1 \cap \cdots \cap D_g$.

Returning to the $3^3$ design with $B_0 = \langle 012, 102 \rangle$, let $x_1 = 012$ and $x_2 = 102$. Then $D_1 = \langle 100, 011 \rangle = \{000, 100, 200, 011, 022, 111, 122, 211, 222\}$ and $D_2 = \langle 010, 101 \rangle = \{000, 010, 020, 101, 202, 111, 212, 121, 222\}$. Thus $C = \{000, 111, 222\}$ and this is consistent with the previous discussion of this example.

The final result in this subsection extends the preceding theorem to $g$-generator generalised cyclic designs. Let $B_0$ be the principal block of a $g$-generator $s_1^{n_1} \times s_2^{n_2} \times \cdots \times s_k^{n_k}$ factorial design with generators $x_1, \ldots, x_g$. Let $x_{i1}$ be the first $n_1$ positions of $x_i$, let $x_{i2}$ be the next $n_2$ positions of $x_i$ and so on. Then $B_0$ is also generated by $gk$ generators $(x_{i1}, 0, \ldots, 0)$, $(0, x_{i2}, 0, \ldots, 0)$, $\ldots$, $(0, \ldots, 0, x_{ik})$, $1 \le i \le g$, provided that $\gcd(s_i, s_j) = 1$ for $i \ne j$.

THEOREM 3 (Voss and Dean, 1988). Let $B_0$ be the principal block of a $g$-generator $s_1^{n_1} \times s_2^{n_2} \times \cdots \times s_k^{n_k}$ factorial design with $\gcd(s_i, s_j) = 1$ for $i \ne j$ and with generators $x_1, \ldots, x_g$. Let $B_{0i} = \langle x_{1i}, x_{2i}, \ldots, x_{gi} \rangle$ and let $D_i = (B_{0i})^0$, $i = 1, \ldots, k$. Then $C = (B_0)^0 = \{c \mid c = (c_1, \ldots, c_k),\ c_i \in D_i\}$.

Recall the $2^3 \times 3^3$ design with $B_0 = \langle 011120, 110120 \rangle$ discussed above. Since $B_0$ has 12 treatment combinations in it, there are 18 blocks in the design altogether. Hence 17 degrees of freedom are confounded in blocks. Now $B_{01} = \langle 011, 110 \rangle$ and $B_{02} = \langle 120 \rangle$. Hence $D_1 = \{(c_1, c_2, c_3) \mid c_2 + c_3 = 0,\ c_1 + c_2 = 0\}$ (where the additions are done modulo 2, of course). Thus $D_1 = \{000, 111\} = \langle 111 \rangle$. We have seen above that $D_2 = \{000, 001, 002, 110, 220, 111, 112, 221, 222\} = \langle 001, 110 \rangle$. Hence $(B_0)^0 = \langle 111001, 111110 \rangle$.
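The product structure described for the $2^3 \times 3^3$ example can be confirmed numerically. The sketch below computes each annihilator by exhaustive search from the generators alone, which suffices because $[c, x]$ is additive in $x$; it is a check of this one example, not an implementation of the theorems, and the function name `annihilator` is illustrative.

```python
from itertools import product
from math import gcd


def annihilator(generators, levels):
    """All c with [c, x] = 0 (mod a) for each generator x, a = lcm of the levels."""
    a = 1
    for s in levels:
        a = a * s // gcd(a, s)
    return {
        c for c in product(*(range(s) for s in levels))
        if all(sum(ci * xi * (a // s) for ci, xi, s in zip(c, x, levels)) % a == 0
               for x in generators)
    }


# Two-level part: B01 = <011, 110>. Three-level part: B02 = <120>.
d1 = annihilator([(0, 1, 1), (1, 1, 0)], (2, 2, 2))
d2 = annihilator([(1, 2, 0)], (3, 3, 3))

# Whole design: B0 = <011120, 110120>.
c_full = annihilator([(0, 1, 1, 1, 2, 0), (1, 1, 0, 1, 2, 0)], (2, 2, 2, 3, 3, 3))

# Concatenate every element of D1 with every element of D2.
c_product = {u + v for u in d1 for v in d2}

print(sorted(d1))
print(len(d2), len(c_full), c_full == c_product)
```

The 18 elements of the full annihilator correspond to the identity plus the 17 degrees of freedom confounded with blocks.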


It is possible to use the results above to construct designs with $\gcd(s_i, s_j) \ne 1$ by replacing factors with pseudo-factors with co-prime levels. Pseudo-factors are introduced for convenience in constructing the design and do not correspond to actual factors in the experiment. For instance, consider a $3 \times 6^2$ design. Replace each of the two factors with 6 levels by two factors, one with 2 levels and one with 3 levels. We then have a $2^2 \times 3^3$ design and we look for a suitable design within this framework. We will order the pseudo-factors so that the first and third pseudo-factors correspond to the first factor, $X$, with 6 levels, the second and fourth pseudo-factors correspond to the second factor, $Y$, with 6 levels and the fifth pseudo-factor corresponds to the third factor, $Z$, with 3 levels. Suppose we want blocks of size 18. Then we need 6 blocks so $C$ has 6 elements in it. We do not want to confound main effects so we do not want $C$ to contain any elements of the form $(a, 0, b, 0, 0)$ (part of the main effect of $X$), $(0, a, 0, b, 0)$ (part of the main effect of $Y$) or $(0, 0, 0, 0, a)$ (part of the main effect of $Z$). Consider $C = \langle 11000, 00111 \rangle = \{00000, 11000, 00111, 00222, 11111, 11222\}$. Thus we have confounded, in order, 1 degree of freedom from the $XY$ interaction and 4 degrees of freedom from the $XYZ$ interaction. $B_0 = \{(x_1, x_2, x_3, x_4, x_5) \mid x_1 + x_2 = 0,\ x_3 + x_4 + x_5 = 0\} = \langle 11000, 00120, 00111 \rangle$. Generators for some generalised cyclic designs may be found in John and Dean (1975) and Dean and John (1975).
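The pseudo-factor construction for the $3 \times 6^2$ example can be laid out explicitly. In the sketch below the mapping `to_real`, which codes a (2-level, 3-level) pair of pseudo-factors as the level $3u + v$ of a 6-level factor, is one convenient assumption and is not prescribed by the text; the other names are likewise illustrative.

```python
from itertools import product

# Pseudo-factors 1 and 3 make up X, 2 and 4 make up Y, and 5 is Z.
PSEUDO_LEVELS = (2, 2, 3, 3, 3)

# Blocks are indexed by the values of x1 + x2 (mod 2) and x3 + x4 + x5 (mod 3).
blocks = {}
for x in product(*(range(s) for s in PSEUDO_LEVELS)):
    key = ((x[0] + x[1]) % 2, (x[2] + x[3] + x[4]) % 3)
    blocks.setdefault(key, []).append(x)
principal_block = blocks[(0, 0)]


def to_real(x):
    """Levels of X and Y (0..5) and Z (0..2); the 3*u + v coding is an assumption."""
    return (3 * x[0] + x[2], 3 * x[1] + x[3], x[4])


def factors_involved(c):
    """Real factors touched by a contrast written in pseudo-factor form."""
    name = ""
    if c[0] or c[2]:
        name += "X"
    if c[1] or c[3]:
        name += "Y"
    if c[4]:
        name += "Z"
    return name


# C = <11000, 00111>: elements (a, a, b, b, b) with a modulo 2 and b modulo 3.
C = [(a, a, b, b, b) for a in range(2) for b in range(3)]
confounded = {c: factors_involved(c) for c in C if any(c)}

print(len(blocks), len(principal_block))
print(sorted(to_real(x) for x in principal_block))
print(confounded)
```

None of the confounded contrasts involves a single real factor, so no main effect of $X$, $Y$ or $Z$ is lost to blocks.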

6.3. Comparing factorial designs

We now define the concepts of resolution (introduced by Box and J. S. Hunter (1961)) and minimum aberration (introduced by Fries and W. G. Hunter (1980)), both useful when comparing two, or more, factorial designs of the same size. A fractional factorial design is said to be of resolution $R$ if no $p$-factor interaction is aliased with another effect containing fewer than $R - p$ factors. We let $R_s(k, m)$ denote the maximum possible resolution of an $s^{k-m}$ factorial design. Thus we see that the higher the resolution the better the design, since it will be possible to estimate more of the interaction effects. For example, consider a $2^{3-1}$ design with treatment combinations $\{000, 011, 101, 110\}$. The defining contrasts subgroup for this design is $C = \{000, 111\}$ and so we have that $P(100)$ and $P(011)$ give equal partitions, as do $P(010)$ and $P(101)$, and $P(001)$ and $P(110)$. Hence we see that main effects and two-factor interactions are aliased and so the design is of resolution III. Now consider a $2^{3-1}$ design with treatment combinations $\{000, 001, 110, 111\}$. Here the defining contrasts subgroup is $C = \{000, 110\}$ and so the partitions $P(100)$ and $P(010)$ are equal. Hence the main effects of the first two factors are aliased and so the design is not of resolution III but of resolution II.

Consider the defining contrasts subgroup $C$. The word length of any effect in this subgroup is the number of non-zero entries that it has. If $a_i$ is the number of vectors with word length $i$, $1 \le i \le k$, in the defining contrasts subgroup then the word length pattern of the design is given by $w = (a_1, \ldots, a_k)$. Hence we see that the resolution of the design is the smallest $i$ such that $a_i$ is positive. If two designs, $D_1$ and $D_2$, are both resolution $R$ then we say that $D_1$ has less aberration than $D_2$ if the value of $a_R$ for $D_1$ is less than the value of $a_R$ for $D_2$. An $s^{k-m}$ design has minimum aberration if no other $s^{k-m}$ design has less aberration. For the $2^{3-1}$ designs discussed above we have that the word length patterns are $(0, 0, 1)$ and $(0, 1, 0)$ respectively, confirming the resolutions that we calculated above. The concept of aberration was introduced by Fries and Hunter (1980) in an attempt to distinguish between designs of the same resolution. The general idea is that resolution tells you only that at least one interaction of the appropriate order has been confounded; aberration gives an indication of how many effects of the appropriate order are confounded. Fries and Hunter (1980) give some constructions for minimum aberration two-level designs and Franklin (1984) improves on them, and includes some small tables.
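For two-level fractions the word length pattern and the resolution can be read off mechanically from the runs. The following is a minimal sketch with illustrative function names; it assumes the fraction is a principal fraction, that is, it contains the run of all zeros and is closed under addition modulo 2.

```python
from itertools import product


def defining_contrasts(fraction, k):
    """Vectors c with sum(c_i * x_i) = 0 (mod 2) for every run x of the fraction."""
    return [
        c for c in product(range(2), repeat=k)
        if all(sum(ci * xi for ci, xi in zip(c, x)) % 2 == 0 for x in fraction)
    ]


def word_length_pattern(subgroup, k):
    """(a_1, ..., a_k), where a_i counts the vectors with i non-zero entries."""
    pattern = [0] * k
    for c in subgroup:
        length = sum(1 for v in c if v)
        if length:
            pattern[length - 1] += 1
    return tuple(pattern)


def resolution(pattern):
    """Smallest i with a_i positive."""
    return next(i for i, a in enumerate(pattern, start=1) if a > 0)


design_1 = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)]
design_2 = [(0, 0, 0), (0, 0, 1), (1, 1, 0), (1, 1, 1)]

for design in (design_1, design_2):
    subgroup = defining_contrasts(design, 3)
    pattern = word_length_pattern(subgroup, 3)
    print(subgroup, pattern, resolution(pattern))
```

Comparing the value of $a_R$ across two patterns with the same resolution is then a direct comparison of the corresponding tuple entries.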

6.4. Tables and algorithms

Published tables are limited in extent. McLean and Anderson (1984) give tables of factorial designs in incomplete blocks, and the associated confounding schemes, for $2^k$, $3^k$ and $2^k 3^p$ designs for up to 10 factors. Similar tables appear in Colbourn and Dinitz (1996), although the minimum aberration designs are given where possible. Montgomery (1991) gives alias relationships, including some blocking options, for two-level designs with up to 11 factors and at most 64 units. Bisgaard (1994) gives blocked fractions for two-level designs with up to 15 factors. Chen et al. (1993) give some two- and three-level fractional factorials with small numbers of runs. Some of the designs advocated by Taguchi are related to conventional two- and three-level factorial designs and hence some of the tables in Taguchi (1987) may be useful. See Box et al. (1988) for a discussion of the relationship between the 'conventional' factorial designs and the arrays, like $L_8$, advocated by Taguchi. Generators for some generalised cyclic designs may be found in John and Dean (1975) and Dean and John (1975).

A number of algorithms are available. Turiel (1988) gives one to determine defining contrasts and treatment combinations for two-level fractional factorial designs of small resolution. Mount-Campbell and Neuhardt (1981) give one for enumerating the fractional two-level factorials of resolution III. Cook and Nachtsheim (1989) give one to assist in blocking factorial designs with pre-specified block sizes. Franklin (1984) gives an algorithm to construct minimum aberration designs. The DSIGN method, due to Patterson (1965), is a systematic approach to factorial design construction and is described in elementary terms in Street and Street (1987). For more details see Patterson and Bailey (1978).

7. Response surface designs

A response surface model is a model fitted to a response $y$ as a function of predictors $x_1, x_2, \ldots, x_k$ in an attempt to get a 'mathematical French curve' (Box and Draper, 1987) that will summarise the data. Thus a response surface design is a selection of points in $x$-space that permits the fitting of a response surface to the corresponding observations $y$. The $x$-points need to be chosen so that all the parameters in the model can be estimated and the significance, or otherwise, of each one established. Response surface designs have been used by various workers in the agricultural area including Huett and Dettmann (1992), Thayer and Boyd (1991) and Pesti (1991). Much of the current interest in response surface methodology (RSM) in the statistical literature has focussed on the industrial applications to process improvement, and comparing the benefits of the RSM approach to the approach advocated by Taguchi; Lucas (1994) is one such example. In this section we will describe some of the designs commonly used in RSM and consider the attributes of 'good' response surface designs. For suggestions on analysis, graphical representation of the surface and so on see, e.g., Box and Draper (1987). We begin by describing the standard response surface model.

7.1. Polynomial approximations to the response function

In this setting we assume that we can approximate the response, $y$, as a polynomial function of $k$ predictors $x_1, \ldots, x_k$, where the $x_i$ may be coded values of the original predictors. Typically we code the predictors so that the region of interest is centered at the origin and has a range $\pm 1$ on each axis, but this is not mandatory. In any case the range of possible values for the $x_i$'s, considered as a $k$-tuple, forms the feasible region for the experiment. If there are $k$ variables then a first order polynomial model is one where

$$E(y) = \beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k = \beta_0 + \sum_i \beta_i x_i.$$

The $\beta_i$ are called the linear coefficients in the model. A second order model is one where

$$E(y) = \beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k + \beta_{11} x_1^2 + \cdots + \beta_{kk} x_k^2 + \beta_{12} x_1 x_2 + \cdots + \beta_{k-1,k} x_{k-1} x_k$$
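The second order model can be made concrete by building its model matrix for a given set of design points. The following is a minimal sketch, assuming NumPy is available; the function name `second_order_model_matrix` and the two small factorial designs are illustrative choices, not part of the original text. It shows that a two-level factorial cannot estimate all second order parameters, while a three-level factorial can.

```python
import itertools
import numpy as np

def second_order_model_matrix(points):
    """Columns: 1, x_i, x_i^2, then x_i * x_j for i < j (hypothetical helper)."""
    x = np.asarray(points, dtype=float)
    n, k = x.shape
    columns = [np.ones(n)]                                   # intercept
    columns += [x[:, i] for i in range(k)]                   # linear terms
    columns += [x[:, i] ** 2 for i in range(k)]              # pure quadratic terms
    columns += [x[:, i] * x[:, j]
                for i, j in itertools.combinations(range(k), 2)]  # cross products
    return np.column_stack(columns)

# Coded design points in k = 2 predictors (illustrative designs).
two_level = np.array(list(itertools.product([-1.0, 1.0], repeat=2)))
three_level = np.array(list(itertools.product([-1.0, 0.0, 1.0], repeat=2)))

# The model has 6 parameters when k = 2; all are estimable only if the rank is 6.
rank_two_level = np.linalg.matrix_rank(second_order_model_matrix(two_level))
rank_three_level = np.linalg.matrix_rank(second_order_model_matrix(three_level))
print(rank_two_level, rank_three_level)
```

With only the levels $\pm 1$, the squared columns coincide with the intercept column, which is why the two-level design loses rank.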

... ($\ge 2$) sets of $b_1, b_2, \ldots, b_a$ ($\ge 1$) blocks such that the set consisting of $b_h$ blocks contains every treatment $\mu_h$ ($\ge 1$) times for $h = 1, 2, \ldots, a$, i.e., the set of $b_h$ blocks forms a $\mu_h$-replication set of each treatment (called a resolution set). Here $r = \sum_{h=1}^{a} \mu_h$. Furthermore, when $\mu_1 = \mu_2 = \cdots = \mu_a$ ($= \mu$, say), it is simply called $\mu$-resolvable for $\mu \ge 1$.

Note that Definition 4.7 corresponds to that of $\mu$-resolvability introduced by Shrikhande and Raghavarao (1964). A 1-resolvable block design is simply called resolvable in the sense of Bose (1942). One of the earliest examples of a resolvable BIB design is related to the Kirkman (1850a, b) schoolgirl problem, which attracted many mathematicians in the late 19th and early 20th centuries. A good bibliography can be found in Eckenstein (1912). However, no complete solution was known until Ray-Chaudhuri and Wilson (1971) solved the problem. For practical applications, refer to John (1961) and Kageyama (1976b). These papers indicate the importance of $(\mu_1, \mu_2, \ldots, \mu_a)$-resolvable block designs, with possibly varying block sizes and having $\mu_1, \mu_2, \ldots, \mu_a$ not necessarily all equal. It can easily be seen that any $(\mu_1, \mu_2, \ldots, \mu_a)$-resolvable block design is a particular NB design. This can be formalized as follows.

DEFINITION 4.8. An NB design is said to be $(\mu_1, \mu_2, \ldots, \mu_a)$-resolvable if its $r_h = \mu_h \mathbf{1}_v$ for $h = 1, 2, \ldots, a$.

Since the class of NB designs includes all block designs satisfying Definition 4.7, a resolution set being a superblock, a $(\mu_1, \mu_2, \ldots, \mu_a)$-resolvable NB design can be called simply a $(\mu_1, \mu_2, \ldots, \mu_a)$-resolvable block design. The next definition concerns only those $(\mu_1, \mu_2, \ldots, \mu_a)$-resolvable block designs which have a constant block size within each resolution set. The constant block size within the $h$th set (superblock) is denoted by $k_h^*$ for $h = 1, 2, \ldots, a$ (cf. Mukerjee and Kageyama, 1985).

DEFINITION 4.9. A $(\mu_1, \mu_2, \ldots, \mu_a)$-resolvable block design with a constant block size in each resolution set (superblock) is said to be affine $(\mu_1, \mu_2, \ldots, \mu_a)$-resolvable if: (i) for $h = 1, 2, \ldots, a$, every two distinct blocks from the $h$th set intersect at the same number, say $q_{hh}$, of treatments; (ii) for $h \ne h' = 1, 2, \ldots, a$, every block from the $h$th set intersects every block of the $h'$th set at the same number, say $q_{hh'}$, of treatments.

It is evident that for an affine $(\mu_1, \mu_2, \ldots, \mu_a)$-resolvable block design the equalities

$$q_{hh}(b_h - 1) = k_h^*(\mu_h - 1), \qquad q_{hh'} b_{h'} = k_h^* \mu_{h'} \qquad (h \ne h' = 1, 2, \ldots, a)$$

hold.
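Definition 4.9 can be checked mechanically for a given design by tabulating block intersection sizes. The sketch below is only an illustration under stated assumptions: the helper names `intersection_numbers` and `is_affine` are hypothetical, resolution sets are indexed from 0, and the blocks are those of the (2, 2, 1, 1)-resolvable design listed in Example 4.1 later in this section.

```python
from itertools import combinations, product

def intersection_numbers(resolution_sets):
    """Map (h, h') with h <= h' to the set of observed block intersection sizes."""
    sizes = {}
    a = len(resolution_sets)
    for h in range(a):
        within = combinations(resolution_sets[h], 2)
        sizes[(h, h)] = {len(set(x) & set(y)) for x, y in within}
        for g in range(h + 1, a):
            between = product(resolution_sets[h], resolution_sets[g])
            sizes[(h, g)] = {len(set(x) & set(y)) for x, y in between}
    return sizes

def is_affine(resolution_sets):
    """Constant block size within each set and a single intersection size per pair of sets."""
    constant_sizes = all(len({len(b) for b in s}) == 1 for s in resolution_sets)
    q = intersection_numbers(resolution_sets)
    return constant_sizes and all(len(values) <= 1 for values in q.values())

# Blocks of Example 4.1 (v = 9, four resolution sets).
example_4_1 = [
    [(4, 5, 6, 7, 8, 9), (1, 2, 3, 7, 8, 9), (1, 2, 3, 4, 5, 6)],
    [(2, 3, 5, 6, 8, 9), (1, 3, 4, 6, 7, 9), (1, 2, 4, 5, 7, 8)],
    [(1, 6, 8), (2, 4, 9), (3, 5, 7)],
    [(1, 5, 9), (2, 6, 7), (3, 4, 8)],
]
q = intersection_numbers(example_4_1)
affine = is_affine(example_4_1)
```

For this design the intersection numbers agree with the two equalities above, for instance $q_{11}(b_1 - 1) = 3 \cdot 2 = k_1^*(\mu_1 - 1) = 6 \cdot 1$.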


T. Caliński and S. Kageyama

Some properties of NB designs discussed earlier will now be specified for the resolvable block designs.

THEOREM 4.3. A $(\mu_1, \mu_2, \ldots, \mu_a)$-resolvable block design is superblock orthogonal.

PROOF. Note that $R = (\mu_1 \mathbf{1} : \mu_2 \mathbf{1} : \cdots : \mu_a \mathbf{1})$, $k_0 = v(\mu_1, \mu_2, \ldots, \mu_a)'$, $r = r\mathbf{1}$ with $r = \sum_{h=1}^{a} \mu_h$. Then $C_0 = r(I - v^{-1}\mathbf{1}\mathbf{1}')$ which, on account of Theorem 2.1 and Remark 2.2, completes the proof. □

Theorem 4.3 can yield, on account of Definition 4.4, the following.

COROLLARY 4.1. If a $(\mu_1, \mu_2, \ldots, \mu_a)$-resolvable block design is sub-block VB, then it is VB.

Since a $(\mu_1, \mu_2, \ldots, \mu_a)$-resolvable block design is equireplicate, the two notions of balance (VB and EB) are identical (cf. Williams, 1975). This leads to the following.

COROLLARY 4.2. If a $(\mu_1, \mu_2, \ldots, \mu_a)$-resolvable block design is sub-block VB, then it is EB.

An extension of Fisher's inequality (Corollary 1.2) for a $(\mu_1, \mu_2, \ldots, \mu_a)$-resolvable VB design can be given as follows.

THEOREM 4.4. For a $(\mu_1, \mu_2, \ldots, \mu_a)$-resolvable VB block design, the inequality $b \ge v + a - 1$ holds.

For a proof see Kageyama (1993, Theorem 3.1). As special cases, Theorem 4.4 yields some results available in Hughes and Piper (1976), Kageyama (1973, 1976b, 1978, 1984), Raghavarao (1962, 1971) as follows.

COROLLARY 4.3. The inequality $b \ge v + a - 1$ holds for each class of the following block designs: (i) $\mu$-resolvable BIB design ($\mu_1 = \mu_2 = \cdots = \mu_a = \mu$); (ii) $\mu$-resolvable VB block design ($\mu_1 = \mu_2 = \cdots = \mu_a = \mu$); (iii) $(\mu_1, \mu_2, \ldots, \mu_a)$-resolvable BIB designs. In particular, a $(\mu_1, \mu_2, \ldots, \mu_a)$-resolvable BIB design with $b = v + a - 1$ is affine $\mu$-resolvable with $\mu_1 = \mu_2 = \cdots = \mu_a = \mu$.

When $b = v + a - 1$, one can have the following.

THEOREM 4.5. In a $(\mu_1, \mu_2, \ldots, \mu_a)$-resolvable VB block design with $b = v + a - 1$, except when $\mu_1 = \mu_2 = \cdots = \mu_a = 1$, block sizes of blocks belonging to the same resolution set (superblock) are equal.

THEOREM 4.6. A $\mu$-resolvable VB block design satisfying $b = v + a - 1$ for $\mu \ge 2$ is an affine $\mu$-resolvable BIB design.

For proofs of these theorems see Kageyama (1993, Theorems 3.2 and 3.3).

Block designs: Their combinatorial and statistical properties


Theorem 4.6 is an interesting result similar to the theorem due to Rao (1966) that an equireplicate binary VB block design with $b = v$ is a symmetric BIB design. It is remarkable that Theorem 4.6 shows that there does not exist a $\mu$-resolvable VB block design with unequal block sizes satisfying $b = v + a - 1$ for a positive integer $\mu \ge 2$; the result gives a complete solution to the open problem proposed by Kageyama (1974, p. 610). Note that there exists a 1-resolvable VB block design with unequal block sizes satisfying $b = v + a - 1$. By a direct calculation due to Shrikhande and Raghavarao (1964), the following can be shown.

THEOREM 4.7. For a $\mu$-resolvable incomplete block design involving $b$ blocks in $a$ resolution sets (superblocks) and $v$ treatments with a constant block size, any two of the following imply the third: (a) affine $\mu$-resolvability, (b) VB, (c) $b = v + a - 1$.

THEOREM 4.8. A $(\mu_1, \mu_2, \ldots, \mu_a)$-resolvable VB block design with $b = v + a - 1$ and a constant block size within the $h$th resolution set (superblock), $k_h^*$ ($h = 1, 2, \ldots, a$), must be affine $(\mu_1, \mu_2, \ldots, \mu_a)$-resolvable with

$$q_{hh} = (k_h^{*2}/v)\,[1 - (b - r)/\{\mu_h(v - 1)\}]$$

provided that $b_h \ge 2$, and

$$q_{hh'} = k_h^* k_{h'}^*/v \qquad (h \ne h' = 1, 2, \ldots, a).$$

THEOREM 4.9. An incomplete block affine $(\mu_1, \mu_2, \ldots, \mu_a)$-resolvable VB block design must have $b = v + a - 1$.

Theorems 4.8 and 4.9 extend respectively the two implications '(b), (c) imply (a)' and '(a), (b) imply (c)' contained in Theorem 4.7. Thus, Theorem 4.7 can be partially extended to $(\mu_1, \mu_2, \ldots, \mu_a)$-resolvable block designs. The result '(a), (c) imply (b)' of Theorem 4.7 cannot, however, be extended in general. That is, an incomplete block affine $(\mu_1, \mu_2, \ldots, \mu_a)$-resolvable block design with $b = v + a - 1$ is not necessarily VB. This point is illustrated by the following example.

EXAMPLE 4.1. Consider an affine (2, 2, 1, 1)-resolvable incomplete block design with parameters $v = 9$, $b = 12$, $r = 6$, $k_1^* = k_2^* = 6$, $k_3^* = k_4^* = 3$, $a = 4$, given by the following blocks [(4, 5, 6, 7, 8, 9)(1, 2, 3, 7, 8, 9)(1, 2, 3, 4, 5, 6)][(2, 3, 5, 6, 8, 9)(1, 3, 4, 6, 7, 9)(1, 2, 4, 5, 7, 8)][(1, 6, 8)(2, 4, 9)(3, 5, 7)][(1, 5, 9)(2, 6, 7)(3, 4, 8)]. Clearly, here $b = v + a - 1$, but the design is not VB, as can be checked easily.

In view of the above example, it would be interesting to determine necessary and sufficient conditions under which an incomplete block $(\mu_1, \mu_2, \ldots, \mu_a)$-resolvable


block design with $b = v + a - 1$ becomes VB. One such condition is given by the following result.

THEOREM 4.10. An incomplete block affine $(\mu_1, \mu_2, \ldots, \mu_a)$-resolvable design satisfying $b = v + a - 1$ is VB if and only if $(\mu_h - 1)/(b_h - 1) = (r - a)/(v - 1)$, $h = 1, 2, \ldots, a$.

EXAMPLE 4.2. There exists an affine 1-resolvable VB design with unequal block sizes and $b = v + a - 1$, whose blocks are as given by [(1, 5)(2, 6)(3, 7)(4, 8)][(1, 2, 3, 4)(5, 6, 7, 8)][(1, 3, 6, 8)(2, 4, 5, 7)][(1, 2, 7, 8)(3, 4, 5, 6)][(1, 4, 6, 7)(2, 3, 5, 8)], in which $C = 4(I_8 - \tfrac{1}{8}\mathbf{1}_8\mathbf{1}_8')$.
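Both examples can be verified numerically. The sketch below assumes NumPy, takes $C = r^{\delta} - N k^{-\delta} N'$ for a binary design, and treats a design as VB when $C$ is a multiple of $I_v - v^{-1}\mathbf{1}_v\mathbf{1}_v'$; the function names are hypothetical. It also evaluates the condition of Theorem 4.10, which presumes $b_h \ge 2$ for every resolution set.

```python
from fractions import Fraction
import numpy as np

def c_matrix(blocks, v):
    """C = r^delta - N k^{-delta} N' for a binary design on treatments 1..v."""
    N = np.zeros((v, len(blocks)))
    for j, block in enumerate(blocks):
        for t in block:
            N[t - 1, j] = 1.0
    r = N.sum(axis=1)          # replications
    k = N.sum(axis=0)          # block sizes
    return np.diag(r) - N @ np.diag(1.0 / k) @ N.T

def is_variance_balanced(C, tol=1e-9):
    """True if C = theta * (I - 11'/v) for some scalar theta."""
    v = C.shape[0]
    theta = np.trace(C) / (v - 1)
    return np.allclose(C, theta * (np.eye(v) - np.ones((v, v)) / v), atol=tol)

def theorem_4_10_condition(mu, b_sets, r, a, v):
    """(mu_h - 1)/(b_h - 1) = (r - a)/(v - 1) for all h; requires every b_h >= 2."""
    rhs = Fraction(r - a, v - 1)
    return all(Fraction(m - 1, bh - 1) == rhs for m, bh in zip(mu, b_sets))

example_4_1 = [(4, 5, 6, 7, 8, 9), (1, 2, 3, 7, 8, 9), (1, 2, 3, 4, 5, 6),
               (2, 3, 5, 6, 8, 9), (1, 3, 4, 6, 7, 9), (1, 2, 4, 5, 7, 8),
               (1, 6, 8), (2, 4, 9), (3, 5, 7),
               (1, 5, 9), (2, 6, 7), (3, 4, 8)]
example_4_2 = [(1, 5), (2, 6), (3, 7), (4, 8),
               (1, 2, 3, 4), (5, 6, 7, 8),
               (1, 3, 6, 8), (2, 4, 5, 7),
               (1, 2, 7, 8), (3, 4, 5, 6),
               (1, 4, 6, 7), (2, 3, 5, 8)]

C1 = c_matrix(example_4_1, 9)
C2 = c_matrix(example_4_2, 8)
vb_4_1 = is_variance_balanced(C1)
vb_4_2 = is_variance_balanced(C2)
cond_4_1 = theorem_4_10_condition([2, 2, 1, 1], [3, 3, 3, 3], r=6, a=4, v=9)
cond_4_2 = theorem_4_10_condition([1, 1, 1, 1, 1], [4, 2, 2, 2, 2], r=5, a=5, v=8)
```

In Example 4.1 the off-diagonal elements of $C$ take two different values, so the design fails the VB check, in agreement with the failure of the Theorem 4.10 condition.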

The literature of block designs contains many articles exclusively related to VB designs with resolvability. As a characterization of the saturated case of Theorem 4.4, it can be shown that in a $(\mu_1, \mu_2, \ldots, \mu_a)$-resolvable VB design with $b = v + a - 1$, except for the case $\mu_1 = \mu_2 = \cdots = \mu_a = 1$, sizes of blocks belonging to the same resolution set (superblock) are always equal (Theorem 4.5). Whether the above holds for the case $\mu_1 = \mu_2 = \cdots = \mu_a = 1$ as well is an open problem. In the light of Corollaries 4.1 and 4.2, the $\mu$-resolvable block designs play an important role. These designs may be treated as the starting designs for some constructions of designs with desirable properties.

In general, given a set of parameters, the construction of resolvable block designs with unequal block sizes, having some balancing property, is not so simple. There is not much in the literature devoted to designs with unequal block sizes. A paper on such designs by Ceranka et al. (1986) presented four different techniques for constructing $\mu$-resolvable C-designs. The four construction techniques are based on dualization; merging of treatments and dualization; complementation; and juxtaposition. For a class of block designs with a constant block size, Shrikhande (1976) gave an excellent survey of known combinatorial results on affine resolvable BIB designs. For resolvable t-designs, the reader is referred to, for example, Sprott (1955), Hedayat and Kageyama (1980), Kageyama (1976a), Kageyama and Hedayat (1983), Kimberley (1971), Lindner and Rosa (1980, Chapter 4), and Mavron (1972). For the combinatorial discussions on $(\mu_1, \mu_2, \ldots, \mu_a)$-resolvability, the reader is also referred to Bose (1942), Ceranka et al. (1986), Hughes and Piper (1976), Kageyama (1973, 1977, 1984), Kageyama and Sastry (1993), Mukerjee and Kageyama (1985), Raghavarao (1971), Ray-Chaudhuri and Wilson (1971), Shrikhande (1953, 1976), and Shrikhande and Raghavarao (1964).

4.3. α-designs

Resolvable incomplete block designs are extensively used in statutory trials of agricultural crop varieties in the United Kingdom (cf. Patterson and Silvey, 1980) and elsewhere. Numbers of varieties in these trials are fixed, i.e., not at the choice of the statistician, and large enough to require the use of incomplete block designs. Numbers of replications are usually fixed and must be the same for all treatments (varieties).


Thus, preference is given to resolvable block designs. For example, some important disease measurements are expensive and have, therefore, to be restricted to one or two replications. Again, large trials cannot always be completely drilled or harvested in a single session. Use of resolvable designs allows these operations to be done in stages, with one or more complete replications dealt with at each stage.

Yates (1939, 1940) has pointed to other advantages of resolvable designs. On page 325 of his 1940 paper he noted that "cases will arise in which the use of ordinary randomized blocks will be more efficient than the use of incomplete blocks, whereas lattice designs can never be less efficient than ordinary randomized blocks." This advantage of lattice designs is shared by all other resolvable incomplete block designs. Yates (1940) further stated that "incomplete block designs which cannot be arranged in complete replications are likely to be of less value in agriculture than ordinary lattice designs. Their greatest use is likely to be found in dealing with experimental material in which the block size is definitely determined by the nature of the material."

In variety trials, of course, a wide choice of block size is open to the experimenter. A general algorithm for constructing designs for this purpose has been described by Patterson and Williams (1976a). The algorithm is able to produce large numbers of designs but only the most efficient are adopted for practical application. In these designs the number of varieties is a multiple of the block size. These designs are called α-designs. They include as special cases some lattice and resolvable cyclic designs. Their method has been developed to provide a simple computer algorithm for automatic production of plans for variety trials. (See also Patterson et al., 1978.)

When seed is in short supply or some other economy is enforced, the trials sometimes have to be conducted in only two replications.
It is important in these circumstances to use the most efficient possible designs. Bose and Nair (1962) have described and tabulated a series of two-replicate resolvable designs. Another series of designs is given by the duals of the resolvable paired-comparison designs of Williams (1976). Both series involve the use of symmetric block designs. Patterson and Williams (1976b) have shown that every two-replicate resolvable design is uniquely associated with a symmetric incomplete block design. Williams et al. (1976) used this relationship to examine the efficiency of the designs given by Bose and Nair (1962) and Williams (1976). Furthermore, John and Mitchell (1977) gave a listing of optimal binary incomplete block designs obtained from an exhaustive computer search of possible designs. Using and extending their results on symmetric designs, Williams et al. (1977) obtained a series of optimal two-replicate resolvable designs and the identification of these designs was considered. Also, on the basis of a comparison of these designs with those of Bose and Nair (1962) and Williams (1976) recommendations were made on the choice of efficient designs. Bose and Nair's (1962) designs very usefully augment the simple square and rectangular lattices but there appears to have been no parallel development for higher order lattices. David's (1967) construction for cyclic designs is capable of producing a large number of resolvable designs but again there are restrictions; this time block size k must equal either r, the number of replications, or a multiple of r. REMARK 4.1. (i) Some other block designs are also resolvable. Clatworthy (1973) provided information on the resolvability, or otherwise, of most of the PBIB designs


with two associate classes in his extensive tables. The resolvable designs are, however, usually either square lattices or less efficient alternatives. (ii) The α-designs, introduced by Patterson and Williams (1976a), are useful to construct NB designs. Moreover, only some of them are C-designs.

5. Analysis of experiments in block designs under the randomization model

One of the problems of interest related to experiments in block designs is the utilization of between-block information for estimating treatment parametric contrasts. This problem, known in the literature after Yates (1939, 1940) as the recovery of inter-block information, is connected with the principle of randomizing the experimental material before subjecting it to experimental treatments. Therefore, methods dealing with the recovery of inter-block information have to be based on a properly derived randomization model. The purpose of the present section is to reconsider one such method and to re-examine the principles underlying the recovery of inter-block information in the case of a general block design.

5.1. The randomization model and its statistical implications

According to one of the basic principles of experimental design (cf. Fisher, 1925, Section 48), the units are to be randomized before they enter the experiment. Suppose that the randomization is performed by randomly permuting labels of the available blocks, say $N_B$ in number, and by randomly permuting labels of the available units in each given block, all the $1 + N_B$ permutations being carried out independently (cf. Nelder, 1954; White, 1975). Then, assuming the usual unit-treatment additivity (cf. Nelder, 1965b, p. 168; White, 1975, p. 560; Bailey, 1981, p. 215; Kala, 1991, p. 7), and also assuming, as usual, that the technical errors are uncorrelated, with zero expectation and a constant variance, and independent of the unit responses to treatments (cf. Neyman, 1935, pp. 110-114 and 145; Kempthorne, 1952, p. 132 and Section 8.4; Ogawa, 1963, p. 1559), the model of the variables observed on the $n$ units actually used in the experiment can be written in matrix notation, with the matrices $\Delta$ and $D$ defined in Section 1.2, as

$$y = \Delta'\tau + D'\beta + \eta + e, \tag{5.1}$$

where $y$ is an $n \times 1$ vector of observed variables $\{y_{\ell(j)}(i)\}$, $\tau$ is a $v \times 1$ vector of treatment parameters $\{\tau_i\}$, $\beta$ is a $b \times 1$ vector of block random effects $\{\beta_j\}$, $\eta$ is an $n \times 1$ vector of unit errors $\{\eta_{\ell(j)}\}$ and $e$ is an $n \times 1$ vector of technical errors $\{e_{\ell(j)}\}$, $\ell(j)$ denoting the unit $\ell$ in block $j$ ($i = 1, 2, \ldots, v$; $\ell = 1, 2, \ldots, k_j$; $j = 1, 2, \ldots, b$). Properties of the model (5.1) have been studied for a general block design by Caliński and Kageyama (1988, 1991). These properties are different from those of the usually assumed model (see, e.g., John, 1987, Section 1.2). It has been found that the expectation vector and the dispersion matrix for $y$ are

$$E(y) = \Delta'\tau \tag{5.2}$$

and

$$\operatorname{Var}(y) = (D'D - N_B^{-1}\mathbf{1}_n\mathbf{1}_n')\,\sigma_B^2 + (I_n - K_H^{-1}D'D)\,\sigma_U^2 + I_n\,\sigma_e^2, \tag{5.3}$$

respectively, where $K_H$ is a weighted harmonic average of the available numbers of units within the $N_B$ available blocks, from which units in numbers $\{k_j\}$ have been chosen for the experiment after the randomization, and where the variance components $\sigma_B^2$, $\sigma_U^2$ and $\sigma_e^2$ are related to the random vectors $\beta$, $\eta$ and $e$, respectively.

To see this more precisely, suppose that the available $N_B$ blocks are originally labelled $\xi = 1, 2, \ldots, N_B$, and that block $\xi$ contains $K_\xi$ units (plots), which are originally labelled $\pi = 1, 2, \ldots, K_\xi$. The label may also be written as $\pi(\xi)$ to denote that unit $\pi$ is in block $\xi$. The randomization of blocks can then be understood as choosing at random a permutation of $\{1, 2, \ldots, N_B\}$, and then renumbering the blocks with $j = 1, 2, \ldots, N_B$ accordingly. Similarly, the randomization of experimental units within block $\xi$ can be seen as selecting at random a permutation of $\{1, 2, \ldots, K_\xi\}$, and then renumbering the units of the block with $\ell = 1, 2, \ldots, K_\xi$ accordingly. It will be assumed here that any permutation of block labels can be selected with equal probability, as well as that any permutation of unit labels within a block can be selected with equal probability. Furthermore, it will be assumed that the randomizations of units within the blocks are independent among the blocks, and that they are also independent of the randomization of blocks. The above randomization procedure reflects the practical instruction given by Nelder (1954): "choose a block at random and reorder its members at random ...;" then "repeat the procedure with one of the remaining blocks chosen at random ... and so on". This can be accomplished whether the available blocks are of equal or unequal number of members, i.e., their units.
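The randomization procedure just described can be sketched in a few lines. This is a minimal illustration, assuming Python's standard `random` module; the function name `randomize`, the 0-based labels and the returned data layout are hypothetical conveniences, not notation from the text.

```python
import random

def randomize(block_unit_counts, seed=None):
    """Randomly relabel blocks, then units within each block.

    block_unit_counts[xi] is K_xi, the number of units in the block originally
    labelled xi (0-based). Entry j of the result is a pair
    (original block label, list whose position l holds the original unit label).
    """
    rng = random.Random(seed)
    original_blocks = list(range(len(block_unit_counts)))
    rng.shuffle(original_blocks)            # uniform random permutation of block labels
    plan = []
    for xi in original_blocks:
        units = list(range(block_unit_counts[xi]))
        rng.shuffle(units)                  # independent uniform permutation within the block
        plan.append((xi, units))
    return plan

# Three available blocks with unequal numbers of units (illustrative values).
unit_counts = [3, 4, 2]
plan = randomize(unit_counts, seed=1)
for j, (xi, units) in enumerate(plan):
    print(f"design block {j}: original block {xi}, unit order {units}")
```

Each call performs $1 + N_B$ independent shuffles, one for the blocks and one per block for its units, which works whether or not the blocks have equal numbers of units.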
The purpose of this randomization is not only to "homogenize" the within-block variability, but also to "average out" the possibly heterogeneous variance among the experimental units from different blocks to a common value (cf. White, 1975, Sections 2 and 7).

Now, following Nelder (1965a), assume for a while that all the available units in the $N_B$ available blocks receive the same treatment, no matter which. For this "null" experiment let the response of the unit $\pi(\xi)$ be denoted by $\mu_{\pi(\xi)}$, and let it be denoted by $m_{\ell(j)}$ if by the randomizations the block originally labelled $\xi$ receives in the design label $j$ and the unit originally labelled $\pi$ in this block receives in the design label $\ell$. With this, introducing the identity

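$$\mu_{\pi(\xi)} = \mu_{\cdot(\cdot)} + (\mu_{\cdot(\xi)} - \mu_{\cdot(\cdot)}) + (\mu_{\pi(\xi)} - \mu_{\cdot(\xi)}),$$

where (according to the usual dot notation)

$$\mu_{\cdot(\xi)} = \frac{1}{K_\xi}\sum_{\pi(\xi)=1}^{K_\xi} \mu_{\pi(\xi)} \qquad \text{and} \qquad \mu_{\cdot(\cdot)} = \frac{1}{N_B}\sum_{\xi=1}^{N_B} \mu_{\cdot(\xi)},$$

one can write the linear model

$$m_{\ell(j)} = \mu + \beta_j + \eta_{\ell(j)}, \tag{5.4}$$

for any $\ell$ and $j$, where $\mu = \mu_{\cdot(\cdot)}$ is a constant parameter, the mean, while $\beta_j$ and $\eta_{\ell(j)}$ are random variables, the first representing a block random effect, the second a unit error. The following moments of the random variables in (5.4) are easily obtainable:

$$E(\beta_j) = 0, \qquad E(\eta_{\ell(j)}) = 0, \qquad \operatorname{Cov}(\beta_j, \eta_{\ell(j')}) = 0, \quad \text{whether } j = j' \text{ or } j \ne j',$$

$$\operatorname{Cov}(\beta_j, \beta_{j'}) = \begin{cases} N_B^{-1}(N_B - 1)\sigma_B^2, & \text{if } j = j', \\ -N_B^{-1}\sigma_B^2, & \text{if } j \ne j', \end{cases}$$

and

$$\operatorname{Cov}(\eta_{\ell(j)}, \eta_{\ell'(j')}) = \begin{cases} K_H^{-1}(K_H - 1)\sigma_U^2, & \text{if } j = j' \text{ and } \ell = \ell', \\ -K_H^{-1}\sigma_U^2, & \text{if } j = j' \text{ and } \ell \ne \ell', \\ 0, & \text{if } j \ne j', \end{cases}$$

where

$$\sigma_B^2 = (N_B - 1)^{-1}\sum_{\xi=1}^{N_B} (\mu_{\cdot(\xi)} - \mu_{\cdot(\cdot)})^2$$

and

$$\sigma_U^2 = N_B^{-1}\sum_{\xi=1}^{N_B} \sigma_{U,\xi}^2, \qquad \text{with} \qquad \sigma_{U,\xi}^2 = (K_\xi - 1)^{-1}\sum_{\pi(\xi)=1}^{K_\xi} (\mu_{\pi(\xi)} - \mu_{\cdot(\xi)})^2,$$

and where the weighted harmonic average $K_H$ is defined as

$$K_H^{-1} = N_B^{-1}\sum_{\xi=1}^{N_B} K_\xi^{-1}\,\sigma_{U,\xi}^2/\sigma_U^2.$$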
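The quantities $\sigma_B^2$, $\sigma_U^2$ and $K_H$ can be computed directly from a table of null responses, and the variances of $\beta_j$ and $\eta_{\ell(j)}$ can be checked by averaging over the equally likely labels. The following sketch assumes NumPy; the response values are purely hypothetical numbers chosen for illustration and the function name is not from the text.

```python
import numpy as np

def null_experiment_components(mu):
    """mu[xi] lists the null responses of the K_xi units in block xi."""
    block_means = np.array([np.mean(block) for block in mu])
    grand_mean = block_means.mean()                       # unweighted mean of block means
    n_blocks = len(mu)
    sigma2_B = np.sum((block_means - grand_mean) ** 2) / (n_blocks - 1)
    sigma2_U_xi = np.array([np.var(block, ddof=1) for block in mu])
    sigma2_U = sigma2_U_xi.mean()
    K = np.array([len(block) for block in mu], dtype=float)
    K_H = 1.0 / np.mean(sigma2_U_xi / (K * sigma2_U))     # weighted harmonic average
    return grand_mean, sigma2_B, sigma2_U, K_H

# Hypothetical null responses for N_B = 3 blocks of unequal size.
mu = [[10.0, 12.0, 11.0], [14.0, 15.0, 13.0, 16.0], [9.0, 8.0]]
grand_mean, sigma2_B, sigma2_U, K_H = null_experiment_components(mu)
N_B = len(mu)

# Variances under the randomization: each block, and each unit within the
# selected block, is equally likely to receive a given design label.
var_beta = np.mean([(np.mean(block) - grand_mean) ** 2 for block in mu])
var_eta = np.mean([np.var(block) for block in mu])
```

Under these assumptions `var_beta` equals $N_B^{-1}(N_B - 1)\sigma_B^2$ and `var_eta` equals $K_H^{-1}(K_H - 1)\sigma_U^2$, matching the diagonal cases of the covariance formulae above.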

Further, denoting the technical error affecting the observation of the response on the (randomized) unit $\ell(j)$ by $e_{\ell(j)}$, note that the model of the variable observed on that unit in the null experiment can be extended from (5.4) to

$$y_{\ell(j)} = m_{\ell(j)} + e_{\ell(j)} = \mu + \beta_j + \eta_{\ell(j)} + e_{\ell(j)}, \tag{5.5}$$


for any $j$ and $\ell$. It may usually be assumed that the technical errors $\{e_{\ell(j)}\}$ are uncorrelated, with zero expectation and a constant variance, $\sigma_e^2$ in (5.3), and that they are independent of the block effects $\{\beta_j\}$ and of the unit errors $\{\eta_{\ell(j)}\}$.

It follows from the model (5.5) and its properties that in the null experiment the moments of the random variables $\{y_{\ell(j)}\}$ do not depend on the labels received by the blocks and their units as a result of the randomizations. This means that the randomized units within any of the randomized blocks can be considered as homogeneous in the sense that the variables observed on them have a common mean, variance and covariance. Moreover, this means that the randomized blocks themselves can also be considered as homogeneous, in the sense that the variables observed on equal subsets of units from different blocks have a common mean vector and a common dispersion matrix of relevant order. These homogeneities hold regardless of any heterogeneity of the variance $\sigma_{U,\xi}^2$ for different $\xi$ ($\xi = 1, 2, \ldots, N_B$) (cf. White, 1975, Section 2).

Now, to adapt the results obtained for the null experiment to a real experiment designed according to a chosen incidence matrix $N$, two questions are to be answered: how the randomized units within the randomized blocks are to be assigned to different treatments, and how the model (5.5) is to be adjusted accordingly. As to the first question, the following rule can be used. Assign the block which due to the randomization is labelled $j$ to the $j$th column of $N$. Then assign the units of this block to treatments in numbers indicated by the elements of the $j$th column of $N$ and in order determined by the labels the units have received due to the randomization. Repeat this procedure for $j = 1, 2, \ldots, b$. Note, however, that this rule requires not only that $b \le N_B$, but also that none of the elements of the vector $N'\mathbf{1}_v = k = [k_1, k_2, \ldots, k_b]'$ exceeds the smallest $K_\xi$.
Otherwise, an adjustment of $N$ is to be made, as suggested by White (1975, pp. 558 and 561). For more discussion on this see Caliński and Kageyama (1996, Section 2.2).

With regard to the second question, note that from the assumption of the complete additivity mentioned earlier, which is equivalent to the assumption that the variances and covariances of the random variables $\{\beta_j\}$, $\{\eta_{\ell(j)}\}$ and $\{e_{\ell(j)}\}$ do not depend on the treatment applied, the adjustment of the model (5.5) to a real situation, of comparing $v$ treatments on different units of the same experiment, can be made by changing the constant term $\mu$ only. Thus, the model gets the form $y_{\ell(j)}(i) = \mu(i) + \beta_j + \eta_{\ell(j)} + e_{\ell(j)}$, which in matrix notation can be written as in (5.1), with $\tau_i = \mu(i)$, and the corresponding moments of $\{y_{\ell(j)}(i)\}$ as in (5.2) and (5.3).

The model (5.1), with properties (5.2) and (5.3), is exactly the same as that obtained by Kala (1991, Section 5) under more general considerations. It also coincides with the model (2) of Patterson and Thompson (1971), as will be shown in Lemma 5.1. Furthermore, if $k_1 = k_2 = \cdots = k_b = k$ (say), then (5.3) can be written as

$$\operatorname{Var}(y) = (I_n - k^{-1}D'D)\,\sigma_1^2 + (k^{-1}D'D - n^{-1}\mathbf{1}_n\mathbf{1}_n')\,\sigma_2^2 + n^{-1}\mathbf{1}_n\mathbf{1}_n'\,\sigma_3^2, \tag{5.6}$$

where

$$\sigma_1^2 = \sigma_U^2 + \sigma_e^2, \qquad \sigma_2^2 = k\sigma_B^2 + (1 - K_H^{-1}k)\,\sigma_U^2 + \sigma_e^2$$

and

$$\sigma_3^2 = (1 - N_B^{-1}b)\,k\sigma_B^2 + (1 - K_H^{-1}k)\,\sigma_U^2 + \sigma_e^2,$$

which shows that the design then has the orthogonal block structure of Nelder (1965a, b) with three "strata", the "intra-block", the "inter-block" and the "total area" stratum (to be discussed in Section 6). The variances $\sigma_1^2$, $\sigma_2^2$ and $\sigma_3^2$ are called the "stratum variances" (cf. Houtman and Speed, 1983, Section 2.2). Evidently, they become smaller when $b = N_B$ and $k = K_H$. Then the present model becomes equivalent to that considered by Rao (1959) and recently by Shah (1992).

It should be emphasized here that the described randomizations of blocks and units are aimed not at the selection of experimental material, which is to be accomplished before, but at assigning the appropriately prepared experimental units, all or some of them, to the experimental treatments according to the chosen design. By these randomizations the responses of units to treatments become random variables of certain uniform dispersion properties, even if the desirable unification of units within blocks is not fully achieved (cf. White, 1975, p. 558).

It has been shown by Caliński and Kageyama (1991) that under the model (5.1), with its properties (5.2) and (5.3), a function $w'y$ is uniformly the BLUE of $c'\tau$ if and only if $w = \Delta's$, where $s = r^{-\delta}c$ satisfies the condition

$$(k^{\delta} - N'r^{-\delta}N)\,N's = \mathbf{0}, \tag{5.7}$$

i.e., is related to the incidence matrix $N$ of the design by either the condition (a) $N's = \mathbf{0}$, the estimated function being then a contrast (i.e., $c'\mathbf{1}_v = 0$), or (b) $N's \ne \mathbf{0}$, with the elements of the vector $N's$ all equal if the design is connected, and equal within any connected subdesign otherwise. Condition (5.7) implies also that under the considered model any function $w'y = s'\Delta y$, i.e., with any $s$, is uniformly the BLUE of $E(w'y) = s'r^{\delta}\tau$, if and only if (i) the design is orthogonal and (ii) the block sizes of the design are constant within any of its connected subdesigns (recall Section 1.2 and Corollary 1.1).
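The dispersion structure of Section 5.1 can be checked numerically. The sketch below assumes NumPy and that (5.3) reads $\operatorname{Var}(y) = (D'D - N_B^{-1}\mathbf{1}_n\mathbf{1}_n')\sigma_B^2 + (I_n - K_H^{-1}D'D)\sigma_U^2 + I_n\sigma_e^2$; it then compares this matrix, for a proper design, with the three-stratum form built from $\sigma_1^2$, $\sigma_2^2$ and $\sigma_3^2$. All numerical values and function names are hypothetical.

```python
import numpy as np

def block_indicator(block_sizes):
    """D (b x n): D[j, u] = 1 if unit u lies in block j, units ordered block by block."""
    n = sum(block_sizes)
    D = np.zeros((len(block_sizes), n))
    start = 0
    for j, kj in enumerate(block_sizes):
        D[j, start:start + kj] = 1.0
        start += kj
    return D

def dispersion_5_3(D, N_B, K_H, s2_B, s2_U, s2_e):
    """Var(y) assembled term by term from the three variance components."""
    n = D.shape[1]
    I, J = np.eye(n), np.ones((n, n))
    return (D.T @ D - J / N_B) * s2_B + (I - D.T @ D / K_H) * s2_U + I * s2_e

# A proper design: b = 4 blocks of size k = 3 drawn from N_B = 6 available blocks.
b, k, N_B, K_H = 4, 3, 6, 5.0
s2_B, s2_U, s2_e = 2.0, 1.5, 0.5          # illustrative variance components
D = block_indicator([k] * b)
n = b * k
V = dispersion_5_3(D, N_B, K_H, s2_B, s2_U, s2_e)

# Stratum projectors and stratum variances for the proper case.
I, J = np.eye(n), np.ones((n, n))
phi1 = I - D.T @ D / k                    # intra-block stratum
phi2 = D.T @ D / k - J / n                # inter-block stratum
phi3 = J / n                              # total area stratum
s2_1 = s2_U + s2_e
s2_2 = k * s2_B + (1 - k / K_H) * s2_U + s2_e
s2_3 = (1 - b / N_B) * k * s2_B + (1 - k / K_H) * s2_U + s2_e
V_strata = phi1 * s2_1 + phi2 * s2_2 + phi3 * s2_3
```

The two matrices agree for any choice of the components, and setting `b = N_B` or `k = K_H` visibly removes a term from $\sigma_3^2$ or from both $\sigma_2^2$ and $\sigma_3^2$.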

5.2. Combined analysis utilizing information from different strata

The results presented in Section 5.1 show that unless the function $c'\tau$ satisfies the condition (5.7), there does not exist the BLUE of it under the randomization model (5.1). However (as shown by Caliński and Kageyama, 1991, Section 3), for a contrast $c'\tau$ that does not satisfy (5.7) there exists the BLUE under the intra-block submodel, considered in Section 1.4, if $c = \Delta\phi_1\Delta's$ for some $s$ and, simultaneously, also


under the inter-block submodel, obtained by the projection $(D'k^{-\delta}D - n^{-1}\mathbf{1}_n\mathbf{1}_n')y$, if $c = \Delta(D'k^{-\delta}D - n^{-1}\mathbf{1}_n\mathbf{1}_n')\Delta's$ for the same (or proportional) $s$, provided that the latter satisfies some additional condition involving block sizes. This means that in many, but not all, cases the estimation of a contrast can be based on information available in two strata of the experiment, the intra-block and the inter-block stratum. Unfortunately, each of them gives a separate estimate of the contrast, usually different in their actual values, sometimes even quite diverse. Therefore, a natural question arising in this context is whether and how it is possible to combine the information from both strata to obtain a single estimate in a somehow optimal way. Methods dealing with this problem are known in the literature as procedures of the recovery of inter-block information. An update of some of these methods for proper (i.e., equiblock-sized) block designs has been given by Shah (1992). Here a general theory underlying the recovery of inter-block information will be presented, in a way that is applicable to any block design, whether proper or not.

In an attempt to solve the problem of combining information from different strata on the basis of the randomization model (5.1), it is instructive to make provisionally an unrealistic assumption that the variance components appearing in the dispersion matrix (5.3) are known. Then the following results are essential.

LEMMA 5.1. Let the model be as in (5.1), with the expectation vector (5.2) and the dispersion matrix (5.3), the latter written equivalently as

$$\operatorname{Var}(y) = \sigma_1^2\,(D'\Gamma D + I_n), \tag{5.8}$$

where

$$\Gamma = \gamma I_b - N_B^{-1}(\sigma_B^2/\sigma_1^2)\,\mathbf{1}_b\mathbf{1}_b', \tag{5.9}$$

with $\sigma_1^2 = \sigma_U^2 + \sigma_e^2$ and $\gamma = (\sigma_B^2 - K_H^{-1}\sigma_U^2)/\sigma_1^2$. Further, suppose that the true value of $\gamma$ is known. Then, (a) any function $w'y$ which is the BLUE of its expectation, (b) a vector which is the BLUE of the expectation vector $\Delta'\tau$ and, hence, (c) a vector which gives the residuals, all remain unchanged when altering the present model by deleting $N_B^{-1}(\sigma_B^2/\sigma_1^2)\,\mathbf{1}_b\mathbf{1}_b'$ in (5.9), i.e., by reducing (5.8) to

$$\operatorname{Var}(y) = \sigma_1^2\,(\gamma D'D + I_n) = \sigma_1^2\,T, \tag{5.10}$$

where $T = \gamma D'D + I_n$. The matrix $T$ in (5.10) is positive definite (p.d.) if $\gamma > -1/k_{\max}$, where $k_{\max} = \max_j k_j$.

To prove this lemma note that from $\mathbf{1}_n'(I_n - P_{\Delta'}) = \mathbf{0}'$, where $P_{\Delta'} = \Delta'r^{-\delta}\Delta$, it follows that

$$(D'\Gamma D + I_n)(I_n - P_{\Delta'}) = (\gamma D'D + I_n)(I_n - P_{\Delta'}). \tag{5.11}$$


For details of the proof see Caliński and Kageyama (1996, Lemma 3.1). (See also Patterson and Thompson, 1971, p. 546, and Kala, 1981, Theorem 6.2.) Now, on account of Lemma 5.1, a general theory due to Rao (1974) gives the following.

THEOREM 5.1. Under the model and assumptions as in Lemma 5.1, including the assumption that $\gamma$ is known and exceeds $-1/k_{\max}$, (a) the BLUE of $\tau$ is of the form

$$\hat{\tau} = (\Delta T^{-1}\Delta')^{-1}\Delta T^{-1}y \tag{5.12}$$

with

$$T^{-1} = (\gamma D'D + I_n)^{-1} = \phi_1 + D'k^{-\delta}(k^{-\delta} + \gamma I_b)^{-1}k^{-\delta}D, \tag{5.13}$$

where $\phi_1 = I_n - D'k^{-\delta}D$; (b) the dispersion matrix of $\hat{\tau}$ is

$$\operatorname{Var}(\hat{\tau}) = \sigma_1^2\,(\Delta T^{-1}\Delta')^{-1} - N_B^{-1}\sigma_B^2\,\mathbf{1}_v\mathbf{1}_v'; \tag{5.14}$$

(c) the BLUE of $c'\tau$ for any $c$ is $c'\hat{\tau}$, with the variance $c'\operatorname{Var}(\hat{\tau})c$, which reduces to

$$\operatorname{Var}(c'\hat{\tau}) = \sigma_1^2\,c'(\Delta T^{-1}\Delta')^{-1}c, \tag{5.15}$$

if $c'\tau$ is a contrast; (d) the minimum norm quadratic unbiased estimator (MINQUE) of $\sigma_1^2$ is

$$\hat{\sigma}_1^2 = \frac{1}{n - v}\,\|y - \Delta'\hat{\tau}\|_{T^{-1}}^2 = \frac{1}{n - v}\,(y - \Delta'\hat{\tau})'T^{-1}(y - \Delta'\hat{\tau}). \tag{5.16}$$

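To prove this theorem note that from Theorem 3.2(c) of Rao (1974), and Lemma 5.1, the BLUE of $\Delta'\tau$ is $\widehat{\Delta'\tau} = P_{\Delta';T^{-1}}\,y$, where

$$P_{\Delta';T^{-1}} = \Delta'(\Delta T^{-1}\Delta')^{-1}\Delta T^{-1}, \tag{5.17}$$

that (5.8) can be written as

$$\operatorname{Var}(y) = \sigma_1^2\,T - N_B^{-1}\sigma_B^2\,\mathbf{1}_n\mathbf{1}_n', \tag{5.18}$$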


that the residual sum of squares is of the form

$$\|(I_n - P_{\Delta';T^{-1}})y\|_{T^{-1}}^2 = y'(I_n - P_{\Delta';T^{-1}})'\,T^{-1}(I_n - P_{\Delta';T^{-1}})\,y, \tag{5.19}$$

and that the p.d. matrix $T$ used in (5.18) satisfies the conditions required for the matrix $T$ used in Rao's (1974) Theorem 3.2. For details of the proof see Caliński and Kageyama (1996, Theorem 3.1).

Since, for $\gamma > -1/k_{\max}$ (as assumed),

$$(k^{-\delta} + \gamma I_b)^{-1} = \begin{cases} k^{\delta} - k^{\delta}(k^{\delta} + \gamma^{-1}I_b)^{-1}k^{\delta}, & \text{if } \gamma \ne 0, \\ k^{\delta}, & \text{if } \gamma = 0, \end{cases}$$

it is permissible, on account of (5.13), to write

$$T^{-1} = I_n - D'k_*^{-\delta}D, \tag{5.20}$$

where $k_*^{-\delta}$ is a diagonal matrix with its $j$th diagonal element equal to

$$k_{*j}^{-1} = \frac{\gamma}{k_j\gamma + 1}. \tag{5.21}$$

With this notation, formulae (5.12), (5.14) and (5.15) can be written, respectively, as $\hat{\tau} = C_c^{-1}Q_c$, $\operatorname{Var}(\hat{\tau}) = \sigma_1^2 C_c^{-1} - N_B^{-1}\sigma_B^2\,\mathbf{1}_v\mathbf{1}_v'$ and, for $c'\mathbf{1}_v = 0$, $\operatorname{Var}(c'\hat{\tau}) = \sigma_1^2\,c'C_c^{-1}c$, where, on account of (5.20),

$$C_c = r^{\delta} - Nk_*^{-\delta}N' \qquad \text{and} \qquad Q_c = (\Delta - Nk_*^{-\delta}D)\,y. \tag{5.22}$$

Note that $C_c$ and $Q_c$ in (5.22) are similar, respectively, to $C$ and $Q$ used in the intra-block analysis (Section 1.4), where $\sigma^2$ is to be replaced by $\sigma_1^2$ appearing in (5.8), but they are more general, in the sense of combining the intra-block and inter-block information (hence the subscript c). From (5.21) it is evident that a maximum recovery of the inter-block information is achieved in the case of $\gamma \le 0$, i.e., when the blocking is completely unsuccessful, while there is no recovery of that information in the case of $\gamma \to \infty$, i.e., when the formation of blocks is fully successful, in the sense of eliminating the intra-block variation of experimental units, measured by $\sigma_1^2$. In the latter case $C_c \to C$ and $Q_c \to Q$. Also note that if the design is proper, i.e., if $k_j = k$ for all $j$, then (as in John, 1987, p. 193)

$$C_c = r^{\delta} - k_*^{-1}NN' \quad\text{and}\quad Q_c = (\Delta - k_*^{-1}ND)\,y, \tag{5.23}$$

where $k_*^{-1} = k^{-1}(1 - \sigma_1^2/\sigma_2^2)$, with $\sigma_2^2$ defined as in (5.6). Moreover, in this case, any contrast $c'\tau = s'r^{\delta}\tau$ satisfying the condition $\Delta\phi_1\Delta's = \varepsilon r^{\delta}s$, with $0 < \varepsilon < 1$, has the "simple combinability" property of Martin and Zyskind (1966). Evidently this is secured if and only if the design is proper.
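The inverse (5.20)-(5.21) and the combined matrix $C_c$ of (5.22) can be checked numerically on a small example. The following is only a sketch under stated assumptions: it takes $T = I_n + \gamma D'D$ (the form implied by the split of the residual sum of squares in (5.24)), and the design, the value of $\gamma$ and the helper names `design_matrices` and `k_star_inverse` are hypothetical choices made for illustration.

```python
import numpy as np

def design_matrices(blocks, v):
    """Build Delta (v x n) and D (b x n) from blocks listing treatment indices."""
    n = sum(len(block) for block in blocks)
    Delta = np.zeros((v, n))
    D = np.zeros((len(blocks), n))
    unit = 0
    for j, block in enumerate(blocks):
        for treatment in block:
            Delta[treatment, unit] = 1.0
            D[j, unit] = 1.0
            unit += 1
    return Delta, D

def k_star_inverse(k, gamma):
    """Diagonal elements of (5.21): gamma / (k_j * gamma + 1)."""
    return gamma / (k * gamma + 1.0)

# Hypothetical design: v = 4 treatments in b = 4 blocks of unequal sizes.
blocks = [[0, 1, 2], [0, 3], [1, 2, 3], [0, 1]]
gamma = 0.7                      # illustrative value, above -1/k_max
Delta, D = design_matrices(blocks, v=4)
n = D.shape[1]
k = D.sum(axis=1)                # block sizes
r = Delta.sum(axis=1)            # treatment replications
N = Delta @ D.T                  # incidence matrix

T = np.eye(n) + gamma * (D.T @ D)                                  # assumed form of T
T_inv = np.eye(n) - D.T @ np.diag(k_star_inverse(k, gamma)) @ D    # (5.20)
C_c = np.diag(r) - N @ np.diag(k_star_inverse(k, gamma)) @ N.T     # (5.22)

print(np.allclose(T @ T_inv, np.eye(n)))          # (5.20) inverts T
print(np.allclose(C_c, Delta @ T_inv @ Delta.T))  # C_c = Delta T^{-1} Delta'
```

Because the block sizes here are unequal, the check exercises the general form (5.22) rather than the proper-design special case (5.23).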


5.3. Estimation of stratum variances

All the results established in Section 5.2 are based on the assumption that the ratio $\gamma$ is known (see Lemma 5.1). In practice, however, this is usually not the case. Therefore, to make the theory applicable, estimators of both $\sigma_1^2$ and $\gamma$ are needed. Various approaches may be adopted for finding these estimators. To choose a practically suitable one, let the residual sum of squares (5.19) first be written as

$$\|(I_n - P_{\Delta';T^{-1}})y\|^2_{T^{-1}} = y'RTRy = \gamma\, y'RD'DRy + y'RRy, \tag{5.24}$$

where

$$R = T^{-1} - T^{-1}\Delta'(\Delta T^{-1}\Delta')^{-1}\Delta T^{-1}. \tag{5.25}$$

Now, generalizing Nelder's (1968) approach, the simultaneous estimators of $\sigma_1^2$ and $\gamma$ can be obtained by equating the partial sums of squares in (5.24) to their expectations. This leads to the equations

$$\begin{bmatrix} \operatorname{tr}(RD'DRD'D) & \operatorname{tr}(RD'DR) \\ \operatorname{tr}(RD'DR) & \operatorname{tr}(RR) \end{bmatrix} \begin{bmatrix} \sigma_1^2\gamma \\ \sigma_1^2 \end{bmatrix} = \begin{bmatrix} y'RD'DRy \\ y'RRy \end{bmatrix}. \tag{5.26}$$

A solution of (5.26) gives estimators of $\sigma_1^2\gamma$ and $\sigma_1^2$, and hence of $\gamma$. However, the equations (5.26) clearly have no direct analytic solution, as the matrix $R$ also contains the unknown parameter $\gamma$. Before considering their solution in detail, note that exactly the same equations as in (5.26) result from the MINQUE approach of Rao (1970, 1971, 1972). Also, note that the equations obtained from equating the sums of squares $y'RD'DRy$ and $y'RRy$ to their expectations can be written equivalently as

$$y'RD'DRy = \sigma_1^2\operatorname{tr}(RD'D) \quad\text{and}\quad y'RRy = \sigma_1^2\operatorname{tr}(R), \tag{5.27}$$

since $\operatorname{tr}(RD'DRT) = \operatorname{tr}(RD'D)$ and $\operatorname{tr}(RRT) = \operatorname{tr}(R)$. Now, since $(\sigma_1^2)^{-1}R$ can be shown to be the Moore-Penrose inverse of the matrix $\sigma_1^2\phi_*T\phi_*$, where $\phi_* = I_n - \Delta'r^{-\delta}\Delta$, the equations (5.27) coincide with the equations (5.1) of Patterson and Thompson (1975) on which the so-called modified (or marginal) maximum likelihood (MML) estimation method is based. The method is also known as the restricted maximum likelihood (REML) approach (see, e.g., Harville, 1977). Thus not only the original approach of Nelder (1968), but also its generalization presented here must, in principle, give the same results as those obtainable from the MML (REML) equations derived under the multivariate normality assumption (as indicated by Patterson and Thompson, 1971, p. 552, 1975, p. 206).

Note that for solving the equations (5.26), or their equivalents, an iterative procedure is to be used. According to the original MINQUE principle of Rao (1970, 1971), a solution of (5.26) would be obtained under some a priori value of $\gamma$, and thus should coincide with results obtained from the other two approaches when restricted to a single iteration, provided, however, that in all three methods the same a priori or preliminary estimate of $\gamma$ in $R$ is used (cf. Patterson and Thompson, 1975, p. 204). But, as indicated by Rao (1972, p. 113), the MINQUE method can also be used iteratively, and then the three approaches will lead to the same results (cf. Rao, 1979, p. 151).

A suitable computational procedure for obtaining a practical solution of equations (5.26) is that given by Patterson and Thompson (1971, Section 6), who also illustrated its use by an example. The procedure starts with a preliminary estimate $\gamma_0$ of $\gamma$, usually with $\gamma_0 \ge 0$. Incorporating it into the coefficients of (5.26), the equations can be written as

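$$\begin{bmatrix} \operatorname{tr}(R_0D'DR_0D'D) & \operatorname{tr}(R_0D'DR_0) \\ \operatorname{tr}(R_0D'DR_0) & \operatorname{tr}(R_0R_0) \end{bmatrix} \begin{bmatrix} \sigma_1^2\gamma \\ \sigma_1^2 \end{bmatrix} = \begin{bmatrix} y'R_0D'DR_0y \\ y'R_0R_0y \end{bmatrix}, \tag{5.28}$$

where $R_0$ is defined as in (5.25), but with $T^{-1}$ replaced by $T_0^{-1}$ obtained after replacing $\gamma$ in (5.13) by $\gamma_0$, i.e., with

$$T_0^{-1} = \phi_1 + D'k^{-\delta}(k^{-\delta} + \gamma_0 I_b)^{-1}k^{-\delta}D \tag{5.29}$$

(with $T_0^{-1} = \phi_1$ if $\gamma_0 \to \infty$). However, instead of the equations (5.28), it is more convenient to consider their transformed form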

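$$\begin{bmatrix} \operatorname{tr}(R_0D'DR_0D'D) & \operatorname{tr}(R_0D'D) \\ \operatorname{tr}(R_0D'D) & n - v \end{bmatrix} \begin{bmatrix} \sigma_1^2(\gamma - \gamma_0) \\ \sigma_1^2 \end{bmatrix} = \begin{bmatrix} y'R_0D'DR_0y \\ y'R_0y \end{bmatrix}, \tag{5.30}$$

which follows from (5.28) because $R_0T_0R_0 = R_0$ and $\operatorname{tr}(R_0T_0) = n - v$. The solution of (5.30) gives a revised estimate of $\gamma$ of the form

$$\hat\gamma = \gamma_0 + \frac{(n - v)\,y'R_0D'DR_0y - \operatorname{tr}(DR_0D')\,y'R_0y}{\operatorname{tr}[(DR_0D')^2]\,y'R_0y - \operatorname{tr}(DR_0D')\,y'R_0D'DR_0y}. \tag{5.31}$$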

Thus, a single iteration of Fisher's iterative method of scoring, suggested by Patterson and Thompson (1971, p. 550), consists here of the following two steps:

(0) One starts with a preliminary estimate $\gamma_0$ ($> -1/k_{\max}$) of $\gamma$ to obtain the equations (5.30).

(1) By solving (5.30), a revised estimate $\hat\gamma$ of $\gamma$ is obtained, of the form (5.31), and this is then used as a new preliminary estimate in step (0) of the next iteration.

One should, however, watch that $\gamma_0$ always remains above the lower bound $-1/k_{\max}$, as required. If $\hat\gamma \le -1/k_{\max}$, it cannot be used as a new $\gamma_0$ in step (0). In such a case formula (5.31) is to be replaced by

$$\hat\gamma = \gamma_0 + \alpha\,\frac{(n - v)\,y'R_0D'DR_0y - \operatorname{tr}(DR_0D')\,y'R_0y}{\operatorname{tr}[(DR_0D')^2]\,y'R_0y - \operatorname{tr}(DR_0D')\,y'R_0D'DR_0y},$$

with $\alpha \in (0, 1)$ chosen so that $\hat\gamma > -1/k_{\max}$ (as suggested by Rao and Kleffe, 1988, p. 237).


The above iteration is to be repeated until convergence, i.e., until the equality

$$\frac{y'R_0D'DR_0y}{\operatorname{tr}(DR_0D')} = \frac{y'R_0y}{n - v} \tag{5.32}$$

is reached. The solution of (5.26) can then be written as $[\hat\sigma_1^2\hat\gamma, \hat\sigma_1^2]'$, where $\hat\gamma$ is the final estimate of $\gamma$ obtained after the convergence, and

$$\hat\sigma_1^2 = \frac{y'\hat R y}{n - v}, \tag{5.33}$$

where $\hat R$ is defined according to (5.25) in the same way as $R_0$ has been, but now with $T_0^{-1}$ in (5.29) becoming

$$\hat T^{-1} = \phi_1 + D'k^{-\delta}(k^{-\delta} + \hat\gamma I_b)^{-1}k^{-\delta}D$$

(provided that $\hat\gamma \ne 0$, otherwise reducing to $\hat T^{-1} = I_n$). Hence, the estimator of $\tau$ so obtained can be written as

$$\tilde\tau = (\Delta\hat T^{-1}\Delta')^{-1}\Delta\hat T^{-1}y, \tag{5.34}$$

which may be called (after Rao and Kleffe, 1988, p. 274) an "empirical" estimator of $\tau$. Of course, it is not the same as the BLUE obtainable with the exact value of $\gamma$. It can be shown (see, e.g., Kackar and Harville, 1981; Klaczynski et al., 1994) that if the unknown ratio $\gamma$ appearing in the matrix $T$ defined in (5.10) is replaced by its estimator $\hat\gamma$ obtained under the assumption that $y$ has a multivariate normal distribution, then the unbiasedness of the estimators of $\tau$ and $c'\tau$ established in Theorem 5.1 is not violated. However, as to the variance of the estimator of $c'\tau$, the replacement of $\gamma$ by $\hat\gamma$ causes an increase of the variance (5.15). Unfortunately, the exact formula of that increased variance is in general intractable. But it can be approximated as suggested by Kackar and Harville (1984). For details and discussion see Caliński and Kageyama (1996).
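The two-step iteration described in this section can be sketched in code as follows. This is a minimal illustration, not the procedure of Patterson and Thompson as published: it assumes $T = I_n + \gamma D'D$ (the form implied by (5.24)), uses simple step halving to pick $\alpha$, and the function names, the data vector `y` and the all-pairs design are hypothetical.

```python
import numpy as np

def residual_matrix(Delta, D, gamma):
    """R of (5.25), assuming T = I_n + gamma * D'D."""
    n = D.shape[1]
    T_inv = np.linalg.inv(np.eye(n) + gamma * (D.T @ D))
    A = T_inv @ Delta.T                        # T^{-1} Delta'
    return T_inv - A @ np.linalg.solve(Delta @ A, A.T)

def scoring_step(Delta, D, y, gamma0, alpha=1.0):
    """Revised estimate of gamma by (5.31); alpha < 1 shortens the step."""
    n, v = D.shape[1], Delta.shape[0]
    R0 = residual_matrix(Delta, D, gamma0)
    M = D @ R0 @ D.T
    a, b = np.trace(M @ M), np.trace(M)        # tr[(D R0 D')^2], tr(D R0 D')
    z = R0 @ y
    p = (D @ z) @ (D @ z)                      # y'R0 D'D R0 y
    q = y @ z                                  # y'R0 y
    return gamma0 + alpha * ((n - v) * p - b * q) / (a * q - b * p)

def estimate_gamma(Delta, D, y, gamma0=1.0, tol=1e-10, max_iter=500):
    """Repeat steps (0)-(1), halving alpha while the lower bound is violated."""
    lower = -1.0 / D.sum(axis=1).max()         # -1 / k_max
    gamma = gamma0
    for _ in range(max_iter):
        alpha = 1.0
        new = scoring_step(Delta, D, y, gamma, alpha)
        while not new > lower and alpha > 1e-12:
            alpha /= 2.0
            new = scoring_step(Delta, D, y, gamma, alpha)
        if not new > lower:
            break                              # no admissible step; keep current value
        if abs(new - gamma) < tol:
            return new
        gamma = new
    return gamma

def sigma1_squared(Delta, D, y, gamma):
    """Estimator (5.33): y'Ry / (n - v)."""
    R = residual_matrix(Delta, D, gamma)
    return (y @ R @ y) / (D.shape[1] - Delta.shape[0])

# Hypothetical example: 4 treatments in all 6 pairs (blocks of size 2).
blocks = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
units = [(j, t) for j, block in enumerate(blocks) for t in block]
D = np.array([[1.0 if j == ju else 0.0 for ju, _ in units] for j in range(6)])
Delta = np.array([[1.0 if t == tu else 0.0 for _, tu in units] for t in range(4)])
y = np.array([10.2, 10.9, 11.7, 14.1, 9.1, 12.3,
              11.8, 13.2, 9.0, 10.7, 12.8, 13.4])    # made-up observations

gamma_hat = estimate_gamma(Delta, D, y)
print(gamma_hat, sigma1_squared(Delta, D, y, gamma_hat))
```

Step halving is only one way of choosing $\alpha \in (0, 1)$; the text requires nothing more than that the revised value stays above $-1/k_{\max}$.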

5.4. Remarks concerning nested block designs

The randomization model for an NB design has been considered recently by Caliński (1994b). It differs from the model (5.1) by the addition of the component $G'\alpha$, where $G'$ is the $n \times a$ "design matrix for superblocks" and $\alpha$ is an $a \times 1$ vector of superblock random effects $\{\alpha_i\}$. The expectation vector (5.2) remains unchanged, while the dispersion matrix is of the form

$$\operatorname{Var}(y) = \Big(G'G - \frac{1}{N_A}1_n1_n'\Big)\sigma_A^2 + \Big(D'D - \frac{1}{B_H}G'G\Big)\sigma_B^2 + \Big(I_n - \frac{1}{K_H}D'D\Big)\sigma_U^2 + I_n\sigma_e^2, \tag{5.35}$$

where $N_A$ is the number of available superblocks, and $B_H$ is a weighted harmonic average of the available numbers of blocks within the $N_A$ available superblocks, similarly as $K_H$ is such an average of the available numbers of units within the $B_1 + B_2 + \cdots + B_{N_A}$ available blocks, and where $\sigma_A^2$, $\sigma_B^2$, $\sigma_U^2$ and $\sigma_e^2$ are the variance components related to the random vectors $\alpha$, $\beta$, $\eta$ and $e$, respectively, all these quantities being defined similarly as in Section 5.1. Since (5.35) can also be written as

$$\operatorname{Var}(y) = \sigma_1^2(G'\Gamma_1G + D'\Gamma_2D + I_n),$$

where

$$\Gamma_1 = I_a\gamma_1 - \frac{1}{N_A}1_a1_a'\,\sigma_A^2/\sigma_1^2, \qquad \gamma_1 = \frac{\sigma_A^2 - B_H^{-1}\sigma_B^2}{\sigma_1^2}, \qquad \sigma_1^2 = \sigma_U^2 + \sigma_e^2,$$

and

$$\Gamma_2 = I_b\gamma_2, \qquad \gamma_2 = \frac{\sigma_B^2 - K_H^{-1}\sigma_U^2}{\sigma_1^2},$$

the analysis of an experiment in an NB design can be seen as a straightforward extension of that described in Sections 5.2 and 5.3. For more details see Patterson and Thompson (1971, Section 10). If the NB design is resolvable, the analysis is simpler. It is described by Speed et al. (1985).

6. The concept of general balance

6.1. Submodels of the randomization model

As shown in Section 5.1, under the model (5.1) the BLUEs of linear treatment parametric functions exist in very restrictive circumstances only. Therefore, the usual procedure is to resolve the model (5.1) into three submodels (two for contrasts), in accordance with the stratification of the experimental units. This can be represented by the decomposition

$$y = y_1 + y_2 + y_3, \tag{6.1}$$

resulting from orthogonal projections of $y$ on subspaces related to the three "strata":

1st - of units within blocks, the "intra-block" stratum,
2nd - of blocks within the total area, the "inter-block" stratum,
3rd - of the total area

(using the terminology of Pearce, 1983, p. 109). Explicitly (see Caliński and Kageyama, 1991, Section 3),

$$y_1 = \phi_1y, \qquad y_2 = \phi_2y, \qquad y_3 = \phi_3y, \tag{6.2}$$

where the projectors $\phi_\alpha$, $\alpha = 1, 2, 3$, are defined as

$$\phi_1 = I_n - D'k^{-\delta}D, \qquad \phi_2 = D'k^{-\delta}D - \frac{1}{n}1_n1_n' \qquad\text{and}\qquad \phi_3 = \frac{1}{n}1_n1_n'.$$

They satisfy the conditions

$$\phi_\alpha\phi_{\alpha'} = 0 \ \text{ if } \alpha \ne \alpha', \qquad\text{and}\qquad \phi_1 + \phi_2 + \phi_3 = I_n, \tag{6.3}$$

also

$$\phi_1D' = 0 \qquad\text{and}\qquad \phi_\alpha1_n = 0 \ \text{ for } \alpha = 1, 2, \tag{6.4}$$

and are called the "stratum projectors" (cf. Houtman and Speed, 1983, p. 1070). The submodels (6.2), called "intra-block", "inter-block" and "total-area", respectively, have the following properties:

$$E(y_1) = \phi_1\Delta'\tau, \qquad \operatorname{Var}(y_1) = \phi_1(\sigma_U^2 + \sigma_e^2), \tag{6.5}$$

$$E(y_2) = \phi_2\Delta'\tau, \qquad \operatorname{Var}(y_2) = \phi_2D'D\phi_2\Big(\sigma_B^2 - \frac{1}{K_H}\sigma_U^2\Big) + \phi_2(\sigma_U^2 + \sigma_e^2), \tag{6.6}$$

$$E(y_3) = \phi_3\Delta'\tau, \qquad \operatorname{Var}(y_3) = \phi_3\Big[\Big(\frac{1}{n}k'k - \frac{n}{N_B}\Big)\sigma_B^2 + \Big(1 - \frac{1}{nK_H}k'k\Big)\sigma_U^2 + \sigma_e^2\Big]. \tag{6.7}$$

Evidently, if the block sizes are all equal, i.e., the design is proper, then the dispersion matrices of the inter-block and total-area submodels are simplified. However, the properties of the intra-block submodel remain the same, whether the design is proper or not. This means that the intra-block analysis (as that based on the intra-block submodel) can be considered generally, for any block design (see Section 1.4).
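The properties (6.3) and (6.4) of the stratum projectors hold for any block sizes, which a short numerical check makes concrete. The sketch below is illustrative only; the layout of three blocks of sizes 3, 2 and 4 and the function name `stratum_projectors` are assumptions introduced here.

```python
import numpy as np

def stratum_projectors(D):
    """Return (phi1, phi2, phi3) for a b x n block design matrix D."""
    n = D.shape[1]
    k_inv = np.diag(1.0 / D.sum(axis=1))       # k^{-delta}
    block_part = D.T @ k_inv @ D               # D' k^{-delta} D
    phi3 = np.ones((n, n)) / n
    return np.eye(n) - block_part, block_part - phi3, phi3

# Hypothetical layout: 3 blocks of sizes 3, 2 and 4 (n = 9 units).
sizes = [3, 2, 4]
D = np.zeros((len(sizes), sum(sizes)))
start = 0
for j, size in enumerate(sizes):
    D[j, start:start + size] = 1.0
    start += size

phis = stratum_projectors(D)
n = D.shape[1]
for a, phi_a in enumerate(phis):
    assert np.allclose(phi_a, phi_a.T)             # symmetric
    assert np.allclose(phi_a @ phi_a, phi_a)       # idempotent
    for phi_b in phis[a + 1:]:
        assert np.allclose(phi_a @ phi_b, 0.0)     # pairwise orthogonal
assert np.allclose(sum(phis), np.eye(n))           # (6.3)
assert np.allclose(phis[0] @ D.T, 0.0)             # (6.4): phi1 D' = 0

# Ranks (traces) of the projectors: n - b, b - 1 and 1.
ranks = [int(round(np.trace(phi))) for phi in phis]
print(ranks)   # [6, 2, 1]
```

The ranks $n - b$, $b - 1$ and $1$ are the degrees of freedom of the intra-block, inter-block and total-area strata.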


6.2. Proper block designs and the orthogonal block structure

As noticed in Sections 5.1 and 5.2, for proper block designs some advantageous simplifications occur. This is essential in particular with regard to the inter-block submodel, for which the structure of the covariance matrix, shown in (6.6), is not very satisfactory from the application point of view. Therefore, further development of the theory related to the model (5.1), with the properties (5.2) and (5.3), will be confined in the present section to proper designs, i.e., designs with

$$k_1 = k_2 = \cdots = k_b = k \quad (\text{say}). \tag{6.8}$$

This will allow the theory to be presented in a unified form. It can easily be shown that the dispersion matrices in (6.6) and (6.7) are simplified to

$$\operatorname{Var}(y_2) = \phi_2\Big[k\sigma_B^2 + \Big(1 - \frac{k}{K_H}\Big)\sigma_U^2 + \sigma_e^2\Big] \tag{6.9}$$

and

$$\operatorname{Var}(y_3) = \phi_3\Big[\Big(1 - \frac{b}{N_B}\Big)k\sigma_B^2 + \Big(1 - \frac{k}{K_H}\Big)\sigma_U^2 + \sigma_e^2\Big], \tag{6.10}$$

respectively, for any proper block design. Thus, in case of a proper block design, the expectation vector and the dispersion matrix for each of the three submodels (6.2) can be written, respectively, as

$$E(y_\alpha) = \phi_\alpha\Delta'\tau \tag{6.11}$$

and

$$\operatorname{Var}(y_\alpha) = \phi_\alpha\sigma_\alpha^2, \tag{6.12}$$

for $\alpha = 1, 2, 3$, where, from (6.5), (6.9) and (6.10), the "stratum variances" $\sigma_1^2$, $\sigma_2^2$ and $\sigma_3^2$ are as in (5.6). Furthermore, if $N_B = b$ and $k = K_H$ (the latter implying the equality of all potential block sizes), which can be considered as the most common case, the variances $\sigma_2^2$ and $\sigma_3^2$ reduce to

$$\sigma_2^2 = k\sigma_B^2 + \sigma_e^2 \quad\text{and}\quad \sigma_3^2 = \sigma_e^2. \tag{6.13}$$

On the other hand, for any proper block design, the decomposition (6.1) implies not only that

$$E(y) = E(y_1) + E(y_2) + E(y_3) = \phi_1\Delta'\tau + \phi_2\Delta'\tau + \phi_3\Delta'\tau, \tag{6.14}$$

but also that

$$\operatorname{Var}(y) = \operatorname{Var}(y_1) + \operatorname{Var}(y_2) + \operatorname{Var}(y_3) = \phi_1\sigma_1^2 + \phi_2\sigma_2^2 + \phi_3\sigma_3^2, \tag{6.15}$$

where the matrices $\phi_1$, $\phi_2$ and $\phi_3$ satisfy the conditions (6.3). The representation (6.15) is a very desirable property, as originally indicated for a more general class of designs by Nelder (1965a). Following him, the definition below will be adopted (see also Houtman and Speed, 1983, Section 2.2).

DEFINITION 6.1. An experiment is said to have the orthogonal block structure (OBS) if the dispersion matrix of the random variables observed on the experimental units (plots) has a representation of the form (6.15), where the matrices $\{\phi_\alpha\}$ are symmetric, idempotent and pairwise orthogonal, summing up to the identity matrix, as in (6.3).

It can now be said that any experiment in a proper block design has the orthogonal block structure, or that it has the OBS property.

LEMMA 6.1. An experiment in a block design has under (5.1) the orthogonal block structure if and only if the design is proper.

For a proof see Caliński (1993b, Lemma 4.1). The condition (6.8) is, however, not sufficient to obtain for any $s$ the BLUE of $s'r^{\delta}\tau$ under the overall model (5.1) (see Remark 2.1 of Caliński and Kageyama, 1991). It remains to seek the estimators within each stratum separately, i.e., under the submodels

$$y_\alpha = \phi_\alpha y, \qquad \alpha = 1, 2, 3, \tag{6.16}$$

which for any proper block design have the properties (6.11) and (6.12), with the matrices $\phi_1$ and $\phi_2$ reduced to

$$\phi_1 = I_n - \frac{1}{k}D'D \qquad\text{and}\qquad \phi_2 = \frac{1}{k}D'D - \frac{1}{n}1_n1_n', \tag{6.17}$$

respectively, and with

$$\phi_3 = \frac{1}{n}1_n1_n' \tag{6.18}$$

unchanged.

THEOREM 6.1. If the block design is proper, then under (6.16) a function $w'y_\alpha = w'\phi_\alpha y$ is uniformly the BLUE of $c'\tau$ if and only if $\phi_\alpha w = \phi_\alpha\Delta's$, where the vectors $c$ and $s$ are in the relation $c = \Delta\phi_\alpha\Delta's$.

PROOF. The proof is exactly as that of Theorem 3.1 of Caliński and Kageyama (1991), on account of (6.11) and (6.12). □

REMARK 6.1. Since $1_v'\Delta\phi_\alpha = 0'$ for $\alpha = 1, 2$, the only parametric functions for which the BLUEs may exist under (6.16) with $\alpha = 1, 2$ are contrasts. On the other hand, no contrast will obtain a BLUE under (6.16) for $\alpha = 3$. (See also Remark 3.1, Corollary 3.1(b) and Remark 3.6(a) of Caliński and Kageyama (1991).)

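It follows from Theorem 6.1 that if for a given $c$ ($\ne 0$) there exists a vector $s$ such that $c = \Delta\phi_\alpha\Delta's$, then the BLUE of $c'\tau$ in stratum $\alpha$ is obtainable as

$$\widehat{c'\tau} = s'\Delta y_\alpha, \tag{6.19}$$

with the variance of the form

$$\operatorname{Var}(\widehat{c'\tau}) = s'\Delta\phi_\alpha\Delta's\,\sigma_\alpha^2 = c'(\Delta\phi_\alpha\Delta')^{-}c\,\sigma_\alpha^2, \tag{6.20}$$

where $\sigma_\alpha^2$ is the appropriate stratum variance defined in (6.13). Explicitly, the matrices $\Delta\phi_\alpha\Delta'$ in (6.20) are:

$$\Delta\phi_1\Delta' = r^{\delta} - \frac{1}{k}NN' = C_1 \quad (= C, \text{ the C-matrix}), \tag{6.21}$$

$$\Delta\phi_2\Delta' = \frac{1}{k}NN' - \frac{1}{n}rr' = C_2 \quad (C_0 \text{ in Pearce, 1983, p. 111}) \tag{6.22}$$

and

$$C_3 = \Delta\phi_3\Delta' = \frac{1}{n}rr'.$$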
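As an illustration of (6.21) and (6.22), the three matrices can be computed both from the incidence matrix and from the stratum projectors, and the results compared. The sketch below is only an example: the design (all pairs of four treatments, a balanced incomplete block design with $b = 6$, $k = 2$) is chosen here for illustration and is not taken from the text.

```python
import numpy as np
from itertools import combinations

v, k = 4, 2
blocks = list(combinations(range(v), k))    # all pairs of 4 treatments
b, n = len(blocks), len(blocks) * k

Delta = np.zeros((v, n))                    # treatment design matrix (v x n)
D = np.zeros((b, n))                        # block design matrix (b x n)
for j, block in enumerate(blocks):
    for i, t in enumerate(block):
        Delta[t, j * k + i] = 1.0
        D[j, j * k + i] = 1.0

N = Delta @ D.T                             # incidence matrix
r = Delta.sum(axis=1)                       # replications

# (6.21), (6.22) and C3, from the incidence matrix
C1 = np.diag(r) - N @ N.T / k
C2 = N @ N.T / k - np.outer(r, r) / n
C3 = np.outer(r, r) / n

# The same matrices as Delta phi_alpha Delta', with the projectors (6.17), (6.18)
phi1 = np.eye(n) - D.T @ D / k
phi2 = D.T @ D / k - np.ones((n, n)) / n
phi3 = np.ones((n, n)) / n

print(np.allclose(C1, Delta @ phi1 @ Delta.T))
print(np.allclose(C2, Delta @ phi2 @ Delta.T))
print(np.allclose(C3, Delta @ phi3 @ Delta.T))
print(np.allclose(C1 + C2 + C3, np.diag(r)))    # the three parts sum to r^delta
```

The last check reflects $\phi_1 + \phi_2 + \phi_3 = I_n$ together with $\Delta\Delta' = r^{\delta}$.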

The decomposition (6.1) implies that any function $s'\Delta y$ can be resolved into three components, in the form

$$s'\Delta y = s'Q_1 + s'Q_2 + s'Q_3, \tag{6.23}$$

where $Q_\alpha = \Delta y_\alpha = \Delta\phi_\alpha y$ ($\alpha = 1, 2, 3$), i.e., each of the components is a contribution to the estimate from a different stratum. As stated in Remark 6.1, the only parametric functions for which the BLUEs may exist under the submodels $y_1 = \phi_1y$ and $y_2 = \phi_2y$ are contrasts. As will be shown, certain contrasts may have the BLUEs exclusively under one of these submodels, i.e., either in the intra-block analysis (within the 1st stratum) or in the inter-block analysis (within the 2nd stratum). For other contrasts the BLUEs may be obtained under both of these submodels, i.e., in both of the analyses. It has been indicated in Lemma 4.1 of Caliński and Kageyama (1991) that a necessary and sufficient condition for $E(s'Q_1) = \kappa E(s'Q_2)$, when $s'r = 0$, is

$$\Delta\phi_1\Delta's = \varepsilon r^{\delta}s, \qquad \text{with } 0 < \varepsilon < 1, \tag{6.24}$$

or its equivalent

$$\Delta\phi_2\Delta's = (1 - \varepsilon)r^{\delta}s, \qquad \text{with } 0 < \varepsilon < 1. \tag{6.25}$$

From Caliński (1993b, Section 4) the following results can now be given.

LEMMA 6.2. If the design is proper, then for any $c = r^{\delta}s$ such that $s$ satisfies the equivalent eigenvector conditions (6.24) and (6.25), with $0 < \varepsilon < 1$, the BLUE of the contrast $c'\tau$ is obtainable in both of the analyses, in the intra-block analysis and in the inter-block analysis.

LEMMA 6.3. If the design is proper, then for any $c = r^{\delta}s$ such that $s$ satisfies one of the eigenvector conditions

$$\Delta\phi_\alpha\Delta's = r^{\delta}s, \qquad \alpha = 1, 2, 3, \tag{6.26}$$

the BLUE of the function $c'\tau$ is obtainable under the overall model (5.1).

THEOREM 6.2. In case of a proper block design, for any vector $c = r^{\delta}s$ such that $s$ satisfies the eigenvector condition

$$\Delta\phi_\alpha\Delta's = \varepsilon_\alpha r^{\delta}s, \qquad \text{with } 0 < \varepsilon_\alpha \le 1 \ (\alpha = 1, 2, 3), \tag{6.27}$$

where $\varepsilon_1 = \varepsilon$, $\varepsilon_2 = 1 - \varepsilon$, $\varepsilon_3 = 1$, the BLUE of the function $c'\tau$ is obtainable in the analysis within stratum $\alpha$ [for which (6.27) is satisfied], i.e., under the submodel $y_\alpha = \phi_\alpha y$, where it gets the form

$$(\widehat{c'\tau})_\alpha = \varepsilon_\alpha^{-1}s'Q_\alpha = \varepsilon_\alpha^{-1}c'r^{-\delta}Q_\alpha, \tag{6.28}$$

and its variance is

$$\operatorname{Var}[(\widehat{c'\tau})_\alpha] = \varepsilon_\alpha^{-1}s'r^{\delta}s\,\sigma_\alpha^2 = \varepsilon_\alpha^{-1}c'r^{-\delta}c\,\sigma_\alpha^2. \tag{6.29}$$

If (6.27) is satisfied with $0 < \varepsilon_\alpha < 1$, then two BLUEs of $c'\tau$ are obtainable, one under the submodel $y_1 = \phi_1y$ and another under $y_2 = \phi_2y$. If (6.27) is satisfied with $\varepsilon_\alpha = 1$, then the unique BLUE is obtainable within stratum $\alpha$ only, being simultaneously the BLUE under the overall model (5.1).

REMARK 6.2. Formula (6.29) shows that the variance of the BLUE of $c'\tau$ obtainable within stratum $\alpha$ decreases as the coefficient $\varepsilon_\alpha$ increases, the minimum variance being attained when $\varepsilon_\alpha = 1$, i.e., when (6.28) is the BLUE under the overall model (5.1). Thus, for any proper design, $\varepsilon_\alpha$ can be interpreted as the efficiency factor of the analysed design for the function $c'\tau$ when it is estimated in the analysis within stratum $\alpha$. On the other hand, $1 - \varepsilon_\alpha$ can be regarded as the relative loss of information incurred when estimating $c'\tau$ in the analysis within stratum $\alpha$.

REMARK 6.3. Since $E(\varepsilon_\alpha^{-1}s'Q_\alpha) = s'r^{\delta}\tau$ if and only if (6.27) holds, for a function $c'\tau = s'r^{\delta}\tau$ to obtain the BLUE within stratum $\alpha$ in the form (6.28), the condition (6.27) is not only sufficient but also necessary.
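A small worked example may help to fix the roles of $\varepsilon_\alpha$ in (6.27) and (6.28). The sketch below is hypothetical throughout: it uses the all-pairs design with $v = 4$ and $k = 2$, the contrast vector $s = (1, -1, 0, 0)'$, a made-up vector of treatment parameters, and noise-free "observations" $y = \Delta'\tau$, so that both stratum estimates should reproduce $c'\tau = s'r^{\delta}\tau$ exactly. The helper `stratum_blue` is an illustrative name, not part of any library.

```python
import numpy as np
from itertools import combinations

def stratum_blue(s, phi, Delta, y):
    """Return (eps, estimate) of c'tau = s' r^delta tau within one stratum, as in (6.28)."""
    r = Delta.sum(axis=1)
    C = Delta @ phi @ Delta.T                  # Delta phi_alpha Delta'
    eps = (s @ C @ s) / (s @ (r * s))          # candidate efficiency factor
    if not np.allclose(C @ s, eps * r * s):    # eigenvector condition (6.27)
        raise ValueError("s does not satisfy the eigenvector condition (6.27)")
    Q = Delta @ phi @ y                        # Q_alpha = Delta phi_alpha y
    return eps, (s @ Q) / eps

v, k = 4, 2
blocks = list(combinations(range(v), k))       # hypothetical all-pairs design
b, n = len(blocks), len(blocks) * k
Delta = np.zeros((v, n))
D = np.zeros((b, n))
for j, block in enumerate(blocks):
    for i, t in enumerate(block):
        Delta[t, j * k + i] = 1.0
        D[j, j * k + i] = 1.0

phi1 = np.eye(n) - D.T @ D / k                 # intra-block projector (6.17)
phi2 = D.T @ D / k - np.ones((n, n)) / n       # inter-block projector (6.17)

tau = np.array([10.0, 11.0, 12.0, 13.0])       # made-up treatment parameters
y = Delta.T @ tau                              # noise-free observations
s = np.array([1.0, -1.0, 0.0, 0.0])

eps1, est1 = stratum_blue(s, phi1, Delta, y)
eps2, est2 = stratum_blue(s, phi2, Delta, y)
print(eps1, eps2)      # efficiency factors in the two strata
print(est1, est2)      # both estimate s' r^delta tau
```

For this design the two efficiency factors are $2/3$ and $1/3$, so by (6.29) the inter-block estimate carries a variance multiplier $\varepsilon_2^{-1}$ twice as large as the intra-block one, before the difference between $\sigma_1^2$ and $\sigma_2^2$ is taken into account.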


6.3. Generally balanced block designs

Consider again the $v \times v$ matrices $C_\alpha = \Delta\phi_\alpha\Delta'$, $\alpha = 1, 2, 3$, and their spectral decompositions

$$C_1 = r^{\delta}\sum_{\beta=0}^{m-1}\varepsilon_\beta H_\beta r^{\delta}, \qquad C_2 = r^{\delta}\sum_{\beta=1}^{m}(1 - \varepsilon_\beta)H_\beta r^{\delta}, \qquad C_3 = r^{\delta}H_{m+1}r^{\delta} = \frac{1}{n}rr',$$

where the $v \times v$ matrices $H_\beta$ are as defined in Section 1.5. These representations, together with the general results established in Section 6.1 for proper block designs, give rise to the following concept of balance.

DEFINITION 6.2. A proper block design inducing the OBS property defined by $\{\phi_\alpha\}$ is said to be generally balanced (GB) with respect to a decomposition

$$\mathcal{C}(\Delta') = \oplus_\beta\,\mathcal{C}(\Delta'S_\beta) \tag{6.30}$$

(the symbol $\mathcal{C}(\cdot)$ denoting the column space of a matrix argument and $\oplus_\beta$ denoting the direct sum of the subspaces taken over $\beta$), if there exist scalars $\{\varepsilon_{\alpha\beta}\}$ such that for all $\alpha$ ($= 1, 2, 3$)

$$\Delta\phi_\alpha\Delta' = \sum_\beta \varepsilon_{\alpha\beta}\,r^{\delta}H_\beta r^{\delta} \tag{6.31}$$

(the sum being taken over all $\beta$ that appear in (6.30), $\beta = 0, 1, \ldots, m, m + 1$), where $H_\beta = S_\beta S_\beta'$, $H_{m+1} = s_vs_v'$, and where the matrices $\{S_\beta\}$ are such that

$$S_\beta'r^{\delta}S_\beta = I_{\rho_\beta} \ \text{ for any } \beta \qquad\text{and}\qquad S_\beta'r^{\delta}S_{\beta'} = O \ \text{ for } \beta \ne \beta'.$$

It can easily be shown that Definition 6.2 is equivalent to the definition of GB given by Houtman and Speed (1983, Section 4.1) when applied to a proper block design, and so coincides with the notion of general balance introduced by Nelder (1965b). The following result from Caliński (1993b, p. 32) explains the sense of Definition 6.2.

LEMMA 6.4. A proper block design is GB with respect to the decomposition (6.30) if and only if the matrices $\{S_\beta\}$ of Definition 6.2 satisfy the conditions

$$\Delta\phi_\alpha\Delta'S_\beta = \varepsilon_{\alpha\beta}\,r^{\delta}S_\beta \tag{6.32}$$

for all $\alpha$ and $\beta$. (See also Pearce, 1983, p. 110.)


REMARK 6.4. It follows from Lemma 6.4 that any proper block design is GB with respect to the decomposition

$$\mathcal{C}(\Delta') = \mathcal{C}(\Delta'S_0) \oplus \mathcal{C}(\Delta'S_1) \oplus \cdots \oplus \mathcal{C}(\Delta'S_m) \oplus \mathcal{C}(\Delta's_v), \tag{6.33}$$

where the matrices $S_0, S_1, \ldots, S_m$ represent basic contrasts of the design, those represented by the columns of $S_\beta$ receiving in the intra-block analysis a common efficiency factor $\varepsilon_{1\beta} = \varepsilon_\beta$ and in the inter-block analysis a common efficiency factor $\varepsilon_{2\beta} = 1 - \varepsilon_\beta$, and where $s_v = n^{-1/2}1_v$. Certainly, the equality (6.31) above can equivalently be written as

which is exactly the condition of Houtman and Speed (1983, Section 4.1) in their definition of GB. Also, it should be mentioned that the notion of GB stems back to the early work by Jones (1959), who called an experiment balanced for a contrast if the latter satisfied the condition (6.24), and called it balanced for a set of contrasts if they satisfied this condition with the same eigenvalue. Thus, in his terminology, a block design is balanced for each basic contrast separately, but it is also balanced for any subspace of basic contrast corresponding to a distinct eigenvalue. It is, therefore, natural to call a block design GB for all basic contrasts, provided that the eigenvalues can be interpreted in terms of efficiency factors and relative losses of information on contrasts of interest. This is just what is offered by any proper block design if adequately used. For some illustration of this see Califiski (1993b, Section 5).
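The statement that any proper block design is GB can be illustrated numerically by computing basic contrasts and their efficiency factors from the generalized eigenproblem $C_1s = \varepsilon r^{\delta}s$. The sketch below is only one way of doing this: it symmetrizes the problem as $r^{-\delta/2}C_1r^{-\delta/2}$ and calls `numpy.linalg.eigh`, and the six-block design with unequal replications is a hypothetical example chosen to be proper, connected and non-balanced.

```python
import numpy as np

# Hypothetical proper design: b = 6 blocks of size k = 2, unequal replications.
blocks = [(0, 1), (0, 1), (2, 3), (0, 2), (1, 3), (1, 2)]
v, k = 4, 2
b, n = len(blocks), len(blocks) * k
Delta = np.zeros((v, n))
D = np.zeros((b, n))
for j, block in enumerate(blocks):
    for i, t in enumerate(block):
        Delta[t, j * k + i] = 1.0
        D[j, j * k + i] = 1.0

N = Delta @ D.T
r = Delta.sum(axis=1)
Rd = np.diag(r)                                    # r^delta
C1 = Rd - N @ N.T / k
C2 = N @ N.T / k - np.outer(r, r) / n
C3 = np.outer(r, r) / n

# Solve C1 s = eps r^delta s through the symmetric matrix r^{-1/2} C1 r^{-1/2}.
R_inv_half = np.diag(1.0 / np.sqrt(r))
eps, U = np.linalg.eigh(R_inv_half @ C1 @ R_inv_half)   # eigenvalues ascending
S = R_inv_half @ U                                 # columns s_i, r^delta-orthonormal

print(np.round(eps, 4))                            # 0 for s_v, then the efficiency factors
print(np.allclose(S.T @ Rd @ S, np.eye(v)))        # S' r^delta S = I
print(np.allclose(C1 @ S, (Rd @ S) * eps))         # (6.32) for alpha = 1
print(np.allclose(C1, Rd @ (S * eps) @ S.T @ Rd))  # spectral form of C1
print(np.allclose(C2, Rd @ (S * (1 - eps)) @ S.T @ Rd - C3))
```

The first column of `S` corresponds to $s_v = n^{-1/2}1_v$ (up to sign) with eigenvalue zero; the remaining columns are basic contrasts, each with intra-block efficiency factor $\varepsilon_i$ and inter-block efficiency factor $1 - \varepsilon_i$.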

6.4. Concluding remarks

The unified theory presented in Sections 6.2 and 6.3 reveals the special role played by basic contrasts in defining the general balance (GB) of a block design. Since any proper block design is GB, as stated in Remark 6.4 (cf. Houtman and Speed, 1983, Section 5.4), the notion of GB is interesting only from the point of view of the decomposition (6.33) with respect to which the balance holds. Therefore, any block design offered for use in an experiment should be evaluated with regard to that decomposition. The experimenter should be informed of the subspaces of basic contrasts appearing in (6.30) and the efficiency factors they receive in the intra-block and in the inter-block analysis. This has already been pointed out by Houtman and Speed (1983, Section 4.2), who write that these subspaces have to be discovered for each new design or class of designs. Referring directly to block designs, and to partially balanced incomplete block designs in particular, they write (p. 1082) that although it is generally not difficult to obtain these subspaces (more precisely, orthogonal projections on them), "most writers in statistics have not taken this view point". The classification of block designs suggested in Section 3.2, based on the concept of PEB, which for proper block designs coincides with that of GB, is exactly an attempt to meet their complaints. The knowledge of basic contrasts or their subspaces for which a design is GB, and of the efficiency factors assigned to them, allows the experimenter to use the design in the way that best corresponds to the experimental problem. In particular, it allows the design to be implemented so that the contrasts considered the most important can be estimated with the highest efficiency in the stratum of the smallest variance, which is the intra-block stratum, if the grouping of units into blocks is performed successfully. Further discussions about the pros and cons of GB can be found in Mejza (1992) and Bailey (1994). For the combined analysis under GB, corresponding to that in Section 5.2, see Caliński (1994a).

Acknowledgements

The paper was prepared when the first author visited Hiroshima University under a JSPS Fellowship for research in Japan. The opportunity and facilities offered him there are gratefully acknowledged. The authors also wish to thank the referee and the Editors for their encouraging comments.

References

Agrawal, H. L. and J. Prasad (1983). On construction of balanced incomplete block designs with nested rows and columns. Sankhyā Ser. B 45, 345-350.
Anscombe, F. J. (1948). Contribution to the discussion on D. G. Champernowne's "Sampling theory applied to autoregressive sequences". J. Roy. Statist. Soc. Ser. B 10, 239.
Atiqullah, M. (1961). On a property of balanced designs. Biometrika 48, 215-218.
Bailey, R. A. (1981). A unified approach to design of experiments. J. Roy. Statist. Soc. Ser. A 144, 214-223.
Bailey, R. A. (1994). General balance: Artificial theory or practical relevance? In: T. Caliński and R. Kala, eds., Proc. Internat. Conf. on Linear Statistical Inference LINSTAT93. Kluwer Academic Publishers, Dordrecht, 171-184.
Bailey, R. A. and C. A. Rowley (1987). Valid randomization. Proc. Roy. Soc. London Ser. A 410, 105-124.
Baksalary, J. K., A. Dobek and R. Kala (1980). A necessary condition for balance of a block design. Biom. J. 22, 47-50.
Baksalary, J. K. and P. D. Puri (1988). Criteria for the validity of Fisher's condition for balanced block designs. J. Statist. Plann. Inference 18, 119-123.
Baksalary, J. K. and Z. Tabis (1985). Existence and construction of connected block designs with given vectors of treatment replications and block sizes. J. Statist. Plann. Inference 12, 285-293.
Banerjee, S. and S. Kageyama (1990). Existence of α-resolvable nested incomplete block designs. Utilitas Math. 38, 237-243.
Banerjee, S. and S. Kageyama (1993). Methods of constructing nested partially balanced incomplete block designs. Utilitas Math. 43, 3-6.
Bechhofer, R. E. and A. C. Tamhane (1981). Incomplete block designs for comparing treatments with a control: General theory. Technometrics 23, 45-57.
Beth, T., D. Jungnickel and H. Lenz (1985). Design Theory. Bibliographisches Institut, Mannheim, Germany. (D. Jungnickel, Design theory: An update. Ars Combin. 28 (1989), 129-199.)
Bose, R. C. (1942). A note on the resolvability of balanced incomplete block designs. Sankhyā 6, 105-110.


Bose, R. C. (1950). Least square aspects of analysis of variance. Mimeo Series 9, Institute of Statistics, University of North Carolina, Chapel Hill.
Bose, R. C. and W. H. Clatworthy (1955). Some classes of partially balanced designs. Ann. Math. Statist. 26, 212-232.
Bose, R. C. and W. S. Connor (1952). Combinatorial properties of group divisible incomplete block designs. Ann. Math. Statist. 23, 367-383.
Bose, R. C. and K. R. Nair (1939). Partially balanced incomplete block designs. Sankhyā 4, 337-372.
Bose, R. C. and K. R. Nair (1962). Resolvable incomplete block designs with two replications. Sankhyā Ser. A 24, 9-24.
Bose, R. C. and T. Shimamoto (1952). Classification and analysis of partially balanced incomplete block designs with two associate classes. J. Amer. Statist. Assoc. 47, 151-184.
Bose, R. C. and S. S. Shrikhande (1960). On the composition of balanced incomplete block designs. Canad. J. Math. 12, 177-188.
Caliński, T. (1971). On some desirable patterns in block designs (with discussion). Biometrics 27, 275-292.
Caliński, T. (1977). On the notion of balance in block designs. In: J. R. Barra, F. Brodeau, G. Romier and B. van Cutsem, eds., Recent Developments in Statistics. North-Holland, Amsterdam, 365-374.
Caliński, T. (1993a). Balance, efficiency and orthogonality concepts in block designs. J. Statist. Plann. Inference 36, 283-300.
Caliński, T. (1993b). The basic contrasts of a block experimental design with special reference to the notion of general balance. Listy Biometryczne - Biometr. Lett. 30, 13-38.
Caliński, T. (1994a). The basic contrasts of a block design with special reference to the recovery of inter-block information. Invited presentation at the International Conference on Mathematical Statistics ProbaStat'94, Smolenice, Slovakia, p. 8.
Caliński, T. (1994b). On the randomization theory of experiments in nested block designs. Listy Biometryczne - Biometr. Lett. 31, 45-77.
Caliński, T. and B. Ceranka (1974). Supplemented block designs. Biom. J. 16, 299-305.
Caliński, T., B. Ceranka and S. Mejza (1980). On the notion of efficiency of a block design. In: W. Klonecki et al., eds., Mathematical Statistics and Probability Theory. Springer, New York, 47-62.
Caliński, T. and S. Kageyama (1988). A randomization theory of intrablock and interblock estimation. Tech. Report No. 230, Statistical Research Group, Hiroshima University, Hiroshima, Japan.
Caliński, T. and S. Kageyama (1991). On the randomization theory of intra-block and inter-block analysis. Listy Biometryczne - Biometr. Lett. 28, 97-122.
Caliński, T. and S. Kageyama (1996). The randomization model for experiments in block designs and the recovery of inter-block information. J. Statist. Plann. Inference 52, 359-374.
Carmichael, R. D. (1956). Introduction to the Theory of Groups of Finite Order. Dover, New York.
Ceranka, B. (1983). Planning of experiments in C-designs. Scientific Dissertation 136, Annals of Poznan Agricultural Univ., Poland.
Ceranka, B. (1984). Construction of partially efficiency balanced block designs. Calcutta Statist. Assoc. Bull. 33, 165-172.
Ceranka, B., S. Kageyama and S. Mejza (1986). A new class of C-designs. Sankhyā Ser. B 48, 199-206.
Ceranka, B. and M. Kozlowska (1984). Some methods of constructing C-designs. J. Statist. Plann. Inference 9, 253-258.
Ceranka, B. and S. Mejza (1979). On the efficiency factor for a contrast of treatment parameters. Biom. J. 21, 99-102.
Ceranka, B. and S. Mejza (1980). A new proposal for classification of block designs. Studia Sci. Math. Hungar. 15, 79-82.
Chakrabarti, M. C. (1962). Mathematics of Design and Analysis of Experiments. Asia Publishing House, Bombay.
Clatworthy, W. H. (1973). Tables of Two-Associate-Class Partially Balanced Designs. NBS Applied Math. Series 63, Washington, D.C., USA.
Cochran, W. G. and G. M. Cox (1957). Experimental Designs, 2nd edn. Wiley, New York.
Colbourn, C. J. and M. J. Colbourn (1983). Nested triple systems. Ars Combin. 16, 27-34.
Colbourn, C. J. and R. A. Mathon, eds. (1987). Combinatorial Design Theory. Annals of Discrete Mathematics, Vol. 34. North-Holland, Amsterdam.


Colbourn, C. J. and E C. van Oorschot (1988). Applications of combinatorial designs in computer science. IMA preprint series #400, Univ. of Minnesota, USA. Corsten, L. C. A. (1962). Balanced block designs with two different numbers of replicates. Biometrics 18, 499-519. Cox, D. R. (1958). Planning of Experiments. Wiley, New York. Das, M. N. and D. K. Ghosh (1985). Balancing incomplete block designs. Sankhya Set. B 47, 67-77. David, H. A. (1967). Resolvable cyclic designs. Sankhyd Ser. A 29, 191-198. Dey, A. and V. K. Gupta (1986). Another look at the efficiency and partially efficiency balanced designs. Sankhya Set B 48, 43'7-438. Dey, A., U. S. Das and A. K. Banerjee (1986). Constructions of nested balanced incomplete block designs. Calcutta Statist. Assoc. Bull. 35, 161-167. Eckenstein, O. (1912). Bibliography of Kirkman's school girl problem. Messenger Math. 41-42, 33-36. Federer, W. T. (1955). Experimental Design: Theory and Application. Macmillan, New York. Finney, D. J. (1960). An Introduction to the Theory of Experimental Design. The University of Chicago Press, Chicago, IL. Fisher, R. A. (1925). Statistical Methods for Research Workers. Oliver and Boyd, Edinburgh. Fisher, R. A. (1926). The arrangement of field experiments. J. Ministry Agriculture 33, 503-513. Fisher, R. A. (1940). An examination of the different possible solutions of a problem in incomplete blocks. Ann. Eugen. 10, 52-75. Fisher, R. A. amd E Yates (1963). Statistical Tables for Biological, Agricultural and Medical Research. 6th edn. Hafner, New York. (lst edn., 1938.) Graf-Jaccottet, M. (1977). Comparative classification of block designs. In: J. R. Bah'a, E Brodean, G. Romier and B. van Cutsem, eds., Recent Developments in Statistics. North-Holland, Amsterdam, 471-474. Gupta, S. C. (1984). An algorithm for constructing nests of 2 factors designs with orthogonal factorial structure. J. Statist. Comput. Simulation 20, 59-79. Gupta, S. (1987). A note on the notion of balance in designs. 
Calcutta Statist. Assoc. Bull. 36, 85-89. Gupta, S. and S. Kageyama (1994). Optimal complete diallel crosses. Biometrika 81, 420-424. Hall, M., Jr. Combinatorial Theory, 2nd edn. Wiley, New York, 1986. Harville, D. A. (1977). Maximum likelihood approaches to variance component estimation and to related problems. J. Amer Statist. Assoc. 72, 320-338. Hedayat, A. and W. T. Federer (1974). Pairwise and variance balanced incomplete block designs. Ann. Inst. Statist. Math. 26, 331-338. Hedayat, A. S: and S. Kageyama (1980). The family of t-designs - Part I. J. Statist. Plann. Inference 4, 173-212. Hedayat, A. S., M.~Jacroux and D. Majumdar (198-8). Optimal designs for comparing test treatments with controls (with discussion). Statist. Sci. 3, 462-491. Homel, R. J. and J. Robinson (1975). Nested partially balanced incomplete block designs. Sankhygt Ser. B 37, 201-210. Houtman, A. M. and T. E Speed (1983). Balance in designed experiments with orthogonal block structure. Ann. Statist. 11, 1069-1085. Hughes, D. R. and E C. Piper (1976). On resolutions and Bose's theorem. Geom. Dedicata 5, 129-133. Hughes, D. R. and E C. Piper (1985). Design Theory. Cambridge Univ. Press, Cambridge. James, A. T. and G. N. Wilkinson (1971). Factorization of the residual operator and canonical decomposition of nono~hogonal factors in the analysis of variance. Biometrika 58, 279-294. Jarrett, R. G. (1977). Bounds for the efficiency factor of block designs. Biometrika 64, 67-72. Jimbo, M. and S. Kuriki (1983). Constructions of nested designs. Ars Combin. 16, 275-285. John, J. A. (1987). Cyclic Designs. Chapman and Hall, London. John, J. A. and T. J. Mitchell (1977). Optimal incomplete block designs. J. Roy. Statist. Soc. Set. B 39, 39-43. John, E W. M. (1961). An application of a balanced incomplete block design. Technometrics 3, 51-54. John, E W. M~ (1980). Incomplete Block Designs. Marcel Dekker, New York. Jones, R. M. (1959). On a property of incomplete blocks. J. Roy. Statist. Soc. Ser. 
B 21, 172-179.
Kackar, R. N. and D. A. Harville (1981). Unbiasedness of two-stage estimation and prediction procedures for mixed linear models. Comm. Statist. Theory Methods 10, 1249-1261.


T. Caliński and S. Kageyama

Kackar, R. N. and D. A. Harville (1984). Approximations for standard errors of estimators of fixed and random effects in mixed linear models. J. Amer. Statist. Assoc. 79, 853-862.
Kageyama, S. (1973). On μ-resolvable and affine μ-resolvable balanced incomplete block designs. Ann. Statist. 1, 195-203.
Kageyama, S. (1974). Reduction of associate classes for block designs and related combinatorial arrangements. Hiroshima Math. J. 4, 527-618.
Kageyama, S. (1976a). On μ-resolvable and affine μ-resolvable t-designs. In: S. Ikeda, et al., eds., Essays in Probability and Statistics. Shinko Tsusho Co. Ltd., Tokyo.
Kageyama, S. (1976b). Resolvability of block designs. Ann. Statist. 4, 655-661. Addendum: Bull. Inst. Math. Statist. 7(5) (1978), 312.
Kageyama, S. (1977). Conditions for α-resolvability and affine α-resolvability of incomplete block designs. J. Japan Statist. Soc. 7, 19-25.
Kageyama, S. (1978). Remarks on 'Resolvability of block designs'. Bull. Inst. Math. Statist. 7(5), 312.
Kageyama, S. (1980). On properties of efficiency-balanced designs. Comm. Statist. Theory Methods 9, 597-616.
Kageyama, S. (1984). Some properties on resolvability of variance-balanced designs. Geom. Dedicata 15, 289-292.
Kageyama, S. (1985). Connected designs with the minimum number of experimental units. In: T. Caliński and E. Klonecki, eds., Linear Statistical Inference, Lecture Notes in Statistics, Vol. 35. Springer, New York, 99-117.
Kageyama, S. (1993). The family of block designs with some combinatorial properties. Discrete Math. 116, 17-54.
Kageyama, S. and A. S. Hedayat (1983). The family of t-designs - Part II. J. Statist. Plann. Inference 7, 257-287.
Kageyama, S. and R. Mukerjee (1986). General balanced designs through reinforcement. Sankhyā Ser. B 48, 380-387; Addendum. Ser. B 49, 103.
Kageyama, S. and P. D. Puri (1985a). A new special class of PEB designs. Comm. Statist. Theory Methods 14, 1731-1744.
Kageyama, S. and P. D. Puri (1985b).
Properties of partially efficiency-balanced designs. Bull. Inform. Cyber. (formerly Bull. Math. Statist.) 21, 19-28.
Kageyama, S., G. M. Saha and R. Mukerjee (1988). D⁻¹-partially efficiency-balanced designs with at most two efficiency classes. Comm. Statist. Theory Methods 17, 1669-1683.
Kageyama, S. and D. V. S. Sastry (1993). On affine (μ1, ..., μt)-resolvable (r, λ)-designs. Ars Combin. 36, 221-223.
Kala, R. (1981). Projectors and linear estimation in general linear models. Comm. Statist. Theory Methods 10, 849-873.
Kala, R. (1991). Elements of the randomization theory. III. Randomization in block experiments. Listy Biometryczne - Biometr. Lett. 28, 3-23 (in Polish).
Kassanis, B. and A. Kleczkowski (1965). Inactivation of a strain of tobacco necrosis virus and of the RNA isolated from it, by ultraviolet radiation of different wavelengths. Photochem. Photobiol. 4, 209-214.
Kempthorne, O. (1952). The Design and Analysis of Experiments. Wiley, New York.
Kempthorne, O. (1955). The randomization theory of experimental inference. J. Amer. Statist. Assoc. 50, 946-967.
Kempthorne, O. (1956). The efficiency factor of an incomplete block design. Ann. Math. Statist. 27, 846-849.
Kempthorne, O. (1977). Why randomize? J. Statist. Plann. Inference 1, 1-25.
Kiefer, J. (1958). On the nonrandomized optimality and randomized nonoptimality of symmetrical designs. Ann. Math. Statist. 29, 675-699.
Kimberley, M. E. (1971). On the construction of certain Hadamard designs. Math. Z. 119, 41-59.
Kirkman, T. P. (1847). On a problem in combinatorics. Cambridge Dublin Math. J. 2, 191-204.
Kirkman, T. P. (1850a). Query. Ladies and Gentleman's Diary, 48.
Kirkman, T. P. (1850b). Note on an unanswered prize question. Cambridge Dublin Math. J. 5, 191-204.
Kishen, K. (1940-1941). Symmetrical unequal block arrangements. Sankhyā 5, 329-344.

Block designs: Their combinatorial and statistical properties


Klaczynski, K., A. Molinska and K. Molinski (1994). Unbiasedness of the estimator of the function of expected value in the mixed linear model. Biom. J. 36, 185-191.
Kleczkowski, A. (1960). Interpreting relationships between the concentration of plant viruses and numbers of local lesions. J. Gen. Microbiol. 4, 53-69.
Kshirsagar, A. M. (1958). A note on incomplete block designs. Ann. Math. Statist. 29, 907-910.
Lindner, C. C. and A. Rosa, eds. (1980). Topics on Steiner Systems. Annals of Discrete Mathematics, Vol. 7. North-Holland, Amsterdam.
Longyear, J. Q. (1981). A survey of nested designs. J. Statist. Plann. Inference 5, 181-187.
Margolin, B. H. (1982). Blocks, randomized complete. In: S. Kotz and N. L. Johnson, eds., Encyclopedia of Statistical Sciences, Vol. 1. Wiley, New York, 288-292.
Martin, F. B. and G. Zyskind (1966). On combinability of information from uncorrelated linear models by simple weighting. Ann. Math. Statist. 37, 1338-1347.
Mavron, V. C. (1972). On the structure of affine designs. Math. Z. 125, 298-316.
Mejza, S. (1992). On some aspects of general balance in designed experiments. Statistica 52, 263-278.
Mejza, S. and S. Kageyama (1994). Some statistical properties of nested block designs. Presentation at the International Conference on Mathematical Statistics ProbaStat'94, Smolenice, Slovakia.
Mukerjee, R. and S. Kageyama (1985). On resolvable and affine resolvable variance-balanced designs. Biometrika 72, 165-172.
Nair, K. R. and C. R. Rao (1942a). A note on partially balanced incomplete block designs. Sci. Culture 7, 568-569.
Nair, K. R. and C. R. Rao (1942b). Incomplete block designs for experiments involving several groups of varieties. Sci. Culture 7, 615-616.
Nair, K. R. and C. R. Rao (1948). Confounding in asymmetrical factorial experiments. J. Roy. Statist. Soc. Ser. B 10, 109-131.
Nelder, J. A. (1954). The interpretation of negative components of variance. Biometrika 41, 544-548.
Nelder, J. A. (1965a).
The analysis of randomized experiments with orthogonal block structure. I. Block structure and the null analysis of variance. Proc. Roy. Soc. London Ser. A 283, 147-162.
Nelder, J. A. (1965b). The analysis of randomized experiments with orthogonal block structure. II. Treatment structure and the general analysis of variance. Proc. Roy. Soc. London Ser. A 283, 163-178.
Nelder, J. A. (1968). The combination of information in generally balanced designs. J. Roy. Statist. Soc. Ser. B 30, 303-311.
Neyman, J. (1923). Próba uzasadnienia zastosowań rachunku prawdopodobieństwa do doświadczeń polowych (Sur les applications de la théorie des probabilités aux expérience agricoles: Essay de principes). Roczniki Nauk Rolniczych 10, 1-51.
Neyman, J. (1935), with co-operation of K. Iwaszkiewicz and S. Kolodziejczyk. Statistical problems in agricultural experimentation (with discussion). J. Roy. Statist. Soc. Suppl. 2, 107-180.
Nigam, A. K. and P. D. Puri (1982). On partially efficiency balanced designs - II. Comm. Statist. Theory Methods 11, 2817-2830.
Nigam, A. K., P. D. Puri and V. K. Gupta (1988). Characterizations and Analysis of Block Designs. Wiley Eastern, New Delhi.
Ogawa, J. (1963). On the null-distribution of the F-statistic in a randomized balanced incomplete block design under the Neyman model. Ann. Math. Statist. 34, 1558-1568.
Ogawa, J. (1974). Statistical Theory of the Analysis of Experimental Designs. Marcel Dekker, New York.
Pal, S. (1980). A note on partially efficiency balanced designs. Calcutta Statist. Assoc. Bull. 29, 185-190.
Patterson, H. D. and V. Silvey (1980). Statutory and recommended list trials of crop varieties in the United Kingdom. J. Roy. Statist. Soc. Ser. A 143, 219-252.
Patterson, H. D. and R. Thompson (1971). Recovery of inter-block information when block sizes are unequal. Biometrika 58, 545-554.
Patterson, H. D. and R. Thompson (1975). Maximum likelihood estimation of components of variance. In: L. C. A. Corsten and T.
Postelnicu, eds., Proc. 8th Internat. Biometric Conf. Editura Academiei, Bucuresti, 197-207.
Patterson, H. D. and E. R. Williams (1976a). A new class of resolvable incomplete block designs. Biometrika 63, 83-92.


Patterson, H. D. and E. R. Williams (1976b). Some theoretical results on general block designs. In: Proc. 5th British Combinatorial Conf. Congressus Numerantium XV, Utilitas Math., Winnipeg, 489-496.
Patterson, H. D., E. R. Williams and E. A. Hunter (1978). Block designs for variety trials. J. Agric. Sci. 90, 395-400.
Pearce, S. C. (1960). Supplemented balance. Biometrika 47, 263-271.
Pearce, S. C. (1963). The use and classification of non-orthogonal designs. J. Roy. Statist. Soc. Ser. A 126, 353-377.
Pearce, S. C. (1964). Experimenting with blocks of natural sizes. Biometrics 20, 699-706.
Pearce, S. C. (1970). The efficiency of block designs in general. Biometrika 57, 339-346.
Pearce, S. C. (1976). Concurrences and quasi-replication: An alternative approach to precision in designed experiments. Biom. Z. 18, 105-116.
Pearce, S. C. (1983). The Agricultural Field Experiment. Wiley, Chichester.
Pearce, S. C., T. Caliński and T. F. de C. Marshall (1974). The basic contrasts of an experimental design with special reference to the analysis of data. Biometrika 61, 449-460.
Preece, D. A. (1967). Nested balanced incomplete block designs. Biometrika 54, 479-486.
Preece, D. A. (1982). Balance and designs: Another terminological tangle. Utilitas Math. 21C, 85-186.
Puri, P. D. and S. Kageyama (1985). Constructions of partially efficiency-balanced designs and their analysis. Comm. Statist. Theory Methods 14, 1315-1342.
Puri, P. D. and A. K. Nigam (1975a). On patterns of efficiency balanced designs. J. Roy. Statist. Soc. Ser. B 37, 457-458.
Puri, P. D. and A. K. Nigam (1975b). A note on efficiency balanced designs. Sankhyā Ser. B 37, 457-460.
Puri, P. D. and A. K. Nigam (1977a). Partially efficiency balanced designs. Comm. Statist. Theory Methods 6, 753-771.
Puri, P. D. and A. K. Nigam (1977b). Balanced block designs. Comm. Statist. Theory Methods 6, 1171-1179.
Puri, P. D. and A. K. Nigam (1983). Merging of treatments in block designs. Sankhyā Ser. B 45, 50-59.
Puri, P. D., B. D.
Mehta and S. Kageyama (1987). Patterned constructions of partially efficiency-balanced designs. J. Statist. Plann. Inference 15, 365-378.
Puri, P. D., A. K. Nigam and P. Narain (1977). Supplemented designs. Sankhyā Ser. B 39, 189-195.
Raghavarao, D. (1962). Symmetrical unequal block arrangements with two unequal block sizes. Ann. Math. Statist. 33, 620-633.
Raghavarao, D. (1962). On balanced unequal block designs. Biometrika 49, 561-562.
Raghavarao, D. (1971). Constructions and Combinatorial Problems in Design of Experiments. Wiley, New York.
Rao, C. R. (1959). Expected values of mean squares in the analysis of incomplete block experiments and some comments based on them. Sankhyā 21, 327-336.
Rao, C. R. (1970). Estimation of heteroscedastic variances in linear models. J. Amer. Statist. Assoc. 65, 161-172.
Rao, C. R. (1971). Estimation of variance and covariance components - MINQUE theory. J. Multivariate Anal. 1, 257-275.
Rao, C. R. (1972). Estimation of variance and covariance components - in linear models. J. Amer. Statist. Assoc. 67, 112-115.
Rao, C. R. (1974). Projectors, generalized inverses and the BLUEs. J. Roy. Statist. Soc. Ser. B 36, 442-448.
Rao, C. R. (1979). MINQUE theory and its relation to ML and MML estimation of variance components. Sankhyā Ser. B 41, 138-153.
Rao, C. R. and J. Kleffe (1988). Estimation of Variance Components and Applications. North-Holland, Amsterdam.
Rao, C. R. and S. K. Mitra (1971). Generalized Inverse of Matrices and Its Applications. Wiley, New York.
Rao, M. B. (1966). A note on equi-replicated balanced designs with b = v. Calcutta Statist. Assoc. Bull. 15, 43-44.
Rao, V. R. (1958). A note on balanced designs. Ann. Math. Statist. 29, 290-294.
Ray-Chaudhuri, D. K. and R. M. Wilson (1971). Solution of Kirkman's school girl problem. Combinatorics - Proc. Symp. in Pure Mathematics 19, 187-204 (Amer. Math. Soc.).
Saha, G. M. (1976). On Caliński's patterns in block designs. Sankhyā Ser. B 38, 383-392.
Scheffé, H. (1959).
The Analysis of Variance. Wiley, New York.


Seber, G. A. F. (1980). The Linear Hypothesis: A General Theory. Griffin, London.
Shah, K. R. (1964). Use of inter-block information to obtain uniformly better estimates. Ann. Math. Statist. 35, 1064-1078.
Shah, K. R. (1992). Recovery of interblock information: an update. J. Statist. Plann. Inference 30, 163-172.
Shah, K. R. and B. K. Sinha (1989). Theory of Optimal Designs. Springer-Verlag, Berlin.
Shrikhande, S. S. (1953). The non-existence of certain affine resolvable balanced incomplete block designs. Canad. J. Math. 5, 413-420.
Shrikhande, S. S. (1976). Affine resolvable balanced incomplete block designs: A survey. Aequationes Math. 14, 251-269.
Shrikhande, S. S. and D. Raghavarao (1964). Affine α-resolvable incomplete block designs. In: C. R. Rao, ed., Contributions to Statistics. Pergamon Press, Statistical Publishing Society, Calcutta, 471-480.
Speed, T. P., E. R. Williams and H. D. Patterson (1985). A note on the analysis of resolvable block designs. J. Roy. Statist. Soc. Ser. B 47, 357-361.
Sprott, D. A. (1955). Balanced incomplete block designs and tactical configurations. Ann. Math. Statist. 26, 752-758.
Stanton, R. G. and R. C. Mullin (1966). Inductive methods for balanced incomplete block designs. Ann. Math. Statist. 37, 1348-1354.
Steiner, J. (1853). Combinatorische Aufgabe. J. Reine Angew. Math. 45, 181-182.
Street, A. P. and D. J. Street (1987). Combinatorics of Experimental Design. Clarendon Press, Oxford.
Tocher, K. D. (1952). The design and analysis of block experiments (with discussion). J. Roy. Statist. Soc. Ser. B 14, 45-100.
Vartak, M. N. (1963). Disconnected balanced designs. J. Indian Statist. Assoc. 1, 104-107.
Wallis, W. D. (1988). Combinatorial Designs. Marcel Dekker, New York.
White, R. F. (1975). Randomization in the analysis of variance. Biometrics 31, 555-571.
Williams, E. R. (1975). Efficiency-balanced designs. Biometrika 62, 686-688.
Williams, E. R. (1976). Resolvable paired-comparison designs. J. Roy. Statist. Soc. Ser.
B 38, 171-174.
Williams, E. R., H. D. Patterson and J. A. John (1976). Resolvable designs with two replications. J. Roy. Statist. Soc. Ser. B 38, 296-301.
Williams, E. R., H. D. Patterson and J. A. John (1977). Efficient two-replicate resolvable designs. Biometrics 33, 713-717.
Woolhouse, W. S. B. (1844). Prize question 1733. Ladies and Gentleman's Diary.
Yates, F. (1936a). Incomplete randomized blocks. Ann. Eugen. 7, 121-140.
Yates, F. (1936b). A new method of arranging variety trials involving a large number of varieties. J. Agric. Sci. 26, 424-455.
Yates, F. (1939). The recovery of inter-block information in variety trials arranged in three-dimensional lattices. Ann. Eugen. 9, 136-156.
Yates, F. (1940). The recovery of inter-block information in balanced incomplete block designs. Ann. Eugen. 10, 317-325.
Yates, F. (1965). A fresh look at the basic principles of the design and analysis of experiments. In: L. M. LeCam and J. Neyman, eds., Proc. 5th Berkeley Symp. on Mathematical Statistics and Probability, Vol. 4. University of California Press, Berkeley, CA, 777-790.
Yates, F. (1975). The early history of experimental design. In: J. N. Srivastava, ed., A Survey of Statistical Design and Linear Models. North-Holland, Amsterdam, 581-592.
Zyskind, G. (1967). On canonical forms, non-negative covariance matrices and best and simple least squares linear estimators in linear models. Ann. Math. Statist. 38, 1092-1109.

S. Ghosh and C. R. Rao, eds., Handbook of Statistics, Vol. 13 © 1996 Elsevier Science B.V. All rights reserved.


Developments in Incomplete Block Designs for Parallel Line Bioassays

Sudhir Gupta and Rahul Mukerjee

1. Introduction

Biological assays or bioassays are experiments for estimating the strength of a substance, called the stimulus, which is usually a drug and can, in particular, be a vitamin or a hormone. Normally, two preparations of the stimulus, both with quantitative doses and having a similar effect, are compared utilizing responses produced by them on living subjects like animals, plants or isolated organs or tissues. One of the two preparations, called the standard preparation, is of known strength while the other, called the test preparation, has an unknown strength. The principal objective of the assay is to measure the potency of the test relative to the standard preparation, the relative potency being defined as the ratio of equivalent, i.e., equally effective, doses of the two preparations. We refer to Finney (1978) for numerous practical examples of bioassays with real data; these include an assay of vitamin D3 in cod-liver oil via its antirachitic activity in chickens (Chapter 4), an assay of testosterone propionate by means of growth of comb in capons (Chapter 5), and so on.

A bioassay can be direct or indirect. In a direct assay, doses of the standard and test preparations just sufficient to produce a specified response are directly measured. This is practicable only when both the preparations are capable of administration in such a way that the minimal amounts needed to produce the specified response can be exactly measured. In an indirect assay, on the other hand, predetermined doses are administered to the subjects and their responses, either quantal or quantitative, are recorded. The responses are quantal when the experimenter records whether or not each subject manifests a certain easily recognizable reaction, such as death, and they are quantitative when the magnitude of some property, like survival time, weight, etc., is measured for each subject.
Throughout the present discussion, only indirect assays, where the response is measured quantitatively, will be considered. As indicated above, interest lies in estimating the relative potency defined as the ratio of equivalent doses of the two preparations. The relative potency, however, is meaningful when the ratio remains constant over all possible pairs of equivalent doses as happens in assays of the analytical dilution type where the two preparations have the same effective constituent or the same effective constituents in fixed proportions of one another. Thus it is important to test


the constancy of the ratio of equivalent doses before proceeding with the estimation of relative potency.

More specifically, in the present article we are concerned with parallel line assays. These are indirect assays with quantitative responses where the experimenter wishes to estimate the relative potency under the assumption that the relationship between the expected response and the logarithm of doses of the two preparations is representable by parallel straight lines. This assumption entails the desired constancy of the ratio of equally effective doses (vide Section 2) and its validity is checked by appropriate tests of hypotheses.

In a parallel line assay, the predetermined doses of the standard and test preparations represent the treatments. However, unlike in usual varietal experiments where interest lies in all elementary treatment contrasts or in a complete set of orthonormal treatment contrasts, here, in conformity with the twin objectives of model testing and estimation of relative potency, certain specific treatment contrasts assume particular importance (cf. M. J. R. Healy's discussion on Tocher (1952)). Thus efficient designing of parallel line assays poses different types of problems which have received considerable attention in the literature. We refer to the authoritative work of Finney (1978) (see also Finney, 1979) for an excellent account of the developments up to that stage. Further informative reviews are available in Hubert (1984), Das and Giri (1986) and Nigam et al. (1988).

A parallel line assay is called symmetric if the standard and test preparations involve the same number of doses; otherwise, it is called asymmetric. In Section 2, we introduce parallel line assays in fuller detail, with some emphasis on the symmetric case, highlighting the treatment contrasts of special interest. Section 3 provides a brief general introduction to block designs in the context of parallel line assays.
The efficient design of symmetric parallel line assays is reviewed in Sections 4-6, while Section 7 deals with this issue in the asymmetric case. The role of non-equireplicate designs in this connection is examined in Section 8. Finally, several open problems deserving further attention are discussed in Section 9.
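The remark above that parallel lines entail a constant ratio of equally effective doses can be illustrated numerically. The following is a minimal sketch, assuming the usual parallel-line form in which the expected response equals an intercept plus a common slope times the natural logarithm of the dose; the function names and parameter values are hypothetical and are not taken from the text.

```python
import math

def log_relative_potency(intercept_std, intercept_test, slope):
    """Horizontal distance between two parallel lines on the log-dose axis.

    Assumed model (illustrative only):
        standard: E(y) = intercept_std  + slope * log(dose)
        test:     E(y) = intercept_test + slope * log(dose)
    """
    if slope == 0:
        raise ValueError("slope must be non-zero")
    return (intercept_test - intercept_std) / slope

def equivalent_standard_dose(test_dose, intercept_std, intercept_test, slope):
    """Dose of the standard with the same expected response as test_dose."""
    log_rho = log_relative_potency(intercept_std, intercept_test, slope)
    return math.exp(math.log(test_dose) + log_rho)

# Hypothetical parameter values, chosen only for illustration
a_s, a_t, b = 1.0, 1.5, 2.0
ratios = [equivalent_standard_dose(t, a_s, a_t, b) / t for t in (0.5, 1.0, 4.0)]
```

Under these assumptions every entry of `ratios` equals exp((a_t - a_s)/b), whatever test dose is chosen, which is the constancy that the parallelism assumption is meant to guarantee.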

2. Parallel line assays

2.1. A general introduction

Consider an indirect assay with quantitative responses. Let $s$ and $t$ denote typical doses of the standard and test preparations and, with $x = \log s$, $z = \log t$, let their effects be represented respectively by $\eta_1(x)$ and $\eta_2(z)$. Throughout, we shall use natural logarithms. Let the quantitative doses of the standard and test preparations included in the assay be $s_1, \ldots, s_{m_1}$ and $t_1, \ldots, t_{m_2}$ respectively, where $m_1, m_2 \ge 2$. These doses are equispaced on the logarithmic scale, the common ratio being the same for both the preparations, i.e., $s_i = c_1 h^{i-1}$

$(1 \le$ […] $3)$ be odd. Then an L-design with parameters $v\ (= 2m)$, $b$, $r$ and $k = 4$ exists if and only if $vr = 4b$.

PROOF. The necessity is obvious. To prove the sufficiency, let $m = 2\mu + 1$. Then $v = 4\mu + 2$, $k = 4$, $r = 2q$, $b = (2\mu + 1)q$, for some positive integer $q$. Let

$N_1 =$ […] be a square matrix of order $2\mu + 1$ and $N_2$ be such that the $i$th column of $N_1$ equals the $(i-1)$th column of $N_2$ $(2 \le i \le 2\mu + 1)$ and the first column of $N_1$ equals the last column of $N_2$. An L-design with the desired parameters can now be obtained by taking $q$ copies of the design with incidence matrix $[N_1' \; N_2']'$. The design can be seen to be connected. □

REMARK 4.2. From (2.16), (2.17), (2.20b) and (3.5), it is not hard to see that the constructions in Theorems 4.1 and 4.2 ensure full efficiency also with respect to $L_j$ and $L_j^1$ for every odd $j$.

In contrast with the case of even $m$, a complete solution to the existence problem of L-designs is not yet available for every odd $m$. Gupta and Mukerjee (1990) obtained a necessary and sufficient condition for the existence of L-designs for odd $m \le 15$, a range which seems to be enough for most practical purposes. Their result is shown below.

THEOREM 4.3. Let $m$ be odd, 3 […] 2 and $k$ is even.

While the necessity part of Theorem 4.3 is evident from (4.4) and Lemma 4.1, Gupta and Mukerjee (1990) proved the sufficiency by actual construction of designs. As illustrated in Example 4.2 below, this was done by first employing (4.4) to enumerate the possible columns of $N_1$ or $N_2$ and then juxtaposing these columns so as to satisfy the requirement regarding replication number. They tabulated L-designs, over the range $3 \le m \le 15$, for all parameter values satisfying the conditions of this theorem. All these designs, except the single replicate ones, are connected.

EXAMPLE 4.2. Let $m = 5$, $v = 10$, $k = 6$ and $vr = bk$. Then $b/r = 5/3$, i.e., $b = 5q$, $r = 3q$ for some positive integer $q$. Consider the case $q = 1$. Let $(y_1, \ldots, y_5)'$ denote a typical column of $N_1$. By (4.4), the non-negative integers $y_1, \ldots, y_5$ must satisfy

$$\sum_{j=1}^{5} y_j = 3, \qquad -2y_1 - y_2 + y_4 + 2y_5 = 0. \tag{4.6}$$

The only solutions of (4.6) for $(y_1, \ldots, y_5)'$ are

$$\alpha_1 = (0, 0, 3, 0, 0)', \quad \alpha_2 = (0, 1, 1, 1, 0)', \quad \alpha_3 = (1, 0, 1, 0, 1)', \quad \alpha_4 = (0, 2, 0, 0, 1)', \quad \alpha_5 = (1, 0, 0, 2, 0)'.$$

Now let $\alpha_i$ occur $u_i$ times as a column of $N_1$. Then, since by (4.1b) each row sum of $N_1$ must equal $r\ (= 3)$, we have

$$u_3 + u_5 = u_2 + 2u_4 = 3u_1 + u_2 + u_3 = u_2 + 2u_5 = u_3 + u_4 = 3,$$

a solution of which is $u_1 = 0$, $u_2 = 1$, $u_3 = 2$, $u_4 = u_5 = 1$. Thus

$$N_1 = \begin{bmatrix} 0 & 1 & 1 & 0 & 1 \\ 1 & 0 & 0 & 2 & 0 \\ 1 & 1 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 & 2 \\ 0 & 1 & 1 & 1 & 0 \end{bmatrix},$$

and the incidence matrix of a connected L-design with parameters $v = 10$, $k = 6$, $r = 3$ and $b = 5$ is given by $N = [N_1' \; N_2']'$. An L-design in $3q$ replications can be obtained by taking $q$ copies of this design.

In so far as the construction of L-designs is concerned, the results discussed above should be adequate. However, before concluding this section, we briefly outline some earlier work in this area. Kulshreshtha (1969) proposed some L-designs for even $m$ in blocks of size at least eight. Kyi Win and Dey (1980) suggested a construction procedure on the basis of (4.4), but their approach seems to involve some amount of guesswork. Das (1985) discussed the use of linked block designs and affine resolvable designs in this context. For even $m\ (= 2\mu)$, Nigam and Boopathy (1985) described construction procedures for three series of L-designs: (a) $v = 4\mu$, $b = \mu^2$, $r = \mu$, $k = 4$, (b) $v = 4\mu$, $b = \mu(\mu - 1)$, $r = \mu - 1$, $k = 4$, (c) $v = 4\mu$, $b = \mu(\mu - 1)/2$, $r = \mu - 1$, $k = 8$. They noted that all these designs ensure full efficiency also with respect to $L_j$ and $L_j^1$ for odd $j$ (cf. Remark 4.2) and, recognizing these designs as partially efficiency balanced designs (Puri and Nigam, 1977), indicated their efficiency factors with respect to $L_j$ and $L_j^1$ for even $j$. For odd $m$, these authors described two series of L-designs which have been incorporated in Theorem 4.2 above. Gupta et al. (1987) investigated incomplete block designs which retain full information on $L_p$ and also on $L_j$ and $L_j^1$ for odd $j$ and addressed issues related to the characterization and construction of such designs; see also Nigam et al. (1988) in this context. Gupta (1988) discussed the use of singular and semi-regular group divisible designs in the construction of L-designs and indicated how the three series, due to Nigam and Boopathy (1985), can be covered by this approach.
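The two-stage search of Example 4.2 (enumerate the admissible columns, then choose how often each is used) is easy to mechanize. The sketch below is a brute-force illustration under stated assumptions: the column conditions are taken to be those displayed in (4.6), generalized to odd $m$ with weights $j - (m+1)/2$, and $N_2$ is simply set equal to $N_1$ when stacking, which is an assumption made here for illustration. All function names are hypothetical.

```python
from itertools import product

def candidate_columns(m, half_k):
    """Non-negative integer columns y with sum(y) = half_k and
    sum_j (j - (m + 1)/2) * y_j = 0, for doses j = 1..m (m odd)."""
    centre = (m + 1) // 2
    return [
        y for y in product(range(half_k + 1), repeat=m)
        if sum(y) == half_k
        and sum((j - centre) * yj for j, yj in enumerate(y, start=1)) == 0
    ]

def multiplicities(columns, r, b):
    """All (u_1, ..., u_q) with sum u_i = b giving every row sum equal to r."""
    m = len(columns[0])
    solutions = []
    for u in product(range(b + 1), repeat=len(columns)):
        if sum(u) != b:
            continue
        row_sums = [sum(ui * col[row] for ui, col in zip(u, columns))
                    for row in range(m)]
        if all(s == r for s in row_sums):
            solutions.append(u)
    return solutions

def build_n1(columns, u):
    """Juxtapose the columns, repeating column i exactly u_i times."""
    used = [col for ui, col in zip(u, columns) for _ in range(ui)]
    return [[col[row] for col in used] for row in range(len(columns[0]))]

columns = candidate_columns(m=5, half_k=3)
solutions = multiplicities(columns, r=3, b=5)
n1 = build_n1(columns, solutions[0])
n = n1 + n1  # assumption for illustration: N2 = N1, stacked below N1
```

For these inputs the search recovers the five columns listed in the example, and the row-sum equations admit a single multiplicity vector, the one quoted in the text. The column order produced by the enumeration may differ from the order in the displayed $N_1$, which only permutes the blocks.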

5. Symmetric parallel line assays: Further results

Continuing with the set-up of the last section, consider a symmetric parallel line assay with m doses of either preparation. Suppose it is desired to conduct the assay in a design involving b blocks of size k (< v = 2m) each such that every treatment is replicated r times, where bk/v ( = r) is an integer. Observe that given v, b and k,


even if $k$ is even (cf. Lemma 4.1), an L-design may not exist. By Theorem 4.1, this happens, for example, when $m$ is even and $k \equiv 2 \pmod 4$. In such situations, the experimenter may wish to retain full information on any two of the important contrasts $L_p$, $L_1$ and $L_1^1$. The relevant constructional aspects will be reviewed in this section.

THEOREM 5.1. Let $vr = bk$. Then an equireplicate design, with parameters $v\ (= 2m)$, $b$, $r$, $k$, retaining full information on $L_p$ and any one of $L_1$ and $L_1^1$ exists if and only if $k$ is even.

PROOF. The necessity is evident noting that if such a design exists then the corresponding matrices $N_1$, $N_2$, as defined in the last section, must satisfy the first condition in (4.4). The sufficiency is proved by actual construction. Since $mr = b(k/2)$, if $k$ is even then one can always construct a design involving $m$ treatments and $b$ blocks such that each block has size $k/2$ and each treatment is replicated $r$ times. Let $N_1$ be the $m \times b$ incidence matrix of this design.

(a) Let $N_2 = I'N_1$, i.e., $N_2$ is obtained by arranging the rows of $N_1$ in the reverse order. By (4.3), then

so that, by (2.20) and (3.5), the equireplicate design, given by the incidence matrix $[N_1' \; N_2']'$ and having parameters $v$, $b$, $r$, $k$, retains full information on $L_p$ and $L_1$. This construction is due to Das and Kulkarni (1966).

(b) Similarly, the equireplicate design with incidence matrix $[N_1' \; N_1']'$ and parameters $v$, $b$, $r$, $k$ can be seen to retain full information on $L_p$ and $L_1^1$. □

REMARK 5.1. The designs constructed in (a) or (b) above will be connected if $N_1$ represents a connected design. From (2.16), (2.17), (2.20b) and (3.5), it can be seen that the construction in (a) ensures full efficiency also with respect to $L_j$ for every odd $j$ and $L_j^1$ for every even $j$ (vide Das and Kulkarni, 1966). Similarly, one can check that the construction in (b) ensures full efficiency also with respect to $L_j^1$ for every $j$.

REMARK 5.2. With reference to the construction in (a), Das and Kulkarni (1966) studied the consequences of choosing $N_1$ as the incidence matrix of a balanced incomplete block (BIB) design where every two distinct treatments occur together in $\lambda$ blocks. As shown by them, then the efficiency factor for $L_j^1$, for every odd $j$, and $L_j$, for every even $j$, equals $\lambda v/(rk)$. Similarly, in this situation, the construction in (b) ensures an efficiency factor $\lambda v/(rk)$ with respect to $L_j$ for every $j$. Das and Kulkarni (1966) also explored the effect of choosing $N_1$ as the incidence matrix of a circular design in (a), while Das (1985) examined the role of the so-called C-designs (Caliński, 1971; Saha, 1976) in this context.
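Constructions (a) and (b) can be checked numerically on a small case. The sketch below rests on assumptions that are not spelled out in this part of the text: the coefficient vectors of $L_p$, $L_1$ and $L_1^1$ are taken, up to scaling, as $(1', -1')'$, $(e_1', e_1')'$ and $(e_1', -e_1')'$ with $e_1$ proportional to $j - (m+1)/2$, and a contrast is treated as estimated with full information in an equireplicate design when its coefficients sum to zero within every block. The choice of $N_1$ (all ones except a zero diagonal, a BIB design with $\lambda = 2$ for $m = 4$) and all names are illustrative.

```python
from fractions import Fraction

def reverse_rows(mat):
    """Rows of mat in reverse order, as in construction (a)."""
    return [row[:] for row in mat[::-1]]

def stack(n1, n2):
    """Place N1 on top of N2 to form the v x b incidence matrix."""
    return [row[:] for row in n1] + [row[:] for row in n2]

def block_totals(coeffs, incidence):
    """Contrast coefficients summed within each block (i.e. coeffs' N)."""
    b = len(incidence[0])
    return [sum(c * row[j] for c, row in zip(coeffs, incidence)) for j in range(b)]

def has_full_information(coeffs, incidence):
    return all(total == 0 for total in block_totals(coeffs, incidence))

m = 4
n1 = [[0 if i == j else 1 for j in range(m)] for i in range(m)]  # illustrative BIB
e1 = [2 * j - (m + 1) for j in range(1, m + 1)]  # (-3, -1, 1, 3)

l_p = [1] * m + [-1] * m          # preparation contrast (assumed form)
l_1 = e1 + e1                     # combined regression (assumed form)
l_11 = e1 + [-x for x in e1]      # parallelism (assumed form)

design_a = stack(n1, reverse_rows(n1))  # construction (a)
design_b = stack(n1, n1)                # construction (b)

# Efficiency factor lambda * v / (r * k) of Remark 5.2 for lambda = 2,
# v = 8, r = 3, k = 6
efficiency = Fraction(2 * 8, 3 * 6)
```

With these inputs, `design_a` passes the check for `l_p` and `l_1` but not `l_11`, while `design_b` passes it for `l_p` and `l_11` but not `l_1`, in line with the statement of Theorem 5.1.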


EXAMPLE 5.1. Let $m = 4$, $v = 8$, $b = 4$, $r = 3$, $k = 6$. Then by Theorem 4.1, an L-design is not available. Following Theorem 5.1, take

$$N_1 = \begin{bmatrix} 0 & 1 & 1 & 1 \\ 1 & 0 & 1 & 1 \\ 1 & 1 & 0 & 1 \\ 1 & 1 & 1 & 0 \end{bmatrix}, \tag{5.1}$$

which represents the incidence matrix of a BIB design with $\lambda = 2$. The constructions in (a) and (b) above yield designs with incidence matrices

(a) $$N = \begin{bmatrix} 0 & 1 & 1 & 1 & 1 & 1 & 1 & 0 \\ 1 & 0 & 1 & 1 & 1 & 1 & 0 & 1 \\ 1 & 1 & 0 & 1 & 1 & 0 & 1 & 1 \\ 1 & 1 & 1 & 0 & 0 & 1 & 1 & 1 \end{bmatrix}'$$

and

(b) $$N = \begin{bmatrix} 0 & 1 & 1 & 1 & 0 & 1 & 1 & 1 \\ 1 & 0 & 1 & 1 & 1 & 0 & 1 & 1 \\ 1 & 1 & 0 & 1 & 1 & 1 & 0 & 1 \\ 1 & 1 & 1 & 0 & 1 & 1 & 1 & 0 \end{bmatrix}'$$

respectively. The design in (a) retains full information on $L_p$, $L_1$, $L_3$, $L_2^1$ and ensures an efficiency factor 8/9 for $L_1^1$, $L_3^1$, $L_2$, while that in (b) retains full information on $L_p$, $L_1^1$, $L_2^1$, $L_3^1$ and ensures an efficiency factor 8/9 for $L_1$, $L_2$, $L_3$ (vide Remarks 5.1 and 5.2).

Turning to the problem of retaining full information on both $L_1$ and $L_1^1$ (but not necessarily on $L_p$), we have the following result for even $m$.

THEOREM 5.2. Let $vr = bk$ and $m$ be even. Then an equireplicate design, with parameters $v\ (= 2m)$, $b$, $r$, $k$, retaining full information on $L_1$ and $L_1^1$ exists if and only if $k$ is even.

PROOF. Let $m = 2\mu$. To prove the necessity, suppose such a design exists and consider the associated matrices $N_1$ and $N_2$ which are defined as in Section 4. By (3.5), then $e_1'N_1 = e_1'N_2 = 0$ (cf. (4.4)). Hence for a typical column of $N_1$ or $N_2$, say $(y_1, y_2, \ldots, y_{2\mu})'$, by (4.3),

$$\sum_{j=1}^{2\mu} y_j (j - \mu) = \frac{1}{2} \sum_{j=1}^{2\mu} y_j.$$


Consequently, the sum of the elements in each column of $N_1$ and $N_2$ is even so that, by (4.1a), $k$ must be even. Following Gupta (1989), the sufficiency is proved by actual construction. Since $mr = b(k/2)$, if $k$ is even then one can always construct a design involving $m$ treatments and $b$ blocks such that each treatment is replicated $r$ times and each block has size $k/2$. Let $N^*$ be the incidence matrix of this design. Then it can be seen that the equireplicate design, given by the incidence matrix (cf. (4.5))

[…]

and having parameters $v$, $b$, $r$, $k$, retains full information on both $L_1$ and $L_1^1$. □

REMARK 5.3. The design constructed in the sufficiency part of the above proof will be connected if $N^*$ represents a connected design. From (2.16), (2.17), (2.20b) and (3.5), one can check that this construction ensures full efficiency on $L_j$ and $L_j^1$ for every odd $j$. This point was noted by Gupta (1989) who also discussed the choice of $N^*$ so as to ensure orthogonal estimation of $L_p$, $L_j$, $L_j^1$, $1 \le j \le m - 1$, and explored the role of balanced factorial designs (Shah, 1960) in this context.

EXAMPLE 5.2. Let $m = 4$, $v = 8$, $b = 4$, $r = 3$, $k = 6$. Take $N^* = N_1$, where $N_1$ is given by (5.1). Then $\mu = 2$ and, following Theorem 5.2 and Remark 5.3, the design with incidence matrix

$$N = \begin{bmatrix} 0 & 1 & 1 & 0 & 1 & 1 & 1 & 1 \\ 1 & 0 & 0 & 1 & 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 & 1 & 0 & 0 & 1 \\ 1 & 1 & 1 & 1 & 0 & 1 & 1 & 0 \end{bmatrix}'$$

retains full information on each of $L_1$, $L_1^1$, $L_3$ and $L_3^1$. Also, it can be seen that this design ensures an efficiency factor 8/9 for each of $L_p$, $L_2$ and $L_2^1$. Furthermore, one can check that this design, and also the ones indicated in the preceding example, lead to orthogonal estimation of all these seven treatment contrasts; cf. Gupta (1989).

6. Symmetric parallel line assays: Q-designs

We continue with symmetric parallel line assays, now with m ($\ge 3$) doses for each preparation, and review equireplicate designs retaining full information on $L_2$ and $L_2'$ in addition to $L_p$, $L_1$ and $L_1'$. Such a design will be called a Q-design (cf. Mukerjee and Gupta, 1991). In so far as inference on these five contrasts is concerned, a Q-design,

Developments in incomplete block designs for parallel line bioassays

if available, will be optimal, in a very broad sense, over the equireplicate class (vide Remark 3.4).

Consider as before an arrangement of the v (= 2m) treatments in b blocks such that each block has size k (< v) and each treatment is replicated r times. Then vr = bk. The incidence matrix, N, of the design is partitioned as $N = (N_1' \; N_2')'$, where $N_1$ and $N_2$ are as in Section 4. By (2.20), (3.5), (4.1a), (4.3) and Remark 3.2, the design under consideration is a Q-design if and only if

$$1_m' N_i = (\tfrac{1}{2}k)\, 1_b', \qquad e_1' N_i = 0, \qquad e_2' N_i = 0 \qquad (i = 1, 2). \tag{6.1}$$

The above is analogous to (4.4). Here $e_1$ is as given by (4.3) and by (2.15), (2.17),

$$e_2 = f_2 - \tfrac{1}{12}(m^2 - 1)\, 1_m ,$$

$f_2$ being an $m \times 1$ vector with jth element $\{j - (m + 1)/2\}^2$, $1 \le j \le m$. Clearly, a necessary condition for the existence of a Q-design is that k is even. Also, for any typical column of $N_1$ or $N_2$, say $(y_1, y_2, \ldots, y_m)'$, by (6.1) one must have

$$\sum_{j=1}^{m} y_j = \tfrac{1}{2}k, \qquad \sum_{j=1}^{m} j\, y_j = \tfrac{1}{4}k(m + 1), \qquad \sum_{j=1}^{m} \{j - \tfrac{1}{2}(m + 1)\}^2 y_j = \tfrac{1}{24}k(m^2 - 1). \tag{6.2}$$

These considerations led Mukerjee and Gupta (1991) to suggest a construction procedure consisting of the following steps.

(i) For given v (= 2m), k and r, search all possible non-negative integral valued solutions of (6.2) for $(y_1, \ldots, y_m)'$. If no solution is available then a Q-design with the given parameters is non-existent. Otherwise, let $\alpha_1, \ldots, \alpha_q$ be the possible solutions.

(ii) Find non-negative integers $u_1, \ldots, u_q$ such that $\sum_{i=1}^{q} u_i \alpha_i = r 1_m$. If no such $u_1, \ldots, u_q$ exist then a Q-design with the given parameters is non-existent. Otherwise, construct $N_1$ with columns $\alpha_1, \ldots, \alpha_q$ such that $\alpha_i$ is repeated $u_i$ times ($1 \le i \le q$), take $N_2 = N_1$ and $N = (N_1' \; N_2')'$. The incidence matrix so constructed will represent a Q-design with the given parameters.

In step (ii), the choice of $u_1, \ldots, u_q$, if any, may be non-unique. One should try to select $u_1, \ldots, u_q$ in such a way that the resulting design becomes connected.

Mukerjee and Gupta (1991) adopted the above approach to derive and tabulate Q-designs for all possible parameter values over the range k < v and $v \le 24$. They noted that it is always possible to ensure connectedness of these designs whenever r > 1.

EXAMPLE 6.1. For v = 8, m = 4, the last condition in (6.2) reads $\sum_{j=1}^{4} (j - 5/2)^2 y_j = 5k/8$. Together with the first two conditions it admits no non-negative integral solution for any even k < 8. Hence no incomplete block Q-design exists.
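Step (i) of the procedure is a finite search, so it is easy to sketch in code. The function below is an illustrative brute-force enumeration, not the authors' algorithm; the name `q_design_columns` is hypothetical, and the three conditions in (6.2) are multiplied through by 2, 4 and 24 respectively so that only integers are compared.

```python
from itertools import product


def q_design_columns(m, k):
    """All non-negative integer vectors (y_1, ..., y_m) satisfying (6.2).

    Fractions are cleared so that the tests are exact:
      2 * sum(y_j)                     == k
      4 * sum(j * y_j)                 == k * (m + 1)
      6 * sum((2j - m - 1)**2 * y_j)   == k * (m**2 - 1)
    """
    if k % 2:
        return []
    total = k // 2
    solutions = []
    for y in product(range(total + 1), repeat=m):
        if sum(y) != total:
            continue
        if 4 * sum(j * yj for j, yj in enumerate(y, start=1)) != k * (m + 1):
            continue
        if 6 * sum((2 * j - m - 1) ** 2 * yj
                   for j, yj in enumerate(y, start=1)) != k * (m * m - 1):
            continue
        solutions.append(y)
    return solutions


# v = 8, m = 4: no admissible column for any even k < 8 (cf. Example 6.1).
no_q_design = {k: q_design_columns(4, k) for k in (2, 4, 6)}

# v = 14, m = 7, k = 12: candidate columns for step (ii).
columns_7_12 = q_design_columns(7, 12)
print(no_q_design, len(columns_7_12))
```

The search space has $(k/2 + 1)^m$ points, which is small for the range $v \le 24$ considered in the text; a larger problem would call for a smarter enumeration.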


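EXAMPLE 6.2. Let v = 14, m = 7, b = 7, r = 6, k = 12. Then the possible non-negative integral valued solutions of (6.2) are

$\alpha_1 = (0, 3, 0, 0, 0, 3, 0)'$, $\alpha_2 = (1, 1, 0, 1, 2, 0, 1)'$, $\alpha_3 = (1, 0, 2, 1, 0, 1, 1)'$.

Since $\alpha_1 + 3\alpha_2 + 3\alpha_3 = 6 \cdot 1_m$, a Q-design exists and is given by the incidence matrix $N = (N_1' \; N_1')'$, where

$$N_1 = \begin{bmatrix}
0 & 1 & 1 & 1 & 1 & 1 & 1 \\
3 & 1 & 1 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 2 & 2 & 2 \\
0 & 1 & 1 & 1 & 1 & 1 & 1 \\
0 & 2 & 2 & 2 & 0 & 0 & 0 \\
3 & 0 & 0 & 0 & 1 & 1 & 1 \\
0 & 1 & 1 & 1 & 1 & 1 & 1
\end{bmatrix}.$$

The design is connected.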
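Step (ii) for this example can be reproduced with a short search. The sketch below takes the three columns quoted in Example 6.2 as given and looks for multiplicities $u_i$ with $\sum u_i \alpha_i = r 1_m$; the helper name `multiplicities` is an illustrative assumption.

```python
from itertools import product


def multiplicities(columns, r, b):
    """Non-negative integer vectors u with sum(u) = b and sum_i u_i * alpha_i = r * 1_m."""
    m = len(columns[0])
    found = []
    for u in product(range(b + 1), repeat=len(columns)):
        if sum(u) != b:
            continue
        if all(sum(ui * col[j] for ui, col in zip(u, columns)) == r
               for j in range(m)):
            found.append(u)
    return found


alphas = [(0, 3, 0, 0, 0, 3, 0),
          (1, 1, 0, 1, 2, 0, 1),
          (1, 0, 2, 1, 0, 1, 1)]

choices = multiplicities(alphas, r=6, b=7)
u = choices[0]

# Build N1 row by row: column alpha_i is repeated u_i times.
n1 = [[col[j] for ui, col in zip(u, alphas) for _ in range(ui)]
      for j in range(7)]
for row in n1:
    print(row)
```

For these three columns the only admissible choice is u = (1, 3, 3), which gives the matrix $N_1$ of the example, with every row summing to r = 6 and every column to k/2 = 6.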

7. Asymmetric parallel line assays

In analogy with the developments in Section 4, we now consider the problem of retaining full information on the preparation contrast ($\psi_p$), combined regression contrast ($\psi_1$) and parallelism contrast ($\psi_1'$) in asymmetric parallel line assays. In the spirit of Gupta and Mukerjee (1990), an equireplicate design achieving this will be called a $\psi$-design.

Let there be $m_1$ and $m_2$ doses of the standard and test preparations respectively and suppose the assay is to be conducted in a design involving b blocks of size k (< v = $m_1 + m_2$) each such that each of the v treatments is replicated r times. Here bk = rv. The incidence matrix, N, of the design is partitioned as $N = [N_1' \; N_2']'$, where $N_i$ is of order $m_i \times b$ (i = 1, 2). The rows of $N_1$ and $N_2$ correspond to the doses of the standard and test preparations respectively. Clearly,

$$1_{m_1}' N_1 + 1_{m_2}' N_2 = k 1_b'. \tag{7.1}$$

By (2.13), (3.5), (7.1) and Remark 3.2, the design under consideration is a $\psi$-design if and only if

$$1_{m_i}' N_i = (k m_i / v)\, 1_b', \qquad \{f^{(i)} - \tfrac{1}{2}(m_i + 1) 1_{m_i}\}' N_i = 0 \qquad (i = 1, 2), \tag{7.2}$$


where $f^{(i)} = (1, 2, \ldots, m_i)'$, i = 1, 2. The conditions (7.2), due to Kyi Win and Dey (1980), generalize (4.4) to the asymmetric case. Kyi Win and Dey (1980) discussed the construction of designs satisfying (7.2) and presented a short table showing $\psi$-designs for $(m_1, m_2, b, r, k)$ = (3, 6, 3, 2, 6), (3, 9, 6, 4, 8), (4, 8, 4, 2, 6), (5, 10, 5, 2, 6), (6, 9, 6, 2, 5) and (6, 9, 6, 4, 10).

EXAMPLE 7.1. Let $m_1 = 3$, $m_2 = 6$, b = 3, r = 2, k = 6. Then, as noted in Kyi Win and Dey (1980), a connected $\psi$-design is given by the incidence matrix

$$N = \begin{bmatrix}
1 & 0 & 1 & 1 & 1 & 0 & 0 & 1 & 1 \\
1 & 0 & 1 & 1 & 0 & 1 & 1 & 0 & 1 \\
0 & 2 & 0 & 0 & 1 & 1 & 1 & 1 & 0
\end{bmatrix}'.$$

In particular, if $m_1$ and $m_2$ are both even then, as a generalization of Theorem 4.1, we have the following result giving a complete solution to the existence problem of $\psi$-designs.

THEOREM 7.1. Let $m_1 = 2\mu_1$ and $m_2 = 2\mu_2$ be both even and vr = bk. Then a $\psi$-design with parameters v (= $m_1 + m_2$), b, r and k exists if and only if k is even and $k\mu_1/v$ is an integer.

PROOF. To prove the necessity, suppose such a design exists and consider the associated matrices $N_1$ and $N_2$. Then for a typical column of $N_1$, say $(y_1, y_2, \ldots, y_{2\mu_1})'$, by (7.2),

$$\sum_{j=1}^{2\mu_1} y_j (j - \mu_1) = \frac{1}{2} \sum_{j=1}^{2\mu_1} y_j = k\mu_1/v. \tag{7.3}$$

The integrality of $k\mu_1/v$ follows from (7.3). Also by (7.3), the sum of elements in each column of $N_1$ is even. Similarly, the sum of elements in each column of $N_2$ is even. Hence by (7.1), k must be even.

The sufficiency part is proved by construction. Let k be even and $k\mu_1/v$ be an integer. Then $k\mu_2/v = k/2 - k\mu_1/v$ is also an integer. Also, $\mu_i r = b(k\mu_i/v)$, i = 1, 2. Hence for i = 1, 2, one can always construct a design involving $\mu_i$ treatments and b blocks such that every treatment is replicated r times and each block has size $k\mu_i/v$. Let $M_i$ be the $\mu_i \times b$ incidence matrix of such a design. Then the incidence matrix of a $\psi$-design with the desired parameters is given by (cf. (4.5))

$$N = \begin{bmatrix} I_{\mu_1} & 0 \\ \tilde I_{\mu_1} & 0 \\ 0 & I_{\mu_2} \\ 0 & \tilde I_{\mu_2} \end{bmatrix} \begin{bmatrix} M_1 \\ M_2 \end{bmatrix}, \tag{7.4}$$

where $\tilde I_{\mu}$ denotes the $\mu \times \mu$ matrix with ones on the antidiagonal and zeros elsewhere. The above $\psi$-design will be connected if $M = [M_1' \; M_2']'$ represents the incidence matrix of a connected design. []
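A small numerical check may help to see why the construction satisfies (7.2). The sketch below assumes the construction stacks each $M_i$ above its row-reversed copy; the matrices `m1` and `m2` are illustrative choices for $m_1 = 4$, $m_2 = 8$, b = 4, r = 2, k = 6, and the function names are hypothetical.

```python
def stack_with_mirror(m_i):
    """Rows of M_i followed by the same rows in reverse order."""
    return m_i + m_i[::-1]


def psi_conditions_hold(n_i, k, v):
    """Check both parts of (7.2) for one preparation, using integer arithmetic."""
    doses = len(n_i)
    for col in zip(*n_i):
        if sum(col) * v != k * doses:        # 1' N_i = (k m_i / v) 1'
            return False
        # {f - (m_i + 1)/2 * 1}' N_i = 0, multiplied by 2 to stay in integers
        if sum((2 * (j + 1) - doses - 1) * y for j, y in enumerate(col)) != 0:
            return False
    return True


# Illustrative component designs: mu_1 = 2 and mu_2 = 4 treatments in b = 4 blocks.
m1 = [[1, 0, 1, 0],
      [0, 1, 0, 1]]
m2 = [[1, 1, 0, 0],
      [1, 0, 1, 0],
      [0, 1, 0, 1],
      [0, 0, 1, 1]]

n1, n2 = stack_with_mirror(m1), stack_with_mirror(m2)
k, v = 6, 12
ok = psi_conditions_hold(n1, k, v) and psi_conditions_hold(n2, k, v)
replications = [sum(row) for row in n1 + n2]
block_sizes = [sum(col) for col in zip(*(n1 + n2))]
print(ok, replications, block_sizes)
```

Because every column of each $N_i$ is symmetric about its midpoint, the regression-type condition in (7.2) holds automatically, and the block-size condition reduces to the column sums of $M_i$ being $k\mu_i/v$.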


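EXAMPLE 7.2. Let $m_1 = 4$, $m_2 = 8$, b = 4, r = 2, k = 6. Then the conditions of Theorem 7.1 are satisfied and with

$$M_1 = \begin{bmatrix} 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 \end{bmatrix}, \qquad
M_2 = \begin{bmatrix} 1 & 1 & 0 & 0 \\ 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 \\ 0 & 0 & 1 & 1 \end{bmatrix}$$

in (7.4), the incidence matrix of a connected $\psi$-design is obtained as

$$N = \begin{bmatrix}
1 & 0 & 0 & 1 & 1 & 1 & 0 & 0 & 0 & 0 & 1 & 1 \\
0 & 1 & 1 & 0 & 1 & 0 & 1 & 0 & 0 & 1 & 0 & 1 \\
1 & 0 & 0 & 1 & 0 & 1 & 0 & 1 & 1 & 0 & 1 & 0 \\
0 & 1 & 1 & 0 & 0 & 0 & 1 & 1 & 1 & 1 & 0 & 0
\end{bmatrix}'.$$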

This design is available also in Kyi Win and Dey (1980).

Das and Saha (1986) described a systematic method for the construction of $\psi$-designs through affine resolvable designs. Their method yields designs with even $m_1$ and $m_2$ and hence Theorem 7.1 covers all parametric combinations where their method is applicable. Through the use of C-designs, they also presented a method of construction of possibly non-equireplicate designs retaining full information on each of $\psi_p$, $\psi_1$ and $\psi_1'$; see their original paper for the details of this method and also for a review of the work on asymmetric parallel line assays reported in the unpublished Ph.D. thesis of Seshagiri (1974).

8. On the role of non-equireplicate designs

In Sections 4-7, we have surveyed the results on existence and construction of incomplete block designs retaining full information on various sets of contrasts that are of interest in parallel line assays. Almost all designs discussed in these sections are equireplicate and their optimality, with regard to inference on relevant contrasts and in the class of equireplicate designs, has been indicated in Remark 3.4. This, however, does not guarantee their optimality, even under specific criteria, when one relaxes the condition of equal replication and considers the class of all comparable designs including the non-equireplicate ones. Moreover, v, b and k may be such that no equireplicate design is available at all. Thus, given v, b and k, the problem of finding an optimal or efficient design for the contrasts of interest, within the class of all designs and not just the equireplicate class, is of importance. Such a study also helps in assessing the performance of the designs discussed earlier as members of the broader class of all designs.

Some results in this direction were recently reported by Mukerjee and Gupta (1995). They studied the problem of A-optimality (Shah and Sinha, 1989) in the set-up of


Section 4 where a symmetric parallel line assay, with m doses of each preparation, is conducted using b blocks of k (< v = 2m) experimental units each and interest lies in $L_p$, $L_1$ and $L_1'$. By (2.20) and (4.3), the normalized contrasts corresponding to $L_p$, $L_1$ and $L_1'$ are $g_1'\tau$, $g_2'\tau$ and $g_3'\tau$ respectively, where

$$g_1 = (2m)^{-1/2}(1_m', -1_m')', \qquad
g_2 = [6/\{m(m^2 - 1)\}]^{1/2}(e_1', e_1')', \qquad
g_3 = [6/\{m(m^2 - 1)\}]^{1/2}(e_1', -e_1')'. \tag{8.1}$$

Then interest lies in $G\tau$ where the $3 \times v$ matrix G is defined as $G = (g_1, g_2, g_3)'$. By (4.3) and (8.1), the diagonal elements of $G'G$ are $\theta_1, \theta_2, \ldots, \theta_v$, where

$$\theta_j = \theta_{m+j} = \frac{1}{2m} + \frac{12}{m(m^2 - 1)} \{j - \tfrac{1}{2}(m + 1)\}^2,$$

1 ~k

and

$$\lambda_2 (n - 1) m \equiv 0 \pmod{k - 1}. \tag{7}$$

The most interesting of these for the statistician is the second-associate concurrence count $\lambda_2$ being 1. For k = 3 and $\lambda_2 = 1$, the necessary conditions (7) are sufficient with the exceptions of (m, n) = (2, 3), (2, 6), and (6, 3), for which there are no solutions, and with the possible exceptions of n = 6, $m \equiv 2$ or 10 (mod 12) (Rees and Stinson, 1987; Assaf and Hartman, 1989). Some large families of designs are known for k = 4 and $\lambda_2 = 1$ (Shen, 1988).

The concept of resolvability is extended to $\alpha$-resolvability ($\alpha$ replicates of each treatment in each nesting block) first by Bose and Shrikhande (1960) and more formally by Shrikhande and Raghavarao (1963), and to $(\alpha_1, \alpha_2, \ldots, \alpha_t)$-resolvability (nesting block i has $\alpha_i$ replicates of each treatment) by Kageyama (1976). Mukerjee and Kageyama (1985) explore the relationship of affine $(\alpha_1, \alpha_2, \ldots, \alpha_t)$-resolvability to variance balance and to the numbers of nesting and nested blocks, extending the results of Shrikhande and Raghavarao (1963). Recent construction-oriented papers with which an interested reader may begin to explore these ideas further are Mohan and Kageyama (1989), Jungnickel et al. (1991), Shah et al. (1993), and Kageyama and Sastry (1993). For resolvable designs with unequal sub-block sizes see Patterson and Williams (1976a) and Kageyama (1988).

J. P. Morgan

4. Nested BIBDs and related designs

4.1. Motivation and preliminaries

Nested balanced incomplete block designs, in which both the nesting blocks and the sub-blocks form BIBDs, are the obvious immediate generalizations of resolvable BIBDs to settings in which the nesting blocks cannot accommodate all of the treatments. Like the resolvable BIBDs, these designs possess the robustness property discussed in Section 2.2. In Section 2.3 they are shown to be optimal for the full analysis, though they would generally be considered inferior to resolvable BIBDs with the same sub-block size and number of units, as the latter keep all information on treatment contrasts in the two lowest strata (which are usually subject to less variability). In any case, when $k_1 < v$, so that a resolvable design is not a possibility, the nested BIBDs, or NBIBDs, are the simplest and most widely studied alternative, and are here the subject of Section 4.2. The notation for these designs will be NBIBD(v, b1, b2, k), there being b2 blocks of size k nested in each of b1 blocks of size b2k. A discussion of the available alternatives when the NBIBD conditions are not achievable is in Section 4.3.

4.2. Construction of NBIBDs

As noted in Section 2.3, NBIBDs were introduced to the statistical literature by Preece (1967), who gave examples of their prior use in agricultural experiments, fully outlined the analysis, and provided a table of smaller designs. Since then, construction of these designs has been studied by Homel and Robinson (1975), Jimbo and Kuriki (1983), Bailey et al. (1984), Dey et al. (1986), Iqbal (1991), Jimbo (1993) and Kageyama and Miao (1996). Some NBIBDs are also embedded in a more restrictive combinatorial class, namely the balanced incomplete block designs with nested rows and columns (or BIBRCs; cf. Example 3 in Section 6). Especially relevant in this regard are the BIBRCs of Jimbo and Kuriki (1983) reported as Theorem 6.3 in Section 6.2. Other intertwining constructions for these two classes, and from various authors for NBIBDs alone, will be pointed out below as some of the more extensive constructions are listed. The first two recursively combine a BIBD with a known NBIBD.

THEOREM 4.1 (Jimbo and Kuriki, 1983). If a NBIBD(v, b1, b2, k) is formed on the treatments of each block of a BIBD(v*, b*, k* = v), the result is a NBIBD(v*, b1b*, b2, k).

If a resolvable BIBD is thought of as a special case of a NBIBD, then the nested BIBDs which occur as components of the BIBRCs of Theorem 2 of Singh and Dey (1979) are a special case of Theorem 4.1. Those same NBIBDs are found again in Theorem 2.1 of Dey et al. (1986).

THEOREM 4.2. Existence of a NBIBD(v, b1, b2, k) and of a BIBD(v* = b2, b*, k*), implies the existence of a NBIBD(v, b1b*, k*, k).

Nested designs


PROOF. Let $d_1$ be the given NBIBD, and $d_2$ the BIBD. Take any nesting block of $d_1$, call it $\beta$, and associate each of its b2 = v* sub-blocks with a treatment of $d_2$. Now construct b* new nesting blocks from $\beta$, of k* sub-blocks each, the ith consisting of the sub-blocks corresponding to the treatments in block i of $d_2$, i = 1, 2, ..., b*. Repeating this for each block of $d_1$ gives a design with the stated parameters. The new sub-block design is r* = b*k*/v* copies of that of $d_1$, so is a BIBD. And if $\lambda^*$ is the concurrence count for $d_2$, and $\lambda_1$, $\lambda_2$ are the nesting block and sub-block concurrence counts for $d_1$, then it is easy to see that the nesting block concurrence count for the new design is $r^*\lambda_2 + (\lambda_1 - \lambda_2)\lambda^*$. []

Theorem 4.2 is the nested design version of the BIBRC construction due to Cheng (1986) that appears as Theorem 6.2 in Section 6.2. Construction (ii) of Dey et al. (1986, p. 163) results from Theorem 4.2 by taking the NBIBD to be a resolvable BIBD. Another recursive construction of NBIBDs, which starts with two cyclically generated NBIBDs with special properties, is given by Jimbo (1993, p. 98). Kageyama and Miao (1996) also employ recursive methods in completely solving the construction of NBIBD(v, b1, 2, 2)'s; among the inputs they require are designs from Theorems 4.4-4.6.

THEOREM 4.3 (Bailey et al., 1984). Let v be odd. Identify a pair of integers x < y in $Z_v$ as forming an s-block if x + y = s. For each s = 0, 1, ..., v - 1 take the (v - 1)/2 s-blocks as sub-blocks of a nesting block of size v - 1. The result is a NBIBD(v, v, (v - 1)/2, 2).

Bailey et al. (1984) call the above designs "near resolvable". In the same paper they also give a solution to the construction of resolvable BIBDs with sub-block size 2.

The remaining three theorems of this sub-section construct designs using the method of differences on finite fields. The results will be stated without proofs, which can be found in the original papers. As is usual, x will denote a primitive element of $GF_v$. For any m that divides v - 1, let $H_{m,0} = (x^0, x^m, x^{2m}, \ldots, x^{v-1-m})'$ and $H_{m,i} = x^i H_{m,0}$ for i = 0, 1, ..., m - 1. Also let $S_m = (x^0, x^1, \ldots, x^{m-1})$. Sub-blocks within a block will be displayed separated by bars. Readers unfamiliar with finite fields or with the method of differences may wish to consult Street and Street (1987, Chapter 3).

THEOREM 4.4 (Jimbo and Kuriki, 1983). Let v = mh + 1 be a prime power, let $L_i$ for i = 1, ..., n be mutually disjoint s-subsets of $S_m$ written as $s \times 1$ vectors, and let $A_i = L_i \otimes H_{m,0}'$. Then the m initial blocks

$$B_j = x^{j-1}\, [\, A_1 \mid A_2 \mid \cdots \mid A_n \,], \qquad j = 1, 2, \ldots, m,$$

generate a NBIBD(v, mv, n, hs). If m is even and h is odd, then $B_1, \ldots, B_{m/2}$ generate a NBIBD(v, mv/2, n, hs).


THEOREM 4.5 (Jimbo and Kuriki, 1983). Let v = umh + 1 be a prime power, let L be an s-subset of $S_m$ written as an $s \times 1$ vector, and let $A_i = x^{(i-1)m} L \otimes H_{um,0}'$. Then the m initial blocks

$$B_j = x^{j-1}\, [\, A_1 \mid A_2 \mid \cdots \mid A_u \,], \qquad j = 1, 2, \ldots, m,$$

generate a NBIBD(v, mv, u, hs). If m is even and uh is odd, then $B_1, \ldots, B_{m/2}$ generate a NBIBD(v, mv/2, u, hs).

Putting m = 2t, h = 3, s = 1, n = 2 in Theorem 4.4, and u = 3, m = 2t, h = 1, s = 2 in Theorem 4.5, gives designs with the same parameters as those of series (ii) of Dey et al. (1986, p. 165). Designs deemed series (i) by the same authors are the next theorem.

THEOREM 4.6 (Dey et al., 1986). Let v = 4t + 1 be a prime power. The initial blocks

$$B_j = x^{j-1}\, [\, x^0, x^{2t} \mid x^t, x^{3t} \,], \qquad j = 1, 2, \ldots, t,$$

generate a NBIBD(v, v(v - 1)/4, 2, 2).
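For prime v the field arithmetic is just arithmetic modulo v, so the series of Theorem 4.6 is easy to generate and check by machine. The sketch below assumes the initial blocks are $x^{j-1}[x^0, x^{2t} \mid x^t, x^{3t}]$ developed additively mod v, and uses v = 13 with primitive root x = 2; prime powers that are not primes would need a proper finite-field implementation, which is not attempted here. All function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def theorem_4_6_design(v, x):
    """Nesting blocks (each a pair of 2-element sub-blocks) for prime v = 4t + 1."""
    t = (v - 1) // 4
    design = []
    for j in range(t):
        scale = pow(x, j, v)
        initial = [(scale, scale * pow(x, 2 * t, v) % v),
                   (scale * pow(x, t, v) % v, scale * pow(x, 3 * t, v) % v)]
        for shift in range(v):                      # develop the initial block mod v
            design.append([tuple((e + shift) % v for e in sub) for sub in initial])
    return design


def concurrences(blocks):
    """Number of blocks containing each unordered pair of treatments."""
    counts = Counter()
    for block in blocks:
        counts.update(frozenset(pair) for pair in combinations(block, 2))
    return counts


def is_balanced(blocks, v):
    """True if every pair of the v treatments occurs equally often."""
    counts = concurrences(blocks)
    return len(counts) == v * (v - 1) // 2 and len(set(counts.values())) == 1


design = theorem_4_6_design(13, 2)
sub_blocks = [sub for nest in design for sub in nest]
nesting_blocks = [nest[0] + nest[1] for nest in design]
print(len(design), is_balanced(sub_blocks, 13), is_balanced(nesting_blocks, 13))
```

For v = 13 this yields 39 nesting blocks of size 4, each split into two sub-blocks of size 2, matching the parameters NBIBD(13, 39, 2, 2) listed in Table 1.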

For given v, b1, b2, and k, Table 1 lists the smallest feasible NBIBDs, that is, the smallest replication for which BIBDs of blocksize b2k and of blocksize k both exist. References for the designs are to the theorems of this section and to the original paper of Preece (1967), demonstrating that some of the latter's designs occur as special cases of now known infinite series of designs. The recursive constructions are listed only when no direct method is available. For four cases the author has found no design in the literature. For three of these, and for designs neither in Preece's (1967) table nor resulting from a theorem of this section, a solution is listed in Table 2. Existence of the remaining case, NBIBD(10, 15, 2, 3), remains open. NBIBD(10, 30, 2, 3) can be constructed using Theorem 4.2 and a NBIBD(10, 10, 3, 3).

There has also been interest in enumerating a special class of NBIBDs. An almost resolvable BIBD(v, b, k) with b = v(v - 1)/k (i.e., with $\lambda = k - 1$) is a BIBD whose blocks can be partitioned into v classes of (v - 1)/k blocks each so that each class lacks exactly one treatment (these designs are also sometimes called near resolvable). Obviously an almost resolvable BIBD(v, b, k) is a NBIBD(v, v, (v - 1)/k, k). The obvious necessary condition for existence of such a design, other than the usual BIBD conditions, is that k divide v - 1. Sufficiency of this condition for k = 2 is Theorem 4.3. Sufficiency for k = 3 is proven by Hanani (1974). Yin and Miao (1993) demonstrate the sufficiency for k = 5 or 6, except possibly for 8 values of v when k = 5, and 3 values of v when k = 6; save for v = 55 or 146 when k = 6, these exceptions have since been solved. For k = 7 and k = 8 sufficiency has likewise been established save for small lists of possible exceptions. Refer to Furino et al. (1994) for details.
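The k = 2 case is simple enough to verify directly. The following sketch builds the s-blocks of Theorem 4.3 for v = 7, reading x + y = s as a congruence mod v, and checks the almost resolvable structure; the function name `s_block_design` is an illustrative assumption.

```python
from collections import Counter
from itertools import combinations


def s_block_design(v):
    """For odd v, nesting block s holds the pairs x < y with x + y = s (mod v)."""
    return [[(x, (s - x) % v) for x in range(v) if x < (s - x) % v]
            for s in range(v)]


v = 7
design = s_block_design(v)

# Every pair of treatments should form exactly one sub-block.
sub_pair_counts = Counter(pair for nest in design for pair in nest)

# Each nesting block (class) should omit exactly one treatment.
missing = [set(range(v)) - {e for pair in nest for e in pair} for nest in design]

# Concurrences within nesting blocks of size v - 1.
nesting_counts = Counter(
    frozenset(p)
    for nest in design
    for p in combinations(sorted(e for pair in nest for e in pair), 2))

print(design[0], missing[0], set(nesting_counts.values()))
```

Each class omits the single treatment x with 2x = s (mod v), which is why the nesting blocks have size v - 1 and concurrence count v - 2.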

Table 1
NBIBDs with v ≤ 14 and replication r ≤ 30. Preece refers to the table of Preece (1967)

 v    r    b1   b2   k    Theorem
 5    4     5    2   2    Preece, 4.3, 4.6
 6   10    15    2   2    Preece
 7    6     7    3   2    Preece, 4.3, 4.5
 7    6     7    2   3    Preece, 4.4
 7   12    21    2   2    4.3, 4.4, 4.5
 8    7    14    2   2    Preece
 8   21    28    3   2    Table 2
 8   21    28    2   3    Iqbal (1991)
 9    8    18    2   2    4.6
 9    8    12    3   2    Preece
 9    8     9    4   2    Preece, 4.3
 9    8    12    2   3    Preece
 9    8     9    2   4    Preece
10    9    15    3   2    Preece
10    9    15    2   3    ?
10    9    10    3   3    Preece
10   18    45    2   2    4.2
11   10    11    5   2    Preece, 4.3, 4.5
11   10    11    2   5    Preece, 4.5
11   20    55    2   2    4.4, 4.5
11   30    55    3   2    4.4
11   30    55    2   3    4.4, 4.5
12   11    33    2   2    Preece
12   11    22    3   2    Preece
12   11    22    2   3    Preece
12   22    33    4   2    Table 2
12   22    33    2   4    Table 2
13   12    39    2   2    4.6
13   12    26    3   2    Preece, 4.5
13   12    13    6   2    Preece, 4.3
13   12    26    2   3    Preece, 4.4
13   12    13    4   3    Preece
13   12    13    3   4    Preece
13   12    13    2   6    Preece
13   18    26    3   3    4.4, 4.5
13   24    39    4   2    4.5
13   24    39    2   4    4.4
14   26    91    2   2    Agrawal and Prasad (1983)

Table 2
Solutions for NBIBDs

 v    r    b1   b2   k    Initial blocks
 8   21    28    3   2    [0,1 | 2,4 | 3,6], [0,1 | 3,5 | 4,∞], [0,1 | 3,6 | 5,∞], [0,2 | 1,4 | 3,∞] (mod 7)
 8   21    28    2   3    [0,1,3 | 2,6,7], [0,5,7 | 2,4,6], [0,1,3 | 2,5,6], [0,1,6 | 2,4,5] × ½ (mod 8)
12   22    33    4   2    [0,1 | 3,8 | 4,7 | 5,6], [0,2 | 3,6 | 5,9 | 4,∞], [0,4 | 1,6 | 7,9 | 5,∞] (mod 11)
12   22    33    2   4    [0,1,2,3 | 4,7,8,10], [0,1,4,7 | 2,3,9,∞], [0,2,6,8 | 3,7,9,∞] (mod 11)
14   26    91    2   2    [0,1 | 9,8], [0,2 | 5,3], [0,8 | 1,9], [0,3 | 2,5], [0,4 | 7,11], [∞,0 | 2,9], [∞,0 | 3,9] (mod 13)
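The tabulated solutions are developed cyclically from their initial blocks, with ∞ held fixed. As a sketch of how such an entry can be checked, the code below develops the NBIBD(14, 91, 2, 2) initial blocks mod 13 and counts pair concurrences in the sub-blocks and in the nesting blocks; the string `"inf"` stands in for ∞ and the helper names are hypothetical.

```python
from collections import Counter
from itertools import combinations

INF = "inf"  # the fixed point, written as the infinity symbol in Table 2

# Initial blocks for v = 14, b1 = 91, b2 = 2, k = 2: (sub-block 1, sub-block 2).
INITIAL_BLOCKS = [
    ((0, 1), (9, 8)), ((0, 2), (5, 3)), ((0, 8), (1, 9)), ((0, 3), (2, 5)),
    ((0, 4), (7, 11)), ((INF, 0), (2, 9)), ((INF, 0), (3, 9)),
]


def develop(initial_blocks, modulus):
    """Add 0, 1, ..., modulus - 1 to every finite element of every initial block."""
    def shift(element, s):
        return element if element == INF else (element + s) % modulus

    return [[tuple(shift(e, s) for e in sub) for sub in block]
            for block in initial_blocks for s in range(modulus)]


def concurrences(blocks):
    """Number of blocks containing each unordered pair of treatments."""
    counts = Counter()
    for block in blocks:
        counts.update(frozenset(pair) for pair in combinations(block, 2))
    return counts


design = develop(INITIAL_BLOCKS, 13)
sub_counts = concurrences([sub for nest in design for sub in nest])
nest_counts = concurrences([nest[0] + nest[1] for nest in design])
print(len(design), set(sub_counts.values()), set(nest_counts.values()))
```

Both component designs turn out to be balanced, with every pair of the 14 treatments sharing two sub-blocks and six nesting blocks, consistent with r = 26.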

4.3. Other nested incomplete block designs

There has been a relatively modest amount of work on nested block designs with b2k less than v and for which at least one of the two component block designs is not a BIBD, which will be briefly reviewed here. The earliest such effort seems to be due to Homel and Robinson (1975), who define nested partially balanced incomplete block designs (NPBIBDs) as designs for which the nesting blocks form a PBIBD with blocksize b2k, the nested blocks form a PBIBD with blocksize k, and the two component designs share a common association scheme (for the definition of a PBIBD see Street and Street, 1987, p. 237). Though not motivated there by any efficiency argument, the full analysis, explained in detail in their paper, is certainly eased by the common association scheme requirement. The same objective can be met if one of the association schemes is a collapsed version (by combining associate classes) of the other, and a few designs of this type appear in Banerjee and Kageyama (1993). Banerjee and Kageyama (1993) also give two infinite series of NPBIBDs, one based on the triangular association scheme and one based on the $L_2$ association scheme. Homel and Robinson's (1975) constructions are for prime power numbers of treatments, based on generalizations of the pseudo-cyclic and $L_2$ association schemes.

In another paper, Banerjee and Kageyama (1990) construct $\alpha$-resolvable NBIBDs and NPBIBDs. As they remark, their designs "mostly have large values of some parameters." The resolvability can be used to accommodate a third, super-nesting factor, that will not enter into the analysis.

In a dissertation at the University of Kent, Iqbal (1991) uses the method of differences to construct nested designs for which one of the two component block designs is a BIBD and the other is a regular graph design.
Relaxing the balance restriction for one component allows smaller replication to be attained than would sometimes otherwise be possible, and produces designs that should be reasonably efficient for either of the analyses discussed in Section 2. This is an idea worthy of further study, as is the entire area of nested block designs which are both small and efficient.


Gupta (1993) studies nested block designs for which both the number of sub-blocks within a block, and the sub-block sizes, are non-constant. The main result proves the bottom stratum optimality of any such design for which the sub-blocks are a binary, variance-balanced design. If as in Section 2.2 of this paper, it is first shown that the nesting factor is irrelevant to the bottom stratum analysis, then this result can also be established as an application of Theorem 2.1 of Pal and Pal (1988). Construction amounts to arbitrarily partitioning the blocks of any optimal, nonproper block design. Each partition class will be a nesting block of a bottom stratum optimal nested design, but the behavior under the full analysis may or may not be satisfactory.

5. Nesting of row and column designs - Models and related considerations

5.1. The nested row and column setting

The examples of Section 1 and the proffered designs through Section 4 share a common feature in their block structure: given any two blocking factors G and H, either G nests H , or H nests G. While as those examples show, this "pure nesting" structure is found in a wide variety of experimental situations, it is also the case that many experiments blend nesting with other, non-nesting relations among some of the blocking factors. The simplest of these is a nesting of two completely crossed blocking factors, which can be visualized as in Figure 2. If the nesting factor is called "blocks", and the two crossed factors within the nest are called "rows" and "columns", then each block is recognizable as the setting for a row-column design. Indeed, one of the mechanisms by which this nested row and column setting arises is through repetition of a row-column experiment in which it is not reasonable to assume that row factor and column factor effects are constant across repetitions (blocks). Nesting can also be used to reduce the numbers of rows and columns per block relative to a single row-column layout, which may be important in assuring row-column additivity; the trade-off is that for a given number of experimental units there will be fewer degrees of freedom for error. A primary area of application is found in agricultural field trials in which blocks are physically separate fields, and two orthogonal sources of variation are modeled to account for yield differences as a function of position within the field. An interesting example in sampling insect populations is discussed in Keuhl (1994, pp. 341-342). It is assumed here, as is common in practice, that there is one experimental unit at each row-column cross within a block (so in Figure 2 there are 36 units). 
The ideas and terminology attendant to the nested row and column setting have been formalized beginning with the papers of Srivastava (1978) and Singh and Dey (1979), though designs for this setting were considered much earlier, prime examples being the lattice squares of Yates (1937, 1940). Singh and Dey's (1979) paper is notable for introducing a class of designs, the balanced incomplete block designs with nested rows and columns, that generalize the balanced lattice squares and which have since been the focus of much of the design work in this area. But as shown by developments of the past five years reported in Sections 5.2 and 5.3, these designs are not necessarily optimal, and depending on the setting parameters and the analysis

[Figure 2: three blocks F1-F3, with rows H1-H9 and columns G1-G12 nested within the blocks.]

Fig. 2. G and H are crossed within levels of F. This is a nested row and column setting with 3 blocks of size 3 × 4.

used, can be surprisingly poor. In the subsections that follow one may make the interesting observation that the introduction of a second factor in a nest, crossed with the first nested factor, fundamentally changes the requirements for optimal design. The intuition from simple block designs that worked so well in designing for pure nested structures as in Section 2, can quite miserably fail at the next level of complexity.

5.2. Model and bottom stratum analysis

To formalize the setting, let b be the number of blocks (levels of the nesting factor), and let p and q be the numbers of rows and columns (levels of the two nested factors which are crossed with one another) within each block. With $y_{jlm}$ denoting the yield from the unit in row l, column m of block j (plot (j, l, m)), the model is

$$y_{jlm} = \mu + \beta_j + \rho_{jl} + \delta_{jm} + \tau_{[jlm]} + e_{jlm} \tag{8}$$

where $\beta_j$, $\rho_{jl}$, and $\delta_{jm}$ are respectively block, row, and column effects; $\tau_{[jlm]}$ is the effect of the treatment applied to plot (j, l, m); and the $e_{jlm}$ are uncorrelated random variables of constant variance and zero means. In matrix form, ordering the yields row-wise by block, this is

$$Y = \mu 1 + Z_1\beta + Z_2\rho + Z_3\delta + A_d\tau + E \tag{9}$$

with $Z_1 = I_b \otimes 1_{pq}$, $Z_2 = I_{bp} \otimes 1_q$, $Z_3 = I_b \otimes 1_p \otimes I_q$ and $\beta$, $\rho$, $\delta$ are respectively $b \times 1$, $bp \times 1$, and $bq \times 1$. $A_d$ of order $bpq \times v$ is the 0-1 design matrix and $\tau$ is the $v \times 1$ vector of treatment effects. For the bottom stratum analysis eliminating blocks, rows, and columns, also called the "within-rows-and-columns" analysis, the information matrix for the best linear unbiased estimators of contrasts of the $\tau_i$'s is (Singh and Dey, 1979)

$$C_d = A_d'\Big(I - \frac{1}{q} Z_2 Z_2' - \frac{1}{p} Z_3 Z_3' + \frac{1}{pq} Z_1 Z_1'\Big) A_d
= A'A - \frac{1}{q} N_2 N_2' - \frac{1}{p} N_3 N_3' + \frac{1}{pq} N_1 N_1' \tag{10}$$


where the d has been dropped in the latter expression to ease the notation. The matrices $N_1$, $N_2$ and $N_3$ are the treatment-block, treatment-row, and treatment-column incidence matrices ($N_i = A_d'Z_i$).

Singh and Dey (1979) introduced a class of designs for this setting which combine the classical notions of treatment assignment binarity and variance balance in the bottom stratum analysis. They named these designs balanced incomplete block designs with nested rows and columns (for short, BIBRC).

DEFINITION. A BIBRC is a nested row and column design with pq < v for which (i) treatment assignment to blocks is binary, that is, $N_1$ contains only 0's and 1's, and (ii) $C_d$ is completely symmetric.

Because tr($C_d$) is constant over the class of all designs satisfying (i), Kiefer's (1975) Proposition 2 says that BIBRCs are universally optimum over the class of all binary block nested row and column designs. The question, then, is whether better designs can be found outside that class. Binarity in simple block designs (only one blocking factor) is widely believed to exclude no superior designs (e.g., John and Mitchell, 1977; Shah and Sinha, 1989) and typically functions to exclude far inferior competitors, so if that were to be a guide, the answer to the question would be a definitive "no". But further investigation shows that intuition from the simple block setting fails here.

Letting $\Phi$ be any optimality criterion as specified in Section 2.1, the goal is to minimize $\Phi(C_d)$ with respect to choice of d. The matrix $N_2N_2' - \frac{1}{p}N_1N_1'$ is nonnegative definite, so

$$C_d \le A'A - \frac{1}{p} N_3 N_3'$$

in the nonnegative definite ordering, with equality when $pN_2N_2' = N_1N_1'$. The matrix $A'A - \frac{1}{p}N_3N_3'$ is the information matrix for the simple block design composed of the bq columns of d as blocks of size p, what will be called the column component design. This motivates the definition of the bottom stratum universally optimum nested row and column design (BNRC, for short).

DEFINITION. A BNRC is a nested row and column design for which (i) the column component design is a BBD, and (ii) for each i and j, the number of times treatment i appears in row l of block j is constant in l.

When necessary to indicate the parameters, the notation will be BNRC(v, b, p, q) (similarly for BIBRCs). Condition (i) says that the column component design is a universally optimum block design, and condition (ii) says that within a block, each row of treatments is a permutation of the first row. Condition (ii) makes $pN_2N_2' = N_1N_1'$, from which the first theorem of this section follows.

THEOREM 5.1 (Bagchi et al., 1990). A BNRC(v, b, p, q) is universally optimum for the bottom stratum analysis among all nested row and column designs with the same v, b, p and q.

EXAMPLE 2. A BNRC(4, 6, 2, 4).

1 1 2 2    3 3 4 4    1 2 3 4    1 2 3 4    1 2 3 4    1 2 3 4
2 2 1 1    4 4 3 3    3 4 2 1    3 4 2 1    3 4 2 1    3 4 2 1

Unlike a BIBRC, a BNRC is nonbinary in blocks and need not be binary in rows. Further, as illustrated by Example 2, neither the row nor the block component designs need be variance balanced or efficient. If $pq \le v$, a direct comparison of the two types of designs can be made (and with this restriction a BNRC must have $p \le q$). The common nonzero eigenvalue for the information matrix of a BNRC is then b(p - 1)q/(v - 1), and that for a BIBRC is b(p - 1)(q - 1)/(v - 1), so that the relative efficiency of a BIBRC is (q - 1)/q, which while satisfactory for large q, is for small q quite poor. The worst case is p = q = 2, with BNRCs having twice the bottom stratum information on treatment contrasts.

Bagchi et al. (1990) prove even more than stated in Theorem 5.1. If a nested row and column design d satisfies (ii) of the BNRC definition, and if the column component design is $\Phi$-optimum over the class of simple block designs with bq blocks of size p, then d is itself $\Phi$-optimum. For optimum designs that are not BNRCs, see Section 7.2.
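The bottom stratum information matrix of a small design can be computed exactly from its layout. The sketch below evaluates $A'A - \frac{1}{q}N_2N_2' - \frac{1}{p}N_3N_3' + \frac{1}{pq}N_1N_1'$ for the six 2 × 4 blocks of Example 2 using exact fractions; the function name `information_matrix` and the list-of-lists encoding of the blocks are illustrative assumptions.

```python
from fractions import Fraction


def information_matrix(blocks, v):
    """Bottom stratum information matrix for a nested row and column design.

    `blocks` is a list of p x q arrays of treatment labels 1..v.
    """
    p, q = len(blocks[0]), len(blocks[0][0])
    c = [[Fraction(0)] * v for _ in range(v)]

    def add_outer(units, weight):
        # Add weight * n n', where n counts each treatment among `units`.
        counts = [units.count(t) for t in range(1, v + 1)]
        for i in range(v):
            for k in range(v):
                c[i][k] += weight * counts[i] * counts[k]

    for block in blocks:
        cells = [t for row in block for t in row]
        for t in cells:
            c[t - 1][t - 1] += 1                      # A'A = diag(replications)
        add_outer(cells, Fraction(1, p * q))          # + N1 N1' / (pq)
        for row in block:
            add_outer(list(row), Fraction(-1, q))     # - N2 N2' / q
        for col in zip(*block):
            add_outer(list(col), Fraction(-1, p))     # - N3 N3' / p
    return c


example_2 = ([[[1, 1, 2, 2], [2, 2, 1, 1]],
              [[3, 3, 4, 4], [4, 4, 3, 3]]]
             + [[[1, 2, 3, 4], [3, 4, 2, 1]]] * 4)

c_d = information_matrix(example_2, 4)
for row in c_d:
    print([int(entry) for entry in row])
```

The result has 6 on the diagonal and -2 elsewhere, that is $8(I - \frac{1}{4}J)$, so the common nonzero eigenvalue is 8, in agreement with b(p - 1)q/(v - 1) = 6 · 1 · 4/3.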

5.3. The full analysis

It turns out that the advantage held by BNRCs for the bottom stratum analysis is not necessarily maintained in the full analysis. In demonstrating this, some additional conditions will be imposed on the two classes of designs that will ease the recovery of information from higher strata. For BIBRCs it is demanded that each of the row, column, and block component designs (which respectively have block sizes of q, p, and pq) be BIBDs. BIBRCs with this property have been termed by Agrawal and Prasad (1982b) to belong to "series A"; here they will be said to be completely balanced. The restrictions on the class of BNRCs are that a treatment appear m times in every row of any block in which it appears (m being a constant not depending on the particular block or treatment), and that there are $\lambda_b$ blocks in which any pair of treatments $i \ne i'$ both occur (by virtue of the preceding restriction, i and i' both occur mp times in each of these $\lambda_b$ blocks). For convenience, the restricted BNRC will also be said to be completely balanced. Morgan and Uddin (1993a) point out that many of the known series of BIBRCs and of BNRCs have these properties. It is the completely balanced BIBRCs that produce NBIBDs when one of their row or column classifications is dropped, a relationship alluded to earlier in Section 4.2. The design of Example 2 is a BNRC that is not completely balanced.

The advantage of these complete balance restrictions is that both classes of designs are generally balanced with a single efficiency factor in each stratum (see Houtman and Speed, 1983; or Mejza and Mejza, 1994). For the data vector Y of any equireplicate nested row and column design the averaging operators are $T = \frac{1}{r}AA'$ for treatments, $B = \frac{1}{pq}Z_1Z_1'$ for blocks, $R = \frac{1}{q}Z_2Z_2'$ for rows, $C = \frac{1}{p}Z_3Z_3'$ for columns, and $G = \frac{1}{bpq}J$ for the grand mean. The stratum projectors are $S_0 = G$, $S_1 = B - G$, $S_2 = R - B$, $S_3 = C - B$, and $S_4 = I - R - C + B$, "naming" the strata 0 to 4, respectively. To recover information from strata 1-3, the vectors $\beta$, $\rho$, $\gamma$, and $E$ in the model (9) are now taken to be mutually uncorrelated random vectors with zero means and with $\mathrm{var}(\beta) = \sigma_1^2 I_b$, $\mathrm{var}(\rho) = \sigma_2^2 I_{bp}$, $\mathrm{var}(\gamma) = \sigma_3^2 I_{bq}$, and $\mathrm{var}(E) = \sigma^2 I_{bpq}$. Hence

$\mathrm{var}(Y) = \sigma_1^2 Z_1Z_1' + \sigma_2^2 Z_2Z_2' + \sigma_3^2 Z_3Z_3' + \sigma^2 I_{bpq} = \sum_{s=0}^{4} \xi_s S_s$

for $\xi_0 = \xi_1 = pq\sigma_1^2 + q\sigma_2^2 + p\sigma_3^2 + \sigma^2$, $\xi_2 = q\sigma_2^2 + \sigma^2$, $\xi_3 = p\sigma_3^2 + \sigma^2$, and $\xi_4 = \sigma^2$. The condition for general balance is that $(T - G)S_s(T - G) = \lambda_s(T - G)$ for constants $0 \le \lambda_s \le 1$, $\sum_s \lambda_s = 1$; $\lambda_s$ is the efficiency factor for estimating a treatment contrast in stratum s. With known stratum variances $\xi_0, \xi_1, \ldots, \xi_4$, the variance of a treatment contrast $t'(T - G)A\tau$ estimated using information from all strata is

$\mathrm{var}(t'(T - G)Y) = \dfrac{1}{\sum_s \lambda_s \xi_s^{-1}} \, \|(T - G)t\|^2$.   (11)

The efficiency factors, aside from a divisor of pq(v - 1) throughout, are

          λ_0    λ_1          λ_2       λ_3          λ_4
BIBRC     0      v - pq       v(p-1)    v(q-1)       v(p-1)(q-1)
BNRC      0      (mv - q)p    0         v(q - mp)    v(p-1)q

It is seen that a completely balanced BNRC has more information in blocks and in the within-rows-and-columns stratum, less information in rows and in columns. Which type of design is superior depends on the relative values of the stratum variances.

THEOREM 5.2 (Morgan and Uddin, 1993a). If the stratum variances are known (equivalently, $\sigma_1^2$, $\sigma_2^2$, $\sigma_3^2$, and $\sigma^2$ are known), then for the analysis with recovery of information from every stratum, a completely balanced BNRC(v, b, p, q) is superior to a completely balanced BIBRC(v, b, p, q) if and only if

$\dfrac{(p-1)}{(mp-1)} \left[ \dfrac{1}{\xi_4} - \dfrac{1}{\xi_2} \right] > \dfrac{1}{\xi_3} - \dfrac{1}{\xi_1}$,

where $\xi_s$ denotes the variance of stratum s.

The proof of Theorem 5.2 is immediate upon comparison of (11) for BNRCs and BIBRCs.
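The comparison behind Theorem 5.2 can be illustrated numerically. The sketch below is only an illustration under stated assumptions: it takes the efficiency factors for completely balanced designs as tabulated in this section (each divided by pq(v - 1)), takes the stratum variances to be the usual combinations of the four variance components, and measures total information by the sum over strata of efficiency factor divided by stratum variance, which is the reciprocal of the factor multiplying the squared norm in (11). The function names and the parameter values v = 13, p = 2, q = 3 are hypothetical, and no claim is made that designs with these parameters exist.

```python
def efficiency_factors(v, p, q, m=1):
    """Return (BIBRC, BNRC) lists of efficiency factors for strata 0..4."""
    divisor = p * q * (v - 1)
    bibrc = [0, v - p * q, v * (p - 1), v * (q - 1), v * (p - 1) * (q - 1)]
    bnrc = [0, (m * v - q) * p, 0, v * (q - m * p), v * (p - 1) * q]
    return [x / divisor for x in bibrc], [x / divisor for x in bnrc]


def stratum_variances(p, q, s1, s2, s3, s0):
    """Stratum variances for strata 0..4 from block, row, column and unit components."""
    top = p * q * s1 + q * s2 + p * s3 + s0
    return [top, top, q * s2 + s0, p * s3 + s0, s0]


def information(factors, variances):
    """Sum of efficiency factor / stratum variance; larger means smaller contrast variance."""
    return sum(f / x for f, x in zip(factors, variances))


def bnrc_superior(v, p, q, variances, m=1):
    """True when the completely balanced BNRC carries more total information."""
    bibrc, bnrc = efficiency_factors(v, p, q, m)
    return information(bnrc, variances) > information(bibrc, variances)


# Illustrative parameters only.
v, p, q = 13, 2, 3
low_block_var = stratum_variances(p, q, s1=0.0, s2=1.0, s3=1.0, s0=1.0)
high_block_var = stratum_variances(p, q, s1=10.0, s2=0.0, s3=0.0, s0=1.0)
print(bnrc_superior(v, p, q, low_block_var))
print(bnrc_superior(v, p, q, high_block_var))
```

In the first setting (no block variance, sizeable row and column variance) the BNRC comes out ahead; in the second (large block variance, no row or column variance) the BIBRC does, which matches the remark that the better design depends on the relative stratum variances.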


It is not hard to see that the completely balanced BIBRCs are universally optimum within the binary block class for the random effects model of this section, which gives an immediate improvement: a completely balanced BNRC is superior to every binary design for this analysis whenever the condition of Theorem 5.2 holds. The expression in Theorem 5.2 has been written with the special case of m = 1 in mind, for which the coefficient of $\xi_1$ reduces to 1. When m = 1 the rows of a completely balanced BNRC form the blocks of a balanced incomplete block design and the condition of Theorem 5.2 is equivalent to that of Theorem 4.1(b) of Bagchi et al. (1990). That theorem in this setting says that such a design is universally optimum under the random effects model. For m > 1 and the random effects model a completely balanced BNRC does not necessarily have maximum trace, so no such general optimality result is easily obtained; the advantage of course is that if the condition of Theorem 5.2 holds, this class nevertheless provides large families of designs for situations where optimum (m = 1) BNRCs may not exist, that are variance balanced and are superior to any BIBRC and indeed to any binary block design. When for m = 1 the condition of Theorem 5.2 fails, a completely balanced BIBRC is universally optimum for the random effects model.

5.4. Discussion

All of this leaves one with some difficult questions in designing a nested row and column experiment. If it is decreed in advance that the bottom stratum analysis is to be performed, then Section 5.2 says that a BNRC is to be recommended whenever it exists. But in doing so a statistician may well face opposition from scientists reluctant to abandon the binary designs they have always used. Moreover, depending on p and q, the proportion $\lambda_4$ of information in the bottom stratum can be unacceptably small (a point made by Cheng (1986), for BIBRCs), so that some recovery of higher stratum information may be warranted. Now the choice of design is much less clear, and depends on the stratum variances, which will usually be unknown. Complicating the entire discussion are existence criteria for the two types of designs. Combinatorial considerations show that frequently BNRCs will exist when BIBRCs do not, and vice versa. So, for instance, if a BNRC is available and a BIBRC is not, the question of what is the best design for the full analysis does not have a straightforward answer as in Theorem 5.2. And more realistic, given the combinatorial conditions, is that for given v, b, p, and q, neither type of design will exist. In this regard, these variance balanced designs are simply the first, albeit important, steps in a relatively new area of inquiry for design theory, that offer insight into the properties of a good design when universal optimality cannot be achieved. Many more steps are to be taken before a reasonably comprehensive catalog of efficient nested row and column designs can be compiled. Sections 6 and 7 will summarize the known results for existence of BIBRCs and BNRCs, and will mention a few other known related designs.


Nested designs

6. Binary block nested row and column designs

6.1. BCBRCs and balanced lattice rectangles

The earliest of the systematically studied nested row and column designs are the lattice squares of Yates (1937, 1940). These are arrangements of $v = s^2$ treatments into s × s blocks so that each treatment occurs once in each block. A completely balanced lattice square is a set of b of these blocks with the property that the row component and column component block designs are each BIBDs. In a semibalanced lattice square, the 2bs row component and column component blocks together form a BIBD. Either type of balance makes the information matrix (10) completely symmetric, so that the completely balanced and semibalanced lattice squares are complete block versions of BIBRCs as defined in Section 5.2, what some authors refer to as balanced complete block designs with nested rows and columns (BCBRCs). For any prime power s, complete balance is achievable with s + 1 blocks, and if s is odd, semibalanced lattice squares exist with (s + 1)/2 blocks. Kempthorne (1952, Chapter 24) gives a simple construction, or alternatively, the orthogonal array construction for affine resolvable designs in Section 3.3 is easily modified for this situation. Tables of these designs may be found in Clem and Federer (1950) and Cochran and Cox (1992). The balanced square lattice displayed earlier in Example 1 is also an example of a completely balanced lattice square.

For non-square ($p \ne q$) blocks, the lattices which fall under the BCBRC heading are called balanced two-restrictional lattices or balanced lattice rectangles. Though a bit more complicated for rectangles than for squares, the case of prime powered number of treatments being divisible by the number of rows has been thoroughly covered. For the parameters

$v = s^m$,  $p = s^r$,  $q = s^c$,  $r + c = m$,  $r \ne c$

with s being a prime power, balance requires $(s^m - 1)/(s - 1)$ blocks. Designs can be constructed by confounding pseudofactors of the $s^m$ factorial set in rows and columns, the problem being in systematically selecting the confounding factors for each replicate. A method for doing so is given by Raktoe (1967), who explicitly displays a compact representation for each $s^m$ less than 1000 by means of cyclic collineations (see also Federer and Raktoe, 1965). Mazumdar (1967) gives an analytic solution to constructing the generator matrices of collineations for all values of s. Hedayat and Federer (1970) establish the connection between these designs and sets of mutually orthogonal Latin squares.

Other than the designs just described, very few BCBRCs are currently available. Series with blocks of size 2 × v/2, when v - 1 is a prime power, are given by Agrawal and Prasad (1984), along with a few individual plans. Designs with the same parameters and v ≤ 20 are tabled in Ipinyomi (1990). The efficiency factors displayed for BIBRCs in Section 5.3, and the efficiency comparisons with BNRCs, are all correct for BCBRCs as well.


6.2. BIBRCs Since the introduction of BIBRCs by Singh and Dey (1979), in excess of a dozen papers have appeared containing constructions for these designs. Nevertheless, progress on the construction problem cannot be said to be great, for other than the odd trial and error solution, most of these results are for prime power v, and many have large v and/or number of replicates. There is also considerable overlap among the results of different authors. As tables of designs are not yet available, the main results (without proof, other than specification of initial blocks where appropriate) will be reproduced here, with a discussion of how they include constructions given by others. Questions of isomorphism will not be addressed: concern is only with designs having the same parameters v, b, p, and q. Two basic recursive constructions are stated first. THEOREM 6.1. If a BCBRC(k, b*, p, q) or BIBRC(k, b*, p, q) is formed on the treatments of each block of a BIBD(v, b, k), the result is a BIBRC(v, bb*, p, q). Theorem 2 of Singh and Dey (1979) is Theorem 6.1 starting with a lattice square BCBRC. Theorem 6.1 starting with a BIBRC is Theorem 2 of Jimbo and Kuriki (1983); it appears again as Theorem 2.1 of Sreenath (1989). Theorem 5.1 of Agrawal and Prasad (1984) is another version that starts with a BCBRC. The second result recursively combines a BIBD with a BIBRC in a different way. It was the first general method to produce BIBRCs that do not, in the terminology of Section 5.3, have the "complete balance" property. THEOREM 6.2 (Cheng, 1986). Existence of a BCBRC(v, b*, p*, q) or a BIBRC(v, b*, p*, q), and of a BIBD(p*, b, k), implies the existence of a BIBRC(v, bb*, k, q). From each p* x q block of the given nested row and column design, one forms b new k x q blocks by retaining rows corresponding to treatments in the blocks of the BIBD; cf. Theorem 4.2. 
Cheng (1986) further shows that if the given BIBRC is generally balanced, then so too is the resulting design, though the new design need not be balanced in any of the three component (row, column, block) designs. Many such series are listed there. As an illustration, combining a BCBRC(9, 2, 3, 3) with a BIBD(3, 3, 2) gives the design of Example 3, which is completely balanced.

EXAMPLE 3. A BIBRC(9, 6, 2, 3). Rows within blocks are a NBIBD(9, 6, 2, 3). Columns within blocks are a NBIBD(9, 6, 3, 2).

147   147   258   159   159   672
258   369   369   672   834   834
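The balance of Example 3 can be confirmed numerically. This is a hedged sketch rather than the author's method: it assumes the standard bottom stratum information matrix $C = rI - \frac{1}{q}N_2N_2' - \frac{1}{p}N_3N_3' + \frac{1}{pq}N_1N_1'$ (incidence matrices for blocks, rows and columns labelled $N_1$, $N_2$, $N_3$ by assumption) and checks that it is completely symmetric with common nonzero eigenvalue b(p - 1)(q - 1)/(v - 1). The helper name `incidence` is hypothetical.

```python
import numpy as np

# Blocks of Example 3: six 2 x 3 arrays on treatments 1..9.
blocks = [
    [[1, 4, 7], [2, 5, 8]],
    [[1, 4, 7], [3, 6, 9]],
    [[2, 5, 8], [3, 6, 9]],
    [[1, 5, 9], [6, 7, 2]],
    [[1, 5, 9], [8, 3, 4]],
    [[6, 7, 2], [8, 3, 4]],
]
v, p, q, b = 9, 2, 3, len(blocks)


def incidence(groups, v):
    """Treatment-by-group incidence matrix (counts) for a list of treatment lists."""
    N = np.zeros((v, len(groups)))
    for j, group in enumerate(groups):
        for t in group:
            N[t - 1, j] += 1
    return N


N1 = incidence([[t for row in blk for t in row] for blk in blocks], v)  # blocks
N2 = incidence([row for blk in blocks for row in blk], v)               # rows
N3 = incidence([col for blk in blocks for col in zip(*blk)], v)         # columns

r = b * p * q / v
C = r * np.eye(v) - N2 @ N2.T / q - N3 @ N3.T / p + N1 @ N1.T / (p * q)

# A completely symmetric C equals theta * (I - J/v) for a single constant theta.
theta = b * (p - 1) * (q - 1) / (v - 1)
completely_symmetric = np.allclose(C, theta * (np.eye(v) - np.ones((v, v)) / v))
print(theta, completely_symmetric)
```

Here theta evaluates to 1.5, so under these assumptions every normalized treatment contrast is estimated with the same bottom stratum information.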

Three more series constructions for BIBRCs will be given, all based on the method of differences using finite fields. THEOREM 6.3 (Jimbo and Kuriki, 1983). Let v = mklk2 + 1 be a prime power, where hi and k2 are relatively prime, and let s and t be positive integers satisfying st ~ m.

Write $L_{s \times t} = (x^{i+j-2})_{i,j}$ and $M_{k_1 \times k_2} = (x^{[(i-1)k_2 + (j-1)k_1]m})_{i,j}$. Then the initial blocks

$B_j = x^{j-1} L \otimes M$,   j = 1, 2, ..., m,

generate a BIBRC(v, mv, $sk_1$, $tk_2$). If m is even and $k_1k_2$ is odd, then $B_1, \ldots, B_{m/2}$ generate a BIBRC(v, mv/2, $sk_1$, $tk_2$). Each of the component designs of Theorem 6.3 is a BIBD (cf. Theorem 4.4), so that these BIBRCs are completely balanced. As special cases of Theorem 6.3, many series of BIBRCs with parameters matching designs appearing elsewhere in the literature can be obtained, from which it is evident that this result has been overlooked by several authors (including this author in a joint paper, as will soon become apparent). Theorem 6.3 also pulls together a number of results appearing concurrent or prior to its publication. The designs with s = t = 1 are also found in Theorems 1 and 2 of Agrawal and Prasad (1982a), the first of these appearing again in Theorem 2.3 of Saha and Mitra (1992). Putting $k_1$ = t = 1 gives Theorems 1 and 2 of Uddin and Morgan (1991) and Theorem 6 of Street (1981), the latter of which is restricted to s [...] 1 is odd and write $x^{u_i} = 1 - x^{4mi}$. (a) If $u_i - u_j \not\equiv m \pmod{2m}$ for i, j = 1, ..., (t - 1)/2, then there is a BIBRC(v, mv, t, t). (b) If in addition to the condition in (a), $u_i \not\equiv m \pmod{2m}$ for i = 1, ..., (t - 1)/2, then there is a BIBRC(v, mv, t + 1, t + 1). The initial blocks for Theorem 6.5 also have the form $B, xB, \ldots, x^{m-1}B$. For (a), B is the addition table with row margin $(x^0, x^{4m}, \ldots, x^{4(t-1)m})$ and column margin $(x^m, x^{5m}, \ldots, x^{[4(t-1)+1]m})$, and for (b), each margin also includes 0. As an example, t = 3 gives blocks of sizes 3 × 3 and 4 × 4 with v = 12m + 1 a prime power (m ≥ 2 is required for the 4 × 4's) and replications 3(v - 1)/4 and 4(v - 1)/3, respectively. With t = 5, blocks of size 5 × 5 and 6 × 6 for prime power v = 20m + 1 and replications 5(v - 1)/4 and 9(v - 1)/5 are obtained; the 5 × 5's are for all m ≥ 2, while for v < 500 the conditions for the 6 × 6's fail for v = 41 and 61.
Other corollaries to Theorems 6.4 and 6.5 may be found in Uddin and Morgan (1990), along with three other theorems that use the addition table approach. Designs from these three other theorems have the complete balance property, and hence more replicates when they overlap for given v, p, and q, than the designs from Theorems 6.4 and 6.5, which are semibalanced in the sense of lattice squares. Uddin and Morgan (1990) tabulate all of their designs for v no greater than 100 and p and q greater than 2. One other direct construction gives relatively large replication numbers but is of interest in producing some designs that cannot be found by the methods so far listed: see Theorem 6 of Agrawal and Prasad (1983). Uddin (1992, 1995) has developed methods for modifying known initial blocks for BIBRCs by inclusion of additional fixed elements, and by recursively combining initial blocks for two different BIBRCs. The existence of all BIBRC(v, b, 2, 2)'s satisfying the necessary condition v(v - 1) | 4b has been independently established by Kageyama and Miao (1996) and Srivastav and Morgan (1996); the latter pair of authors construct all of the designs to have general balance.

6.3. Other binary block nested row and column designs There has been some, albeit considerably less, activity devoted to finding binary block nested row and column designs that trade off full (bottom stratum) balance in exchange for fewer blocks. The concept of partial balance for this setting has been explored by Street (1981), Agrawal and Prasad (1982b, c, 1984), Morgan and Uddin (1990), Sinha and Kageyama (1990), and Gupta and Singh (1991). Closely related are the generalized cyclic row-column designs of Ipinyomi and John (1985). Presumably these designs will be inefficient for the bottom stratum analysis, but it is not clear to what extent this will be so, and some may be of value for the full analysis.


For lattice squares and rectangles, Federer and Raktoe (1965) recommend using a subset of blocks from a balanced design, chosen so that "all the pseudo-effects should be confounded as equal a number of times in rows as possible and as equal a number of times in columns as possible," but the efficacy of this tact has not been demonstrated. Analysis of these designs is covered by Cornelius (1983). For the non-prime power case, the use of pseudofactors (e.g., Kempthorne, 1952, Chapter 24; Monod and Bailey, 1992, Section 5.3) provides a method for constructing large classes of complete block nested row and column designs that are relatively amenable to analysis. A few cyclic constructions for designs with complete blocks may be found in Ipinyomi and John (1985). Nested row and column designs with complete blocks are sometimes called resolvable row-column designs. If just two complete blocks are to be used, a catalog of 20 designs for up to 100 varieties is available in Patterson and Robinson (1989) (details of construction are explained in John and Whitaker (1993)). A construction which produces many of their designs may be found in Bailey and Patterson (1991), using the ideas of contraction employed by Williams et al. (1976) for two-replicate resolvable designs as described in Section 3.5. The method starts with a row-column design for two non-interacting sets of treatments (e.g., Preece, 1982), and if the starting design is optimal, the resulting nested design is optimal over all two-block nested row and column designs which are adjusted orthogonal (see Eccleston and John, 1988), and also over all designs in the resolvable class. Bailey and Patterson (1991) are able to reproduce some designs of Patterson and Robinson (1989), and find some designs not listed by these authors. A different approach to the problem of finding efficient, resolvable row-column designs is taken by John and Whitaker (1993) and Nguyen and Williams (1993). 
Both sets of authors describe search algorithms for non-exhaustive siftings through the large number of possible designs, concentrating on the two-replicate case. Though not flawless, both algorithms are able to construct some of the designs discussed in the preceding paragraph, and in some instances improve on those designs (in some other instances falling short). John and Whitaker's (1993) is a simulated annealing algorithm; Nguyen and Williams' (1993) is an iterative improvement algorithm. Each is available from its respective authors. Very recently there has been work, other than that implicit in lattice rectangles, on finding nested row and column designs for factorial experiments by sacrificing information on some interactions. Gupta (1994) uses the techniques of classical confounding to construct single-replicate nested row and column designs. Morgan and Uddin (1993b) obtain main effects plans by superimposing BIBRCs in the manner of orthogonal Latin squares. In Morgan and Uddin (1996a), the superimposition method produces optimal main effects plans for the analysis with recovery of row and column information. Some generalizations of potential interest are the extension of the BIBRC concept to generalized binary blocks (Panandikar, 1984), supplemented balance for comparing test treatments with a control (Gupta and Kageyama, 1991), and neighbor balancing in nested row and column designs (Ipinyomi and Freeman, 1988; Uddin, 1990; Uddin and Morgan, 1995).


7. Nested row and column designs optimal for the bottom stratum analysis

7.1. BNRCs

With the first papers having appeared only recently (Bagchi et al., 1990; Chang and Notz, 1990), the literature on this topic is not nearly so well developed as is that for BIBRCs and other binary block nested row and column designs. This section will draw together the currently known results, showing how they relate to one another and to available BIBRCs. One of the main themes is the close connection between BNRCs and the regular generalized Youden designs (GYDs). A regular GYD(v, p, q) is a p × q row-column design for which columns are a BBD, and for which each treatment occurs q/v times in each row. Obviously a regular GYD(v, p, q) is a BNRC(v, 1, p, q). Put another way, BNRCs are a generalization of regular GYDs that allows for smaller block sizes and more than one block. With the regular GYDs included in the BNRC family, most of the published constructions can be cast as methods of combining BNRCs with themselves or with BIBDs. These will be shown first.

THEOREM 7.1. Construction of a BNRC(k, $b_1$, p, q) on the treatments of each block of a BIBD(v, $b_2$, k) produces a BNRC(v, $b_1b_2$, p, q).

Theorem 7.1 as stated appears as Theorem 2 of Morgan and Uddin (1993a). Particular cases are Theorems 3.2.1 and 3.2.4 of Bagchi et al. (1990), and Theorems 1 and 4 of Gupta (1992). If it is demanded that the BNRC(k, $b_1$, p, q) have p = k and q = sk for some s, then any BBD(v, $b_2$, k) can be used (the restriction to k < v, inherent in a BIBD, is dropped), slightly generalizing Theorem 3.1 of Chang and Notz (1990). With the pragmatic requirement of keeping $b_1b_2$ small, the best use of Theorem 7.1 will be with $b_1$ of 1, that is, forming a regular GYD on each block of a BIBD.

COROLLARY 7.2. There is a BNRC(v, v(v - 1)/2, 2, 2) for every v ≥ 2.

COROLLARY 7.3. There is a BNRC(v, b, 2, 3) and a BNRC(v, b, 3, 3) for every v ≥ 3, where

b = v(v - 1)/6,   if v ≡ 1 or 3 (mod 6)
b = v(v - 1)/3,   if v ≡ 0 or 4 (mod 6)
b = v(v - 1)/2,   if v ≡ 5 (mod 6)
b = v(v - 1),     if v ≡ 2 (mod 6).

Corollary 7.2 uses the Latin square on two treatments with all unordered pairs of v treatments to solve the construction of 2 × 2 BNRCs. Corollary 7.3 takes advantage of the fact that for k = 3 the necessary conditions for the existence of a balanced incomplete block design are also sufficient. The wealth of literature on balanced incomplete block designs (an extensive table is given in Mathon and Rosa, 1990) and Youden designs (see Ash, 1981) makes it easy to write down any number of corollaries. Similar to Corollary 7.3 one can use the results of Hanani (1961, 1975) to construct BNRC(v, b, p, 4)'s for p = 3, 4 and any v and b satisfying v | 4b and v(v - 1) | 12b.
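The construction behind Corollary 7.2 is short enough to write out. The sketch below is illustrative only: it places the 2 × 2 Latin square on every unordered pair of treatments and then checks the result against the BNRC definition, using the binary form of condition (i) and the standard bottom stratum information matrix as assumptions. The function names are hypothetical.

```python
from itertools import combinations

import numpy as np


def corollary_7_2_blocks(v):
    """One 2 x 2 Latin square on each unordered pair of the v treatments."""
    return [[[a, c], [c, a]] for a, c in combinations(range(1, v + 1), 2)]


def incidence(groups, v):
    """Treatment-by-group incidence matrix (counts) for a list of treatment lists."""
    N = np.zeros((v, len(groups)))
    for j, group in enumerate(groups):
        for t in group:
            N[t - 1, j] += 1
    return N


def check_bnrc(blocks, v):
    """Return (condition (i) in binary form, condition (ii), eigenvalues of C)."""
    p, q, b = len(blocks[0]), len(blocks[0][0]), len(blocks)
    N1 = incidence([[t for row in blk for t in row] for blk in blocks], v)
    N2 = incidence([row for blk in blocks for row in blk], v)
    N3 = incidence([col for blk in blocks for col in zip(*blk)], v)
    conc = N3 @ N3.T
    off = conc[~np.eye(v, dtype=bool)]
    cond_i = bool(N3.max() == 1 and np.all(off == off[0]))
    cond_ii = all(sorted(row) == sorted(blk[0]) for blk in blocks for row in blk)
    r = b * p * q / v
    C = r * np.eye(v) - N2 @ N2.T / q - N3 @ N3.T / p + N1 @ N1.T / (p * q)
    return cond_i, cond_ii, np.sort(np.linalg.eigvalsh(C))


v = 5
blocks = corollary_7_2_blocks(v)
cond_i, cond_ii, eigenvalues = check_bnrc(blocks, v)
print(len(blocks), cond_i, cond_ii, np.round(eigenvalues, 6))
```

For v = 5 this gives v(v - 1)/2 = 10 blocks, and the nonzero eigenvalues agree with b(p - 1)q/(v - 1) = 5.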


These designs for small q are especially important in view of the relative inefficiency of BIBRCs of the same parameters.

THEOREM 7.4. Existence of a BNRC(v, b*, p*, q) and a BIBD(p*, b, k) implies the existence of a BNRC(v, bb*, k, q).

Theorem 7.4 is the method of Theorem 6.2 applied to a BNRC rather than a BIBRC, and generalizes Theorem 3 of Gupta (1992), which starts with a BNRC(v, 1, v, v) (i.e., a Latin square). When starting with a Latin square, the resulting BNRC(v, b, k, v) can be extended to a BNRC(v, b, k + 1, q) by adding the row (1, 2, ..., v) to each block, which is Theorem 2 of Gupta (1992).

THEOREM 7.5 (Morgan and Uddin, 1993a). The blocks of size p × ($q_1 + q_2$), found by connecting block j of a BNRC(v, b, p, $q_1$) to block j of a BNRC(v, b, p, $q_2$) for each j = 1, 2, ..., b, form a BNRC(v, b, p, $q_1 + q_2$).

Theorem 3.2.5 of Bagchi et al. (1990) is repeated application of Theorem 7.5 with $q_i \equiv q$. Theorem 3.2.6 (also compare Theorem 3.2.7) of Bagchi et al. (1990) is an application of Theorem 7.5 to designs from Theorems 7.1 and 7.4 (each starting with a Latin square).

THEOREM 7.6 (Morgan and Uddin, 1993a). Let s divide the number of blocks b in a BNRC(v, b, p, q). Then partitioning the blocks into b/s groups of s, and connecting the blocks in each group, gives a BNRC(v, b/s, p, sq).

Theorem 7.6 with s equal to b says that combining all blocks of a BNRC(v, b, p, q) gives a BNRC(v, 1, p, bq), that is, a regular GYD! In a sense then, the regular GYDs include all BNRCs, as the latter are carefully chosen partitions of columns of the former. It was stated in the first paragraph of this section that BNRCs are a generalization of GYDs for more than one block, but from a combinatorial perspective the opposite is true, for the BNRCs with more than one block are a combinatorially more restrictive class of designs. Every BNRC with more than one block yields a GYD, but the converse does not hold.
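The connection of blocks in Theorem 7.6 can be tried on the BNRC(4, 6, 2, 4) of Example 2. The sketch below is an illustration under assumptions, not a general proof: `connect` (a hypothetical name) joins consecutive groups of s blocks side by side, the column condition is tested only in its binary form, and the regular GYD property is checked as "each treatment occurs bq/v times in each row" of the single connected block.

```python
import numpy as np

# Blocks of Example 2 (a BNRC(4, 6, 2, 4)), treatments labelled 1..4.
blocks = [
    np.array([[1, 1, 2, 2], [2, 2, 1, 1]]),
    np.array([[3, 3, 4, 4], [4, 4, 3, 3]]),
] + [np.array([[1, 2, 3, 4], [3, 4, 2, 1]]) for _ in range(4)]
v = 4


def connect(blocks, s):
    """Join consecutive groups of s blocks side by side, as in Theorem 7.6."""
    return [np.hstack(blocks[i:i + s]) for i in range(0, len(blocks), s)]


def rows_are_permutations(blocks):
    """Condition (ii): each row of a block is a permutation of its first row."""
    return all(sorted(row) == sorted(blk[0].tolist())
               for blk in blocks for row in blk.tolist())


def columns_balanced(blocks, v):
    """Binary form of condition (i): binary columns with equal pairwise concurrences."""
    columns = [col.tolist() for blk in blocks for col in blk.T]
    N = np.zeros((v, len(columns)))
    for j, col in enumerate(columns):
        for t in col:
            N[t - 1, j] += 1
    conc = N @ N.T
    off = conc[~np.eye(v, dtype=bool)]
    return bool(N.max() == 1 and np.all(off == off[0]))


# s = 2: three blocks of size 2 x 8.
three_blocks = connect(blocks, 2)
still_bnrc = rows_are_permutations(three_blocks) and columns_balanced(three_blocks, v)

# s = b = 6: a single 2 x 24 array, which should be a regular GYD.
(gyd,) = connect(blocks, 6)
row_counts = [np.bincount(row, minlength=v + 1)[1:].tolist() for row in gyd]
print(still_bnrc, gyd.shape, row_counts)
```

With s = 6 every treatment appears 24/4 = 6 times in each of the two rows, consistent with the statement that connecting all blocks of a BNRC yields a regular GYD.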
Before moving on to direct constructions, two other methods that start with known designs need be mentioned. Chang and Notz (1994) establish that some number b of permutations of treatment symbols in any maximum trace p x q row-column design produces a BNRC(v, b, p, q), and give bounds for b depending on the starting structure. Morgan and Uddin (1993a) point out that a k-resolvable BIBD(v, b, k) gives a BNRC(v, b/v, k, v). The k-resolvable BIBDs include all those generated from difference sets and many from supplementary difference sets, providing numerous rich families. Theorem 7.1 can then be applied to produce designs with smaller pq relative to the number of treatments; details are in Morgan and Uddin (1993a).


The few direct constructions currently known for BNRCs are based on the method of differences using finite fields. The most far-reaching application of this approach is stated next.

THEOREM 7.7 (Bagchi et al., 1990; Morgan and Uddin, 1993a). Let v = mq + 1 be a prime power. The initial blocks

$B_i = x^{i-1} \begin{pmatrix} x^0 & x^m & \cdots & x^{(q-1)m} \\ x^m & x^{2m} & \cdots & x^0 \\ \vdots & \vdots & & \vdots \\ x^{(p-1)m} & x^{pm} & \cdots & x^{(p-2)m} \end{pmatrix}$,   i = 1, 2, ..., m,

generate a BNRC(v, mv, p, q) for each 2 [...] $\mu_{dn}$ are the eigenvalues of $M_d$. Note that D-, A- and E-criteria correspond to p = 0, -1 and $-\infty$, respectively. Earlier literature in optimal design often focuses on the minimization of some convex and nonincreasing function $\Phi$ of the information matrix, which is related to the minimization of dispersion matrices; for instance, the D-, A- and E-criteria minimize $\det(V_d)$, $\mathrm{tr}(V_d)$, and the maximum eigenvalue of $V_d$, respectively. Although Pukelsheim (1993, p. 156) argued that maximization of information matrices is a more

Ching-Shui Cheng


appropriate optimality concept, in this article we shall also refer to the minimization of $\Phi(M_d)$. We shall use $\Phi$ and $\phi$ to denote nonincreasing and nondecreasing functions of information matrices, respectively. When we say a design is $\Phi$- or $\phi$-optimal, we mean that it minimizes $\Phi(M_d)$ or maximizes $\phi(M_d)$, respectively.

EXAMPLE 2.1 (Chemical balance weighing designs). The weights of n objects are to be measured by using a chemical balance. Each observation measures the difference between the total weight of the objects put on the right pan and the total weight of those on the left pan. Suppose N observations are to be made. Then for each design, the (i, j)th entry of the design matrix $X_d$ is 1, -1 or 0 depending upon whether in the ith weighing, the jth object is put on the right pan, left pan or is not present. In this case, $\mathcal{D}$ consists of all the N × n (1, -1, 0)-matrices.

EXAMPLE 2.2 (Spring balance weighing designs). In the spring balance weighing design problem where each observation measures the total weight of the objects put on the scale, $\mathcal{D}$ consists of all the N × n matrices with entries equal to 0 or 1.

Now we consider the estimation of a subsystem of parameters. Sometimes one may be interested in estimating only part of the parameters since the other parameters are nuisance parameters. For instance, in the setting of block designs discussed below in Example 2.3, one is usually interested in estimating the treatment effects, not the block effects. Suppose

$y = X_{d1}\theta_1 + X_{d2}\theta_2 + \varepsilon$,   (2.1)

where $X_{d1}$ is N × s, $\theta_1$ is s × 1, and one is interested in estimating $\theta_1$ only. Let

$C_d = X_{d1}^T X_{d1} - X_{d1}^T X_{d2} (X_{d2}^T X_{d2})^{-} X_{d2}^T X_{d1}$,   (2.2)

$(X_{d2}^T X_{d2})^{-}$ being a generalized inverse of $X_{d2}^T X_{d2}$. Then all the s parameters in $\theta_1$ are estimable if and only if $C_d$ is nonsingular, and in this case the covariance matrix of the least squares estimator of $\theta_1$ is equal to $\sigma^2 C_d^{-1}$. We shall call $C_d$ the information matrix for estimating $\theta_1$, and the problem of maximizing $\phi(C_d)$ or minimizing $\Phi(C_d)$ can be similarly considered. Note that we can express $C_d$ as

$C_d = \tilde{X}_{d1}^T \tilde{X}_{d1}$   (2.3)

with

$\tilde{X}_{d1} = P_{R(X_{d2})^{\perp}} X_{d1}$,   (2.4)

where $P_{R(X_{d2})^{\perp}}$ is the orthogonal projection matrix onto the orthogonal complement of the range of $X_{d2}$. In many settings including, e.g., block designs, all the $C_d$'s are singular; so not all the parameters in $\theta_1$ are estimable. For instance, suppose

$1_N \in R(X_{d2})$,   (2.5)


Optimal design: Exact theory

where $1_N$ is the N × 1 vector of ones, and

$X_{d1}$ has constant row sums.   (2.6)

Then $\tilde{X}_{d1} 1_s = P_{R(X_{d2})^{\perp}} X_{d1} 1_s = 0$, which implies $C_d 1_s = 0$. Therefore a linear function $a^T\theta_1$ is estimable only if it is a contrast, i.e., $a^T 1_s = 0$. A design is called connected if rank($C_d$) = s - 1, which is equivalent to that all the contrasts of $\theta_1$ are estimable. As in Kiefer (1958), let $[P_1, \ldots, P_{s-1}, s^{-1/2} 1_s]$ be an orthogonal matrix, and $P = [P_1 \cdots P_{s-1}]^T$. Then $P\theta_1$ consists of a maximal system of normalized orthogonal contrasts. Let $V_d$ be the covariance matrix of the least squares estimator of $P\theta_1$ under a connected design d. Then D-, A- and E-optimal designs can be defined as those which minimize $\det(V_d)$, $\mathrm{tr}(V_d)$ and the maximum eigenvalue of $V_d$, respectively. Since $V_d = \sigma^2 (P C_d^{-} P^T)$, a D-, A- or E-optimal design minimizes $\prod_{i=1}^{s-1} \mu_{di}^{-1}$, $\sum_{i=1}^{s-1} \mu_{di}^{-1}$ or $\mu_{d,s-1}^{-1}$, respectively, where $\mu_{d1} \ge \cdots \ge \mu_{ds} = 0$ are the eigenvalues of $C_d$. Therefore all these criteria depend on $V_d$ through $C_d$. Rigorously speaking, the information matrix of a connected design for estimating $P\theta_1$ should be defined as $(P C_d^{-} P^T)^{-1}$, but the above discussion demonstrates that it is easier to work with $C_d$ directly. We shall continue to call $C_d$ the information matrix of d, and the optimality criteria are considered as functions of $C_d$. Note that if $X_{d1}$ is the (0, 1)-incidence matrix between the units and the levels of a certain factor, then (2.6) is satisfied if each unit is assigned exactly one level.

EXAMPLE 2.3 (Block designs). Suppose t treatments are to be compared on bk experimental units grouped into b blocks each of size k. Let $y_{ij}$ be the observation taken on unit j of the ith block, i = 1, ..., b, j = 1, ..., k. Assume the usual additive fixed-effects model in which $E(y_{ij}) = \alpha_{t(i,j)} + \beta_i$, where t(i, j) is the label of the treatment assigned to unit (i, j), $\alpha_1, \ldots, \alpha_t$ are the treatment effects and $\beta_1, \ldots, \beta_b$ are the block effects. Let $\theta_1 = (\alpha_1, \ldots, \alpha_t)^T$ and $\theta_2 = (\beta_1, \ldots, \beta_b)^T$. Here $\beta_1, \ldots, \beta_b$ are considered as nuisance parameters, and one is only interested in the treatment effects. Then $X_{d1}$, $X_{d2}$ in (2.1) are the unit-treatment and unit-block incidence matrices, respectively. Let $r_{di}$ be the number of replications of treatment i, and $n_{dij}$ be the number of times treatment i appears in the jth block. Then it is easy to see that $X_{d1}^T X_{d1}$ is the diagonal matrix $\mathrm{diag}(r_{d1}, \ldots, r_{dt})$ with the ith diagonal equal to $r_{di}$, $X_{d2}^T X_{d2} = kI_b$, and $X_{d1}^T X_{d2}$ is equal to the treatment-block incidence matrix $N_d$ with the (i, j)th entry $n_{dij}$. Thus, from (2.2), the information matrix for estimating the treatment effects is

$C_d = \mathrm{diag}(r_{d1}, \ldots, r_{dt}) - k^{-1} N_d N_d^T$,   (2.7)


which has zero row and column sums since (2.5) and (2.6) obviously hold. A connected design has $\operatorname{rank}(C_d) = t - 1$, and the commonly used optimality criteria are defined in terms of the $t - 1$ nonzero eigenvalues of $C_d$. It can be shown that the A-criterion is equivalent to the minimization of the average variance

$$\frac{1}{t(t-1)} \sum_{i \ne j} \operatorname{var}(\hat\alpha_i - \hat\alpha_j). \tag{2.8}$$
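As a concrete check of (2.7), the following minimal sketch builds $C_d$ for a small block design and confirms that its row sums vanish. The function name `information_matrix` and the three-block example design are assumptions made for illustration only; they do not come from the text.

```python
from fractions import Fraction


def information_matrix(blocks, t):
    """Return C_d = diag(r_d1, ..., r_dt) - (1/k) N_d N_d^T as in (2.7).

    blocks: list of equal-size tuples of treatment labels 0..t-1 (hypothetical input format).
    """
    k = len(blocks[0])
    assert all(len(b) == k for b in blocks), "all blocks must have size k"
    # N[i][u] = number of times treatment i appears in block u
    N = [[b.count(i) for b in blocks] for i in range(t)]
    r = [sum(row) for row in N]  # replication numbers r_di
    C = []
    for i in range(t):
        row = []
        for j in range(t):
            # (i, j) entry of N_d N_d^T
            concurrence = sum(N[i][u] * N[j][u] for u in range(len(blocks)))
            row.append((r[i] if i == j else 0) - Fraction(concurrence, k))
        C.append(row)
    return C


# Illustrative design (an assumption): t = 3 treatments in b = 3 blocks of size k = 2.
blocks = [(0, 1), (0, 2), (1, 2)]
C = information_matrix(blocks, 3)
row_sums = [sum(row) for row in C]
```

Exact rational arithmetic is used so that the zero row sums can be tested without rounding error.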

EXAMPLE 2.4 (Block designs for comparing test treatments with a control). In the block design problem of the previous example, the interest is focused on the estimation of a maximal system of normalized orthogonal contrasts; so all the treatments are considered equally important. Now suppose one of the treatments is a control or standard treatment, and the other treatments are test treatments. Then some contrasts may be more important than others. For example, suppose treatment 1 is the control, and we are interested in estimating the $t - 1$ comparisons $\alpha_1 - \alpha_2, \ldots, \alpha_1 - \alpha_t$ between the control and the test treatments. Bechhofer and Tamhane (1981) showed that the covariance matrix of the least squares estimators of $\alpha_1 - \alpha_2, \ldots, \alpha_1 - \alpha_t$ under a connected design $d$ is equal to $\sigma^2$ times the inverse of the $(t - 1) \times (t - 1)$ matrix $\tilde C_d$ obtained by deleting the first row and column of $C_d$. Therefore for control-test treatment comparison, the information matrix of a design $d$ is $\tilde C_d$. An A-optimal design, for example, minimizes $\sum_{i=1}^{t-1} \tilde\mu_{di}^{-1}$, where $\tilde\mu_{d1}, \ldots, \tilde\mu_{d,t-1}$ are the eigenvalues of $\tilde C_d$. Such a design minimizes [...]

[...] a design with $X^T X = NI$ in Example 3.1 exists only if $N$ is a multiple of 4. In this section, we present some results on optimal asymmetrical designs.

We shall restrict to the minimization of criteria of the form $\phi_f(M_d) = \sum_{i=1}^{n} f(\mu_{di})$, where $f$ is a convex and nonincreasing function, and $\mu_{d1} \ge \cdots \ge \mu_{dn}$ are the eigenvalues of $M_d$. This covers the $\phi_p$-criteria with $p > -\infty$ by taking

$$f(x) = \begin{cases} -x^p, & \text{for } 0 < p \le 1; \\ x^p, & \text{for } -\infty < p < 0; \\ -\log x, & \text{for } p = 0. \end{cases}$$
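The piecewise definition of $f$ can be sketched directly in code. The names `f_p` and `phi_f` below are illustrative assumptions; the sketch only evaluates $\sum_i f(\mu_i)$ for a given list of positive eigenvalues and does not construct any design.

```python
import math


def f_p(x, p):
    """The function f associated with the p-criterion, for x > 0 and p <= 1."""
    if p == 0:
        return -math.log(x)
    if 0 < p <= 1:
        return -(x ** p)
    if p < 0:
        return x ** p
    raise ValueError("p must satisfy p <= 1")


def phi_f(eigenvalues, p):
    """Sum of f_p over the (positive) eigenvalues; smaller is better."""
    return sum(f_p(mu, p) for mu in eigenvalues)


# p = -1 gives the A-criterion (sum of reciprocals); p = 0 gives the D-criterion.
a_value = phi_f([1.0, 2.0, 4.0], -1)
d_value = phi_f([1.0, 2.0, 4.0], 0)
# Two spectra with the same sum: the equal one has the smaller criterion value.
equal_better = phi_f([2.0, 2.0, 2.0], -1) < phi_f([1.0, 2.0, 3.0], -1)
```

The last comparison illustrates why, for a fixed trace, convexity of $f$ favours equal eigenvalues.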

Results on the $\phi_{-\infty}$-criterion can be obtained by using the fact that $\phi_{-\infty}$ is the limit of $\phi_p$ as $p \to -\infty$. As before, $\phi_f(C_d) = \sum_{i=1}^{s-1} f(\mu_{di})$ is defined in terms of the nonzero eigenvalues of $C_d$.

Consider the minimization of $\sum_{i=1}^{n} f(\mu_i)$ subject to the constraint $\sum_{i=1}^{n} \mu_i = A$, $\mu_i \ge 0$, for a fixed constant $A$. The convexity of $f$ implies that the minimum value is equal to $n f(A/n)$, attained at $\mu_1 = \cdots = \mu_n = A/n$. Since $f$ is nonincreasing, $n f(A/n)$ is nonincreasing in $A$. Therefore $\sum_{i=1}^{n} f(\mu_i)$ is minimized by the $(\mu_1, \ldots, \mu_n)$ with $\mu_1 = \cdots = \mu_n$ that maximizes $\sum_{i=1}^{n} \mu_i$. This is essentially the result of Theorems 3.1 and 3.2.

Theorem 3.1 or 3.2 fails to produce an optimal symmetric design when there is no design corresponding to the center of the simplex $\{(\mu_1, \ldots, \mu_n): \sum_{i=1}^{n} \mu_i = A,\ \mu_i \ge 0\}$ with the largest $A$. In this case, a design with $(\mu_{d1}, \ldots, \mu_{dn})$ close to the center of this simplex is expected to be highly efficient, if not optimal. This motivates the procedure of maximizing $\sum_{i=1}^{n} \mu_{di}$ first, and then minimizing the squared distance $\sum_{i=1}^{n} (\mu_{di} - \sum_{j=1}^{n} \mu_{dj}/n)^2$ among those designs which maximize $\sum_{i=1}^{n} \mu_{di}$. Since $\sum_{i=1}^{n} (\mu_{di} - \sum_{j=1}^{n} \mu_{dj}/n)^2 = \sum_{i=1}^{n} \mu_{di}^2 - (\sum_{i=1}^{n} \mu_{di})^2/n$, this is equivalent to maximizing $\sum_{i=1}^{n} \mu_{di}$ ($= \operatorname{tr}(M_d)$ or $\operatorname{tr}(C_d)$), and then minimizing $\sum_{i=1}^{n} \mu_{di}^2$ ($= \operatorname{tr}(M_d^2)$ or $\operatorname{tr}(C_d^2)$) among those which maximize $\sum_{i=1}^{n} \mu_{di}$. Such a design is called (M.S)-optimal. The main advantage of this procedure is that $\operatorname{tr}(M_d)$ and $\operatorname{tr}(M_d^2)$ (or $\operatorname{tr}(C_d)$ and $\operatorname{tr}(C_d^2)$), which are, respectively, the sum of the diagonal entries and the sum of squares of all the entries of $M_d$ (or $C_d$), are very easy to calculate and optimize. The (M.S)-criterion was originally proposed in the setting of block designs (Shah, 1960; Eccleston and Hedayat, 1974).

EXAMPLE 4.1 (Weighing designs, Example 3.1 continued).
In the chemical balance weighing design problem of Examples 2.1 and 3.1, since $\operatorname{tr}(X^T X)$ is maximized by any $X$ with all the entries equal to 1 or $-1$, an (M.S)-optimal design minimizes the sum of squares of the entries of $X^T X$ among the $N \times n$ $(1,-1)$-matrices $X$. When $N$ is odd, none of the off-diagonal entries of $X^T X$ can be zero. Therefore for $N \equiv 1$ or $3 \pmod 4$, a design with $X^T X = (N - 1)I + J$ or $(N + 1)I - J$, respectively, is (M.S)-optimal. When $N \equiv 2 \pmod 4$, to calculate $\operatorname{tr}(X^T X)^2$, without loss of generality, we may assume that the first $m$ columns of $X$ have even numbers of entries equal to 1, and the remaining columns have odd numbers of entries equal to 1. Then $X^T X$ can be partitioned as

$$\begin{bmatrix} A & B \\ B^T & C \end{bmatrix},$$

where all the entries of $A$ and $C$ are congruent to 2 (mod 4), and all the entries of $B$ are multiples of 4. For such an $X$, $\operatorname{tr}(X^T X)^2$ is minimized when $A = (N - 2)I_m + 2J_m$, $C = (N - 2)I_{n-m} + 2J_{n-m}$ and $B = 0$. Comparing different $m$'s, one concludes that if there exists an $X$ such that

$$X^T X = \begin{bmatrix} (N - 2)I_m + 2J_m & 0 \\ 0 & (N - 2)I_{n-m} + 2J_{n-m} \end{bmatrix}, \tag{4.1}$$

where $m = \operatorname{int}[n/2]$, then it is (M.S)-optimal.

EXAMPLE 4.2 (Incomplete block designs, Example 3.2 continued). From Example 3.2 it follows that when $k < t$, $\operatorname{tr}(C_d)$ is maximized by binary designs. For such designs, by (2.7), the $i$th diagonal entry of $C_d$ equals $k^{-1}(k - 1)r_{di}$, and the $(i,j)$th off-diagonal entry is equal to $-k^{-1}\lambda_{dij}$, where $\lambda_{dij}$ is the number of times the $i$th and $j$th treatments appear together in the same block. Since $\sum_{i=1}^{t} r_{di}$ and $\sum_{i \ne j} \lambda_{dij}$ are constants, $\operatorname{tr}(C_d^2)$ is minimized if

$$\text{all the } r_{di}\text{'s are equal} \tag{4.2}$$

and

$$\lambda_{dij} = \lambda \text{ or } \lambda + 1 \text{ for some } \lambda. \tag{4.3}$$
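Conditions (4.2) and (4.3), together with the two traces used by the (M.S)-criterion, are easy to check mechanically. The sketch below is illustrative only: the function names and the two four-treatment example designs are assumptions, and the trace formulas assume a binary design with equal block size $k$, so that the diagonal entries of $C_d$ are $(k-1)r_{di}/k$ and the off-diagonal entries are $-\lambda_{dij}/k$.

```python
from fractions import Fraction
from itertools import combinations


def replications_and_concurrences(blocks, t):
    """Return (r, lam): replication counts and pairwise concurrence counts."""
    r = [0] * t
    lam = {pair: 0 for pair in combinations(range(t), 2)}
    for block in blocks:
        for i in block:
            r[i] += 1
        for pair in combinations(sorted(block), 2):
            lam[pair] += 1
    return r, lam


def is_regular_graph_design(blocks, t):
    """Binary, equireplicate (4.2), and concurrences differing by at most one (4.3)."""
    if any(len(set(b)) != len(b) for b in blocks):
        return False  # not binary
    r, lam = replications_and_concurrences(blocks, t)
    return len(set(r)) == 1 and max(lam.values()) - min(lam.values()) <= 1


def trace_C_and_C2(blocks, t):
    """tr(C_d) and tr(C_d^2) for a binary design with equal block size k."""
    k = len(blocks[0])
    r, lam = replications_and_concurrences(blocks, t)
    tr_c = Fraction(sum((k - 1) * ri for ri in r), k)
    # sum of squares of all entries: diagonal terms plus each off-diagonal pair twice
    sum_sq = sum(((k - 1) * ri) ** 2 for ri in r) + 2 * sum(v * v for v in lam.values())
    return tr_c, Fraction(sum_sq, k * k)


# Two hypothetical designs with t = 4, b = 4, k = 2.
cycle = [(0, 1), (1, 2), (2, 3), (0, 3)]
doubled = [(0, 1), (0, 1), (2, 3), (2, 3)]
```

Both designs have the same $\operatorname{tr}(C_d)$, but only the first satisfies (4.2) and (4.3), and it has the smaller $\operatorname{tr}(C_d^2)$.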

Binary designs satisfying (4.2) and (4.3) are called regular graph designs by John and Mitchell (1977). We have just shown that such designs are (M.S)-optimal incomplete block designs.

The (M.S)-criterion is not really an optimality criterion. Rather, it is a procedure for quickly producing designs which are optimal or efficient with respect to other more meaningful criteria. Indeed, John and Mitchell (1977) conjectured the A-, D- and E-optimality of regular graph designs. To study the efficiency of (M.S)-optimal designs, we shall consider the following minimization problem:

$$\text{Minimize } \sum_{i=1}^{n} f(\mu_i) \quad \text{subject to } \sum_{i=1}^{n} \mu_i = A, \quad \sum_{i=1}^{n} \mu_i^2 = B, \quad \mu_i \ge 0. \tag{4.4}$$

Conniffe and Stone (1975) considered such a problem with $f(x) = x^{-1}$ to determine A-optimal block designs. Cheng (1978a) solved (4.4) for general $f$, providing a useful tool for proving the optimality of asymmetrical designs in various settings. Cheng (1978a) showed that if $f'$ is strictly concave and $f(0) = \lim_{x \to 0^+} f(x) = \infty$, then the solution of (4.4) is attained at a $(\mu_1, \ldots, \mu_n)$ with $\mu_1 \ge \mu_2 = \cdots = \mu_n$.


Denote $\mu_1$ and the common value of $\mu_2, \ldots, \mu_n$ by $\mu$ and $\mu'$, respectively. Then $\mu = \{A + [n(n-1)]^{1/2}P\}/n$ and $\mu' = \{A - [n/(n-1)]^{1/2}P\}/n$, where $P = [B - A^2/n]^{1/2}$. Therefore the minimum value of $\sum_{i=1}^{n} f(\mu_i)$ subject to $\sum_{i=1}^{n} \mu_i = A$, $\sum_{i=1}^{n} \mu_i^2 = B$, $\mu_i \ge 0$ is

$$f\big(\{A + [n(n-1)]^{1/2}P\}/n\big) + (n-1)\, f\big(\{A - [n/(n-1)]^{1/2}P\}/n\big). \tag{4.5}$$

The decreasing monotonicity and convexity of $f$, respectively, imply that (4.5) is a decreasing function of $A$ and an increasing function of $P$. This gives support to the (M.S)-criterion in that to minimize $\sum_{i=1}^{n} f(\mu_i)$, it is desirable to have $A$ as large as possible, and $P$ as small as possible. However, even though an (M.S)-optimal design has minimum $P$ among those with the same maximum $A$, it may not minimize $P$ among all the designs. Comparing (4.5) for different $A$ and $P$ values leads to the following result:

THEOREM 4.1 (Cheng, 1978a). Suppose there exists a design $d^*$ such that its information matrix $M_{d^*}$ has two distinct eigenvalues, both positive and the larger one having multiplicity one. If $d^*$ maximizes $\operatorname{tr}(M_d)$ and maximizes $\operatorname{tr}(M_d) - [n/(n-1)]^{1/2}\,[\operatorname{tr}(M_d^2) - (\operatorname{tr}(M_d))^2/n]^{1/2}$ over $\mathcal{D}$, then it is $\phi_f$-optimal over $\mathcal{D}$ for any convex and nonincreasing $f$ such that $f'$ is strictly concave and $f(0) = \lim_{x \to 0^+} f(x) = \infty$.
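The two-point solution behind (4.5) can be checked numerically. The sketch below is an illustration under stated assumptions: the function names are hypothetical, and the values $A = 6$, $B = 14$, $n = 3$ are chosen only because they are the sum and sum of squares of the spectrum $(1, 2, 3)$. With $f(x) = x^{-1}$, which satisfies the conditions on $f$, the value (4.5) should not exceed $\sum f(\mu_i)$ for that spectrum.

```python
import math


def two_point_solution(A, B, n):
    """Return (mu, mu_prime) with mu once and mu_prime repeated n - 1 times."""
    P = math.sqrt(B - A * A / n)
    mu = (A + math.sqrt(n * (n - 1)) * P) / n
    mu_prime = (A - math.sqrt(n / (n - 1)) * P) / n
    return mu, mu_prime


def value_4_5(f, A, B, n):
    """Evaluate expression (4.5) for a given function f."""
    mu, mu_prime = two_point_solution(A, B, n)
    return f(mu) + (n - 1) * f(mu_prime)


def reciprocal(x):
    """f(x) = 1/x, the A-criterion choice."""
    return 1.0 / x


A, B, n = 6.0, 14.0, 3  # sum and sum of squares of the spectrum (1, 2, 3)
mu, mu_prime = two_point_solution(A, B, n)
lower_bound = value_4_5(reciprocal, A, B, n)
actual = sum(reciprocal(x) for x in (1.0, 2.0, 3.0))
```

Note that $n\mu'$ equals $A - [n/(n-1)]^{1/2}P$, which is the quantity maximized in Theorem 4.1 when $A = \operatorname{tr}(M_d)$ and $B = \operatorname{tr}(M_d^2)$.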

PROOF. It is sufficient to show that if $A_1 \ge A_2$ and $A_1 - [n/(n-1)]^{1/2}P_1 \ge A_2 - [n/(n-1)]^{1/2}P_2$, then

$$f\big(\{A_1 + [n(n-1)]^{1/2}P_1\}/n\big) + (n-1)\, f\big(\{A_1 - [n/(n-1)]^{1/2}P_1\}/n\big) \le f\big(\{A_2 + [n(n-1)]^{1/2}P_2\}/n\big) + (n-1)\, f\big(\{A_2 - [n/(n-1)]^{1/2}P_2\}/n\big).$$

If $P_1 \le P_2$, this follows from the fact that (4.5) is decreasing in $A$ and increasing in $P$. It also holds when $P_1 > P_2$, since in this case, $A_1 + [n(n-1)]^{1/2}P_1 > A_2 + [n(n-1)]^{1/2}P_2$ and $A_1 - [n/(n-1)]^{1/2}P_1 \ge A_2 - [n/(n-1)]^{1/2}P_2$, and $f$ is nonincreasing. □

COROLLARY 4.2. Suppose $\operatorname{tr}(M_d)$ is a constant for all the designs in $\mathcal{D}$. If there exists a design $d^*$ such that its information matrix $M_{d^*}$ has two distinct eigenvalues, both positive and the larger one having multiplicity one, and $d^*$ minimizes $\operatorname{tr}(M_d^2)$ over $\mathcal{D}$, then it is $\phi_f$-optimal over $\mathcal{D}$ for any convex $f$ such that $f'$ is strictly concave and $f(0) = \lim_{x \to 0^+} f(x) = \infty$.

Note that in Corollary 4.2, $f$ is not required to be nonincreasing, since $A$ is a constant.

EXAMPLE 4.3 (Weighing designs, Example 4.1 continued). Temporarily restrict attention to the chemical balance weighing designs whose design matrices $X_d$ have no zero entries. In particular, this covers the application to $2^n$ fractional factorial designs. In this case, $\operatorname{tr}(X_d^T X_d)$ is a constant. Suppose $N \equiv 1 \pmod 4$ and an $X_{d^*}$ with $X_{d^*}^T X_{d^*} = (N - 1)I + J$ exists. It was shown in Example 4.1 that $d^*$ minimizes $\operatorname{tr}(M_d^2)$. Since $X_{d^*}^T X_{d^*}$ has two distinct eigenvalues with the larger one having multiplicity one, it follows from Corollary 4.2 that $d^*$ is $\phi_f$-optimal over all the $N \times n$ $(1,-1)$-matrices for any convex $f$ such that $f'$ is strictly concave and $f(0) = \lim_{x \to 0^+} f(x) = \infty$ (Cheng, 1980a). In particular, it is A-, D- and E-optimal.

When $N \equiv 2 \pmod 4$, as in Example 4.1, for any $(1,-1)$-matrix $X_d$, $X_d^T X_d$ can be partitioned as

$$\begin{bmatrix} A & B \\ B^T & C \end{bmatrix},$$

where all the entries of $A$ and $C$ are congruent to 2 (mod 4), and all the entries of $B$ are multiples of 4. By a result of Fan (1954), the vector consisting of the eigenvalues of

$$\begin{bmatrix} A & B \\ B^T & C \end{bmatrix}$$

majorizes that of

$$\begin{bmatrix} A & 0 \\ 0 & C \end{bmatrix}.$$

Therefore $\sum_{i=1}^{n} f(\mu_{di}) \ge \sum_{i=1}^{m} f(\nu_i) + \sum_{i=1}^{n-m} f(\delta_i)$ for all convex $f$, where $\nu_1, \ldots, \nu_m$ are the eigenvalues of $A$, and $\delta_1, \ldots, \delta_{n-m}$ are the eigenvalues of $C$. Since all the entries of $A$ and $C$ are congruent to 2 (mod 4), the same argument as in the $N \equiv 1 \pmod 4$ case shows that $\sum_{i=1}^{m} f(\nu_i) \ge \sum_{i=1}^{m} f(\tilde\nu_i)$ and $\sum_{i=1}^{n-m} f(\delta_i) \ge \sum_{i=1}^{n-m} f(\tilde\delta_i)$ for any convex $f$ such that $f'$ is strictly concave and $f(0) = \lim_{x \to 0^+} f(x) = \infty$, where $\tilde\nu_1, \ldots, \tilde\nu_m$ are the eigenvalues of $(N - 2)I_m + 2J_m$, and $\tilde\delta_1, \ldots, \tilde\delta_{n-m}$ are the eigenvalues of $(N - 2)I_{n-m} + 2J_{n-m}$. The problem is now reduced to the comparison of matrices of the form (4.1) with different $m$'s. It is easy to see that the best choice of $m$ is $m = \operatorname{int}[n/2]$. Therefore if there exists an $X_{d^*}$ such that $X_{d^*}^T X_{d^*}$ is as in (4.1), where $m = \operatorname{int}[n/2]$, then it is $\phi_f$-optimal over all the $N \times n$ $(1,-1)$-matrices for any convex $f$ such that $f'$ is strictly concave and $f(0) = \lim_{x \to 0^+} f(x) = \infty$ (Jacroux et al., 1983).

If zero entries are allowed in $X_d$, then $\operatorname{tr}(M_d)$ is no longer a constant. In the $N \equiv 1 \pmod 4$ case, a more delicate analysis shows that a design $d^*$ such that $X_{d^*}^T X_{d^*} = (N - 1)I + J$ maximizes $\operatorname{tr}(M_d) - [n/(n-1)]^{1/2}\,[\operatorname{tr}(M_d^2) - (\operatorname{tr}(M_d))^2/n]^{1/2}$. Therefore by Theorem 4.1, it is $\phi_f$-optimal over all the $N \times n$ $(0, 1, -1)$-matrices for any convex and nonincreasing $f$ such that $f'$ is strictly concave and $f(0) = \lim_{x \to 0^+} f(x) = \infty$ (Cheng, 1980a). Again this includes A-, D- and E-optimality. Such an extension covering design matrices with zero entries does not hold for the $N \equiv 2 \pmod 4$ case; see Example 4.6 and the discussion at the end of Section 6.1.

EXAMPLE 4.4 (Incomplete block designs, Example 4.2 continued). We define a group-divisible design to be a binary equireplicate design in which the treatments can be divided into groups of equal size such that any two treatments in the same group appear together in the same number of blocks, say $\lambda_1$ blocks, and those in different groups also appear together in the same number of blocks, say $\lambda_2$ blocks. Suppose there exists a group-divisible design $d^*$ with two groups and $\lambda_2 = \lambda_1 + 1$. Then it is easy to see that $C_{d^*}$ has two distinct nonzero eigenvalues with the larger one having multiplicity one. It can be shown that $d^*$ has the maximum value of $\operatorname{tr}(C_d) - [(t-1)/(t-2)]^{1/2}\,[\operatorname{tr}(C_d^2) - (\operatorname{tr}(C_d))^2/(t-1)]^{1/2}$. Therefore by the version of Theorem 4.1 for the case where the information matrices have zero row sums, $d^*$ is $\phi_f$-optimal over all the designs with the same values of $t$, $b$ and $k$, for any convex and nonincreasing $f$ such that $f'$ is strictly concave and $f(0) = \lim_{x \to 0^+} f(x) = \infty$ (Cheng, 1978a).

Roy and Shah (1984) also used Theorem 4.1 to prove the optimality of a class of minimal covering designs. In a covering design, each pair of treatments appears together in at least one block. A covering design with the smallest number of blocks is called a minimal covering design. Roy and Shah (1984) showed that when $t \equiv 5 \pmod 6$, a minimal covering design in blocks of size 3 satisfies all the conditions in Theorem 4.1, and therefore is optimal.

By refining the proof of Theorem 4.1, one can derive the following modification of Corollary 4.2, which is applicable to the case where the information matrix has two distinct eigenvalues, but the larger one does not have multiplicity one.

THEOREM 4.3 (Cheng, 1981b; Cheng and Bailey, 1991). Suppose $\operatorname{tr}(M_d)$ is a constant for all the designs in $\mathcal{D}$. If there exists a design $d^*$ such that its information matrix $M_{d^*}$ has two distinct eigenvalues, both being positive, and $d^*$ minimizes $\operatorname{tr}(M_d^2)$ and maximizes the maximum eigenvalue of $M_d$ over $\mathcal{D}$, then it is $\phi_f$-optimal over $\mathcal{D}$ for any convex $f$ such that $f'$ is strictly concave and $f(0) = \lim_{x \to 0^+} f(x) = \infty$.
EXAMPLE 4.5 (Incomplete block designs, Example 4.4 continued). The optimality of a group-divisible design with two groups and $\lambda_2 = \lambda_1 + 1$ is established in Example 4.4. Can the restriction to two groups be removed? The information matrix of a group-divisible design has two nonzero eigenvalues, but the larger one has multiplicity 1 only when the number of groups is two. Therefore Theorem 4.1 is not applicable when there are more than two groups. However, there are indications that, in general, group-divisible designs with $\lambda_2 = \lambda_1 + 1$ have strong optimality properties. For example, their E-optimality was obtained by Takeuchi (1961, 1963); see Section 6.1. (In fact, this was the first result on the optimality of asymmetrical designs.) It seems difficult to show that these designs enjoy the same optimality property as those with only two groups, but the following partial result can be obtained.

Let $\mathcal{D}$ be the set of all the regular graph designs with $t$ treatments and $b$ blocks of size $k$. Then both $\operatorname{tr}(C_d)$ and $\operatorname{tr}(C_d^2)$ are constants for all $d \in \mathcal{D}$. Suppose $\mathcal{D}$ contains a group-divisible design $d^*$ with $g$ groups and $\lambda_2 = \lambda_1 + 1$. Let $r = bk/t$. Then the $t$ eigenvalues of $kC_{d^*} - [r(k-1) + \lambda_1]I_t + \lambda_2 J_t$ are $k\mu_{d^*1} - [r(k-1) + \lambda_1] \ge \cdots \ge k\mu_{d^*,t-1} - [r(k-1) + \lambda_1]$ and $(t-1)\lambda_2 - r(k-1) + 1$. On the other hand

$$kC_{d^*} - [r(k-1) + \lambda_1]I_t + \lambda_2 J_t = \begin{bmatrix} J_{t/g} & & 0 \\ & \ddots & \\ 0 & & J_{t/g} \end{bmatrix}, \tag{4.6}$$


where the diagonal blocks are matrices of 1's, and the off-diagonal blocks are zero matrices. Comparing the eigenvalues of both sides of (4.6), we have $k\mu_{d^*1} - [r(k-1) + \lambda_1] = t/g$. On the other hand, for any regular graph design $d$ in $\mathcal{D}$, the largest eigenvalue of $kC_d - [r(k-1) + \lambda_1]I_t + \lambda_2 J_t$ is equal to $t/g$, since it has (0,1)-entries with constant row sum $t/g$. From this it follows that $\mu_{d1} \le \mu_{d^*1}$. Therefore $d^*$ maximizes the largest eigenvalue of $C_d$. It now follows from Theorem 4.3 that $d^*$ is $\phi_f$-optimal over $\mathcal{D}$ for any convex $f$ such that $f'$ is strictly concave and $f(0) = \lim_{x \to 0^+} f(x) = \infty$. This shows the optimality of group-divisible designs with $\lambda_2 = \lambda_1 + 1$ over the regular graph designs (Cheng, 1981b). Whether it is also optimal over non-regular graph designs is a challenging problem. Note that the result in Cheng (1981b) was stated in terms of the maximization of the total number of spanning trees in a multigraph, which is closely related to the determination of D-optimal block designs.

Group-divisible designs are not the only designs with two distinct values among the $t - 1$ nontrivial eigenvalues $\mu_{d1} \ge \cdots \ge \mu_{d,t-1}$ of $C_d$. In general, any partially balanced incomplete block design with two associate classes (PBIBD(2)) has this eigenvalue structure. We refer the readers to Raghavarao (1971) for the definition of partially balanced incomplete block designs. We shall call a regular graph design which is also a PBIBD(2), i.e., a PBIBD(2) with $\lambda_2 = \lambda_1 + 1$ or $\lambda_2 = \lambda_1 - 1$, a strongly regular graph design. As another application of Theorem 4.3, we shall derive the optimality of certain strongly regular graph designs.

Let $\mathcal{D}$ be the set of all the equireplicate binary designs with $t$ treatments and $b$ blocks of size $k$. Suppose $\mathcal{D}$ contains a strongly regular graph design $d^*$ whose concurrence matrix $N_{d^*} N_{d^*}^T$ is singular. For any design $d$ in $\mathcal{D}$, the largest eigenvalue of $C_d = rI - k^{-1} N_d N_d^T$ is at most $r$. On the other hand, since $N_{d^*} N_{d^*}^T$ is singular, the largest eigenvalue of $C_{d^*}$ is equal to $r$. Therefore $d^*$ maximizes the largest eigenvalue of $C_d$ over $\mathcal{D}$. Since all the designs in $\mathcal{D}$ are binary, $\operatorname{tr}(C_d)$ is a constant. Furthermore, being a regular graph design, $d^*$ minimizes $\operatorname{tr}(C_d^2)$. Hence all the conditions in Theorem 4.3 are satisfied, and we have shown that a strongly regular graph design with a singular concurrence matrix is $\phi_f$-optimal over the equireplicate binary designs for any convex $f$ such that $f'$ is strictly concave and $f(0) = \lim_{x \to 0^+} f(x) = \infty$ (Cheng and Bailey, 1991).

Examples of strongly regular graph designs with singular concurrence matrices are: (i) all the PBIBD(2) with $\lambda_2 = \lambda_1 + 1$ and $b < t$; (ii) all the resolvable PBIBD(2) with $\lambda_2 = \lambda_1 + 1$ and $b < t + r - 1$; (iii) all the partial geometries; (iv) all the singular group-divisible designs with $\lambda_2 = \lambda_1 - 1$; (v) all the semiregular group-divisible designs with $\lambda_2 = \lambda_1 + 1$. See Raghavarao (1971) for definitions and examples of these designs. It is interesting to see whether similar optimality properties can be extended to other strongly regular graph designs. Some kind of condition is needed, but perhaps the singularity of the concurrence matrix is a bit too strong.

Theorem 4.3 can be proved by modifying (4.4) to the problem of minimizing $\sum_{i=1}^{n} f(\mu_i)$ subject to $\sum_{i=1}^{n} \mu_i = A$, $\sum_{i=1}^{n} \mu_i^2 = B$, $\mu_i \ge 0$ and $\mu_i \le C$, where $C$ is taken to be the maximum largest eigenvalue. This is useful when an upper bound on the largest eigenvalue of the information matrix is available. Jacroux (1985) modified (4.4) to the minimization of $\sum_{i=1}^{n} f(\mu_i)$ subject to $\sum_{i=1}^{n} \mu_i = A$, $\sum_{i=1}^{n} \mu_i^2 = B$, $\mu_i \ge 0$ and $\min$ [...]

[...] 0 for even $j$. Therefore sequentially minimizing $(-1)^h \operatorname{tr}(C^h)$ is equivalent to sequentially minimizing the number of circuits of length $h$ in the treatment concurrence graph of $d$.

EXAMPLE 4.6 (Chemical balance weighing designs, Example 4.3 continued).
It was shown in Example 4.3 that when $N \equiv 2 \pmod 4$, an $X_{d^*}$ such that $X_{d^*}^T X_{d^*}$ is of the form (4.1) with $m = \operatorname{int}[n/2]$ has strong optimality properties over the $N \times n$ $(1,-1)$-matrices. The result, however, does not hold when zero entries are allowed in the design matrices. It can be shown that D-optimality over $(1,-1,0)$-matrices is always attained by a matrix without zero entries (Galil and Kiefer, 1980); so $X_{d^*}$ is D-optimal. But one cannot draw the same conclusion for the A- and E-criteria. In fact, if there exists a $(1,-1,0)$-matrix $X_d$ such that $X_d^T X_d = (N - 1)I$, then it is better than $X_{d^*}$ under the E-criterion. (In fact, $X_d$ is E-optimal, see Section 6.1.) Since $\phi_{-\infty} = \lim_{p \to -\infty} \phi_p$, $X_{d^*}$ cannot be $\phi_p$-optimal for $p$ close to $-\infty$. Applying the result of Cheng (1992), we conclude that for any $n$ and $p$, $-\infty < p \le 1$, there exists an $N(n,p)$ such that if $N \ge N(n,p)$ and $N \equiv 2 \pmod 4$, then $X_{d^*}$ is $\phi_p$-optimal, if it exists. Apparently $\lim_{p \to -\infty} N(n,p) = \infty$. It is interesting to note that when $N = n$, $X_{d^*}$ and an $X_d$ such that $X_d^T X_d = (N - 1)I$ tie for the A-criterion: $\operatorname{tr}(X_{d^*}^T X_{d^*})^{-1} = \operatorname{tr}[(N - 1)I]^{-1}$. It is unknown whether they are both A-optimal.

When $N \equiv 3 \pmod 4$, an $X_{d^*}$ such that $X_{d^*}^T X_{d^*} = (N + 1)I - J$, which is (M.S)-optimal, can fail to be optimal even among the $(1,-1)$-matrices. The result of Cheng (1992) implies that for any $n$ and $p$, $-\infty < p \le 1$, there exists an $M(n,p)$ such that if $N \ge M(n,p)$ and $N \equiv 3 \pmod 4$, then an $X_{d^*}$ such that $X_{d^*}^T X_{d^*} = (N + 1)I - J$ is $\phi_p$-optimal. Sharp bounds on $M(n,p)$ need to be derived for individual criteria. Galil and Kiefer (1980) derived the bound $M(n, 0) = 2n - 5$ for the D-criterion. Cheng et al. (1985b) gave a rough bound for the A-criterion, which has been improved by Sathe and Shenoy (1989) in the case where all the entries of the design matrices are 1 or $-1$. Results on D-optimal designs when $N \equiv 3 \pmod 4$ and $N < 2n - 5$ can be found, for instance, in Galil and Kiefer (1982), Moyssiadis and Kounias (1982), Kounias and Chadjipantelis (1983), Kounias and Farmakis (1984) and Chadjipantelis et al. (1987).
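The A-criterion tie noted in Example 4.6 for $N = n$ can be confirmed with exact arithmetic. The following sketch is an illustration, not part of the original argument: it uses only the fact that $aI_m + bJ_m$ has eigenvalue $a$ with multiplicity $m - 1$ and eigenvalue $a + bm$ with multiplicity one, and it does not address whether a design matrix with the required $X^T X$ actually exists. The function name is an assumption.

```python
from fractions import Fraction


def trace_inverse_of_form_4_1(N, n):
    """tr of the inverse of the matrix (4.1) with m = int[n/2], via its eigenvalues."""
    m = n // 2
    total = Fraction(0)
    for size in (m, n - m):
        if size == 0:
            continue
        # (N-2)I + 2J of order `size`: eigenvalue N-2 (size-1 times), N-2+2*size (once)
        total += Fraction(size - 1, N - 2) + Fraction(1, N - 2 + 2 * size)
    return total


def trace_inverse_of_scalar(N, n):
    """tr of the inverse of (N-1) I_n."""
    return Fraction(n, N - 1)


# N = n with N congruent to 2 (mod 4)
ties = {N: trace_inverse_of_form_4_1(N, N) == trace_inverse_of_scalar(N, N)
        for N in (6, 10, 14)}
```

For each of these values both traces equal $N/(N-1)$, in agreement with the statement in the example.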

5. Using the approximate theory

In this section, we shall demonstrate how the approximate theory can be used to solve a discrete optimal design problem. Specifically, we shall present a solution to the optimal spring balance weighing design problem. As noted in Section 3, in this case there is no universally optimal design. Jacroux and Notz (1983) obtained D-, A- and E-optimal designs. We shall use the approximate theory to derive $\phi_p$-optimal designs, and enlarge the set of competing designs to all the $N \times n$ matrices with entries $0 \le x_{ij} \le 1$, not just 0 and 1. Therefore the problem is to maximize $\phi_p(X^T X)$ over all the $N \times n$ matrices with $0 \le x_{ij} \le 1$. Harwit and Sloane (1979) described an application to Hadamard transform optics in spectroscopy.

Write such a matrix $X$ as $[x_1, x_2, \ldots, x_N]^T$. Then each $x_i$ can be considered as a point in the $n$-dimensional unit cube $\mathcal{X} = \{x = (x_1, x_2, \ldots, x_n): 0 \le x_i \le 1\}$, and $X$ is equivalent to a selection of $N$ points from $\mathcal{X}$. The information matrix $X^T X$ can be expressed as $\sum_{i=1}^{N} x_i x_i^T$. Denote by $\xi_X$ the probability measure on $\mathcal{X}$ which assigns probability $1/N$ to each $x_i$. Then

$$X^T X = N \int_{\mathcal{X}} x x^T \, \xi_X(dx). \tag{5.1}$$
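Identity (5.1) can be illustrated with a small exact computation. In the sketch below, the function names and the $4 \times 3$ matrix are assumptions chosen for illustration; the measure $\xi_X$ is represented simply as one weight $1/N$ per row of $X$.

```python
from fractions import Fraction


def gram(X):
    """Return X^T X for a matrix given as a list of rows."""
    n = len(X[0])
    return [[sum(row[i] * row[j] for row in X) for j in range(n)] for i in range(n)]


def moment_matrix(points, weights):
    """Return the sum of w * x x^T over the support points of a discrete measure."""
    n = len(points[0])
    return [[sum(w * x[i] * x[j] for x, w in zip(points, weights))
             for j in range(n)] for i in range(n)]


# Hypothetical spring balance design with N = 4 weighings of n = 3 objects.
X = [[1, 0, 1],
     [0, 1, 1],
     [1, 1, 0],
     [1, 1, 1]]
N = len(X)
weights = [Fraction(1, N)] * N          # xi_X puts mass 1/N on each row
M = moment_matrix(X, weights)
scaled = [[N * entry for entry in row] for row in M]
```

Multiplying the moment matrix of $\xi_X$ by $N$ recovers $X^T X$ exactly, which is the content of (5.1).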

Let $\Xi$ be the set of all the discrete probability measures on $\mathcal{X}$. For any $\xi \in \Xi$, called an approximate design, define its information matrix as

$$M(\xi) = \int_{\mathcal{X}} x x^T \, \xi(dx),$$

extending (5.1). Instead of maximizing $\phi_p(X^T X)$ [...] $\operatorname{int}[(n+1)/2]$ is $\phi_p$-optimal over the $N \times n$ matrices with $0 \le x_{ij} \le 1$, for all $p$ such that $a(k)$ [...]

[...] $\ge \mu_{dt} = 0$ be the eigenvalues of $C_d$. Then for any numbers $x$ and $y$, the eigenvalues of $C_d - xI_t + yJ_t$ are $\mu_{d1} - x \ge \cdots \ge \mu_{d,t-1} - x$ and $-x + ty$, the common row sum of $C_d - xI_t + yJ_t$. If $-x + ty > 0$ and $C_d - xI_t + yJ_t$ is not positive definite, then $\mu_{d,t-1} - x \le 0$ [...]

[...] Whether they are still optimal when the restriction of equireplication is removed is not obvious and requires extra work. Using upper bounds on $\mu_{d,t-1}$ derived by the method of Section 6.1, Cheng (1980b) showed the E-optimality of these designs when the competing designs are not necessarily equireplicate. The D-optimality of linked block designs without the restriction of equireplication was proved in Cheng (1990) and rediscovered by Pohl (1992).

Acknowledgement This article was written with the support of National Science Foundation Grant No. DMS-9404477, National Security Agency Grant No. MDA904-95-1-1064 and National Science Council, R.O.C., when the author was visiting the Institute of Statistics, National Tsing Hua University. The United States Government is authorized to reproduce and distribute reprints notwithstanding any copyright notation hereon.

References

Ash, A. (1981). Generalized Youden designs: Construction and tables. J. Statist. Plann. Inference 5, 1-25.
Bagchi, S., A. C. Mukhopadhyay and B. K. Sinha (1990). A search for optimal nested row-column designs. Sankhyā Ser. B 52, 93-104.
Bechhofer, R. E. and A. C. Tamhane (1981). Incomplete block designs for comparing treatments with a control: General theory. Technometrics 23, 45-57.
Chadjipantelis, Th., S. Kounias and C. Moyssiadis (1987). The maximum determinant of 21 x 21 (+1, -1)-matrices and D-optimal designs. J. Statist. Plann. Inference 16, 167-178.
Chang, J. Y. and W. I. Notz (1994). Some optimal nested row-column designs. Statist. Sinica 4, 249-263.
Cheng, C. S. (1978a). Optimality of certain asymmetrical experimental designs. Ann. Statist. 6, 1239-1261.
Cheng, C. S. (1978b). Optimal designs for the elimination of multi-way heterogeneity. Ann. Statist. 6, 1262-1272.
Cheng, C. S. (1980a). Optimality of some weighing and 2^n fractional factorial designs. Ann. Statist. 8, 436-446.
Cheng, C. S. (1980b). On the E-optimality of some block designs. J. Roy. Statist. Soc. Ser. B 42, 199-204.
Cheng, C. S. (1981a). Optimality and construction of pseudo Youden designs. Ann. Statist. 9, 201-205.
Cheng, C. S. (1981b). Maximizing the total number of spanning trees in a graph: Two related problems in graph theory and optimum design theory. J. Combin. Theory Ser. B 31, 240-248.
Cheng, C. S. (1987). An application of the Kiefer-Wolfowitz equivalence theorem to a problem in Hadamard transform optics. Ann. Statist. 15, 1593-1603.
Cheng, C. S. (1990). D-optimality of linked block designs and some related results. In: Proc. R. C. Bose Symp. on Probability, Statistics and Design of Experiments. Wiley Eastern, New Delhi, 227-234.


Cheng, C. S. (1992). On the optimality of (M.S)-optimal designs in large systems. Sankhyā 54, 117-125 (special volume dedicated to the memory of R. C. Bose).
Cheng, C. S. and R. A. Bailey (1991). Optimality of some two-associate-class partially balanced incomplete-block designs. Ann. Statist. 19, 1667-1671.
Cheng, C. S., D. Majumdar, J. Stufken and T. E. Ture (1988). Optimal step-type designs for comparing test treatments with a control. J. Amer. Statist. Assoc. 83, 477-482.
Cheng, C. S., J. C. Masaro and C. S. Wong (1985a). Do nearly balanced multigraphs have more spanning trees? J. Graph Theory 8, 342-345.
Cheng, C. S., J. C. Masaro and C. S. Wong (1985b). Optimal weighing designs. SIAM J. Alg. Disc. Meth. 6, 259-267.
Cheng, C. S. and C. F. Wu (1980). Balanced repeated measurements designs. Ann. Statist. 8, 1272-1283.
Conniffe, D. and J. Stone (1975). Some incomplete block designs of maximum efficiency. Biometrika 61, 685-686.
Constantine, G. M. (1981). Some E-optimal block designs. Ann. Statist. 9, 886-892.
Constantine, G. M. (1986). On the optimality of block designs. Ann. Inst. Statist. Math. 38, 161-174.
Eccleston, J. A. and A. S. Hedayat (1974). On the theory of connected designs: Characterization and optimality. Ann. Statist. 2, 1238-1255.
Eccleston, J. A. and J. Kiefer (1981). Relationships of optimality for individual factors of a design. J. Statist. Plann. Inference 5, 213-219.
Eccleston, J. A. and K. G. Russell (1975). Connectedness and orthogonality in multi-factor designs. Biometrika 62, 341-345.
Eccleston, J. A. and K. G. Russell (1977). Adjusted orthogonality in nonorthogonal designs. Biometrika 64, 339-345.
Ehrenfeld, S. (1956). Complete class theorems in experimental designs. In: J. Neyman, ed., Proc. 3rd Berkeley Symp. on Mathematical Statistics and Probability, Vol. 1. University of California Press, Berkeley, CA, 57-67.
Fan, K. (1954). Inequalities for eigenvalues of Hermitian matrices. Nat. Bur. Standards Appl. Math. Ser. 39, 131-139.
Galil, Z. and J. Kiefer (1980). D-optimum weighing designs. Ann. Statist. 8, 1293-1306.
Galil, Z. and J. Kiefer (1982). Construction methods for D-optimum weighing designs when n ≡ 3 (mod 4). Ann. Statist. 10, 502-510.
Harwit, M. and N. J. A. Sloane (1979). Hadamard Transform Optics. Academic Press, New York.
Hedayat, A. S., M. Jacroux and D. Majumdar (1988). Optimal designs for comparing test treatments with controls (with discussions). Statist. Sci. 3, 462-491.
Hedayat, A. S. and D. Majumdar (1985). Families of A-optimal block designs for comparing test treatments with a control. Ann. Statist. 13, 757-767.
Hedayat, A. S. and W. Zhao (1990). Optimal two-period repeated measurements designs. Ann. Statist. 18, 1805-1816.
Jacroux, M. (1980). On the E-optimality of regular graph designs. J. Roy. Statist. Soc. Ser. B 42, 205-209.
Jacroux, M. (1983). Some minimum variance block designs for estimating treatment differences. J. Roy. Statist. Soc. Ser. B 45, 70-76.
Jacroux, M. (1984). On the D-optimality of group divisible designs. J. Statist. Plann. Inference 9, 119-129.
Jacroux, M. (1985). Some sufficient conditions for type-I optimality of block designs. J. Statist. Plann. Inference 11, 385-394.
Jacroux, M. and W. I. Notz (1983). On the optimality of spring balance weighing designs. Ann. Statist. 11, 970-978.
Jacroux, M., C. S. Wong and J. C. Masaro (1983). On the optimality of chemical balance weighing designs. J. Statist. Plann. Inference 8, 231-240.
John, J. A. and T. J. Mitchell (1977). Optimal incomplete block designs. J. Roy. Statist. Soc. Ser. B 39, 39-43.
Jones, B. and J. A. Eccleston (1980). Exchange and interchange procedures to search for optimal designs. J. Roy. Statist. Soc. Ser. B 42, 238-243.
Kiefer, J. (1958). On the nonrandomized optimality and randomized nonoptimality of symmetrical designs. Ann. Math. Statist. 29, 675-699.


Kiefer, J. (1975a). Construction and optimality of generalized Youden designs. In: J. N. Srivastava, ed., A Survey of Statistical Design and Linear Models. North-Holland, Amsterdam, 333-353.
Kiefer, J. (1975b). Balanced block designs and generalized Youden designs, I. Construction (patchwork). Ann. Statist. 3, 109-118.
Kounias, S. and Th. Chadjipantelis (1983). Some D-optimal weighing designs for n ≡ 3 (mod 4). J. Statist. Plann. Inference 8, 117-127.
Kounias, S. and N. Farmakis (1984). A construction of D-optimal weighing designs when n ≡ 3 (mod 4). J. Statist. Plann. Inference 10, 177-187.
Kunert, J. (1983). Optimal design and refinement of the linear model with application to repeated measurements designs. Ann. Statist. 11, 247-257.
Kunert, J. (1984). Optimality of balanced uniform repeated measurements designs. Ann. Statist. 12, 1006-1017.
Kunert, J. (1985). Optimal repeated measurements designs for correlated observations and analysis by weighted least squares. Biometrika 72, 375-389.
Lucas, H. L. (1957). Extra-period Latin-square change-over designs. J. Dairy Sci. 40, 225-239.
Magda, G. C. (1980). Circular balanced repeated measurements designs. Comm. Statist. Theory Methods 9, 1901-1918.
Majumdar, D. and W. I. Notz (1983). Optimal incomplete block designs for comparing treatments with a control. Ann. Statist. 11, 258-266.
Moyssiadis, C. and S. Kounias (1982). The exact D-optimal first order saturated design with 17 observations. J. Statist. Plann. Inference 7, 13-27.
Paterson, L. J. (1983). Circuits and efficiency in incomplete block designs. Biometrika 70, 215-225.
Pohl, G. M. (1992). D-optimality of the dual of BIB designs. Statist. Probab. Lett. 14, 201-204.
Pukelsheim, F. (1993). Optimal Design of Experiments. Wiley, New York.
Raghavarao, D. (1971). Constructions and Combinatorial Problems in Design of Experiments. Wiley, New York.
Roy, B. K. and K. R. Shah (1984). On the optimality of a class of minimal covering designs. J. Statist. Plann. Inference 10, 189-194.
Ruiz, F. and E. Seiden (1974). On the construction of some families of generalized Youden designs. Ann. Statist. 2, 503-519.
Sathe, Y. S. and R. G. Shenoy (1989). A-optimal weighing designs when N ≡ 3 (mod 4). Ann. Statist. 17, 1906-1915.
Seiden, E. and C. J. Wu (1978). A geometric construction of generalized Youden designs for v a power of a prime. Ann. Statist. 6, 452-460.
Shah, K. R. (1960). Optimality criteria for incomplete block designs. Ann. Math. Statist. 31, 791-794.
Shah, K. R., D. Raghavarao and C. G. Khatri (1976). Optimality of two and three factor designs. Ann. Statist. 4, 419-422.
Shah, K. R. and B. K. Sinha (1989). Theory of Optimal Designs. Springer, Berlin.
Stufken, J. (1987). A-optimal block designs for comparing test treatments with a control. Ann. Statist. 15, 1629-1638.
Takeuchi, K. (1961). On the optimality of certain type of PBIB designs. Rep. Statist. Appl. Res. Un. Japan Sci. Eng. 8, 140-145.
Takeuchi, K. (1963). A remark added to "On the optimality of certain type of PBIB designs". Rep. Statist. Appl. Res. Un. Japan Sci. Eng. 10, 47.

S. Ghosh and C. R. Rao, eds., Handbook of Statistics, Vol. 13 © 1996 Elsevier Science B.V. All rights reserved.


Optimal and Efficient Treatment-Control Designs

Dibyen Majumdar

1. Introduction

Comparing treatments with one or more controls is an integral part of many areas of scientific experimentation. In pharmaceutical studies, for example, new drugs are the treatments, while a placebo and/or a standard treatment is the control. Most attention will be given to the use of a single control; we will consider designs for comparing treatments with a control for various experimental settings, models and inference methods. This article is expected to update, supplement and expand upon the earlier survey of the area by Hedayat, Jacroux and Majumdar (1988). The section titles are:
2. Early development.
3. Efficient block designs for estimation.
4. Efficient designs for confidence intervals.
5. Efficient row-column designs for estimation.
6. Bayes optimal designs.
7. On efficiency bounds of designs.
8. Optimal and efficient designs in various settings.

2. Early development

Consider an experiment to compare $v$ treatments (which will be called test treatments) with a control using $n$ homogeneous experimental units. Let the control be denoted by the symbol 0 and the test treatments by the symbols $1, \ldots, v$. We assume the model to be additive and homoscedastic, i.e., if treatment $i$ is applied to unit $j$ then the observation $y_{ij}$ can be expressed as:

$$ y_{ij} = \mu + \tau_i + e_{ij}, \tag{2.1} $$

where $\mu$ is a general mean, $\tau_i$ is the effect of treatment $i$ and the $e_{ij}$'s are random errors that are assumed to be independently normally distributed with expectation 0 and variance $\sigma^2$. A design $d$ is characterized by the number of experimental units that are assigned to each treatment. For $i = 0, 1, \ldots, v$, we will denote by $r_{di}$ the number of experimental units assigned to treatment $i$, or replication of treatment $i$.

Given $n$, what is the best allocation of the experimental units to the control and the test treatments, i.e., what are the optimal replications? Unless the experimenter has some knowledge about the performance of the control that can be used at the designing stage, it is intuitively clear that the control should be used more often than the test treatments, since each of the $v$ test treatments has to be compared with the same control 0. One way to determine an optimal allocation is by considering the Best Linear Unbiased Estimators (BLUE's) of the parameters of interest, which are the treatment-control contrasts, $\tau_i - \tau_0$, $i = 1, \ldots, v$. For a design $d$, let the BLUE of $\tau_i - \tau_0$ be denoted by $\hat\tau_{di} - \hat\tau_{d0}$. For model (2.1), it is clear that $\hat\tau_{di} - \hat\tau_{d0} = \bar y_{di} - \bar y_{d0}$, where $\bar y_{di}$ is the average of all observations that receive treatment $i$. Also,

$$ \mathrm{Var}(\hat\tau_{di} - \hat\tau_{d0}) = \sigma^2 (r_{di}^{-1} + r_{d0}^{-1}). \tag{2.2} $$

A possible allocation (design) is that which minimizes $\sum_{i=1}^{v} \mathrm{Var}(\hat\tau_{di} - \hat\tau_{d0})$. We will presently (Definition 2.2 later in this section) call this design A-optimal for treatment-control contrasts, since it minimizes the average (hence the 'A' in A-optimal) of the variances of the $v$ treatment-control contrasts. It is easy to see that a design that is A-optimal for treatment-control contrasts is obtained by minimizing the expression

$$ \sum_{i=1}^{v} (r_{di}^{-1} + r_{d0}^{-1}) $$

subject to the constraint $\sum_{i=0}^{v} r_{di} = n$. The following theorem is obvious.

THEOREM 2.1. If $v$ is a square and $n \equiv 0 \pmod{v + \sqrt{v}}$, then a design $d_0$ given by

$$ r_{d_0 1} = \cdots = r_{d_0 v} = n/(v + \sqrt{v}), \qquad r_{d_0 0} = \sqrt{v}\, r_{d_0 1} \tag{2.3} $$

is A-optimal for treatment-control contrasts for model (2.1).
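The allocation in Theorem 2.1 can be checked numerically on a small case. The following is a minimal sketch, not part of the original text: the function names are hypothetical, $\sigma^2$ is set to 1, and the brute-force search simply enumerates all integer allocations to confirm that none has a smaller variance sum than the allocation in (2.3).

```python
from fractions import Fraction
from math import isqrt


def variance_sum(r0, test_reps):
    """Sum over test treatments of (1/r_i + 1/r_0), i.e. (2.2) summed with sigma^2 = 1."""
    return sum(Fraction(1, r) + Fraction(1, r0) for r in test_reps)


def theorem_2_1_allocation(v, n):
    """Return (control replication, test replication) as given by (2.3)."""
    root = isqrt(v)
    if root * root != v or n % (v + root) != 0:
        raise ValueError("needs v a perfect square and n divisible by v + sqrt(v)")
    r_test = n // (v + root)
    return root * r_test, r_test


def nondecreasing_splits(total, parts, smallest=1):
    """Yield nondecreasing lists of `parts` positive integers that sum to `total`."""
    if parts == 1:
        if total >= smallest:
            yield [total]
        return
    for first in range(smallest, total // parts + 1):
        for rest in nondecreasing_splits(total - first, parts - 1, first):
            yield [first] + rest


def brute_force_allocation(v, n):
    """Search all integer allocations; return (value, r0, test replications)."""
    best = None
    for r0 in range(1, n - v + 1):
        for tests in nondecreasing_splits(n - r0, v):
            value = variance_sum(r0, tests)
            if best is None or value < best[0]:
                best = (value, r0, tests)
    return best


r0, r_test = theorem_2_1_allocation(v=4, n=24)
best_value, best_r0, best_tests = brute_force_allocation(v=4, n=24)
print(r0, r_test, best_r0, best_tests, best_value)
```

For $v = 4$ and $n = 24$ the exhaustive search and the closed form agree. The search is only practical for small $n$; it is included as a sanity check rather than as a design tool.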

This result was noticed by Fieller (1947), and possibly even earlier (see also Finney, 1952). Thus, if $v = 4$ test treatments have to be compared with a control using $n = 24$ experimental units then the A-optimal design assigns 4 units to each test treatment and 8 to the control. Dunnett (1955) found that the same allocation performs very well for the problem of simultaneous confidence intervals for the treatment-control contrasts $\tau_i - \tau_0$, $i = 1, \ldots, v$. More discussion of Dunnett's work and that of other researchers in the area of multiple comparisons can be found in Section 4.

Next consider the situation where the experimental units are partitioned into $b$ blocks of $k$ homogeneous units each. Here too, we assume the model to be additive and homoscedastic, i.e., if treatment $i$ is applied to unit $l$ of block $j$ then the observation $y_{ijl}$ can be expressed as:

$$ y_{ijl} = \mu + \tau_i + \beta_j + \varepsilon_{ijl}, \tag{2.4} $$

where $\beta_j$ is the effect of block $j$. A design $d$ is an allocation of the $v + 1$ treatments to the $bk$ experimental units. We shall use the notation $\mathcal{D}(v + 1, b, k)$ to denote the set of all connected designs, i.e., those designs in which every treatment-control contrast is estimable. For a design $d$, let $n_{dij}$ denote the number of times treatment $i$ is used in block $j$. Further, for treatment symbols $i$ and $i'$ ($i \neq i'$), let $\lambda_{dii'} = \sum_{j=1}^{b} n_{dij} n_{di'j}$. We shall use the notation BIB($v, b, r, k, \lambda$) to denote a Balanced Incomplete Block (BIB) design based on $v$ treatments, in $b$ blocks of size $k$ each, where each treatment is replicated $r$ times and each pair of treatments appears in $\lambda$ blocks.

Cox (1958), p. 238, recommended using a BIB($v, b, r, k - t, \lambda$) design based on the test treatments with each block augmented or reinforced by $t$ replications of the control, where $t$ is an integer. Das (1958) called such designs reinforced BIB designs. The idea was to have all test treatments balanced, as well as to have the test treatments balanced with respect to the control, and reinforcing BIB designs is a natural way to do it. Pearce (1960) took a different approach to achieve the same goal. He proposed a general class of designs for the problem of treatment-control comparisons, of which reinforced BIB designs are a subclass. These were the designs with supplemented balance that were proposed by Hoblyn et al. (1954) in a different context. Designs with supplemented balance have $v + 1$ treatments, one of which is called the supplemented treatment, the control in this case.

DEFINITION 2.1. A design $d \in \mathcal{D}(v + 1, b, k)$ is called a design with supplemented balance with 0 as the supplemented treatment if there are nonnegative integers $\lambda_{d0}$ and $\lambda_{d1}$ such that:

$$ \lambda_{dii'} = \lambda_{d1}, \ \text{for } i, i' = 1, \ldots, v \ (i \neq i'), \qquad \lambda_{d0i} = \lambda_{d0}, \ \text{for } i = 1, \ldots, v. \tag{2.5} $$

For a reinforced BIB design of Das (1958), $\lambda_{d0} = tr$. Pearce provided examples of these designs, their analysis for the linear model (2.4), and computed the standard error of the BLUE for the elementary contrasts. He showed that every elementary contrast in two test treatments has the same standard error, while every treatment-control contrast has the same standard error (see also Pearce, 1963).

EXAMPLE 2.1. The following design, due to Pearce, was used in an experiment involving strawberry plants at the East Malling Research Station in 1953 (see Pearce, 1953). Four herbicides were compared with a control, which was the absence of any herbicide, in four blocks of size seven each. Denoting the herbicides by 1, 2, 3, 4 and the control by 0, the design, with columns as blocks, is the following:

0 0 0 0
0 0 0 0
1 1 1 1
2 2 2 2
3 3 3 3
4 4 4 4
1 2 3 4
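The concurrence counts $\lambda_{dii'} = \sum_j n_{dij} n_{di'j}$ that appear in Definition 2.1 are easy to compute directly from the blocks. The sketch below is illustrative only (the helper names are not from the source); it encodes the blocks of Example 2.1 and checks condition (2.5).

```python
from itertools import combinations


def concurrence(blocks, i, j):
    """lambda_{dij}: sum over blocks of n_{di,block} * n_{dj,block}."""
    return sum(block.count(i) * block.count(j) for block in blocks)


def supplemented_balance(blocks, v):
    """Return (lambda_d0, lambda_d1) if (2.5) holds with 0 as supplemented treatment, else None."""
    control_values = {concurrence(blocks, 0, i) for i in range(1, v + 1)}
    test_values = {concurrence(blocks, i, j)
                   for i, j in combinations(range(1, v + 1), 2)}
    if len(control_values) == 1 and len(test_values) == 1:
        return control_values.pop(), test_values.pop()
    return None


# Blocks (columns) of Example 2.1: two controls, each herbicide once,
# plus one extra copy of herbicide j in block j.
pearce_blocks = [[0, 0, 1, 2, 3, 4, j] for j in (1, 2, 3, 4)]
balance = supplemented_balance(pearce_blocks, v=4)
print(balance)
```

For this design the function returns the pair $(\lambda_{d0}, \lambda_{d1}) = (10, 6)$, so the design has supplemented balance in the sense of Definition 2.1.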


According to Pearce (1983), this was the very first design with supplemented balance. Note that for this design $\lambda_{d0} = 10$, $\lambda_{d1} = 6$. Analysis of data for the experiment can be found in Chapter 3 of Pearce (1983) as well as in Pearce (1960).

Given $v$, $b$ and $k$, the question is, which design in $\mathcal{D}(v + 1, b, k)$ should be used? The answer would depend on the particular circumstances of the experiment, on various factors and constraints that usually confront the experimenter. (In this context, we refer the reader to page 126 of Pearce (1983) for an account of the circumstances that led to the use of the design in Example 2.1.) One of the considerations, possibly the most important one, is which design results in the best inference of the treatment-control contrasts, which are the contrasts of primary interest in the experiment. Determination of optimal designs for inference and efficiencies of designs in $\mathcal{D}(v + 1, b, k)$ with respect to the optimal design in this class started much later.

It may be noted that usually there are several designs with supplemented balance within $\mathcal{D}(v + 1, b, k)$. Often there is a design with supplemented balance that is optimal in $\mathcal{D}(v + 1, b, k)$ or at least highly efficient. On the other hand, usually the class of designs with supplemented balance also contains designs that are quite inefficient. Clearly one has to make a judicious choice of a design, even when one decides to restrict oneself to the class of designs with supplemented balance.

Several optimality criteria have been considered in the literature (see Hedayat et al. (1988), and the discussions in that article). For a large portion of this article, we will focus on two optimality criteria for estimation because of the natural statistical interpretation of these criteria in experiments to compare test treatments with a control. These criteria are given in the next definition which is quite general, i.e., it is not restricted to block designs and model (2.4).

DEFINITION 2.2. Given a class of designs $\mathcal{D}$, and a model, if $\hat\tau_{di} - \hat\tau_{d0}$ denotes the BLUE of $\tau_i - \tau_0$, then a design is A-optimal for treatment-control contrasts (abbreviated as A-optimal) if it minimizes $\sum_{i=1}^{v} \mathrm{Var}(\hat\tau_{di} - \hat\tau_{d0})$ in $\mathcal{D}$. A design is MV-optimal for treatment-control contrasts (abbreviated as MV-optimal) if it minimizes

$$ \max_{1 \le i \le v} \mathrm{Var}(\hat\tau_{di} - \hat\tau_{d0}) \ \text{in } \mathcal{D}. $$

$\cdots \ge g^*(r^*),$

(3.15)

with equality if $d$ is BTIB($v, b, k; t, s$) where $bt + s = r^*$. Hence a BTIB($v, b, k; t, s$) with $bt + s = r^*$ is A-optimal for treatment-control contrasts in $\mathcal{D}(v + 1, b, k)$.

The quantity $r^*$ may be viewed as the optimal replication of the control, since if a BTIB($v, b, k; t, s$) design $d_0$ is A-optimal, then $r_{d_0 0} = bt + s = r^*$. (Is the minimum of $g^*(r)$ attained at a unique point $r^*$? The answer is yes, except in rare cases. For now, there is no loss in assuming that $r^*$ is unique. We will return to this point in Theorem 3.6.)

EXAMPLE 3.5. Let $v = 7$, $b = 7$ and $k = 4$. Then solving the optimization problem in (3.12) we get $t = 1$, $s = 0$. Hence, $r^* = 7$. The BTIB(7, 7, 4; 1, 0) given in Example 3.2 is A-optimal.

EXAMPLE 3.6. Let $v = 6$, $b = 18$ and $k = 5$. Here $t = 1$ and $s = 6$. Hence $r^* = 24$. The BTIB(6, 18, 5; 1, 6) given in Example 3.3 is A-optimal.

The A-optimal designs given by Theorem 3.2 are BTIB($v, b, k; t, s$). It can be seen that the structure of a BTIB($v, b, k; t, s$) can be of two types. If $s = 0$, then it is called a Rectangular-type or R-type design, while if $s > 0$, then it is called a Step-type or S-type design. The terminology is due to Hedayat and Majumdar (1984). With columns as blocks an R-type design $d$ may be visualized as a $k \times b$ array:

$$ d = \begin{bmatrix} d_1 \\ d_2 \end{bmatrix}, \tag{3.16} $$

where $d_1$ is a $t \times b$ array of controls (0), while $d_2$ is a $(k - t) \times b$ array in the test treatments $(1, 2, \ldots, v)$ only. Clearly, $d_2$ must be a BIB($v, b, r, k - t, \lambda$) design. The R-type designs are thus exactly the reinforced BIB designs. An S-type design $d$ can be visualized as the following $k \times b$ array:

$$ d = \begin{bmatrix} d_{11} & d_{12} \\ d_{21} & d_{22} \end{bmatrix}, \tag{3.17} $$

where $d_{11}$ is a $(t + 1) \times s$ array of controls, $d_{12}$ is a $t \times (b - s)$ array of controls, $d_{21}$ is a $(k - t - 1) \times s$ array of test treatments and $d_{22}$ is a $(k - t) \times (b - s)$ array of test treatments. The following result of Hedayat and Majumdar (1984) gives some properties of a BTIB($v, b, k; t, s$).

LEMMA 3.1. (i) For the existence of a BTIB($v, b, k; t, s$), the following conditions are necessary (where $r_0 = bt + s$):

$$ (b(k - t) - s)/v = (bk - r_0)/v \ (= q_1, \text{say}) \ \text{is an integer}, \tag{3.18} $$

$$ s(k - t - 1)/v \ (= q_2, \text{say}) \ \text{is an integer}, \tag{3.19} $$

$$ [q_2(k - t - 2) + (q_1 - q_2)(k - t - 1)]/(v - 1) \ \text{is an integer}. \tag{3.20} $$

(ii) For an R-type design it is necessary that $b \ge v$, while for an S-type design it is necessary that $b \ge v + 1$.
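The necessary conditions of Lemma 3.1 amount to a few divisibility checks. The following sketch, with a hypothetical function name, applies them to the parameter sets of Examples 3.5 and 3.6. Passing the check does not guarantee that a design exists; the conditions are necessary only.

```python
def lemma_3_1_conditions(v, b, k, t, s):
    """Check the necessary conditions of Lemma 3.1 for a BTIB(v, b, k; t, s)."""
    r0 = b * t + s                       # replication of the control
    if (b * k - r0) % v != 0:            # condition (3.18)
        return False
    q1 = (b * k - r0) // v
    if (s * (k - t - 1)) % v != 0:       # condition (3.19)
        return False
    q2 = (s * (k - t - 1)) // v
    numerator = q2 * (k - t - 2) + (q1 - q2) * (k - t - 1)
    if numerator % (v - 1) != 0:         # condition (3.20)
        return False
    # Part (ii): b >= v for R-type (s = 0), b >= v + 1 for S-type (s > 0).
    return b >= v if s == 0 else b >= v + 1


example_3_5 = lemma_3_1_conditions(v=7, b=7, k=4, t=1, s=0)
example_3_6 = lemma_3_1_conditions(v=6, b=18, k=5, t=1, s=6)
print(example_3_5, example_3_6)
```

Both parameter sets pass, as they must since the corresponding designs are stated to exist. A set such as $v = 5$, $b = 5$, $k = 4$, $t = 1$, $s = 0$ fails condition (3.20).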


It can be shown that a BTIB($v, b, k; t, s$) is equireplicate in test treatments. Condition (3.18) is necessary for this. Conditions (3.19) and (3.20) are necessary for condition (2.5). Part (ii) of the Lemma is Fisher's inequality for BTIB($v, b, k; t, s$) designs. Based on Theorem 3.2 and Lemma 3.1, Hedayat and Majumdar (1984) suggested a method for obtaining optimal designs that consists of three steps:
(1) Starting from $v$, $b$, $k$ determine $t$, $s$ (equivalently $r^*$) that minimize $g(x, z)$.
(2) Verify the conditions of Lemma 3.1 (i), using $t$ and $s$ from step 1. If the conditions are not satisfied then Theorem 3.2 cannot be applied to the class $\mathcal{D}(v + 1, b, k)$. If the conditions are satisfied, then go to step 3.
(3) Attempt to construct a BTIB($v, b, k; t, s$). (Note that even when the conditions of Lemma 3.1 are satisfied, there is no guarantee that this design exists.)
Instances of design classes where this method does not produce A-optimal designs are given in Examples 3.9-3.11. For R-type designs, the construction problem reduces to finding BIB designs. For S-type designs the problem is clearly more involved, and unlike the case of R-type designs this case does not reduce to the construction of designs that are well studied in the literature. We return to the construction of such designs later in this section.

It may be noted that Majumdar and Notz (1983) also obtained optimal designs for criteria other than A-optimality. Giovagnoli and Wynn (1985) used approximate design theory techniques to obtain results similar to Theorem 3.2. For certain values of $v$, $b$ and $k$ the minimization in (3.12) gives a nice algebraic solution. This can sometimes be exploited to obtain infinite families of A-optimal designs with elegant combinatorial properties. Here are some results.

THEOREM 3.3. A BTIB($v, b, k; 1, 0$) is A-optimal for treatment-control contrasts in $\mathcal{D}(v + 1, b, k)$ whenever $(k - 2)^2 + 1 \le v \le (k - 1)^2$.
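The range condition in Theorem 3.3 is simple to tabulate for a given block size. The snippet below is a small illustrative sketch (function names are hypothetical) that lists the admissible numbers of test treatments and checks the parameters $v = 7$, $k = 4$ of Example 3.5.

```python
def theorem_3_3_range(k):
    """Values of v with (k - 2)^2 + 1 <= v <= (k - 1)^2 for block size k."""
    return range((k - 2) ** 2 + 1, (k - 1) ** 2 + 1)


def theorem_3_3_applies(v, k):
    """True when the range condition of Theorem 3.3 holds for (v, k)."""
    return v in theorem_3_3_range(k)


admissible_v = list(theorem_3_3_range(4))
print(admissible_v, theorem_3_3_applies(7, 4))
```

The check concerns only the inequality; whether a BTIB($v, b, k; 1, 0$) exists for a given $b$ is a separate construction question.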

This result is due to Hedayat and Majumdar (1985). An example of a design that is A-optimal according to Theorem 3.3 is the design in Example 3.2. It is interesting to note that when $v = (k - 2)^2 + k - 1$, an A-optimal design is obtained by taking the BIB design $d_2$ in (3.16) to be a finite projective plane of order $(k - 2)$, while when $v = (k - 1)^2$, an A-optimal design is obtained by taking the BIB design $d_2$ as a finite Euclidean plane of order $(k - 1)$. The next result, due to Stufken (1987), is a generalization of Theorem 3.3.

THEOREM 3.4. A BTIB($v, b, k; t, 0$) is A-optimal for treatment-control contrasts in $\mathcal{D}(v + 1, b, k)$ whenever

$(k - t - 1)^2 + 1 \le \cdots$

$\cdots \ge f^*(r^*),$

(3.25)

with equality if $d$ is BTB($v, b, k; t, s$) where $bt + s = r^*$. Hence a BTB($v, b, k; t, s$) with $bt + s = r^*$ is A-optimal for treatment-control contrasts in $\mathcal{D}(v + 1, b, k)$.

EXAMPLE 3.12. Let $v = 3$, $b = 10$ and $k = 4$. The following BTB(3, 10, 4; 1, 3), with columns as blocks, is A- and MV-optimal in $\mathcal{D}(4, 10, 4)$:

0 0 0 0 0 0 0 0 0 0
0 0 0 1 1 1 1 1 1 1
1 1 2 2 2 2 2 2 2 2
2 3 3 3 3 3 3 3 3 3


The following corollary, due to Jacroux and Majumdar (1989), gives an infinite family of A-optimal designs. It is clear that these designs are also MV-optimal.

COROLLARY 3.2. For any integer $\theta > 1$, a BTB($\theta^2, b, (\theta + 1)^2; \theta + 1, 0$), for a $b$ such that the design exists, is A-optimal for treatment-control contrasts in $\mathcal{D}(\theta^2 + 1, b, (\theta + 1)^2)$.

4. Efficient designs for confidence intervals

There are two methods for deriving inferences on the treatment-control contrasts. One is estimation, and the other is simultaneous confidence intervals. Bechhofer and Tamhane, in their discussion of Hedayat et al. (1988), describe situations where the simultaneous confidence interval approach is the appropriate one for choosing a subset of best treatments from the $v$ test treatments. More examples are available in Hochberg and Tamhane (1987). In this section we discuss optimal and efficient designs for simultaneous confidence intervals for the treatment-control contrasts.

4.1. Designs for the zero-way elimination of heterogeneity model

First consider the 0-way elimination of heterogeneity model, i.e., $y_{ij} = \mu + \tau_i + e_{ij}$. Let $\mu_i = \mu + \tau_i$, for $i = 1, \ldots, v$. We have to impose some more distributional assumptions on the random variables in order to obtain confidence intervals. For $i = 0, 1, \ldots, v$ and $j = 1, \ldots, r_{di}$ let the $y_{ij}$'s be independent with

$$ y_{ij} \sim N(\mu_i, \sigma_i^2). \tag{4.1} $$

Dunnett (1955) was the first to give simultaneous confidence intervals for the treatment-control contrasts $\mu_i - \mu_0$, $i = 1, \ldots, v$. Suppose

$$ \sigma_0 = \sigma_1 = \cdots = \sigma_v. \tag{4.2} $$

If $\bar y_{di}$ denotes the mean of all observations that receive treatment $i$, and $s^2$ denotes the pooled variance estimate (mean squared error), then the simultaneous 100P% lower confidence limits for $\mu_i - \mu_0$ are:

$$ \bar y_{di} - \bar y_{d0} - \delta_i s \sqrt{r_{di}^{-1} + r_{d0}^{-1}}, \quad i = 1, \ldots, v, \tag{4.3} $$

where the $\delta_i$ are determined from:

$$ P(t_i < \delta_i, \ i = 1, \ldots, v) = P. \tag{4.4} $$

The joint distribution of the random variables $t_1, \ldots, t_v$ is the multivariate analog of Student's t defined by Dunnett and Sobel (1954). Here the subscript $d$ represents the design. The design problem is to allocate a total of $n$ observations to the $v$ test treatments and one control. Thus $\sum_{i=0}^{v} r_{di} = n$. We shall denote by $\mathcal{D}(v + 1, n)$ the class of all designs. The simultaneous 100P% two-sided confidence limits for $\mu_i - \mu_0$ are:

$$ \bar y_{di} - \bar y_{d0} \pm \tilde\delta_i s \sqrt{r_{di}^{-1} + r_{d0}^{-1}}, \quad i = 1, \ldots, v, $$

where

$$ P(|t_i| < \tilde\delta_i, \ i = 1, \ldots, v) = P. $$

For the design $r_{di} = n/(v + 1)$, for all $i$, and $\delta_1 = \cdots = \delta_v = \delta$, Dunnett (1955) gave tables of the $\delta$'s for various values of $P$. For the same design and $\tilde\delta_1 = \cdots = \tilde\delta_v = \tilde\delta$ he gave tables of the $\tilde\delta$'s for various values of $P$. He also investigated optimal designs for lower confidence limits in the following subset of $\mathcal{D}(v + 1, n)$:

$$ \mathcal{D}^*(v + 1, n) = \{d : d \in \mathcal{D}(v + 1, n), \ r_{d1} = \cdots = r_{dv}\}. $$

A design was called optimal if it maximized $P$ for a fixed value of $\delta \sqrt{r_{di}^{-1} + r_{d0}^{-1}}$. Dunnett's numerical investigations revealed that the optimal value of $r_{d0}/r_{d1}$ was only slightly less than $\sqrt{v}$, which is the value given in Theorem 2.1, as long as $\delta$ was chosen to make the coverage probability $P$ in (4.4) of magnitude 0.95 or larger. The tables for the simultaneous two-sided confidence limits in Dunnett (1955) were obtained using an approximation. More accurate tables for the two-sided case were given in Dunnett (1964); optimal allocations were also discussed.

Bechhofer (1969) considered the design problem for Model (4.1) when the $\sigma_i$ are known, without the restriction (4.2). The lower 100P% simultaneous confidence limits are:

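$$ \bar y_{di} - \bar y_{d0} - \delta' \sqrt{\sigma_i^2 r_{di}^{-1} + \sigma_0^2 r_{d0}^{-1}}, \quad i = 1, \ldots, v, \tag{4.5} $$

where

$$ P\left(\mu_i - \mu_0 \ge \bar y_{di} - \bar y_{d0} - \delta' \sqrt{\sigma_i^2 r_{di}^{-1} + \sigma_0^2 r_{d0}^{-1}}, \ i = 1, \ldots, v\right) = P. \tag{4.6} $$

The quantity $\delta' \sqrt{\sigma_i^2 r_{di}^{-1} + \sigma_0^2 r_{d0}^{-1}}$ is called a yardstick. Let

$$ \gamma_{di} = r_{di}/n, \quad \text{for } i = 0, 1, \ldots, v, $$

and

$$ \xi = (\delta'/\sigma_0) \sqrt{\sigma_i^2 \gamma_{di}^{-1} + \sigma_0^2 \gamma_{d0}^{-1}}. $$

For a fixed value of $\xi$ (equivalently, for a fixed yardstick), Bechhofer defined a design $d$ to be optimal if it maximizes $P$ in (4.6) among all designs in the subclass

$$ \mathcal{D}^{**}(v + 1, n) = \{d : d \in \mathcal{D}(v + 1, n), \ \sigma_1^2/r_{d1} = \cdots = \sigma_v^2/r_{dv}\}. $$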


We shall call this design one that maximizes the coverage probability for a fixed yardstick. Note that, if $d \in \mathcal{D}^{**}(v + 1, n)$ then the treatment-control contrasts are all estimated with the same variance; thus this is a natural subclass of $\mathcal{D}(v + 1, n)$ for this setup. For the homoscedastic case, $\sigma_0 = \sigma_1 = \cdots = \sigma_v$, we have $\mathcal{D}^{**}(v + 1, n) = \mathcal{D}^*(v + 1, n)$. Taking the approximate design theory approach, i.e., viewing the $\gamma_{di}$ as nonnegative reals, not restricted to being rational, such that $\sum_{i=0}^{v} \gamma_{di} = 1$, Bechhofer (1969) gave an explicit equation to determine an optimal design. This is given in the following theorem, where $f(\cdot)$ is the standard normal density function, $F_k(x \mid \rho)$ is the $k$-variate standard normal distribution function with all correlations equal to $\rho$, and

¢* = (1/2)v(v - 1 ) V / - ~ F v _ 2

/3=/_..,V~a2/~r2i,0,

(0 I 1/3)

i=1

co = co(')') = ~7V//3(1 - 3')/((1 - 3' + 73)(2(1 - 7) + "7/3)). THEOREM follows is For fixed (0, 1/(1 +

4.1. Given v >~ 2, n, and cri > 0, a design do with ~/aoi = 7~ given as optimal, i.e., it maximizes the coverage probability for a fixed yardstick. 0 < ~ ~*, 7o is the unique root in v ~ ) ) , of the equation: -

-

+

2(1_7)+3,/3

l

-

1-3'

3 (] -- 2/5-+----.~/3 . 3 (1 - ~) ~_ 7/3

~0,

Also, for $i = 1, \ldots, v$, $\gamma_i = \sigma_i^2 (1 - \gamma_0)/(\beta \sigma_0^2)$.

EXAMPLE 4.1. Suppose $v = 2$ and $\sigma_1 = \sigma_2 = \sigma_0$. If $\xi = 2$, then $\gamma_0 = 0.32$, $\gamma_1 = \gamma_2 = 0.34$; hence 32% of the observations are allocated to the control, and 34% to each of the two test treatments. If $\xi = 5$ then $\gamma_0 = 0.40$, $\gamma_1 = \gamma_2 = 0.30$; hence 40% of the observations are allocated to the control, and 30% to each of the two test treatments.

As a corollary, Bechhofer showed that when $\sigma_i = \sigma_0$ for all $i$, in the limit as $\xi \to \infty$, the optimal $\gamma_0/\gamma_i \to \sqrt{v}$, the allocation given in Theorem 2.1. The results of Bechhofer (1969) were generalized by Bechhofer and Turnbull (1971) by allowing the $\delta'$ to vary with the treatment symbol $i$ ($i = 1, \ldots, v$). Optimal designs for the simultaneous two-sided confidence interval were given in Bechhofer and Nocturne (1972). Bechhofer and Tamhane (1983a) gave optimal designs that minimized the total sample size for one-sided intervals for given $P$, $\beta$ and a yardstick.

Taking a different approach, Spurrier and Nizam (1990) minimized the expected allowance for a fixed coverage probability. For the model given by (4.1) and (4.2), the allowance, or yardstick, for the simultaneous 100P% lower confidence limits in (4.3) when $\delta_1 = \cdots = \delta_v = \delta$, is given by $\delta s \sqrt{r_{di}^{-1} + r_{d0}^{-1}}$, where $\delta$ is determined from equation (4.4), i.e., $P(t_i < \delta, \ i = 1, \ldots, v) = P$. Spurrier and Nizam (1990) defined the Expected Average Allowance (EAA) as

$$ \mathrm{EAA} = (1/v)\, \delta\, E(s) \sum_{i=1}^{v} \sqrt{r_{di}^{-1} + r_{d0}^{-1}}, $$

and defined a design $d_0$ to be optimal in the sense of minimizing EAA for fixed coverage probability if it minimizes $\delta \sum_{i=1}^{v} \sqrt{r_{di}^{-1} + r_{d0}^{-1}}$ for fixed $P$, over all of $\mathcal{D}(v + 1, n)$. In general, determination of an optimal design is a difficult problem. Spurrier and Nizam (1990) were able to show that when $v = 2$, for the optimal design $d_0$, $|r_{d_0 1} - r_{d_0 2}| \le 1$. When $v > 2$, based on their numerical calculations they conjectured that $|r_{d_0 i} - r_{d_0 i'}| \le 1$, for $i \neq i'$. This result is helpful in limiting the search for an optimal design. Spurrier and Nizam gave tables of optimal designs for 2 $\cdots$

$\cdots \succ d_2$

if and only if bl ~ b2,

~bZd~~ Pal2,

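with at least one inequality strict.

EXAMPLE 4.2. If $v = 4$ and $k = 3$ then for the following BTIB designs $d_1$ and $d_2$, $d_1 \succ d_2$.

$$ d_1 = \begin{pmatrix} 0 & 0 & 0 & 0 & 1 & 1 & 2 \\ 1 & 1 & 2 & 0 & 2 & 3 & 3 \\ 2 & 4 & 4 & 3 & 3 & 4 & 4 \end{pmatrix}, \qquad d_2 = \begin{pmatrix} 0 & 0 & 0 & 0 & 1 & 1 & 2 & 3 \\ 1 & 1 & 2 & 4 & 2 & 4 & 4 & 4 \\ 2 & 3 & 3 & 4 & 3 & 4 & 4 & 4 \end{pmatrix}. $$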

For two designs $d_1$ and $d_2$ the union $d_1 \cup d_2$ will denote a design that consists of all blocks of $d_1$ and $d_2$. It is interesting to note that if $d_1 \succ d_2$, then it is not necessarily true that for a BTIB design $d_3$, $d_1 \cup d_3 \succ d_2 \cup d_3$; see Bechhofer and Tamhane (1981, p. 51) for a counterexample. For a given pair $(v, k)$ it will be convenient to have a set of designs A($v, k$), such that all admissible BTIB designs, except possibly some equivalent ones, are obtainable from this set by the operation of union of designs. Among all sets A($v, k$), a set with minimal cardinality is called a minimal complete class of generator designs for the pair $(v, k)$. When $k = 2$, it is easily seen that there are only two designs in a minimal complete class: a BTIB($v, v, 2; 1, 0$) and a BTIB($v, v(v - 1)/2, 2; 0, 0$). Similarly when $v = k = 3$, there are only two designs in a minimal complete class, a BTIB(3, 3, 3; 1, 0) and the design with only one block $\{1, 2, 3\}$. These are the only two simple cases; in general it is difficult to identify a minimal complete class of generator designs. Notz and Tamhane (1983) gave minimal complete classes for $k = 3$, $3 \le v$ $\cdots$

$\cdots$ $0$ and $\delta_{d0} + v\delta_{d1} > 0$, then for $(i, i') \in \{1, \ldots, v\} \times \{1, \ldots, v\}$, $i \neq i'$,

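$$ \mathrm{Var}(\hat\tau_{di} - \hat\tau_{d0}) = \sigma^2 (\delta_{d0} + \delta_{d1})/(\delta_{d0}(\delta_{d0} + v\delta_{d1})), \qquad \mathrm{Corr}(\hat\tau_{di} - \hat\tau_{d0}, \hat\tau_{di'} - \hat\tau_{d0}) = \delta_{d1}/(\delta_{d0} + \delta_{d1}). \tag{5.2} $$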
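The expressions in (5.2) can be sanity-checked with exact arithmetic. The sketch below rests on an assumption that is not stated in this excerpt: it supposes that the information matrix for the treatment-control contrasts has the completely symmetric form $(\delta_{d0} + v\delta_{d1}) I_v - \delta_{d1} J_v$, a form chosen here because its inverse reproduces (5.2). All function names are hypothetical and $\sigma^2$ is set to 1.

```python
from fractions import Fraction


def covariance_from_5_2(v, d0, d1):
    """Covariance matrix (sigma^2 = 1) implied by the variance and correlation in (5.2)."""
    variance = Fraction(d0 + d1) / (d0 * (d0 + v * d1))
    covariance = Fraction(d1, d0 + d1) * variance
    return [[variance if i == j else covariance for j in range(v)]
            for i in range(v)]


def symmetric_information(v, d0, d1):
    """Assumed information matrix (d0 + v*d1) I - d1 J, for illustration only."""
    return [[d0 + v * d1 - d1 if i == j else -d1 for j in range(v)]
            for i in range(v)]


def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def is_identity(m):
    """True when m is exactly the identity matrix."""
    size = len(m)
    return all(m[i][j] == (1 if i == j else 0)
               for i in range(size) for j in range(size))


product = matmul(symmetric_information(4, 3, 2), covariance_from_5_2(4, 3, 2))
print(is_identity(product))
```

Under the stated assumption the product is the identity, which shows that the variance and correlation in (5.2) are mutually consistent with a completely symmetric information matrix.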

It was noted in Majumdar (1986) that Corollary 5.1 can be alternatively derived from Theorem 3.1 by taking (2.1) for Model A and (5.1) for Model B. Jacroux (1986) applied Theorem 3.1 with (2.4) as Model A and (5.1) as Model B to get the following result.

THEOREM 5.2. Let the $k \times b$ array $d_0$ be such that, with columns as blocks, $d_0$ is A- (MV-) optimal for treatment-control contrasts in $\mathcal{D}(v + 1, b, k)$ under model (2.4), and

$$ r_{d_0 i} \equiv 0 \pmod{k}, \quad i = 0, 1, \ldots, v. \tag{5.3} $$

Then there is a row-column design $d_0^{RC}$ with the same column contents as $d_0$ and $w_{d_0 i l} = r_{d_0 i}/k$, $i = 0, 1, \ldots, v$. The design $d_0^{RC}$ is A- (MV-) optimal for treatment-control contrasts in $\mathcal{D}_{RC}(v + 1, b, k)$.

EXAMPLE 5.2. For $v = 9$, $b = 24$ and $k = 3$, the following design is A-optimal in $\mathcal{D}_{RC}(10, 24, 3)$:

0 4 1 0 2 8 5 7 3 0 9 6 0 0 8 0 7 9 1 6 2 3 4 5
1 0 5 8 0 2 0 0 5 3 0 4 4 6 6 9 0 7 2 1 3 8 7 9
3 1 0 1 4 0 2 2 0 7 3 0 9 5 0 6 8 0 9 7 6 4 5 8
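Condition (5.3) and the uniform distribution of treatments over rows can be verified mechanically for the array of Example 5.2. The sketch below is illustrative and uses hypothetical helper names. It reads the array as 3 rows by 24 columns, with the entry in row 1, column 6 taken to be 8; that reading is an assumption, adopted because it is the one under which every treatment is spread evenly over the rows.

```python
from collections import Counter

# Rows of the 3 x 24 array of Example 5.2 (columns are blocks), as read here.
ROWS = [
    "041028573096008079162345",
    "105802005304466907213879",
    "310140220730950680976458",
]
design = [[int(ch) for ch in row] for row in ROWS]


def replications(rows):
    """Number of times each treatment appears in the whole array."""
    return Counter(x for row in rows for x in row)


def satisfies_5_3(rows):
    """Condition (5.3): every replication is divisible by the number of rows k."""
    k = len(rows)
    return all(r % k == 0 for r in replications(rows).values())


def uniform_over_rows(rows):
    """Each treatment i appears r_i / k times in every row."""
    k = len(rows)
    reps = replications(rows)
    return all(Counter(row)[i] == reps[i] // k for row in rows for i in reps)


reps = replications(design)
print(reps[0], satisfies_5_3(design), uniform_over_rows(design))
```

With this reading the control is replicated 18 times and each test treatment 6 times, so each row contains the control 6 times and every test treatment twice.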

Note that if a block design satisfies condition (5.3), then it can be converted to a row-column design with each treatment distributed uniformly over rows (i.e., $w_{d_0 i l} = r_{d_0 i}/k$, $i = 0, 1, \ldots, v$) by permuting symbols within blocks (columns). This is guaranteed by the results of Hall (1935), and Agrawal's (1966) generalization of Hall's results. Hedayat and Majumdar (1988) noticed that the designs that are obtained in this fashion are model robust in the sense that they are simultaneously optimal under models (2.4) and (5.1). Several infinite families of model robust designs were given in Hedayat and Majumdar (1988). We give one example. Start from a Projective Plane, which is a BIB($s^2 + s + 1, s^2 + s + 1, s + 1, s + 1, 1$) design, for some prime power $s$, that is constructed from a difference set. Write a difference set as the first column of the design using symbols $1, \ldots, s^2 + s + 1$. Obtain the remaining columns of the design by successively adding $1, \ldots, s^2 + s$ to the first column, with the convention $s^2 + s + 2 \equiv 1$. Now delete any one column, and in the rest of the design replace all symbols that appear in the deleted column by the symbol 0. The resulting design is A- and MV-optimal for treatment-control contrasts in $\mathcal{D}_{RC}(s^2 + 1, s^2 + s, s + 1)$. Designs generated by this method form the Euclidean Family of designs.

EXAMPLE 5.3. Starting from the Fano Plane, i.e., BIB(7, 7, 3, 3, 1), we get the following member of the Euclidean Family that is optimal in $\mathcal{D}_{RC}(5, 6, 3)$:

1 0 3 4 2 0
0 3 4 2 0 1
4 2 0 0 1 3

All of the above methods for obtaining optimal designs utilize orthogonality in some form or the other. Without this, in general, it is very difficult to establish theoretical results that produce optimal row-column designs. An approach similar to Kiefer (1975) for the case of a set of orthonormal contrasts has been attempted by Ting and Notz (1987). Ture (1994) started with the inequality

$$ \sigma^{-2} \sum_{i=1}^{v} \mathrm{Var}(\hat\tau_{di} - \hat\tau_{d0}) \ge v(\delta_{\bar d 0} + \delta_{\bar d 1})/(\delta_{\bar d 0}(\delta_{\bar d 0} + v\delta_{\bar d 1})) $$

(see (5.2)), where $\bar d$ is the symmetrized version of $d$, i.e., $C_{\bar d} = \frac{1}{v!} \sum \Pi C_d \Pi'$, where the sum is over all permutation matrices $\Pi$ that represent permutations of the test treatments only. (Note that while there may be no design $\bar d$ with Fisher Information matrix for the treatments $C_{\bar d}$, the matrix is well defined for any design $d$.) Using theoretical and computational results Ture (1994) gave two tables, one of A-optimal designs in the range $2 \le v \le 10$, $2 \le k \le 10$, $k \le b \le 30$ and one of designs, in the same range, that he conjectures are optimal. For the setup of Example 5.1, suppose there are only 3, not 4, test treatments to be compared with the control. What is an optimal design? Ture (1994) provides the following answer.

EXAMPLE 5.4. For $v = 3$, $b = k = 6$ an A-optimal design in $\mathcal{D}_{RC}(4, 6, 6)$ is:

0 0 1 2 3 1
1 0 0 2 3 2
2 1 3 0 0 3
3 2 2 1 0 0
0 3 2 3 1 0
1 1 0 0 2 3

Several combinatorial techniques for constructing BTCRC designs were given in Majumdar and Tamhane (1996). The following example illustrates one method.

EXAMPLE 5.5. Start from a $4 \times 4$ Latin square in symbols 1, 2, 3 and 4, with two parallel transversals. Replace the first transversal with controls, 0. Then use the Hedayat and Seiden (1974) method of sum composition to project the other transversal. This procedure is illustrated in the following sequence of designs. The second design in the sequence is a BTCRC design in $\mathcal{D}_{RC}(5, 4, 4)$, while the last design is a BTCRC design in $\mathcal{D}_{RC}(5, 5, 5)$:

$$ \begin{matrix} 1 & 2 & 3 & 4 \\ 3 & 4 & 1 & 2 \\ 4 & 3 & 2 & 1 \\ 2 & 1 & 4 & 3 \end{matrix} \quad \longrightarrow \quad \begin{matrix} 1 & 2 & 0 & 4 \\ 3 & 4 & 1 & 0 \\ 0 & 3 & 2 & 1 \\ 2 & 0 & 4 & 3 \end{matrix} \quad \longrightarrow \quad \begin{matrix} 0 & 2 & 0 & 4 & 1 \\ 3 & 0 & 1 & 0 & 4 \\ 0 & 3 & 0 & 1 & 2 \\ 2 & 0 & 4 & 0 & 3 \\ 1 & 4 & 2 & 3 & 0 \end{matrix} $$
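The Euclidean Family construction described earlier in this section (develop a difference set cyclically, delete one column, replace its symbols by the control) can be sketched in a few lines. The code below is an illustration under stated assumptions: the caller supplies a valid planar difference set written in symbols $1, \ldots, s^2 + s + 1$, the surviving symbols are relabelled $1, \ldots, s^2$ in increasing order (a labelling choice made here, so the output need not coincide entry by entry with the array printed in Example 5.3), and the function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def euclidean_family_design(difference_set, modulus, deleted_column=0):
    """Develop a difference set cyclically, delete one column, turn its symbols into controls."""
    columns = [[(x - 1 + shift) % modulus + 1 for x in difference_set]
               for shift in range(modulus)]
    removed = set(columns[deleted_column])
    survivors = sorted(set(range(1, modulus + 1)) - removed)
    relabel = {symbol: index for index, symbol in enumerate(survivors, start=1)}
    kept = [col for j, col in enumerate(columns) if j != deleted_column]
    blocks = [[0 if x in removed else relabel[x] for x in col] for col in kept]
    return [list(row) for row in zip(*blocks)]  # rows of the k x b array


def has_expected_structure(rows):
    """One control per column, identical treatment counts in every row, each test pair once."""
    blocks = list(zip(*rows))
    v = max(max(row) for row in rows)
    one_control = all(block.count(0) == 1 for block in blocks)
    row_counts = [Counter(row) for row in rows]
    uniform = all(counts == row_counts[0] for counts in row_counts)
    pairs_once = all(
        sum(i in block and j in block for block in blocks) == 1
        for i, j in combinations(range(1, v + 1), 2)
    )
    return one_control and uniform and pairs_once


fano_rows = euclidean_family_design([1, 2, 4], modulus=7)
for row in fano_rows:
    print(*row)
```

Starting from the difference set $\{1, 2, 4\}$ modulo 7 this gives a $3 \times 6$ array on four test treatments and a control, with the same structure as the design in Example 5.3. The same function applied to $\{1, 2, 4, 10\}$ modulo 13 gives a $4 \times 12$ array on nine test treatments.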

6. Bayes optimal designs

How can we utilize prior information at the designing stage? In his pioneering work, Owen (1970) studied Bayes optimal block designs for comparing treatments with a control. It is his setup that will form the basis of this section. Even though the main focus of this section is block designs, row-column designs, as well as designs for the zero-way elimination of heterogeneity model will be reviewed briefly.


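6.1. Bayes optimal continuous block designs

Suppose the observations follow the one-way elimination of heterogeneity model (2.4). Let

$$ \theta_i = \tau_i - \tau_0, \quad i = 1, \ldots, v, \qquad \theta_0 = 0, \qquad \gamma_j = \mu + \tau_0 + \beta_j, \quad j = 1, \ldots, b. $$

The $\theta_i$'s are the treatment-control contrasts that we wish to estimate, while $\gamma_j$ is the expected performance of the control in block $j$. Model (2.4) can be written as:

$$ y_{ijl} = \theta_i + \gamma_j + \varepsilon_{ijl}. $$

Suppose there is a total of $n$ observations. Let $Y$ denote the $n \times 1$ vector of observations, $\varepsilon$ denote the $n \times 1$ vector of errors, $\theta' = (\theta_1, \ldots, \theta_v)$, $\gamma' = (\gamma_1, \ldots, \gamma_b)$. Then the model can be expressed as:

$$ Y = X_{1d}\theta + X_2\gamma + \varepsilon, $$

where $X_2$ is a known matrix, and $X_{1d}$ is a known matrix that depends on the design $d$. In the Bayesian approach, $\theta$, $\gamma$ and $\varepsilon$ are all assumed random. Specifically we shall assume the following model for the conditional distribution:

$$ Y \mid \theta, \gamma \sim N_n(X_{1d}\theta + X_2\gamma, E), \tag{6.1} $$

for some covariance matrix $E$, where $N_n$ denotes the $n$-variate normal distribution. The assumed prior is:

$$ \begin{pmatrix} \theta \\ \gamma \end{pmatrix} \sim N_{v+b}\left( \begin{pmatrix} \mu_\theta \\ \mu_\gamma \end{pmatrix}, \begin{pmatrix} B^* & 0 \\ 0 & B \end{pmatrix} \right), \tag{6.2} $$

for some vectors $\mu_\theta$, $\mu_\gamma$ and matrices $B^*$, $B$. From (6.1) and (6.2) it follows that the posterior distribution of $\theta$ is:

$$ \theta \mid Y \sim N_v(\mu_d^*, D_d), \tag{6.3} $$

where

$$ D_d^{-1} = X_{1d}'(E + X_2 B X_2')^{-1} X_{1d} + B^{*-1}, $$

and

$$ D_d^{-1}\mu_d^* = X_{1d}'(E + X_2 B X_2')^{-1}(Y - X_2\mu_\gamma) + B^{*-1}\mu_\theta. \tag{6.4} $$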


For the loss function L(O, 0) = ( 0 - O)'W('O- 0), where W is a positive definite matrix, the Bayes estimator of 0 is 0" = #~, with expected loss tr(WDd). Owen (1970) developed a method to find a design that minimizes tr(WDa) in

D(v + 1, b, k l , . . . , kb) = {d: d is a design based on v + 1 treatments in b blocks of sizes k ~ , . . . , kb}.

(6.5)

Note that in this subsection we do not assume that the blocks are of the same size. The total number of observations is n = y'~b=l kj. In order to fix the order of observations in the vector Y" in (6.1), for d E 79(v + 1, b, k l , . . . , kb), we shall write, X2 = d i a g ( l k ~ , . . . , lkb), a block-diagonal matrix. We will focus on the special case, W=[. DEFINITION 6.1. Given the matrices E, B and B*, a design that minimizes tr(Dd) will be called a Bayes A-optimal design. Owen (1970) took the approximate theory approach, i.e., the incidences ndij's are viewed as real numbers, rather than integers. Giovagnoli and Wynn (1981) called these designs continuous block designs. The problem reduces considerably for a certain class of priors. This is stated in the following result of Owen (1970). THEOREM 6.1. Suppose the error covariance matrix is of the form:

E = Xzff~X~ + d i a g ( e l , . . . , e~), where ei > O, for i = 1 , . . . , n , and E = (eij) is a symmetric matrix. Also, suppose, /~*

=

0"2((~1

--

~2)Iv + ~2J~,),

(6.6)

where ff 2 > 0 and (v -- 1) -1 < ~2/~1 < 1, Iv is the identity matrix of order v and Jv is a v × v matrix of unities. Moreover, suppose B + ffS is a nonnegative definite matrix. For designs with fixed ndOj, j = 1 , . . . , b, the optimal continuous block design has nd~j = (kj - ndOj)/v, for all i = 1 , . . . , v, and each j = 1 , . . . , b. It may be noted that Owen established Theorem 6.1 under a condition more general than the condition, B ÷ J~ is nonnegative definite. We use the latter in Theorem 6.1 in order to avoid more complicated technical conditions, and also since this condition would be satisfied by a large class of priors. The theorem says that for the optimal continuous design,

$$n_{d1j} = \cdots = n_{dvj} \quad \text{for } j = 1, \ldots, b. \tag{6.7}$$

In view of this it remains to determine only the $n_{d0j}$'s that minimize $\mathrm{tr}(D_d)$. Owen gave an algorithm for this, studied some special cases and gave an example.
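As a small illustration of the allocation rule in Theorem 6.1, the following Python sketch spreads the non-control units of each block equally over the $v$ test treatments once the control counts $n_{d0j}$ are fixed. The block sizes, the control counts and the function name `continuous_design` are made up for illustration and are not taken from Owen (1970); the sketch does not attempt Owen's algorithm for choosing the $n_{d0j}$.

```python
from fractions import Fraction


def continuous_design(block_sizes, control_counts, v):
    """Return one incidence row [n_d0j, n_d1j, ..., n_dvj] per block j.

    Each test treatment receives (k_j - n_d0j) / v units in block j,
    which is the equal-split rule stated in Theorem 6.1.
    """
    design = []
    for k_j, n_0j in zip(block_sizes, control_counts):
        if not 0 <= n_0j <= k_j:
            raise ValueError("control count must lie between 0 and the block size")
        share = Fraction(k_j - n_0j, v)  # exact, possibly fractional, incidence
        design.append([Fraction(n_0j)] + [share] * v)
    return design


# Hypothetical setting: v = 3 test treatments, three blocks of unequal size.
block_sizes = [6, 5, 4]
control_counts = [2, 1, 1]
design = continuous_design(block_sizes, control_counts, v=3)

for j, row in enumerate(design, start=1):
    print(f"block {j}:", [str(x) for x in row])
```

The fractional entries (4/3 in the first two blocks of this made-up setting) show why such designs are called continuous: they have to be rounded to integers before an experiment can actually be run.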

D. Majumdar


Instead of the trace, one can minimize other functions of $D_d$ in order to obtain Bayes optimal designs. Taking this approach, Giovagnoli and Verdinelli (1983) considered a general class of criteria for optimality, with special emphasis on D- and E-optimality. Verdinelli (1983) gave methods for computing Bayes D- and A-optimal designs. Optimal designs under the hierarchical model of Lindley and Smith (1972) were determined by Giovagnoli and Verdinelli (1985). The model is:

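$$Y \mid \theta, \gamma \sim N_n(A_{11d}\theta + A_{12}\gamma,\ E_{11}),$$

$$\begin{pmatrix} \theta \\ \gamma \end{pmatrix} \Bigm| \theta_1, \gamma_1 \sim N_{v+b}\left( \begin{pmatrix} A_{21}\theta_1 \\ A_{22}\gamma_1 \end{pmatrix}, \begin{pmatrix} E_{21} & 0 \\ 0 & E_{22} \end{pmatrix} \right),$$

$$\begin{pmatrix} \theta_1 \\ \gamma_1 \end{pmatrix} \sim N\left( \begin{pmatrix} A_{31}\theta_2 \\ A_{32}\gamma_2 \end{pmatrix}, \begin{pmatrix} E_{31} & 0 \\ 0 & E_{32} \end{pmatrix} \right).$$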

Giovagnoli and Verdinelli (1985) extended Owen's results to this model. It may be noted that this work followed that of Smith and Verdinelli (1980) who investigated Bayes optimal designs for a hierarchical model with no block effects, i.e., the hierarchical model corresponding to (2.1). Smith and Verdinelli's model is:

$$Y \mid \theta \sim N(A_{1d}\theta, E_1), \qquad \theta \mid \theta_1 \sim N(A_2\theta_1, E_2), \qquad \theta_1 \sim N(\theta_2, E_3).$$

6.2. Bayes optimal exact designs

The approximate, or continuous, block design theory is wide in its scope. As the theorems in this approach specify proportions of units which are assigned to each treatment, they give an overall idea of the nature of optimal designs. Also, one rule applies to (almost) all block sizes, and requires only minor computations when the block sizes are altered. These are very desirable properties. On the other hand, application of these designs is possible only after rounding off the treatment-block incidences to nearby integers, and this could result in loss of efficiency. Moreover, it follows from the restriction (6.7) that the known optimal designs cannot be obtained unless the block sizes are somewhat large. In particular, optimal designs cannot be obtained for the incomplete block setup, i.e., k v, are available in Jacroux and Majumdar (1989).

An alternative approach to identifying efficient designs is to restrict attention to a subclass of all designs. If the subclass is sufficiently large, then this method is expected to yield designs that are close to an optimal design. The method also has the advantage that the chosen design is guaranteed to possess any property that is shared by all designs in the subclass. Examples of subclasses are BTIB designs or GDTD's in $\mathcal{D}(v+1, b, k)$. Hedayat and Majumdar (1984) gave a catalog of designs that are A-optimal in the class of BTIB designs for $k = 2$, $2 \le v \le 10$ and $v \le b \le 50$. Jacroux (1987b) obtained a similar catalog of designs for the MV-optimality criterion. Stufken (1988) investigated the efficiency of the best reinforced BIB design (Das, 1958; Cox, 1958), i.e., a BTIB$(v, b, k; t, 0)$ which has the smallest value of the A-criterion. He established analytic results, using which he evaluated $e_L$ for each $k$ in the interval $[3, 10]$ and all $v$. Here is an example.

EXAMPLE 7.3. Let $k = 4$. For a $b$ for which it exists, let $d$ denote a BTIB$(v, b, 4; 1, 0)$. For $5 \le v \le 9$, it follows from Theorem 3.3 that $d$ is A-optimal. Stufken (1988) shows that if $v = 4$, $e_L(d) \ge 99.99\%$, and if $v \ge 10$,

$$e_L(d) \ge \frac{(3v-1)\,(v-1+\sqrt{v+1})^2}{4v^2(v+1)}.$$

Thus if $v = 10$, $e_L(d) \ge 99.98\%$, if $v = 20$, $e_L(d) \ge 97.65\%$, and if $v = 100$, $e_L(d) \ge 88\%$.

Stufken's study strengthens the belief that the subclass of BTIB designs in $\mathcal{D}(v+1, b, k)$ usually contains highly efficient designs, with exceptions more likely when $v$ is large compared to $k$. The reason seems to be that, due to its demanding combinatorial structure, sometimes it is impossible to find a BTIB design $d$ with $r_{d0}$ in the vicinity of $r^*$. Here is an example.

EXAMPLE 7.4. Let $v = 10$, $b = 80$ and $k = 2$. Here $r^* = 39$ and $g^*(r^*) = 1.896$. There is no BTIB design with 39 replications of the control, so Theorem 3.2 does not give an optimal design. Let $d^{(1)}$ be a BTIB$(10, 10, 2; 1, 0)$ and $d^{(2)}$ be a BIB$(10, 45, 9, 2, 1)$ design in the test treatments only. Further let $d^{(3)} \in \mathcal{D}(10, 40, 2)$ be a BIB$(10, 45, 9, 2, 1)$ design in the test treatments with the blocks $\{1,6\}$, $\{2,7\}$, $\{3,8\}$, $\{4,9\}$ and $\{5,10\}$ deleted, $d^{(4)} \in \mathcal{D}(10, 41, 2)$ be the design $d^{(3)} \cup \{1,6\}$, and $d^{(5)}$ be the design $d^{(1)}$ with the block $\{0,10\}$ deleted. Hedayat and Majumdar (1984) showed that an A-optimal design in the subclass of BTIB designs within $\mathcal{D}(11, 80, 2)$ is $d_0 = 8d^{(1)}$. For this design, $r_{d_0 0} = 80$, $\mathrm{tr}(M_{d_0}^{-1}) = 2.5$, hence $e_L(d_0) = 75.8\%$. Now consider two other designs in $\mathcal{D}(11, 80, 2)$: $d_1 = 4d^{(1)} \cup d^{(3)}$ and $d_2 = 3d^{(1)} \cup d^{(4)} \cup d^{(5)}$. It is easy to see that $r_{d_1 0} = 40$, $\mathrm{tr}(M_{d_1}^{-1}) = 1.904$, $e_L(d_1) = 99.6\%$, and $r_{d_2 0} = 39$, $\mathrm{tr}(M_{d_2}^{-1}) = 1.905$, $e_L(d_2) = 99.5\%$. Neither $d_1$ nor

Optimal and efficient treatment-control designs


$d_2$ is a BTIB design; $d_1$ is a GDTD, $d_2$ is not. Both of these designs are highly efficient, and both are about 24% more efficient than the best BTIB design $d_0$ in $\mathcal{D}(11, 80, 2)$.

Since the class of GDTD's is larger than the class of BTIB designs, the best design in the former class will perform at least as well as the best design in the latter class and, as in Example 7.4, sometimes substantially better. Stufken and Kim (1992) gave a complete listing of designs that are A-optimal in the class of GDTD's for $k = 2, 3$, $k \le v \le 6$ and $b \le b_0$, where $b_0$ is 50 or more. In this context it may be noted that all of the optimal designs for simultaneous confidence intervals in Section 4 were optimal within certain subclasses of designs.

A general method of obtaining a lower bound $\Phi_L$ to $\Phi$ is to use the principle of Theorem 3.1. Consider the models A and B in (3.2). For a design $d$, if $\Phi^J(d)$ denotes the value of the criterion based on model $J$ ($J = A, B$), and if the criterion $\Phi(d) = \phi(C_d)$ has the property $\phi(H - G) \ge \phi(H)$ whenever $H$, $G$, $H - G$ are nonnegative definite, then $\Phi^B(d) \ge \Phi^A(d)$. Since A is a simpler model it is usually easier to obtain a lower bound to the criterion under Model A. This is an old technique in design theory. The 0-way elimination of heterogeneity model is used to compute the efficiency of designs under the 1-way elimination of heterogeneity model, the 1-way for the 2-way elimination of heterogeneity model, and so on. In Majumdar and Tamhane (1996) this method has been used to compute the efficiency of row-column designs.

To describe the results we need some notation. For the class of designs with $b$ blocks of size $k$ each, i.e., $\mathcal{D}(v+1, b, k)$, let us denote the function $f^*(r)$ by the extended notation $f^*_{b,k}(r)$, and the quantity $f^*(r^*)$ that was used in Theorem 3.7 by $f^*_{b,k}(r^*_{b,k})$. Now consider the class $\mathcal{D}_{RC}(v+1, b, k)$ of row-column designs. If $d \in \mathcal{D}_{RC}(v+1, b, k)$, then it follows from the discussion above that

$$\mathrm{tr}(M_d^{-1}) \ge \max\big(f^*_{b,k}(r),\ f^*_{k,b}(r)\big) \ge \max\big(f^*_{b,k}(r^*_{b,k}),\ f^*_{k,b}(r^*_{k,b})\big),$$

which gives a measure of efficiency,

$$e_L(d) = \frac{\max\big(f^*_{b,k}(r^*_{b,k}),\ f^*_{k,b}(r^*_{k,b})\big)}{\mathrm{tr}(M_d^{-1})}.$$

Here is an example.

EXAMPLE 7.5. Let $v = 4$, $b = k = 5$. The design $d \in \mathcal{D}_{RC}(5, 5, 5)$ in Example 5.5 (the last design in the sequence in that example) has efficiency $e_L(d) = 1.5/1.568 = 95.7\%$.

Several researchers have computed the efficiency of designs in various settings. They include Pigeon and Raghavarao (1987) for repeated measurements designs, Angelis et al. (1993) and Gupta and Kageyama (1993) for block designs with unequal-sized blocks, and Gerami and Lewis (1992) and Gerami et al. (1993) for factorial designs. These will be briefly discussed in the next section.
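The numbers quoted in Examples 7.3 and 7.4 can be checked with a short numerical sketch. The Python code below evaluates Stufken's lower bound for $k = 4$ and computes $\mathrm{tr}(M_d^{-1})$ for the designs $d_0$ and $d_1$ of Example 7.4. It assumes the usual block design information matrix $C_d = \sum_j (\mathrm{diag}(n_j) - n_j n_j'/k_j)$, where $n_j$ is the incidence vector of block $j$, and it takes $M_d$ to be $C_d$ with the row and column of the control removed; the function names `stufken_bound` and `trace_m_inverse` are illustrative only.

```python
import itertools
import math

import numpy as np


def stufken_bound(v):
    """Lower bound on e_L(d) for a BTIB(v, b, 4; 1, 0) design (Example 7.3, v >= 10)."""
    return (3 * v - 1) * (v - 1 + math.sqrt(v + 1)) ** 2 / (4 * v**2 * (v + 1))


def trace_m_inverse(blocks, v):
    """tr(M_d^{-1}) for a block design on treatments 0..v, with 0 the control."""
    c = np.zeros((v + 1, v + 1))
    for block in blocks:
        counts = np.bincount(block, minlength=v + 1).astype(float)
        # Contribution of one block to C_d = R - N K^{-1} N'.
        c += np.diag(counts) - np.outer(counts, counts) / len(block)
    m = c[1:, 1:]  # assumption: M_d is C_d without the control row and column
    return float(np.trace(np.linalg.inv(m)))


v = 10
gen = [(0, i) for i in range(1, v + 1)]  # d^(1): each test treatment with the control
deleted = {(1, 6), (2, 7), (3, 8), (4, 9), (5, 10)}
d3 = [pair for pair in itertools.combinations(range(1, v + 1), 2) if pair not in deleted]

d0 = 8 * gen       # best BTIB design in D(11, 80, 2)
d1 = 4 * gen + d3  # the GDTD d_1 = 4 d^(1) U d^(3)

trace_d0 = trace_m_inverse(d0, v)
trace_d1 = trace_m_inverse(d1, v)

for size in (10, 20, 100):
    print(size, round(100 * stufken_bound(size), 2))
print(len(d0), trace_d0)
print(len(d1), trace_d1)
```

Under these assumptions the bound evaluates to roughly 99.98%, 97.66% and 88.0% for $v = 10, 20, 100$, and the traces come out as 2.5 for $d_0$ and about 1.9048 for $d_1$, in line with the values 2.5 and 1.904 quoted in Example 7.4.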


8. Optimal and efficient designs in other settings

Research on treatment-control designs has progressed in various directions to accommodate different experimental situations and models. In this section we will briefly outline some of these.

8.1. Repeated measurements designs

Consider the problem of designing experiments where subjects receive some or all of the treatments in an ordered fashion over a number of successive periods. The model may include the residual effect of the treatment applied in the previous period, in addition to the direct effect of the treatment applied in the current period. Suppose there are $b$ subjects and $k$ periods. We say that $d(l, j) = i$ if design $d$ assigns treatment $i$ ($0 \le i \le v$) to subject $j$ ($1 \le j \le b$) in period $l$ ($1 \le l \le k$). We will denote the class of connected designs by $\mathcal{D}_{RM}(v+1, b, k)$. The model for an observation in period $l$ of subject $j$ is:

$$y_{jl} = \mu + \tau_{d(l,j)} + \rho_{d(l-1,j)} + \beta_j + \gamma_l + \epsilon_{jl},$$

where $\beta_j$ and $\gamma_l$ are the subject and period effects, $\tau_{d(l,j)}$ is a direct effect of the treatment applied in period $l$ and $\rho_{d(l-1,j)}$ is a residual effect of the treatment applied in period $l - 1$. Since there is no residual effect in the first period, we can write $\rho_{d(0,j)} = 0$. See Stufken (1996) for a general treatise of repeated measurement designs.

Pigeon (1984) and Pigeon and Raghavarao (1987) studied this problem. They called a design a control balanced residual effects design if the information matrix of the treatment-control direct effect contrasts is completely symmetric, and the information matrix of the treatment-control residual effect contrasts is completely symmetric. They characterized control balanced residual effects designs, gave several methods for construction of these designs, and also gave tables of such designs along with their efficiencies.

A-optimal designs for the direct effect of treatments versus control contrasts were obtained in Majumdar (1988a). The main tool used in this paper was Theorem 3.1. Hedayat and Zhao (1990) gave a complete solution for A- and MV-optimal designs for the direct effect of treatments versus control contrasts for experiments with only two periods, $k = 2$. Let $v$ be a perfect square and $d^{(i)} \in \mathcal{D}_{RM}(v+1, b_i, 2)$ be a design with treatment $i$ in the first period of each subject and, in the second period, treatments distributed according to Theorem 2.1. Then a result of Hedayat and Zhao (1990) asserts that the union $d^{(i_1)} \cup \cdots \cup d^{(i_m)}$, for any $i_1, \ldots, i_m$, each chosen from $\{0, 1, \ldots, v\}$, is A- and MV-optimal in $\mathcal{D}_{RM}(v+1, \sum_{t=1}^{m} b_{i_t}, 2)$.

EXAMPLE 8.1. Let $v = 4$, $k = 2$ and $b = 18$. The following design, with subjects represented by columns and periods represented by rows, is optimal:

Period 1:  1 1 1 1 1 1 2 2 2 2 2 2 3 3 3 3 3 3
Period 2:  0 0 1 2 3 4 0 0 1 2 3 4 0 0 1 2 3 4
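The union construction behind Example 8.1 can be sketched in a few lines of Python. The sketch assumes, as the second row of Example 8.1 suggests, that the second-period allocation gives the control $\sqrt{v}$ subjects for every subject given to each test treatment; Theorem 2.1 should be consulted for the exact statement. The function names `component` and `union_design` are hypothetical.

```python
import math


def component(first, v):
    """Two-period component d^(i): treatment `first` in period 1 for every subject.

    Assumption: in period 2 the control (0) appears sqrt(v) times and
    each test treatment 1..v appears once, as in Example 8.1.
    """
    root = math.isqrt(v)
    if root * root != v:
        raise ValueError("v must be a perfect square")
    second_period = [0] * root + list(range(1, v + 1))
    return [(first, second) for second in second_period]


def union_design(first_period_treatments, v):
    """Union d^(i_1) U ... U d^(i_m) as a list of (period 1, period 2) sequences."""
    subjects = []
    for first in first_period_treatments:
        subjects.extend(component(first, v))
    return subjects


design = union_design([1, 2, 3], v=4)
period_1 = [seq[0] for seq in design]
period_2 = [seq[1] for seq in design]

print(len(design))
print(*period_1)
print(*period_2)
```

With first-period treatments 1, 2 and 3 and $v = 4$, this reproduces the 18-subject array of Example 8.1; other choices of $i_1, \ldots, i_m$ from $\{0, 1, \ldots, v\}$ give other unions of the same form.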


Koch et al. (1989) did an extensive study of comparing two test treatments among themselves, as well as with a control, in a two-period repeated measurements design (thus $v = 2$, $k = 2$). The design they considered consisted of the treatment sequences (1,2), (2,1), (0,1), (1,0), (0,2), (2,0) distributed on subjects in the ratio $m : m : 1 : 1 : 1 : 1$, with special emphasis on the case $m = 3$. They used several models, each consisting of subject effects, period effects and direct effects of treatments; in addition, some of the models had residual effects. The efficiencies of the estimators of the contrasts were computed and a simulated example was studied.

8.2. Designs for comparing two sets of treatments

A natural extension of the problem of comparing several treatments with a control is the problem of comparing several treatments with more than one control. For instance, in drug studies, there may be a placebo (inactive control) and a standard drug (active control). In general, the two controls may be of different importance, e.g., the test treatment-placebo contrasts and the test treatment-standard treatment contrasts may be of different significance in an experiment. Thus these contrasts may have to be weighted differently in the design criterion. This will lead to a somewhat complicated criterion that depends on the weights chosen. If the weights are the same, the criterion simplifies to some extent and some results on optimal and efficient designs are available in the literature.

Suppose there are $v + u$ treatments divided into two groups, $G$ and $H$, of $v$ and $u$ treatments respectively. Then the contrasts of primary interest are $\tau_g - \tau_h$, where $g \in G$ and $h \in H$. Thus there are, in all, $vu$ contrasts of interest. Consider the block design setup with model (2.4), and suppose $G = \{1, \ldots, v\}$ and $H = \{v+1, \ldots, v+u\}$. It was shown in Majumdar (1986) that the information matrix of the contrasts of interest is completely symmetric if the Fisher information matrix for the treatments of a design $d \in \mathcal{D}(v+u, b, k)$ is of the form

$$C_d = \begin{pmatrix} p_d I_v + q_d J_{vv} & t_d J_{vu} \\ t_d J_{uv} & r_d I_u + s_d J_{uu} \end{pmatrix}, \tag{8.1}$$

for some $p_d, q_d, r_d, s_d, t_d$, where $J_{uv}$ is a $u \times v$ matrix of unities. Condition (8.1) is the generalization of condition (2.5) for the case of several controls. In this setup a design is called A-optimal if it minimizes $\sum_{g \in G} \sum_{h \in H} \mathrm{Var}(\hat{\tau}_{dg} - \hat{\tau}_{dh})$ among all designs in $\mathcal{D}(v+u, b, k)$.

Some results on A-optimality were given in Majumdar (1986). A generalization of Theorem 3.2 was given for the case $k \le \min(v, u)$. Another result in this paper, obtained by applying Theorem 3.1, gave A-optimal designs for design classes with $k \equiv 0 \pmod{v + \sqrt{vu}}$ and $k \equiv 0 \pmod{u + \sqrt{vu}}$. Christof (1987) generalized some of the results in Majumdar (1986) in the setup of approximate theory.


8.3. Trend-resistant designs

Jacroux (1993) considered the problem of comparing two sets of treatments, $G$ and $H$, of sizes $v$ and $u$, using experimental units that are ordered over time, where it is assumed that the observations are affected by a smooth trend over time. (The special case $u = 1$ corresponds to a single control.) If the unit at the $j$th time point ($j = 1, \ldots, n$) receives treatment $i$, then the model is:

$$y_{ij} = \mu + \tau_i + \beta_1 j + \cdots + \beta_p j^p + \epsilon_{ij}, \tag{8.2}$$

where $\beta_1 j + \cdots + \beta_p j^p$ is the effect of a trend that is assumed to be a polynomial of degree $p$. A design is a sequence or run order. We will denote the set of all connected designs by $\mathcal{D}(v, u; n)$. For a design $d \in \mathcal{D}(v, u; n)$, if the least squares estimator of $\tau_g - \tau_h$ under model (8.2) is the same as the least squares estimator under model (2.1) (i.e., the model with all $\beta_t = 0$), then this design is called $p$-trend free. Thus in a $p$-trend free design, treatments and trends are orthogonal. Jacroux (1993) studied trend free designs, and obtained designs that are A-optimal and/or MV-optimal as well as $p$-trend free.

EXAMPLE 8.2. The following run order is A-optimal and 1-trend free in $\mathcal{D}(2, 5; 29)$. Here $G = \{1, 2\}$ and $H = \{3, 4, 5, 6, 7\}$.

(12121267673455344537676212121)
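The 1-trend free property of the run order in Example 8.2 can be checked directly. The Python sketch below uses the standard characterization that treatments are orthogonal to a polynomial trend of degree $p$ when, for every treatment and every power $q = 1, \ldots, p$, the average of $j^q$ over the positions occupied by that treatment equals the average of $j^q$ over all $n$ positions. This characterization and the function name `is_trend_free` are assumptions of the sketch, not notation from Jacroux (1993).

```python
def is_trend_free(run_order, degree=1):
    """Check orthogonality of treatments to polynomial trends up to `degree`.

    The comparison uses integer arithmetic only:
    n * sum_{j in positions of i} j^q == r_i * sum_{j=1}^{n} j^q.
    """
    n = len(run_order)
    for q in range(1, degree + 1):
        total = sum(j**q for j in range(1, n + 1))
        for treatment in set(run_order):
            positions = [j for j, t in enumerate(run_order, start=1) if t == treatment]
            if n * sum(j**q for j in positions) != len(positions) * total:
                return False
    return True


run_order = "12121267673455344537676212121"  # run order of Example 8.2, n = 29

linear_free = is_trend_free(run_order, degree=1)
quadratic_free = is_trend_free(run_order, degree=2)
print(linear_free, quadratic_free)
```

For this run order the check succeeds for a linear trend: each treatment occupies positions whose mean is 15, the midpoint of the 29 time points. Under the same characterization the run order is not 2-trend free, which is consistent with the example claiming 1-trend freeness only.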

8.4. Block designs with unequal block sizes and blocks with nested rows and columns

Consider the class $\mathcal{D}(v+1, b, k_1, \ldots, k_b)$ of designs defined in (6.5), and the 1-way elimination of heterogeneity model given by (2.4). What are good designs for estimation in this class? Angelis and Moyssiadis (1991) and Jacroux (1992) extended Theorem 3.2 to this setup. All of these authors, as well as Gupta and Kageyama (1993), characterized designs in $\mathcal{D}(v+1, b, k_1, \ldots, k_b)$ for which the information matrix of treatment-control contrasts is completely symmetric, i.e., the matrix $M_d$ in (3.1) is completely symmetric. Gupta and Kageyama (1993) gave several methods for constructing such designs, and tables of designs along with their efficiencies. Angelis and Moyssiadis (1991) gave methods to construct these designs, as well as an algorithm to find A-optimal designs. They also gave tables of optimal designs. Angelis et al. (1993) gave methods for constructing designs with $C_d$ as above that are efficient. Jacroux (1992) gave a generalization of Theorem 3.6, as well as infinite families of A- and MV-optimal designs.

Next consider experiments where the units in each of the $b$ blocks are arranged in a $p \times q$ array, i.e., the nested row-column setup. For a design, if the unit in row $l$ and column $m$ of block $j$ receives treatment $i$, then the model is:

$$y_{ijlm} = \mu + \tau_i + \beta_j + \rho_{l(j)} + \gamma_{m(j)} + \epsilon_{ijlm}.$$


Gupta and Kageyama (1991) considered the problem of finding good designs for treatment-control contrasts in this setup. Extending Pearce's (1960) definition of designs with supplemented balance, they sought designs for which the information matrix for the treatment-control contrasts is completely symmetric. Let $n^r_{dil}$, $n^c_{dih}$ and $n_{dij}$ denote respectively the number of times treatment $i$ appears in row $l$, column $h$ and block $j$. Let $\lambda^r_{dii'} = \sum_{l=1}^{bp} n^r_{dil} n^r_{di'l}$, $\lambda^c_{dii'} = \sum_{h=1}^{bq} n^c_{dih} n^c_{di'h}$ and $\lambda_{dii'} = \sum_{j=1}^{b} n_{dij} n_{di'j}$. Gupta and Kageyama (1991) showed that the information matrix for the treatment-control contrasts is completely symmetric if, for some $s$ and $s_0$ with $s_0 \ne 0$ and $s_0 + (v-1)s \ne 0$, the following is true for all $(i, i') \in \{1, \ldots, v\} \times \{1, \ldots, v\}$, $i \ne i'$:

$$p\lambda^r_{di0} + q\lambda^c_{di0} - \lambda_{di0} = s_0 \quad \text{and} \quad p\lambda^r_{dii'} + q\lambda^c_{dii'} - \lambda_{dii'} = s.$$

These designs were called type S designs by Gupta and Kageyama, who also gave several methods for constructing such designs. Here is an example.

EXAMPLE 8.3. For $v = 3$, $b = 3$, $p = 2$ and $q = 3$, here is a type S design with $s_0 = 6$, $s = 3$.

Block 1    Block 2    Block 3
0 1 2      0 1 3      0 2 3
1 2 0      1 3 0      2 3 0
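The constants $s_0 = 6$ and $s = 3$ of Example 8.3 can be recomputed from the row, column and block concurrences. The Python sketch below does this for the three $2 \times 3$ blocks of the example, using the combination $p\lambda^r + q\lambda^c - \lambda$ for every pair of treatments; the function name `type_s_values` is hypothetical, and rows and columns are counted separately within each block.

```python
from itertools import combinations

# The three 2 x 3 blocks of Example 8.3 (0 is the control).
blocks = [
    [[0, 1, 2], [1, 2, 0]],
    [[0, 1, 3], [1, 3, 0]],
    [[0, 2, 3], [2, 3, 0]],
]


def concurrence(units, a, b):
    """Sum over units (rows, columns or blocks) of n_a * n_b."""
    return sum(unit.count(a) * unit.count(b) for unit in units)


def type_s_values(blocks, v):
    """Return p*lambda^r + q*lambda^c - lambda for every pair of treatments."""
    p, q = len(blocks[0]), len(blocks[0][0])
    rows = [list(row) for block in blocks for row in block]
    cols = [list(col) for block in blocks for col in zip(*block)]
    whole = [[x for row in block for x in row] for block in blocks]
    return {
        (a, b): p * concurrence(rows, a, b)
        + q * concurrence(cols, a, b)
        - concurrence(whole, a, b)
        for a, b in combinations(range(v + 1), 2)
    }


values = type_s_values(blocks, v=3)
s0_values = {val for (a, b), val in values.items() if a == 0}
s_values = {val for (a, b), val in values.items() if a != 0}
print(s0_values, s_values)
```

Every control-test pair gives the value 6 and every test-test pair gives the value 3, which matches the constants stated in the example.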

8.5. Block designs when treatments have a factorial structure

Gupta (1995) considered block designs when each treatment is a level combination of several factors. Suppose that there are $t$ factors $F_1, \ldots, F_t$, where $F_i$ has $m_i$ levels, $0, 1, \ldots, m_i - 1$, for $i = 1, \ldots, t$. For each factor the level 0 denotes a control. A treatment is written as $x_1 x_2 \ldots x_t$, with $x_i \in \{0, 1, \ldots, m_i - 1\}$. Gupta extended the concept of supplemented balance to this setup, and called such designs type S-PB designs. He also gave methods for constructing such designs. Since the description of the factorial setup involves elaborate notation, we refer the reader to Gupta's paper for details, but give an example here to illustrate his approach.

EXAMPLE 8.4. Let $t = 2$, $m_1 = m_2 = 3$. Thus we have two factors at three levels each. Let $\tau$ be the vector of treatment effects with the treatments written in lexicographic order. Let

$$u_{11} = u_{21} = \begin{pmatrix} -1 \\ 1 \\ 0 \end{pmatrix}, \qquad u_{12} = u_{22} = \begin{pmatrix} -1 \\ 0 \\ 1 \end{pmatrix}, \qquad u_{13} = u_{23} = \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix},$$


and for $(l, l') \ne (3, 3)$, let $u(l, l') = \tau'(u_{1l} \otimes u_{2l'})$, where $\otimes$ denotes the Kronecker product. The two contrasts $u(1, 3)$ and $u(2, 3)$ will together represent the main effect of $F_1$, the two contrasts $u(3, 1)$ and $u(3, 2)$ represent the main effect of $F_2$, while the remaining four contrasts $u(l, l')$, $l \ne 3$, $l' \ne 3$, represent the $F_1F_2$ interaction. Note that the contrasts in the set that represents a main effect or interaction are not orthogonal, as they are with the traditional definition when the factors do not have a special level such as the control. Gupta (1995) defined a type S-PB design as one in which the estimators for all contrasts of any effect (main effect of $F_1$, main effect of $F_2$ or the interaction $F_1F_2$) have the same variance. The idea is that the design for each effect has the properties of a design with supplemented balance. For $k = 5$ and $b = 6$ here is an example of a type S-PB design, where treatments are denoted by pairs $x_1x_2$ ($x_i = 0, 1, 2$; $i = 1, 2$) and blocks are denoted by columns.

01 01 01 01 00 00
02 02 02 02 01 11
10 10 10 10 02 12
20 20 20 20 10 21
11 12 21 22 20 22

Motivated by a problem in drug-testing, Gerami and Lewis (1992) also considered block designs when treatments have a factorial structure, but their approach was different. Consider the case of two factors, i.e., $t = 2$. Suppose each factor is a different drug and the level 0 is a placebo. Each treatment, therefore, is a combination of two drugs. Gerami and Lewis considered experiments where it is unethical to administer a double placebo, i.e., the combination (0, 0). There are, therefore, $v = m_1 m_2 - 1$ treatments. The object is to compare the different levels of each factor with the control level, at each fixed level of the other factor. The contrasts of interest are $\tau_{ii'} - \tau_{i0}$, the difference in the effects of level $i'$ and level 0 of factor $F_2$ when $F_1$ is at level $i$, and $\tau_{ii'} - \tau_{0i'}$, the difference in the effects of level $i$ and level 0 of factor $F_1$ when $F_2$ is at level $i'$, for $i = 1, \ldots, m_1 - 1$ and $i' = 1, \ldots, m_2 - 1$. A design $d_0 \in \mathcal{D}(v, b, k)$ is called A-optimal if it minimizes

$$\sum_{i=1}^{m_1 - 1} \sum_{i'=1}^{m_2 - 1} \big[ \mathrm{Var}(\hat{\tau}_{dii'} - \hat{\tau}_{di0}) + \mathrm{Var}(\hat{\tau}_{dii'} - \hat{\tau}_{d0i'}) \big].$$

For the case $m_2 = 2$, Gerami and Lewis determined bounds on the efficiency of designs, described designs that have a completely symmetric information matrix for the contrasts of interest and discussed methods of constructing such designs.

Gerami et al. (1993) continued the research of Gerami and Lewis (1992) by identifying a class of efficient designs. A lower bound to the efficiency of designs in that class was obtained to determine the performance of the worst design in the class. Tables of designs and their efficiencies were also given. Here is an example.

EXAMPLE 8.5. For $m_1 = 3$, $m_2 = 2$, $b = 5$ and $k = 8$, here is a design that is at least 94% efficient.

01 01 01 01 01
01 01 01 01 01
10 10 10 10 10
20 20 20 20 20
11 11 11 10 20
11 11 11 11 11
21 21 21 11 21
21 21 21 21 21

8.6. Block designs when errors are correlated

Consider the class of designs $\mathcal{D}(v+1, b, k)$ and the model (2.4), with one difference: instead of being homoscedastic, the errors, $\epsilon_{ijl}$, are possibly correlated. The problem of finding optimal designs in this setup has been studied by Cutler (1993). He assumed that errors have a stationary, first-order, autoregressive correlation structure. The estimation method is generalized least squares. Cutler established a general result on optimality for treatment-control contrasts which generalizes Theorem 3.2. He also suggested two families of designs and studied their construction and optimality properties.

Acknowledgements

I received an extensive set of comments from the referee on the first draft of this paper, which ranged from pointing out typographical errors to making several excellent suggestions regarding the style and contents. These led to a substantial improvement in the quality of the paper. For this, I am extremely grateful to the referee. I am also grateful to Professor Subir Ghosh for his patience and encouragement.

References

Agrawal, H. (1966). Some generalizations of distinct representatives with applications to statistical designs. Ann. Math. Statist. 37, 525-528.
Angelis, L., S. Kageyama and C. Moyssiadis (1993). Methods of constructing A-efficient BTIUB designs. Utilitas Math. 44, 5-15.
Angelis, L. and C. Moyssiadis (1991). A-optimal incomplete block designs with unequal block sizes for comparing test treatments with a control. J. Statist. Plann. Inference 28, 353-368.


Bechhofer, R. E. (1969). Optimal allocation of observations when comparing several treatments with a control. In: P. R. Krishnaiah, ed., Multivariate Analysis, Vol. 2. Academic Press, New York, 463-473.
Bechhofer, R. E. and C. Dunnett (1988). Tables of percentage points of multivariate Student t distribution. In: Selected Tables in Mathematical Statistics, Vol. 11. Amer. Math. Soc., Providence, RI.
Bechhofer, R. E. and D. J. Nocturne (1972). Optimal allocation of observations when comparing several treatments with a control, II: 2-sided comparisons. Technometrics 14, 423-436.
Bechhofer, R. E. and A. C. Tamhane (1981). Incomplete block designs for comparing treatments with a control: General theory. Technometrics 23, 45-57.
Bechhofer, R. E. and A. C. Tamhane (1983a). Design of experiments for comparing treatments with a control: Tables of optimal allocations of observations. Technometrics 25, 87-95.
Bechhofer, R. E. and A. C. Tamhane (1983b). Incomplete block designs for comparing treatments with a control (II): Optimal designs for p = 2(1)6, k = 2 and p = 3, k = 3. Sankhyā Ser. B 45, 193-224.
Bechhofer, R. E. and A. C. Tamhane (1985). Selected Tables in Mathematical Statistics, Vol. 8. Amer. Math. Soc., Providence, RI.
Bechhofer, R. E. and B. W. Turnbull (1971). Optimal allocation of observations when comparing several treatments with a control, III: Globally best one-sided intervals for unequal variances. In: S. S. Gupta and J. Yackel, eds., Statistical Decision Theory and Related Topics. Academic Press, New York, 41-78.
Berger, J. (1985). Statistical Decision Theory and Bayesian Analysis, 2nd edn. Springer, New York.
Cheng, C. S., D. Majumdar, J. Stufken and T. E. Ture (1988). Optimal step type designs for comparing treatments with a control. J. Amer. Statist. Assoc. 83, 477-482.
Christof, K. (1987). Optimale Blockpläne zum Vergleich von Kontroll- und Testbehandlungen. Ph.D. Dissertation, Univ. Augsburg.
Constantine, G. M. (1983). On the trace efficiency for control of reinforced balanced incomplete block designs. J. Roy. Statist. Soc. Ser. B 45, 31-36.
Cox, D. R. (1958). Planning of Experiments. Wiley, New York.
Cutler, R. D. (1993). Efficient block designs for comparing test treatments to a control when the errors are correlated. J. Statist. Plann. Inference 36, 107-125.
Das, M. N. (1958). On reinforced incomplete block designs. J. Indian Soc. Agricultural Statist. 10, 73-77.
DasGupta, A. (1996). Review of optimal Bayes designs. In: Handbook of Statistics, this volume, Chapter 29.
Dunnett, C. W. (1955). A multiple comparison procedure for comparing several treatments with a control. J. Amer. Statist. Assoc. 50, 1096-1121.
Dunnett, C. W. (1964). New tables for multiple comparisons with a control. Biometrics 20, 482-491.
Dunnett, C. W. and M. Sobel (1954). A bivariate generalization of Student's t-distribution, with tables for certain special cases. Biometrika 41, 153-169.
Fieller, E. C. (1947). Some remarks on the statistical background in bioassay. Analyst 72, 37-43.
Finney, D. J. (1952). Statistical Methods in Biological Assay. Hafner, New York.
Freeman, G. H. (1975). Row-and-column designs with two groups of treatments having different replications. J. Roy. Statist. Soc. Ser. B 37, 114-128.
Gerami, A. and S. M. Lewis (1992). Comparing dual with single treatments in block designs. Biometrika 79, 603-610.
Gerami, A., S. M. Lewis, D. Majumdar and W. I. Notz (1993). Efficient block designs for comparing dual with single treatments. Tech. Report, Univ. of Southampton.
Giovagnoli, A. and I. Verdinelli (1983). Bayes D- and E-optimal block designs. Biometrika 70, 695-706.
Giovagnoli, A. and I. Verdinelli (1985). Optimal block designs under a hierarchical linear model. In: J. M. Bernardo, M. H. DeGroot, D. V. Lindley and A. F. M. Smith, eds., Bayesian Statistics, Vol. 2. North-Holland, Amsterdam, 655-662.
Giovagnoli, A. and H. P. Wynn (1981). Optimum continuous block designs. Proc. Roy. Soc. (London) Ser. A 377, 405-416.
Giovagnoli, A. and H. P. Wynn (1985). Schur optimal continuous block designs for treatments with a control. In: L. M. LeCam and R. A. Olshen, eds., Proc. Berkeley Conf. in Honor of Jerzy Neyman and Jack Kiefer, Vol. 2. Wadsworth, Monterey, CA, 651-666.
Gupta, S. (1989). Efficient designs for comparing test treatments with a control. Biometrika 76, 783-787.
Gupta, S. (1995). Multi-factor designs for test versus control comparisons. Utilitas Math. 47, 199-210.
Gupta, S. and S. Kageyama (1991). Type S designs for nested rows and columns. Metrika 38, 195-202.


Gupta, S. and S. Kageyama (1993). Type S designs in unequal blocks. J. Combin. Inform. System Sci. 18, 97-112.
Hall, P. (1935). On representatives of subsets. J. London Math. Soc. 10, 26-30.
Hayter, A. J. and A. C. Tamhane (1991). Sample size determination for step-down multiple test procedures: Orthogonal contrasts and comparisons with a control. J. Statist. Plann. Inference 27, 271-290.
Hedayat, A. S., M. Jacroux and D. Majumdar (1988). Optimal designs for comparing test treatments with controls (with discussions). Statist. Sci. 3, 462-491.
Hedayat, A. S. and D. Majumdar (1984). A-optimal incomplete block designs for control-test treatment comparisons. Technometrics 26, 363-370.
Hedayat, A. S. and D. Majumdar (1985). Families of optimal block designs for comparing test treatments with a control. Ann. Statist. 13, 757-767.
Hedayat, A. S. and D. Majumdar (1988). Model robust optimal designs for comparing test treatments with a control. J. Statist. Plann. Inference 18, 25-33.
Hedayat, A. S. and E. Seiden (1974). On the theory and application of sum composition of latin squares and orthogonal latin squares. Pacific J. Math. 54, 85-112.
Hedayat, A. S. and W. Zhao (1990). Optimal two-period repeated measurements designs. Ann. Statist. 18, 1805-1816.
Hoblyn, T. N., S. C. Pearce and G. H. Freeman (1954). Some considerations in the design of successive experiments in fruit plantations. Biometrics 10, 503-515.
Hochberg, Y. and A. C. Tamhane (1987). Multiple Comparison Procedures. Wiley, New York.
Jacroux, M. (1984). On the optimality and usage of reinforced block designs for comparing test treatments with a standard treatment. J. Roy. Statist. Soc. Ser. B 46, 316-322.
Jacroux, M. (1986). On the usage of refined linear models for determining N-way classification designs which are optimal for comparing test treatments with a standard treatment. Ann. Inst. Statist. Math. 38, 569-581.
Jacroux, M. (1987a). On the determination and construction of MV-optimal block designs for comparing test treatments with a standard treatment. J. Statist. Plann. Inference 15, 205-225.
Jacroux, M. (1987b). Some MV-optimal block designs for comparing test treatments with a standard treatment. Sankhyā Ser. B 49, 239-261.
Jacroux, M. (1989). The A-optimality of block designs for comparing test treatments with a control. J. Amer. Statist. Assoc. 84, 310-317.
Jacroux, M. (1992). On comparing test treatments with a control using block designs having unequal sized blocks. Sankhyā Ser. B 54, 324-345.
Jacroux, M. (1993). On the construction of trend-resistant designs for comparing a set of test treatments with a set of controls. J. Amer. Statist. Assoc. 88, 1398-1403.
Jacroux, M. and D. Majumdar (1989). Optimal block designs for comparing test treatments with a control when k > v. J. Statist. Plann. Inference 23, 381-396.
Kiefer, J. (1975). Construction and optimality of generalized Youden designs. In: J. Srivastava, ed., A Survey of Statistical Design and Linear Models. North-Holland, Amsterdam, 333-353.
Kim, K. and J. Stufken (1995). On optimal block designs for comparing a standard treatment to test treatments. Utilitas Math. 47, 211-224.
Koch, G. G., I. A. Amara, B. W. Brown, T. Colton and D. B. Gillings (1989). A two-period crossover design for the comparison of two active treatments and placebo. Statist. Med. 8, 487-504.
Krishnaiah, P. R. and J. V. Armitage (1966). Tables for multivariate t-distribution. Sankhyā Ser. B 28, 31-56.
Kunert, J. (1983). Optimal design and refinement of the linear model with applications to repeated measurements designs. Ann. Statist. 11, 247-257.
Lindley, D. V. and A. F. M. Smith (1972). Bayes estimates for the linear model (with discussion). J. Roy. Statist. Soc. Ser. B 34, 1-42.
Magda, G. M. (1980). Circular balanced repeated measurements designs. Comm. Statist. Theory Methods 9, 1901-1918.
Majumdar, D. (1986). Optimal designs for comparisons between two sets of treatments. J. Statist. Plann. Inference 14, 359-372.


Majumdar, D. (1988a). Optimal repeated measurements designs for comparing test treatments with a control. Comm. Statist. Theory Methods 17, 3687-3703.
Majumdar, D. (1988b). Optimal block designs for comparing new treatments with a standard treatment. In: Y. Dodge, V. V. Fedorov and H. P. Wynn, eds., Optimal Design and Analysis of Experiments. North-Holland, Amsterdam, 15-27.
Majumdar, D. (1992). Optimal designs for comparing test treatments with a control utilizing prior information. Ann. Statist. 20, 216-237.
Majumdar, D. (1996). On admissibility and optimality of treatment-control designs. Ann. Statist., to appear.
Majumdar, D. and S. Kageyama (1990). Resistant BTIB designs. Comm. Statist. Theory Methods 19, 2145-2158.
Majumdar, D. and W. I. Notz (1983). Optimal incomplete block designs for comparing treatments with a control. Ann. Statist. 11, 258-266.
Majumdar, D. and A. C. Tamhane (1996). Row-column designs for comparing treatments with a control. J. Statist. Plann. Inference 49, 387-400.
Miller, R. G. (1966). Simultaneous Statistical Inference. McGraw-Hill, New York.
Naik, U. D. (1975). Some selection rules for comparing p processes with a standard. Comm. Statist. Theory Methods 4, 519-535.
Notz, W. I. (1985). Optimal designs for treatment-control comparisons in the presence of two-way heterogeneity. J. Statist. Plann. Inference 12, 61-73.
Notz, W. I. and A. C. Tamhane (1983). Incomplete block (BTIB) designs for comparing treatments with a control: Minimal complete sets of generator designs for k = 3, p = 3(1)10. Comm. Statist. Theory Methods 12, 1391-1412.
Owen, R. J. (1970). The optimal design of a two-factor experiment using prior information. Ann. Math. Statist. 41, 1917-1934.
Pearce, S. C. (1953). Field experiments with fruit trees and other perennial plants. Commonwealth Agric. Bur., Farnham Royal, Bucks, England. Tech. Comm. 23.
Pearce, S. C. (1960). Supplemented balance. Biometrika 47, 263-271.
Pearce, S. C. (1963). The use and classification of non-orthogonal designs (with discussion). J. Roy. Statist. Soc. Ser. A 126, 353-377.
Pearce, S. C. (1983). The Agricultural Field Experiment: A Statistical Examination of Theory and Practice. Wiley, New York.
Pesek, J. (1974). The efficiency of controls in balanced incomplete block designs. Biometrische Z. 16, 21-26.
Pigeon, J. G. (1984). Residual effects for comparing treatments with a control. Ph.D. Dissertation, Temple University.
Pigeon, J. G. and D. Raghavarao (1987). Crossover designs for comparing treatments with a control. Biometrika 74, 321-328.
Roy, J. (1958). On efficiency factor of block designs. Sankhyā 19, 181-188.
Smith, A. F. M. and I. Verdinelli (1980). A note on Bayes designs for inference using a hierarchical linear model. Biometrika 67, 613-619.
Spurrier, J. D. (1992). Optimal designs for comparing the variances of several treatments with that of a standard. Technometrics 34, 332-339.
Spurrier, J. D. and D. Edwards (1986). An asymptotically optimal subclass of balanced treatment incomplete block designs for comparisons with a control. Biometrika 73, 191-199.
Spurrier, J. D. and A. Nizam (1990). Sample size allocation for simultaneous inference in comparison with control experiments. J. Amer. Statist. Assoc. 85, 181-186.
Stufken, J. (1986). On optimal and highly efficient designs for comparing test treatments with a control. Ph.D. Dissertation, Univ. Illinois at Chicago, Chicago, IL.
Stufken, J. (1987). A-optimal block designs for comparing test treatments with a control. Ann. Statist. 15, 1629-1638.
Stufken, J. (1988). On bounds for the efficiency of block designs for comparing test treatments with a control. J. Statist. Plann. Inference 19, 361-372.
Stufken, J. (1991a). On group divisible treatment designs for comparing test treatments with a standard treatment in blocks of size 3. J. Statist. Plann. Inference 28, 205-211.

Stufken, J. (1991b). Bayes A-optimal and efficient block designs for comparing test treatments with a standard treatment. Comm. Statist. Theory Methods 20, 3849-3862.
Stufken, J. (1996). Optimal crossover design. In: Handbook of Statistics, this volume, Chapter 3.
Stufken, J. and K. Kim (1992). Optimal group divisible treatment designs for comparing a standard treatment with test treatments. Utilitas Math. 41, 211-227.
Ting, C.-P. and W. I. Notz (1987). Optimal row-column designs for treatment control comparisons. Tech. Report, Ohio State University.
Ting, C.-P. and W. I. Notz (1988). A-optimal complete block designs for treatment-control comparisons. In: Y. Dodge, V. V. Fedorov and H. P. Wynn, eds., Optimal Design and Analysis of Experiments. North-Holland, Amsterdam, 29-37.
Ting, C.-P. and Y.-Y. Yang (1994). Bayes A-optimal designs for comparing test treatments with a control. Tech. Report, National Chengchi Univ., Taiwan.
Toman, B. and W. I. Notz (1991). Bayesian optimal experimental design for treatment-control comparisons in the presence of two-way heterogeneity. J. Statist. Plann. Inference 27, 51-63.
Ture, T. E. (1982). On the construction and optimality of balanced treatment incomplete block designs. Ph.D. Dissertation, Univ. of California, Berkeley, CA.
Ture, T. E. (1985). A-optimal balanced treatment incomplete block designs for multiple comparisons with the control. Bull. Internat. Statist. Inst.; Proc. 45th Session, 51-1, 7.2-1-7.2-17.
Ture, T. E. (1994). Optimal row-column designs for multiple comparisons with a control: A complete catalog. Technometrics 36, 292-299.
Verdinelli, I. (1983). Computing Bayes D- and A-optimal block designs for a two-way model. The Statistician 32, 161-167.

S. Ghosh and C. R. Rao, eds., Handbook of Statistics, Vol. 13 © 1996 Elsevier Science B.V. All rights reserved.


Model Robust Designs

Y-J. Chang and W. I. Notz

1. Introduction

There is a vast body of literature on design of experiments for linear models. Most of this literature assumes that the response is described exactly by a particular linear model. Experimental design is concerned with how to take observations so as to fit this model. In practice, the assumed model is likely to be, at best, only a reasonable approximation to the true model for the response. This true model may not even be linear and is generally unknown. One may then ask whether an experimental design selected for the purpose of fitting the assumed model will also allow one to determine if the assumed model is a reasonable approximation to the true model. If the assumed model is a poor approximation, does the design allow one to fit a model that better approximates the true model? In addition, to what extent are inferences based on the fitted model using a given design biased? These are very important practical questions. In this article we provide an overview of the wide variety of approaches that have been proposed in the literature to answer them.

To motivate what follows we consider a simple example. An experimenter observes a response Y which is thought to depend on a single independent variable x. The experimenter intends to fit a straight line to the data using the method of least squares. The response Y can be observed over some range of values of the independent variable which are of particular interest to the experimenter. The "naive" experimenter will typically take observations spread (often uniformly) over the region of interest. This is not the classical "optimal" design, which would take half of the observations at one end of the region of interest and half at the other end.

If you ask our naive experimenter why they plan to take observations spread out over the region of interest, you typically get an answer of the form "I need to observe Y at a variety of values of x in case the response isn't a perfect straight line but instead has some curvature". This suggests that the experimenter does not believe that the response is actually a straight line function of x, but may be something more complicated with some curvature. However, the experimenter also probably believes a straight line is a reasonable approximation to the true response and so plans to fit a straight line to the data unless the data indicate that a straight line model is a very poor approximation.

The beliefs of our naive experimenter reflect what happens in all real problems. In investigating how a response depends on several independent variables one generally

[Fig. 1.1. Fitting a straight line when the true model is a quadratic. Curves shown: the least squares line based on observations taken only at the two extreme points (indicated by tick marks on the horizontal axis) of the design region; the "best" linear approximation to the true response; the true response. Horizontal axis: x.]

does not know the exact form of the relationship between the response and the independent variables. At best, one believes that a relatively simple functional relation is a good approximation to the true unknown relation. One plans to fit some relatively simple model, often a linear model with normal errors, using standard methods such as least squares or maximum likelihood. If one can control the values of the independent variables at which the response is observed, at what values should observations be taken so that the simple model we fit to the data adequately approximates the true model? Since the true model is unknown, this last question must mean that the fitted simple model provides an adequate approximation to a range of possible true models, i.e., is in some sense "robust" to the exact form of the true model. This is the fundamental goal of model robust design. In order to see what sorts of problems may arise, we consider two examples. We plan to fit a straight line to data. We know that the true response is probably not a straight line, but believe that a straight line will provide a suitable approximation to the true response. Suppose the situation facing us is as in Figure 1.1. The true response is a quadratic. If we use the standard "optimal" design, which takes observations at the extreme points (indicated by the tick marks in the figure) of the design region, our least squares line (assuming negligible error in the observed response) will be biased, lying almost entirely above the true response. The "best" straight line approximation to the true response would lie below the fitted least squares line. Without taking observations in the interior of our design region, we would not even be aware of the bias in our fitted least squares line. Box and Draper (1959) examined the consequences of this sort of bias in some detail. Obviously a few

[Fig. 1.2. Fitting a straight line when the true model is a rapidly varying curve with many local maxima and minima. Curves shown: the least squares line based on uniformly spaced observations taken at the values of x indicated by tick marks on the horizontal axis; the "best" linear approximation to the true response; the true response. Horizontal axis: x; vertical axis: Y.]

observations in the middle of the design region would have alerted us to the form of the true model and might have yielded a fitted least squares line more like the line that best approximates the true response. In fact, Box and Draper (1959) showed that if we are fitting a low order polynomial to data when the true model is some higher order polynomial of degree, say d, a design whose moments up to degree d + 1 match those of uniform measure is, in some sense, best for protecting against bias. Thus the intuition of our naive experimenter does not appear too far fetched. Unfortunately no design which takes a finite number of observations can protect against all possible forms of bias. For any given design, there is some perverse "true model" which the design fails to protect against. As our second example, we again plan to fit a straight line to data. Suppose the true response is quite "wiggly" as in Figure 1.2. If we take observations at the values of the independent variable indicated by the tick marks on the horizontal axis, we have a design which is uniformly spaced over the design region. This is the sort of design our naive experimenter might use. However (unbeknown to us) these turn out to be the relative maxima of the "wiggly" true response. The fitted least squares line (assuming negligible errors in the observed responses) will again be biased, lying above the true response. The "best" straight line approximation to the true response also lies below the fitted least squares line. In principle then, no matter at how many uniformly spaced points in the design region we take observations, there is some perverse rapidly varying continuous function for which our design will be very poor (yielding a very biased fitted least squares


line). Thus there is no design (at a finite number of points) which will be reasonable regardless of the form of the true response. Some information about the possible form of the true response is needed in order to construct designs. For example, it is not unnatural to restrict to relatively smooth functions which vary only slowly, since in real problems this is usually the case.

Thus to determine suitable designs, several pieces of information are crucial. These include
(i) On what set of independent variables does the response depend and over what range of values of these variables (the design region) do we want to model the response?
(ii) What (approximately correct) model does one intend to fit and what estimator will be used to fit this model?
(iii) What assumptions is one willing to make about the nature of the true model?
(iv) What is the purpose of the experiment (model fitting, prediction, extrapolation)?

Whatever the answers to these questions, a "reasonable" design ought to allow one to fit the (approximately correct) model, allow one to detect if the fitted model is a poor approximation to the true model, and, if it is a poor approximation, allow one to fit more complicated models which better approximate the true response. Box and Draper (1959) appear to be the first to address these issues in a systematic way.

Ideally, one should choose a design that "best" accomplishes the above. Exactly what we mean by "best", however, is difficult to formulate and depends on (i), (ii), (iii), and (iv). Roughly speaking, "best" usually involves either being minimax (with respect to some loss function) over a class of possible true models, minimizing a weighted average of some loss function over a class of possible true models, or minimizing the Bayes risk in a Bayesian formulation of the problem. In the literature on robust design, some authors seek good estimators (usually linear estimators are considered) and designs simultaneously. Others assume that the usual least squares estimators will be used (because models will be fit using standard software) and seek only good designs.
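The first example of this section (a straight line fitted to a quadratic response) can be reproduced numerically. The following is a minimal sketch, not taken from the chapter: the quadratic `true_response`, the sample sizes, and the helper names are assumptions chosen only for illustration, and observations are taken without error, as in the discussion of Figure 1.1.

```python
import numpy as np

def true_response(x):
    # Hypothetical quadratic "true" response, chosen only for illustration.
    return 1.0 + 0.5 * x + 2.0 * x**2

def least_squares_line(x, y):
    # Fit y ~ a0 + a1 * x by ordinary least squares; returns array [a0, a1].
    G = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(G, y, rcond=None)
    return coef

# Classical "optimal" design for a straight line: half the runs at each end.
x_end = np.array([-1.0, -1.0, 1.0, 1.0])
# A design spread uniformly over [-1, 1], as the naive experimenter would use.
x_unif = np.linspace(-1.0, 1.0, 5)

fit_end = least_squares_line(x_end, true_response(x_end))
fit_unif = least_squares_line(x_unif, true_response(x_unif))

# "Best" straight-line approximation in the integrated squared error sense,
# approximated by least squares on a fine grid over [-1, 1].
grid = np.linspace(-1.0, 1.0, 20001)
fit_best = least_squares_line(grid, true_response(grid))

def integrated_sq_bias(coef):
    # Approximates the integral over [-1, 1] of (line - true response)^2
    # as (length of interval) * (mean over the grid).
    line = coef[0] + coef[1] * grid
    return 2.0 * np.mean((line - true_response(grid)) ** 2)

for name, coef in [("endpoints", fit_end), ("uniform", fit_unif), ("best", fit_best)]:
    print(name, coef, integrated_sq_bias(coef))
```

Under these assumptions the endpoint design gives a line that lies above the quadratic throughout the interior of the region, while the uniformly spread design comes much closer to the best approximating line.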

2. Notation and a useful model

To make the above more precise, we introduce some notation following Kiefer (1973). We consider the following model. Suppose on each of N experimental units we measure a response which depends on the values of k independent or explanatory variables. Let $Y_i$ and $x_i = (x_{i1}, \ldots, x_{ik})'$ denote the values of the response and the explanatory variables, respectively, for the ith unit. Here, ' denotes the transpose of a vector or matrix. We assume the $x_i$ belong to some compact subset X of k-dimensional Euclidean space $R^k$ and $Y_1, \ldots, Y_N$ are uncorrelated observations with common variance $\sigma^2$ and mean

$$E Y_i = \sum_{j=1}^{m} \beta_j f_j(x_i). \qquad (2.1)$$

The $f_j$ are real-valued functions defined on $X \cup L$ (X and L are not necessarily disjoint) and the $\beta_j$ are unknown real valued parameters. Let $g_1, \ldots, g_s$ be s known, real valued functions defined on L. The problem, to be made precise shortly, is to choose $x_1, \ldots, x_N$ in X in some "good" fashion for the purpose of estimating the expected response $\sum_{j=1}^{m} \beta_j f_j(x)$ for x in L. However, we do not use a linear combination of the $f_j(x_i)$ to model the expected response in (2.1) but, instead, a linear combination of the $g_l(x_i)$. In particular, we model the expected response as

$$E Y_i = \sum_{l=1}^{s} \alpha_l g_l(x_i). \qquad (2.2)$$

Justification for using (2.2) might be as follows. In real problems, the true response typically depends on a large number of explanatory variables in a complicated (and unknown) manner. In practice a relatively simple model, as in (2.2), depending on only some of these explanatory variables is used to describe the response and is assumed to provide a reasonable approximation to the response, at least over some range of values of x. Thus the $g_l$ in (2.2) may depend on only a subset of the explanatory variables (coordinates of x). While this possibility is inherent in (2.2), most of the literature treats the $g_l$ as depending on all the coordinates of x. In any case, we will estimate the expected response at x by estimating $\sum_{l=1}^{s} \alpha_l g_l(x)$.

At this point we should note the following identifiability problem, namely exactly what are we estimating when we estimate $\sum_{l=1}^{s} \alpha_l g_l(x)$? This issue is sometimes ignored in the literature, but it appears that most authors assume that the $\sum_{l=1}^{s} \alpha_l g_l(x)$ in (2.2) that we are estimating is the best (in some sense) approximation to (2.1) among the set of all functions having the form given on the right hand side of (2.2).

Now let $Y = (Y_1, \ldots, Y_N)'$, F the N × m matrix whose (i, j)th entry is $f_j(x_i)$, $f(x) = (f_1(x), f_2(x), \ldots, f_m(x))'$, $\beta = (\beta_1, \ldots, \beta_m)'$, G the N × s matrix whose (i, l)th entry is $g_l(x_i)$, $g(x) = (g_1(x), g_2(x), \ldots, g_s(x))'$, and $\alpha = (\alpha_1, \ldots, \alpha_s)'$. We can then write (2.1) as

$$E Y = F\beta \qquad (2.3)$$

and (2.2) as

$$E Y = G\alpha. \qquad (2.4)$$

As is common in the linear model, we restrict attention to estimators of $\alpha$ which are linear in Y. Thus, we consider the estimator $C'Y$ of $\alpha$, where C is N × s. Notice

$$E(C'Y) = C'F\beta, \qquad \mathrm{Cov}(C'Y) = \sigma^2 C'C. \qquad (2.5)$$

Finally, by an exact design, we will mean a particular choice of $x_1, \ldots, x_N$ in X.

How do we evaluate the goodness of a particular linear estimator and design? To place this in a decision theoretic type framework, let $\nu$ be an N × N loss measure on L, nonnegative definite in value, and such that f and g are square integrable relative to $\nu$ and $\int_L a(x)'\,d\nu(x)\,a(x)$ is nonnegative definite for any vector valued square integrable function a(x) on L. We define $\Gamma_{ff}$, $\Gamma_{gg}$, $\Gamma_{fg}$, and $\Gamma_{gf}$ by

$$\Gamma_{ab} = \int_L a(x)'\,d\nu(x)\,b(x). \qquad (2.6)$$

The (quadratic) expected loss incurred if the design $x_1, \ldots, x_N$ and estimator $C'Y$ are used and $\beta$ is the true parameter value is then

$$R(\beta; x_1, \ldots, x_N, C) = E \int_L [g(x)'C'Y - f(x)'\beta]'\,d\nu(x)\,[g(x)'C'Y - f(x)'\beta]$$
$$= \sigma^2 \operatorname{tr} C'C\,\Gamma_{gg} + \beta'[F'C\,\Gamma_{gg}\,C'F - F'C\,\Gamma_{gf} - \Gamma_{fg}\,C'F + \Gamma_{ff}]\beta = \sigma^2 (V + B) \ \text{(say)}, \qquad (2.7)$$

the last being a notation which specializes to that used in many papers on robust design when certain restrictions on C are imposed, as will be described shortly. Our goal will be to minimize this expected (quadratic) loss. Without further restrictive criteria, the unknown relative magnitudes of $\sigma^2$ and $\beta'\beta$ make the problem of choosing a design $x_1, \ldots, x_N$ and estimator $C'Y$ so as to make (2.7) "small" too vague or unfruitful.

Several approaches are possible. One approach is to minimize B alone in (2.7). This approach is recommended by Box and Draper (1959) and by Karson et al. (1969). Another approach is to minimize the maximum of (2.7) subject to an assumption such as $\sigma^{-1}\beta \in S$, where S is a specified set in $R^m$. Somewhat simpler, but in the same vein, is the minimization of an integral of (2.7) with respect to a specified measure $\phi$ on $\sigma^{-1}\beta$. This has an obvious interpretation for Bayesians, but may appeal to others as a possible compromise. For, in order to inspect the risk functions of the admissible choices of design $x_1, \ldots, x_N$ and estimator $C'Y$ (after reduction to (2.7)), it suffices to consider the closure of such integral minimizers (more generally, without the invariance reduction to $\sigma^{-1}\beta$). In this context one could, of course, question the restriction to linear estimators. Retaining the restriction to linear estimators, Kiefer (1973) follows this latter approach. These and other approaches will be discussed in more detail below.

Before proceeding, it is worthwhile recalling the distinction between the exact and approximate design theory. The exact theory follows the above development, wherein a design is an N-tuple of points $x_1, \ldots, x_N$ (not necessarily distinct) in X. Alternatively, a design is a discrete probability measure $\xi$ on X restricted to the family of probability measures taking on values which are only integral multiples of $N^{-1}$. In this case, the $\xi$ corresponding to $x_1, \ldots, x_N$ is given by $N\xi(x)$ = the number of $x_i$ equal to x. In the approximate theory $\xi$ is permitted to be any member of the family of all probability measures on X relative to a specified $\sigma$-field which contains at least all the finite subsets of X. The use of the approximate theory makes various minimizations tractable which are unwieldy in the exact theory. Unfortunately, the use of the approximate theory necessitates implementation of the optimal approximate design in terms of exact designs which may then only be approximately optimum. These issues are discussed further in Kiefer (1959), Kiefer and Wolfowitz (1959) and Fedorov (1972). We write

$$M(\xi) = \int f(x) f(x)'\,\xi(dx) \qquad (2.8)$$

for the information matrix per observation under $\xi$. Notice $N M(\xi) = F'F$, for F as in (2.3), when $\xi$ corresponds to an exact design.

A number of authors have considered the above issues of "model robust" design. A large variety of approaches have been investigated. There appears to be no agreement on the proper formulation of the model robust design problem, but there are a few features common to all. These include specification of a model to be fit, some assumptions concerning the nature of the true model, and some objective the design is to satisfy. The more realistic the formulation, the more difficult it is to obtain analytic solutions. In what follows we attempt to provide a survey of the literature on model robust design. In the interest of space, however, only a few of these approaches will be presented in detail. We discuss various applications of the above formulation, as well as additional formulations of the problem of model robust design. In most cases we provide simple examples, usually in the univariate setting with (2.2) a straight line model. We cover designs for fitting a model over some region of interest, the problem of extrapolation, designs allowing detection of model inadequacy, a survey of some results in a Bayes setting, and applications to computer experiments.
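The two parts of the expected loss (2.7) can be evaluated numerically for a given exact design and linear estimator. The sketch below is illustrative only and rests on several assumptions that are not in the text: the response is scalar, $\nu$ is taken to be Lebesgue measure on L = [-1, 1] (approximated by Gauss-Legendre quadrature), and $\Gamma_{ab}$ is read as the matrix $\int a(x) b(x)'\,d\nu(x)$ for column vectors a and b, which is the reading under which the trace in (2.7) is defined. The function names, the design, and the value of `beta` are hypothetical.

```python
import numpy as np

def gamma(a, b, nodes, weights):
    # Quadrature approximation of the matrix  integral of a(x) b(x)' d nu(x).
    A = np.array([a(x) for x in nodes])   # row i is a(x_i)'
    B = np.array([b(x) for x in nodes])
    return A.T @ (weights[:, None] * B)

def risk_parts(design, f, g, C, beta, sigma2, nodes, weights):
    # Returns (sigma^2 * tr C'C Gamma_gg,  beta' [ ... ] beta) from (2.7).
    F = np.array([f(x) for x in design])
    G_gg = gamma(g, g, nodes, weights)
    G_gf = gamma(g, f, nodes, weights)
    G_ff = gamma(f, f, nodes, weights)
    CF = C.T @ F
    variance_part = sigma2 * np.trace(C.T @ C @ G_gg)
    bias_matrix = CF.T @ G_gg @ CF - CF.T @ G_gf - G_gf.T @ CF + G_ff
    bias_part = float(beta @ bias_matrix @ beta)
    return float(variance_part), bias_part

# True model: quadratic; fitted model: straight line.
f = lambda x: np.array([1.0, x, x**2])
g = lambda x: np.array([1.0, x])

# Five-point Gauss-Legendre rule on [-1, 1]; exact for the polynomials used here.
nodes, weights = np.polynomial.legendre.leggauss(5)

# Two observations at each end of the region, with the least squares estimator.
design = np.array([-1.0, -1.0, 1.0, 1.0])
G = np.array([g(x) for x in design])
C_ls = G @ np.linalg.inv(G.T @ G)        # so that C' = (G'G)^{-1} G'

beta = np.array([1.0, 0.5, 2.0])         # hypothetical true coefficients
variance_part, bias_part = risk_parts(design, f, g, C_ls, beta, 1.0, nodes, weights)
print(variance_part, bias_part)
```

Changing `design` or `C_ls` in this sketch shows directly how the variance part and the bias part trade off against each other.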

3. Designs for model fitting and parameter estimation

3.1. Finite dimensional sets of true models

Box and Draper (1959) appear to be the first to formally investigate the problem of model robust design. In the notation of Section 2, they take X = L. In their general formulation, no assumptions concerning the true model, as represented by f(x), are made, but the $g_l$ are assumed to be monomials so that the fitted model, as represented by g(x), is a polynomial of some specified degree. Model (2.2) then represents the polynomial of this degree that in some sense best fits the true model. Box and Draper (1959) restrict attention to the least squares estimators for (2.2). Thus in the notation of Section 2, $C' = (G'G)^{-}G'$. The motivation for this is that since practitioners will typically use existing software to fit models, for practical purposes the relevant estimators are the least squares estimators.

Box and Draper (1959) seek designs having two properties. First, designs should make (2.7) relatively small, with $\nu$ taken to be Lebesgue measure on X = L and $C'Y = (G'G)^{-}G'Y$ the usual least squares estimator under model (2.2). Second, designs should allow for a relatively sensitive test for lack of fit of the fitted model. These properties are to be achieved in a sequential manner. First find all designs minimizing (2.7). Then from these designs, choose the one that makes the power of the lack of fit test as large as possible. In order to carry out these objectives f(x) must be specified. The resulting optimal model robust designs are thus robust against true models having a particular form. In all their examples, Box and Draper (1959) take s < m and assume f(x) is a higher order polynomial than g(x) with $g_i = f_i$ for i = 1, ..., s. In this case it is convenient to express $\beta$ as $(\alpha', \beta_2')'$. Thus model (2.4) is just (2.3) with $\beta_2 = 0$.

To illustrate what happens, we now sketch the results for the case k = 1 when the true model is a quadratic and the fitted model is a straight line. For simplicity, assume variables are scaled so X = [-1, 1]. Box and Draper (1959) restrict to designs $\xi$ which are symmetric on X (i.e., have first moment = 0) and show that in this case (2.7) becomes (using the least squares estimators and $\nu$ = Lebesgue measure)

$$R = \sigma^2 (V + B) = \frac{2\sigma^2}{N}\left\{\left[1 + \frac{1}{3m_2}\right] + \frac{N\beta_2^2}{\sigma^2}\left[m_2^2 - \frac{2m_2}{3} + \frac{1}{5} + \frac{m_3^2}{3m_2^2}\right]\right\} \qquad (3.1)$$

where $m_i$ denotes the ith moment of the design $\xi$ and $\beta_2$ is the coefficient of the quadratic term in the true model. Notice that the term

$$1 + \frac{1}{3m_2} \qquad (3.2)$$

is proportional to the contribution of the variance V to R and the term

$$\frac{N\beta_2^2}{\sigma^2}\left[m_2^2 - \frac{2m_2}{3} + \frac{1}{5} + \frac{m_3^2}{3m_2^2}\right] \qquad (3.3)$$

is proportional to the contribution of the bias B to R. For any fixed $m_2$, (3.1) is minimized when $m_3 = 0$. Restricting to $\xi$ with both first and third moments 0, the only way the design enters (3.1) is through its second moment $m_2$. The minimizing value of $m_2$ (and hence the optimal design) can be found using calculus. The result will depend on the quantity $N\beta_2^2/\sigma^2$. Box and Draper (1959) note that

$$\sqrt{\frac{N\beta_2^2}{\sigma^2}} = \frac{|\beta_2|}{\sigma/\sqrt{N}} \qquad (3.4)$$

which might be interpreted as the ratio of the curvature (as represented by $\beta_2$) of the true model to the sampling error (as represented by $\sigma/\sqrt{N}$). When $\beta_2$ is quite small relative to $\sigma/\sqrt{N}$ (and hence the true model cannot be distinguished from a straight line up to sampling error), the optimal design essentially minimizes the variance term in (3.2). This leads to the classical optimal design which puts all its mass on -1 and +1. When $\beta_2$ is quite large relative to $\sigma/\sqrt{N}$ (and hence a straight line is a very poor approximation to the true model), the optimal design essentially minimizes the bias term in (3.3). The design minimizing (3.3) is called the "all bias design" by Box and Draper (1959). It has $m_2 = 1/3$, which is easily verified after setting $m_3 = 0$ in (3.3). Thus any design with $m_1 = m_3 = 0$ and $m_2 = 1/3$ is an all bias design. They show that over a very wide range of values of $\beta_2/(\sigma/\sqrt{N})$, any all bias design comes close to minimizing (3.1).

Having determined the value of $m_2$, say $m_2^*$, which minimizes (3.1), Box and Draper (1959) show that among all designs having first and third moments 0, and second moment equal to the optimal value $m_2^*$, the fourth moment $m_4$ must be maximized in order to maximize the power of the lack of fit test. For designs on [-1, 1], $m_4 \le m_2$ with equality if and only if the design is supported on {-1, 0, 1}. Thus the best design is one supported on {-1, 0, 1} with second moment equal to $m_2^*$. This is achieved by a design with mass $m_2^*/2$ at -1, mass $m_2^*/2$ at 1, and mass $1 - m_2^*$ at 0. This must be regarded as an approximate theory design since, for a given sample size N, $m_2^*/2$ need not be an integer multiple of 1/N.

Box and Draper (1959) carry out similar calculations for fitting a plane when X is a spherical region in k dimensional Euclidean space and the true model is quadratic. They again note that the all bias design performs well over a wide range of departures from the fitted model. This leads to the proposal that one simply minimize the bias term since, in the examples considered, this leads to designs which are reasonably efficient (relative to the design which actually minimizes (3.1)) as long as the fitted model is a reasonable approximation to the true model. The above formulation has been further explored in Box and Draper (1963) for fitting a second order model when the true model is a cubic. In Box and Draper (1975) the authors develop a measure of the insensitivity of a design to outliers, suggesting that this measure be used along with that discussed above in evaluating the suitability of a design.
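The dependence of the best second moment on the ratio $N\beta_2^2/\sigma^2$ can be explored with a short numerical sketch. It assumes that, for symmetric designs on [-1, 1], the variance and bias terms have the forms $1 + 1/(3m_2)$ and $m_2^2 - 2m_2/3 + 1/5 + m_3^2/(3m_2^2)$ (the latter multiplied by the ratio); these forms follow from a direct calculation with least squares and Lebesgue measure. The function names and the grid search are illustrative choices, not Box and Draper's procedure.

```python
import numpy as np

def variance_term(m2):
    # Variance contribution, up to a constant factor, for a symmetric design.
    return 1.0 + 1.0 / (3.0 * m2)

def bias_term(m2, m3=0.0):
    # Bias contribution per unit of N * beta_2^2 / sigma^2.
    return m2**2 - 2.0 * m2 / 3.0 + 0.2 + m3**2 / (3.0 * m2**2)

def best_second_moment(ratio, grid=None):
    # Grid search over m2 in (0, 1] for the minimizer of variance + ratio * bias,
    # where ratio stands for N * beta_2^2 / sigma^2 and m3 = 0.
    if grid is None:
        grid = np.linspace(0.01, 1.0, 9901)   # step 0.0001
    values = variance_term(grid) + ratio * bias_term(grid)
    return float(grid[np.argmin(values)])

def three_point_design(m2):
    # Masses on {-1, 0, 1} giving first and third moments 0 and second moment m2.
    return {-1.0: m2 / 2.0, 0.0: 1.0 - m2, 1.0: m2 / 2.0}

def moment(design, order):
    return sum(w * x**order for x, w in design.items())

for ratio in [0.0, 1.0, 10.0, 1e6]:
    m2_star = best_second_moment(ratio)
    print(ratio, m2_star, three_point_design(m2_star))
```

With ratio 0 the search returns $m_2 = 1$ (all mass at the end points), and for very large ratios it approaches the all bias value 1/3. For a three-point design of this form the fourth moment equals the second moment, which is the upper bound noted above.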
A simple way to extend the results of Box and Draper (1959) might be by removing the restriction to least squares estimators. This is the approach adopted by Karson et al. (1969). In the setting described above for Box and Draper (1959), note that the matrix in the quadratic form for B in (2.7), namely $[F'C\,\Gamma_{gg}\,C'F - F'C\,\Gamma_{gf} - \Gamma_{fg}\,C'F + \Gamma_{ff}]$, can be rewritten as

$$\Gamma_{ff} - \Gamma_{fg}\Gamma_{gg}^{-}\Gamma_{gf} + (C'F - \Gamma_{gg}^{-}\Gamma_{gf})'\,\Gamma_{gg}\,(C'F - \Gamma_{gg}^{-}\Gamma_{gf}). \qquad (3.5)$$

Here $\Gamma_{gg}^{-}\Gamma_{gf}$ is well-defined and (3.5) holds even if $\Gamma_{gg}$ is singular; note that

$$\begin{pmatrix} \Gamma_{ff} & \Gamma_{fg} \\ \Gamma_{gf} & \Gamma_{gg} \end{pmatrix} \qquad (3.6)$$

is nonnegative definite. If our design is such that C can be chosen to satisfy

$$C'F = \Gamma_{gg}^{-}\Gamma_{gf} \qquad (3.7)$$

then any such choice minimizes the matrix in (3.5) and hence B. We note by (2.5) that there is a C satisfying (3.7) if and only if $\Gamma_{gg}^{-}\Gamma_{gf}\beta$ is estimable for the design used. For any such design, it follows from what is essentially the Gauss-Markov Theorem that $C'C$ achieves its matrix minimum (in the sense of the usual ordering of nonnegative definite matrices, namely $A \ge B$ if and only if $x'Ax \ge x'Bx$ for all vectors x) among all C satisfying (3.7) with the choice

$$C = F(F'F)^{-}\Gamma_{fg}\Gamma_{gg}^{-} \qquad (3.8)$$

which yields

$$C'C = \Gamma_{gg}^{-}\Gamma_{gf}(F'F)^{-}\Gamma_{fg}\Gamma_{gg}^{-}. \qquad (3.9)$$
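The algebra in (3.5)-(3.9) can be checked numerically in the straight line versus quadratic setting. The following sketch makes the same assumptions as before, none of which come from the text: scalar response, $\nu$ = Lebesgue measure on [-1, 1], $\Gamma_{ab}$ read as $\int a(x) b(x)'\,d\nu(x)$, Moore-Penrose inverses used for the generalized inverses, and a hypothetical five-point design and `beta`.

```python
import numpy as np

f = lambda x: np.array([1.0, x, x**2])   # true model terms
g = lambda x: np.array([1.0, x])         # fitted model terms

# Gauss-Legendre quadrature for Lebesgue measure on [-1, 1].
nodes, weights = np.polynomial.legendre.leggauss(5)

def gamma(a, b):
    A = np.array([a(x) for x in nodes])
    B = np.array([b(x) for x in nodes])
    return A.T @ (weights[:, None] * B)

G_gg, G_gf, G_ff = gamma(g, g), gamma(g, f), gamma(f, f)

design = np.array([-1.0, -0.5, 0.0, 0.5, 1.0])   # hypothetical exact design
F = np.array([f(x) for x in design])
G = np.array([g(x) for x in design])

A = np.linalg.pinv(G_gg) @ G_gf                   # Gamma_gg^- Gamma_gf

# Choice (3.8) and, for comparison, the least squares choice C' = (G'G)^{-1} G'.
C_min = F @ np.linalg.pinv(F.T @ F) @ G_gf.T @ np.linalg.pinv(G_gg)
C_ls = G @ np.linalg.inv(G.T @ G)

def bias_matrix(C):
    # Matrix of the quadratic form for the bias in (2.7).
    CF = C.T @ F
    return CF.T @ G_gg @ CF - CF.T @ G_gf - G_gf.T @ CF + G_ff

floor = G_ff - G_gf.T @ A                         # first two terms of (3.5)

beta = np.array([1.0, 0.5, 2.0])                  # hypothetical true coefficients
bias_min = float(beta @ bias_matrix(C_min) @ beta)
bias_ls = float(beta @ bias_matrix(C_ls) @ beta)
print(bias_min, bias_ls)
```

In this example the estimator from (3.8) attains the lower bound on the bias, while least squares on the same design has a larger bias; whether the total risk is also smaller depends on the variance term, as discussed in the text that follows.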

Thus the prescription of Karson et al. (1969) is to restrict to the class of designs, say D, for which $\Gamma_{gg}^{-}\Gamma_{gf}\beta$ is estimable and select C (our estimator) as in (3.8). For this choice of C, any design in D minimizes B. Now choose the design in D which minimizes V in (2.7), i.e., minimizes $\operatorname{tr} C'C\,\Gamma_{gg}$. While this is generally difficult if one restricts to exact designs, the minimization is not too difficult if one uses the approximate theory mentioned in Section 2. For example, Fedorov (1972) characterizes the solution to this sort of problem in the approximate theory and obtains iterative methods for solving the problem.

We note that the two stage approach of Karson et al. (1969) need not yield a global minimum of R in (2.7). By restricting the class of designs used in the minimization of V at the second stage, they may eliminate designs that produce a much smaller value of V, nearly minimize B, and hence produce a smaller overall value of R. However, the designs (and estimators) of Karson et al. (1969) must yield a value of R no larger than the designs (using the least squares estimators) of Box and Draper (1959) simply because Karson et al. (1969) allow the additional flexibility of arbitrary linear estimators. These issues are further discussed in Kiefer (1973). As mentioned in Section 2, the global minimization of R is quite difficult. Kiefer (1973) suggests minimizing an "averaged" or integrated version of R and obtains some results in this direction.

A somewhat different approach to model robust design for polynomial models is discussed in Atwood (1971). In the notation of Section 2 let X = [-1, 1], let $\sum_{l=1}^{s} \alpha_l g_l(x)$ in (2.2) be a polynomial of degree s - 1, and let $\sum_{j=1}^{m} \beta_j f_j(x)$ in (2.1) be a polynomial of degree m - 1, with m > s. Atwood (1971) restricts attention to estimators of the response at x which are a convex combination of the best linear unbiased estimator of the response at x under model (2.1) and the best linear unbiased estimator of the response at x under model (2.2). Likewise designs are restricted to convex combinations of the known, classical D-optimal designs for models (2.1) and (2.2). Let $\varepsilon = \max_{x \in X} |\sum_{l=1}^{s} \alpha_l g_l(x) - \sum_{j=1}^{m} \beta_j f_j(x)|$. For a given value of $N\varepsilon^2/\sigma^2$, Atwood (1971) seeks the estimator and design (of the forms described above) which minimize the maximum over X of the mean squared error for the estimate of the response at x. Numerical results are tabulated for the cases m = s + 1, 1 ≤ s ≤ 9, and m = s + 2, 1 ≤ s ≤ 8. These results can be used to select "good" model robust designs in the sense just described.

Note that when the true model is assumed to be a quadratic and we plan to fit a straight line, the designs considered by Atwood (1971) are convex combinations of the D-optimal design for fitting a straight line, which puts equal mass on -1 and 1, and the D-optimal design for fitting a quadratic, which puts equal mass at -1, 0, and 1. Such designs therefore put equal mass on -1 and 1, and place the remaining mass at 0. The "optimal" mass will, of course, depend on $N\varepsilon^2/\sigma^2$. While the support of the designs of Atwood is the same as the corresponding optimal design in Box and Draper (1959), the optimal number of observations to take at each support point is different.
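The family of designs described in the last paragraph is easy to write down explicitly. The sketch below only constructs the convex combinations; the mixing weight `t` is a hypothetical parameter, and the value Atwood (1971) would select for a given $N\varepsilon^2/\sigma^2$ is not computed here.

```python
def atwood_mixture(t):
    # Convex combination (1 - t) * line design + t * quadratic design,
    # where both components are the D-optimal designs on [-1, 1].
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must lie in [0, 1]")
    line_design = {-1.0: 0.5, 1.0: 0.5}
    quad_design = {-1.0: 1.0 / 3.0, 0.0: 1.0 / 3.0, 1.0: 1.0 / 3.0}
    points = sorted(set(line_design) | set(quad_design))
    return {x: (1.0 - t) * line_design.get(x, 0.0) + t * quad_design.get(x, 0.0)
            for x in points}

def second_moment(design):
    return sum(w * x**2 for x, w in design.items())

d = atwood_mixture(0.6)
print(d, second_moment(d))
```

Every member of the family has mass t/3 at 0 and second moment 1 - t/3, so its second moment lies between 2/3 and 1.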

3.2. Infinite dimensional sets of true models

In the papers discussed above, the "true" model (2.1) is assumed to lie in a known finite dimensional space of functions (the space spanned by a particular specification of the $f_j$, usually polynomials) so that the risk in (2.7) can be evaluated. One criticism of this approach (see Huber, 1975) is that it safeguards against only a few arbitrarily selected polynomial deviations rather than against all potentially dangerous small deviations from the model. In fact one might wonder why, if the form of the true model is essentially known, we are fitting (2.2) at all. Why not simply fit the true model and use the corresponding optimal design? Beginning with Huber (1975), a number of authors have therefore considered the case in which the true model lies in an infinite dimensional space of functions. In the notation of Section 2, all these authors, except Li (1984), take $X = L$, and model (2.1) is written as

$$E Y_i = \sum_{l=1}^{s} \alpha_l g_l(x_i) + f(x_i). \tag{3.10}$$

Here $f$ is the difference between the true unknown model and the known model we intend to fit, namely $\sum_{l=1}^{s} \alpha_l g_l(x_i)$, and $f$ is assumed to belong to some class of functions $F$. Our assumptions about the form of the true model are expressed by our specification of $F$. Huber (1975), Marcus and Sacks (1978), and Li (1984) consider the univariate case where $X$ is a subset of the real line and the fitted model is a straight line, i.e.,

$$\sum_{l=1}^{s} \alpha_l g_l(x_i) = \alpha_0 + \alpha_1 x_i. \tag{3.11}$$

To use the notation of Section 2, in all these papers $\beta = (\alpha_0, \alpha_1, 1)'$, $g(x) = (1, x)'$, $f(x) = (1, x, f(x))'$, and $C'Y = (\hat\alpha_0, \hat\alpha_1)'$, where $\hat\alpha_0$ and $\hat\alpha_1$ are linear estimators of $\alpha_0$ and $\alpha_1$, respectively. These authors differ in their choice of $X$, $F$, and the loss function to be minimized. Huber (1975) takes $X = [-1/2, 1/2]$, $F = \{f(x);\ \int_{-1/2}^{1/2} f^2(x)\,dx < \epsilon\}$, and restricts attention to the case where $\hat\alpha_0$ and $\hat\alpha_1$ are the least squares estimators of $\alpha_0$

Y-J. Chang and W. I. Notz


and $\alpha_1$, respectively, in the fitted model. The loss function is the supremum of (2.7) over $F$ with $u$ being Lebesgue measure on $X$. In this case the loss can be written as

$$\sup_{f \in F} E \int_{-1/2}^{1/2} \left(\hat\alpha_0 + \hat\alpha_1 x - f(x)\right)^2 dx. \tag{3.12}$$

The object is to find the design minimizing (3.12). Since the set $F$ includes $f$ which have an arbitrarily high, narrow spike above any point in $X$, only designs which are absolutely continuous on $X$ have a finite loss. The design minimizing (3.12) is a continuous design and is given in Section 4 of Huber (1975). It is not clear how it should be implemented in practice, since any discrete implementation would make (3.12) infinite. Huber's approach might lead to implementable designs if $F$ were a smoother class of functions.

Marcus and Sacks (1978) take $X = [-1, 1]$ and $F = \{f(x);\ |f(x)| < \phi(x)\}$, where $\phi(x)$ is a positive bounded even function on $[-1, 1]$ with $\phi(0) = 0$ and $\phi(1) = 1$, and $\hat\alpha_0$ and $\hat\alpha_1$ are allowed to be arbitrary linear estimators. They use the weighted mean square error loss

$$\sup_{f \in F} E\left((\alpha_0 - \hat\alpha_0)^2 + \theta^2 (\alpha_1 - \hat\alpha_1)^2\right). \tag{3.13}$$

This is not of the form of (2.7), although (2.7) reduces to this for $\theta = 1$ if one uses $f(x) = (1, x, 0)'$ and takes $u$ to be Lebesgue measure on $X$. The goal is to find both the design and linear estimators $\hat\alpha_0$ and $\hat\alpha_1$ minimizing (3.13). If $\phi(x) \ge mx$ for some $m$, then the (unique) optimal design is supported on $\{-1, 0, 1\}$. For convex $\phi(x)$ there is a wide range of cases for which the optimal design is supported on $\{-z, z\}$. In general $z$ depends on $\phi$, $\theta$, $\sigma^2$, and $N$ in a complicated way (see Marcus and Sacks, 1978, Theorem 3.2). In the case where $\phi(x) = mx^2$ and $\sigma^2/(Nm^2) \le \theta^4$, it turns out that $z = (\sigma^2/(Nm^2))^{1/4}$ (assuming this yields a value $\le 1$).

Li (1984) uses the loss given in (3.13) but takes $X = \{k/2M, -k/2M;\ k = 1, 2, \ldots, M\}$ for a fixed $M$, and $L = [-1/2, 1/2]$. Designs are restricted to have support in $X$. This choice may be motivated by the fact that in practice the predictor variables can usually only be set to rational values, and by the "naive" approach of spreading observations uniformly over some interval as mentioned in Section 1. Here the interval is $[-1/2, 1/2]$, and designs with support on a finite set of points spread evenly over this interval are investigated. The question is whether the proportion of observations taken at each point in $X$ is the same, as the naive approach would suggest. Li (1984) chooses $F = \{f(x);\ |f(x)| \le \epsilon,\ \int f(x)\,du(x) = 0, \text{ and } \int x f(x)\,du(x) = 0\}$, where $u$ is Lebesgue measure on $X$. The conditions $\int f(x)\,du(x) = 0$ and $\int x f(x)\,du(x) = 0$ are chosen to make $\alpha_0$ and $\alpha_1$ identifiable in the true model. The choice of $F$ is motivated by the fact that the choice in Marcus and Sacks (1978) gives special status to the point 0, in that there is no contamination ($f(x) = 0$) at this point. Thus, under their choice, there is value in taking observations at 0 (or near 0 in the case of convex $\phi$).
Li (1984) restricts $\hat\alpha_0$ and $\hat\alpha_1$ to be the least squares estimators of $\alpha_0$ and $\alpha_1$ for the fitted model and seeks the design minimizing (3.13). The optimal designs are given in Theorems


4.1, 4.2, and 4.3 of Li (1984). They do not take the same proportion of observations at each point in $X$, but rather take observations at the endpoints of $X$ (namely $-1/2$ and $1/2$) as well as at certain interior points of $X$. Asymptotically (see Theorems 6.1 and 6.2 of Li (1984)) these designs put equal mass at $-1/2$ and $1/2$, and spread the remaining mass uniformly over a set of the form $[-1/2, -x] \cup [x, 1/2]$, where $0 \le x < 1/2$ and $x$ depends on $\sigma^2$, $N$, $\epsilon$, and $\theta$.

Li and Notz (1982) consider robust regression in a multiple regression setting, fitting a plane to data. $X$ is a compact subset of $k$-dimensional Euclidean space $R^k$ and (3.11) becomes

$$\sum_{l=1}^{s} \alpha_l g_l(x_i) = \alpha_0 + \alpha_1 x_{i1} + \cdots + \alpha_k x_{ik}. \tag{3.14}$$

Li and Notz (1982) take $F$ as in Li (1984), namely $F = \{f(x);\ |f(x)| \le \epsilon,\ \int f(x)\,du(x) = 0, \text{ and } \int x f(x)\,du(x) = 0\}$, where $u$ is Lebesgue measure on $X$. Li and Notz (1982) restrict designs to have finite support and seek the design and linear estimators $\hat\alpha_i$ of the $\alpha_i$ minimizing

$$\sup_{f \in F} E\left[(\hat\alpha_0 - \alpha_0)^2 + \sum_{i=1}^{k} \theta^2 (\hat\alpha_i - \alpha_i)^2\right]. \tag{3.15}$$

Optimal designs are shown to have support on the extreme points of $X$. When $X$ is the simplex or cube in $R^k$, Li and Notz (1982) show that the classical optimal designs (those putting uniform mass on the corners of the simplex or cube) and the usual least squares estimators are optimal for the loss in (3.15). For the univariate case, the reason is essentially that given in the example corresponding to Figure 1.2 in Section 1. It is not clear whether the results in Li and Notz (1982) are of much practical value. In many practical settings the true model is likely to have some smoothness, so that the $F$ considered by Li and Notz (1982) is too broad. However, these results do indicate the need for some restrictions on the class $F$. They indicate that no design with finite support can protect against arbitrary bias, and hence improve upon the classical optimal design for the fitted model. We note in passing that Li and Notz (1982) also give results for interpolation, extrapolation, and bilinear models.

Pesotchinsky (1982) also considers the setting of multiple regression. The fitted model is (3.14). $F$ is similar to that in Marcus and Sacks (1978), namely $F = \{f(x);\ |f(x)| < \epsilon\phi(x)\}$, where $\epsilon > 0$ and $\phi(x)$ is a convex function of $\|x\|^2$. Pesotchinsky (1982) assumes that the usual least squares estimators for parameters in the fitted model are used. For a design $\xi$ on a compact subset $X$ of $k$-dimensional Euclidean space, he defines the matrix

$$D(\xi, f) = \frac{\sigma^2}{n} M^{-1}(\xi) + M^{-1}(\xi)\,\psi(\xi)\,\psi'(\xi)\,M^{-1}(\xi) \tag{3.16}$$

where $M(\xi)$ is the information matrix (see (2.8)) for the fitted model (3.14) and $\psi'(\xi) = (E_\xi[f(x)], E_\xi[f(x)x_1], \ldots, E_\xi[f(x)x_k])$. $E_\xi$ denotes the expectation


over $X$ with respect to $\xi$. $D(\xi, f)$ serves as the analog of the covariance matrix $(\sigma^2/n)M^{-1}(\xi)$ when the "contamination" $f$ in (3.10) is taken into account. Pesotchinsky (1982) seeks designs which minimize $\sup_{f \in F} \Phi(D(\xi, f))$ for some real-valued function $\Phi$. Choices for $\Phi$ might be the determinant (D-optimality), trace (A-optimality), or maximum eigenvalue (E-optimality). D-, A-, and E-optimal designs are found. These designs put uniform mass on a sphere of a particular radius (which depends on $\phi$, the choice of optimality criterion $\Phi$, and characteristics of $X$), assuming this sphere is contained in $X$. These designs might be regarded as generalizations of the optimal designs of Marcus and Sacks (1978) in the univariate case, which put mass on the two point set $\{-z, z\}$ for an appropriate choice of $z$. One difficulty with the optimal designs of Pesotchinsky (1982) is that they are continuous designs and so cannot be directly implemented. However, it is shown that star-point designs or regular replicas of $2^k$ factorial designs are very efficient under the appropriate choice of levels of the factors.

Sacks and Ylvisaker (1984) take a somewhat different approach to model robust design from those discussed above. They consider the univariate case and two forms for $F$, namely

$$F = F_1(M) = \{f;\ |f(x) - f(y)| \le M|x - y| \text{ for every } x, y \in R^1\} \tag{3.17}$$

and

$$F = F_2(M) = \{f;\ f \text{ is differentiable and } df/dx \in F_1(M)\}. \tag{3.18}$$

Note that $F_1(M)$ contains all constant functions and $F_2(M)$ contains all linear functions. Rather than seeking to globally fit some simple approximation $\sum_{l=1}^{s} \alpha_l g_l(x_i)$ to the true model, it is assumed that one wishes only to estimate some linear functional $A$ of the true regression function $f \in F$. For the case where $X = [-1, 1]$, examples of linear functionals that might be of interest are $Af = f(0)$ (corresponding to the intercept if $f$ were a straight line), $Af = [f(1) - f(-1)]/2$ (corresponding to the slope if $f$ were a straight line), $Af = df(0)/dx$ (again corresponding to the slope if $f$ were a straight line), or $Af = \int \gamma(x) f(x)\,dx$ for some known function $\gamma(x)$. They restrict to linear estimators of $Af$ which, for a design $\xi$ supported at the $k$ points $\{x_1, x_2, \ldots, x_k\} \subset X$, have the form $\sum_{i=1}^{k} c_i \bar{Y}(x_i)$, where $\bar{Y}(x)$ is the average of all the observations taken at $x$. They seek designs $\xi$ (specification of the $x_i$ and the number of observations $n_i$ to be taken at each $x_i$, subject to the total number of observations being $N$) and linear estimators (choice of the $c_i$) which minimize

$$\sup_{f \in F} E_f\left(\sum_{i=1}^{k} c_i \bar{Y}(x_i) - Af\right)^2 = \sigma^2 \sum_{i=1}^{k} \frac{c_i^2}{n_i} + \sup_{f \in F} (Cf - Af)^2 \tag{3.19}$$


where $C$ is the linear functional defined by

$$Cf = E_f\left(\sum_{i=1}^{k} c_i \bar{Y}(x_i)\right) = \sum_{i=1}^{k} c_i f(x_i) = \int_{\{x_1, \ldots, x_k\}} f\,dC \tag{3.20}$$

and $C$ is identified with the measure it induces. It is straightforward to verify that the right hand side of (3.19) is minimized when $n_i/n_k = |c_i|/|c_k|$ for $i = 1, \ldots, k$. For this choice (3.19) becomes

$$\frac{\sigma^2}{N}\left(\sum_{i=1}^{k} |c_i|\right)^2 + \sup_{f \in F} (Cf - Af)^2. \tag{3.21}$$
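The allocation rule behind (3.21) is easy to check numerically. The sketch below is illustrative only: the coefficients `c` and the sample size `N` are hypothetical values, and the helper names are not from Sacks and Ylvisaker (1984). It compares the variance term $\sum_i c_i^2/n_i$ under the proportional allocation $n_i \propto |c_i|$ with a brute-force search over all integer allocations, and with the closed form $(\sum_i |c_i|)^2/N$.

```python
import itertools


def variance_term(c, n):
    """sum_i c_i^2 / n_i: the variance part of (3.19) divided by sigma^2."""
    return sum(ci * ci / ni for ci, ni in zip(c, n))


def proportional_allocation(c, N):
    """Allocation n_i = N |c_i| / sum_j |c_j| (treated as continuous)."""
    total = sum(abs(ci) for ci in c)
    return [N * abs(ci) / total for ci in c]


c = [0.5, -1.0, 1.5]  # hypothetical estimator coefficients
N = 12                # hypothetical total number of observations

n_opt = proportional_allocation(c, N)
closed_form = sum(abs(ci) for ci in c) ** 2 / N

# Brute force over every allocation of N observations to the 3 points
# with at least one observation per point.
best = min(
    variance_term(c, n)
    for n in itertools.product(range(1, N), repeat=len(c))
    if sum(n) == N
)

print(n_opt, variance_term(c, n_opt), closed_form, best)
```

In this example the proportional allocation happens to be integer valued, so the brute-force minimum coincides with the closed form; in general the $n_i$ must be rounded and the closed form is a lower bound.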

Sacks and Ylvisaker (1984) obtain results for a variety of types of linear functionals for both $F_1(M)$ and $F_2(M)$. Optimal solutions depend on $A$ and on which $F_i(M)$ one considers. For example, in $F_1(M)$ and with $Af = \sum_{j=1}^{Q} f(z_j)/Q$ for given values of $Q$ and $\{z_1, \ldots, z_Q\} \subset X$, the optimal design takes $N/Q$ observations at each $z_j$ and estimates $Af$ by $(1/Q)\sum_{j=1}^{Q} \bar{Y}(z_j)$. The interested reader should consult Sacks and Ylvisaker (1984) for additional results and examples.

Not surprisingly, optimal designs depend very much on the specification of $F$ and the loss function. Application of the above results therefore requires one to think carefully about the assumptions one is willing to make about the form of the true model and the purpose of the experiment (as expressed by the loss function). We see, for example, that if the fitted model is only a rough approximation to the true model, the standard optimal design for the fitted model may yield very misleading information. At the same time, the naive design that takes a moderate to large number of observations spread uniformly over the design region is also sub-optimal. The practical value of these results (and the results that will appear in later sections) therefore is probably in alerting us to the dangers of ignoring the approximate nature of any assumed model and in providing some insight concerning what features a design should have in order to be robust against departures from an assumed model while allowing good fit of the assumed model. This insight may be more valuable in practical settings than a slavish adoption of any particular mathematical model.

3.3. Randomization and robustness

The above discussion indicates how careful selection of a design (and in some instances the estimator) can provide protection against departures from a fitted model. One issue that we have not yet addressed is the role of randomization in robustness. A justification often given for experimental randomization is that it is a source of robustness against model inadequacies. Wu (1981) attempts to give a rigorous basis for this justification in the context of comparative experiments. His approach is quite different from that formulated in Section 2. In the spirit of Chapter 9 in Scheffé (1958), he associates with each unit two components. The first is called the unit error,


and is an unknown constant associated with some feature of the unit, for example initial weight or income. The second is called the technical error and corresponds to other sources of error associated with the response. The technical errors are random variables with mean 0, and those corresponding to different responses are assumed to be uncorrelated with common variance. It is assumed that no interaction between the unit and technical errors exists. Let $e_u$ denote the unit error associated with unit $u$. The model for the response $Y_{ut}$ to treatment $t$ by unit $u$ is

$$Y_{ut} = \alpha_t + e_u + \epsilon_{ut}, \qquad t = 1, 2, \ldots, v, \quad u = 1, 2, \ldots, N, \tag{3.22}$$

where $\alpha_t$ is the effect of treatment $t$ and $\epsilon_{ut}$ the technical error. Denote the set of possible values of $e$ by $E$. $E$ is called the neighborhood of model violations. Wu (1981) assumes $E$ is bounded, contains unit errors of the form $e_u = e$ for all $u$, and is invariant under some transformation group $T$, i.e.,

$$e \in E \Rightarrow \tau e \in E \quad \text{for all } \tau \in T, \tag{3.23}$$

where $\tau e = \{e_{\tau^{-1}u}\}_u$ and $\tau^{-1}$ is the inverse of $\tau$ in the group $T$. The invariance assumption reflects the vagueness of the experimenter's knowledge about $e_u$. It is assumed that the experimenter is interested in estimating the pairwise treatment contrasts $\alpha_s - \alpha_t$, which will be estimated by the usual least squares estimators, denoted $\hat\alpha_s - \hat\alpha_t$. Wu (1981) defines $I_t = \{\text{units } u:\ u \text{ is assigned to treatment } t\}$ and calls $I = \{I_t\}_{t=1}^{v}$ a pattern. $I$ corresponds to a nonrandomized design with treatment group sizes $n_t = |I_t|$ for $t = 1, 2, \ldots, v$ and $\sum_{t=1}^{v} n_t = N$. Let $\mathcal{I}$ denote the collection of all such $I$'s. A randomized design is defined to be a probability measure $\eta$ over $\mathcal{I}$, i.e., $\{\eta(I);\ I \in \mathcal{I}\}$, with $\eta(I) \ge 0$ and $\sum_{I \in \mathcal{I}} \eta(I) = 1$. For any nonrandomized $I$ with treatment group sizes $n_t = |I_t|$, let $a(I, e)$ denote the expected mean squared error, i.e., $a(I, e) =$

E(a, -

-

+

sl.

(4.13)

Note that when 032 ~ 1 the Karson Manson and Hader approach yields the same B (namely /32) and smaller V than the Box and Draper approach. However when ~02 > 1 the Box and Draper approach will yield a smaller value of 0.2(V + / 3 ) than the Karson, Manson and Hader approach when N'y 2 2002 q- 2-v/co2(w2 + 1) o.---T - < w2- 1

(4.14)

Next we consider extrapolation to L = the k-dimensional ball of radius /~ centered at the origin for g quadratic and h cubic. We consider only the univariate case


with u uniform probability measure on [-R, R]. We again may restrict to symmetric designs 4. Thus all odd moments of u and ( are 0. If we follow the Box and Draper approach, we find that

°'27-2B1

=

(4.15)

3

so B1 = 0 if and only if R 2 5/3, the optimum design can be shown to be of the form 4(±1) = a / 2 , ~(0) = 1 - a, where -1

a=

1+

5

3 R2 -3-+-5R4

(4.18)

The Karson, Manson and Hader approach is quite complicated here. Kiefer (1980) discusses this approach briefly and indicates that as $R \to \infty$ this approach is preferable to Box and Draper's when (approximately) $N\gamma^2/\sigma^2 < 8$. Kiefer (1980) also discusses both approaches when $L = \{-R, R\}$, in which case some simplification occurs. The interested reader should consult this paper for details.


Huber (1975) also considers extrapolation to a point in the univariate ($k = 1$) setting. In our notation, he takes $X = [0, \infty)$, $L = \{x_0\}$ ($x_0 < 0$), $u$ the probability measure putting all its mass at $x_0$, and model (2.1) becomes

$$E(Y_i) = f(x_i) \tag{4.19}$$

where f is assumed to belong to the class of functions with bounded (h + 1)th derivative, namely the class

If(h+l)(x)l ~< c,

Fo = { f ;

xo 3 for some positive constant ~}. It can be shown that As

~" A I ( T - I A ) ,

where AI ( T - 1 A ) is the smallest eigenvalue of T - 1 A . Since 3 is a constant, Al-optimality is equivalent to maximizing A 1 ( T - 1 A ) . A design is A2-optimal if it maximizes !

A2 = leo/32A/32 dB feo dB where dB is the differential of the area on the surface of the ellipsoid ~0 = {/32,/3~A/32 = 3 for some positive constant 6}. They have shown that A2 = 7T~--8 1 3 t r ( T - l A ) . Since T i ~1- - 8 and 3 are constants, A2-opti_mality is equivalent to maximizing tr(T -1A). In Jones and Mitchell (1978), some characterizations of Al-optimality and A2-optimality are presented and applied for constructing optimal designs for first-order versus second-order polynomial models. When the true model is unknown, a common approach in experimental design is to consider a simple model which is thought to provide an adequate approximation to the more complicated true model. The work discussed so far in this section emphasizes the detection of model inadequacy. However, it is also important that the design protect against selecting an oversimplified model. DeFeo and Myers (1992) developed a new criterion for model robust design that considers the two conflicting goals: protecting against the use of an oversimplified model and detecting lack of fit. Instead of maximizing the power of detecting model adequacy, the criterion simultaneously uses the integrated model bias and power. They also propose a class of experimental designs that appear to perform well under this criterion. These designs are rotations of 2 k factorial designs or central composite designs through a small angle. We have been assuming that the true model departs from the assumed model. If the true model makes too strong an assumption about the form of the departures from the assumed model, the design may only detect these specific departures, and may not


detect departures of other kinds. Atkinson and Fedorov (1975a) developed designs for discriminating between two possible true models (assuming one of the models is true) which need not be linear in the parameters. No particular assumed model is to be fit a priori. Rather, one tries to determine which of several models to fit. Atkinson and Fedorov (1975b) extended these results to designs for discriminating between several models. The criterion they used is called T-optimality and maximizes the sum of squares for the lack of fit of the incorrect models. The design which maximizes the criterion depends on which of the models is true and on the values of the unknown parameters. Atkinson and Fedorov (1975a, b) suggested three approaches for solving this problem: a sequential approach, a Bayesian approach, and a maximin approach. In the special case of discriminating between models (5.1) and (5.2), when (5.2) is true, the T-optimality criterion reduces to the maximization of the noncentrality parameter $\beta_2' A \beta_2$. In this case, the sequential approach uses available information on $\beta_2$ at each stage to select the next run; the Bayesian approach uses a prior distribution on $\beta_2$; the maximin approach maximizes the minimum of $\beta_2' A \beta_2$ over a specific region in the $\beta_2$ space.

6. Bayesian robust design

Robustness in a Bayes context usually refers to insensitivity to specification of the prior. This usually takes the form of requiring a design to be optimal (or nearly optimal) for some optimality criterion with respect to a particular prior, subject to the condition that the design perform reasonably well over a class of possible priors. One of the earliest and most general approaches to robust Bayes design is the paper by O'Hagan (1978). We now give some details of the results. In this paper a localized regression model is used to reflect the fact that any particular regression model is only an approximation to the true model over a small portion of the design space. The localized regression model allows the regression parameters to depend on $x$, the point at which an observation is taken, and the regression parameters are assumed to vary "slowly" with $x$. In the setting of regression with a single independent variable $x \in (-\infty, +\infty)$, response $Y$, $m \times 1$ vector of regression functions $f(x)$, and $m \times 1$ vector of regression parameters $\beta(x)$, a simple localized regression model would be the following. The distribution of $Y$ given $x$ and $\beta(x)$ is assumed normal with

$$E(Y \mid x, \beta(x)) = f(x)'\beta(x), \tag{6.1}$$

$$\mathrm{Var}(Y \mid x, \beta(x)) = \sigma^2. \tag{6.2}$$

The prior ought to reflect our belief about the local stability of the regression model. One simple possibility is to assume that our information about $\beta(x)$ is the same for all values of $x$, i.e., the prior mean vector is

$$E(\beta(x) \mid b_0) = b_0. \tag{6.3}$$


If we further assume that the correlation between $\beta(x)$ and $\beta(x^*)$ depends only on $|x - x^*|$, we might model $\beta(x)$ as a second-order stationary process with

$$E\left((\beta(x) - b_0)(\beta(x^*) - b_0)' \mid b_0\right) = \rho(|x - x^*|)\,B_0 \tag{6.4}$$

where $\rho(d)$ is a monotonic decreasing function of $d$ for $0 \le d < \infty$, and $\rho(0) = 1$. We might finally assume the $\beta(x)$ are jointly normal. The above model, including the prior, is the localized regression model. Designs and estimators are chosen to minimize the posterior expectation of the squared error loss for prediction. This posterior expectation is a complicated function of the independent variable, so minimization in a particular problem must be done numerically.

Suppose we observe $N$ values of the dependent variable, $y_1, \ldots, y_N$, at corresponding $x$ values $x_1, \ldots, x_N$. The posterior distribution of the $\beta(x)$ is such that they are jointly normal with means

$$b_1(x) = E(\beta(x) \mid y_1, \ldots, y_N, b_0) = S(x)'A^{-1}y + Q(x)'b_0 \tag{6.5}$$

and covariances

$$B_1(x, x^*) = E\left[(\beta(x) - b_1(x))(\beta(x^*) - b_1(x^*))' \mid y_1, \ldots, y_N, b_0\right] = \rho(|x - x^*|)\,B_0 - S(x)'A^{-1}S(x^*) \tag{6.6}$$

where

$$Q(x) = I_m - F'A^{-1}S(x), \quad S(x) = \begin{pmatrix} \rho(|x - x_1|)\,f(x_1)'B_0 \\ \vdots \\ \rho(|x - x_N|)\,f(x_N)'B_0 \end{pmatrix}, \quad y = \begin{pmatrix} y_1 \\ \vdots \\ y_N \end{pmatrix}, \quad F = \begin{pmatrix} f(x_1)' \\ \vdots \\ f(x_N)' \end{pmatrix}, \quad A = \sigma^2 I_N + C \tag{6.7}$$

and $C$ is the $N \times N$ matrix whose $(i, j)$th element is

$$c_{ij} = \rho(|x_i - x_j|)\,f(x_i)'B_0 f(x_j). \tag{6.8}$$

Inference about a single future value of $y$ at $x$ is made from its posterior predictive distribution $N(f(x)'b_1(x),\ \sigma^2 + f(x)'B_1(x, x)f(x))$. In particular, an obvious point estimator is the mean $f(x)'b_1(x)$.
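The posterior quantities in (6.5) through (6.8) translate directly into matrix code. The following is a minimal numerical sketch under stated assumptions, not O'Hagan's implementation: the correlation function `rho` (a squared-exponential form with a hypothetical `scale` parameter), the locally linear choice $f(x) = (1, x)'$, and all function names are chosen here for illustration only.

```python
import numpy as np


def rho(d, scale=1.0):
    """Assumed correlation function: decreasing in d, with rho(0) = 1."""
    return np.exp(-0.5 * (d / scale) ** 2)


def f(x):
    """Locally linear regression functions f(x) = (1, x)'."""
    return np.array([1.0, x])


def posterior(x, xs, y, b0, B0, sigma2, scale=1.0):
    """Return b1(x) from (6.5) and the predictive mean and variance at x."""
    xs = np.asarray(xs, dtype=float)
    y = np.asarray(y, dtype=float)
    F = np.vstack([f(xi) for xi in xs])                    # N x m, rows f(x_i)'
    # C from (6.8) and A = sigma^2 I_N + C from (6.7)
    R = rho(np.abs(xs[:, None] - xs[None, :]), scale)
    C = R * (F @ B0 @ F.T)
    A = sigma2 * np.eye(len(xs)) + C
    # S(x): rows rho(|x - x_i|) f(x_i)' B0
    S = rho(np.abs(x - xs), scale)[:, None] * (F @ B0)     # N x m
    Q = np.eye(len(b0)) - F.T @ np.linalg.solve(A, S)      # m x m
    b1 = S.T @ np.linalg.solve(A, y) + Q.T @ b0            # (6.5)
    B1 = B0 - S.T @ np.linalg.solve(A, S)                  # (6.6) with x* = x
    mean = f(x) @ b1
    var = sigma2 + f(x) @ B1 @ f(x)
    return b1, mean, var


# Hypothetical data: three observations and a vague-ish prior.
xs = [-1.0, 0.0, 1.0]
y = [-0.8, 1.1, 2.9]
b0 = np.array([1.0, 2.0])
B0 = np.eye(2)
b1, mean, var = posterior(0.3, xs, y, b0, B0, sigma2=0.25)
print(b1, mean, var)
```

A useful sanity check on the formulas is that when the data coincide exactly with the prior mean line, $y = F b_0$, the posterior mean $b_1(x)$ equals $b_0$ for every $x$.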


If $f(x)'b_1(x)$ is very complicated, one may prefer to fit a simpler model (assumed to be a good approximation to $f(x)'b_1(x)$ over the region of interest). Suppose we approximate $f(x)'b_1(x)$ by a simple regression model of the form

$$\hat{y}(x) = g(x)'h \tag{6.9}$$

for some $s \times 1$ vector $h$ and for a given $s \times 1$ vector function $g(x)$. The best value of $h$ for the approximation will be chosen with reference to a well-defined loss function. If we consider predicting the unknown future value $y(x)$ of the dependent variable at $x$ by $\hat{y}(x)$, the loss sustained in this prediction is

$$L_1(x) = [y(x) - \hat{y}(x)]^2. \tag{6.10}$$

Since $y(x)$ is unknown at the time of prediction, the relevant loss when we use predictor $\hat{y}(x)$ will be the posterior expectation of $L_1(x)$, say $L_2(x)$. Let the measure function $\Omega(x)$ denote the relative importance to us of predictions at the various values of $x$. Thus when choosing the value of $h$ in (6.9) our expected loss is

$$\int_{-\infty}^{\infty} L_2(x)\,d\Omega(x). \tag{6.11}$$

The value of $h$ minimizing (6.11) is

$$h = W^{-1} \int_{-\infty}^{\infty} g(x) f(x)'b_1(x)\,d\Omega(x) \tag{6.12}$$

where

$$W = \int_{-\infty}^{\infty} g(x) g(x)'\,d\Omega(x) \tag{6.13}$$

assuming the integrals exist. Substituting for $b_1(x)$ from (6.5) gives

$$h = T'A^{-1}y + R'b_0 \tag{6.14}$$

where

$$T' = W^{-1} \int_{-\infty}^{\infty} g(x) f(x)'S(x)'\,d\Omega(x), \qquad R' = W^{-1} \int_{-\infty}^{\infty} g(x) f(x)'Q(x)'\,d\Omega(x). \tag{6.15}$$


For purposes of design, suppose we can choose the values of $x_1, \ldots, x_N$ at which we take observations. For predictive curve fitting as above, we would like to find the design that gives the lowest expected loss. This is the design which maximizes

$$U = \mathrm{tr}(W T'A^{-1}T). \tag{6.16}$$

This is an extremely complicated function of if:l,..., ZN and so generally must be maximized numerically. As an example, suppose a straight line is to be fitted a posterior and prediction is to be done at x = 0. Suppose also p(d) = exp ( - 1 d2/0-2),

(6.17)

dY2(z) = (27r~r~)-l/2 exp ( _ 21z2 ./o_o )2"~dx

(6.18)

and assume a locally linear model, i.e., f ( x ) ' = (1, x). For

(6.19)

O'Hagan (1978) gives the following list of optimal designs for small N.

N    Optimal design
2    -2.20, 2.20
3    -3.02, 0, 3.02
4    -3.44, -0.60, 0.60, 3.44
5    -3.64, -1.39, 0, 1.39, 3.64
6    -3.84, -1.33, -1.33, 1.33, 1.33, 3.84

The optimal designs take observations at a variety of points centered at the point 0 at which we wish to do prediction and without the tendency to take all observations at the boundary of the design region (as classical D-optimal designs do, thus requiring the design region to be compact). This behavior of the optimal design mimics the "naive" approach of spreading observations over the region of interest in order to observe curvature in the model. O'Hagan (1978) generalizes the above discussion to the multivariate setting. The interested reader should see Section 3 of O'Hagan (1978) for details. As is the case for (6.16), analytic solutions appear to be difficult to obtain, even more so than in the univariate setting discussed above. Solutions to specific problems will generally need to be found numerically. While O'Hagan (1978) provides a very general development, a number of other authors also discuss robust Bayesian design. In the interest of space, we provide only a summary of these papers. The interested reader should consult the papers themselves


for details. The papers represented below are not exhaustive but are representative of other approaches. Many authors consider the problem of robustness to the specification of the prior. In the context of regression and the approximate theory of design, DasGupta and Studden (1991) constructed a framework for robust Bayesian experimental design for linear models. They found designs that minimize a measure of robustness (related to the Bayes risk) over a class of prior distributions and, at the same time, are close to being optimal for some specific prior. A variety of measures of robustness are considered, including a minimax approach, and two classes of priors are investigated. Seo and Larntz (1992) suggested criteria for nonlinear design that make the design robust to specification of the prior distribution. In particular, they suggested seeking designs which are optimal with respect to a given prior subject to the constraint that the design attain a certain efficiency over a class of closely related prior distributions. DasGupta et al. (1992) gave a detailed approach to design in a linear model when the variance of the response is proportional to an exponential or power function of the mean. They considered the case where the experimenter wants to find a design which is highly efficient for several criteria simultaneously, and gave examples of such "compromise designs". For normal one way analysis of variance models, Toman (1992a, b) considered robustness to a class of normal prior distributions where the variances take values in specified intervals. Optimality criteria involve maximizing the average, with respect to a distribution on the prior precision parameters, over the class of posterior distributions, of either the determinant or the trace of the posterior precision matrix. Toman and Gastwirth (1993) investigated both robust design and estimation for analysis of variance models when the class of priors is a class of finite mixtures of normals.
Squared error loss is used and the posterior risk averaged over the class of corresponding posterior distributions. Toman and Gastwirth (1994) suggested specifying the prior distribution of the treatment means in a one way analysis of variance model from a pilot study. They assumed the error variances of the pilot and of the follow up experiments to be unknown, but that intervals in which they can vary can be specified. Again, squared error loss is used, and designs and estimators are chosen to minimize a minimax criterion over the class of posterior distributions.

In addition to robustness to specification of the prior, there is also work on robustness to specification of the linear model in a Bayesian setting. One approach is to consider mixtures of linear models using a criterion that is a weighted average of optimality criteria for a variety of candidate models. The weights would correspond to the prior probability that a candidate model was the "true" model. Läuter (1974, 1976) considered such an approach using a criterion of the form

$$\phi(\xi) = \sum_{i=1}^{m} w_i \phi_i(\xi)$$

where $\phi_i(\xi)$ is the D-optimality criterion under the $i$th of $m$ candidate models. The weights $w_i$ for each model are the prior probabilities for that model. Cook and Nachtsheim (1982) applied such a criterion, based on A-efficiency, to the problem of finding


designs for polynomial regression when the degree of the polynomial is unknown. In particular, if $\xi_i$ is the A-optimal design for the $i$th model, $A_i$ corresponds to the average variance of prediction over the design region, and $M_i$ is the matrix of moments for the $i$th model, the criterion used was to maximize

$$\phi(\xi) = -\sum_{i=1}^{m} w_i \frac{\mathrm{tr}(A_i M_i(\xi)^{-1})}{\mathrm{tr}(A_i M_i(\xi_i)^{-1})}.$$

A more fully Bayes approach might instead maximize

$$\phi(\xi) = -\sum_{i=1}^{m} w_i\,\mathrm{tr}(A_i M_i(\xi)^{-1}).$$
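The weighted trace criterion is straightforward to evaluate for a discrete design. The sketch below is an illustration under stated assumptions rather than the procedure of any of the cited papers: it takes the candidate models to be polynomials of given degrees on $[-1, 1]$, and it assumes that $A_i$ is the moment matrix of the uniform distribution on $[-1, 1]$, so that $\mathrm{tr}(A_i M_i(\xi)^{-1})$ is proportional to the average prediction variance. All function names are hypothetical.

```python
import numpy as np


def poly_basis(x, degree):
    """Regression functions (1, x, ..., x^degree)."""
    return np.array([x ** p for p in range(degree + 1)], dtype=float)


def moment_matrix(points, weights, degree):
    """M_i(xi) = sum_j w_j g(x_j) g(x_j)' for a polynomial model."""
    G = np.vstack([poly_basis(x, degree) for x in points])
    return G.T @ (np.asarray(weights, dtype=float)[:, None] * G)


def uniform_moments(degree):
    """Assumed A_i: moment matrix of the uniform distribution on [-1, 1]."""
    A = np.zeros((degree + 1, degree + 1))
    for p in range(degree + 1):
        for q in range(degree + 1):
            k = p + q
            A[p, q] = 0.0 if k % 2 else 1.0 / (k + 1)
    return A


def weighted_criterion(points, weights, degrees, priors):
    """phi(xi) = - sum_i w_i tr(A_i M_i(xi)^{-1}); larger is better."""
    total = 0.0
    for degree, w in zip(degrees, priors):
        M = moment_matrix(points, weights, degree)
        total += w * np.trace(uniform_moments(degree) @ np.linalg.inv(M))
    return -total


# Compare two three-point designs when a line and a quadratic are
# judged equally likely a priori (hypothetical prior weights).
points = [-1.0, 0.0, 1.0]
phi_equal = weighted_criterion(points, [1/3, 1/3, 1/3], [1, 2], [0.5, 0.5])
phi_center = weighted_criterion(points, [0.25, 0.5, 0.25], [1, 2], [0.5, 0.5])
print(phi_equal, phi_center)
```

Every candidate model must be estimable under the design, so a design with fewer support points than the largest model has parameters makes some $M_i(\xi)$ singular and cannot be evaluated this way.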

A summary of the mathematics of such criteria and how the general equivalence theorem can be applied can be found in Pukelsheim (1993, pp. 286-296). Dette (1990) gives some general results for D-optimality and polynomial regression. Dette (1991, 1993a, b) used mixtures of Bayesian linear model criteria involving the prior precision matrix and derived a version of Elfving's (1952) theorem for this case. Dette and Studden (1994) provided further results in this direction, characterizing the optimal design in terms of its canonical moments. Another approach is to be found in DuMouchel and Jones (1994). They introduced a modified Bayesian D-optimal approach for factorial models. They used a prior distribution with a structure that recognizes "primary" and "potential" terms in order to recognize uncertainty in the model. The resulting Bayesian D-optimal designs are resolution IV designs, and so this approach provides justification for the use of resolution IV designs over other designs with the same value of the D-criterion. Steinberg (1985) considered using a two-level factorial experiment to investigate a response surface and used a Bayesian formulation to represent the uncertainty in the adequacy of the proposed model. He derived a method for choosing the high and low levels for each factor of the two factor experiment, conditional on the particular fractional factorial design used. This allows one to quantify the trade-off between choosing design points on the boundary of the design region where information is maximized versus the fact that the model holds to better approximation near the center of the design region.

7. Applications to computer experiments

7.1. Introduction

Computer modeling of complex physical phenomena has become increasingly popular as a method for studying such processes. This is particularly true when actual experimentation on such processes is very time consuming, expensive, or impossible. Examples include weather modeling, integrated circuit design, plant ecology, and the study of controlled nuclear fusion devices. Such computer models (or codes) usually


have high dimensional inputs. These inputs may be scalars or functions. The output from such models may also be multi-dimensional. For example, as might be the case in weather modeling, the output may be a time dependent function from which a few summary responses are selected. A computer experiment involves running the computer code at a variety of input configurations in order to make inferences about characteristics (parameters) of the computer model on the basis of the resulting output. For example, one may wish to determine the inputs that optimize some function of the outputs. Design involves the selection of the input configurations so as to yield efficient inference about these characteristics. It is assumed that the computer code adequately models the physical process that is of ultimate interest, so that inferences made about characteristics of the computer model yield reliable information about the corresponding characteristics of the physical process. Whether this is a reasonable assumption is a question that might well be addressed by statistics. However, in the design of computer experiments this issue is generally ignored. Attention is restricted to inference concerning the computer model itself. There are several features of this problem that make it unusual. First, the output from computer code is deterministic. Hence it is not immediately obvious how statistical models are relevant. Second, it is not possible to give an explicit functional relation between the inputs and outputs. This is due to the complexity of the physical phenomenon being modeled. Third, we assume the computer model is sufficiently complex that it is very time consuming (and expensive) to obtain a single run of the code. Thus output can be obtained for only a relatively small number of input configurations. Extensive grid searches are ruled out.
In order to make the problem tractable, the literature on computer experiments usually assumes that the inputs are scalar and their number is relatively small. It is also assumed that the response (output of interest) is a single scalar. In order to make inferences about characteristics of the computer model, one approach is to fit a relatively simple statistical model to the output. We hope this model adequately approximates the true output. Inferences about characteristics of the computer model are then made by making inferences on the corresponding characteristics of the fitted statistical model. Statistical models are thus used to approximate the output from the computer code for purposes of inference. The connection to model robust design should be clear. The true model is unknown. A relatively simple model is used to approximate the true model. Design involves where best to take observations so that the fitted model will be an adequate approximation to the true model for the purpose of making some inference about the true model. However, a feature of computer experiments that differs from the models discussed so far is that output is deterministic. One consequence of this fact is that no sensible design will require more than one observation at any input configuration. Another very important consequence is that there is no random error in the classical sense. The difference between the fitted and true model is solely deterministic, hence all error is due to bias. This would seem to suggest the use of an all bias criterion such as advocated by Box and Draper (1959), but application of this approach requires we make some assumption about the form of the true model. It is precisely such knowledge that we are lacking in computer experiments. The lack of random error in the

Model robust designs


classical sense also makes it difficult to justify on classical statistical grounds most methods for fitting statistical models as well as most methods for selecting a design. The only uncertainty in computer experiments arises from the fact that the computer code is a sort of "black box" and we lack knowledge as to the precise relation between the inputs and output. We might take a Bayesian approach and quantify this uncertainty or lack of knowledge by means of probability. In this case the random component of any statistical model that we fit to the output represents our uncertainty concerning the adequacy of this statistical model. If the output of the computer code is a smooth function of the inputs, we also note that the residuals (differences between the actual output and that predicted by any smooth fitted model) corresponding to inputs which are "close" will appear to be correlated. The closer the inputs, the more strongly correlated these residuals will appear to be. It would seem reasonable to build this property into statistical models used to approximate the actual computer model. Thus one approach, which has become popular, is to fit a regression model to the output and model the residuals as though they were the realization of a stochastic process with covariance which is a function of some measure of the distance between two input configurations.

7.2. Modeling and estimation

The above issues are all addressed in Sacks et al. (1989), which gives an excellent overview of the literature on computer experiments. We follow these authors in describing a method of fitting a regression model whose errors form a stochastic process with covariance that is a function of some measure of the distance between input configurations. Issues of design can only be discussed in the context of a model and method of inference. For simplicity we restrict attention to the case of a scalar response. We use notation as in Section 2 with X = L. Let x denote a particular input configuration, X the set of possible input configurations, and y(x) the actual deterministic response at x, which is viewed as a realization of some random function (stochastic process) Y(x). We assume the following model for Y(x):

$$Y(x) = \sum_{j=1}^{k} \beta_j f_j(x) + Z(x). \qquad (7.1)$$

$Z(\cdot)$ is a random process which is assumed to have mean 0 and covariance

$$v(w, x) = \sigma^2 R(w, x) \qquad (7.2)$$

between Z(w) and Z(x), where $\sigma^2$ is the process variance and R(w, x) is the correlation. As previously mentioned, justification for (7.1) might be that the difference between the actual output of the computer code and a simple regression model, while deterministic, resembles a sample path of a suitably chosen stochastic process. Alternatively, one might regard Y(x) as a Bayesian prior on the actual output, with the $\beta$'s either specified a priori or given a prior distribution.


While analysis of (7.1) might proceed along a variety of lines, if the objective is prediction of the response at untried inputs, a kriging approach has become popular. This is the approach suggested by Sacks et al. (1989) for such an objective. Given observations at input configurations, or sites, $x_1, x_2, \ldots, x_N$ in X and output $y_d = (y(x_1), y(x_2), \ldots, y(x_N))'$, consider linear predictors of y(x) at an as yet unobserved site x of the form

$$\hat{y}(x) = c(x)' y_d. \qquad (7.3)$$

If we replace $y_d$ in (7.3) by $Y_d = (Y(x_1), Y(x_2), \ldots, Y(x_N))'$, then $\hat{y}(x)$ is random. The best linear unbiased predictor (BLUP) is obtained from the value of c(x) which minimizes the mean squared error (averaged over the random process)

$$\mathrm{MSE}[\hat{y}(x)] = E[c(x)' Y_d - Y(x)]^2 \qquad (7.4)$$

subject to the unbiasedness constraint

$$E[c(x)' Y_d] = E[Y(x)]. \qquad (7.5)$$

Note that a Bayesian approach would predict y(x) by the posterior mean $E[Y(x) \mid y_d]$. If $Z(\cdot)$ is Gaussian and improper uniform priors on the $\beta$'s are used, then it is well known that the BLUP in this case is the limit of the Bayes predictor as the prior variances on the $\beta$'s tend to infinity. To calculate the BLUP for model (7.1), let f(x) and F be as defined above (2.3) in Section 2. Let R be the $N \times N$ matrix whose (i, j)th entry is $R(x_i, x_j)$ and let $r(x) = [R(x_1, x), R(x_2, x), \ldots, R(x_N, x)]'$. The MSE in (7.4) is then

$$\sigma^2 [1 + c(x)' R c(x) - 2 c(x)' r(x)] \qquad (7.6)$$

and the unbiasedness constraint in (7.5) becomes $F' c(x) = f(x)$. Minimizing (7.6) subject to this constraint using the method of Lagrange multipliers $\lambda(x)$, we find that c(x) for the BLUP must satisfy

$$\begin{pmatrix} 0 & F' \\ F & R \end{pmatrix} \begin{pmatrix} \lambda(x) \\ c(x) \end{pmatrix} = \begin{pmatrix} f(x) \\ r(x) \end{pmatrix}, \qquad (7.7)$$

which yields the BLUP

$$\hat{y}(x) = f(x)' \hat{\beta} + r(x)' R^{-1} (y_d - F \hat{\beta}), \qquad (7.8)$$

where $\hat{\beta} = (F' R^{-1} F)^{-1} F' R^{-1} y_d$ is the usual generalized least-squares estimate of $\beta$. Under (7.1) the two terms on the right hand side of (7.8) are uncorrelated and might be interpreted as follows. The first is the usual generalized least-squares


predictor. The second is a smooth of the residuals. Notice that if (7.7) is substituted into (7.6) one may obtain the following expression for the MSE of the BLUP:

$$\mathrm{MSE}[\hat{y}(x)] = \sigma^2 \left[ 1 - \bigl( f(x)', r(x)' \bigr) \begin{pmatrix} 0 & F' \\ F & R \end{pmatrix}^{-1} \begin{pmatrix} f(x) \\ r(x) \end{pmatrix} \right]. \qquad (7.9)$$
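The computation behind the BLUP and its MSE can be sketched numerically. The following is a minimal illustration, not the authors' implementation: it assumes a single scalar input, a constant regression term f(x) = 1, and a power-exponential correlation with hypothetical values theta = 5 and p = 2. The helper names `solve`, `corr` and `blup` are invented for this sketch. It solves the bordered linear system for (lambda(x), c(x)) directly and then evaluates the predictor c(x)'y_d and the MSE as sigma^2 times one minus the inner product of (f(x), r(x)) with that solution.

```python
import math

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= factor * M[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = M[i][n] - sum(M[i][k] * x[k] for k in range(i + 1, n))
        x[i] = s / M[i][i]
    return x

def corr(w, x, theta=5.0, p=2.0):
    """Assumed correlation exp(-theta * |w - x|**p) for one scalar input."""
    return math.exp(-theta * abs(w - x) ** p)

def blup(sites, y, x, f=lambda t: [1.0], sigma2=1.0):
    """Return (prediction, mse) at x by solving the bordered system."""
    N = len(sites)
    F = [f(s) for s in sites]                      # N x k regression matrix
    k = len(F[0])
    r = [corr(s, x) for s in sites]                # correlations with the new site
    # Bordered matrix [[0, F'], [F, R]]
    A = [[0.0] * k + [F[i][j] for i in range(N)] for j in range(k)]
    for i in range(N):
        A.append(F[i] + [corr(sites[i], sites[j]) for j in range(N)])
    rhs = f(x) + r                                 # stacked vector (f(x), r(x))
    sol = solve(A, rhs)                            # stacked vector (lambda(x), c(x))
    c = sol[k:]
    pred = sum(ci * yi for ci, yi in zip(c, y))
    mse = sigma2 * (1.0 - sum(a * b for a, b in zip(rhs, sol)))
    return pred, mse

# Hypothetical deterministic "code" output at four sites.
sites = [0.0, 0.3, 0.6, 1.0]
y = [math.sin(2 * math.pi * s) for s in sites]
print(blup(sites, y, 0.45))   # prediction and MSE at an untried input
print(blup(sites, y, 0.3))    # at an observed site the predictor interpolates
```

At an observed site the solution of the system is c(x) equal to the corresponding unit vector, so the predictor reproduces the observed output and the MSE vanishes, which reflects the absence of random error in a deterministic code.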

In order to compute any of these quantities, the correlation R(w, x) must be specified. For a smooth response, R(w, x) should have some derivatives, while for an irregular response a function with no derivatives would be preferred. Choice of R(w, x) is discussed in some detail in Sacks et al. (1989). Stationary families which are products of one-dimensional correlations, i.e., of the form $R(w, x) = \prod_j R_j(w_j - x_j)$, are suggested as a natural choice. Some examples are

$$R(w, x) = \prod_j \exp(-\theta_j |w_j - x_j|^p),$$

0 ... if $M_2 \ge M_1$ in the Loewner ordering, i.e., if $M_2 - M_1$ is nonnegative definite, then the criterion $\phi$ satisfies $\phi(M_2) \ge \phi(M_1)$. This motivates the following definition:

DEFINITION. An information matrix $M_1$ is called inadmissible if there exists another information matrix $M_2$ such that $M_2 > M_1$ (i.e., $M_2 - M_1$ is nonnegative definite but not the null matrix).

In the construction of optimal designs, one therefore need only consider probability measures resulting in admissible information matrices: this is like the well known fact in decision theory that admissible rules form a complete class. In polynomial regression problems, due to the moment interpretation of the information matrix, this helps in bounding the number of support points in an optimal design according to any criterion that is monotone increasing in the moment matrix in the Loewner ordering. Indeed, the following holds:

THEOREM. Under the hypothesis of monotonicity of $\phi$ in the Loewner ordering, an optimal design for a polynomial regression model of degree p can have at most p + 1 points in its support, with at most p - 1 points in the interior of $\mathcal{X}$.

This result aids in understanding why the theoretical optimal designs are generally so thinly supported. Further pinpointing of the exact number of points and their weights does not come out of this theorem.

4.2. Bayesian formulation of an optimal design problem

In a strictly Bayesian decision theoretic setup, one has a set of parameters $\theta$ with a prior distribution G, a specified likelihood function $f(x \mid \theta)$, and a loss function $L(\theta, a)$. Given a design, there is an associated Bayes rule with respect to the trio (f, L, G); an optimal design should minimize over all designs the Bayes risk, i.e., the average loss of the Bayes estimate over all samples and the parameters. Chaloner and Verdinelli (1994) give a fairly comprehensive review of this formulation. In particular, they give a number of loss functions that have been proposed, and there is an instructive account of which alphabetic Bayesian criteria correspond to such a loss-prior formulation. Note that the formulation can as well take the route of prediction rather than estimation; Eaton et al. (1994) consider a predictive formulation and show that sometimes one returns to the alphabetic optimal designs again, but not always.

Review of optimal Bayes designs


There is another (simplistic) way to look at the Bayes design problem, which in fact has an axiomatic justification under a normal-normal-gamma linear model with squared error loss. Thus, consider the canonical linear model $Y \sim N(X\theta, \sigma^2 I)$, $\theta \sim N(\mu, \sigma^2 R^{-1})$. Then, under the standard squared error loss $\|\theta - a\|^2$, the Bayes risk (in fact even the posterior expected loss itself) equals $\operatorname{tr}(M + R/n)^{-1}$, where n denotes the sample size. One would therefore seek to minimize $\operatorname{tr}(M + R/n)^{-1}$, which has a remarkable similarity to the classical A-optimality criterion. The Bayesian alphabetic criteria are thus defined for linear models as:

Bayesian A-optimality: Minimize $\operatorname{tr}(M + R/n)^{-1}$,
Bayesian D-optimality: Minimize $|M + R/n|^{-1}$,
Bayesian c-optimality: Minimize $c'(M + R/n)^{-1}c$ for a given vector c,
Bayesian E-optimality: Minimize the maximum eigenvalue of $(M + R/n)^{-1}$,
Bayesian G-optimality: Minimize $\int c'(M + R/n)^{-1}c \, d\nu(c)$, where $\nu$ is a probability measure on the surface of the unit ball $c'c = 1$. (Note that Studden (1977) calls this integrated variance optimality.)

Of course, in the absence of a meaning for R, these criteria do not stand to reason. They do stand to reason under one of two conditions: a structured setup of normal-normal-gamma distributions with a squared error loss, or restriction to affine estimates assuming only that the dispersion matrix of $\theta$ equals $\sigma^2 R^{-1}$. The presence of $\sigma^2$ as a factor in the dispersion matrix of $\theta$ makes this less innocuous than it seems. A substantial amount of the optimality theory in Bayes design has been done with these alphabetic criteria. Note that if the sample size n is even reasonably large, the extra term R/n in these functionals should not (and indeed does not) play much of a role. Thus, priors in linear models that are not flatter than the normal likelihood tend to yield optimal Bayes designs that track the classical ones very closely, or even exactly.
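To make the criteria concrete, here is a small numerical sketch under stated assumptions: straight-line regression with regression vector (1, x)' on [-1, 1], so that every matrix is 2 x 2 and closed-form inverses suffice, and a hypothetical prior precision matrix R equal to the identity. The function names `info_matrix` and `bayes_criteria` are invented for this illustration and are not from the source.

```python
import math

def info_matrix(points, weights):
    """Moment matrix M = sum_i w_i f(x_i) f(x_i)' for f(x) = (1, x)'."""
    m1 = sum(w * x for x, w in zip(points, weights))
    m2 = sum(w * x * x for x, w in zip(points, weights))
    return [[sum(weights), m1], [m1, m2]]

def bayes_criteria(M, R, n, c):
    """Bayesian A-, D-, c- and E-criteria for a 2 x 2 matrix M + R/n."""
    a = M[0][0] + R[0][0] / n
    b = M[0][1] + R[0][1] / n
    d = M[1][1] + R[1][1] / n
    det = a * d - b * b
    inv = [[d / det, -b / det], [-b / det, a / det]]   # (M + R/n)^{-1}
    trace = inv[0][0] + inv[1][1]
    # Largest eigenvalue of a symmetric 2 x 2 matrix in closed form.
    lam_max = 0.5 * (trace + math.sqrt((inv[0][0] - inv[1][1]) ** 2
                                       + 4 * inv[0][1] ** 2))
    c_val = (c[0] * (inv[0][0] * c[0] + inv[0][1] * c[1])
             + c[1] * (inv[1][0] * c[0] + inv[1][1] * c[1]))
    return {"A": trace, "D": 1.0 / det, "c": c_val, "E": lam_max}

R = [[1.0, 0.0], [0.0, 1.0]]       # hypothetical prior precision matrix
slope = [0.0, 1.0]                 # c picks out the slope parameter

two_point = info_matrix([-1.0, 1.0], [0.5, 0.5])
one_point = info_matrix([1.0], [1.0])

for n in (10, 1000):
    print(n, bayes_criteria(two_point, R, n, slope))
print(bayes_criteria(one_point, R, 10, slope))
```

With the two-point design the moment matrix is the identity, so each criterion is a simple function of 1 + 1/n and visibly approaches its classical value as n grows. The one-point design has a singular moment matrix, yet M + R/n is still invertible, so all of the Bayesian criteria remain finite for it.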
On the other hand, although there is some scope for optimality work with t or other flat priors, so far there are no published works in this direction. The field of Bayes optimal designs therefore still holds out some (hard) open problems even for the Gauss-Markov linear model. Of course, estimation and prediction are not the only inference problems one can design for; indeed, the design to be used should be consistent with what would be done with the data. The role of optimal designs in testing problems is described in Kiefer (1959), where he shows that for maximizing the minimum power over small spheres around the null value in ANOVA problems, it is not correct to use the F test regardless of the design. Kiefer's criterion would not be very interesting in a Bayesian framework (although some Bayes design work has used average power as the criterion: see Spiegelhalter and Freedman (1986)); however, Bayes optimal design for testing problems has generally remained neglected. DasGupta and Studden (1991) give a fully Bayesian formulation and derive Bayes designs; there are also a number of remarkably charming examples in Chapter 7 of Berger (1986), and there is some more theory with conjugate priors in normal linear models in DasGupta and Mukhopadhyay (1994). In closing, the Kiefer-Wolfowitz theory has had a profound impact on the work in Bayes optimal designs in two ways: use of the alphabetic criteria and adoption of the approximate theory.


A. DasGupta

5. Mathematics of Bayes design

5.1. General exposition

The mathematics of Bayes optimal designs is generally the same as that in classical optimal design. There are three main routes to obtaining an optimal design: (i) use an equivalence theorem; (ii) in polynomial models, use inherent symmetry in the problem (if there is such symmetry) and convexity of the criterion functional in conjunction with Carathéodory type bounds on the cardinality of the support; and (iii) use geometric arguments, which usually go by the name of Elfving geometry, due to the pioneering paper Elfving (1952). An equivalence theorem does the following: it prescribes a function $F(\xi, x)$ defined on the design space such that $F(\xi, x) \le 0$ for all x in $\mathcal{X}$ and is = 0 if and only if x is in the support of an optimal design $\xi$. Usually, but not always, some guess work and some luck are involved in correctly using equivalence theorems for identifying an optimal design. The nice thing about equivalence theorems is that really general equivalence theorems are known that cover probably almost all cases one would be interested in, and in principle, they are supposed to work. One can see Silvey (1980), Whittle (1973) and Pukelsheim (1993) for increasingly general equivalence theorems. Convexity arguments do the following: first, by using Carathéodory type theorems, or if possible upper principal representations from moment theory, one gets an upper bound on the number of points in the support of an admissible design. Then, one proves that the criterion functional has some symmetry or invariance property; finally, one proves that the functional is convex in a convex class of moment matrices. Application of all of these together would reduce the dimensionality of the problem to a very low dimension, which is then solved by standard calculus.
The geometric methods attributed to Elfving (and developed by many others subsequently) are by far the most subtle methods of optimal design theory, and need to be stated very carefully with changes in the criterion function. They are best understood through a verbal geometric description for the c-optimality problem. For this, one takes the symmetric convex hull of the design space, i.e., $E = CH(\mathcal{X} \cup -\mathcal{X})$, where CH denotes convex hull. This set is symmetric, convex and compact provided $\mathcal{X}$ is compact. Now take any vector c; if $c \ne 0$, then on sufficient stretching or shrinking, it will fall exactly on the boundary of the convex set E (the scalar by which c is divided in order that this happens is called the Minkowski functional of E evaluated at c). Call this scaled vector $c^*$. Then $c^*$ can be represented in the form $\sum p_i y_i$, where each $y_i$ is either in $\mathcal{X}$ or $-\mathcal{X}$. If $\mathcal{X}$ is not already symmetric, then those that are in $\mathcal{X}$ give the support of an optimal design. A concise general version of this method for c-optimality is given in Pukelsheim (1994); there is also a wealth of information with many greatly unifying results in Dette (1993). One should be cautious about the use of the terminology "prior" in Dette (1993); the unifying nature of the theorems is the most gratifying aspect of this article, but the worked out examples indicate that again elements of intuition and good luck are needed for the Elfving geometry to be useful.
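As a small worked illustration of this construction (not taken from the source), consider straight-line regression with regression vector $f(x) = (1, x)'$ for x in $[-1, 1]$, and take $c = (0, 1)'$, i.e., estimation of the slope. The sketch assumes that the set denoted $\mathcal{X}$ in the description above is the set of regression vectors $\{f(x)\}$.

```latex
\begin{align*}
E &= CH\bigl(\{(1,x)' : |x| \le 1\} \cup \{(-1,-x)' : |x| \le 1\}\bigr) = [-1,1]^2, \\
c &= (0,1)' \in \partial E
   \quad\Longrightarrow\quad c^* = c
   \quad (\text{Minkowski functional of } E \text{ at } c \text{ equal to } 1), \\
c^* &= \tfrac{1}{2}\,(1,1)' + \tfrac{1}{2}\,(-1,1)'
    = \tfrac{1}{2}\, f(1) + \tfrac{1}{2}\,\bigl(-f(-1)\bigr).
\end{align*}
```

In the usual statement of Elfving's theorem, each $y_i = \pm f(x_i)$ in the representation contributes the point $x_i$ with weight $p_i$. Here that gives the design with half of the observations at x = 1 and half at x = -1, which is the familiar optimal design for estimating a slope on $[-1, 1]$.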


5.2. State of the art in Bayesian alphabetic optimality

5.2.1. c-optimality

It seems that the best results on Bayesian optimality are known for this criterion. Chaloner (1984) already considered Bayesian c-optimality and gave a form of the Elfving geometry in this case. Her results imply that Bayesian c-optimal designs can be one point, i.e., they can sometimes take all observations at one point. The deepest results on Bayesian c-optimality are given in El-Krunz and Studden (1991). They succeeded in achieving the following: (a) give a characterizing equation completely specifying a c-optimal design, together with a Bayesian embedding of the classical Elfving set that describes the c-optimal design, (b) characterize the situations when the c-optimal design is in fact one point, (c) characterize the situations when a particular one point design is c-optimal, (d) characterize the cases when the classical and the Bayesian c-optimal designs are exactly the same, and (e) demonstrate that for any prior precision matrix, there is a sufficiently large sample size beyond which the classical and the Bayesian c-optimal designs have exactly the same support. This last result has a remarkable consequence: it is a classic fact (see Karlin and Studden, 1966) that in polynomial regression, for the extrapolation problem, i.e., for estimating the mean response at an x outside of the design space, the c-optimal design is always supported at the same set of points (it is a particularly brilliant application of the methods of orthogonal polynomials to optimal designs). Therefore, the result in El-Krunz and Studden (1991) demonstrates the same property for the Bayesian c-optimal design in the extrapolation problem for any prior precision matrix, provided the sample size is large. This is extraordinary, because one is saying much more than weak convergence to the classical design. That the supports coincide for large sample sizes was already recognized in Chaloner (1984).

5.2.2. A-optimality

The criterion for c-optimality can be written in the equivalent form $\operatorname{tr}(cc'(M + R/n)^{-1})$. A generalization of this is the functional $\operatorname{tr}(T(M + R/n)^{-1})$, where T is some nonnegative definite matrix of rank k, k ... 0, then the Chebyshev expansion of f

uniformly converges to f. In addition to the above theorem, for purposes of deciding how many terms one should use, estimates of the error are useful. There are several results known; we find the following useful.

THEOREM. Let $E_n(f)$ denote the error in the approximation of f by $s_n(f)$ using supnorm, and let $E_n^*(f)$ denote the same error by using the best polynomial approximation to f in supnorm. Then

The suggestion in Atkinson and Donev (1992) is to linearize a nonlinear model by using its Taylor expansion; we believe that Chebyshev expansions can estimate more efficiently with a smaller number of terms. One reason is the following theorem.

THEOREM. Consider the expansion of a continuous function f in terms of ultraspherical polynomials defined in Section 13.3.3. Then, the choice $\alpha = 1/2$ always gives the best approximation in sup norm if the coefficients $\{a_i, i > n\}$ in the ultraspherical expansion of f are nonnegative for that given n; in particular, a Chebyshev expansion, corresponding to $\alpha = 1/2$, is better than a Taylor expansion, which corresponds to $\alpha = \infty$.
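The comparison between Chebyshev and Taylor expansions can be checked numerically on a simple case. The sketch below is an illustration under assumptions not in the source: it uses f = exp on [-1, 1], degree n = 3, the standard Chebyshev polynomials of the first kind $T_k(x) = \cos(k \arccos x)$, coefficients approximated by a 64-point Gauss-Chebyshev rule, and a sup norm approximated on a finite grid. All function names are hypothetical.

```python
import math

def cheb_coeffs(f, n, nodes=64):
    """Approximate Chebyshev coefficients a_0..a_n by Gauss-Chebyshev quadrature."""
    thetas = [math.pi * (j + 0.5) / nodes for j in range(nodes)]
    return [2.0 / nodes * sum(f(math.cos(t)) * math.cos(k * t) for t in thetas)
            for k in range(n + 1)]

def cheb_partial_sum(coeffs, x):
    """Evaluate a_0/2 + sum_{k>=1} a_k T_k(x) with T_k(x) = cos(k arccos x)."""
    t = math.acos(max(-1.0, min(1.0, x)))
    return coeffs[0] / 2 + sum(a * math.cos(k * t)
                               for k, a in enumerate(coeffs) if k > 0)

def taylor_exp(n, x):
    """Degree-n Taylor polynomial of exp about 0."""
    return sum(x ** k / math.factorial(k) for k in range(n + 1))

def sup_error(approx, f, grid_size=2001):
    """Maximum absolute error on an equally spaced grid over [-1, 1]."""
    xs = [-1.0 + 2.0 * i / (grid_size - 1) for i in range(grid_size)]
    return max(abs(f(x) - approx(x)) for x in xs)

n = 3
coeffs = cheb_coeffs(math.exp, n)
cheb_err = sup_error(lambda x: cheb_partial_sum(coeffs, x), math.exp)
taylor_err = sup_error(lambda x: taylor_exp(n, x), math.exp)
print("Chebyshev sup error:", cheb_err)
print("Taylor sup error:   ", taylor_err)
```

For the exponential function all Chebyshev coefficients are positive, so this is a case covered by the nonnegativity condition in the theorem, and the truncated Chebyshev expansion comes out with a noticeably smaller sup-norm error than the Taylor polynomial of the same degree.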

Acknowledgement I learned from Bill Studden the little I know about optimal designs. I am very thankful for the scholarly inspiration he provides, in an understated brilliant way.


References

Akhiezer, N. (1962). Some Questions in the Theory of Moments. Amer. Mathematical Soc., Providence, RI. Antelman, G. R. (1965). Insensitivity to non-optimal design in Bayesian decision theory. J. Amer. Statist. Assoc. 60, 584-601. Atkinson, A. C. and A. N. Donev (1992). Optimum Experimental Designs. Clarendon Press, Oxford. Basu, S. and A. DasGupta (1992). Robustness of standard confidence intervals under departure from normality, to appear in Ann. Statist. Berg, C. (1985). On the preservation of determinacy under convolution. Proc. Amer. Math. Soc. 93, 351-357. Berg, C. (1995). Indeterminate moment problems and the theory of entire functions. J. Comput. Appl. Math. 65, 27-55. Berger, J. (1986). Statistical Decision Theory and Bayesian Analysis. 2nd edn. Springer, New York. Berger, J. (1994). An overview of Robust Bayesian analysis. Test 3(1), 5-59. Bickel, P. J. and A. M. Herzberg (1979). Robustness of design against autocorrelation in time I. Ann. Statist. 7, 77-95. Billingsley, P. (1986). Probability and Measure. Wiley, New York. Bock, J. and H. Toutenburg (1991). Sample size determination in clinical research. In: Handbook of Statistics, Vol. 8. Elsevier, Amsterdam, 515-538. Bose, R. C. (1948). The design of experiments. In: Proc. 34th Indian Sci. Cong., Delhi, 1947. Indian Science Congress Association, Calcutta, (1)-(25). Bowman, K. O. and M. A. Kastenbaum (1975). Sample size requirement: Single and double classification experiments. In: Selected Tables in Mathematical Statistics, edited by IMS, Vol. 3. Amer. Mathematical Soc., Providence, RI. Box, G. E. P. and N. R. Draper (1987). Empirical Model-Building and Response Surfaces. Wiley, New York. Box, G. E. P. and J. S. Hunter (1957). Multi-factor experimental designs for exploring response surfaces. Ann. Math. Statist. 28, 195-241. Box, G. E. P. and H. L. Lucas (1959). Design of experiments in non-linear situations. Biometrika 46, 77-90. Brooks, R. J. (1972).
A decision theory approach to optimal regression designs. Biometrika 59, 563-571. Brooks, R. J. (1974). On the choice of an experiment for prediction in linear regression. Biometrika 61, 303-311. Brooks, R. J. (1976). Optimal regression designs for prediction when prior knowledge is available. Metrika 23, 217-221. Brown, L. D. (1986). Fundamentals of Statistical Exponential Families, IMS Lecture Notes - Monograph Series, Vol. 9. Hayward, CA. Brown, L. D. (1991). Minimaxity, more or less. In: S. Gupta and J. Berger, eds., Statistical Decision Theory and Related Topics, 1-18. Caselton, W. F. and J. V. Zidek (1984). Optimal monitoring network designs. Statist. Probab. Lett. 2, 223-227. Chaloner, K. (1984). Optimal Bayesian experimental design for linear models. Ann. Statist. 12, 283-300; Correction 13, 836. Chaloner, K. (1989). Bayesian design for estimating the turning point of a quadratic regression. In: Commun. Statist. Theory Methods 18(4), 1385-1400. Chaloner, K. (1993). A note on optimal Bayesian design for nonlinear problems. J. Statist. Plann. Inference 37, 229-235. Chaloner, K. and K. Larntz (1986). Optimal Bayesian designs applied to logistic regression experiments. Tech. Report, University of Minnesota. Chaloner, K. and K. Larntz (1989). Optimal Bayesian designs applied to logistic regression experiments. J. Statist. Plann. Inference 21, 191-208. Chaloner, K. and I. Verdinelli (1994). Bayesian experimental design: A review. Tech. Report, Department of Statistics, University of Minnesota. Cheng, C.-S. (1978b). Optimal designs for the elimination of multi-way heterogeneity. Ann. Statist. 6, 1262-1272. Chernoff, H. (1953). Locally optimum designs for estimating parameters. Ann. Math. Statist. 24, 586-602.


Chernoff, H. (1972). Sequential Analysis and Optimal Design. Society for Industrial and Applied Mathematics, Philadelphia, PA. Clyde, M. A. (1993). An object-oriented system for Bayesian nonlinear design using XLISP-STAT. Tech. Report 587, University of Minnesota, School of Statistics. DasGupta, A., S. Mukhopadhyay and W. J. Studden (1992). Compromise designs in heteroscedastic linear models. J. Statist. Plann. Inference 32, 363-384. DasGupta, A. and S. Mukhopadhyay (1988). Uniform and subuniform posterior robustness: Sample size problem. Tech. Report, Purdue University. DasGupta, A. and W. J. Studden (1988). Robust Bayesian analysis and optimal experimental designs in normal linear models with many parameters. I. Tech. Report, Department of Statistics, Purdue University. DasGupta, A. and W. J. Studden (1991). Robust Bayes designs in normal linear models. Ann. Statist. 19, 1244-1256. DasGupta, A. and S. Mukhopadhyay (1994). Uniform and subuniform posterior robustness: The sample size problem. Proc. 1st Internat. Workshop on Bayesian Robustness, Special issue of J. Statist. Plann. Inference 40, 189-204. DasGupta, A. and B. Vidakovic (1994). Sample sizes in ANOVA: The Bayesian point of view. Tech. Report, Purdue University. Submitted: J. Statist. Plann. Inference. DasGupta, A. and M. M. Zen (1996). Bayesian bioassay design. Tech. Report, Purdue University. Submitted: J. Statist. Plann. Inference. Dehnad, K., ed. (1989). Quality Control, Robust Design, and the Taguchi Method. Wadsworth and Brooks/Cole, Pacific Grove, CA. DeRobertis, L. and J. A. Hartigan (1981). Bayesian inference using intervals of measures. Ann. Statist. 9, 235-244. Dette, H. (1991). A note on robust designs for polynomial regression. J. Statist. Plann. Inference 28, 223-232. Dette, H. (1992). Optimal designs for a class of polynomials of odd or even degree. Ann. Statist. 20, 238-259. Dette, H. (1993a). Elfving's theorem for D-optimality. Ann. Statist. 21, 753-766. Dette, H. (1993b).
A note on Bayesian c- and D-optimal designs in nonlinear regression models. Manuscript. Dette, H. and H.-M. Neugebauer (1993). Bayesian D-optimal designs for exponential regression models. J. Statist. Plann. Inference, to appear. Dette, H. and S. Sperlich (1994a). Some applications of continued fractions in the construction of optimal designs for nonlinear regression models. Manuscript. Dette, H. and S. Sperlich (1994b). A note on Bayesian D-optimal designs for general exponential growth models. Manuscript. Dette, H. and W. J. Studden (1994a). Optimal designs for polynomial regression when the degree is not known. Tech. Report, Purdue University. Dette, H. and W. J. Studden (1994b). A geometric solution of the Bayes E-optimal design problem. In: S. Gupta and J. Berger, eds., Statistical Decision Theory and Related Topics, Vol. 5. 157-170. Diaconis, P. (1987a). Bayesian numerical analysis. In: S. Gupta and J. Berger, eds., Statistical Decision Theory and Related Topics, IV, Vol. 1, 163-176. Diaconis, P. (1987b). Application of the method of moments in probability and statistics. In: Moments in Mathematics. Amer. Mathematical Soc., Providence, RI. Donev, A. N. (1988). The construction of exact D-optimum experimental designs. Ph.D. Thesis, University of London. Dykstra, Otto, Jr. (1971). The augmentation of experimental data to maximize |X'X|. Technometrics 13, 682-688. Eaton, M. L., A. Giovagnoli and P. Sebastiani (1994). A predictive approach to the Bayesian design problem with application to normal regression models. Tech. Report 598, School of Statistics, University of Minnesota. Elfving, G. (1952). Optimum allocation in linear regression theory. Ann. Math. Statist. 23, 255-262. El-Krunz, S. M. and W. J. Studden (1991). Bayesian optimal designs for linear regression models. Ann. Statist. 19, 2183-2208. Elliott, P. D. T. A. (1979). Probabilistic Number Theory, Vol. 2. Springer, New York.


Erdős, P. (1958). Problems and results on the theory of interpolation, I. Acta. Math. Acad. Sci. Hungar. 9, 381-388. Farrell, R. H., J. Kiefer and A. Walbran (1967). Optimum multivariate designs. In: L. M. Le Cam and J. Neyman, eds., Proc. 5th Berkeley Symp. Math. Statist. Probab., Berkeley, CA, 1965 and 1966, Vol. 1. University of California, Berkeley, CA. Fedorov, V. V. (1972). Theory of Optimal Experiments. Academic Press, New York. Ferguson, T. S. (1989). Who solved the secretary problem? Statist. Sci. 4(3), 282-296. Fisher, R. A. (1949). Design of Experiments. Hafner, New York. Freeman, P. R. (1983). The secretary problem and its extensions - A review. Internat. Statist. Rev. 51, 189-206.

Friedman, M. and L. J. Savage (1947). Experimental determination of the maximum of a function. In: Selected Techniques of Statistical Analysis. McGraw-Hill, New York, 363-372. Gaffke, N. and O. Krafft (1982). Exact D-optimum designs for quadratic regression. J. Roy. Statist. Soc. Ser. B 44, 394-397. Ghosh, S., ed. (1990). Statistical Design and Analysis of Industrial Experiments. Marcel Dekker, New York. Ghosh, M., B. K. Sinha and N. Mukhopadhyay (1976). Multivariate sequential point estimation. J. Multivariate Anal. 6, 281-294. Giovagnoli, A. and I. Verdinelli (1983). Bayes D-optimal and E-optimal block designs. Biometrika 70(3), 695-706. Giovagnoli, A. and I. Verdinelli (1985). Optimal block designs under a hierarchical linear model. In: J. M. Bernardo et al., eds., Bayesian Statistics, Vol. 2. North-Holland, Amsterdam. Gladitz, J. and J. Pilz (1982). Construction of optimal designs in random coefficient regression models. Math. Operationsforsch. Statist. Ser. Statist. 13, 371-385. Gradshteyn, I. S. and I. M. Ryzhik (1980). Table of Integrals, Series and Products. Academic Press, New York. Haines, L. M. (1987). The application of the annealing algorithm to the construction of exact D-optimum designs for linear-regression models. Technometrics 29, 439-447. Hedayat, A. S., M. Jacroux and D. Majumdar (1988). Optimal designs for comparing test treatments with a control. Statist. Sci. 3, 462-476; Discussion 3, 477-491. Herzberg, A. M. and D. R. Cox (1969). Recent work on the design of experiments: A bibliography and a review. J. Roy. Statist. Soc. Ser. A 132, 29-67. Huber, P. J. (1972). Robust statistics: A review. Ann. Math. Statist. 43, 1041-1067. Huber, P. J. (1975). Robustness and designs. In: J. N. Srivastava, ed., A Survey of Statistical Design and Linear Models. North-Holland, Amsterdam. Huber, P. J. (1981). Robust Statistics. Wiley, New York. Joseph, L., D. Wolfson and R. Berger (1994). Some comments on Bayesian sample size determination. Preprint. Joseph, L.
and R. Berger (1994). Bayesian sample size methodology with an illustration to the difference between two binomial proportions. Preprint. Kacker, R. N. (1985). Off-line quality control, parameter design, and the Taguchi method. J. Qual. Tech. 17, 176-188. Karlin, S. and L. S. Shapley (1953). Geometry of Moment Spaces, Vol. 12 of Amer. Math. Soc. Memoirs. Karlin, S. and W. J. Studden (1966). Tchebycheff Systems: With Applications in Analysis and Statistics. Interscience, New York. Kemperman, J. H. B. (1968). The general moment problem, a geometric approach. Ann. Math. Statist. 39, 93-122. Kemperman, J. H. B. (1972). On a class of moment problems. In: Proc. 6th Berkeley Symp. Math. Statist. and Probab. Vol. 2, 101-126. Kiefer, J. (1953). Sequential minimax search for a maximum. Proc. Amer. Math. Soc. 4, 502-506. Kiefer, J. (1987). Introduction to Statistical Inference. Springer, New York. Kiefer, J. C. (1959). Optimum experimental designs. J. Roy. Statist. Soc. Ser. B 21, 272-304. Discussion on Dr. Kiefer's paper 21, 304-319. Kiefer, J. C. (1974). General equivalence theory for optimum designs (approximate theory). Ann. Statist. 2, 849-879.


Kiefer, J. and J. Wolfowitz (1959). Optimum designs on regression problems. Ann. Math. Statist. 30, 271-294. Kiefer, J. C. and W. J. Studden (1976). Optimal designs for large degree polynomial regression. Ann. Statist. 4, 1113-1123. Korner, T. W. (1989). Fourier Analysis. Cambridge, New York. Krein, M. G. and A. A. Nudelman (1977). The Markov Moment Problem and Extremal Problems. Amer. Mathematical Soc., Providence, RI. Kurotschka, V. (1978). Optimal design of complex experiments with qualitative factors of influence. Commun. Statist. Theory Methods 7, 1363-1378. Lau, T.-S. and W. J. Studden (1985). Optimal designs for trigonometric and polynomial regression using canonical moments. Ann. Statist. 13, 383-394. Leamer, E. E. (1978). Specification Searches: Ad hoc Inference with Nonexperimental Data. Wiley, New York. Lee, C. M.-S. (1988). Constrained optimal designs. J. Statist. Plann. Inference 18, 377-389. Lehmann, E. L. (1986). Testing Statistical Hypotheses, 2nd edn. Wiley, New York. Lindley, D. V. (1956). On a measure of the information provided by an experiment. Ann. Math. Statist. 27, 986-1005. Majumdar, D. (1992). Optimal designs for comparing test treatments with a control using prior information. Ann. Statist. 20, 216-237. Majumdar, D. (1995). Optimal and efficient treatment-control designs. Preprint. Marcus, M. B. and J. Sacks (1976). Robust design for regression problems. In: S. Gupta and D. Moore, eds., Statistical Decision Theory and Related Topics, Vol. 2, 245-268. Mitchell, T., J. Sacks and D. Ylvisaker (1994). Asymptotic Bayes criteria for nonparametric response surface design. Ann. Statist. 22, 634-651. Mukhopadhyay, S. and L. Haines (1993). Bayesian D-optimal designs for the exponential growth model. J. Statist. Plann. Inference, to appear. Nalimov, V. V. (1974). Systematization and codification of the experimental designs - The survey of the works of Soviet statisticians. In: J. Gani, K. Sarkadi and I. Vincze, eds., Progress in Statistics.
European Meeting of Statisticians, Budapest 1972, Vol. 2. Colloquia Mathematica Societatis Jtlnos Bolyai 9. NorthHolland, Amsterdam, 565-581. Nalimov, V. V., ed. (1982). Tables .for Planning Experiments for Factorials and Polynomial Models. Metallurgica, Moscow (in Russian). Odeh, R. E. (1975). Sample Size Choice: Charts for Experiments with Linear Models. Marcel Dekker, New York. O'Hagan, A. (1978). Curve fitting and optimal design for prediction (with discussion). J. Roy. Statist. Soc. Ser. B 40, 1--41. Owen, R. J. (1970). The optimum design of a two-factor experiment using prior information. Ann. Math, Statist. 41, 1917-1934. Papalambros, P. Y. and D. J. Wilde (1988). Principles of Optimal Design. Cambridge, New York. Pilz, J. (1981). Robust Bayes and minimax-Bayes estimation and design in linear regression. Math. Operationsforsch. Statist. Ser. Statist. 12, 163-177. Pilz, J. (1991). Bayesian Estimation and Experimental Design in Linear Regression Models. Wiley, New York. Polasek, W. (1985). Sensitivity analysis for general and hierarchical linear regression models. In: P. K. Goel and A. Zellner, eds., Bayesian Inference and Decision Techniques with Applications. North-Holland, Amsterdam, 375-387. Powell, M. J. D. (1981). Approximation Theory and Methods. Cambridge Univ. Press, New York. Pukelsheim, E (1980). On linear regression designs which maximize information. J. Statist. Plann. Inference 4, 339-364. Pukelsheim, E (1988). Analysis of variability by analysis of variance. In: Y. Dodge, V. V. Fedorov and H. P. Wynn, eds., Optimal Design and Analysis of Experiments. North-Holland, New York. Pukelsheim, E and S. Rieder (1992). Efficient rounding-of approximate designs. Biometrika 79, 763-770. Pukelsheim, E (1993). Optimal Design of Experiments. Wiley, New York. Pukelsheim, E and W. J. Studden (1993). E-optimal designs for polynomial regression. Ann. Statist. 21(1).


Rao, C. R. (1946). Difference sets and combinatorial arrangements derivable from finite geometries. Proc. Nat. Inst. Sci. 12, 123-135.
Rao, C. R. (1947). Factorial experiments derivable from combinatorial arrangements of arrays. J. Roy. Statist. Soc. Ser. B 9, 128-140.
Rao, C. R. (1973). Linear Statistical Inference and Its Applications, 2nd edn. Wiley, New York.
Rivlin, T. J. (1969). An Introduction to the Approximation of Functions. Dover, New York.
Rivlin, T. J. (1990). Chebyshev Polynomials, 2nd edn. Wiley Interscience, New York.
Royden, H. L. (1953). Bounds on a distribution function when its first n moments are given. Ann. Math. Statist. 24, 361-376.
Rubin, H. (1977). Robust Bayesian estimation. In: S. Gupta and D. Moore, eds., Statistical Decision Theory and Related Topics, Vol. 2, 351-356.
Sacks, J. and S. Schiller (1988). Spatial designs. In: Statistical Decision Theory and Related Topics, Vol. 4. Springer, New York, 385-399.
Sacks, J., W. J. Welch, T. J. Mitchell and H. P. Wynn (1989). Design and analysis of computer experiments. Statist. Sci. 4, 409-435.
Sacks, J. and D. Ylvisaker (1970). Statistical designs and integral approximation. Proc. 12th Bien. Seminar Canad. Math. Cong., Montreal, 115-136.
Sacks, J. and D. Ylvisaker (1964). Designs for regression problems with correlated errors III. Ann. Math. Statist. 41, 2057-2074.
Sarkadi, K. and I. Vincze (1974). Mathematical Methods of Statistical Quality Control. Academic Press, New York.
Schoenberg, I. J. (1959). On the maximization of some Hankel determinants and zeros of classical orthogonal polynomials. Indag. Math. 21, 282-290.
Schumacher, P. and J. V. Zidek (1993). Using prior information in designing intervention detection experiments. Ann. Statist. 21, 447-463.
Schwarz, G. (1962). Asymptotic shapes of Bayes sequential testing regions. Ann. Math. Statist. 33, 224-236.
Seber, G. A. F. and C. J. Wild (1989). Nonlinear Regression. Wiley, New York.
Shah, K. S. and B. K. Sinha (1989). Theory of Optimal Designs. Lecture Notes in Statistics 54, Springer, New York.
Siegmund, D. (1985). Sequential Analysis. Springer, New York.
Silvey, S. D. (1980). Optimal Design. Chapman and Hall, London.
Skibinsky, M. (1968). Extreme nth moments for distributions on [0, 1] and the inverse of a moment space map. J. Appl. Probab. 5, 693-701.
Smith, A. F. M. and I. Verdinelli (1980). A note on Bayesian design for inference using a hierarchical linear model. Biometrika 67, 613-619.
Smith, K. (1918). On the standard deviations of adjusted and interpolated values of an observed polynomial function and its constants and the guidance they give towards a proper choice of the distribution of observations. Biometrika 12, 1-85.
Spiegelhalter, D. J. and L. S. Freedman (1986). A predictive approach to selecting the size of a clinical trial, based on subjective clinical opinion. Statist. Med. 5, 1-13.
Starr, N. and M. B. Woodroofe (1969). Remarks on sequential point estimation. Proc. Nat. Acad. Sci. 63, 285-288.
Staudte, R. G. and S. J. Sheather (1990). Robust Estimation and Testing. Wiley Interscience, New York.
Steinberg, D. M. and W. G. Hunter (1984). Experimental design: Review and comment. Technometrics 26, 71-97.
Stigler, S. M. (1971). Optimal experimental design for polynomial regression. J. Amer. Statist. Assoc. 66, 311-318.
Stigler, S. M. (1974). Gergonne's 1815 paper on the design and analysis of polynomial regression experiments. Historia Math. 1, 431-447.
Studden, W. J. (1977). Optimal designs for integrated variance in polynomial regression. In: S. S. Gupta and D. S. Moore, eds., Statistical Decision Theory and Related Topics II. Proc. Symp. Purdue University, 1976. Academic Press, New York, 411-420.
Studden, W. J. (1982). Some robust-type D-optimal designs in polynomial regression. J. Amer. Statist. Assoc. 77, 916-921.


Tang, Dei-in (1993). Minimax regression designs under uniform departure models. Ann. Statist. 21, 434-446.
Toman, B. and W. Notz (1991). Bayesian optimal experimental design for treatment control comparisons in the presence of two-way heterogeneity. J. Statist. Plann. Inference 27, 51-63.
Toman, B. (1992). Bayesian robust experimental designs for the one-way analysis of variance. Statist. Probab. Lett. 15, 395-400.
Toman, B. and J. L. Gastwirth (1993). Robust Bayesian experimental design and estimation for analysis of variance models using a class of normal mixtures. J. Statist. Plann. Inference 35, 383-398.
Verdinelli, I. (1983). Computing Bayes D- and E-optimal designs for a two-way model. The Statistician 32, 161-167.
Verdinelli, I. and H. P. Wynn (1988). Target attainment and experimental design, a Bayesian approach. In: Y. Dodge, V. V. Fedorov and H. P. Wynn, eds., Optimal Design and Analysis of Experiments. North-Holland, New York.
Wald, A. (1943). On the efficient design of statistical investigations. Ann. Math. Statist. 14, 134-140.
Wasserman, L. (1992). Recent methodological advances in robust Bayesian inference. In: J. Bernardo et al., eds., Bayesian Statistics, Vol. 4. Oxford Univ. Press, Oxford.
Whittle, P. (1973). Some general points in the theory of optimal experimental design. J. Roy. Statist. Soc. Ser. B 35, 123-130.
Wilde, D. J. (1978). Globally Optimal Design. Wiley Interscience, New York.
Wu, C.-F. (1988). Optimal design for percentile estimation of a quantal response curve. In: Y. Dodge, V. V. Fedorov and H. P. Wynn, eds., Optimal Design and Analysis of Experiments. North-Holland, Amsterdam.
Wynn, H. P. (1977). Optimum designs for finite populations sampling. In: S. S. Gupta and D. S. Moore, eds., Statistical Decision Theory and Related Topics II. Proc. Symp. Purdue University, 1976. Academic Press, New York, 471-478.
Wynn, H. P. (1984). Jack Kiefer's contributions to experimental design. Ann. Statist. 12, 416-423.

S. Ghosh and C. R. Rao, eds., Handbook of Statistics, Vol. 13 © 1996 Elsevier Science B.V. All rights reserved.


Approximate Designs for Polynomial Regression: Invariance, Admissibility, and Optimality

Norbert Gaffke and Berthold Heiligers

1. Introduction

This paper brings together different topics from the theory of approximate linear regression design. An overview as well as new results are presented on invariance and admissibility of designs, their interrelations, and their implications for design optimality, including numerical algorithms. Although a great part of the concepts and results will be presented in the framework of the general linear regression model, the emphasis lies on multiple polynomial models and their design, with particular attention to the linear, quadratic, and cubic cases, which are most frequently used in applications. Invariance structures combined with results on admissibility provide a tool to attack optimal design problems of high dimensions, as occurring for second and third order multiple polynomial models. Only in very rare cases can explicit solutions be obtained, so numerical algorithms for computing a nearly optimal design are an important aspect. From the general results on these topics presented below, we wish to point out the following ones at this place.

Invariance w.r.t. infinite (though compact) transformation and matrix groups usually calls for the Haar probability measures, thus involving deep measure theoretic results. We will show a way of avoiding these by using only linear and convex structures in real matrix spaces.

The Karlin-Studden necessary condition on the support of an admissible design is known to become useless for regression models involving a constant term. Here we derive a modified necessary condition which also works for models with constant term, and which also includes possible invariance structures. This result originates from Heiligers (1991).

Well known numerical algorithms for solving extremum problems in optimal design are pure gradient methods, among them the steepest descent method of Fedorov and Wynn. However, after a quick but rough approximation towards the optimum they become very inefficient. The Quasi-Newton methods of Gaffke and Heiligers (1996) provide an efficient way of computing the optimum very accurately.

Our general results will be applied to invariant design for multiple polynomial regression models of degree three or less, and to rotatable design for models with arbitrary degree.


Throughout we will deal with linear regression models under the standard statistical assumptions. That is, an independent variable $x$ with possible values in some design space $\mathcal{X}$ effects a real-valued response

$$y(x) = \sum_{j=1}^{k} \theta_j f_j(x) = \theta' f(x), \tag{1.1a}$$

where $\theta = (\theta_1, \ldots, \theta_k)' \in \mathbb{R}^k$ is an unknown parameter vector, and $f = (f_1, \ldots, f_k)'$ is a given $\mathbb{R}^k$-valued function on $\mathcal{X}$. This is the deterministic part of a linear regression model. Embedded in the usual statistical context, observations of the response $y$ at points $x_1, \ldots, x_n \in \mathcal{X}$, say, are represented by real-valued random variables $Y_1, \ldots, Y_n$, such that

$$E(Y_i) = y(x_i), \quad \operatorname{Var}(Y_i) = \sigma^2, \quad i = 1, \ldots, n; \qquad \operatorname{Cov}(Y_i, Y_j) = 0, \quad i, j = 1, \ldots, n, \ i \neq j. \tag{1.1b}$$

The constant variance $\sigma^2 \in (0, \infty)$ is usually unknown, and is hence an additional parameter in the model. It should be emphasized that by (1.1b) observations of the response variable $y$ include random errors, whereas observations of the independent variable $x$ are exact. Regression setups with both variables subject to random errors, often called 'error-in-the-variables models', will not be considered here. Moreover, we assume that the values of $x$ at which observations of $y$ are taken can be controlled by the experimenter. So, we are concerned with designed experiments, as they frequently occur in industrial practice.

An (approximate) design $\xi$ for model (1.1a) consists of finitely many support points $x_1, \ldots, x_r \in \mathcal{X}$, at which observations of the response are to be taken, and of corresponding weights $\xi(x_i)$, $i = 1, \ldots, r$, which are positive real numbers summing up to 1. In other words, an (approximate) design is a probability distribution with finite support on the experimental region $\mathcal{X}$. For short, we write

$$\xi = \begin{pmatrix} x_1 & \cdots & x_r \\ \xi(x_1) & \cdots & \xi(x_r) \end{pmatrix}, \tag{1.2}$$

where $x_1, \ldots, x_r \in \mathcal{X}$ with $\xi(x_1), \ldots, \xi(x_r) > 0$, $\sum_{i=1}^{r} \xi(x_i) = 1$, and $r \in \mathbb{N}$. The set $\operatorname{supp}(\xi) := \{x_1, \ldots, x_r\}$ is called the support of $\xi$. A design $\xi$ assigns the percentage $\xi(x_i)$ of all observations to the value $x_i$ of the independent variable, $i = 1, \ldots, r$. Note that different designs may also have different support sizes $r$. Of course, when a total sample size $n$ for an experiment has been specified, a design $\xi$ from (1.2) cannot, in general, be properly realized, unless its weights are integer multiples of $1/n$, i.e., unless

$$\xi(x_i) = \frac{n_i}{n} \quad \text{for all } i = 1, \ldots, r, \tag{1.3}$$


for some positive integers $n_1, \ldots, n_r$ with $\sum_{i=1}^{r} n_i = n$. That is why we call, following Kiefer, a design from (1.2) an approximate design, which only in the special case (1.3) becomes an exact design of size $n$. Since we will mainly deal with the approximate theory, we simply call $\xi$ from (1.2) a design, while a design satisfying (1.3) will be referred to as an exact design of size $n$ (denoted by $\xi_n$). The statistical quality of a design $\xi$ for setup (1.1a, b) is reflected by its moment matrix (or information matrix) $M(\xi)$, defined by

$$M(\xi) = \sum_{i=1}^{r} f(x_i) f(x_i)' \, \xi(x_i), \tag{1.4}$$

which is a nonnegative definite $k \times k$ matrix. For an exact design $\xi_n$, the inverse of $M(\xi_n)$ times $\sigma^2/n$ is the covariance matrix of the Least Squares estimator of $\theta$ (provided $M(\xi_n)$ is non-singular), and under a normality assumption in (1.1b) $(n/\sigma^2) M(\xi_n)$ is the Fisher information matrix. The Loewner partial ordering of moment matrices provides a first basis for comparing designs. That partial ordering on the set $\operatorname{Sym}(k)$ of real symmetric $k \times k$ matrices is defined by

$$A \le B \iff B - A \in \operatorname{NND}(k) \qquad (A, B \in \operatorname{Sym}(k)),$$

where $\operatorname{NND}(k)$ consists of the nonnegative definite matrices from $\operatorname{Sym}(k)$. If $\xi$ and $\eta$ are designs with $M(\xi) \le M(\eta)$, then $\eta$ is said to be at least as good as $\xi$, and if additionally $M(\xi) \ne M(\eta)$, then $\eta$ is said to be better than $\xi$. This is statistically meaningful through the linear theory of Gauss-Markov estimation, whether or not the moment matrices of competing designs are non-singular, and whether or not normality is assumed in (1.1b). Suppose that $\xi_n$ is an exact design of size $n$. Given a coefficient vector $c \in \mathbb{R}^k$, the variance of the BLUE (or Gauss-Markov estimator) of $c'\theta$ under $\xi_n$ is equal to $(\sigma^2/n)\, c' M^-(\xi_n) c$, where $M^-(\xi_n)$ denotes a generalized inverse of $M(\xi_n)$, provided that $c'\theta$ is linearly estimable (identifiable) under $\xi_n$, i.e., $c \in \operatorname{range}(M(\xi_n))$. So, for any design $\xi$, we consider the variance function (per unit of error variance)

$$V(\xi, c) = \begin{cases} c' M^-(\xi) c, & \text{if } c \in \operatorname{range}(M(\xi)), \\ \infty, & \text{otherwise.} \end{cases}$$

Then, for any two designs $\xi$ and $\eta$ we have

$$M(\xi) \le M(\eta) \iff V(\xi, c) \ge V(\eta, c) \quad \text{for all } c \in \mathbb{R}^k. \tag{1.5}$$

If the moment matrices of $\xi$ and $\eta$ are nonsingular, then (1.5) can simply be stated as

$$M(\xi) \le M(\eta) \iff M^{-1}(\eta) \le M^{-1}(\xi).$$
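The definitions (1.4) and (1.5) can be tried out numerically. The following Python sketch computes moment matrices, tests the Loewner ordering through the smallest eigenvalue of the difference, and evaluates the variance function with the Moore-Penrose inverse as one particular generalized inverse. The quadratic model $f(x) = (1, x, x^2)'$, the two designs, and all function names are assumptions chosen for illustration only.

```python
import numpy as np

def moment_matrix(f, points, weights):
    """M(xi) = sum_i xi(x_i) f(x_i) f(x_i)' as in (1.4)."""
    F = np.array([f(x) for x in points], dtype=float)   # r x k matrix of regressors
    w = np.asarray(weights, dtype=float)
    return F.T @ (w[:, None] * F)

def loewner_leq(A, B, tol=1e-10):
    """True if A <= B in the Loewner ordering, i.e. B - A is nonnegative definite."""
    return bool(np.linalg.eigvalsh(B - A).min() >= -tol)

def variance(M, c, tol=1e-10):
    """V(xi, c) = c' M^- c if c lies in range(M), and infinity otherwise."""
    c = np.asarray(c, dtype=float)
    M_pinv = np.linalg.pinv(M)          # Moore-Penrose inverse as generalized inverse
    if np.linalg.norm(M @ M_pinv @ c - c) > tol:
        return np.inf                   # c'theta is not estimable under the design
    return float(c @ M_pinv @ c)

f = lambda x: np.array([1.0, x, x * x])                 # assumed quadratic model, k = 3
M_xi = moment_matrix(f, [-1.0, 0.0, 1.0], [0.25, 0.5, 0.25])
M_two = moment_matrix(f, [-1.0, 1.0], [0.5, 0.5])       # singular: only two support points
```

In this example the two-point design estimates the slope with smaller variance but cannot estimate the quadratic coefficient at all, so neither moment matrix dominates the other in the Loewner ordering.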

Equivalence (1.5) can be refined for subspaces of linear parameter functions $c'\theta$, where $c$ is restricted to some given linear subspace of dimension $s \ge 1$. Let the subspace be represented as $\operatorname{range}(K')$ with an $s \times k$ matrix $K$ of rank $s$. Then, given a design $\xi$, we consider the nonnegative definite $s \times s$ matrix $C_K(\xi)$, often called the reduced information matrix of $\xi$ for $K\theta$, whose definition is somewhat implicit for arbitrary $K$ and $\xi$, namely

$$C_K(\xi) = \min_{L} \, L M(\xi) L', \tag{1.6a}$$

where the minimum refers to the Loewner partial ordering in $\operatorname{Sym}(s)$ and is taken over all left inverses $L$ of $K'$ (i.e., over all $s \times k$ matrices $L$ with $LK' = I_s$, with $I_s$ being the unit matrix of order $s$). The existence of the minimum in (1.6a) was proved by Krafft (1983). This definition of reduced information matrices has been used by Gaffke (1987) and Pukelsheim (1993), Chapter 3.2, who also showed how to compute a minimizing $L$ for (1.6a) (Pukelsheim, 1993, p. 62). A familiar special case of (1.6a) is $K = [I_s, 0] = K_s$, say; here, partitioning

$$M(\xi) = \begin{bmatrix} M_1(\xi) & M_{12}(\xi) \\ M_{12}'(\xi) & M_2(\xi) \end{bmatrix}$$

(where the matrices $M_1(\xi)$, $M_2(\xi)$ and $M_{12}(\xi)$ are of sizes $s \times s$, $(k - s) \times (k - s)$, and $s \times (k - s)$, respectively), yields (1.6a) as the Schur complement

$$C_{K_s}(\xi) = M_1(\xi) - M_{12}(\xi) M_2^-(\xi) M_{12}'(\xi).$$

For general $K$, but under the assumption that $c'\theta$ is linearly estimable under $\xi$ for all $c \in \operatorname{range}(K')$, i.e., $\operatorname{range}(K') \subset \operatorname{range}(M(\xi))$, we may write more explicitly

$$C_K(\xi) = (K M^-(\xi) K')^{-1}, \tag{1.6b}$$

in which case $C_K(\xi)$ is positive definite. Now, the refinement of (1.5) (for general $K$, $\xi$, and $\eta$) is, cf. Gaffke (1987, Section 3),

$$C_K(\xi) \le C_K(\eta) \iff V(\xi, c) \ge V(\eta, c) \quad \text{for all } c \in \operatorname{range}(K'). \tag{1.7}$$
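As a numerical check of (1.6a) and (1.6b), the sketch below computes the reduced information matrix for the first two parameters of a quadratic model in two ways: via the explicit formula $(K M^- K')^{-1}$ and via the Schur complement. The moment matrix (that of the design with weights 1/4, 1/2, 1/4 on $-1, 0, 1$ for $f(x) = (1, x, x^2)'$) and the function names are illustrative assumptions; the Moore-Penrose inverse stands in for a generalized inverse.

```python
import numpy as np

def reduced_information(M, K):
    """C_K = (K M^- K')^{-1} as in (1.6b); assumes range(K') is contained in range(M)."""
    K = np.atleast_2d(np.asarray(K, dtype=float))
    return np.linalg.inv(K @ np.linalg.pinv(M) @ K.T)

def schur_complement(M, s):
    """C_{K_s} = M_1 - M_12 M_2^- M_12' for K_s = [I_s, 0]."""
    M1, M12, M2 = M[:s, :s], M[:s, s:], M[s:, s:]
    return M1 - M12 @ np.linalg.pinv(M2) @ M12.T

# assumed example: moment matrix of a three-point design for quadratic regression
M = np.array([[1.0, 0.0, 0.5],
              [0.0, 0.5, 0.0],
              [0.5, 0.0, 0.5]])
K_s = np.hstack([np.eye(2), np.zeros((2, 1))])   # parameters of interest: theta_1, theta_2
C_formula = reduced_information(M, K_s)
C_schur = schur_complement(M, 2)
```

Both routes give the same matrix here, and any particular left inverse $L$ of $K_s'$ (for instance $L = K_s$ itself) yields a matrix $LML'$ that dominates it in the Loewner ordering, in line with the minimum in (1.6a).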

The Loewner partial ordering for moment or reduced information matrices, though statistically of fundamental importance, does not suffice for selecting a single 'optimal' design. For a large set of designs the associated moment or reduced information matrices are not comparable in that ordering (unless k = 1 or s = 1). A popular way out is to specify a real-valued optimality criterion, defined as a function of the moment matrices of the designs. An optimal design is one whose moment matrix minimizes the criterion over the set of competing moment matrices (or designs). By now, the following seems to cover all statistically meaningful criteria.


DEFINITION 1.1. A function $\Phi: \mathcal{A} \to \mathbb{R}$ is called an optimality criterion iff
(i) $\mathcal{A}$ is a convex cone in $\operatorname{Sym}(k)$, such that $\operatorname{PD}(k) \subset \mathcal{A} \subset \operatorname{NND}(k)$ (where $\operatorname{PD}(k)$ denotes the set of all real symmetric, positive definite $k \times k$ matrices);
(ii) $\Phi$ is antitonic w.r.t. the Loewner partial ordering, i.e., $A, B \in \mathcal{A}$, $A \le B$, imply $\Phi(A) \ge \Phi(B)$;
(iii) $\Phi$ is convex.

The domain $\mathcal{A}$ in Definition 1.1 is often referred to as the feasibility cone of the design problem. A design $\xi$ is feasible iff it allows unbiased linear estimation of all mean parameters of interest, i.e., iff $M(\xi) \in \mathcal{A}$. In most cases the full mean parameter vector $\theta$ is of interest, whence $\mathcal{A} = \operatorname{PD}(k)$, and the feasible designs are those with positive definite moment matrices. If the parameters of interest form a proper linear subsystem, represented by $K\theta$ with some given $s \times k$ matrix $K$ of rank $s$, then

$$\mathcal{A} = \mathcal{A}(K) = \{A \in \operatorname{NND}(k) : \operatorname{range}(K') \subset \operatorname{range}(A)\}, \tag{1.8a}$$

and the feasible designs are those under which $K\theta$ is estimable (or, equivalently, those with $M(\xi) \in \mathcal{A}(K)$). In this case, the linear subsystem is usually reflected also by a particular form of the optimality criterion $\Phi$, namely

$$\Phi(A) = \phi(C_K(A)), \quad A \in \mathcal{A}(K), \tag{1.8b}$$

where $C_K(A)$ is defined similarly to (1.6a),

$$C_K(A) = \min_{L:\, LK' = I_s} L A L' \qquad (A \in \operatorname{NND}(k)); \tag{1.9a}$$

on $\mathcal{A}(K)$ we may write more explicitly, analogously to (1.6b),

$$C_K(A) = (K A^- K')^{-1} \qquad (A \in \mathcal{A}(K)). \tag{1.9b}$$

The function $\phi$ on the right hand side of (1.8b) has to satisfy
(i') $\phi: \operatorname{PD}(s) \to \mathbb{R}$;
(ii') $\phi$ is antitonic w.r.t. the Loewner partial ordering on $\operatorname{PD}(s)$;
(iii') $\phi$ is convex.

In fact, properties (i')-(iii') of $\phi$ imply that (1.8b) defines an optimality criterion $\Phi$. This follows immediately using formula (1.9a), by which

$$\Phi(A) = \max_{L:\, LK' = I_s} \phi(L A L') \quad \text{for all } A \in \mathcal{A}(K).$$

Given an optimality criterion according to Definition 1.1, the optimal design problem for regression model (1.1a) is to

$$\text{minimize } \Phi(M(\xi)) \text{ over all designs } \xi \text{ with } M(\xi) \in \mathcal{A}. \tag{1.10}$$


Considering the moment matrix (rather than the design) as a variable, and introducing the set $\mathcal{M}$ of moment matrices $M(\xi)$ when $\xi$ ranges over the set of all designs, problem (1.10) can be rewritten as

$$\text{minimize } \Phi(M) \text{ over } M \in \mathcal{M} \cap \mathcal{A}. \tag{1.10a}$$

This is a convex minimization problem, since $\mathcal{M}$ is a convex subset of $\operatorname{Sym}(k)$. Actually, as is easy to see, we have

$$\mathcal{M} = \operatorname{Conv}\{f(x) f(x)' : x \in \mathcal{X}\}, \tag{1.11}$$

where $\operatorname{Conv} S$ denotes the convex hull of a subset $S$ in a linear space. Moreover, if $f(\mathcal{X})$ (the range of $f$) is compact, as is usually true, then by (1.11) the set $\mathcal{M}$ is compact. An important class of optimality criteria are orthogonally invariant criteria on $\operatorname{PD}(k)$ (i.e., criteria based on the eigenvalues of a positive definite moment matrix). These can be constructed from the following result.

LEMMA 1.2. Let $\phi$ be a real-valued function on $(0, \infty)^k$, such that $\phi$ is convex, permutationally invariant and antitonic w.r.t. the componentwise partial ordering of vectors in $(0, \infty)^k$. Define $\Phi$ by

$$\Phi(A) := \phi(\lambda(A)) \quad \text{for all } A \in \operatorname{PD}(k),$$

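where $\lambda(A) = (\lambda_1(A), \ldots, \lambda_k(A))'$ denotes the vector of eigenvalues of $A$ arranged in ascending order, $\lambda_1(A) \le$ [...] $\ge 0$, and since $\Sigma$ is positive definite, $\langle B, B \rangle_\Sigma = 0$ implies $B \Sigma B' = 0$, hence $B = 0$. Thus, the space $\mathbb{R}^{k \times k}$ endowed with the scalar product $\langle \cdot, \cdot \rangle_\Sigma$ is a Hilbert space. Denote the associated norm by $\|B\|_\Sigma := \langle B, B \rangle_\Sigma^{1/2}$, $B \in \mathbb{R}^{k \times k}$. The scalar product, and hence the norm, enjoys the invariance property that

$$\langle QBQ', QCQ' \rangle_\Sigma = \langle B, C \rangle_\Sigma \quad \text{for all } B, C \in \mathbb{R}^{k \times k} \text{ and all } Q \in \mathcal{Q},$$

since the left hand side of this equation equals

$$\operatorname{tr}\bigl(QBQ' \, \Sigma \, (QCQ')' \, \Sigma\bigr) = \operatorname{tr}\bigl(B \, \underbrace{Q' \Sigma Q}_{=\Sigma} \, C' \, \underbrace{Q' \Sigma Q}_{=\Sigma}\bigr).$$

The convex hull $\mathcal{C} := \operatorname{Conv}\{QAQ' : Q \in \mathcal{Q}\}$ is compact by compactness of $\mathcal{Q}$; in particular, $\mathcal{C}$ is a convex and closed subset of the Hilbert space. Hence, as is well known, $\mathcal{C}$ contains a unique point with minimum norm, $C^*$, say. Observing that $\mathcal{C}$ satisfies $QCQ' \in \mathcal{C}$ for all $C \in \mathcal{C}$ and all $Q \in \mathcal{Q}$, invariance of the norm yields $QC^*Q' = C^*$ for all $Q \in \mathcal{Q}$. Therefore, $\bar{A} := C^*$ is a matrix as desired. For proving uniqueness of $\bar{A}$ it suffices to show that any $\bar{A} \in \mathbb{R}^{k \times k}$ satisfying the conditions of the theorem fulfills

$$\langle A - \bar{A}, B \rangle_\Sigma = 0 \quad \text{for all } B \in \mathcal{L},$$

i.e., $\bar{A}$ is the (unique) orthogonal projection of $A$ onto the subspace $\mathcal{L}$ in the Hilbert space $(\mathbb{R}^{k \times k}, \langle \cdot, \cdot \rangle_\Sigma)$. In fact, from

$$\bar{A} = \sum_{i=1}^{r} a_i \, Q_i A Q_i'$$

for some $r \in \mathbb{N}$, $a_1, \ldots, a_r > 0$ with $\sum_{i=1}^{r} a_i = 1$, and $Q_1, \ldots, Q_r \in \mathcal{Q}$, we obtain

$$\langle \bar{A}, B \rangle_\Sigma = \sum_{i=1}^{r} a_i \operatorname{tr}(Q_i A Q_i' \Sigma B' \Sigma) = \sum_{i=1}^{r} a_i \operatorname{tr}\bigl(A Q_i' \Sigma (Q_i B Q_i')' \Sigma Q_i\bigr) = \sum_{i=1}^{r} a_i \operatorname{tr}(A \, Q_i' \Sigma Q_i \, B' \, Q_i' \Sigma Q_i) = \sum_{i=1}^{r} a_i \operatorname{tr}(A \Sigma B' \Sigma) = \langle A, B \rangle_\Sigma$$

for all $B \in \mathcal{L}$.

This also shows that the mapping $\mathcal{P}$ is a linear projection operator from $\mathcal{H}$ onto $\mathcal{L}$, namely the orthogonal projection operator under the scalar product $\langle \cdot, \cdot \rangle_\Sigma$. □

For the case that the matrix group $\mathcal{Q}$ consists of orthogonal matrices only (and hence $\Sigma = I_k$), the projection property of the average $\bar{A}$ was observed in Pukelsheim (1993, p. 349).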
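For a finite group of orthogonal matrices the average can be written down directly: the arithmetic mean of $QAQ'$ over the group lies in the convex hull and is invariant, so by the uniqueness argument above it is the average $\bar{A}$. The Python sketch below checks invariance, idempotence, and the orthogonality relation for the group of $3 \times 3$ permutation matrices, with $\Sigma = I_k$ and scalar product $\operatorname{tr}(BC')$. The choice of group, the random test matrices, and the function names are assumptions made for illustration.

```python
import itertools
import numpy as np

def permutation_group(k):
    """All k x k permutation matrices (a finite group of orthogonal matrices)."""
    eye = np.eye(k)
    return [eye[list(p)] for p in itertools.permutations(range(k))]

def group_average(A, group):
    """Arithmetic mean of Q A Q' over a finite matrix group."""
    return sum(Q @ A @ Q.T for Q in group) / len(group)

rng = np.random.default_rng(0)
group = permutation_group(3)
A = rng.normal(size=(3, 3))
A_bar = group_average(A, group)

# invariance: Q A_bar Q' = A_bar for every Q in the group
invariant = all(np.allclose(Q @ A_bar @ Q.T, A_bar) for Q in group)
# projection property: averaging a second time changes nothing
idempotent = np.allclose(group_average(A_bar, group), A_bar)
# orthogonality: tr((A - A_bar) B') = 0 for an invariant matrix B
B = group_average(rng.normal(size=(3, 3)), group)
inner = float(np.trace((A - A_bar) @ B.T))
```

For this group the invariant matrices are exactly those with constant diagonal and constant off-diagonal entries, which is the structure `A_bar` shows.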


Any transformation $g$ from $\mathcal{X}$ onto $\mathcal{X}$ induces a transformation of designs $\xi$ by

$$\xi^g := \begin{pmatrix} g(x_1) & \cdots & g(x_r) \\ \xi(x_1) & \cdots & \xi(x_r) \end{pmatrix} \quad \text{for } \xi = \begin{pmatrix} x_1 & \cdots & x_r \\ \xi(x_1) & \cdots & \xi(x_r) \end{pmatrix}. \tag{2.4}$$

Given an equivariant linear regression model, the moment matrix $M(\xi^g)$ is linearly related to $M(\xi)$, since we immediately obtain from (1.4), (2.4), and Definition 2.1

$$M(\xi^g) = Q_g M(\xi) Q_g' \quad \text{for all designs } \xi \text{ and all } g \in \mathcal{G}. \tag{2.5}$$

DEFINITION 2.5. Given an equivariant linear regression model (w.r.t. groups $\mathcal{G}$ and $\mathcal{Q}$), a design $\xi$ is said to be invariant iff $\xi^g = \xi$ for all $g \in \mathcal{G}$; $\xi$ is called weakly invariant iff $M(\xi^g) = M(\xi)$ for all $g \in \mathcal{G}$.

Weak invariance of a design (in an equivariant regression model) just means invariance of its moment matrix under the linear transformations on $\operatorname{Sym}(k)$ suggested by (2.5),

$$A \to QAQ', \quad A \in \operatorname{Sym}(k) \qquad (Q \in \mathcal{Q}). \tag{2.6}$$

Of course, for this situation the terminology 'design with invariant moment matrix' would be more adequate. However, we would like to have a short notation, additionally emphasizing that invariance of a design is in fact a stronger property. The set of invariant designs may be much smaller than that of weakly invariant designs, as, e.g., for multiple polynomial models on symmetric regions (see Example 2.8, below).

LEMMA 2.6. Consider an equivariant linear regression model (1.1a) (w.r.t. groups $\mathcal{G}$ and $\mathcal{Q}$), and let $\mathcal{Q}$ be compact. Then, for any design $\xi$ there exists a weakly invariant design $\bar{\xi}$, such that

$$M(\bar{\xi}) \in \operatorname{Conv}\{M(\xi^g) : g \in \mathcal{G}\}.$$

If $\mathcal{G}$ is finite, then for any weakly invariant design $\xi$ there exists an invariant design $\bar{\xi}$ with $M(\bar{\xi}) = M(\xi)$ and $\operatorname{supp}(\xi) \subset \operatorname{supp}(\bar{\xi})$.

PROOF. By (2.5), given a design $\xi$, we have

$$\operatorname{Conv}\{M(\xi^g) : g \in \mathcal{G}\} = \operatorname{Conv}\{Q M(\xi) Q' : Q \in \mathcal{Q}\}.$$

Since the moment matrices of all designs form a convex set, the average $\bar{M}$ of $Q M(\xi) Q'$ over $Q \in \mathcal{Q}$ from Theorem 2.4 is the moment matrix of some design $\bar{\xi}$, which is trivially weakly invariant. If $\mathcal{G}$ is finite, then the average of the probability distributions $\xi^g$ over $g \in \mathcal{G}$,

$$\bar{\xi} := \frac{1}{\#\mathcal{G}} \sum_{g \in \mathcal{G}} \xi^g,$$

is again a probability distribution with finite support, i.e., a design. Obviously, $\bar{\xi}$ is invariant with $\operatorname{supp}(\xi) \subset \operatorname{supp}(\bar{\xi})$, and, if $\xi$ is weakly invariant,

$$M(\bar{\xi}) = \frac{1}{\#\mathcal{G}} \sum_{g \in \mathcal{G}} M(\xi^g) = \frac{1}{\#\mathcal{G}} \sum_{g \in \mathcal{G}} M(\xi) = M(\xi). \qquad \Box$$
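The averaging step in the proof can be illustrated for a two-element group. The sketch below takes quadratic regression $f(x) = (1, x, x^2)'$ with the reflection $g(x) = -x$ and $Q_g = \operatorname{diag}(1, -1, 1)$, so that $f(g(x)) = Q_g f(x)$; this reading of equivariance is an assumption here, chosen because it is what (2.5) requires. The example design and the helper names are likewise hypothetical. The code forms $\xi^g$ as in (2.4), averages the designs over the group, and compares moment matrices.

```python
import numpy as np

def f(x):
    return np.array([1.0, x, x * x])        # assumed quadratic regression model

def moment_matrix(design):
    """design: dict mapping support point -> weight; returns M(xi) from (1.4)."""
    return sum(w * np.outer(f(x), f(x)) for x, w in design.items())

def transform(design, g):
    """Design xi^g from (2.4): move each support point by g, keep its weight."""
    out = {}
    for x, w in design.items():
        out[g(x)] = out.get(g(x), 0.0) + w
    return out

def symmetrize(design, maps):
    """Average of the designs xi^g over a finite group of maps g."""
    out = {}
    for g in maps:
        for x, w in transform(design, g).items():
            out[x] = out.get(x, 0.0) + w / len(maps)
    return out

maps = [lambda x: x, lambda x: -x]           # G = {identity, reflection}
Q_reflect = np.diag([1.0, -1.0, 1.0])        # f(-x) = Q_reflect f(x)
xi = {-1.0: 0.3, 0.5: 0.7}                   # an arbitrary, non-symmetric design
xi_bar = symmetrize(xi, maps)
```

The design `xi` used here is not weakly invariant, so `moment_matrix(xi_bar)` equals the average of $M(\xi)$ and $Q_g M(\xi) Q_g'$ rather than $M(\xi)$ itself; its odd moments vanish, and the support of `xi_bar` contains that of `xi`.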

Let us consider some examples of equivariant linear regression models. For the trivial groups $\mathcal{G}$ and $\mathcal{Q}$ consisting only of the identity on $\mathcal{X}$ and the unit matrix $I_k$, any linear regression setup is equivariant (w.r.t. these trivial groups). Nontrivial examples are multiple polynomial regression models on symmetric regions.

EXAMPLE 2.7. Consider a $d$th degree multiple polynomial regression setup on some experimental region $\mathcal{X} \subset \mathbb{R}^v$,

$$y(x) = \sum_{\alpha \in A} \theta_\alpha x^\alpha, \quad x = (x_1, \ldots, x_v)' \in \mathcal{X}, \tag{2.7}$$

where $\alpha = (\alpha_1, \ldots, \alpha_v)$ is a $v$-dimensional multi-index with nonnegative integer components, $A$ is a given nonempty set of $v$-dimensional multi-indices, such that $|\alpha| := \sum_{i=1}^{v} \alpha_i \le d$ for all $\alpha \in A$, and $|\alpha^*| = d$ for at least one $\alpha^* \in A$. The power $x^\alpha$ is to be understood as $\prod_{i=1}^{v} x_i^{\alpha_i}$, with the convention $x_i^0 = 1$. Note that by (2.7) we admit proper submodels of the full polynomial setup of order $d$, the latter being the case $A = A_d$, where $A_d := \{\alpha \in \mathbb{N}_0^v : |\alpha| \le d\}$. [...]

$$[\ldots] \ge \operatorname{tr}(Q'AQM) \quad \text{for all } Q \in \mathcal{Q} \text{ and all } M \in \mathcal{M}. \tag{3.1}$$

Applying Theorem 2.4 to the compact matrix group $\mathcal{Q}' := \{Q' : Q \in \mathcal{Q}\}$ and the matrix $A$, we obtain the average $\bar{A}$ of $Q'AQ$ over $Q' \in \mathcal{Q}'$, and by (3.1)

$$\operatorname{tr}(\bar{A} M_0) \ge \operatorname{tr}(\bar{A} M) \quad \text{for all } M \in \mathcal{M}.$$

This means that each support point of $\xi_0$ maximizes $q_{\bar{A}}(x)$ over $x \in \mathcal{X}$. The proof is completed by noting that since $A$ is nonzero and nonnegative definite, so is $Q'AQ$ for each $Q \in \mathcal{Q}$, and hence $\bar{A}$ has the same property. □

Unfortunately, the necessary condition for admissibility of a design from part (a) of the theorem becomes useless if there exists a nonzero and nonnegative definite matrix $A$ such that

$$q_A(x) = f(x)' A f(x) = \text{constant on } \mathcal{X}. \tag{3.2}$$

In particular, if the regression model (1.1a) includes a constant term, i.e., if one component of $f$ is a nonzero constant, $f_1 \equiv 1$ say, then (3.2) is valid (with $A = \operatorname{diag}(1, 0, \ldots, 0)$). Regression models with constant term are the rule rather than the exception. This calls for a modification of the Karlin and Studden result for situation (3.2); in fact, such a modification has been proved by Heiligers (1991), Theorem 4. We will derive here a result slightly different from his, in that the matrix group $\mathcal{Q}$ is treated differently. If $\mathcal{Q}$ consists of orthogonal matrices only, as is true in most applications, our approach is that from Heiligers (1991). The starting point is a condition weaker than (3.2), which will turn out to be more compatible with the Loewner partial ordering of moment matrices. Define the set $\mathcal{D}$ of nonnegative definite matrices by

$$\mathcal{D} := \{M_2 - M_1 : M_1, M_2 \in \mathcal{M}, \ M_1 \le M_2\}. \tag{3.3}$$


As is easily seen, it follows from the convexity of $\mathcal{D}$ that there exists a $D^* \in \mathcal{D}$ with largest range, equivalently, with smallest nullspace,

$$\mathcal{K} := \operatorname{nullspace}(D^*) \subset \operatorname{nullspace}(D) \quad \text{for all } D \in \mathcal{D}. \tag{3.4}$$

Now, a condition weaker than (3.2) is that

$$\mathcal{K} \ne \{0\}. \tag{3.5}$$

In fact, if there exists an $A \in \operatorname{NND}(k)$, $A \ne 0$, with (3.2), then for $D^* = M_2^* - M_1^*$ from (3.4) (with $M_1^*, M_2^* \in \mathcal{M}$ and $M_1^* \le M_2^*$) we have $\operatorname{tr}(AD^*) = \operatorname{tr}(AM_2^*) - \operatorname{tr}(AM_1^*) = 0$, hence $AD^* = 0$, that is, $\{0\} \ne \operatorname{range}(A) \subset \operatorname{nullspace}(D^*) = \mathcal{K}$. Thus (3.2) implies (3.5), but the converse seems not to be true in general.

LEMMA 3.3. Denote by $P_{\mathcal{K}}$ the orthogonal projector from $\mathbb{R}^k$ onto $\mathcal{K}$ given by (3.4), and let $\mathcal{L}$ be a linear subspace of $\mathbb{R}^k$, complementary to $\mathcal{K}$. Then, for any two moment matrices $M, N \in \mathcal{M}$ we have

M 0 with ~ i =7" l ai = 1, and Q 1 , . . . , Q r E Q. Then take for example ~r~ := Q-{1M1(Q-{1)," By Q ~ Q ' E 34 and Q BoQ = Bo for all Q E Q, and by (3.10) we get

tr( C M1) + b tr(BoM,) =

L

t

~

t

ai (tr(A P~cQ~M1Qi) + b tr(BoOiM, e~))

i=l

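$\ge \operatorname{tr}(BM) + \operatorname{tr}(CM)$. In particular, with $M = M_1$ we get from Lemma 3.3 (observing $C = CP_{\mathcal{K}}$), $CM_0 = CP_{\mathcal{K}}M_0 = CP_{\mathcal{K}}M_1 = CM_1$, and hence $\operatorname{tr}(B(M_1 - M_0)) = 0$, i.e., $\mathcal{L} = \operatorname{range}(B) \subset \operatorname{nullspace}(M_1 - M_0)$. By (3.4), $\mathcal{K} \subset \operatorname{nullspace}(M_1 - M_0)$; hence both $\mathcal{L}$ and $\mathcal{K}$ are subspaces of $\operatorname{nullspace}(M_1 - M_0)$, and therefore $M_1 - M_0 = 0$ because of $\mathcal{L} + \mathcal{K} = \mathbb{R}^k$. □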

4. Invariant and admissible multiple polynomial regression designs

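Regression models of particular interest in practice are the $d$th degree multiple polynomial setups from Examples 2.7, 2.8,

$$y(x) = \sum_{\alpha \in A} \theta_\alpha x^\alpha, \quad x \in \mathcal{X}, \tag{4.1}$$

where $\mathcal{X} \subset \mathbb{R}^v$ is compact, and, as before, $A \subset A_d$ contains at least one multi-index of order $d$. Here the moment matrix of a design $\xi$ with support points $x_1, \ldots, x_r$ consists of mixed moments $\mu_\alpha(\xi) = \sum_{i=1}^{r} x_i^\alpha \, \xi(x_i)$, $\alpha \in \mathbb{N}_0^v$, up to order $|\alpha| \le 2d$,

$$M(\xi) = \bigl(\mu_{\alpha+\beta}(\xi)\bigr)_{\alpha, \beta \in A}. \tag{4.2}$$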
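Formula (4.2) translates directly into code once the index set is enumerated. The following Python sketch builds the full index set $A_d$, evaluates the mixed moments $\mu_\alpha(\xi)$, and assembles $M(\xi)$ entry by entry. The example design (equal weights on the four points $(\pm 1, \pm 1)$, with $v = 2$ and $d = 2$) and the function names are assumptions for illustration.

```python
import itertools
import numpy as np

def multi_indices(v, d):
    """A_d: all multi-indices alpha in N_0^v with |alpha| <= d."""
    return [a for a in itertools.product(range(d + 1), repeat=v) if sum(a) <= d]

def moment(alpha, points, weights):
    """mu_alpha(xi) = sum_i x_i^alpha xi(x_i), with x^alpha = prod_j x_j^alpha_j."""
    X = np.asarray(points, dtype=float)
    return float(np.sum(np.asarray(weights) * np.prod(X ** np.asarray(alpha), axis=1)))

def moment_matrix(index_set, points, weights):
    """M(xi) = (mu_{alpha+beta}(xi)) for alpha, beta in the index set, as in (4.2)."""
    return np.array([[moment(tuple(np.add(a, b)), points, weights)
                      for b in index_set] for a in index_set])

# assumed example: equal weights on the 2^2 factorial points, full quadratic model
points = [(-1, -1), (-1, 1), (1, -1), (1, 1)]
weights = [0.25] * 4
A2 = multi_indices(2, 2)
M = moment_matrix(A2, points, weights)
```

With only four support points this design cannot identify all six parameters of the full quadratic model, so `M` is singular; the sketch is meant to show the indexing in (4.2), not a good design.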

Depending on the index set $A$, (4.1) does not necessarily involve the constant term $x^0 \equiv 1$, and thus the Karlin and Studden result (see Theorem 3.2 above) may yield a nontrivial result on admissible designs. The following Lemma 4.1, however, will enable a unified approach to admissibility for all possible index sets $A$ in the setup. In this section we will use the notion '$A$-admissibility' instead of 'admissibility', referring to the particular setup under consideration. The moment matrix of a design for the full polynomial setup $y(x) = \sum_{|\alpha| \le d}$ [...] $M(\xi)$ and $M_d(\xi) \ge M_d(\eta)$. The latter inequality implies $M(\xi) \ge M(\eta)$, since $M(\xi)$ and $M(\eta)$ are principal sub-matrices of $M_d(\xi)$ and $M_d(\eta)$, respectively. Thus, $A$-admissibility of $\eta$ implies $M(\xi) = M(\eta)$, and therefore also $A$-admissibility of $\xi$ follows. □

Recall that Lemma 4.1 applies as well to non-equivariant setups (which are covered by choosing $\mathcal{G}$ and $\mathcal{Q}$ as the trivial groups). As a consequence of Lemma 4.1, under the above invariance assumptions and under an invariant optimality criterion, the search for a solution to an optimal design problem in setup (4.1) can always be restricted to the set of weakly invariant and $A_d$-admissible designs. In this context an important step is the determination of the linear space $\mathcal{K}$ from (3.4) for the full polynomial model of degree $d$. This was done by Heiligers (1991, Lemma 2); we restate his result here, for convenience.

THEOREM 4.2. For the full multiple polynomial model of degree $d$, i.e., (4.1) with $A = A_d$, we have

$$\mathcal{L} := \operatorname{span}\{e_\alpha : \alpha \in \mathbb{N}_0^v, \ |\alpha| \le d - 1\} \subset \mathcal{K},$$

where $e_\alpha$, $\alpha \in A_d$, denote the unit vectors in $\mathbb{R}^k$, $k = \binom{v+d}{d}$. If the monomials $x^\alpha$, $|\alpha|$ [...] $\ge 0$ such that the polynomial

$$p(x) = \beta x^{2d} + \sum_{i=0}^{2d-1} c_i x^i, \quad x \in [a, b],$$

is non-constant, and each support point of $\xi$ maximizes $p$ over $[a, b]$. (Note that $CP_{\mathcal{K}} = C$ and $B \in \operatorname{NND}(d+1)$, $\operatorname{range}(B) \subset \mathcal{K}^\perp$, means here $c_{id} = 0$ for all $i = 0, 1, \ldots, d$, where $c_{ij}$, $i, j = 0, 1, \ldots, d$, are the entries of $C$, and $B = \operatorname{diag}(0, \ldots, 0, \beta)$ with $\beta \ge 0$.) Since the leading coefficient of $p$ is nonnegative, that polynomial possesses at most $d - 1$ maximum points in the open interval $(a, b)$. Conversely, let $\xi$ be a design with $\#(\operatorname{supp}(\xi) \cap (a, b)) \le d - 1$, $\operatorname{supp}(\xi) \cap (a, b) = \{z_1, \ldots, z_\ell\}$, $\ell \le d - 1$, say. Consider the $2d$th degree polynomial

$$p(x) = 1 - (x - a)^{2(d - \ell) - 1} (b - x) \prod_{i=1}^{\ell} (x - z_i)^2 = \sum_{i=0}^{2d} c_i x^i, \quad x \in [a, b], \text{ say},$$

which obviously has leading coefficient $c_{2d} = 1$. The only maximum points of $p$ in $[a, b]$ are the endpoints of the interval and the support points of $\xi$. Thus, defining for [...]

$$\Phi(B) \ge \Phi(A) + \langle G, B - A \rangle \quad \text{for all } B \in \mathcal{A}; \tag{5.2}$$

the set of all subgradients of ~ at A is_called the subdifferential of • at A, denoted by O~5(A). Subgradients of ~5 exist at least at interior points of its domain, but they may

N. Gaffke and B. Heiligers

1186

also exist at boundary points of $\mathcal{A}$, which becomes relevant in case of a criterion of type (1.8b) for partial parameter systems (cf. Gaffke, 1985, Section 3, or Pukelsheim, 1993, pp. 164 ff.). If $A \in \mathrm{PD}(k)$ and if $\partial\Phi(A)$ consists of a single point $G$, then $\Phi$ is differentiable at $A$ and $G = \nabla\Phi(A)$. Conversely, if $\Phi$ is differentiable at $A \in \mathrm{PD}(k)$, then $\nabla\Phi(A)$ is the unique subgradient of $\Phi$ at $A$. The above description of gradients or subdifferentials via (5.2) becomes particularly useful for orthogonally invariant criteria (see also Lemma 1.2):

LEMMA 5.1. Let $\varphi$ be a real function on $(0, \infty)^k$ which is antitonic (w.r.t. the componentwise partial ordering on $(0, \infty)^k$), convex, and permutationally invariant. Consider the optimality criterion on $\mathrm{PD}(k)$ given by
$$\Phi(A) = \varphi(\lambda(A)) \qquad \text{for all } A \in \mathrm{PD}(k),$$
where, as before, $\lambda(A)$ denotes the vector of eigenvalues of $A$ arranged in ascending order, $\lambda_1(A) \le$ […] $\ge \varphi(\lambda(A)) + \lambda(G)'(\mu - \lambda(A))$ for all $\mu \in (0, \infty)^k$.

Hence, $g := \lambda(G)$ is a subgradient of $\varphi$ at $\lambda(A)$. If $\varphi$ is differentiable at $\lambda(A)$, the matrix $P \operatorname{diag}(\nabla\varphi(\lambda(A))) P'$ does not depend on $P \in S_A$ (proving the statement on differentiability and the gradient of $\Phi$ at $A$). For, this is obvious if all the eigenvalues of $A$ are simple. Otherwise, if some eigenvalues have multiplicities greater than 1, then the corresponding components of $\nabla\varphi(\lambda(A))$ coincide, as follows from the permutational symmetry of $\varphi$. Hence, denoting by $\lambda_1^*, \dots, \lambda_r^*$ the distinct eigenvalues of $A$ and by $g_1^*, \dots, g_r^*$ the corresponding components of $\nabla\varphi(\lambda(A))$, we see that
$$P \operatorname{diag}(\nabla\varphi(\lambda(A))) P' = \sum_{i=1}^{r} g_i^* E_i,$$
where $E_i$ is the orthogonal projection matrix onto the eigenspace to $\lambda_i^*$, $i = 1, \dots, r$. $\square$

EXAMPLE 5.2. For the $\Phi_p$-criteria from (1.12), $-\infty \le p \le 1$, we have $\Phi_p(A) = \varphi_p(\lambda(A))$, $A \in \mathrm{PD}(k)$, with

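$$\varphi_p(z) = \begin{cases} \Bigl(\frac{1}{k} \sum_{i=1}^{k} z_i^p\Bigr)^{-1/p} & \text{if } p \ne -\infty, 0, \\ \Bigl(\prod_{i=1}^{k} z_i\Bigr)^{-1/k} & \text{if } p = 0, \\ \Bigl(\min_{i=1,\dots,k} z_i\Bigr)^{-1} & \text{if } p = -\infty, \end{cases} \qquad z \in (0, \infty)^k.$$

Now, Lemma 5.1 yields the well known formulae for the gradients of $\Phi_p$, if $p > -\infty$,
$$\nabla\Phi_p(A) = -\frac{1}{k} \bigl(\Phi_p(A)\bigr)^{p+1} A^{p-1}$$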


(cf. Pukelsheim (1993, p. 179), dealing with $1/\Phi_p$), where the power $A^t$, $A \in \mathrm{PD}(k)$ and $t \in \mathbb{R}$, is defined via the spectral decomposition of $A$, namely $A^t := P \operatorname{diag}(\lambda_1^t(A), \dots, \lambda_k^t(A)) P'$, which does not depend on the particular choice of $P \in S_A$.

Consider now the E-criterion $\Phi_{-\infty}$. Let $z = (z_1, \dots, z_k)' \in (0, \infty)^k$, and define $I(z) := \{i \in \{1, \dots, k\} : z_i = \min_{j=1,\dots,k} z_j\}$. It is easily seen that the subdifferential of $\varphi_{-\infty}$ at $z$ is given by
$$\partial\varphi_{-\infty}(z) = \Bigl\{ -\bigl(\varphi_{-\infty}(z)\bigr)^2 \sum_{i \in I(z)} w_i e_i \;:\; w_i \ge 0 \text{ for all } i \in I(z),\ \sum_{i \in I(z)} w_i = 1 \Bigr\},$$

where $e_i$ is the $i$th unit vector in $\mathbb{R}^k$. Denote, for $A \in \mathrm{PD}(k)$, by $r$ the multiplicity of the smallest eigenvalue of $A$, and by $\mathcal{E}_{\min}(A)$ the corresponding eigenspace. Lemma 5.1 gives that the subgradients of $\Phi_{-\infty}$ at $A$ are precisely the matrices
$$G = -\bigl(\Phi_{-\infty}(A)\bigr)^2 \sum_{i=1}^{r} w_i p_i p_i',$$
where $w_1, \dots, w_r \ge 0$, $\sum_{i=1}^{r} w_i = 1$, and $p_1, \dots, p_r$ is an orthonormal basis of $\mathcal{E}_{\min}(A)$, thus
$$\partial\Phi_{-\infty}(A) = \bigl\{ -\bigl(\Phi_{-\infty}(A)\bigr)^2 E \;:\; E \in \mathrm{NND}(k),\ \operatorname{range}(E) \subset \mathcal{E}_{\min}(A),\ \operatorname{tr}(E) = 1 \bigr\}$$
(cf. Kiefer, 1974, Section 4E, or Pukelsheim, 1993, Lemma 6.16).

For optimality criteria from (1.8b) related to parameter subsystems, designs with singular moment matrices may be relevant when solving problem (1.10) or (1.10a). The problem of describing the subdifferentials of $\Phi$ at singular points of its domain $\mathcal{A} = \mathcal{A}(K)$ from (1.8a) (which are boundary points of $\mathcal{A}$) turned out to be far from trivial, and was completely solved in Gaffke (1985, Section 3), and in Pukelsheim (1993, Section 7.9). However, we will not present the details here, and the interested reader is referred to these references. In fact, the most important case is that of full parameter estimation, which leads to designs with positive definite moment matrices.

A general equivalence theorem for problem (1.10) is the following, which comes from a major result of convex analysis, cf. Rockafellar (1970, Theorem 27.4). However, if the criterion $\Phi$ is differentiable at the optimal point $M^* = M(\xi^*)$, then the equivalence is fairly obvious and can be derived by rather elementary arguments (see the remark stated below).

THEOREM 5.3. Given a regression model (1.1a) and an optimality criterion $\Phi$, consider the optimal design problem (1.10) of minimizing $\Phi(M(\xi))$ over all designs $\xi$ with $M(\xi) \in \mathcal{A}$. Assume that there exists a design with positive definite moment matrix, equivalently, assume that the components of $f$ from (1.1a) are linearly independent on $\mathcal{X}$. Let $\xi^*$ be a design with $M^* := M(\xi^*) \in \mathcal{A}$. Then, $\xi^*$ is an optimal solution to (1.10) iff there exists a subgradient $G^*$ of $\Phi$ at $M^*$, such that each support point of $\xi^*$ is a global maximum point of the function
$$q_{-G^*}(x) := f(x)'(-G^*) f(x), \qquad x \in \mathcal{X}.$$

REMARK. Since $q_{-G^*}(x) = \operatorname{tr}(-G^* f(x) f(x)')$ and, by (1.11), the convex hull of all $f(x) f(x)'$, $x \in \mathcal{X}$, equals the set $\mathcal{M}$ of all moment matrices of designs, the condition of the theorem on the support of an optimal design can be restated as
$$\operatorname{tr}\bigl(G^*(M - M^*)\bigr) \ge 0 \qquad \text{for all } M \in \mathcal{M}.$$
Hence, if $\Phi$ is differentiable at $M^*$, then $G^* = \nabla\Phi(M^*)$, and the theorem simply means that a matrix $M^*$ is optimal iff the directional derivatives of $\Phi$ at $M^*$ are nonnegative for all feasible directions. In this case the stated equivalence is fairly obvious and easily proved.
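As a numerical illustration of Example 5.2 and Theorem 5.3, the following minimal sketch first compares the gradient formula for the criteria of Example 5.2 with a central finite difference, and then checks the support condition of the theorem for the D-criterion (p = 0) in quadratic regression on [-1, 1]. The model f(x) = (1, x, x^2)', the candidate design with equal weights on -1, 0, 1, and all function names are assumptions chosen for illustration; they do not appear in the text.

```python
import numpy as np

def mat_power(A, t):
    """A^t via the spectral decomposition A = P diag(lambda) P'."""
    lam, P = np.linalg.eigh(A)
    return (P * lam ** t) @ P.T

def phi_p(A, p):
    """Phi_p(A) for finite p; p = 0 gives det(A)^(-1/k)."""
    k = A.shape[0]
    if p == 0:
        return np.linalg.det(A) ** (-1.0 / k)
    return (np.trace(mat_power(A, p)) / k) ** (-1.0 / p)

def grad_phi_p(A, p):
    """Gradient -(1/k) Phi_p(A)^(p+1) A^(p-1)."""
    k = A.shape[0]
    return -(phi_p(A, p) ** (p + 1) / k) * mat_power(A, p - 1)

def f(x):
    """Assumed regression vector of quadratic regression."""
    return np.array([1.0, x, x * x])

# Assumed candidate design: equal weights on -1, 0, 1.
points, weights = [-1.0, 0.0, 1.0], [1 / 3, 1 / 3, 1 / 3]
M_star = sum(w * np.outer(f(x), f(x)) for x, w in zip(points, weights))

# Finite-difference check of the gradient for p = -1 in a symmetric direction H.
rng = np.random.default_rng(0)
H = rng.standard_normal((3, 3))
H = (H + H.T) / 2
t = 1e-6
fd = (phi_p(M_star + t * H, -1) - phi_p(M_star - t * H, -1)) / (2 * t)
exact = np.sum(grad_phi_p(M_star, -1) * H)  # <G, H> = tr(G H)

# Support condition of Theorem 5.3 for p = 0, evaluated on a grid of [-1, 1].
G_star = grad_phi_p(M_star, 0)

def q(x):
    return f(x) @ (-G_star) @ f(x)

q_max = max(q(x) for x in np.linspace(-1.0, 1.0, 2001))
support_is_maximal = all(abs(q(x) - q_max) < 1e-9 for x in points)
print(abs(fd - exact), support_is_maximal)
```

Under these assumptions the three support points attain the maximum of the function q over the grid, so the candidate design satisfies the condition of the theorem for the D-criterion; the grid check is of course only a numerical indication, not a proof.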

6. Reduction of dimensionality

In problem (1.10a) the variable is the moment matrix of a design, which may cause a large dimension of the optimization problem, especially for multiple regression models, such as multiple polynomial models of degree two or three. When equivariance properties can be utilized, the restriction to invariant designs according to Lemma 2.6 and Lemma 2.10 may reduce the dimension considerably (see the examples below). In order to include the possibility of such reductions, it is convenient to state the extremum problem in a more general way, giving, as a byproduct, a deeper insight into the specific structure of problem (1.10a). To this end, and as a price we have to pay, a more abstract frame has to be considered.

Firstly, the underlying space is now a real Hilbert space $\mathcal{H}$ of finite dimension (with scalar product and norm denoted by $\langle \cdot, \cdot \rangle$ and $\|\cdot\|$, respectively). In applications, this will be either the space $\mathrm{Sym}(k)$ with scalar product $\langle A, B \rangle = \operatorname{tr}(AB)$ as before, or the (column vector) space $\mathbb{R}^{\ell}$ with the usual scalar product $\langle a, b \rangle = a'b$. The ingredients of (1.10a), $\Phi$, $\mathcal{M}$, and $\mathcal{A}$, carry over to the following. Let $\mathcal{M}$ and $\mathcal{A}$ be convex subsets of the Hilbert space $\mathcal{H}$ ($\mathcal{A}$ need not be a cone), such that $\mathcal{M}$ is compact, and $\mathcal{M} \cap \operatorname{int}(\mathcal{A}) \ne \emptyset$, where $\operatorname{int}(\mathcal{A})$ denotes the interior of $\mathcal{A}$, and $\Phi$ is a real valued convex function on $\mathcal{A}$. The extremum problem now reads as
$$\text{minimize } \Phi(m) \text{ over } m \in \mathcal{M} \cap \mathcal{A}. \tag{6.1}$$

The intersection of $\mathcal{M}$ with $\mathcal{A}$ expresses the original condition of identifiability of parameters under the competing designs, now restated in terms of the (possibly) reduced variable $m$. As a particular feature of this problem, translating (1.11) into the more abstract framework, the set $\mathcal{M}$ is assumed to be given in the form
$$\mathcal{M} = \operatorname{Conv}\{m(x) : x \in X\}, \tag{6.2}$$


where $\{m(x) : x \in X\}$ is a given, compact set of points $m(x)$ in $\mathcal{H}$ (where $X$ may be any nonempty set). For reconstructing a design $\xi$ associated to some $m \in \mathcal{M}$ it will be important that for any point $m(x)$ from the generating family an associated design $\xi_x$, say, is known, whence the problem of finding $\xi$ becomes a decomposition problem,
$$\text{find } r \in \mathbb{N},\ x_1, \dots, x_r \in X,\ w_1, \dots, w_r > 0 \text{ with } \sum_{i=1}^{r} w_i = 1, \text{ such that } \sum_{i=1}^{r} w_i m(x_i) = m. \tag{6.3}$$

Then, a design corresponding to $m$ is the mixture $\xi = \sum_{i=1}^{r} w_i \xi_{x_i}$. For obtaining implementable versions of our conceptual algorithm described below, an implicit assumption is that the family $m(x)$, $x \in X$, has a fairly simple structure, in the sense that any linear extremum problem over these points, minimize $\langle a, m(x) \rangle$ over $x \in X$, should be easily solvable, for any given $a \in \mathcal{H}$. We note that (6.1) and (6.2) cover the original problem (1.10a) (under regression model (1.1a)), with the settings $\mathcal{H} = \mathrm{Sym}(k)$, $X = \mathcal{X}$, and $m(x) = f(x) f(x)'$. On the other hand, possible reductions of the original problem by invariance or admissibility (or both) are accounted for, as illustrated by the following examples.

EXAMPLE 6.1. Consider a quadratic multiple polynomial setup (i.e., (2.7) with $d = 2$), on a symmetric cube or ball, centered at zero. Assume that the index set $A$ is permutationally symmetric, so that the model is equivariant w.r.t. the transformation group $\mathcal{G}_{sp}$ and the associated matrix group $\mathcal{Q}$ (see Example 2.8(c)). For any invariant (w.r.t. $\mathcal{Q}$) optimality criterion $\Phi$ (and thus, in particular, for any orthogonally invariant criterion), the optimal design problem (1.10) can be restricted to invariant and $A_2$-admissible designs, as follows from Lemma 2.6, Lemma 2.10 and Lemma 4.1. We thus obtain a reduction (6.1), (6.2) to only two dimensions, as we will demonstrate now. By the sign change and permutation invariance, the moment matrix of an invariant design $\xi$ includes only three nontrivial moments, namely
$$\mu_2(\xi) := E_\xi(x_i^2), \qquad \mu_4(\xi) := E_\xi(x_i^4) \qquad (\text{independent of } i = 1, \dots, v)$$
and
$$\mu_{2,2}(\xi) := E_\xi(x_i^2 x_j^2) \tag{6.4}$$
(independent of $1 \le$ […] $> 0$. By Corollary 4.7(a), an invariant design $\xi$ is $A_2$-admissible, if and only if each


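$x \in \operatorname{supp}(\xi)$ has all its coordinates in $\{0, \pm c\}$. Hence the moments from (6.4) are given by
$$\mu_4(\xi) = c^2 \mu_2(\xi) = \frac{c^2}{v} E_\xi(\|x\|^2), \qquad \mu_{2,2}(\xi) = \frac{1}{v(v-1)} \bigl( E_\xi(\|x\|^4) - c^2 E_\xi(\|x\|^2) \bigr), \tag{6.5}$$
where the last equation in (6.5) comes from
$$\mu_{2,2}(\xi) = \frac{1}{v(v-1)} E_\xi\Bigl( \sum_{h \ne i} x_h^2 x_i^2 \Bigr) = \frac{1}{v(v-1)} E_\xi\Bigl( \Bigl( \sum_{h=1}^{v} x_h^2 \Bigr)^2 - \sum_{h=1}^{v} x_h^4 \Bigr) = \frac{1}{v(v-1)} \bigl( E_\xi(\|x\|^4) - c^2 E_\xi(\|x\|^2) \bigr).$$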

So, by (6.5), the moment matrices $M(\xi)$ of invariant and $A_2$-admissible designs $\xi$ are linearly parameterized by the two-dimensional moment vector
$$m(\xi) := \bigl( E_\xi(\|x\|^2),\ E_\xi(\|x\|^4) \bigr)'. \tag{6.6}$$
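The relations (6.5) and the moment vector (6.6) can be checked by brute force for small dimensions. The sketch below enumerates the uniform design on all points having exactly l coordinates equal to +c or -c and the remaining coordinates zero, and compares the directly computed moments with (6.5). The dimension v = 3, the value c = 2, and all function names are assumptions made for illustration only.

```python
import itertools
import numpy as np

def support_points(v, l, c):
    """All points with exactly l coordinates equal to +c or -c and v - l zeros."""
    pts = []
    for idx in itertools.combinations(range(v), l):
        for signs in itertools.product((-c, c), repeat=l):
            x = np.zeros(v)
            x[np.array(idx, dtype=int)] = signs
            pts.append(x)
    return np.array(pts)

def check_moments(v, l, c):
    """Return m = (E||x||^2, E||x||^4) and whether the relations (6.5) hold."""
    pts = support_points(v, l, c)          # uniform weights on these points
    sq_norm = np.sum(pts ** 2, axis=1)
    mu2 = np.mean(pts[:, 0] ** 2)
    mu4 = np.mean(pts[:, 0] ** 4)
    mu22 = np.mean(pts[:, 0] ** 2 * pts[:, 1] ** 2)
    m = np.array([np.mean(sq_norm), np.mean(sq_norm ** 2)])
    ok = (np.isclose(mu4, c ** 2 * mu2)
          and np.isclose(mu4, c ** 2 / v * m[0])
          and np.isclose(mu22, (m[1] - c ** 2 * m[0]) / (v * (v - 1))))
    return m, bool(ok)

v, c = 3, 2.0                              # assumed illustration values
results = [check_moments(v, l, c) for l in range(v + 1)]
for l, (m, ok) in enumerate(results):
    print(l, m, ok)
```

For each l the computed vector equals (l c^2, l^2 c^4), the moment vector of the uniform design on points with l nonzero coordinates.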

It remains to describe the range $\mathcal{M}$ of all moment vectors $m = m(\xi)$ when $\xi$ ranges over all invariant and $A_2$-admissible designs. To this end, denote by $\xi_l$ the uniform distribution over those vertices of the cube which have $l$ coordinates $\pm c$ and $v - l$ coordinates zero, $l = 1, \dots, v$. Since any invariant design whose support consists of the vertices of the cube only is a mixture of the designs $\xi_l$, $l = 0, 1, \dots, v$, we obtain
$$\mathcal{M} = \operatorname{Conv}\{m(\xi_l) : l = 0, 1, \dots, v\} = \operatorname{Conv}\{(l c^2,\ l^2 c^4) : l = 0, 1, \dots, v\}. \tag{6.7}$$

(b) Let $\mathcal{X} = B_r = \{x = (x_1, \dots, x_v)' \in \mathbb{R}^v : \sum_{i=1}^{v} x_i^2 \le r^2\}$, $r > 0$. By Corollary 4.7(b), an invariant design $\xi$ is $A_2$-admissible iff $\operatorname{supp}(\xi) \subset \partial B_r \cup \{0\}$. Hence, for these designs,
$$\mu_{2,2}(\xi) = \frac{1}{v(v-1)} E_\xi\Bigl( \sum_{i \ne j} x_i^2 x_j^2 \Bigr) = \frac{1}{v(v-1)} \Bigl( E_\xi(\|x\|^4) - E_\xi\Bigl( \sum_{i=1}^{v} x_i^4 \Bigr) \Bigr) = \frac{1}{v-1} \bigl( r^2 \mu_2(\xi) - \mu_4(\xi) \bigr).$$


So, the moment matrices of invariant and $A_2$-admissible designs are linearly parameterized by the two-dimensional moment vector $m(\xi) := (v\mu_2(\xi),\ v\mu_4(\xi))'$.

As we will show next, the range of these moment vectors is given by the triangle
$$\mathcal{M} = \operatorname{Conv}\bigl\{ (0, 0),\ (r^2, \tfrac{1}{v} r^4),\ (r^2, r^4) \bigr\}. \tag{6.8}$$
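A quick numerical plausibility check of the triangle (6.8): for a point x of the ball, the uniform design on its orbit under sign changes and coordinate permutations has moment vector (||x||^2, sum of x_i^4), since both quantities are constant on the orbit. The sketch below evaluates this vector at the origin, at a point with all coordinates r/sqrt(v), and at r e_1, and confirms on random points of the sphere that the second component stays between r^4/v and r^4. The values v = 4, r = 1.5 and the function name are assumptions for illustration.

```python
import numpy as np

def moment_vector(x):
    """(||x||^2, sum_i x_i^4), constant on the orbit of x under sign changes and permutations."""
    x = np.asarray(x, dtype=float)
    return np.array([np.sum(x ** 2), np.sum(x ** 4)])

v, r = 4, 1.5                                   # assumed illustration values
rng = np.random.default_rng(0)

m0 = moment_vector(np.zeros(v))                 # one-point design at zero
m1 = moment_vector(np.full(v, r / np.sqrt(v)))  # all coordinates equal to r / sqrt(v)
e1 = np.zeros(v)
e1[0] = r
m2 = moment_vector(e1)                          # a point r * e_i on a coordinate axis

# Random points on the sphere of radius r: the fourth-power sum lies in [r^4 / v, r^4].
g = rng.standard_normal((1000, v))
sphere = r * g / np.linalg.norm(g, axis=1, keepdims=True)
fourth = np.sum(sphere ** 4, axis=1)
inside = bool(np.all((fourth >= r ** 4 / v - 1e-9) & (fourth <= r ** 4 + 1e-9)))
print(m0, m1, m2, inside)
```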

Note that the three vertices correspond to the invariant designs $\xi_0$, $\xi_1$, and $\xi_2$, where $\xi_0$ denotes the one-point design at zero, $\xi_1$ the uniform distribution over the $2^v$ points with coordinates $\pm r/\sqrt{v}$, and $\xi_2$ the uniform distribution over the $2v$ points $\pm r e_i$, $i = 1, \dots, v$ (where $e_i$ denotes the $i$th unit vector in $\mathbb{R}^v$). Thus, for verifying (6.8) we firstly remark that any invariant and $A_2$-admissible design $\xi$ is a mixture of $\xi_0$ and of an invariant design $\eta$ concentrated on the surface of the ball. For such designs $\eta$ we have
$$v\mu_2(\eta) = E_\eta\Bigl( \sum_{i=1}^{v} x_i^2 \Bigr) = r^2 \qquad \text{and} \qquad v\mu_4(\eta) = E_\eta\Bigl( \sum_{i=1}^{v} x_i^4 \Bigr).$$

Now, as $\eta$ varies, the moment $v\mu_4(\eta)$ ranges over the interval $[\mu_{4,\min}, \mu_{4,\max}]$, where $\mu_{4,\min}$ and $\mu_{4,\max}$ are the minimum and the maximum value, respectively, of $\sum_{i=1}^{v} x_i^4$ taken over the sphere $\sum_{i=1}^{v} x_i^2 = r^2$, and hence, as it is easily seen, $\mu_{4,\min} = r^4/v$ and $\mu_{4,\max} = r^4$.

EXAMPLE 6.2. Consider cubic multiple polynomial regression, i.e., (2.7) with $d = 3$, again on the symmetric cube $C_c$ or the ball $B_r$ centered at zero, and for a permutationally invariant index set $A$, ensuring equivariance w.r.t. the transformation group $\mathcal{G}_{sp}$ and the associated matrix group $\mathcal{Q}$. Again, for solving an optimal design problem with an invariant (w.r.t. $\mathcal{Q}$) optimality criterion, we may restrict ourselves to invariant designs. The moment matrices of these are linearly parameterized by the moment vector
$$m(\xi) = \bigl( \mu_2(\xi), \mu_4(\xi), \mu_6(\xi), \mu_{2,2}(\xi), \mu_{4,2}(\xi), \mu_{2,2,2}(\xi) \bigr)', \quad \text{if } v \ge 3,$$
where
$\mu_t(\xi) := E_\xi(x_i^t)$, $t = 2, 4, 6$ (independent of $i = 1, \dots, v$),
$\mu_{2,2}(\xi) := E_\xi(x_i^2 x_j^2)$ (independent of $1 \le i \ne j \le v$),
$\mu_{4,2}(\xi) := E_\xi(x_i^4 x_j^2)$ (independent of $1 \le$ […] $\ge 3$ (independent of $1 \le h < i < j \le v$).

From Theorem 4.8 it follows that the range of the moment vectors (6.9) is given by
$$\mathcal{M} = \operatorname{Conv}\{m(x) : x \in \mathcal{X}_0\}, \tag{6.10}$$
where $m(x) := m(\xi_x)$, $\xi_x$ denotes the uniform distribution on the $\mathcal{G}_{sp}$-orbit of $x$, and the set $\mathcal{X}_0$ is defined by (4.7a) or (4.7b) as a union of finitely many line segments of the cube or the ball. Contrary to the preceding example, now the family of moment vectors $m(x)$, $x \in \mathcal{X}_0$, is infinite; nevertheless, as a union of finitely many line segments, its structure is still simple enough to solve easily linear extremum problems minimize $a'm(x)$ over $x \in \mathcal{X}_0$. For, the usual parameterization of a line segment by $\lambda \in [0, 1]$, say, yields $a'm(x_\lambda)$ as a cubic polynomial in $\lambda^2$ on that segment. We omit the lengthy formulae here.

EXAMPLE 6.3. Consider a rotatable multiple polynomial model (2.7) of arbitrary, but fixed degree $d \ge 1$ on the ball $\mathcal{X} = B_r$. Suppose that for the optimal design problem under consideration the restriction to rotatable designs, see Example 2.8(d), is justified (as it is true for the D- and I-criterion, for example). By Lemma 4.9, the moment matrix of a rotatable design can be decomposed according to (4.11), and from the proof of that lemma we see that $M_{d,\rho} = D_\rho M_{d,1} D_\rho$, where $D_\rho$ denotes the $\binom{v+d}{d} \times \binom{v+d}{d}$ diagonal matrix with diagonal entries $\rho^{|\alpha|}$, $|\alpha| \le d$. Hence it follows that the moment matrices of rotatable designs $\xi$ are linearly parameterized by the vector […] $m(\rho)$, (6.11)

where $m(\rho) := (\rho^2, \rho^4, \dots, \rho^{2d})'$, […]

(7.7a)


Using the lower bound in (7.7a) as a stopping criterion might be preferable over the scale dependent difference (7.6), as it bounds the scale independent 'relative efficiency' from (1.15). As mentioned in Section 1, in most cases the function $\Phi$ is positive and, moreover, $1/\Phi$ is concave. Then the lower bound in (7.7a) can be improved by
$$\frac{\min_{m \in \mathcal{M} \cap \mathcal{A}} \Phi(m)}{\Phi(m_n)} \ \ge\ [\ldots] \tag{7.7b}$$

(cf. Gaffke and Mathar, 1992, p. 95).

When using a mixture (7.2) for the search direction $\bar{m}_n$ in step (i), it turned out that the most efficient choice of the weights $w_i$ is obtained by minimizing a local quadratic approximation of $\Phi$. Thereby we switch from a pure gradient method to a 'second order' (or 'Quasi-Newton') method, whose good global and excellent local convergence behavior we observed for particular problems (cf. Gaffke and Heiligers, 1995a, b). Let a local (at the current point $m_n$) quadratic approximation of $\Phi$ be given by
$$q_n(m) := \Phi(m_n) + \langle g_n, m - m_n \rangle + \tfrac{1}{2} \langle H_n(m - m_n), m - m_n \rangle, \qquad m \in \mathcal{M},$$

where the approximation $H_n$ to the Hessian operator of $\Phi$ at $m_n$ is a nonnegative definite linear operator on the Hilbert space $\mathcal{H}$. For example, $H_n$ may be the usual BFGS approximation (cf. Fletcher, 1987, pp. 55-56, or Gaffke and Heiligers, 1996, equation (2.10)), or the Hessian itself. As above, let a bundle $x_1, \dots, x_s$ be available, among them a global minimizer of $\langle g_n, m(x) \rangle$ over $x \in X$. Then the problem to be solved is
$$\text{minimize } q_n(m) \text{ over } m \in \operatorname{Conv}\{m_n, m(x_1), \dots, m(x_s)\}, \tag{7.8}$$

which can be done by the Higgins-Polak method (see Gaffke and Heiligers (1996) for a description of a suitable version). This method yields an optimal solution $m_n^*$ of (7.8) and a decomposition of $m_n^*$,
$$m_n^* = w_0^* m_n + \sum_{i=1}^{s} w_i^* m(x_i),$$
with $w_0^*, w_1^*, \dots, w_s^* \ge 0$ and $\sum_{i=0}^{s} w_i^* = 1$. Then we take in step (i) of the overall algorithm
$$\bar{m}_n = \sum_{i=1}^{s} \bar{w}_i m(x_i), \qquad \text{where } \bar{w}_i := \frac{w_i^*}{1 - w_0^*}, \quad i = 1, \dots, s. \tag{7.9}$$

Convergence to the optimum, when using (7.9) in step (i) and the modified Fletcher line search procedure in step (ii), was proved in Gaffke and Heiligers (1996, Theorems 2.1, 2.2), for the cases that the Hn are the Hessians or the BFGS approximations, and under assumption (7.3).


As a further advantage of these second order methods, the decomposition problem (6.3) gets a practical solution at the final stage of the iterations, and thus an associated optimal design is obtained. For, as it turned out, the difference between the actual point $m_n$ and $\bar{m}_n$ is negligible at the end of iterations, so that (7.9) provides a desired decomposition, and thus an associated design. Moreover, by the Higgins-Polak method, the supporting points in (7.9) (i.e., those $m(x_i)$ with positive $\bar{w}_i$) form an affinely independent family. As a consequence, the support size of the obtained optimal design is limited, which is particularly advantageous when dealing with high dimensional regression models combined with fairly large transformation groups, as those in the examples from Section 6.
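To make the overall iteration concrete, here is a deliberately simplified first-order sketch for problem (6.1), (6.2) with m(x) = f(x) f(x)'. It is not the second order bundle method with the Higgins-Polak subproblem described above: it uses a single search direction per step (a classical vertex-direction scheme) and, for the D-criterion in the form det(M)^(-1/k), the closed-form exact step length known for that criterion. The stopping quantity is the generic convexity bound, min Phi >= Phi(m_n) + min over x of <g_n, m(x) - m_n>, expressed as a lower bound on the relative efficiency. The quadratic model, the candidate grid on [-1, 1], and all names are assumptions for illustration.

```python
import numpy as np

def f(x):
    """Assumed regression vector of quadratic regression, f(x) = (1, x, x^2)'."""
    return np.array([1.0, x, x * x])

def phi(M):
    """D-criterion in the form det(M)^(-1/k)."""
    return np.linalg.det(M) ** (-1.0 / M.shape[0])

def grad_phi(M):
    """Gradient -(1/k) phi(M) M^{-1}."""
    return -(phi(M) / M.shape[0]) * np.linalg.inv(M)

def efficiency_bound(M, atoms):
    """Lower bound 1 - gap / phi(M) on the relative efficiency, plus the minimizing index."""
    G = grad_phi(M)
    scores = np.tensordot(atoms, G, axes=2)      # <G, m(x)> = tr(G m(x)) for every candidate
    j = int(np.argmin(scores))
    gap = np.sum(G * M) - scores[j]              # -min_x <G, m(x) - M>
    return 1.0 - gap / phi(M), j

def vertex_direction(candidates, n_iter=2000, tol=1e-4):
    atoms = np.array([np.outer(f(x), f(x)) for x in candidates])
    k = atoms.shape[1]
    w = np.full(len(candidates), 1.0 / len(candidates))   # start from the uniform design
    for _ in range(n_iter):
        M = np.tensordot(w, atoms, axes=1)
        bound, j = efficiency_bound(M, atoms)
        if bound >= 1.0 - tol:
            break
        fx = f(candidates[j])
        d = fx @ np.linalg.solve(M, fx)
        alpha = (d - k) / (k * (d - 1.0))        # exact line search step for the D-criterion
        w = (1.0 - alpha) * w
        w[j] += alpha
    return w, np.tensordot(w, atoms, axes=1), atoms

candidates = np.linspace(-1.0, 1.0, 21)
w, M, atoms = vertex_direction(candidates)
bound, _ = efficiency_bound(M, atoms)
phi_star = (4.0 / 27.0) ** (-1.0 / 3.0)          # value at equal weights on -1, 0, 1
efficiency = phi_star / phi(M)
print(bound, efficiency)
```

The weight vector w returned here is itself a decomposition in the sense of (6.3), but unlike the bundle approach it may spread small weights over many candidate points; this is one reason a method producing an affinely independent supporting family is attractive.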

References

Atwood, C. L. (1969). Optimal and efficient designs of experiments. Ann. Math. Statist. 40, 1570-1602.
Farrell, R. H., J. Kiefer and A. Walbran (1967). Optimum multivariate designs. In: J. Neyman, ed., Proc. Fifth Berkeley Symp. on Math. Statist. Probab. Theory, Vol. 1. University of California, Berkeley, CA, 113-138.
Fletcher, R. (1987). Practical Methods of Optimization. 2nd edn. Wiley, New York.
Gaffke, N. (1985). Singular information matrices, directional derivatives, and subgradients in optimal design theory. In: T. Caliński and W. Klonecki, eds., Linear Statistical Inference. Proc. Internat. Conf. on Linear Inference, Poznań 1984. Lecture Notes in Statistics 35. Springer, Berlin, 61-77.
Gaffke, N. (1987). Further characterizations of design optimality and admissibility for partial parameter estimation in linear regression. Ann. Statist. 15, 942-957.
Gaffke, N. and B. Heiligers (1995a). Algorithms for optimal design with application to multiple polynomial regression. Metrika 42, 173-190.
Gaffke, N. and B. Heiligers (1995b). Computing optimal approximate invariant designs for cubic regression on multidimensional balls and cubes. J. Statist. Plann. Inference 47, 347-376.
Gaffke, N. and B. Heiligers (1996). Second order methods for solving extremum problems from optimal linear regression design. Optimization 36, 41-57.
Gaffke, N. and O. Krafft (1982). Matrix inequalities in the Loewner-ordering. In: B. Korte, ed., Modern Applied Mathematics: Optimization and Operations Research. North-Holland, Amsterdam, 592-622.
Gaffke, N. and R. Mathar (1992). On a class of algorithms from experimental design theory. Optimization 24, 91-126.
Heiligers, B. (1991). Admissibility of experimental designs in linear regression with constant term. J. Statist. Plann. Inference 28, 107-123.
Karlin, S. and W. J. Studden (1966). Optimal experimental designs. Ann. Math. Statist. 37, 783-815.
Kiefer, J. (1959). Optimum experimental designs. J. Roy. Statist. Soc. Ser. B 21, 272-304.
Kiefer, J. (1960). Optimum experimental designs V, with applications to systematic and rotatable designs. In: J. Neyman, ed., Proc. Fourth Berkeley Symp. on Math. Statist. Probab. Theory, Vol. 1. University of California, Berkeley, CA, 381-405.
Kiefer, J. (1961). Optimum design in regression problems II. Ann. Math. Statist. 32, 298-325.
Kiefer, J. (1974). General equivalence theory for optimum designs (approximate theory). Ann. Statist. 2, 849-879.
Kiefer, J. and J. Wolfowitz (1960). The equivalence of two extremum problems. Canadian J. Math. 12, 363-366.
Krafft, O. (1983). A matrix optimization problem. Lin. Algebra Appl. 51, 137-142.
Marshall, A. W. and I. Olkin (1979). Inequalities: Theory of Majorization and its Applications. Academic Press, New York.
Pukelsheim, F. (1993). Optimal Design of Experiments. Wiley, New York.
Rockafellar, R. T. (1970). Convex Analysis. Princeton Univ. Press, Princeton, NJ.

Subject Index

A-criterion 898, 979 A-optimal design 774 A-optimal design for treatment control contrasts 1008, 1010, 1012, 1015, 1017, 1022, 1030, 1032 A-optimality 896, 1068, 1111, 1113, 1116, 1155 absorption 35 Accelerated Life Testing (ATL) 145 adaptive Bayesian designs 162 adaptive designs 151, 154, 156, 157, 165 adaptive procedure 157 consistency of 158 adaptive R-estimator 98 Addelman plans 426 additive effect 192 adjusted p-value 591 adjusted cell means 54 adjusted design matrix - for blocks 814 - for treatments 814 adjusted orthogonality 912, 983 adjustment factor 205, 214, 235 admissibility of design 1027, 1029, 1165 necessary condition 1165, 1169 affected concomitant variables 188 affine-equivariant estimators 111 affine resolvable designs 948, 949 affine transformations 111 air pollution 146 algebra of Bose and Mesner 322 algebra of Bose and Srivastava 322, 327 aliasing structure 228 aligned rank statistics 100, 107 alignment principle 129 all bias design 366, 1062, 1063 almost resolvable BIBD 954 alternating panel design 41 analysis of covariance 19 analysis of covariance (ANOCOVA) 101 Anderson-Thomas plans 419, 424 Andrews' plots 380 animal studies 131, 145 annihilator 792 ante-dependence 55 - model 53, 54 anti-ranks 132 approximate design 1150, 1151, 1060


approximate theory 1060 arteriosclerosis 131 associates first 831 - ith 830 second 831 association parameter of/th order 124 association scheme 769, 830 parameters 830 asymmetric parallel line assays 882, 894 asymptotic efficacies 96 asymptotic properties 95 asymptotic relative efficiency (ARE) 98 asymptotic uniform linearity 100 asymptotically distribution-freeness (ADF) 99 asymptotically equivalent statistics 635, 640 AUC (area under the plasma curve) 36, 38 average bioequivalence 36 average direction 242 average efficiency factor 772 -


balance 185, 408 X-Z-balance 829 balance for set of contrasts 866 balanced (B)RMD's 129 balanced array 326, 333-335 balanced block (BB) design 824, 986 balanced complete block design with nested rows and columns (BCBRC) 963, 964 balanced crossover design 67, 75 balanced incomplete block (BIB) design 38, I00, 412, 479, 481, 566, 712, 744, 747, 748, 762, 772, 809, 810, 824, 831,832, 890, 986, 1001, t009, 1010, 1017, 1038 - with nested rows and columns 952, 953, 957, 959, 960, 962-966, 969, 970 balanced lattice 766 balanced lattice rectangles 963 balanced treatment incomplete block (BTIB) design 488, 991, i010, 1013, 1014, 1021, 1027, 1040, 1042 bandit problems 152, 172 Bartlett's test 253 baseline measurements 55, 66 baseline variables 9, 17 basic contrasts 822, 828, 835, 837 -


Bayes designs 151 Bayes experimental designs 1099 Bayes risk 1105 Bayesian adaptive designs 169 Bayesian approach 51 Bayesian c-optimality 451,455 Bayesian c-optimum design 451 Bayesian D-optimal design 1089 Bayesian D-optimality 454, 451,452 Bayesian designs 155 Bayesian feasible sequence 169 Bayesian methods 12, 16, 23, 37, 52 Bayesian optimum design 437, 450 Bayesian robust design 1084-1089 Bayesian sequential allocations 172 Bayesian T-optimality 470 BCBRC (balanced complete block design with nested rows and columns) 963, 964 Behrens-Fisher problem 716 Behrens-Fisher problem, generalized 645, 674, 680 - fixed models - - one-way layout 645 - - two-way layout 653, 654 - mixed models - - cross-classification 699 - - matched pairs 674 - nested designs 680 - partially nested designs 689 Behrens-Fisher problem, nonparametric 639 Bessel function 274 Bessel (squared) processes 138 best linear unbiased estimator (BLUE) 385, 820 best linear unbiased predictor (BLUP) 266, 269, 1092, 1095 between-subject covariate 55 between-subject design 46, 53 between-subject information 64 bias 31, 52, 365, 1062, 1072 BIB (balanced incomplete block) design 412, 772, 809, 810, 831, 832 BIBRC (balanced incomplete block design with nested rows and columns) 952, 953, 957, 959, 960, 962-966, 969, 970 binary data 51 binary designs 986 bioassays 39, 151, 875 bioavailability 35, 36 bioequivalence 35, 36 bioequivalence studies 5 biological assays 145, 875 biological markers 146 block designs 760, 812, 813, 882, 981, 986, 993, 995, 996 checking basic assumption in 318 with unequal block sizes 316, 317 -


block designs for comparing test treatments with control 982, 991 block designs with nested rows and columns 989 block effect 103 block sizes 813 block structure 759 block sum of squares 363 blocking 882 blocks 760 BLUP (best linear unbiased predictor) 266, 269, 1092, 1095 BN design 841 BNRC (bottom stratum universally optimum nested row and column design) 959, 960, 962, 963, 968-970 Bonferroni method 590, 595 Bonferroni procedure 595 modified 619 bootstrapped trimmed t-statistics 37 border plots 490 bottom stratum universally optimum nested row and column design (BNRC) 959, 960, 962, 963, 968-970 Box-Behnken design 370, 371 Box-Cox power transformation 254 Box-Cox procedure 256 Box-Cox type transformations 96 Box-Draper determinant criterion 381 Brown and Mood median test 96 statistic 105 Brownian bridge 539 Brownian motion 136, 538 BTIB (balanced treatment incomplete block) design 488, 991, 1010, 1013, 1014, 1021, 1027, 1040, 1042 -


C-design 833, 839, 846, 890 C-matrix 815 c-optimality 437, 1111, 1113, 1117 C-restricted D-optimality 1080, 1081 C-restricted G-optimality 1080, 1081 cancer chemotherapy studies 40 canonical design 457 canonical efficiency factors 773 canonical form 458, 772 canonical reduction 94 carcinogenicity studies 40 carryover 8 carryover effects 38, 43, 55, 64, 128 carryover x block interactions 128 categorical data 21 causal-effect relationship 43 censoring 131

Subject index censoring variable 138 center-by-treatment interaction 51 center points 358 central composite design 356, 800 central limit theorem 292, 297 change-over design 63, 128, 761,780 - balanced 781 - strongly balanced 781 uniform 781 - universally optimal 781 changing covariates 50 characteristic distance 519 Chatterjee plans 419 Chatterjee-Mukerjee and Chatterjee plans 419 Chatterjee-Sen multivariate rank permutation principle i01 chi squared distribution 95 CID (clinically important dose) 42 circuit simulator 261 circular block design 785 circular data 241 circular design 890 circular standard deviation 252 circular triads 124 circular variance 252 class intervals 92 classical randomized complete block design 823 clinical designs 92 clinical epidemiology 145 clinical significance 5 clinical trial phases 32 clinical trials I, 92, 131, 151, 164, 312 closure method 594 coding 344, 358 coefficient matrix 815 coherence 194, 592 coincidence numbers 830 collapsing levels 231 column permutations 102 combinatorial balance 823 combined array experiments 232, 233 combined normal equations 924 combined regression contrast 885 comparisons with control 600 compartmental model 441,446, 452, 454 compatible R-estimators of contrasts 110 competing models 469 competing risk 144 competition 483, 484, 489 competition model 484 complementary log-tog link 472 complete block design 103 complete factorial experiment 407 complete Latin square 497 -


complete symmetry 984 completely balanced BIBRC 960-962, 965, 966 completely balanced BNRC 961,962 completely balanced nested row and column design 96O completely randomised design 760 completely symmetric structure 919 compliance 31 composite design 356, 357, 372, 408, 428 compound design criterion 468, 469 compound factor 231 compound symmetry 48, 50, 53, 660, 663,666, 685, 694 compound symmetry model 666 compromise design 469 computer experiments 201, 203, 208, 261-308, 1089-1096 concomitant medications 31 concomitant (p-)vectors 101 concordance matrix 823 concurrence matrix 823 concurrences 823, 824 conditional autoregression 486 conditional likelihood 52 conditional means 55 conditionally distribution-free (CDF) tests 102 confident directions 591 confirmation experiment 211,214, 235 confirmation run 228 confounded design 123 confounding partial 838, 926 total 838 connected designs 566, 981 connected portions 816 connectedness 815, 905, 914 Connor plans and Connor-Young plans 427 consistent asymptotically normal (CAN) estimator 97 consonance 592 continuous design measure 391 continuous first-order autoregressive process, CAR(l) 5 3 contraction 951 contrast 595, 604, 981 contrast matrix 636 control versus treatment 933 correlation function 267, 270-277, 281,285, 287 covariance matrix 545 - direct estimation 545 covariate adjustment 18 Cramrr-Rao information inequality 93 Cramrr-Rao regularity conditions 152 criteria for designs 346 -


criterion: (M.S)-criterion 992 CRM (continued reassessment method) 41 cross-over design 36, 50, 52, 55, 63, 128, 478, 492, 761,780 - two-period 690, 691 cross-over effect 692 - nonparametric 691 cross-over trials 8, 45, 483 cross-polytope 356 cubic splines 272, 274 curse of dimensionality 281 cyclic association scheme 771 cyclic difference set 763

equiblock-sized 813 equireplicated 813 exact 1151 fan 803 - generalized efficiency-balanced (GEB) 829, 832 - y-generator 792 481 - group divisible (GD) 831, 839 - hedgerow-alley 802 - index 762 - invariant 1160, 1165 J-balanced 826 - Latin-square type 839 - A2-optimal 395 MV-optimal 899 optimal 517 - orthogonal 776, 798, 815, 817 - paired-comparison 847 - pairwise balanced 823, 834 - parameters 813, 830 -partially efficiency-balanced (PEB) 832, 834 - planar grid 803 - proper 813 regular 831 - replication number 762 - resolvable 760, 766, 843 799, 1163, 1180, 1181, 1183, 1193 second kind of parameters 830 semi-regular 831 simple PEB or C 832 singular 831 - symmetric 763 systematic 803 totally balanced 824 totally balanced in the sense of Jones 826 - totally connected 913 810, 831,846 - treatment-connected 905 - trend-free factorial 803 triangular 839 two-associate PBIB 831, 832 type S 834 variance-balanced (VB) 829, 832 weakly invariant 1160, 1164, 1177 design array 222. 223 design density 535 design factors 201,203, 204, 206-211, 213, 215217, 221,222, 225, 230, 232, 233,235, 236 design generators 372 design levels 151 design locus 439 design matrix 796 - for blocks 813 for superblocks 858 for treatments 813

- 'gerechte'

D-criterion 979 D-efficiency 1081 D-optimal designs 153, 389, 391,774 - locally 389, 441 D-optimality 437, 1068, 1111, 1114, 1116, 1155 Daniel plans 427 Data and Safety Monitoring Committee 15 data-dependent allocation 10 defining contrasts 790 defining contrasts subgroup 787, 792 degrees of freedom (DF) 95 design ( 0 ; v - 1;0)-EB 838 (0;v - 9 ; 9 - 1)-EB 838 (0; pl,pz;0)-EB 839 (v - 1;0)-EB 838 ( p o ; v - 1 - po;0)-EB 838 (p0;pl;0)-EB 839 - (P0;Pl,P2;0)-EB 839 - ( P 0 ; P l , . . - , P m - 1 ) - E B 837 - #-resolvable 843 - ~-optimal 517 - X-l-balanced 829, 832 _ x-~-partially efficiency-balanced 832 - X - 1 - P E B ( m ) 832 1165, 1166, 1180, 1181, 1183 affine (/~1,/z2,...,/za)-resolvable 843 - c~-design 766, 768, 775, 847, 949, 950 1150, 1151, 1160 balanced bipartite block 834 balanced in the sense of Jones 826, 827 balanced treatment incomplete block 834 - binary 813 - combinatorially balanced 834 - connected 772, 814, 815, 816, 883 disconnected 814, 815, 823 disconnected of degree y - 1 816, 825 efficiency factor 822 efficiency-balanced (EB) 829, 832 -

- admissible

- approximate

- rotatable

- t-design

Subject mdex design point 796 design space 152 desirability function approach 399 detection of hidden bias 193 DETMAX 1095 deviance 471 device simulator 261, 303 difference sets 771 dilutive assays 151 direct assay 875 directional data 241 disconnectedness of degree g - 1 825, 826 discounting sequence 172 discrete response 91 discriminating between models 469 dispersion 251 dispersion effects 205, 211,213,216, 217, 219, 220, 222, 227 distribution function 635 - empirical 635 distribution-free procedures 714, 730 dosage 145 dose-escalation designs 43 dose-limiting toxicity (DLT) 40 dose linearity 38 dose metameter 132, 145 dose-proportionality study 38 dose-response studies 39, 40, 43 dose-response studies for efficacy 42 dose-titration designs 43 dose-titration studies 43 double-blind treatment 11 double-dummy method 11 doubly-nested BIBD 946 dropouts 34 dual balanced design 78 dual designs 1003 dual treatment sequence 78 Duncan procedure 607 Dunnett procedure 601 Durbin statistic 658 Dykstra plans 427 dynamic allocation index 175 dynamic problems 205, 234, 236, 237 dynamic programming 175 dynamic systems 235

E-criterion 979, 1001 E-optimality (efficiency) 113 E-optimality 1068, 1111, 1114, 1117, 1155 - subgradient 1188 E-optimal design 774 EB (efficiency-balanced) design 829

ecology 145 effective cardinality 1183 effective replication 825 effects 790 - aliased 787, 790 - confounded 787, 789 - independent 790 efficacy 31, 33, 46 efficiency 52, 775, 836, 916, 1156 - balance 828, 832 - comparisons 432 - factors 773, 810, 834, 835, 837 - measures 486 elaborate theories 194 elementary contrasts 827 Elfving's optimal designs 153 eliminate inferior treatments 558 EM algorithm 51, 52 empirical Bayes approach 275, 276 empirical estimator 858 empirical generalized least squares (EGLS) procedure 51 empirical model 343 end-pair design 784 entropy 281, 282, 284, 285, 297, 1094, 1095 entropy design 282 environmental health sciences 146 environmental studies 92 environmetrics 145 epidemiological investigations 92 equidistribution 300, 303 equineighboured designs 493 equivalence theorems 392, 1105, 1112, 1122, 1131, 1185, 1188 equivalence trials 4 equivalent model 1156, 1189 error rate 588 - false discovery rate (FDR) 590 - familywise error rate (FWE) 589 - strong control 589 - weak control 589 - per-comparison error rate (PCE) 590 - per-family error rate (PFE) 589 error structure 760 error transmission 230 estimating equations 97 estimation after selection 563 estimation of parameterized covariance 548 estimation of PCS 564 ethical issues 5 etiology 145 exact design 472, 1061 exact distribution-freeness (EDF) 99 exact theory 1060

exactly distribution-free (EDF) tests 92 exchange type algorithm 522 exchangeable random values 104 expected loss 225 expected yield of strategy 173 experiment balanced for contrast 866 experiment in economics 473 experimental design 346 experimental region 1150 experiments with blocking 565 explanatory attitude 4 exponential family 51, 54, 56 extended-group-divisible association scheme 770 extra-period design 989 extrapolation 1071, 1072, 1073, 1074, 1075, 1076, 1077, 1078

face 1176 factor 759 factor screening 408, 421 factor levels 759 factorial 241 factorial design 7, 246, 510 - AD-optimal 329 - asymmetrical 787 - balancing bias in 338 - complete factorial design 787 - fractional factorial design 787 - identifying the correct model in 330 - incomplete factorial design 787 - information in 336, 337 - minimising bias in 338 - mixed 787 - of even resolution 328 - of parallel flats type 326 - of response surface type 330 - optimally balanced 327 - orthogonal 325, 326 - pure 787 - sampling in 336 - symmetrical 787 - tree structure in 328 - 3^k 370 factorial experiment 707, 713, 720, 750-752, 755 factorial experiments with interaction 571 family 588 FDA (Food and Drug Administration) 33-36 fertilizer dressing 447 Fieller's method 37 Fieller's Theorem 168 finite horizon 175 first-order asymptotic distributional representation (FOADR) 97

first-order autoregressive process, AR(1) 50, 53, 55, 81 first-order carry-over effects 66, 128 first-order decay 468 first-order design 346 first-order orthogonal design 362 first-order polynomial model 796 first-stage screening 556 Fisher information 93, 96, 151, 155, 156 Fisher information function 152 Fisher's inequality 817 Fisher's iterative method of scoring 857 Fisher's protected LSD 607 fixed effects 45, 48, 49 - model with first-order residual effects 128 fixed models 637 - one-factor designs 637 - - asymptotic results 639 - - statistics 643 - two-factor designs 645 - - example 655 - with interaction 646 - without interactions 656 fixed width confidence interval 177 fold-over 413 fold-over technique 411 Food and Drug Administration (FDA) 33-36 forced titration design 43 fractional 3^(k-p) design 370 fractional factorial design 349 fractional factorials 210 French curve 343 frequency square 777 Friedman statistic 657 Friedman χ² test statistic 105, 113 Fréchet derivative 396, 397 full efficiency 836, 837, 886 full information 886, 890 full matching 186 fully-bordered design 491

G-efficiency 1081 G-optimal design 392 G-optimality 1111 gamma data 55 GB (general balance) for all basic contrasts 866 GD (group divisible) association scheme 831 GEE (generalized estimating equations) 51, 52 GENDEX 774, 779 general balance 865 General Equivalence Theorem 438, 446, 451, 460, 461, 470 generalized cyclic design 792 generalized draftsman's display 378 generalized estimating equations (GEE) approach 51 generalized interactions 790 generalized inverse 95, 772 generalized least squares 479, 499 generalized linear models 96, 217, 403, 437, 456, 468, 471 generalized Noether condition 96 generalized Youden design (GYD) 968-971, 990 - regular 986 generally balanced (GB) block designs 865 genotoxicity 146 Ghosh and Ghosh-Avila plans 421 Ghosh plans and Ghosh-Lagergren plans 427 Ghosh-Talebi and Ghosh plans 421 Ghosh-Talebi plan 419 Ghosh-Zhang plan 418, 420 Gibbs method 51 Gibbs sampling 23, 52 global robustness 144 glyphs 378 good lattice points 294, 295 gradient information 277, 280 Graeco-Latin squares 210 group divisible designs 889, 995, 1002 group divisible scheme 770 group divisible treatment design (GDTD) 1019, 1021, 1029, 1043 group sequential designs 15 group sequential trial 49 grouped data 92 groups of transformations 92 growth curve analysis 49 growth curve model 50 Gupta and Gupta-Carvajal plans 420 GYD (generalized Youden design) 968-971, 990

Haar measure 1156 Hadamard matrices 353, 986 Hadamard transform optics 999 Hamming schemes 771 hard-to-change factors 232 heavy-tailed distributions 93 heterogeneity 24 hidden bias 184 hierarchical design (HD) 140 Higgins-Polak method 1198 higher way layouts 749 Hodges-Lehmann estimates 193 homogeneity of location parameters 94 homoscedasticity 91 Hotelling-Lawley's trace 387, 388 Hotelling's T² 54 Human Genetics 146 hyperbolic cross points 290 hypotheses - of invariance 92 - of permutation-invariance 94 - of randomness 94

I-optimality 1155 - invariance 1164 ideal function 205, 234 identifiability 144, 1059 idle column technique 231 importance sampling method 51 IMSE (integrated mean squared-error) 285-287, 291, 297, 398, 1094 incidence matrix 812, 813 incomplete block design (IBD) 55, 115, 481, 760 incomplete layout 712, 745 incomplete matching 187 incomplete multiresponse clinical designs 140 incomplete multiresponse design (IMD) 122, 140 IND (Investigational New Drug) 33 indifference interval 13 indifference-zone approach 556 indirect assay 875, 876 individual bioequivalence 36, 38 induced design space 439, 445, 457, 461, 462 industry 32, 40 influential nonnegligible elements 421 information 93 information function 979, 1155 information matrix 457, 516, 772, 815, 978, 1061, 1151 - reduced 1153 informative missing values 46 inhalation toxicology 146 initial block 763 inner array 223 instant responses 527 integrated mean squared-error (IMSE) 285-287, 291, 297, 398, 1094 intention-to-treat (ITT) analysis 19 inter-block 860 inter-block analysis 863 inter-block comparisons 104 inter-block information 105, 811 interaction effects 225 interaction profile 47 interactions 201, 211, 212, 214, 216, 219-221, 226-229, 231, 232, 234, 236, 244-246, 248, 407 interactions dispersion effects 217 intercept parameter 94

interchangeable random values 104 interdisciplinary approach 146 interference 483, 489 interference model 484 interim analysis 14, 132 interim analysis schemes 133 intermediate missing values 54 intersecting flat fraction 424 intersection-union method 593 interval censoring 146 intra-block analysis 811, 819, 822, 823, 835, 837, 863, 864 intra-block analysis of variance 821 intra-block contrasts 815 intra-block equations 815 - reduced 815 intra-block matrix 815 - reduced 815 intra-block rank-vectors 104 intra-block residual mean square s² 821 intra-block residual sum of squares 821 intra-block submodel 819, 820, 831, 852 intra-block total sum of squares 821 intra-block treatment sum of squares 821 intransitiveness 124 invariance 139, 1070 inverse regression problems 152, 165 irregular fractions 359 isotonic regression 44

John plans 424 Johnson scheme 770

Kiefer-Wolfowitz equivalence theorem, generalized version 516 Knight's Move Latin square 498, 505 Knight's Move square 481, 507 known effects 194 Kolmogorov-Smirnov type test statistics 135 Koshal designs 798 Koshal-type first-order designs 797 Krehbiel-Anderson plans 426 Kriging 265-290 Kronecker-product 636, 637 Kronecker-sum 636, 637 Kruskal-Wallis statistic 642 Kruskal-Wallis test 96, 714, 715, 721

L-design 885, 886 lack of fit 365, 368, 369, 407 Laird-Ware model 49 Latin hypercube 292, 295-299, 303 Latin hypercube sampling 1095 Latin square 38, 41, 43, 775, 963, 971, 987, 1030, 1033, 1071 - column complete 782 - complete 786 - design 46, 566 - MOLS 765 - mutually orthogonal 765, 778 - of order k 761, 765, 776 - orthogonal 778 - quasi-complete 783, 786 Latin-square-type association scheme 770 lattice design 766, 771, 840, 847, 947 lattice rule 294, 295 lattice square 957, 963, 964, 966 lattices 947, 948 m-dimensional lattices 947, 948 least favorable configuration (LFC) 556 least squares estimator (LSE) 98 level adjustment 213 likelihood-based procedure 52 likelihood function 157, 174 likelihood ratio test 48, 54 linear blocks 785 linear coefficients 796 linear dependencies among responses 382 linear graphs 227, 229 linear multiresponse model 384, 387, 391, 398 - designs for 391, 398 - testing lack of fit 387 linear parameter function 1151 linear rank statistics 94 linear regression model 1150 linear variance 486 linearity of the model 91 link function 51 linked block designs 1003 local alternatives 95 local control 809 locally c-optimum design 445, 446, 449, 450, 455, 460 locally D-optimum design 449, 457, 458, 460 locally optimal designs 151, 152, 440 locally T-optimum design 471 location 243 location-regression functional 103 Loewner ordering 979 Loewner partial ordering 1151 log-dose relationship 882 log-rank procedures 139 log-rank scores 94 log transformed data 37 logistic distribution function 96

logistic model 458, 459 logistic regression 461, 472 logistic tolerance distribution 161 logit-linear model 45 logit link function 54 lognormal data 38 longitudinal binary responses 52 longitudinal data 49, 50, 54 lost to follow-up 46

Magic Latin squares 481 main effects 245, 248, 255, 407 main effects plans 227 main effects plans in nested rows and columns 967, 971 MANOCOVA nonparametrics 115 MANOCOVAPC 126 MANOVA/MANOCOVA 114 MAR (missing at random) 52 marginal means 55 marginal models 51 Markov chain 54, 55 Markovian models 46, 53 MARS 263 matched pairs design 665, 675 matrix average (w.r.t. Q) 1158 matrix group - compact 1156 - induced 1156 - orthogonal 1157 - unimodular 1156 maximin 281, 288-290 maximum concentration over observation period, Cmax 36, 38 - time when Cmax occurs, Tmax 36 maximum effective dose (MAXED) 42 maximum entropy design 283 maximum likelihood estimator (MLE) 98, 154, 157 maximum mean squared error (MMSE) 1094 maximum tolerable dose (MTD) 42 MCAR 52 mean effect 103 mean squared error 366 means: p-means 979 measure of rank dispersion 97 measurement errors 146 mechanistic model 343 meta-analysis 24, 51 metabolism 35 method of differences 763 method of reinforcement 839 method of steepest ascent 799 method "up and down" 41

metroglyphs 378 midrank 636 MINED 44 minimal covering designs 996 minimal ellipsoid 439, 462 minimal-point second-order design 372 minimax 281, 288-290 minimax criterion 559 minimization 9 minimum aberration 229, 794, 795 minimum effective dose (MINED) 42 minimum norm quadratic unbiased estimator (MINQUE) 821, 854 MINQUE principle 856 mirror-image pair 352, 360 misclassifications 146 missing data 50 missing observations 141, 144, 546 missing values 46 mixed effects 32, 51, 52, 101, 128 mixed effects model 45, 145, 403, 923 mixed models 662 - asymptotic results 665 - examples 665 - one fixed factor - - cross-classification 668, 670, 673 - - - example 673, 675 - - - missing observations 675 - - - statistics 670, 673 - - nested designs 677, 678 - - statistics 678-680 - two fixed factors - - cross-classification 694 - - - example 697 - - - statistics 696 - partially nested designs 680 - - - example 687, 693 - - statistics 682, 689 mixed quadratic coefficients 796 mixed resolution designs 233 ML (maximum likelihood) 49-51 model checking 467 model inadequacy 1079-1084 model robust designs 1055ff. modified (or marginal) maximum likelihood (MML) estimation method 856 molecular biology 146 MOLS 765, 778 moment matrix 392, 1151 moment methods 1100, 1134 monitoring of clinical trial 13 monotone data 54 monotone pattern 53, 56 Monte-Carlo sampling 452

most informative subset of sensors 547 moving average process 55 MTD (maximum tolerable dose) 40 Müller plans 421 multicenter trial 45-47, 50, 51 multidimensional plot 378 multinomial models 578 multi-phase design 145 multiple comparison approach 562 multiple comparison procedure (MCP) 588 - multiple test procedure (MTP) 589 - simultaneous confidence procedure (SCP) 599 - single-step procedures 592, 601, 604 - stepwise procedures 592 - - step-down procedures 593, 597, 602, 607 - - step-up procedures 593, 597, 603, 611 multiple comparisons 18, 34, 44, 97, 587, 721, 729, 736, 740, 748 - one-sided procedures 718, 724, 729, 743 - simultaneous confidence bounds 725 - simultaneous confidence intervals 719 - treatment versus control 718, 743, 744 - two-sided procedures 717, 736, 740, 747, 748 multiple comparisons with the best 614 multiple control groups 194 multiple design multivariate model 385 multiple endpoints 615 multiple polynomial regression 1161ff., 1190ff. - A-admissibility of design 1173-1178, 1180 - equivariance 1162, 1163, 1180, 1193 multiplicity 17 multiresponse data 378, 382 - linear dependencies 382 - plotting 378 multiresponse design 389 multiresponse experiments 377, 396, 402 multiresponse model 380, 386, 389 - designs for 389, 398 - estimation of parameters 380 - fitting 380 - inference 386 multiresponse optimization 398, 403 multiresponse rotatability 398 multiresponse surface methodology 377, 402 multivariate analysis 21 multivariate analysis of covariance (MANOCOVA) 114 multivariate analysis of variance (MANOVA) 111 multivariate general linear models (MGLM) 142 multivariate lack of fit 387, 388, 393 mutually orthogonal idempotent matrices 835 MV-criterion 898, 1003 MV-optimal design for treatment control contrasts 1010, 1012, 1018, 1020, 1032, 1042

- rotatability

Nair design 496 natural contrasts 827 NB (nested block) design 841 - sub-block binary 841 - sub-block connected 841 - sub-block efficiency-balanced 842 - sub-block orthogonal 842 - sub-block proper 841 - sub-block variance-balanced 842 - superblock binary 841 - superblock connected 841 - superblock efficiency-balanced 842 - superblock orthogonal 842 - superblock proper 841 - superblock variance-balanced 842 NBIBD (nested balanced incomplete block design) 840, 945, 952-956 neighbour balance 482, 489, 491, 493, 494, 497-500, 502, 503, 506-509, 511 neighbour balanced design 498, 499, 503, 506, 509 neighbour balanced Latin square 497, 498 - nearest 497 neighbour balanced quasi-complete Latin square 497 neighbour designs 489, 490, 499, 785 neighbour designs for field trials 784 neighbour matrix 483, 484 neighbouring plots 480 neighbouring units 483, 484, 489 neighbours 478, 483, 484, 489-493, 497, 498, 502, 506-508 nested balanced incomplete block (NBIB) design 840, 945, 952-956 nested block (NB) design 841 nested blocking factors 939, 941 nested design 730 nested multidimensional crosses 971 nested partially balanced incomplete block design (NPBIBD) 956 nested row-column designs 921 net incremental effect 44 nets: (t, m, s)-nets 299-303 network flow 186 neural network 263 New Drug Application (NDA) 33 Newman-Keuls procedure 607 Newton-Raphson method 460 Newton-Raphson sampling 52 Noether condition 98 noise array 222-224 noise factors 201, 203, 204, 208, 209, 212, 217, 219, 221-225, 230-233, 235-237 nonadditivity 929

non-Bayesian designs 166 noncentral chi squared distribution function 96 noncentrality parameter matrix 394 noncompliance 138 non-equireplicate designs 876 non-homogeneous Markov chain 55 noninformative censoring 138 nonlinear dose-response 45 nonlinear models 437, 440, 457, 1105, 1131 nonlinear regression 373, 375 nonlinear regression model 457 nonlinear transformation 91 nonlinearity 219 non-normal distributions 51 nonparametric hypotheses 647, 648, 650, 652 nonparametric MANOVA 111 nonparametric point estimates 719, 740 nonparametric procedures 705 nonparametric statistical procedure 91 nonparametrics - for crossover designs 127 - for incomplete block designs 115 - in factorial designs 118 non-orthogonal structure 778 nonrandomized design 1070 nonresponders 43 non-stationary transition probabilities 54 normal probability plot 408 normal scores 94 normality 91 NPBIBD 956 NPSOL 286

OA 223 OBS (orthogonal block structure) property 862 observable noise factors 233 observational study 181 odds ratios 52 Ohnishi-Shirakura plans 419 one-armed bandit problems 175 one-dimensional trials 785 one-factor block design 665 one-factor hierarchical design 665 one-way ANOVA model 559 one-way layout 706, 708, 713, 720 - general alternative hypotheses 714, 716 - nonparametrics 92 - one-sided multiple comparisons 718, 724, 729 - ordered alternative hypotheses 708, 722, 728 - simultaneous confidence bounds 725 - simultaneous confidence intervals 719 - two-sided multiple comparisons 717 - umbrella alternatives hypotheses 725, 728

optimal allocation 175 - of experiments 152 - of observing stations 515 - of sensors 515 optimal design 93, 346, 486, 501, 517, 885, 1099, 1100 - approximate theory 999 - exact theory 997 optimal design problem 1153 - algorithmic solution 1195 - first-order method 1196 - second-order method 1198 optimal matched samples 186 optimal sample sizes 1126-1128 optimal search designs 420 optimal statistical inference 93 optimal stopping variable 176 optimal stratification 185 optimality 129, 310, 311, 325, 486, 774, 775, 919 - Λ1-optimality 1083 - Λ2-optimality 1083 - (M,S)-optimality 774 - T-optimality 1084 optimality criterion 979, 1151, 1153 - gradient 1185 - invariant 1164 - Kiefer's φp 1155 - - gradient 1187 - - invariance 1164 - Λ1-optimality 395 - orthogonally invariant 1154 - subgradient 1185, 1186 optimum 399-401 - compromise 399, 401 - ideal 401 - individual 400 - simultaneous 400 optimum design 437 - locally 441 order statistics 94 ordered alternative 109 Ornstein-Uhlenbeck processes 272 orthogonal array design 231 orthogonal array of Type 1 67 orthogonal arrays 68, 201, 210, 211, 229, 231, 297-303, 326, 333, 949, 986 orthogonal block structure (OBS) 862 orthogonal blocking 361, 363 orthogonal polynomial model 877 orthogonal polynomials 236, 880, 1100, 1116, 1117, 1137

orthogonal set-up 904 orthogonality 408 orthogonally supplemented block design 839


outer array 223 overall model 862, 864 overt bias 183, 184

PACE 286 paired characteristics 123 paired comparisons (PC) 123 paired comparisons designs (PCD) 123 paired differences 94 pairwise comparisons 604 pairwise efficiency factor 772 Papadakis method 483 parallel and intersecting flats 423 parallel coordinates 380 parallel flat 424 parallel-group design 43, 44 parallel-group trials 6 parallel line assays 876 parallelism contrast 885 parameter design 199 parameter space 152 parametric models 151 partial balance 766 partial efficiency balance 832 partial likelihood 142 partial likelihood functions 138 partially balanced array 327, 335 partially balanced designs 766 partially balanced incomplete block (PBIB) designs 769, 770, 773, 809, 830 - with two associate classes 997 partially confounded design 123 Patel plans 426 patient log 14 PBIB design 773, 830 PEB design 835 period effect 692 - nonparametric 691 PerMIA 214, 215 permutation distribution 644, 675, 677, 679 permutational central limit theorems 95 permuted blocks 9 pharmacokinetic profiles 32 pharmacokinetic studies 33, 35 pharmacokinetics models 52 pharmacologic profiles 32 Phase I 33, 40-42 Phase I studies 3, 151 Phase II 33, 40, 42 Phase II trials 3 Phase III 33, 43 Phase III studies 42 Phase III trials 3

Phase IV studies 3 photographic emulsions 472 Pillai's trace 387, 388, 397 Pitman-type alternatives 112 placebo 10 placebo vs. treatment setup 132 Plackett and Burman designs 210, 350, 351, 430, 797 Poisson data 51, 55 Poisson kernel 539 polynomial dose-response 45 polynomial model 343, 344 polynomial regression 1115 polynomial regression fitting 39 population kinetics 35, 52 power of lack of fit test 394 predicted response 431 preparation contrast 885 primary response 402 principal block 790, 792 probability integral transformation 121 probability of correct selection (PCS) 556 process simulator 261 product array experiments 201, 222, 224, 232, 233 product robustness 228 progressively censored schemes (PCS) 133 projection matrix 814 projection method 594 projection properties 349 projection pursuit 379 projections of designs 349 propensity score 184 proportional hazards (PH) 138 proportional-hazards model 20 protocol 34 protocol deviations 19 protocol of clinical trial 5 pseudo double-blinding 41 pseudo-Youden design 990 publication bias 22 pure error 351, 361 pure quadratic coefficients 796

Q-design 892, 893 quadruple systems 762 qualitative response 91 quality by design 237 quality engineering 199, 202, 237 quality improvement 199, 202, 207, 226, 237 quality loss function 203, 215 Quality of Life (QOL) 146 quantal response 91 quantal response analysis 151, 153

quasi-complete Latin square 492, 497 quasi-factorial designs 926, 928 quasi-likelihood 52, 55 Quasi-Newton method 1198

random censoring 138 random coefficients 52, 526 random effects 45, 46, 49, 50, 52, 403 random effects model 931 random errors 365 random missing patterns 144 random models 660 random parameters approach 537 randomization 2, 9, 480-483, 494, 497, 809, 811, 848, 849 randomization analysis 310 randomization model 811, 812, 848, 852, 853 randomization tests 480 randomized block design 103, 812 randomized blocks 811 randomized complete block design 760, 812, 1071 randomized controlled trial (RCT) 2 randomized design 1070 randomized experiment 182 rank 636 rank-based methods 705 rank collection matrix 102 rank interaction 709, 732 rank permutation principle 112 rank transform (RT) 643, 731 rank transformation 118 ranking after alignment 105 recovery of inter-block information 143, 848 recovery on inter-block information 811 rectangular association scheme 770 rectangular lattice design 767 rectangular lattices 947 regression parameters 94 regression quantiles 144 regression rank scores 102 regression rank scores estimators 144 regular discounting sequence 176 regular graph designs 956, 993 regular simplex designs 797 Regulatory Agencies 92 reinforced BIB design 1009, 1010, 1016, 1018, 1042 relative efficiency 1156, 1198 relative loss of information 835 relative potency 875 reliability models 579 REML (restricted maximum likelihood) 49-51, 484 repeat pairs 360, 361

repeat run pair 352 repeat runs 361 repeated measurements 21, 52, 779 repeated measurements design 63, 478, 987 repeated measurements study 761 repeated measures design 46, 53, 56 repeated significance testing (RST) 133 repeated significance tests 15, 132 replicated 2^m factorial experiments 122 replication 809 replication of point sets 363 residual effect 128, 692 - nonparametric 691 residual matrix 821 residuals 344 resolution 227, 229, 233, 352, 794 resolution classes 760 Resolution III* 358, 430 Resolution III plan 411 Resolution IV plan 411 Resolution V 358 Resolution V plan 428 resolvability 843 - α-resolvability 951 - (α₁, α₂, ..., αₜ)-resolvability 951 - μ-resolvability 843 - (μ₁, μ₂, ..., μₐ)-resolvability 843 resolvable BIBD 765, 945, 947-949, 952, 953, 969 resolvable block designs 840, 951 resolvable designs 946, 947, 949, 951 resolvable PBIBD 951 resolvable row-column designs 967 response 759 response bias, prevention 10 response metameter 132, 145 response model analysis 224-227, 232, 234-237 response on target 204, 213-215, 226 response surface design 230, 233, 343, 795 response surface methodology 230, 377 response surface methods 215, 229 response surface model 795 restricted maximum likelihood (REML) 49-51, 484 restricted maximum likelihood approach 856 restricted randomization 480, 481 restricted subset selection 557 resultant length 252 resultant vector 242 revealing power 310, 311, 330-333, 337 right censoring 133 right truncation 133 robust Bayes designs 1122, 1123 robust design 199-204, 206, 211, 212, 215, 216, 221, 227, 230, 234, 237, 264, 398 robust general linear model 732, 752, 755

robust methods 130 robust search design 421 robust statistical procedure 91 robustness 129, 131 robustness property 430 rotatability 358, 362, 363, 364, 398 rotatability measure 365 rotatable component 365 rotatable multiresponse design 398 rotationally invariant 1074 row and column designs with contiguous replicates 971 row-column designs 761, 786, 983, 986, 990 - nested 761, 775 - drawback associated with 323 - non-additivity in 318-324 row-complete Latin squares 492 Roy largest root 386, 388, 393 - criterion 115 RT-property 643, 645, 671, 672, 683, 684, 686, 687, 696, 697 run orders 232, 510, 511

Sacks-Ylvisaker approach 534 Sacks-Ylvisaker conditions 281 safety 31, 33, 46 sample-size determination 11 sample sizes 34, 45 SAS 51 scatter plot matrix 378 Scheffé procedure 605 schemes - cyclic 831 - Latin-square 831 - simple 831 - triangular 831 score generating 94 scrambled nets 299 screening designs 326, 327, 349 search designs 329, 417 search linear model 312-315, 320-322, 408, 417 second-order design 346, 370 second-order model 796 second-order surfaces 345 secondary responses 402 selection bias 2 selection in factorial experiments 569 selection with reference to a standard or a control 573 semi-additive model 904 semi-balanced arrays 493, 503-505, 509 sensitivity 310, 311

sensitivity analysis 188 sensor density 520 separation of two sets 521 sequences: (t, s)-sequences 302, 303 sequential allocation 172 sequential approach 230 sequential assembly of fractions 407 sequential designs 281 sequential experiment 212, 230, 347, 473 sequential factorial probing designs 421 sequential medical trials 176 sequential stopping rules 152 serially balanced sequences 496 sex difference 459 shift algorithm 642, 644, 675, 679 Shirakura plan 418 Shirakura-Ohnishi plans 421 Shirakura-Tazawa plans 420 short block 764 side-bordered design 491 signal factors 201, 203, 234 signal-response relationship 235, 236 signal to noise (SN) ratio 211-214, 216, 224, 226, 228, 232, 234-237 signed rank statistics 99 Silvey-Titterington-Torsney method 1196 Simes procedure 596 simple combinability 855 simple lattice 766 simple least square estimator (SLSE) 820 simplex design 347 simulated annealing 286 simultaneous comparison 108 simultaneous confidence intervals 562, 722 simultaneous inference with respect to the best 567 single-blind treatment 11 single-factor Bernoulli models 578 single-factor experiments 558 single-replicate nested row and column designs 967 single-stage location invariant procedures 558 singular kernel 542 sliding levels 230 small composite design 358, 359 SN (signal to noise) design 841 sources of variation 31, 34 spatial analysis 482, 483 spatial dependence 489 spending function 137 spline 272 split-plot design 45, 233, 665, 761, 779 split-plot type analysis 38, 50 split-plot type model 49 spread 251 square: F square 777

square lattices 947, 948, 949, 951 Srivastava method 423 Srivastava plans 421 Srivastava-Arora plan 419 Srivastava-Ghosh plans 420 Srivastava-Li plans 426 staggered entry 139 standard design problem 516 standard preparation 875 standardized resultant vector 243, 245 stationary regression coefficients 54 steepest descent value 1197 stochastic approximation 152 stochastic curtailment 16 stochastically larger (smaller) alternatives 96 stopping rule 1197, 1198 strategy 173 stratification 185 stratum 852 - inter-block 852, 859 - intra-block 852, 859 - projectors 860 - total area 852 - variances 852, 856, 861 strongly balanced crossover design 67, 70 strongly equineighboured (SEN) design 503, 504 strongly regular graph design 997 sub-block designs of NB design 841 sub-block efficiency-balanced NB design 841 sub-blocks 840 subhypotheses 97 submodels - inter-block 860 - intra-block 860 - total-area 860 subset containing the best 556 subset D-optimality 529 subset selection approach 556 subtrials 6 sufficient statistics 52 superblock designs of NB design 841 superblock efficiency-balanced NB design 841 superblocks 840 supplementary difference sets 763 supplemented balance 488, 509, 839, 1009, 1010, 1013 surrogate endpoint 23, 140 survival analysis 131 survival data 20 symmetric design 984 symmetric parallel line assays 880 symmetrical prime-power factorials 787 symmetrical unequal-block arrangements with two unequal block sizes 834 system: (r, λ)-system 834 systematic (or bias) errors 365 systematic designs 480-483, 499

T-optimality 470 Taguchi 199, 200, 202, 203, 210, 212, 213, 215, 219, 222-224, 227-229, 232, 234-237 Taguchi robust parameter design 403 Tchebyscheff points 1078 technical error 1070 test control 488 test for non-additivity 772 test preparation 875 tetra-differences 906 therapeutic effects 35 therapeutic factors 145 three-way balanced designs 921 time series 53 time-dependent covariates 54, 56 time-dependentness 139 time-independent covariates 54, 56 time-sequential procedures 133 time-sequential testing 136 titration design 44 Tocher's matrix 831 tolerance distribution 151, 153, 156 tolerance interval 38 total area 859 toxicologic effects 35 transformation 145, 206, 212, 215, 344, 348 transformation group (on 𝒳) 1156 - generated by sign changes and permutations, Gsp 1163 - orthogonal Gorth 1163 - permutation Gp 1163 - sign change Gs 1162 translation-equivariant estimator 99 translation-equivariant function 106 translation-equivariant functional 103 translation-invariant ranks 98 transmitted variation 204, 218, 220, 221, 227, 230 treatment-by-center interaction 47, 48 treatment-by-time interaction 54 treatment x center interactions 45 treatment combination 787 treatment contrasts 879, 881 - estimable 883 treatment effect 103 treatment replications 813 treatment structure 760 treatment versus control 708, 721, 744 trend 482, 485, 486, 499, 502 trend-free design 482, 485, 499, 502, 510, 930

trend-resistant design 502 trend surface 507 triangular association scheme 770 triple lattice 766 triple systems 762 Tukey procedure 604 Tukey-Kramer procedure 604 two-armed bandit 176 two-dimensional lattices 947 two-dimensional trials 785 two-factor block design 665 two one-sided 5% level t-tests 37 two stage designs 164 two-stage eliminating procedure 560 two-stage model 51 two stage procedures 558 two-way layout 709-711, 729, 738, 744, 745, 749 - additive model 710, 711, 730, 734 - main effects 734, 737, 739 - main effects tests 742, 744-746, 748, 749 - non-additive model 709, 730 - nonparametrics 103 - one observation per cell 711, 738, 744 - one-sided multiple comparisons 743 - test of additivity 731, 733, 738, 741 - two-sided multiple comparisons 736, 740, 747 Type T 824 Type T0 824 Type I censoring 134 Type I or II censoring 135 Type II censoring 134 types of data - missing at random (MAR) 46 - missing completely at random (MCAR) 46

unbiased estimation 429 uniform asymptotic linearity 102 uniform crossover design 67 uniform distribution - on 𝒳 1155 - on a sphere 1193 union-intersection method 593 unit error 1069, 1070 unit-treatment additivity 848 universal optimality 68, 487, 984 universal optimality of designs 93 unobserved covariate 189

valid randomisation sets 779 validation sample 141 validity 131 "value for money" in designs 348 variance 365 variance balance 828, 832 variance components 46 variance components models 128 variance of weighted average of prediction 533 variance ratio 96 variance reduction 200, 204, 213, 218, 220, 227, 237 VB (variance-balanced) design 829 von Mises 243, 253

washout periods 36, 38, 55, 64 water contamination 146 wavelets 290, 291, 303 weak universal optimality 487 weighing designs 985, 992, 994 - chemical balance 980, 998 - spring balance 980, 999 weight function 366 weighted average variance of prediction 533 weighted concurrences 824 weighted ranking 105 WHO, World Health Organization 36 Wiener sheet process 281 Wilcoxon rank-sum 37 Wilcoxon scores 94 Wilcoxon-Mann-Whitney statistic 644 Wilks' likelihood ratio 387, 388 Williams design 494, 495, 783 - with balanced end-pairs 784 - with circular structure 784 Wishart distribution 386 within-subject design 38, 46, 50, 55 within-subject information 64 word length 794 working correlation 52 worth function 175

Youden squares 777, 987

zero-mean martingale 158, 159

Handbook of Statistics Contents of Previous Volumes

Volume 1. Analysis of Variance Edited by P. R. Krishnaiah 1980 xviii + 1002 pp.

1. Estimation of Variance Components by C. R. Rao and J. Kleffe 2. Multivariate Analysis of Variance of Repeated Measurements by N. H. Timm 3. Growth Curve Analysis by S. Geisser 4. Bayesian Inference in MANOVA by S. J. Press 5. Graphical Methods for Internal Comparisons in ANOVA and MANOVA by R. Gnanadesikan 6. Monotonicity and Unbiasedness Properties of ANOVA and MANOVA Tests by S. Das Gupta 7. Robustness of ANOVA and MANOVA Test Procedures by P. K. Ito 8. Analysis of Variance and Problems under Time Series Models by D. R. Brillinger 9. Tests of Univariate and Multivariate Normality by K. V. Mardia 10. Transformations to Normality by G. Kaskey, B. Kolman, P. R. Krishnaiah and L. Steinberg 11. ANOVA and MANOVA: Models for Categorical Data by V. P. Bhapkar 12. Inference and the Structural Model for ANOVA and MANOVA by D. A. S. Fraser 13. Inference Based on Conditionally Specified ANOVA Models Incorporating Preliminary Testing by T. A. Bancroft and C.-P. Han 14. Quadratic Forms in Normal Variables by C. G. Khatri 15. Generalized Inverse of Matrices and Applications to Linear Models by S. K. Mitra 16. Likelihood Ratio Tests for Mean Vectors and Covariance Matrices by P. R. Krishnaiah and J. C. Lee 17. Assessing Dimensionality in Multivariate Regression by A. J. Izenman 18. Parameter Estimation in Nonlinear Regression Models by H. Bunke 19. Early History of Multiple Comparison Tests by H. L. Harter 20. Representations of Simultaneous Pairwise Comparisons by A. R. Sampson 21. Simultaneous Test Procedures for Mean Vectors and Covariance Matrices by P. R. Krishnaiah, G. S. Mudholkar and P. Subbaiah


22. Nonparametric Simultaneous Inference for Some MANOVA Models by P. K. Sen 23. Comparison of Some Computer Programs for Univariate and Multivariate Analysis of Variance by R. D. Bock and D. Brandt 24. Computations of Some Multivariate Distributions by P. R. Krishnaiah 25. Inference on the Structure of Interaction Two-Way Classification Model by P. R. Krishnaiah and M. Yochmowitz

Volume 2. Classification, Pattern Recognition and Reduction of Dimensionality Edited by P. R. Krishnaiah and L. N. Kanal 1982 xxii + 903 pp.

1. Discriminant Analysis for Time Series by R. H. Shumway 2. Optimum Rules for Classification into Two Multivariate Normal Populations with the Same Covariance Matrix by S. Das Gupta 3. Large Sample Approximations and Asymptotic Expansions of Classification Statistics by M. Siotani 4. Bayesian Discrimination by S. Geisser 5. Classification of Growth Curves by J. C. Lee 6. Nonparametric Classification by J. D. Broffitt 7. Logistic Discrimination by J. A. Anderson 8. Nearest Neighbor Methods in Discrimination by L. Devroye and T. J. Wagner 9. The Classification and Mixture Maximum Likelihood Approaches to Cluster Analysis by G. J. McLachlan 10. Graphical Techniques for Multivariate Data and for Clustering by J. M. Chambers and B. Kleiner 11. Cluster Analysis Software by R. K. Blashfield, M. S. Aldenderfer and L. C. Morey 12. Single-link Clustering Algorithms by F. J. Rohlf 13. Theory of Multidimensional Scaling by J. de Leeuw and W. Heiser 14. Multidimensional Scaling and its Application by M. Wish and J. D. Carroll 15. Intrinsic Dimensionality Extraction by K. Fukunaga 16. Structural Methods in Image Analysis and Recognition by L. N. Kanal, B. A. Lambird and D. Lavine 17. Image Models by N. Ahuja and A. Rosenfeld 18. Image Texture Survey by R. M. Haralick 19. Applications of Stochastic Languages by K. S. Fu 20. A Unifying Viewpoint on Pattern Recognition by J. C. Simon, E. Backer and J. Sallentin 21. Logical Functions in the Problems of Empirical Prediction by G. S. Lbov 22. Inference and Data Tables and Missing Values by N. G. Zagoruiko and V. N. Yolkina


23. Recognition of Electrocardiographic Patterns by J. H. van Bemmel 24. Waveform Parsing Systems by G. C. Stockman 25. Continuous Speech Recognition: Statistical Methods by F. Jelinek, R. L. Mercer and L. R. Bahl 26. Applications of Pattern Recognition in Radar by A. A. Grometstein and W. H. Schoendorf 27. White Blood Cell Recognition by E. S. Gelsema and G. H. Landweerd 28. Pattern Recognition Techniques for Remote Sensing Applications by P. H. Swain 29. Optical Character Recognition - Theory and Practice by G. Nagy 30. Computer and Statistical Considerations for Oil Spill Identification by Y. T. Chien and T. J. Killeen 31. Pattern Recognition in Chemistry by B. R. Kowalski and S. Wold 32. Covariance Matrix Representation and Object-Predicate Symmetry by T. Kaminuma, S. Tomita and S. Watanabe 33. Multivariate Morphometrics by R. A. Reyment 34. Multivariate Analysis with Latent Variables by P. M. Bentler and D. G. Weeks 35. Use of Distance Measures, Information Measures and Error Bounds in Feature Evaluation by M. Ben-Bassat 36. Topics in Measurement Selection by J. M. Van Campenhout 37. Selection of Variables Under Univariate Regression Models by P. R. Krishnaiah 38. On the Selection of Variables Under Regression Models Using Krishnaiah's Finite Intersection Tests by J. L. Schmidhammer 39. Dimensionality and Sample Size Considerations in Pattern Recognition Practice by A. K. Jain and B. Chandrasekaran 40. Selecting Variables in Discriminant Analysis for Improving upon Classical Procedures by W. Schaafsma 41. Selection of Variables in Discriminant Analysis by P. R. Krishnaiah

Volume 3. Time Series in the Frequency Domain Edited by D. R. Brillinger and P. R. Krishnaiah 1983 xiv + 485 pp.

1. Wiener Filtering (with emphasis on frequency-domain approaches) by R. J. Bhansali and D. Karavellas 2. The Finite Fourier Transform of a Stationary Process by D. R. Brillinger 3. Seasonal and Calendar Adjustment by W. S. Cleveland 4. Optimal Inference in the Frequency Domain by R. B. Davies 5. Applications of Spectral Analysis in Econometrics by C. W. J. Granger and R. Engle 6. Signal Estimation by E. J. Hannan


7. Complex Demodulation: Some Theory and Applications by T. Hasan 8. Estimating the Gain of a Linear Filter from Noisy Data by M. J. Hinich 9. A Spectral Analysis Primer by L. H. Koopmans 10. Robust-Resistant Spectral Analysis by R. D. Martin 11. Autoregressive Spectral Estimation by E. Parzen 12. Threshold Autoregression and Some Frequency-Domain Characteristics by J. Pemberton and H. Tong 13. The Frequency-Domain Approach to the Analysis of Closed-Loop Systems by M. B. Priestley 14. The Bispectral Analysis of Nonlinear Stationary Time Series with Reference to Bilinear Time-Series Models by T. Subba Rao 15. Frequency-Domain Analysis of Multidimensional Time-Series Data by E. A. Robinson 16. Review of Various Approaches to Power Spectrum Estimation by P. M. Robinson 17. Cumulants and Cumulant Spectra by M. Rosenblatt 18. Replicated Time-Series Regression: An Approach to Signal Estimation and Detection by R. H. Shumway 19. Computer Programming of Spectrum Estimation by T. Thrall 20. Likelihood Ratio Tests on Covariance Matrices and Mean Vectors of Complex Multivariate Normal Populations and their Applications in Time Series by P. R. Krishnaiah, J. C. Lee and T. C. Chang

Volume 4. Nonparametric Methods Edited by P. R. Krishnaiah and P. K. Sen 1984 xx + 968 pp.

1. Randomization Procedures by C. B. Bell and P. K. Sen 2. Univariate and Multivariate Multisample Location and Scale Tests by V. P. Bhapkar 3. Hypothesis of Symmetry by M. Hušková 4. Measures of Dependence by K. Joag-Dev 5. Tests of Randomness against Trend or Serial Correlations by G. K. Bhattacharyya 6. Combination of Independent Tests by J. L. Folks 7. Combinatorics by L. Takács 8. Rank Statistics and Limit Theorems by M. Ghosh 9. Asymptotic Comparison of Tests - A Review by K. Singh 10. Nonparametric Methods in Two-Way Layouts by D. Quade 11. Rank Tests in Linear Models by J. N. Adichie 12. On the Use of Rank Tests and Estimates in the Linear Model by J. C. Aubuchon and T. P. Hettmansperger


13. Nonparametric Preliminary Test Inference by A. K. Md. E. Saleh and P. K. Sen 14. Paired Comparisons: Some Basic Procedures and Examples by R. A. Bradley 15. Restricted Alternatives by S. K. Chatterjee 16. Adaptive Methods by M. Hušková 17. Order Statistics by J. Galambos 18. Induced Order Statistics: Theory and Applications by P. K. Bhattacharya 19. Empirical Distribution Function by E. Csáki 20. Invariance Principles for Empirical Processes by M. Csörgő 21. M-, L- and R-estimators by J. Jurečková 22. Nonparametric Sequential Estimation by P. K. Sen 23. Stochastic Approximation by V. Dupač 24. Density Estimation by P. Révész 25. Censored Data by A. P. Basu 26. Tests for Exponentiality by K. A. Doksum and B. S. Yandell 27. Nonparametric Concepts and Methods in Reliability by M. Hollander and F. Proschan 28. Sequential Nonparametric Tests by U. Müller-Funk 29. Nonparametric Procedures for some Miscellaneous Problems by P. K. Sen 30. Minimum Distance Procedures by R. Beran 31. Nonparametric Methods in Directional Data Analysis by S. R. Jammalamadaka 32. Application of Nonparametric Statistics to Cancer Data by H. S. Wieand 33. Nonparametric Frequentist Proposals for Monitoring Comparative Survival Studies by M. Gail 34. Meteorological Applications of Permutation Techniques Based on Distance Functions by P. W. Mielke, Jr. 35. Categorical Data Problems Using Information Theoretic Approach by S. Kullback and J. C. Keegel 36. Tables for Order Statistics by P. R. Krishnaiah and P. K. Sen 37. Selected Tables for Nonparametric Statistics by P. K. Sen and P. R. Krishnaiah

Volume 5. Time Series in the Time Domain Edited by E. J. Hannan, P. R. Krishnaiah and M. M. Rao 1985 xiv + 490 pp.

1. Nonstationary Autoregressive Time Series by W. A. Fuller 2. Non-Linear Time Series Models and Dynamical Systems by T. Ozaki 3. Autoregressive Moving Average Models, Intervention Problems and Outlier Detection in Time Series by G. C. Tiao 4. Robustness in Time Series and Estimating ARMA Models by R. D. Martin and V. J. Yohai


5. Time Series Analysis with Unequally Spaced Data by R. H. Jones 6. Various Model Selection Techniques in Time Series Analysis by R. Shibata 7. Estimation of Parameters in Dynamical Systems by L. Ljung 8. Recursive Identification, Estimation and Control by P. Young 9. General Structure and Parametrization of ARMA and State-Space Systems and its Relation to Statistical Problems by M. Deistler 10. Harmonizable, Cramér, and Karhunen Classes of Processes by M. M. Rao 11. On Non-Stationary Time Series by C. S. K. Bhagavan 12. Harmonizable Filtering and Sampling of Time Series by D. K. Chang 13. Sampling Designs for Time Series by S. Cambanis 14. Measuring Attenuation by M. A. Cameron and P. J. Thomson 15. Speech Recognition Using LPC Distance Measures by P. J. Thomson and P. de Souza 16. Varying Coefficient Regression by D. F. Nicholls and A. R. Pagan 17. Small Samples and Large Equations Systems by H. Theil and D. G. Fiebig

Volume 6. Sampling Edited by P. R. Krishnaiah and C. R. Rao 1988 xvi + 594 pp.

1. A Brief History of Random Sampling Methods by D. R. Bellhouse 2. A First Course in Survey Sampling by T. Dalenius 3. Optimality of Sampling Strategies by A. Chaudhuri 4. Simple Random Sampling by P. K. Pathak 5. On Single Stage Unequal Probability Sampling by V. P. Godambe and M. E. Thompson 6. Systematic Sampling by D. R. Bellhouse 7. Systematic Sampling with Illustrative Examples by M. N. Murthy and T. J. Rao 8. Sampling in Time by D. A. Binder and M. A. Hidiroglou 9. Bayesian Inference in Finite Populations by W. A. Ericson 10. Inference Based on Data from Complex Sample Designs by G. Nathan 11. Inference for Finite Population Quantiles by J. Sedransk and P. J. Smith 12. Asymptotics in Finite Population Sampling by P. K. Sen 13. The Technique of Replicated or Interpenetrating Samples by J. C. Koop 14. On the Use of Models in Sampling from Finite Populations by I. Thomsen and D. Tesfu 15. The Prediction Approach to Sampling Theory by R. M. Royall 16. Sample Survey Analysis: Analysis of Variance and Contingency Tables by D. H. Freeman, Jr. 17. Variance Estimation in Sample Surveys by J. N. K. Rao


18. Ratio and Regression Estimators by P. S. R. S. Rao 19. Role and Use of Composite Sampling and Capture-Recapture Sampling in Ecological Studies by M. T. Boswell, K. P. Burnham and G. P. Patil 20. Data-based Sampling and Model-based Estimation for Environmental Resources by G. P. Patil, G. J. Babu, R. C. Hennemuth, W. L. Meyers, M. B. Rajarshi and C. Taillie 21. On Transect Sampling to Assess Wildlife Populations and Marine Resources by F. L. Ramsey, C. E. Gates, G. P. Patil and C. Taillie 22. A Review of Current Survey Sampling Methods in Marketing Research (Telephone, Mall Intercept and Panel Surveys) by R. Velu and G. M. Naidu 23. Observational Errors in Behavioural Traits of Man and their Implications for Genetics by P. V. Sukhatme 24. Designs in Survey Sampling Avoiding Contiguous Units by A. S. Hedayat, C. R. Rao and J. Stufken

Volume 7. Quality Control and Reliability Edited by P. R. Krishnaiah and C. R. Rao 1988 xiv + 503 pp.

1. Transformation of Western Style of Management by W. Edwards Deming 2. Software Reliability by F. B. Bastani and C. V. Ramamoorthy 3. Stress-Strength Models for Reliability by R. A. Johnson 4. Approximate Computation of Power Generating System Reliability Indexes by M. Mazumdar 5. Software Reliability Models by T. A. Mazzuchi and N. D. Singpurwalla 6. Dependence Notions in Reliability Theory by N. R. Chaganty and K. Joag-dev 7. Application of Goodness-of-Fit Tests in Reliability by B. W. Woodruff and A. H. Moore 8. Multivariate Nonparametric Classes in Reliability by H. W. Block and T. H. Savits 9. Selection and Ranking Procedures in Reliability Models by S. S. Gupta and S. Panchapakesan 10. The Impact of Reliability Theory on Some Branches of Mathematics and Statistics by P. J. Boland and F. Proschan 11. Reliability Ideas and Applications in Economics and Social Sciences by M. C. Bhattacharjee 12. Mean Residual Life: Theory and Applications by F. Guess and F. Proschan 13. Life Distribution Models and Incomplete Data by R. E. Barlow and F. Proschan 14. Piecewise Geometric Estimation of a Survival Function by G. M. Mimmack and F. Proschan


15. Applications of Pattern Recognition in Failure Diagnosis and Quality Control by L. F. Pau 16. Nonparametric Estimation of Density and Hazard Rate Functions when Samples are Censored by W. J. Padgett 17. Multivariate Process Control by F. B. Alt and N. D. Smith 18. QMP/USP - A Modern Approach to Statistical Quality Auditing by B. Hoadley 19. Review About Estimation of Change Points by P. R. Krishnaiah and B. Q. Miao 20. Nonparametric Methods for Changepoint Problems by M. Csörgő and L. Horváth 21. Optimal Allocation of Multistate Components by E. El-Neweihi, F. Proschan and J. Sethuraman 22. Weibull, Log-Weibull and Gamma Order Statistics by H. L. Harter 23. Multivariate Exponential Distributions and their Applications in Reliability by A. P. Basu 24. Recent Developments in the Inverse Gaussian Distribution by S. Iyengar and G. Patwardhan

Volume 8. Statistical Methods in Biological and Medical Sciences Edited by C. R. Rao and R. Chakraborty 1991 xvi + 554 pp.

1. Methods for the Inheritance of Qualitative Traits by J. Rice, R. Neuman and S. O. Moldin 2. Ascertainment Biases and their Resolution in Biological Surveys by W. J. Ewens 3. Statistical Considerations in Applications of Path Analysis in Genetic Epidemiology by D. C. Rao 4. Statistical Methods for Linkage Analysis by G. M. Lathrop and J. M. Lalouel 5. Statistical Design and Analysis of Epidemiologic Studies: Some Directions of Current Research by N. Breslow 6. Robust Classification Procedures and their Applications to Anthropometry by N. Balakrishnan and R. S. Ambagaspitiya 7. Analysis of Population Structure: A Comparative Analysis of Different Estimators of Wright's Fixation Indices by R. Chakraborty and H. Danker-Hopfe 8. Estimation of Relationships from Genetic Data by E. A. Thompson 9. Measurement of Genetic Variation for Evolutionary Studies by R. Chakraborty and C. R. Rao 10. Statistical Methods for Phylogenetic Tree Reconstruction by N. Saitou 11. Statistical Models for Sex-Ratio Evolution by S. Lessard 12. Stochastic Models of Carcinogenesis by S. H. Moolgavkar 13. An Application of Score Methodology: Confidence Intervals and Tests of Fit for One-Hit-Curves by J. J. Gart


14. Kidney-Survival Analysis of IgA Nephropathy Patients: A Case Study by O. J. W. F. Kardaun 15. Confidence Bands and the Relation with Decision Analysis: Theory by O. J. W. F. Kardaun 16. Sample Size Determination in Clinical Research by J. Bock and H. Toutenburg

Volume 9. Computational Statistics Edited by C. R. Rao 1993 xix + 1045 pp.

1. Algorithms by B. Kalyanasundaram 2. Steady State Analysis of Stochastic Systems by K. Kant 3. Parallel Computer Architectures by R. Krishnamurti and B. Narahari 4. Database Systems by S. Lanka and S. Pal 5. Programming Languages and Systems by S. Purushothaman and J. Seaman 6. Algorithms and Complexity for Markov Processes by R. Varadarajan 7. Mathematical Programming: A Computational Perspective by W. W. Hager, R. Horst and P. M. Pardalos 8. Integer Programming by P. M. Pardalos and Y. Li 9. Numerical Aspects of Solving Linear Least Squares Problems by J. L. Barlow 10. The Total Least Squares Problem by S. van Huffel and H. Zha 11. Construction of Reliable Maximum-Likelihood-Algorithms with Applications to Logistic and Cox Regression by D. Böhning 12. Nonparametric Function Estimation by T. Gasser, J. Engel and B. Seifert 13. Computation Using the QR Decomposition by C. R. Goodall 14. The EM Algorithm by N. Laird 15. Analysis of Ordered Categorical Data through Appropriate Scaling by C. R. Rao and P. M. Caligiuri 16. Statistical Applications of Artificial Intelligence by W. A. Gale, D. J. Hand and A. E. Kelly 17. Some Aspects of Natural Language Processes by A. K. Joshi 18. Gibbs Sampling by S. F. Arnold 19. Bootstrap Methodology by G. J. Babu and C. R. Rao 20. The Art of Computer Generation of Random Variables by M. T. Boswell, S. D. Gore, G. P. Patil and C. Taillie 21. Jackknife Variance Estimation and Bias Reduction by S. Das Peddada 22. Designing Effective Statistical Graphs by D. A. Burn 23. Graphical Methods for Linear Models by A. S. Hadi 24. Graphics for Time Series Analysis by H. J. Newton 25. Graphics as Visual Language by T. Selkar and A. Appel


26. Statistical Graphics and Visualization by E. J. Wegman and D. B. Carr 27. Multivariate Statistical Visualization by F. W. Young, R. A. Faldowski and M. M. McFarlane 28. Graphical Methods for Process Control by T. L. Ziemer

Volume 10. Signal Processing and its Applications Edited by N. K. Bose and C. R. Rao 1993 xvii + 992 pp.

1. Signal Processing for Linear Instrumental Systems with Noise: A General Theory with Illustrations from Optical Imaging and Light Scattering Problems by M. Bertero and E. R. Pike 2. Boundary Implication Results in Parameter Space by N. K. Bose 3. Sampling of Bandlimited Signals: Fundamental Results and Some Extensions by J. L. Brown, Jr. 4. Localization of Sources in a Sector: Algorithms and Statistical Analysis by K. Buckley and X.-L. Xu 5. The Signal Subspace Direction-of-Arrival Algorithm by J. A. Cadzow 6. Digital Differentiators by S. C. Dutta Roy and B. Kumar 7. Orthogonal Decompositions of 2D Random Fields and their Applications for 2D Spectral Estimation by J. M. Francos 8. VLSI in Signal Processing by A. Ghouse 9. Constrained Beamforming and Adaptive Algorithms by L. C. Godara 10. Bispectral Speckle Interferometry to Reconstruct Extended Objects from Turbulence-Degraded Telescope Images by D. M. Goodman, T. W. Lawrence, E. M. Johansson and J. P. Fitch 11. Multi-Dimensional Signal Processing by K. Hirano and T. Nomura 12. On the Assessment of Visual Communication by F. O. Huck, C. L. Fales, R. Alter-Gartenberg and Z. Rahman 13. VLSI Implementations of Number Theoretic Concepts with Applications in Signal Processing by G. A. Jullien, N. M. Wigley and J. Reilly 14. Decision-level Neural Net Sensor Fusion by R. Y. Levine and T. S. Khuon 15. Statistical Algorithms for Noncausal Gauss Markov Fields by J. M. F. Moura and N. Balram 16. Subspace Methods for Directions-of-Arrival Estimation by A. Paulraj, B. Ottersten, R. Roy, A. Swindlehurst, G. Xu and T. Kailath 17. Closed Form Solution to the Estimates of Directions of Arrival Using Data from an Array of Sensors by C. R. Rao and B. Zhou 18. High-Resolution Direction Finding by S. V. Schell and W. A. Gardner


19. Multiscale Signal Processing Techniques: A Review by A. H. Tewfik, M. Kim and M. Deriche 20. Sampling Theorems and Wavelets by G. G. Walter 21. Image and Video Coding Research by J. W. Woods 22. Fast Algorithms for Structured Matrices in Signal Processing by A. E. Yagle

Volume 11. Econometrics Edited by G. S. Maddala, C. R. Rao and H. D. Vinod 1993 xx + 783 pp.

1. Estimation from Endogenously Stratified Samples by S. R. Cosslett 2. Semiparametric and Nonparametric Estimation of Quantal Response Models by J. L. Horowitz 3. The Selection Problem in Econometrics and Statistics by C. F. Manski 4. General Nonparametric Regression Estimation and Testing in Econometrics by A. Ullah and H. D. Vinod 5. Simultaneous Microeconometric Models with Censored or Qualitative Dependent Variables by R. Blundell and R. J. Smith 6. Multivariate Tobit Models in Econometrics by L.-F. Lee 7. Estimation of Limited Dependent Variable Models under Rational Expectations by G. S. Maddala 8. Nonlinear Time Series and Macroeconometrics by W. A. Brock and S. M. Potter 9. Estimation, Inference and Forecasting of Time Series Subject to Changes in Time by J. D. Hamilton 10. Structural Time Series Models by A. C. Harvey and N. Shephard 11. Bayesian Testing and Testing Bayesians by J.-P. Florens and M. Mouchart 12. Pseudo-Likelihood Methods by C. Gourieroux and A. Monfort 13. Rao's Score Test: Recent Asymptotic Results by R. Mukerjee 14. On the Strong Consistency of M-Estimates in Linear Models under a General Discrepancy Function by Z. D. Bai, Z. J. Liu and C. R. Rao 15. Some Aspects of Generalized Method of Moments Estimation by A. Hall 16. Efficient Estimation of Models with Conditional Moment Restrictions by W. K. Newey 17. Generalized Method of Moments: Econometric Applications by M. Ogaki 18. Testing for Heteroscedasticity by A. R. Pagan and Y. Pak 19. Simulation Estimation Methods for Limited Dependent Variable Models by V. A. Hajivassiliou 20. Simulation Estimation for Panel Data Models with Limited Dependent Variable by M. P. Keane


21. A Perspective on Application of Bootstrap Methods in Econometrics by J. Jeong and G. S. Maddala 22. Stochastic Simulations for Inference in Nonlinear Errors-in-Variables Models by R. S. Mariano and B. W. Brown 23. Bootstrap Methods: Applications in Econometrics by H. D. Vinod 24. Identifying Outliers and Influential Observations in Econometric Models by S. G. Donald and G. S. Maddala 25. Statistical Aspects of Calibration in Macroeconomics by A. W. Gregory and G. W. Smith 26. Panel Data Models with Rational Expectations by K. Lahiri 27. Continuous Time Financial Models: Statistical Applications of Stochastic Processes by K. R. Sawyer

Volume 12. Environmental Statistics Edited by G. P. Patil and C. R. Rao 1994 xix + 927 pp.

1. Environmetrics: An Emerging Science by J. S. Hunter 2. A National Center for Statistical Ecology and Environmental Statistics: A Center Without Walls by G. P. Patil 3. Replicate Measurements for Data Quality and Environmental Modeling by W. Liggett 4. Design and Analysis of Composite Sampling Procedures: A Review by G. Lovison, S. D. Gore and G. P. Patil 5. Ranked Set Sampling by G. P. Patil, A. K. Sinha and C. Taillie 6. Environmental Adaptive Sampling by G. A. F. Seber and S. K. Thompson 7. Statistical Analysis of Censored Environmental Data by M. Akritas, T. Ruscitti and G. P. Patil 8. Biological Monitoring: Statistical Issues and Models by E. P. Smith 9. Environmental Sampling and Monitoring by S. V. Stehman and W. Scott Overton 10. Ecological Statistics by B. F. J. Manly 11. Forest Biometrics by H. E. Burkhart and T. G. Gregoire 12. Ecological Diversity and Forest Management by J. H. Gove, G. P. Patil, B. F. Swindel and C. Taillie 13. Ornithological Statistics by P. M. North 14. Statistical Methods in Developmental Toxicology by P. J. Catalano and L. M. Ryan 15. Environmental Biometry: Assessing Impacts of Environmental Stimuli Via Animal and Microbial Laboratory Studies by W. W. Piegorsch 16. Stochasticity in Deterministic Models by J. J. M. Bedaux and S. A. L. M. Kooijman


17. Compartmental Models of Ecological and Environmental Systems by J. H. Matis and T. E. Wehrly 18. Environmental Remote Sensing and Geographic Information Systems-Based Modeling by W. L. Myers 19. Regression Analysis of Spatially Correlated Data: The Kanawha County Health Study by C. A. Donnelly, J. H. Ware and N. M. Laird 20. Methods for Estimating Heterogeneous Spatial Covariance Functions with Environmental Applications by P. Guttorp and P. D. Sampson 21. Meta-analysis in Environmental Statistics by V. Hasselblad 22. Statistical Methods in Atmospheric Science by A. R. Solow 23. Statistics with Agricultural Pests and Environmental Impacts by L. J. Young and J. H. Young 24. A Crystal Cube for Coastal and Estuarine Degradation: Selection of Endpoints and Development of Indices for Use in Decision Making by M. T. Boswell, J. S. O'Connor and G. P. Patil 25. How Does Scientific Information in General and Statistical Information in Particular Input to the Environmental Regulatory Process? by C. R. Cothern 26. Environmental Regulatory Statistics by C. B. Davis 27. An Overview of Statistical Issues Related to Environmental Cleanup by R. Gilbert 28. Environmental Risk Estimation and Policy Decisions by H. Lacayo Jr.