BIOSTATISTICS: An Introductory Text


English Pages 298



Author / Uploaded
Avram Goldstein


Digitized by the Internet Archive in 2012

http://archive.org/details/biostatisticsintOOgold

BIOSTATISTICS: An Introductory Text

AVRAM GOLDSTEIN
Professor of Pharmacology, Stanford University

THE MACMILLAN COMPANY, NEW YORK
COLLIER-MACMILLAN LIMITED, LONDON

© COPYRIGHT, AVRAM GOLDSTEIN, 1964

No part of this book may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording or by any information storage and retrieval system, without permission in writing from the Publisher. All rights reserved.

Fifth Printing 1967

Library of Congress catalog card number: 64-11036

The Macmillan Company, New York
Collier-Macmillan Canada, Ltd., Toronto, Ontario
Printed in the United States of America

To dbg

Preface

This book traces its origin to the pharmacology course at Harvard Medical School given more than ten years ago, where Douglas S. Riggs and I undertook to teach medical students some principles of biostatistics. There was a short monograph, and also a laboratory exercise in tossing pennies and drawing colored shoe-buttons out of jars. A few years later (Riggs having meanwhile taken the chair of pharmacology at Buffalo) I obtained the cooperation of William E. Reynolds (then in the Department of Preventive Medicine at Harvard, now associated with the University of California) and together we organized and taught the first required biostatistics course for Harvard medical students. At the same time (and in a few cases much earlier) regular instruction in this subject was being established at other medical schools; and in the years since, biostatistics has taken its rightful place in the medical curriculum of most institutions. Training in statistics has, of course, long been recognized as indispensable for students of biology, and of psychology and other behavioral sciences.

My teaching experience with students of medicine and the biological sciences has largely shaped the content and format of this book. Very few students in these fields will wish to learn statistics through a systematic mastery of its mathematical basis. The great majority, however, are well disposed toward the subject at the outset because they know it will provide useful tools for their life's work. The student must be shown when, where, and how the tools can best be used, and in a general way why they work. He does not have to know how the tools were fashioned, nor even the proofs that they can do what the statisticians claim for them. To use biostatistics intelligently, or to understand its use by others and the conclusions to which it leads, requires principally a grasp of the rationale underlying its applications and of the correct ways of formulating problems and hypotheses for statistical analysis. I have tried to frame the arguments, explanations, illustrations, examples, and problems in terms common to everyday laboratory or clinical experience.

The book is not meant to be sufficient in itself for all possible needs. However, the examples include prototypes of a great many problems that commonly arise, and the tables will satisfy every ordinary requirement. Thus the book should continue to serve as a working manual for most procedures long after it has fulfilled its role as introductory text.

The description of procedures and the construction of the tables proceed from two premises: first, that biological experiments hardly ever yield data that are meaningful beyond three significant figures, and second, that the conventional levels of significance (0.05 and 0.01) are quite sufficient for all ordinary purposes. It has therefore been possible, especially in transcribing the various tables, to effect numerous simplifications and abbreviations. In rounding off numbers the usual rule has been followed: If an exact half is to be dropped, round to the nearest even number. In the long run half of all the numbers retained are slightly too large, half slightly too small, so no systematic bias results. However, in the tables of critical values of various statistics, the following rule has been adopted: If any digits are to be dropped, round in the conservative direction, upward or downward depending upon the particular statistic. Thus an occasional test of significance will be very slightly more rigorous than intended.

Worthy of special mention is the inclusion of three very useful nonparametric procedures. Two of these compare favorably in efficiency with the t-test, and are far simpler to apply, yet there has been a surprising lag in their more widespread adoption.

Insofar as the ideas in this book originated in the teaching experiments referred to, I am very much indebted to Drs. Riggs and Reynolds. The latter was also good enough to offer constructive suggestions for improving an early draft of the present text. Dr. Rupert G. Miller, Associate Professor of Statistics at Stanford, offered much helpful criticism and advice, and I am deeply appreciative of the patience and care with which he read the manuscript. Dr. Helena C. Kraemer, Research Associate in Statistics, carried out a painstaking verification of the examples and problems, and made several useful criticisms. Many of their suggestions have been incorporated, but, of course, they share no responsibility for errors that may remain. I owe thanks also to my colleagues, Drs. Lewis Aronow and Sumner M. Kalman, for their comments on a preliminary draft, and for their generosity in temporarily releasing me from the obligations of co-authorship of another text so that I might complete this one. I am indebted to the late Professor Sir Ronald A. Fisher, F.R.S., Cambridge and Dr. Frank Yates, F.R.S., Rothamsted, also to Messrs. Oliver & Boyd Ltd., Edinburgh, for permission to reprint Tables I, II, IV, XI, and XXVII from their book Statistical Tables for Biological, Agricultural and Medical Research. Finally, I wish to express my gratitude to Mrs. Ray Jeffery for secretarial assistance of the highest quality, without which the preparation of this book would have been far more difficult.

Avram Goldstein
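
The two rounding rules stated in the Preface can be tried out in a few lines. This is only a sketch using Python's standard decimal module; the function names and the sample values are illustrative assumptions and do not come from the book or its tables.

```python
from decimal import Decimal, ROUND_CEILING, ROUND_FLOOR, ROUND_HALF_EVEN

def _quantum(places):
    """Smallest unit kept, e.g. places=2 gives Decimal('0.01')."""
    return Decimal(10) ** -places

def round_half_even(value, places):
    """Usual rule: an exact half is rounded to the nearest even digit."""
    return Decimal(value).quantize(_quantum(places), rounding=ROUND_HALF_EVEN)

def round_conservative(value, places, upward):
    """Rule for critical values: always round in one chosen (conservative) direction."""
    mode = ROUND_CEILING if upward else ROUND_FLOOR
    return Decimal(value).quantize(_quantum(places), rounding=mode)

print(round_half_even("2.345", 2))   # 2.34 (exact half, 4 is even)
print(round_half_even("2.355", 2))   # 2.36 (exact half, 5 goes up to the even 6)
print(round_half_even("2.3451", 2))  # 2.35 (more than half, ordinary rounding)
print(round_conservative("2.0211", 2, upward=True))  # 2.03
```

Values are passed as strings so that the exact halves are represented exactly; binary floating-point numbers would blur the distinction the rule depends on.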

Contents

CHAPTER 1  The Logical Basis of Statistical Inference
CHAPTER 2  Quantitative Data
CHAPTER 3  Enumeration Data
CHAPTER 4  Correlation

Tables
1. Random Digits
2. Squares of Numbers
3. Four-place Logarithms
4. Areas of the Normal Curve
5. Critical Values of t
6. Factors (K) for One-sided Tolerance Limits
7. Critical Values of the Variance Ratio, F
8. Factors (k*) for the Studentized Range
9. Critical Values of χ²
10. Critical Numbers of Items in the Smaller Binomial Category
11. Confidence Limits for the Poisson Expectation
12. Binomial Confidence Limits
13. Values of the Exponential Function, e^-x
14. Significance of an Observed Difference Between Two Poisson Variables
15. Critical Values of U in the Two-sample Rank Test
16. Critical Values of the Smaller Sum of Signed-ranks (T) in the Signed-ranks Test
17. Conversion of Percents to Probits
18. Working Probits and Weighting Coefficients

Index

CHAPTER 1

The Logical Basis of Statistical Inference

INTRODUCTION

A good understanding of biostatistics has become an essential ingredient in the training of students in every branch of the biological sciences and medicine. The reason is that the methods of biostatistics are indispensable tools for the design and interpretation of experiments. The wise investigator draws a statistician into consultation to make sure that his experiment is well designed to answer the question at hand, and that it will yield a maximum of useful information with a minimum expenditure of animals, patients, or time. In evaluating data he again utilizes the statistician's skill so that his conclusions will be as definitive and as general as the findings will permit, but still no more so. Unfortunately, too many experiments being carried out and published are so poorly designed that they cannot support any valid conclusions. The purpose of this book is primarily to outline the logical basis of the statistical approach to experimental problems, and the main features of those statistical methods commonly used in biological experimentation. A better working knowledge of the procedures and their rationale on the part of every student of the biological sciences and medicine can only be beneficial. Such a preliminary acquaintance should lead not only to some facility in applying biostatistical methods, but also to a better appreciation of their potential value, so that expert advice will be sought more readily when the occasion arises.

The full text is intended for the reader who has never studied statistics before and whose mathematical training may be rather scant. Emphasis is placed upon the applications of biostatistics to real experimental problems of the kinds encountered in the laboratory or clinic. The rationale of each procedure is explained intuitively, or demonstrated empirically, but no attempt is made to provide mathematical proofs. The student with some previous exposure to statistics, or with a more sophisticated mathematical background, will find much of the text too elementary, but should still profit from studying the examples and working the problems. Numerous references are provided to sources of additional explanation, to mathematical proofs, and to texts containing further illustrative examples. All the necessary tables are at the back of the book.

SOME IMPORTANT ASPECTS OF EXPERIMENTAL DESIGN

The principal application of biostatistics is in the analysis of data derived from experiments. No matter how elegant the statistical treatment, the conclusions may be false unless the experiments themselves were properly designed and carried out. It is well, therefore, to begin by considering some fundamental principles of biological experimentation. This is a subject with many ramifications, and special works devoted to it should be consulted for fuller information. Here we shall discuss briefly some of the most important requirements of proper experimental design, which bear upon the validity of data to which statistical analysis is to be applied.

The problems that mainly concern us have to do with the effects of experimental manipulations upon biological systems. Such deliberate interventions by the investigator are known as treatments. A result produced by a treatment is known as an effect.

We may wish to find out whether or not a certain drug reduces the concentration of glucose in the circulating blood. The treatment consists in administering the drug under specified conditions of dosage, frequency, duration, and so on, to a group of subjects. The effect would be a measurable lowering of blood glucose concentration.

We might want to know how the lever-pressing behavior of rats is affected by periodic food reinforcements. The treatment is the specified reinforcement schedule. The effect would be a measurable change in the rate or temporal pattern of lever presses.

We may be interested in learning how exposure to heat influences the subsequent germination of tomato seeds. The treatment is exposure of the seeds to certain temperatures for specified periods of time. The effect would be measurable as a change in the percent of seeds which germinate, or in the average time to germination, or in some qualitative feature of the germination process.

Since we wish to know what effect a treatment produces, the chief aim in planning and conducting an experiment is to ensure insofar as possible that no factor other than the treatment will contribute to the observed result. This ideal is almost never attainable, since extraneous influences are nearly always present. The practical aim is therefore to ensure that all influences except for the treatment under test will act equally upon the treated subjects and upon a comparable group of control subjects exposed to all the same conditions but not to the treatment.

Let us examine a possible experimental design for investigating the effects of a drug that might lower the blood glucose level. The group of subjects is assembled at a convenient time in the morning and initial blood samples are drawn. The drug is then given, and an hour later blood samples are drawn again. When the glucose concentrations are determined they are found to be considerably lower in every postdrug sample than in the corresponding initial sample. Can it be concluded that the drug lowers blood glucose? Certainly not. In this case the fault in the experimental design is transparent: all subjects were treated and there was no control group at all. We do not know what would have happened to the blood glucose concentrations in these same subjects over the same one-hour interval if the drug had not been given.

In the above example an apparent effect (lowering of blood glucose) was found, but because the experiment was uncontrolled there was no way of deciding whether or not the treatment (drug administration) was responsible. Suppose, on the other hand, that there had been no change in blood glucose concentration. One might be tempted, in that case, to dismiss the possibility that the drug lowers blood glucose, but a little reflection will convince one that no such inference can be drawn. Had the drug not been administered, the blood glucose of all the subjects might possibly have increased during the same period, so that the drug effectively reduced what otherwise would have been a much higher concentration. Thus, regardless of the outcome, no valid conclusion can be drawn from an uncontrolled experiment.

Let us now consider the following improvement in the experimental design. The subjects' cooperation is obtained on two successive mornings. On the first (control) morning, blood samples are drawn initially and again one hour later, but no drug is administered. On the next morning the identical procedure is repeated, but the drug is given as soon as the initial blood samples have been secured. Suppose the chemical analyses now reveal that on the first day there was no important difference between the initial and final blood glucose concentrations, whereas on the second day the levels all fell during the hour after drug administration. Can we then conclude that the drug was responsible for lowering the blood glucose?

It might seem that proper controls are now built into the experiment, but two major faults remain, which render any conclusion uncertain. First, the very act of drug administration may have influenced the observed outcome, even if the drug itself were inert. The question must be answered, what would happen to the blood glucose levels if dummy injections, or inert pills were given, instead of the drug? The importance of this kind of simulated treatment (known as placebo control) will be considered at length later. The second major fault is the assumption that the only difference between the first day and the second was that drug was given on the latter but not on the former. To attribute prime importance to our own deliberate intervention is a common and understandable error, but nonetheless a serious one. Whatever treatment we may administer, other influences about which we know nothing at all may cause part or all of the observed results which we interpret as treatment effects. The simple truth is that today is not yesterday. The subjects' physical, emotional, and nutritional states may have been quite different on the two days, the room temperature and other environmental circumstances may have differed, and so on. Certainly different (and possibly relevant to blood glucose responses) are subjects' emotional reactions to the novel experience of the first day in contrast to the familiar one of the second. Even in the simplest laboratory experiments, conditions may change from one time to another without the awareness of the investigator, who may then falsely attribute to a treatment what was really caused by an unknown and extraneous circumstance.
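
The danger of judging a treatment without concurrent controls can be made concrete with a small simulation. The sketch below rests entirely on invented assumptions: blood glucose is supposed to fall by about 10 units over the hour for reasons unrelated to any drug, the "drug" is completely inert, and the group size, noise level, and seed are arbitrary. None of these figures comes from the text.

```python
import random
from statistics import mean

rng = random.Random(1964)   # fixed seed so the sketch is repeatable

N = 1000             # hypothetical number of subjects per group
DRIFT = -10.0        # assumed spontaneous one-hour change, unrelated to the drug
DRUG_EFFECT = 0.0    # the drug is assumed to be completely inert

def one_hour_change(drug_effect):
    """Simulated change in blood glucose over the hour for one subject."""
    return DRIFT + drug_effect + rng.gauss(0.0, 5.0)

treated = [one_hour_change(DRUG_EFFECT) for _ in range(N)]
control = [one_hour_change(0.0) for _ in range(N)]

before_after = mean(treated)                     # all an uncontrolled design can report
versus_control = mean(treated) - mean(control)   # what a concurrent control group reveals

print(f"before-after change in treated group: {before_after:.1f}")
print(f"treated minus concurrent control:     {versus_control:.1f}")
```

The before-after figure suggests a marked lowering of blood glucose although the simulated drug does nothing; the comparison with subjects observed over the same hour shows a difference close to zero.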

For reasons made evident above, a good experiment whenever possible embodies the principle of concurrent control, i.e., the inclusion of a control group in the same experiment with the treated group or groups. Sometimes there are good reasons why an experiment cannot include concurrent controls, but however good the reasons may be, the conclusions of such experiments are bound to be clouded by some degree of uncertainty. The commonest illustration of this is the "before-after" comparison. A certain drug, for example, is generally believed to prolong the survival of children suffering from leukemia. It is certainly true that the drug-treated children of today survive many months longer than did victims of this disease before introduction of the drug. During the same period of years, however, numerous other advances in medical care have occurred. Would today's patients do as poorly without the drug as did their counterparts several years ago, or have other influences contributed to the apparent beneficial effects of today's drug treatments? This question can no longer be answered experimentally, because it would not now be ethical to withhold a drug that is believed to be effective, in order to establish a concurrent control group of patients. This peculiar difficulty in human experimentation points to the importance of conducting thoroughly conclusive experiments early in the trial period of any new drug or therapeutic procedure.

Assuming that an experiment will include a concurrent control, how shall subjects be assigned to the treatment and control groups? It might be supposed that any haphazard allocation of subjects would suffice, but experience shows that this is not so. Essential to good experimental design is as nearly complete an equivalence of control and treatment groups as can be achieved. Otherwise differences in the outcome will be supposed to arise from the treatment, which may really only reflect innate differences among the subjects themselves.

Consider a clinical trial to determine whether a new drug is superior to an inert material (a placebo) in shortening the duration of common colds. As patients with colds appear (for example in an industrial clinic), the physician-investigator at his own whim prescribes either the drug or the placebo. Subconscious bias can play a surprisingly large role in determining such assignments, so that the patients with milder symptoms may receive the drug more often and those with severe symptoms may receive the placebo more often. The outcome of such an experiment may be that the colds are over sooner in the drug-treated subjects, but the conclusion that the drug was responsible does not merit confidence, since the drug-treated group might well have had shorter colds than the placebo group even if the drug were worthless.

It should not be thought that inadvertent selection in assigning subjects to groups occurs only in human experimentation. So simple a matter as dividing 50 mice into two equal groups for control and treatment can be hazardous, because some characteristics of the mice may readily influence their allocation by the investigator to one or the other group. For example, if he removes 25 mice to another cage, they are very likely to be heavier and more sluggish (easier to catch) than the ones left behind. Groups selected in such a manner could not really be used for any experiment whatsoever. Regardless of the outcome, one would have doubts about the equivalence of the control and treated groups, and so any conclusion about the presence or absence of a treatment effect would rest upon a shaky foundation.
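
The hazard of dividing the mice by hand can be illustrated by simulation. The following is a rough sketch under invented assumptions: body weights are drawn from a normal distribution with mean 25 g and standard deviation 3 g, and the chance that a mouse is the next one caught is assumed to grow exponentially with its weight. The function names, the catchability model, and all numbers are hypothetical.

```python
import math
import random
from statistics import mean

rng = random.Random(7)   # fixed seed so the sketch is repeatable

def make_mice(n=50):
    """Hypothetical body weights in grams."""
    return [rng.gauss(25.0, 3.0) for _ in range(n)]

def caught_first(weights, k=25):
    """Haphazard split: heavier (more sluggish) mice are assumed easier to catch."""
    remaining = list(range(len(weights)))
    caught = []
    for _ in range(k):
        odds = [math.exp(0.5 * (weights[i] - 25.0)) for i in remaining]
        pick = rng.choices(remaining, weights=odds, k=1)[0]
        remaining.remove(pick)
        caught.append(pick)
    return caught, remaining

def randomized(weights, k=25):
    """Random split: no characteristic of a mouse affects its group."""
    order = list(range(len(weights)))
    rng.shuffle(order)
    return order[:k], order[k:]

def mean_gap(split, trials=300):
    """Average difference in mean weight between the two groups over many cages."""
    gaps = []
    for _ in range(trials):
        w = make_mice()
        a, b = split(w)
        gaps.append(mean(w[i] for i in a) - mean(w[i] for i in b))
    return mean(gaps)

haphazard_gap = mean_gap(caught_first)
random_gap = mean_gap(randomized)
print(f"haphazard split, mean weight gap:  {haphazard_gap:.2f} g")
print(f"randomized split, mean weight gap: {random_gap:.2f} g")
```

Under these assumptions the hand-picked group is systematically heavier than the group left behind, whereas the randomized groups differ only by chance, and that chance difference averages out to nearly zero.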

The only thoroughly reliable way to set up equivalent groups is to randomize the assignments.¹ Here the key requirement is that no characteristic of a subject whatsoever shall play any part in his assignment to a group. Tossing coins, rolling dice, or drawing lots are suitable procedures. The most universally applicable random device is the table of random numbers. Such tables have been generated by a system (e.g., electronically) designed so that each of the ten digits has equal probability of appearing at any position in a sequence. Table 1 is an extract of 1,000 sequential digits from a larger random series. It lends itself to a variety of uses, only one of which will be illustrated here.

¹ "The logical reason for randomizing is that it is possible on the basis of a model reflecting the randomization to draw sound statistical inferences, the probability basis of the model being provided not by wishful thinking but by the actual process of randomization which is part of the experiment. ..." H. Scheffé, The Analysis of Variance (New York: John Wiley & Sons, Inc., 1959), p. 106.

Example 1-1. Randomization.

An experiment is to include 100 subjects, divided equally among four different treatment groups: placebo, drug A, drug B, and drug C. Make the assignments randomly, by means of Table 1.

The first step is to assign a number to each subject. In this case the subjects would be numbered from 01 to 100 (denoted by 00). Then enter the table at any point and begin filling one of the groups according to the sequences of numbers in the table. For example, if we start at the upper left corner of Table 1, we find there 94847 47234 476 ... If we have decided to fill the placebo group first, we assign subjects 94, 84, 74, 72, 34, 47, and so on, to this group. If a number that has already been used appears again we simply ignore it. When 25 subjects have been placed in the placebo group we continue in the same way to assign 25 subjects to each of the next two groups. Then the remaining 25 are placed automatically in the final group. It will be observed that every subject has an equal chance of being placed in any group, and also an equal chance of being placed together with any other subject.
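
The procedure of Example 1-1 can be sketched in code. Only the opening digits 94847 47234 476 are quoted in the text; in the sketch below the remainder of Table 1 is replaced by pseudo-random digits from Python's random module, which is an assumption made purely so the example can run to completion. The function names are likewise illustrative.

```python
import random

def digit_pairs(digits):
    """Yield subject numbers from successive digit pairs; the pair 00 denotes subject 100."""
    it = iter(digits)
    for tens, units in zip(it, it):
        number = 10 * tens + units
        yield 100 if number == 0 else number

def assign(digits, groups, per_group):
    """Fill each group in turn from the digit stream, ignoring numbers already used."""
    total = len(groups) * per_group
    assignment = {g: [] for g in groups}
    used = set()
    pairs = digit_pairs(digits)
    for g in groups[:-1]:
        while len(assignment[g]) < per_group:
            subject = next(pairs)
            if subject not in used:
                used.add(subject)
                assignment[g].append(subject)
    # The subjects still unassigned fall automatically into the final group.
    assignment[groups[-1]] = [s for s in range(1, total + 1) if s not in used]
    return assignment

def table_digits(opening, seed=0):
    """Digits quoted in the text, then pseudo-random stand-ins for the rest of Table 1."""
    yield from opening
    rng = random.Random(seed)
    while True:
        yield rng.randrange(10)

opening = [int(c) for c in "9484747234476"]
groups = ["placebo", "drug A", "drug B", "drug C"]
result = assign(table_digits(opening), groups, per_group=25)
print(result["placebo"][:6])   # [94, 84, 74, 72, 34, 47]
```

The first six placebo subjects reproduce the numbers given in the example; everything after them depends on the stand-in digits and has no connection with the real table.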

The first step (assigning a number to each subject) does not require that there be an actual list of subjects. The sequential numbers might refer to the order in which patients may, in the future, present themselves at a hospital clinic. In that case the assignments described above would mean that in order of their arrival at the clinic, the 94th, 84th, 74th, 72nd (and so on) patients would be placed in a placebo group.

It is a good idea to let numbers represent subjects in such a way that maximum use is made of the numbers in the table. Suppose there are only 12 subjects to be divided into groups. If we used only the digit pairs from 01 to 12, we would probably have to look through a considerable part of the table before all those particular numbers turned up, and we would have to discard a great many numbers along the way. A more efficient procedure is to use as many cycles of 12 as we can. Eight cycles of 12 will fit between 01 and 96; the digit-pairs 97, 98, 99, 00 will be discarded if they appear. We therefore enter the table and accept all two-digit numbers between 01 and 96. If a number is greater than 12, we divide it by 12 and use only the remainder to designate a subject, allowing a remainder of zero to denote subject 12. Random numbers from 13 to 96 will obviously yield random remainders on division by 12. Thus, on the bottom line of Table 1 the sequence 24415 95858 would represent subjects 12, 5, 11, 10 (the repeated 10 represented by the final 58 being discarded).

A warning is in order about an appealingly simple but incorrect way of randomizing, in which the digits in Table 1 are permitted to stand for the groups rather than for the subjects. For example, in assigning 12 subjects to three groups, the digits 1, 2, 3 might stand for group A; 4, 5, 6 for group B; 7, 8, 9 for group C; and 0 would be discarded. Now the subjects on a list are assigned to groups according to the sequence of digits in the table. Thus if the digits 31665 appear, we would assign the first and second subjects on the list to group A; the third, fourth, and fifth to group B. Now it will be evident that as the groups are filled to the desired number of subjects, some are bound to be filled sooner than others, and the subjects toward the end of the list will have an unusually high probability of being assigned together, particularly in filling out the last vacant group. The probability that a subject at the end of the list will be in the same group as one at the beginning is therefore less than in a truly random procedure, and the probability that he will be in the same group as the subject just above him in the list is very substantially higher than it should be. These defects may have serious practical consequences in experiments where the "list" is wholly or partly determined by some characteristics of the subjects, as when animals have to be caught and then randomly distributed. For example, a group of the animals that were hardest to catch would very likely be assigned together to the group that happened to be filled last.
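
The defects of letting digits stand for groups can be checked by simulation. The sketch below makes two assumptions that are not spelled out in the text: each group is limited to four subjects, and a digit naming a group that is already full is simply skipped. Pseudo-random digits from Python's random module stand in for Table 1. For comparison, when 12 subjects are divided truly at random into three groups of four, any two given subjects share a group with probability 3/11, or about 0.27.

```python
import random

rng = random.Random(12)   # fixed seed so the sketch is repeatable

GROUP_OF_DIGIT = {1: "A", 2: "A", 3: "A",
                  4: "B", 5: "B", 6: "B",
                  7: "C", 8: "C", 9: "C"}   # the digit 0 is discarded

def digits_as_groups(n_subjects=12, per_group=4):
    """Defective method: each usable digit names the group of the next subject on the list."""
    counts = {"A": 0, "B": 0, "C": 0}
    groups = []
    while len(groups) < n_subjects:
        g = GROUP_OF_DIGIT.get(rng.randrange(10))      # None for the digit 0
        if g is not None and counts[g] < per_group:    # assumed: full groups are skipped
            counts[g] += 1
            groups.append(g)
    return groups

def true_random():
    """Sound method: shuffle a list holding four places in each of the three groups."""
    groups = ["A"] * 4 + ["B"] * 4 + ["C"] * 4
    rng.shuffle(groups)
    return groups

def same_group_rates(method, trials=20000):
    """Estimate P(last two subjects together) and P(first and last subjects together)."""
    last_two = first_last = 0
    for _ in range(trials):
        g = method()
        last_two += g[-1] == g[-2]
        first_last += g[0] == g[-1]
    return last_two / trials, first_last / trials

flawed_last_two, flawed_first_last = same_group_rates(digits_as_groups)
random_last_two, random_first_last = same_group_rates(true_random)
print("digits as groups:", flawed_last_two, flawed_first_last)
print("true random:     ", random_last_two, random_first_last)
```

Under these assumptions the last two subjects on the list land together far more often than 3/11 of the time, while the first and last subjects land together less often, which is the pattern the warning describes.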

To avoid using the same sequence of random digits repeatedly, a procedure is desirable for entering the table at a different place each time it is employed. Any method that accomplishes this is suitable. One can simply mark each stopping place and begin there on the next occasion. Alternatively, one can number the columns and rows so that a randomly chosen number can then specify a point of entry.
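
A point of entry can be combined with the cycles-of-12 procedure described earlier in this section. The sketch below is only an illustration: the function names are invented, rows and digit positions are counted from zero, and the single line of digits used is the bottom-line sequence quoted in the text, since the rest of Table 1 is not reproduced here.

```python
def digits_from(table_rows, row, col):
    """Digits read left to right, top to bottom, starting at digit `col` of row `row`."""
    flat = [int(c) for line in table_rows[row:] for c in line if c.isdigit()]
    return flat[col:]

def subjects_by_remainder(digits, n_subjects=12):
    """Map successive digit pairs to subjects, using as many full cycles as fit below 100."""
    limit = (99 // n_subjects) * n_subjects       # 96 when there are 12 subjects
    chosen = []
    for i in range(0, len(digits) - 1, 2):
        number = 10 * digits[i] + digits[i + 1]
        if number == 0 or number > limit:         # 00, 97, 98, 99 are discarded
            continue
        subject = number % n_subjects or n_subjects   # remainder 0 denotes the last subject
        if subject not in chosen:                 # a repeated subject is discarded
            chosen.append(subject)
        if len(chosen) == n_subjects:
            break
    return chosen

bottom_line = ["24415 95858"]
print(subjects_by_remainder(digits_from(bottom_line, 0, 0)))   # [12, 5, 11, 10]
```

The pairs 24, 41, 59, 58 leave remainders 0, 5, 11, 10 on division by 12, and the second 58 repeats subject 10 and is dropped, in agreement with the worked example in the text.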

Once the random assignment of subjects to groups has been completed, the remaining concern is that all groups be subjected to identical conditions during the experiment, except for the treatments under study. Again the pitfalls are many. A major one is failing to recognize, or minimizing the importance of some of the conditions associated with, but not considered an intrinsic part of, the treatment. Consider an experiment to determine whether a certain drug diminishes the fertility of mice. The control mice remain untreated and undisturbed. The treated mice are removed from their cages several times daily and injected with the drug. The criterion of effect is the number and size of litters within a given period of time. These might turn out to be substantially smaller in the treated group. Nevertheless, a conclusion that the drug reduces fertility may be quite false, because the controls were not subjected to the same conditions as the treated mice. Repeated handling or the trauma of injection may have played a major role in reducing fertility in the treated group. If the treatment group received drug injections, the control group should have had placebo injections on the same occasions. If a treatment under study is an operative procedure, controls must be subjected to a sham operation, as nearly like the real one as possible, and including the same anesthetics (which often produce significant effects themselves). If a lesion is to be placed electrolytically in an animal's brain, an identical electrode should be similarly placed in a control animal, but without passage of the electrolytic current. Ideally, to avoid inadvertent selection of animals, the electrode would be inserted before it had been decided whether the particular animal was to be treated or kept as a control, and the decision whether or not to pass the electrolytic current would then be made by tossing a coin. Precautions of this kind may sometimes seem extreme, even absurd, but the careful experimenter keeps them in mind and employs them whenever he can reasonably do so.

A special technique known as blind design ensures against the investigator's bias (conscious or unconscious) influencing the conduct of an experiment or the evaluation of the results. The essence of this procedure is that the personnel who carry out the treatments should not know which subjects belong to which groups. Sometimes this is manifestly impossible, as when treatment and control procedures differ grossly. However, if control animals are to receive inert injections while others receive drugs, it is not very difficult to arrange for the actual injections to be coded, and then administered by someone who is "blind" with respect to the coding. It is even more important when criteria of effect are being assessed, that the investigator be able to measure, count, or otherwise evaluate results with complete objectivity. Objectivity cannot be guaranteed if he knows to which groups the subjects belong, because he usually has some emotional stake in the experiment, or he probably would not have undertaken it.

It might be thought that insistence upon "blind" technique is merely an exaggerated fussiness about abstract principles of experimental design. On the contrary, the literature of experimental psychology and of medicine is full of reports on experiments whose outcomes were determined more by an investigator's bias than by any treatment under test. One example will suffice. Shortly after the introduction of the antihistamine drugs into medicine, their effectiveness in the treatment of the common cold was investigated in several field trials. The results were very favorable to the new drugs, with the consequence that antihistamines were widely promoted as cold cures. In these experiments, however, no safeguards had been employed to prevent the examining physicians (who rated the progress of each cold) from knowing which subject had received placebo and which had received drug. "Blind" experiments, undertaken subsequently, showed consistently and conclusively that the drugs were without effect. Later, in the elegant clinical trials that established the value of streptomycin in the treatment of pulmonary tuberculosis, the "blind" technique was scrupulously observed. Even the radiologists evaluating X-ray films were not allowed to know what treatments had been given, since independent studies had revealed a pronounced influence of such prior information upon radiological interpretations. Even in laboratory experimentation, although measurements may be made with very accurate instruments, it is surprising how often accidental errors can occur in the direction of an investigator's bias!

THE LOGICAL

10

BASIS

OF STATISTICAL INFERENCE

In human (as compared with animal) experimentation the difficulties are compounded, because the subject's knowledge about the experiment and his preconceptions about the anticipated effects can also influence the outcome. Neither the subjects nor the investigator directly involved in the experiment must know how the treatments have been assigned. Medications are given serial code numbers, and the person with access to the code refrains from any contact with the actual experiment until all the data have been collected. Nevertheless, even this "double-blind" system is not foolproof. A drug may reveal itself through a side effect such as drowsiness or dry mouth, which cannot be duplicated in the placebo. If the subject becomes convinced (rightly or wrongly) that he has received a potent drug, or that he has been given a placebo, his responses may reflect this conviction to a remarkable degree. Under such circumstances, interpretations become extremely difficult.
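The coding step of a double-blind design can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `make_blinding_code`, the subject labels, and the equal split between drug and placebo are hypothetical and do not come from the text.

```python
import random

def make_blinding_code(subject_ids, treatments=("drug", "placebo"), seed=None):
    """Return (labels, key).

    labels: subject -> serial code number (all that the investigator sees)
    key:    code number -> treatment (held by someone outside the experiment)
    """
    rng = random.Random(seed)
    n = len(subject_ids)
    # Balanced allocation (an assumption): cycle through the treatments, then shuffle.
    allocation = [treatments[i % len(treatments)] for i in range(n)]
    rng.shuffle(allocation)
    # Serial code numbers carry no information about the treatment itself.
    labels = {sid: code for code, sid in enumerate(subject_ids, start=1)}
    key = {code: trt for code, trt in enumerate(allocation, start=1)}
    return labels, key

# Hypothetical subjects S01..S20
labels, key = make_blinding_code([f"S{i:02d}" for i in range(1, 21)], seed=1)
```

The point of returning two separate mappings is that `key` can be handed to the person who keeps away from the experiment until all the data have been collected.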

Suppose now that subjects have been assigned randomly to a treated group and to a concurrent control group, and that proper blind precautions have been taken. The validity of the outcome in such an experimental design will then depend very largely upon the practical equivalence of the two groups. Even though the subjects were assigned randomly, the groups might still differ accidentally in important ways, especially if there are but few subjects per group. A difference in outcome might then only reflect some chance difference between subjects in the two groups, whereas it would be falsely interpreted as a treatment effect. This difficulty can be overcome by using a balanced design. We would perform the experiment in two parts, first using one group as the concurrent control, then reversing the roles of the two groups in what is known as a crossover. The effect of any intrinsic differences between the groups would thereby be minimized.

A crossover experiment for testing the effect of a drug on blood glucose in human subjects might be conducted as follows. Subjects reporting for the experiment would be assigned to group A or group B by means of the random number table. Group A would receive placebo on the first day, drug on the second; group B would receive drug on the first day, placebo on the second. Under these conditions, provided the proper blind precautions were taken, a lowering of blood glucose which occurred in both groups when drug was given and in neither group when placebo was administered could be taken seriously as evidence of a treatment effect.

If more than two groups are involved, the crossover principle becomes more elaborate. The Latin square may be used to ensure that each group receives every treatment in a systematic fashion. Latin squares were introduced originally for the purpose of subdividing plots of land for agricultural experiments, so that treatments could be tested even though different parts of a field had various soil conditions. In a Latin square, letters of the alphabet are placed in such a way that each letter appears once and only once in every row and in every column. Suppose we wish to test four different treatments (including a control) on the spontaneous activity of mice, in a balanced design. We assign the desired number of mice randomly to each group A, B, C, and D. We then choose a 4 x 4 Latin square, such as the one below, and assign meanings to the columns and rows, as indicated.²

Treatment    Day 1    Day 2    Day 3    Day 4
    1          A        B        C        D
    2          B        C        D        A
    3          C        D        A        B
    4          D        A        B        C

The square provides the specifications for carrying out the desired experiment on four successive days. Each group of mice will receive each treatment once, and all groups will be treated each day. We shall see later how easy it is to analyze the data obtained in balanced experiments. Here we could ascertain not only if the treatments differ from each other and from the control, regardless of the day of treatment, but also if the spontaneous activity of the mice differs from day to day regardless of treatment, and finally if treatment effects depend in any way upon the day of treatment.
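The defining property of a Latin square (each letter once and only once in every row and every column) is easy to check mechanically. The following Python sketch builds a cyclic square for the four groups and verifies that property. The function names are hypothetical. The row-and-column shuffle is a simple stand-in for random selection, and it does not draw uniformly from a catalogue of all Latin squares.

```python
import random

def cyclic_latin_square(symbols):
    """Each row is the previous row shifted one place to the left."""
    n = len(symbols)
    return [[symbols[(r + c) % n] for c in range(n)] for r in range(n)]

def is_latin_square(square):
    """True if every symbol appears exactly once in each row and each column."""
    n = len(square)
    symbols = set(square[0])
    rows_ok = all(len(row) == n and set(row) == symbols for row in square)
    cols_ok = all({square[r][c] for r in range(n)} == symbols for c in range(n))
    return len(symbols) == n and rows_ok and cols_ok

def randomized_latin_square(symbols, seed=None):
    """Permute the rows and columns of the cyclic square (the result is still Latin)."""
    rng = random.Random(seed)
    square = cyclic_latin_square(list(symbols))
    rng.shuffle(square)                      # permute rows
    cols = list(range(len(square)))
    rng.shuffle(cols)                        # permute columns
    return [[row[c] for c in cols] for row in square]

square = cyclic_latin_square("ABCD")         # rows: treatments, columns: days
for row in square:
    print(" ".join(row))
```

Permuting whole rows or whole columns cannot put a letter twice into any row or column, which is why the shuffled square passes the same check.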

The Latin square or equivalent balancing procedures may be useful during the course of an experiment as well as in its initial planning. The principle of maintaining similar conditions for all groups throughout an experiment requires that even unforeseen influences must be prevented from acting preferentially upon any particular group. Every investigator can recall some unfortunate experience that taught him this particular lesson. The following example is fairly typical. An experiment to ascertain whether a particular extract had antibacterial activity required the incubation of agar plates containing bacteria and extract (treatment group) and of similar plates from which the extract was omitted (control group). The bacteria failed to grow on treatment plates and grew well on control plates. A follow-up experiment employed serial dilutions of the extract to estimate its antibacterial potency. Again, although control plates grew normally, no growth occurred on treatment plates, even after millionfold dilution. It finally developed, however, that no bacteria grew on "treatment" plates, even when no extract was added. A temperature gradient from right to left in a defective incubator was wholly responsible. Control plates had been regularly placed at the left, where the thermometer was located, and where the temperature was correct. Treatment plates had always been placed to the right, where the temperature was too high to support any bacterial growth. A systematic alternation in the placement of control and treatment plates would have revealed the true situation at once.

² There are many different Latin squares of each size. One of these may be selected randomly from the collection catalogued in R. A. Fisher and F. Yates, Statistical Tables for Biological, Agricultural and Medical Research, 4th ed. (New York: Hafner Publishing Company Inc., 1953), Table XV and pp. 18ff.

The placement of animal cages in a room might also seem unworthy of special planning. Yet light, temperature, noise, vibration are among the many conditions that can vary from place to place in an animal room. Systematic placement of cages according to a Latin square plan helps equalize extraneous influences. For the same reasons the order in which procedures are carried out with various experimental groups should be varied systematically, in order to balance out any possible influence on the outcome.

There are several variations on the balancing principle, some of which are used very frequently. One such design is the randomized block. Often one wishes to provide as broad a basis as possible for generalization of an experimental conclusion. For example, it may be worthwhile to examine treatment effects in several strains of animal rather than limiting the experiment to a single strain. In that case one has to choose between randomly mixing animals of different strains, and keeping each strain as a distinct group. The latter is by far the better procedure, for without losing any information about effects on all the animals taken collectively, information may be gained about strain-specific differences in the effects. Suppose one wished to test a number of chemicals for their ability to produce fetal malformations when administered to pregnant rats. The entire experiment will be replicated in several "blocks," each block consisting of animals belonging to a single strain. The assignments of chemicals are made randomly within each block. If there were four strains (W, X, Y, Z) and five chemicals (A, B, C, D, E) under test, and 60 animals in all, then each chemical would be tested in three animals of every strain. The data would be observations on the number of malformed young in each litter, and they could be tabulated on the following grid:

                          Chemicals
Strains      A      B      C      D      E      Total
   W
   X
   Y
   Z
 Total

Each box of the table will contain three observations, there will be 12 observations on each chemical, and 15 observations on each strain. Such a design is efficient because it answers a number of questions in a single experiment, and it tends to improve the reliability of the results because differences between strains can be taken into account in assessing the reality of any differences between chemicals.
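The allocation for the randomized block example can be sketched as follows. The animal identifiers and the function name are hypothetical. The only facts taken from the text are the four strains, the five chemicals, the 60 animals, and the rule that randomization is carried out separately within each block.

```python
import random
from collections import Counter

def randomized_block_assignment(animals_by_strain, chemicals, seed=None):
    """Assign chemicals at random within each strain (block), in equal numbers."""
    rng = random.Random(seed)
    assignment = {}
    for strain, animals in animals_by_strain.items():
        if len(animals) % len(chemicals) != 0:
            raise ValueError("block size must be a multiple of the number of chemicals")
        reps = len(animals) // len(chemicals)
        treatments = list(chemicals) * reps
        rng.shuffle(treatments)              # randomization within this block only
        for animal, chemical in zip(animals, treatments):
            assignment[animal] = (strain, chemical)
    return assignment

strains = "WXYZ"
chemicals = "ABCDE"
# Hypothetical identifiers: 15 animals per strain, 60 in all.
animals_by_strain = {s: [f"{s}{i:02d}" for i in range(1, 16)] for s in strains}
assignment = randomized_block_assignment(animals_by_strain, chemicals, seed=0)
cell_counts = Counter(assignment.values())   # (strain, chemical) -> number of animals
```

Counting the cells reproduces the margins stated in the text: three animals per box, 12 per chemical, and 15 per strain.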

In a factorial design combinations of treatments are examined. Several levels of one factor are criss-crossed with several levels of another. For example, an anticonvulsant drug (factor A) might be studied at four different doses (including a zero-dose control) in animals made to convulse by three different procedures (factor B). The available animals may then be divided equally and randomly among the twelve combinations of the factors, as in the following grid:

                              Factor A
                     Doses of Anticonvulsant Drug
Factor B
Convulsant Procedure      1      2      3      Total
        a
        b
        c
      Total

Here each box will contain one or more animals, and the design ensures that each level of factor A is tested in combination with every level of factor B, and vice versa. Factorial designs may permit one to assess in a single experiment not only the primary effects of the levels of each factor independently, but also the joint effects of the combinations.
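A factorial layout is simply the full set of combinations of the two factors, with animals divided equally and randomly among them. The sketch below illustrates this for four doses and three procedures. The dose labels, the animal identifiers, and the total of 24 animals are hypothetical values chosen only for illustration.

```python
import itertools
import random
from collections import Counter

def factorial_assignment(animals, levels_a, levels_b, seed=None):
    """Divide animals equally and randomly among all combinations of two factors."""
    cells = list(itertools.product(levels_a, levels_b))
    if len(animals) % len(cells) != 0:
        raise ValueError("number of animals must be a multiple of the number of cells")
    allocation = cells * (len(animals) // len(cells))
    rng = random.Random(seed)
    rng.shuffle(allocation)
    return dict(zip(animals, allocation))

doses = (0, 1, 2, 3)            # hypothetical labels; 0 stands for the zero-dose control
procedures = ("a", "b", "c")
animals = [f"animal_{i:02d}" for i in range(1, 25)]   # hypothetical: 24 animals
plan = factorial_assignment(animals, doses, procedures, seed=0)
per_cell = Counter(plan.values())
```

With 24 animals and twelve combinations, every box of the grid receives two animals.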

A nested (or hierarchical) design is one in which the factors do not criss-cross, but rather are present in various tiers, each contained within a higher one. An example would be a comparison of the accuracy and precision of blood-cell counting in several clinical laboratories. Suppose portions of the same blood were coded appropriately and then submitted to three laboratories, each employing a number of technicians. It might be of interest to know to what extent the counts obtained in the different laboratories agree, whether or not the results obtained by different technicians in the same laboratory differ in any systematic way, and also whether or not each single technician obtains a reproducible count in replicate trials with the same blood. Here the lowest tier contains "replicates within technicians," the next contains "technicians within laboratories," and the highest contains the three laboratories. The data might be five replicate counts by each technician, and they would fall into the following hierarchical tabulation:

[Hierarchical tabulation. Tiers, from highest to lowest: Laboratories (A, B, C); Technicians (a, b, c, ...) within each laboratory; Replicates (five counts) within each technician.]

The methods of analyzing data in these and other types of experimental design will be described in Chap. 2. The designs described here are merely the simplest and most commonly encountered, but much more elaborate ones are also used. Detailed information about this subject may be found in the references beginning at page 192.

Whether or not the reader ever has to set up an experiment embodying the design principles outlined here, he will certainly be called upon to read and interpret published reports of investigations in which statistical methods were used. It cannot be emphasized too strongly that statistical procedures yield valid conclusions only for adequate experiments. One of the first steps in evaluating a report of an experiment is therefore to satisfy oneself that the principles of preliminary randomization and subsequent control were followed. Only then is it appropriate to inquire whether or not treatment had any effect.

SAMPLING DISTRIBUTIONS

Experimental observations in all fields of science are to some extent variable, but variability is especially prominent in biological experimentation. Any single experiment is necessarily of finite scope, yielding a limited sample of data. If the same experiment were repeated, a somewhat different set of data would generally be obtained, so we cannot attach any special importance to a particular sample. Rather do we regard each sample of data as having been drawn randomly from an infinitely large collection of similar data that happen not to have been included in the sample. This hypothetical infinitude of data, of which the sample is representative, we call a population.

Data that characterize a sample are known as statistics. An example of a statistic is the mean³ DNA content of 100 randomly chosen unfertilized sea-urchin eggs. Another statistic is the number of mice in a group of 25 which are paralyzed by a given dose of drug. Now it is almost always the case that we are interested in populations, not in samples. This is merely another way of saying that experimental observations are useful to the extent they have general relevance. What we wish to know is the DNA content of unfertilized sea-urchin eggs in general, not the content of the eggs chosen for a particular experiment. Likewise we are interested in what a drug does to mice in general, not to a particular group of 25 mice.

Numbers that characterize a population are known as parameters. Parameters corresponding to the sample statistics just cited would be the mean DNA content of all unfertilized sea-urchin eggs, and the percent of all mice that the given dose of drug would paralyze. The only way we can obtain information about populations is to make observations on samples. Sample statistics are then used as estimators of the corresponding population parameters. That is why the procedures of randomization and control discussed earlier are so important; they ensure that the samples we deal with will be truly representative of the populations from which they were drawn and therefore that the statistics we obtain will be fair estimators of the parameters in which we are really interested.

³ A mean is an ordinary arithmetical average. There are other kinds of averages, to be defined later.

Suppose that the true mean weight of all students of a given age is 165 lb. We shall weigh randomly chosen groups of students, compute the mean weight in each group, and see how well these sample means estimate the true mean. Let us begin with the smallest possible group, a sample of one (N = 1), namely the individual student. Each such weight is an unbiased estimate of the parameter, 165 lb, because it is just as likely to be high as low, and the long-term average of the estimates will approach the value of the parameter in question. But obviously the weight of an individual student is not likely, except occasionally, to be 165 or even very close to 165. Now consider groups of ten students (N = 10). The mean weight in each sample is again an unbiased estimate of the population mean because in the long run the fluctuations above and below 165 will balance each other. Each sample of 10 is almost certain to contain weights above and below 165, so the mean of such a sample is likely to be closer to 165 than was the weight of a randomly selected individual student. It should therefore be evident that a statistic estimates the corresponding parameter ever more accurately as the sample size increases. At the extreme, if a sample approached the size of the population (N → ∞), the sample statistic would become indistinguishable from the population parameter.
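The effect of sample size can be seen in a small simulation. The sketch below draws many samples of size 1, 10, and 100 and reports how widely the sample means scatter about 165 lb. The normal shape of the population and the standard deviation of 15 lb are assumptions made only for this illustration, since the text specifies the mean alone.

```python
import random
import statistics

TRUE_MEAN = 165.0    # lb, the parameter given in the text
ASSUMED_SD = 15.0    # lb, an assumption made only for this illustration

def simulate_sample_means(n, trials=5000, seed=0):
    """Draw `trials` random samples of size n and return their means."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.gauss(TRUE_MEAN, ASSUMED_SD) for _ in range(n))
        for _ in range(trials)
    ]

for n in (1, 10, 100):
    means = simulate_sample_means(n)
    # Average of the sample means (close to 165) and their spread (shrinks as n grows).
    print(n, round(statistics.fmean(means), 1), round(statistics.stdev(means), 2))
```

For every sample size the average of the sample means stays near 165, which is what is meant by an unbiased estimate. Only the spread changes.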

Besides sample size, the amount of variability in a population also determines how accurately a statistic will estimate the corresponding parameter. Clearly, if no student's weight differed from 165 lb by more than a pound or two, then even statistics from very small samples would be pretty good estimators. For any specified sample size, the statistic will better estimate the parameter, the smaller is the variability in the population from which the sample was drawn.

These generalizations about statistics and parameters are true, but they are not precise enough to be useful. In each instance we need to know just how well a statistic estimates its corresponding parameter. Suppose, for example, that we want to know the mean blood glucose concentration in young adults after an overnight fast. We could randomly select a convenient number of subjects (let us say 20), draw blood, and determine their blood glucose concentrations. The mean concentration in the sample will then be an unbiased estimate of the true blood glucose concentration of all similar people under the same conditions. The variability of the sample data will also tell us something about how variable blood glucose levels are in the population. If it should happen that all the sample values are in the range 0.90-1.10 mg/ml, then probably most of the population values are also in this range. We can imagine then that the population mean might well be something like 0.93 or 1.02, but we would be very reluctant to believe it could be 1.37. The general line of reasoning can be grasped intuitively. If the population mean were 1.37, there would have to be values above as well as below this. Under such circumstances it would be very, very improbable that we should draw a random sample of 20 subjects, all of whose blood glucose concentrations were lower than 1.10. Since the truth is that we did draw such a sample, we reject the hypothesis that the population mean is as high as 1.37. If we consider, one by one, a series of such hypotheses about the population mean (that it is 1.36, 1.35, 1.34, and so on) we will eventually come into a region of acceptable hypotheses. For example, the hypothetical value 1.09 would probably be regarded as acceptable, as would other values in the range covered by the sample data. At still lower values we would again enter a region of rejected hypotheses. The process of inferring, from the properties of a sample, within what range a parameter probably lies is known as estimating a confidence interval (or confidence limits) for that parameter.
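The phrase "very, very improbable" can be given a rough numerical meaning. Suppose, as an added assumption not made in the text, that the population is symmetrical, so that half of its values lie below the mean. If the mean were 1.37, each randomly drawn value would fall below 1.37 with probability 0.5, and the chance that all 20 independent values do so is 0.5 multiplied by itself 20 times. The chance that all 20 fall below 1.10 can only be smaller still.

```python
def prob_all_below(n, p_below=0.5):
    """Probability that n independent draws all fall below a value,
    when each single draw does so with probability p_below."""
    return p_below ** n

# Upper bound on the chance of 20 values all below a hypothesized mean of 1.37,
# assuming a symmetrical population (half of all values below the mean).
bound = prob_all_below(20)
print(f"{bound:.2e}")   # 9.54e-07
```

A probability of less than one in a million is the sense in which the hypothesis of a mean as high as 1.37 is rejected.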

Assume now that we somehow know what the mean blood glucose concentration is in a control population, and we conduct an experiment to ascertain whether or not a drug lowers this concentration. We shall obtain data from a treated group of subjects and compute the sample mean. Using these sample data we can then estimate a confidence interval for the mean of the treated population. Now if this entire interval, representing the range of probable values of the true mean after treatment, lies below the known control mean, then we can conclude that the drug has a significant effect in reducing the blood glucose concentration. On the other hand, if the confidence interval for the mean of the treated population includes the control mean, we would be unable to conclude that the drug had any effect, since the sample data are not inconsistent with the hypothesis that the treated sample was drawn from the control population, and that the observed sample mean is a reasonable estimate of the known control mean.

The foregoing discussion assumed a priori knowledge about a parameter. We rarely have such knowledge. Often we have two experimental groups, control and treated, and we want to know whether or not treatment has had an effect. This is tantamount to asking whether or not both samples could reasonably have been drawn from the same (untreated) population. Of course, we expect the sample means to differ somewhat even if both samples did come from the same population. The real question is whether the difference between the two sample means is so large that we feel compelled to reject the hypothesis that they are estimates of the same parameter. Here the required confidence interval is for a difference between two parameters. Suppose the mean blood glucose concentration in the control sample is 1.00 and in the treated sample 0.89. There is an apparent difference of -0.11, which is our best estimate of the true difference. The confidence interval for the true difference, however, might be (for example) -0.26 to +0.04. Since a zero difference is included in these confidence limits, we cannot assert that the drug had any effect, because we cannot be certain there was any real difference between control and treated blood glucose concentrations. Yet neither can we deny that the drug might have had some effect. The confidence interval gives the probable limits of magnitude of the effect; i.e., if there was any lowering of blood sugar, it was probably no greater than 0.26 mg/ml, and there might even have been a small rise of blood sugar, no greater than 0.04 mg/ml, which the sample accidentally failed to detect. On the other hand, if the confidence interval had not included zero, we could have asserted the efficacy of the drug and stated the probable quantitative limits of its effectiveness.

In discussing the meaning of confidence limits and the rationale for deciding whether or not observed effects are to be taken seriously, we made use of rather vague words like "probably" and "reasonably." We said, for example, that if a population mean lay outside a certain interval, it was not "probable" that we would have observed a particular sample mean. To get at a precise definition of such terms we need concrete information about the probability of drawing various sample statistics by chance from populations with specified parameters.

The probability of an event is the long-term frequency of occurrence of that event, relative to all alternative events. It is expressed as a decimal fraction between zero (the event never occurs) and unity (the event always occurs and no alternative event ever occurs). Sometimes a probability is known a priori, as in penny-tossing, where we know that in the long run heads and tails will each fall with relative frequency 0.5, and this is therefore the probability that any particular toss will produce heads. Sometimes a probability can only be estimated empirically by observing the relative frequency toward which the results of a great many trials converge.

A sampling distribution is a graph showing the probabilities of obtaining all possible statistics in samples drawn randomly from a specified population. Figure 1-1 is a histogram depicting the expected sampling distribution for throws of two dice, computed a priori on the assumption that each face has equal opportunity to be uppermost. The eleven discrete possible outcomes are given on the horizontal axis, the probability of obtaining each is shown on the vertical axis. Figure 1-2 shows an expected sampling distribution of mean weights in samples of 10 students from a hypothetical population whose true mean is 165 lb. In order to plot this sampling distribution an assumption also had to be made about the variability in the population. In contrast to Fig. 1-1, this sampling distribution is a continuous curve, since here sample means are not limited to discrete integer values but may assume any intermediate values as well. In the next chapter the origins of such sampling distributions as are depicted in these figures will be considered more closely. For the present we may observe that a sampling distribution shows how likely it is that a randomly drawn sample statistic will deviate to any given extent from the parameter it estimates. Thus, if we have a particular sample statistic in hand, the sampling distribution will tell us the exact probability of its having been obtained randomly from a hypothetical population with a given parameter. That probability, as we shall see, then becomes a basis for judgment as to whether or not the sample was actually drawn from the hypothetical population.

Figure 1-1. Sampling distribution for throws of two dice. (Horizontal axis: sum of the numbers on both dice.)

Figure 1-2. Sampling distribution for mean weights in samples of 10 from a hypothetical population with mean 165 lb. (Horizontal axis: mean weight from the sample of 10.)

Example 1-2. Probability of an outcome.

In the sampling distribution shown in Fig. 1-1, estimate the probability of throwing 4 or less with two dice.

The total area under the histogram represents the sum of the probabilities for all possible outcomes, i.e., unity. The probability desired here is the fractional area to the left of 5. Since the bars of the histogram are of equal width, the desired area is proportional to the heights, 0.03 + 0.06 + 0.08 = 0.17.

Example 1-3. Probability of obtaining an extreme statistic.

Estimate from Fig. 1-2 the probability of drawing, from the hypothetical population, a random sample of 10 with mean weight outside the limits 160-170 lb.

Here it can be seen that about two-thirds of the area lies within the stated interval, about one-third outside. So the required probability is about 0.34. Since the curve is symmetrical, there is probability 0.17 of obtaining a sample mean less than 160, and 0.17 for obtaining one greater than 170. If we actually weighed many such samples, about two-thirds of the means would, in the long run, be between 160 and 170. It would certainly occasion no surprise, however, if a sample were selected randomly and its mean turned out to be 171, inasmuch as 17 of every 100 sample means will exceed 170.
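The a priori sampling distribution for two dice, and the answer to Example 1-2, can be verified by enumerating the 36 equally likely pairs of faces. The sketch below uses exact fractions, and the function name is hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def two_dice_distribution():
    """Exact probability of each possible sum of two fair dice."""
    counts = Counter(a + b for a, b in product(range(1, 7), repeat=2))
    return {total: Fraction(count, 36) for total, count in sorted(counts.items())}

dist = two_dice_distribution()
p_four_or_less = sum(p for total, p in dist.items() if total <= 4)
print(p_four_or_less)            # 1/6, i.e. about 0.17
```

The exact value, 6 chances in 36, agrees with the figure of 0.17 read from the heights of the bars.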

Example 1-4. Deviation corresponding to a given probability.

You would certainly be surprised if you chose a single random sample and then found that its mean deviated from the population mean by more than did 95 out of 100 similarly selected sample statistics. Approximately how low or how high would the sample mean have to be in order to occasion such surprise, assuming the sampling distribution of Fig. 1-2?

Here we wish to know the interval of mean weights that includes 95% of the total area under the curve. The correct answer is 155-175, so a sample mean outside this range would indeed surprise us. The answer could have been found by actually measuring the area but it is more readily obtained from a special table of areas that will be described in the next chapter.
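The areas read from Fig. 1-2 in Examples 1-3 and 1-4 can be approximated numerically if we assume that the curve is a normal distribution centered on 165 lb with a standard deviation of 5 lb. Both the normal shape and the value 5 are assumptions inferred from the statement that about two-thirds of the area lies between 160 and 170. Neither is stated in the text.

```python
from statistics import NormalDist

# Assumed sampling distribution of the mean weight in samples of 10.
sampling_dist = NormalDist(mu=165.0, sigma=5.0)

# Example 1-3: area outside the limits 160-170 lb.
p_outside = 1 - (sampling_dist.cdf(170) - sampling_dist.cdf(160))

# Example 1-4: limits enclosing the central 95% of the area.
low = sampling_dist.inv_cdf(0.025)
high = sampling_dist.inv_cdf(0.975)

print(round(p_outside, 2), round(low, 1), round(high, 1))   # 0.32 155.2 174.8
```

Under these assumptions the results are close to the values of about one-third and 155-175 estimated by eye from the figure.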

STATISTICAL HYPOTHESES AND DECISION RULES

Imagine that you are shown a penny, heads up, and are asked to decide, without examining it, whether it is an ordinary penny or a two-headed one. You are permitted to see the outcome of as many tosses as you like. What is the best way to proceed? Naturally, if it is an ordinary penny it will fall tails sooner or later, so the simple answer is to keep tossing it as long as possible. If tails appears, the problem is solved. If tails does not appear, however, you will have to decide sooner or later that the penny is two-headed. No matter when you make this decision, you may be wrong; the very next toss might conceivably have fallen tails. But obviously, the longer you wait the more certain you will be of making the right decision.

The problem would be more interesting, and somewhat more realistic as an analogy to experimental situations, if there were some cost attached to each toss, and some penalty for a wrong decision. Your course of action would then certainly be determined by these new contingencies. If the cost of continued tossing was low, and the penalty was high for wrongly concluding the penny was two-headed, you would wait to see a long succession of heads before arriving at any conclusion. If, on the other hand, the cost of tossing was high and the penalty for a wrong decision was low, you would terminate the series early, perhaps after a very few tosses.

In this case, since a priori probabilities for the behavior of a true penny are known, it is easy for us to arrive at decision rules. The first step is to formulate a hypothesis which is to be accepted or rejected on the basis of sample data. Here the hypothesis to be tested is that the penny has two different faces, heads and tails; the alternative is that the penny is two-headed. The next step is to decide how often, in the long run, we are willing to reject the hypothesis wrongly; i.e., what probability we are willing to accept for calling the penny two-headed when it is not. This probability is known as the level of significance⁴ at which the hypothesis will be rejected; it is designated by the letter P. As we have noted already, both cost and penalty will influence the choice of the level of significance.

Suppose we are willing to be wrong as often as five times in every hundred trials, but no more often ($P \le 0.05$).⁵ Then we can reject the hypothesis if and only if a sample outcome is observed which would have been expected with probability 0.05 or less, were the hypothesis in fact true. Let us consider the probability that the first tails will appear at any specified toss of a true penny. Obviously this is $1/2$ for the 1st toss. For the first tails to appear at the 2nd toss, heads is required at the 1st (probability $1/2$) and then tails at the 2nd (probability $1/2$), giving a combined probability $(1/2)(1/2) = (1/2)^2$ for both required events. Similarly the probabilities of the first tails appearing at the 3rd, 4th, 5th, and higher tosses are found to be $(1/2)^3$, $(1/2)^4$, $(1/2)^5$, etc. These probabilities are plotted as a histogram in Fig. 1-3.

Now the probability that the first tails will not appear sooner than the 5th toss is given by unity less the sum of the probabilities for its appearance on the 1st, 2nd, 3rd, and 4th tosses; or (which is the same thing) by the sum of the probabilities for its appearance on the 5th, 6th, and all higher tosses; or (which is also the same thing) by the area contained in the histograms at, and to the right of, the 5th toss, relative to the total area of all the histograms. This probability is readily found to be

$$1 - \left[(1/2) + (1/2)^2 + (1/2)^3 + (1/2)^4\right] = 0.0625$$

which exceeds our desired level of significance. The shaded area, on the other hand, including the 6th and higher tosses, represents a total probability 0.03125, well below the desired level. Thus, since the probability of getting four heads in a row, with tails appearing no sooner than the 5th toss, exceeds 0.05, we would not consider that outcome incompatible with the hypothesis of a true penny; whereas getting five heads in a row would happen rarely enough with a true penny to make us reject the hypothesis. The decision rule must therefore be: Accept the hypothesis if any tail appears, but reject it if no tail appears within five tosses. For two-headed pennies this rule will be free of error, but the honesty of ordinary pennies will be impugned wrongly about 3 times in every 100 trials.

4. Rejecting the hypothesis when it is true is referred to as a Type I error. The probability (designated by the Greek letter α) of committing a Type I error is obviously the same as the level of significance (P). The two symbols will be used interchangeably.

5. > means "greater than"; ≤ means "equal to or less than."

Figure 1-3. Expected outcomes of penny-tossing with a true penny.
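The tail probabilities used in the decision rule can be checked with a few lines of Python. This is only a sketch: the function names `p_first_tail_at` and `p_no_tail_before` are invented here for illustration and do not come from the text.

```python
from fractions import Fraction


def p_first_tail_at(k: int) -> Fraction:
    """Probability that a true penny shows its first tail on toss k."""
    return Fraction(1, 2) ** k


def p_no_tail_before(k: int) -> Fraction:
    """Probability that the first tail does not appear sooner than toss k,
    i.e. that the first k - 1 tosses are all heads."""
    return 1 - sum(p_first_tail_at(i) for i in range(1, k))


p_5th_or_later = p_no_tail_before(5)  # four heads in a row
p_6th_or_later = p_no_tail_before(6)  # five heads in a row
print(float(p_5th_or_later), float(p_6th_or_later))
```

Exact fractions are used so that the comparison with the 0.05 level does not depend on floating-point rounding.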


68

QUANTITATIVE DATA

For the within-samples SS:

$$\sum (x - \bar{x}_m)^2 = \sum x^2 - 2\sum x\,\bar{x}_m + \sum \bar{x}_m^2 = \sum x^2 - 2\sum \frac{T_m^2}{N_m} + \sum \frac{T_m^2}{N_m} = \sum x^2 - \sum \frac{T_m^2}{N_m}$$

In practice, the within-samples SS is usually not calculated, but taken as the difference between the total SS and the between-samples SS. Note how, algebraically, the components of variation add up to the total SS.

Between-samples    $\sum \frac{T_m^2}{N_m} - \frac{T^2}{N}$
Within-samples     $\sum x^2 - \sum \frac{T_m^2}{N_m}$
Total              $\sum x^2 - \frac{T^2}{N}$

It should be clear from the foregoing that the only terms required for an analysis of variance are $\sum x^2$, $\sum T_m^2/N_m$, and $T^2/N$.

Means do not have to be calculated; they are implicit in the procedures. In more complicated analyses, when samples are grouped according to more than a single criterion, one has to calculate a different between-samples SS for each grouping of samples, but otherwise the procedure is the same.

Example 2-14 may now be worked by the direct method. A convenient and systematic procedure is first to prepare a preliminary table of total squares.¹⁶ Here we have:

Grand total     $T^2 = (-20)^2 = 400$
Samples         $\sum T_m^2 = (-22)^2 + (2)^2 = 488$
Observations    $\sum x^2 = (-7)^2 + (-4)^2 + \cdots + (-1)^2 = 148$

PRELIMINARY CALCULATIONS

Type of         Total of    Number of        Number of Observations    Total of Squares
Total           Squares     Items Squared    per Squared Item          per Observation
Grand             400            1                  10                      40.0
Samples           488            2                   5                      97.6
Observations      148           10                   1                     148.0

From the preliminary table, we then compose the SS, as explained.

ANALYSIS OF VARIANCE

Source             SS                        DF    Variance Estimate
Sexes (samples)    97.6 - 40.0 =  57.6        1         57.6
Error              148  - 97.6 =  50.4        8          6.3
Total              148  - 40.0 = 108.0

F = 9.14

The F-test in a two-sample comparison is really identical to a t-test. Indeed, F with 1, (N - 1) DF is numerically identical to $t^2$ with (N - 1) DF. Thus, if Example 2-14 is worked as a t-test, one finds t = 3.02, which is the square root of 9.14. Of course, examples of this type can also be worked by the nonparametric two-sample rank test. The usefulness of the analysis of variance becomes evident in more complex experimental designs, as illustrated by the following examples.

16. This is the step-by-step method suggested by J. C. R. Li. As one becomes more familiar with the calculations, some of the preliminary steps can be omitted.

Example 2-15. Analysis of Variance: One-way classification, many-sample comparison.

The progress of wound-healing was compared when five different postoperative regimens were employed after abdominal surgery. The 30 patients in the study were randomly assigned. The numbers below are coded data on duration of the wound-healing period.

                 Postoperative Regimen
             A       B       C       D       E
             3       4       2       6       8
             5       7       3       3       2
             5       6       4       5       4
             2       6       3       5       5
             4       9       3       2       6
             5       7       5       4       6
$T_m$       24      39      20      25      31      T = 139
$(T_m)^2$  576   1,521     400     625     961      $T^2$ = 19,321
                                                    $\sum x^2$ = 739

PRELIMINARY CALCULATIONS

Type of         Total of    Number of        Number of Observations    Total of Squares
Total           Squares     Items Squared    per Squared Item          per Observation
Grand           19,321           1                  30                     644.0
Regimens         4,083           5                   6                     680.5
Observations       739          30                   1                     739

ANALYSIS OF VARIANCE

Source       SS                        DF    Variance Estimate (Mean Square)
Regimens     680.5 - 644   = 36.5       4         9.13
Error        739   - 680.5 = 58.5      25         2.34
Total        739   - 644   = 95        29

F = 9.13/2.34 = 3.90 (4, 25 DF) and this exceeds the critical value 2.76 for P = 0.05, so we conclude that the regimens do indeed differ. The analysis of variance itself, however, does not permit us to decide which pairs of regimens differ significantly from each other and which do not.
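The direct method of Example 2-15 can be sketched in Python as follows. The assignment of the observations to regimens A through E is the one recovered from the printed sample totals (24, 39, 20, 25, 31); the function name `one_way_anova` is an illustrative choice, not something defined in the text.

```python
# Coded wound-healing durations from Example 2-15 (six patients per regimen).
regimens = {
    "A": [3, 5, 5, 2, 4, 5],
    "B": [4, 7, 6, 6, 9, 7],
    "C": [2, 3, 4, 3, 3, 5],
    "D": [6, 3, 5, 5, 2, 4],
    "E": [8, 2, 4, 5, 6, 6],
}


def one_way_anova(samples):
    """Return (between SS, within SS, DF between, DF within, F) by the direct method."""
    all_x = [x for xs in samples.values() for x in xs]
    n_total = len(all_x)
    grand = sum(all_x) ** 2 / n_total                                    # T^2 / N
    per_sample = sum(sum(xs) ** 2 / len(xs) for xs in samples.values())  # sum of T_m^2 / N_m
    per_obs = sum(x * x for x in all_x)                                  # sum of x^2
    ss_between = per_sample - grand
    ss_within = per_obs - per_sample
    df_between = len(samples) - 1
    df_within = n_total - len(samples)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return ss_between, ss_within, df_between, df_within, f_ratio


ss_b, ss_w, df_b, df_w, f_ratio = one_way_anova(regimens)
print(f"between SS = {ss_b:.1f}, within SS = {ss_w:.1f}, F({df_b}, {df_w}) = {f_ratio:.2f}")
```

Only the three totals of squares per observation are needed, as the text points out; no sample mean is computed anywhere in the function.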

For making simultaneous comparisons between several different means, as we wished to do in Example 2-15, the studentized range test is used.¹⁷ Consulting Table 8, for the number of samples being compared and the within-samples degrees of freedom (i.e., the number of samples times one less than the number of items per sample) we obtain a preliminary factor k*, which is then multiplied by a standard error term to obtain a minimum significant range (k), against which the actual ranges between means may be compared. The procedure is very similar to that for obtaining a confidence interval (p. 45).

$$k = k^{*}\sqrt{\frac{V_e}{N_m}}$$

where k* is taken from Table 8, $V_e$ is the error-variance estimate from the analysis of variance, and $N_m$ is the number of observations per sample. All samples should be of equal size.

Example 2-16. Studentized range test.

Apply the test to the data of Example 2-15 to ascertain which regimens differ from each other at the 5% significance level. Here we consult Table 8 with 5 means and 25 DF [i.e., $(N_m - 1)$ DF per sample] and interpolate k* = 4.16. From Example 2-15, $V_e$ = 2.34. Then, since each sample contains 6 observations,

$$k = 4.16\sqrt{\frac{2.34}{6}} = 2.60$$

the smallest significant range at the 5% level of significance. Now we find each mean as $T_m/6$ and arrange them in order of magnitude:

C       A       D       E       B
3.33    4.00    4.17    5.17    6.50

It is then apparent that of all the contrasts between pairs of means, only B and C differ by an amount as great as k. We may therefore conclude that regimen C is superior to B, but we cannot assert that it is superior to A, D, or E, or that there are any other real differences.

17. "Studentized" refers to the tabulation of a statistical distribution based on a variance estimate, v, derived from a sample, when σ is unknown. Such procedures were introduced by W. S. Gosset, under the pseudonym "Student." The t-test, which is also based on a small-sample variance estimate, is often called "Student's t-test." Although we shall illustrate the studentized range test in Example 2-16 upon the same data as was analyzed by the F-test in Example 2-15, it should be understood that the two tests are alternatives, not ordinarily applied sequentially to the same data.
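The comparison of Example 2-16 can be sketched in a few lines. The factor k* = 4.16 is taken as given (it comes from the book's Table 8 and is not computed here), and the helper name `smallest_significant_range` is an assumption made for this illustration.

```python
import math
from itertools import combinations


def smallest_significant_range(k_star, error_variance, n_per_sample):
    """k = k* * sqrt(V_e / N_m), valid for samples of equal size."""
    return k_star * math.sqrt(error_variance / n_per_sample)


# Sample totals and error variance quoted in Examples 2-15 and 2-16.
totals = {"A": 24, "B": 39, "C": 20, "D": 25, "E": 31}
n_m = 6
means = {name: t / n_m for name, t in totals.items()}
k = smallest_significant_range(4.16, 2.34, n_m)

# Keep every pair of regimens whose means differ by at least k.
significant_pairs = [
    (a, b) for a, b in combinations(sorted(means), 2)
    if abs(means[a] - means[b]) >= k
]
print(round(k, 2), significant_pairs)
```

Every pair is tested against the same range k, which is why the samples must all be of the same size.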

In this particular example the studentized range test allowed us to conclude that the extreme samples differed significantly. It may happen that even though analysis of variance permits the conclusion that a set of means did not come from the same population, the studentized range test nevertheless does not reveal any particular pair of means that can be said to differ. This is no more surprising than any case in which a particular test may not be efficient enough to detect a real difference of given magnitude. In the following example, on the other hand, the studentized range test makes possible several discriminations among the sample means.

Example 2-17. Analysis of variance: One-way classification; many-sample analysis of variance and studentized range test.

Five different media were compared to see if they differed in supporting the growth of mouse fibroblast cells in tissue culture. Five bottles were used with each medium, the same number of cells were implanted into all 25 bottles, and the total cell protein in each bottle was determined after 7 days. The results were as follows (μg of protein nitrogen):

                    Medium
      A       B       C       D       E
     100     101     107     100     119
     100     104     103      96     122
      99      98     105      99     114
     101     105     105     100     120
     100     102     106      99     121

The data are first coded by subtracting 100:

              A       B       C       D       E
              0       1       7       0      19
              0       4       3      -4      22
             -1      -2       5      -1      14
              1       5       5       0      20
              0       2       6      -1      21
$T_m$         0      10      26      -6      96      T = 126
$\bar{x}_m$   0     2.0     5.2    -1.2    19.2
$(T_m)^2$     0     100     676      36   9,216      $T^2$ = 15,876
                                                     $\sum x^2$ = 2,096

PRELIMINARY CALCULATIONS

Type of         Total of    Number of        Number of Observations    Total of Squares
Total           Squares     Items Squared    per Squared Item          per Observation
Grand           15,876           1                  25                     635.0
Media           10,028           5                   5                   2,005.6
Observations     2,096          25                   1                   2,096

ANALYSIS OF VARIANCE

Source     SS                              DF    Variance Estimate (Mean Square)
Media      2,005.6 - 635.0   = 1,370.6      4        342.6
Error      2,096   - 2,005.6 =    90.4     20          4.52
Total      2,096   - 635.0   = 1,461.0     24

F = 342.6/4.52 = 75.8 (4, 20 DF), which greatly exceeds the tabulated value 4.43 at P = 0.01, so the media differ very significantly.

Studentized range test: From Table 8, at P = 0.01, for 5 samples and 20 DF, k* = 5.29. Then

$$k = 5.29\sqrt{\frac{4.52}{5}} = 5.03$$

the smallest significant range at the 1% level of significance.

D       A       B       C       E
-1.2    0       2.0     5.2     19.2

We see that medium E is superior to all the others, since it exceeds even its closest neighbor by more than k. C is superior to A and D but not necessarily to B. No other comparisons are significant.

Example 2-18. Analysis of Variance: Randomized block design, 4 treatments, 5 replications.

Each observation is the weight of a mouse at 3 months after weaning.

                  Diet
Litter      A      B      C      D
   1       20     18     18     21
   2       19     17     20     23
   3       20     20     17     20
   4       22     21     16     23
   5       19     19     16     22

Code by subtracting 20:

                  Diet
Litter             A      B      C      D     $T_{litters}$   $(T_{litters})^2$
   1               0     -2     -2     +1         -3               9
   2              -1     -3      0     +3         -1               1
   3               0      0     -3      0         -3               9
   4              +2     +1     -4     +3         +2               4
   5              -1     -1     -4     +2         -4              16
$T_{diets}$        0     -5    -13     +9      T = -9
$(T_{diets})^2$    0     25    169     81      $T^2$ = 81

PRELIMINARY CALCULATIONS

Type of                   Total of    Number of        Number of Observations    Total of Squares
Total                     Squares     Items Squared    per Squared Item          per Observation
Grand                       81.0           1                  20                       4.05
Diets                      275             4                   5                      55.0
Replications (litters)      39.0           5                   4                       9.75
Observations                89.0          20                   1                      89.0

ANALYSIS OF VARIANCE

Source                    SS                                    DF     MS       F
Treatments (diets)        55.0 - 4.05 = 50.95                    3    16.98    7.20**
Replication (litters)     9.75 - 4.05 = 5.70                     4     1.42    N.S.
Error                     89.0 - 55.0 - 9.75 + 4.05 = 28.30     12     2.36
Total                     89.0 - 4.05 = 84.95                   19

The error SS is … all the others from … Litters do not differ from one another (since F is less than unity), but diets do, since 7.20 (3, 12 DF) exceeds the tabulated value 5.95 at the 1% significance level.¹⁸ Diet D seems to be best and diet C poorest. The studentized range test gives us more information about … 12 DF at P = 0.01, k* = 5.50. Then …

42

107

73

72

145

… the table are identified by letters for easier reference in the following calculations. Here we have four categories: placebo-delay, caffeine-delay, placebo-no-delay, caffeine-no-delay. It is readily ascertained that since all the marginal totals are fixed, once data are entered in any single box, the other boxes are filled automatically by subtraction. Thus, fourfold tables are characterized by 1 DF.

Since there are no a priori expectations, we begin by assuming that the drug and placebo results were drawn from the same population (null hypothesis). From the pooled data we can then estimate an expectation for each box of the table. The placebo group was 73/145 of the total, and 38 subjects altogether showed a delay. On the assumption that the drug and placebo do not differ, we should then expect the proportion of subjects showing a delay to be the same in both groups. Thus we expect in box a, (73/145) × 38 = 19.1. The remaining expectations can be filled in by subtraction from the marginal totals: b = 18.9, c = 53.9, d = 53.1.

We now calculate the contribution of each box to $\chi^2$ in the usual way, remembering (since there is only one degree of freedom) to make the Yates correction. In box a for example,

$$(E - O) = (19.1 - 8) = 11.1$$
$$(E - O)_c = 10.6$$
$$\frac{(E - O)_c^2}{E} = \frac{(10.6)^2}{19.1} = 5.9$$

The contributions of boxes b, c, d are similarly computed to be 5.9, 2.1, 2.1, and so $\chi^2 = 16.0$, P …
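The fourfold-table calculation can be sketched as below. The observed counts b = 30, c = 65 and d = 42 are not printed in this passage; they are inferred here from box a = 8 and the marginal totals quoted above (73 placebo subjects, 38 delays, 145 subjects in all), so they should be read as an assumption. The function name `yates_chi_square` is likewise illustrative.

```python
def yates_chi_square(a, b, c, d):
    """Chi-square (1 DF) for a fourfold table [[a, b], [c, d]] with the Yates correction."""
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    observed = ((a, b), (c, d))
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            # Yates correction: shrink each |O - E| by 0.5 before squaring.
            corrected = max(abs(observed[i][j] - expected) - 0.5, 0.0)
            chi2 += corrected ** 2 / expected
    return chi2


# a = placebo-delay, b = caffeine-delay, c = placebo-no-delay, d = caffeine-no-delay
chi2 = yates_chi_square(8, 30, 65, 42)
print(round(chi2, 2))
```

Carrying full precision gives a value slightly above the 16.0 obtained in the text, where each expectation and deviation was rounded to one decimal place before squaring.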

[Figure residue: horizontal axis, Micrograms Allyl Alcohol.*]

*J. K. Kodama and C. H. Hine, J. Pharmacol. Exptl. Therap. 124: 97 (1958).

assay of allyl alcohol. This choice is correct because here the regression line would be used for the prediction of a single observation. What we wish to know is the range of optical densities within which the reading will lie when we perform a single determination on a given amount of allyl alcohol. Again, however, the range ± S.D. gives a false impression, since it will only include two-thirds of the observations.

Moreover, this case presents a further complication. If the regression line is to be used for the assay of unknown amounts of allyl alcohol, the problem will be to estimate the unknown from a given optical density reading, and to attach a confidence interval to such an estimate. This problem of estimating x from y can only be handled properly by means of the procedures developed later in this chapter.

Of the various shortcuts for graphic representation of the reliability of correlation data, the most satisfactory is that illustrated in Fig. 4-4. Vertical lines are again used, but now the length of each line is made equal to the 95% confidence interval for the true mean y at that value of x. Obviously the true regression line, expressing the general relationship between y and x in the population, must pass through all (or nearly all) such individual confidence ranges. Actually this is a conservative procedure, since the confidence band formed by joining all the upper and lower extremities of the vertical lines is wider than the true 95% confidence band for the regression line as a whole. Thus in the experiment depicted in Fig. 4-4 there is little doubt that seizure threshold was increased significantly by increasing dosage of ethanol with one shock procedure, but not with the other.

Figure 4-4. Effects of increasing doses of ethanol on electroshock seizure threshold in mice. Two different shock procedures are represented by the two sets of points. Each point is the value determined in a group of 15-25 mice. Vertical lines indicate 95% confidence limits. Both the dose scale and the threshold ratio scale are logarithmic. (Adapted from McQuarrie and Fingl.*) [Horizontal axis: Dose, mg/kg.]

*D. G. McQuarrie and E. Fingl, J. Pharmacol. Exptl. Therap. 124: 264 (1958).

MISINTERPRETATIONS OF CORRELATION DATA

False conclusions may be drawn from valid correlation data. First, it must be understood that the mere fact of a correlation implies nothing necessarily about cause and effect. True, the absence of correlation may lead one to reject a hypothesis that y-effects are caused by x. However, even the strongest correlation does not of itself permit one to infer a causal relationship. A correlation may be wholly accidental, as that between the sale of bananas and the death rate from cancer in England.³ Or y and x effects may be caused independently by other factors that in turn have common causes, as in the correlation over many years between the salaries of Presbyterian ministers in Massachusetts and the price of rum in Havana.⁴

3. A. B. Hill, Principles of Medical Statistics (New York: Oxford University Press, 1955), p. 185.

A second source of misinterpretation is unwarranted extrapolation. One may never assume without good reason that a regression line will extend beyond the limits of the observational data, and quite often it does not. Figure 4-1 provides a good example. We may perhaps have grounds for supposing that extrapolation back to zero enzyme activity at a few days prior to birth is meaningful, although it is also quite possible that the enzyme increases at a much slower rate during a longer period of prenatal life. Extrapolation in the other direction is clearly unwarranted as a matter of fact; it has been shown experimentally that this enzyme activity does not increase further beyond that attained at 21 days. Mark Twain's⁵ wry comment on the shortening of the Lower Mississippi points up the pitfall of extrapolation in a most entertaining way.

In the space of one hundred and seventy-six years the Lower Mississippi has shortened itself two hundred and forty-two miles. This is an average of a trifle over one mile and a third per year. Therefore, any calm person, who is not blind or idiotic, can see that in the Old Oolitic Silurian Period, just a million years ago next November, the Lower Mississippi River was upward of one million three hundred thousand miles long, and stuck out over the Gulf of Mexico like a fishing-rod. And by the same token any person can see that seven hundred and forty-two years from now the Lower Mississippi will be only a mile and three-quarters long, and Cairo and New Orleans will have joined their streets together, and be plodding comfortably along under a single mayor and a mutual board of aldermen. There is something fascinating about science. One gets such wholesale returns of conjecture out of such a trifling investment of fact.

ESTIMATING A REGRESSION LINE FROM SAMPLE DATA

Now let us return to a detailed consideration of how correlation data should be treated to obtain a regression line and its confidence limits. Let $\hat{y}$ be the true value of y at any given value of x. Then the equation of the true regression line will be $\hat{y} = a + bx$, where a is the intercept on the y-axis at x = 0, and b is the slope.⁶ The observed y-values will scatter above and below the true line, to yield a set of deviations $(y - \hat{y})$. As already indicated, we choose as our best estimate, the "least-squares" line, from which the sum of squared deviations $\sum (y - \hat{y})^2$ will be minimum.

4. D. Huff, How to Lie with Statistics (New York: W. W. Norton & Company, Inc., 1954), p. 90.

5. Mark Twain, Life on the Mississippi (New York: Harper & Brothers, 1874), p. 155.

6. b is also known as the regression coefficient.

In order to find the values of a and b which will minimize the sum of squared y-deviations from the line, we set

$$\frac{\partial Q}{\partial a} = 0 \quad \text{and} \quad \frac{\partial Q}{\partial b} = 0$$

where $Q = \sum (y - a - bx)^2$. Solving by partial differentiation yields

$$0 = 2\left(\sum y - Na - b\sum x\right)$$
$$Na = \sum y - b\sum x$$
$$a = \bar{y} - b\bar{x}$$

Thus, $\hat{y}$ at $\bar{x}$ is $\bar{y}$; in other words, the least-squares line passes through the general mean $\bar{x}, \bar{y}$ of the observations.

Partial differentiation with respect to b gives

$$0 = 2\left(\sum xy - a\sum x - b\sum x^2\right)$$

and substituting now for a,

$$0 = \sum xy - \bar{y}\sum x + b\bar{x}\sum x - b\sum x^2$$

$$b = \frac{\sum xy - \dfrac{(\sum x)(\sum y)}{N}}{\sum x^2 - \dfrac{(\sum x)^2}{N}} = \frac{SP_{xy}}{SS_x}$$

An equivalent expression can be obtained,⁷

$$b = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2}$$

which shows somewhat more clearly just what the slope represents, but the above equation is more suitable for computations. The required terms, $\sum x$, $\sum y$, $\sum xy$, $\sum x^2$ and (for later use) $\sum y^2$, can all be found automatically on a good calculating machine.

7. The equation is derived by adding and subtracting $\bar{y}\bar{x}$ in the numerator and $\bar{x}^2$ in the denominator. $\sum (x - \bar{x})^2$ is familiar as the SS term for x, which appears in the numerator of the x-variance. $\sum (x - \bar{x})(y - \bar{y})$ is a new term, which is the analogous numerator of an expression known as the covariance of x and y, and will be symbolized here by $SP_{xy}$ since it is a sum of products rather than a sum of squares.

The sign of b may be positive or negative, depending upon whether y-values tend to increase or decrease as x-values become larger. If y-values varied without any relation to the associated x-values, there would be no correlation, and the true slope would be zero.

Figure 4-5. Diagrammatic representation of the slope of a regression line.

Since we know from the foregoing that the point $\bar{x}, \bar{y}$ lies on the estimated regression line, and since this point is so readily computed, we shall not be concerned further with the intercept, a. Substituting $a = \bar{y} - b\bar{x}$ we obtain a more generally useful equation of the regression line,

$$\hat{y} = \bar{y} + b(x - \bar{x})$$

Rearranging this expression, we have the reasonable description of slope as the ratio y-deviation to x-deviation, from the point $\bar{x}, \bar{y}$,

$$b = \frac{\hat{y} - \bar{y}}{x - \bar{x}}$$

as illustrated in Fig. 4-5.

To draw the estimated regression line, once its equation has been found, plot $(\bar{x}, \bar{y})$, as shown. Then add a convenient amount to $\bar{x}$ (preferably as large an amount as possible) and b times this amount to $\bar{y}$, and plot the new point $(\bar{x} + \Delta x, \bar{y} + b\Delta x)$ (see Fig. 4-5). These two points determine the line.

The following example illustrates the full procedure for calculating an estimated regression line for the data depicted in Fig. 4-1.

Example 4-1. Calculation of a regression line.

The following data were obtained for the ability of liver slices from guinea pigs of different ages to conjugate phenolphthalein with glucuronic acid. Calculate the equation of the regression line.

Age (days)              Millimicromoles Conjugated
    x        $x^2$          y          $y^2$         xy
    1           1          5.6          31.4          5.6
    1           1          8.8          77.4          8.8
    3           9         12           144           36
    5          25         18           324           90
    6          36         31           961          186
   10         100         38         1,444          380
   10         100         44         1,936          440
   11         121         22           484          242
   14         196         37         1,369          518
   15         225         46         2,116          690
   21         441         54         2,916        1,134

$\sum x = 97$, $\sum x^2 = 1{,}255$, $\sum y = 316.4$, $\sum y^2 = 11{,}803$, $\sum xy = 3{,}730.4$

$$\bar{x} = 8.82, \qquad \bar{y} = 28.8$$

$$b = \frac{940}{400} = 2.4$$

Then the equation of the regression line is

$$\hat{y} = 28.8 + 2.4(x - 8.8)$$

The line calculated in the above example is shown as A in Fig. 4-6.

Figure 4-6. Estimated regression line and confidence limits for the data of Figure 4-1.
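The arithmetic of Example 4-1 can be reproduced with a short sketch. The function name `least_squares_line` is an illustrative choice; the computation follows the working formula $b = SP_{xy}/SS_x$ given earlier in the chapter.

```python
ages = [1, 1, 3, 5, 6, 10, 10, 11, 14, 15, 21]                 # x, days
conjugated = [5.6, 8.8, 12, 18, 31, 38, 44, 22, 37, 46, 54]    # y, millimicromoles


def least_squares_line(xs, ys):
    """Return (x_bar, y_bar, b) with b = SP_xy / SS_x."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sp_xy = sum(x * y for x, y in zip(xs, ys)) - sum_x * sum_y / n
    ss_x = sum(x * x for x in xs) - sum_x ** 2 / n
    return sum_x / n, sum_y / n, sp_xy / ss_x


x_bar, y_bar, b = least_squares_line(ages, conjugated)
print(f"y = {y_bar:.1f} + {b:.2f}(x - {x_bar:.2f})")
```

Carried to more places the slope is about 2.35; the text works with the rounded value 2.4 from this point on.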


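Estimating the error variance and the confidence interval for a regression line

$$s_{y \cdot x}^2 = \frac{SS_y - b(SP_{xy})}{N - 2}, \qquad SP_{xy} = \sum xy - \frac{(\sum x)(\sum y)}{N}$$

Example 4-2. For the data of Example 4-1,

$$s_{y \cdot x}^2 = \frac{11{,}803 - \dfrac{(316.4)^2}{11} - 2.4\left[3{,}730.4 - \dfrac{30{,}691}{11}\right]}{9} = 49.56$$

8. Since two parameters (mean and slope) have been estimated, there are (N - 2) DF. Division by (N - 2) makes $s_{y \cdot x}^2$ an unbiased estimate of $\sigma_{y \cdot x}^2$ just as division by (N - 1) made $s^2$ an unbiased estimate of $\sigma^2$ for the single-parameter case.

9. Also, note that

$$\sum (y - \hat{y})^2 = \sum \left[y - \bar{y} - b(x - \bar{x})\right]^2 = \sum (y - \bar{y})^2 - 2b\sum (x - \bar{x})(y - \bar{y}) + b^2 \sum (x - \bar{x})^2$$

Substituting $b = \sum (x - \bar{x})(y - \bar{y}) / \sum (x - \bar{x})^2$ then gives the result shown.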
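A minimal sketch of the error-variance computation, using the sums quoted in Example 4-1, is given below. It assumes the shortcut form (SS of y less b times the sum of products, divided by N - 2) implied by footnote 9; the function name `error_variance` is invented for the illustration.

```python
import math

# Sums quoted in Example 4-1 (N = 11 guinea pigs).
n = 11
sum_x, sum_y = 97, 316.4
sum_x2, sum_y2, sum_xy = 1255, 11803, 3730.4

ss_x = sum_x2 - sum_x ** 2 / n
ss_y = sum_y2 - sum_y ** 2 / n
sp_xy = sum_xy - sum_x * sum_y / n


def error_variance(slope):
    """s^2 of y about the line, computed as (SS_y - b * SP_xy) / (N - 2)."""
    return (ss_y - slope * sp_xy) / (n - 2)


s2_rounded = error_variance(2.4)          # slope rounded as in the text
s2_exact = error_variance(sp_xy / ss_x)   # unrounded least-squares slope
print(round(s2_rounded, 2), round(s2_exact, 2), round(math.sqrt(s2_rounded), 2))
```

The shortcut is sensitive to rounding of b: with the unrounded slope the estimate comes out a few units larger than with b = 2.4, so intermediate values are best carried to more places than the final answer requires.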

$s_{y \cdot x}$, the square root of the error variance, is a standard deviation of individual y-values about the regression line. If it were the same at all points on the regression line, we could use it to obtain an approximate 95% confidence interval for individual y-values. Such an approximate interval is demarcated in Fig. 4-6 by parallel lines (B) at vertical distances $\pm 1.96\, s_{y \cdot x}$ from the regression line (where $s_{y \cdot x} = \sqrt{49.56} = 7.04$) and these lines are seen to include all 11 points of the experimental sample. For reasons explained below, such a "confidence interval," bounded by parallel lines, is likely to be too narrow except in the vicinity of $\bar{x}, \bar{y}$.

Usually we wish to establish a confidence interval for the true regression line as a whole, rather than for individual y-values. A limited approach may be made by finding a confidence interval for $\hat{y}$ at $\bar{x}$, thus establishing true limits for the regression line in its central region. By analogy to the standard error of a mean in the single-parameter case, we have

$$s_{\bar{y}} = \frac{s_{y \cdot x}}{\sqrt{N}}$$

$$\hat{y} = \bar{y} \pm t(s_{\bar{y}})$$

where t has (N - 2) DF.

Example 4-3. Confidence interval for y at $\bar{x}$.

Calculate the 95% confidence interval for y at $\bar{x}$ from the data of Example 4-1. From the data,

$$\bar{y} = 28.8$$
$$s_{y \cdot x}^2 = 49.56$$
$$s_{\bar{y}}^2 = \frac{49.56}{11} = 4.51$$
$$s_{\bar{y}} = \sqrt{4.51} = 2.12$$
$$\hat{y} = 28.8 \pm t(2.12)$$

Consulting Table 5 at 9 DF and P = 0.05, we find t = 2.26, so the limits are ±2.26(2.12) = ±4.8. Then, at $\bar{x}$,

$$\hat{y} = 24.0 \text{ to } 33.6$$

Lines parallel to the estimated regression line may be drawn through these limits of $\hat{y}$ at $\bar{x}$, as shown by the short solid segments of lines C in Fig. 4-6. In the region close to $\bar{x}, \bar{y}$ these will include the true regression line 95 times out of 100. However, such parallel confidence limits cannot be extended beyond the immediate vicinity of $\bar{x}, \bar{y}$. The main reason is that even if the variance of y-values about $\hat{y}$ remains the same at all x-values, there is still considerable uncertainty about the true slope of the estimated regression line. The resulting doubt about the true position of the line is very small near $\bar{x}, \bar{y}$ but becomes greatly magnified with increasing distance along the line.

Accurate confidence limits will therefore be represented by curves that are convex towards the estimated regression line, the confidence interval becoming wider with increasing distance from $\bar{x}$. This is represented by an equation containing a weighted correction term, $(x - \bar{x})^2$, which increases the magnitude of $s_{\hat{y}}^2$ at increasing distance from $\bar{x}$:

$$s_{\hat{y}}^2 = s_{y \cdot x}^2 \left[\frac{1}{N} + \frac{(x - \bar{x})^2}{\sum (x - \bar{x})^2}\right]$$

Only the correction term changes with different x-values; the remaining terms are already known. At $\bar{x}$, where $(x - \bar{x})^2 = 0$, the entire expression reduces to that already used for calculating confidence limits of $\hat{y}$ at $\bar{x}$.

Example 4-4.

Calculate the 95% confidence limits of the true regression line at several representative values of x, for the data of Example 4-1.

At x = 15, for example,

$$s_{\hat{y}}^2 = 49.56\left[\frac{1}{11} + \frac{(15 - 8.82)^2}{400}\right] = 9.24$$

Then 95% limits are ±2.26(3.04) = ±6.9 from the estimated line, as compared with ±4.8 at $\bar{x}$. Similar computation at other x-values leads to the biconvex curves C in Fig. 4-6, which will include the true regression line 95 times out of 100; they should not (and do not) include 95% of the individual points.
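The widening of the confidence band away from the mean of x can be sketched as follows. The default arguments are the rounded figures quoted in Examples 4-1 to 4-4 (error variance 49.56, N = 11, mean of x 8.82, SS of x taken as 400, t = 2.26); the function name `half_width` is an assumption made for this illustration.

```python
import math


def half_width(x, s2_yx=49.56, n=11, x_bar=8.82, ss_x=400.0, t=2.26):
    """95% half-width of the confidence interval for the true line at x.

    Defaults are the rounded values quoted in Examples 4-1 to 4-4.
    """
    s2_at_x = s2_yx * (1.0 / n + (x - x_bar) ** 2 / ss_x)
    return t * math.sqrt(s2_at_x)


for x in (1, 8.82, 15, 21):
    print(x, round(half_width(x), 1))
```

Evaluating the function over the observed range of x traces out the hourglass-shaped band: narrowest at the mean of x and flaring out toward both ends.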

In summary, a confidence interval for the true regression line may be found in two ways. A rough approximation that may suffice for some purposes is to estimate a confidence interval for $\hat{y}$ at $\bar{x}$; through the upper and lower limits of $\hat{y}$ thus obtained, lines parallel to the least-squares regression line may be drawn. These will approximately bound the true regression line in the immediate vicinity of $\bar{x}, \bar{y}$. The accurate method is to find confidence limits of $\hat{y}$ at several x throughout the pertinent range of observations; this method will yield an hourglass-shaped area, narrowest at $\bar{x}, \bar{y}$ and flaring out at a distance, which accurately defines the confidence interval of the true regression line.

CONFIDENCE INTERVAL OF A SLOPE

It has already been pointed out that if y-values varied without any relation to the associated x-values, there would be no correlation and the true slope, β, would be zero. Nevertheless a particular random sample of data might yield a finite positive or negative slope estimate, b. The question whether an apparent correlation is real or not must then be answered by determining with a t-test whether b differs significantly from zero, or better yet, by calculating confidence limits for the true slope, β, and ascertaining whether zero is included in these. It can be shown that the sampling variance of b is given by the following equation:

$$s_b^2 = \frac{s_{y \cdot x}^2}{\sum (x - \bar{x})^2}$$

and then, as usual,

$$t = \frac{b - \beta}{s_b} = \frac{(b - \beta)\sqrt{\sum x^2 - \dfrac{(\sum x)^2}{N}}}{s_{y \cdot x}}$$

and the confidence interval for β is given by

$$\beta = b \pm \frac{t(s_{y \cdot x})}{\sqrt{SS_x}}$$

where t has (N - 2) DF.

Example 4-5.

Calculate the 95% confidence interval for the slope of the regression line of Fig. 4-6.

$$\beta = 2.4 \pm \frac{2.26(7.04)}{\sqrt{1{,}255 - \dfrac{(97)^2}{11}}} = 2.4 \pm 0.8 = 1.6 \text{ to } 3.2$$

Since the limits do not include zero, there is a real positive correlation between age and enzyme activity and the true slope is not less than 1.6 nor greater than 3.2, both statements made at the 5% level of significance.
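The interval of Example 4-5 can be checked with a short sketch, using the rounded values from the text (b = 2.4, standard deviation about the line 7.04, t = 2.26 for 9 DF). The function name `slope_confidence_interval` is illustrative only.

```python
import math


def slope_confidence_interval(b, s_yx, ss_x, t):
    """Return (lower, upper) limits b -/+ t * s_yx / sqrt(SS_x)."""
    margin = t * s_yx / math.sqrt(ss_x)
    return b - margin, b + margin


ss_x = 1255 - 97 ** 2 / 11   # sum of squares of x about its mean
lower, upper = slope_confidence_interval(b=2.4, s_yx=7.04, ss_x=ss_x, t=2.26)
print(round(lower, 1), round(upper, 1))
```

Checking that the lower limit stays above zero is equivalent to the t-test of the slope against zero at the same significance level.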

SIGNIFICANCE OF A DIFFERENCE BETWEEN TWO SLOPES

By analogy to the test of a difference between two sample means, Student's t may be used to test whether or not two slopes differ significantly. In other words this is a test of parallelism. Data will be available upon which two different regression lines are estimated, with two slope estimates, b and b'. A pooled error variance has to be computed,¹⁰ whence t may be calculated in the following equation:

$$t = \frac{b - b'}{s_{y \cdot x}\sqrt{\dfrac{1}{SS_x} + \dfrac{1}{SS_{x'}}}}$$

Table 5 is consulted at the desired level of significance, with (N - 2 + N' - 2) DF, and parallelism is rejected if the critical value of t is exceeded.
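A sketch of the parallelism test follows. It assumes the usual pooled form, in which the two error sums of squares are added and divided by the combined degrees of freedom, and the variance of the difference is the pooled variance times the sum of the reciprocals of the two SS of x. The summary figures are hypothetical, invented purely to exercise the function, and the name `parallelism_t` is not from the text.

```python
import math


def parallelism_t(b1, ss_x1, error_ss1, n1, b2, ss_x2, error_ss2, n2):
    """t for the difference between two slopes, using a pooled error variance."""
    df = (n1 - 2) + (n2 - 2)
    pooled_variance = (error_ss1 + error_ss2) / df
    se_diff = math.sqrt(pooled_variance * (1.0 / ss_x1 + 1.0 / ss_x2))
    return (b1 - b2) / se_diff, df


# Hypothetical summary figures, invented purely for illustration.
t_value, df = parallelism_t(b1=2.4, ss_x1=400.0, error_ss1=446.0, n1=11,
                            b2=1.1, ss_x2=350.0, error_ss2=380.0, n2=10)
print(round(t_value, 2), df)
```

The resulting t would be compared with the tabulated critical value at the returned degrees of freedom; pooling is appropriate only when the two error variances are of similar size.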

CORRELATION COEFFICIENT

The absolute magnitude of a slope obviously depends upon the particular units used on the x and y axes, just as the absolute magnitude of a standard deviation depends upon the units of measurement. We can ascertain whether or not a given slope is significant, but there is no way to decide from the value of b alone whether a correlation is strong or weak. We were able to express standard deviation as a coefficient of variation (p. 36) by relating it to $\bar{x}$, and thus to obtain a comparative measure of the relative homogeneity of data from different normal distributions. Here the problem is similar. We wish to have a measure of slope which is independent of any particular units of measurement, and which will indicate the strength of correlation for any array of data in comparable terms.

10. This is legitimate only if the two estimates of error variance are substantially the same, just as in pooling variance estimates to compare two sample means (p. 52).

The strength of correlation may be defined as the fraction of the total variance (or SS) that is due to regression. The total SS is given by $\sum (y - \bar{y})^2$. The SS due to regression is the total SS less the error SS, or (cf. p. 140),

$$\text{Regression SS} = \sum (y - \bar{y})^2 - \left[\sum (y - \bar{y})^2 - \frac{\left[\sum (x - \bar{x})(y - \bar{y})\right]^2}{\sum (x - \bar{x})^2}\right] = \frac{\left[\sum (x - \bar{x})(y - \bar{y})\right]^2}{\sum (x - \bar{x})^2}$$

Then

$$\frac{\text{Regression SS}}{\text{Total SS}} = \frac{\left[\sum (x - \bar{x})(y - \bar{y})\right]^2}{\sum (x - \bar{x})^2 \cdot \sum (y - \bar{y})^2}$$

to which we give the symbol $r^2$. Simplifying, for ease of computation,

$$r^2 = \frac{\left[\sum xy - \dfrac{(\sum x)(\sum y)}{N}\right]^2}{\left[\sum x^2 - \dfrac{(\sum x)^2}{N}\right]\left[\sum y^2 - \dfrac{(\sum y)^2}{N}\right]} = \frac{(SP_{xy})^2}{(SS_x)(SS_y)}$$

Although $r^2$ gives the strength of correlation directly in terms of a ratio of two variance components, it is nevertheless customary to use the square root, r (known as the correlation coefficient) instead. The value of r may vary from zero (no correlation) to −1 or +1 (perfect negative or positive correlation). It can be shown from the above expression for $r^2$ that r is very directly related to b,

$$r = b\,\frac{s_x}{s_y}$$

Thus r is really the slope of a universal regression line, plotted on transformed coordinates, x and y values being replaced by $x / s_x$ and $y / s_y$. It follows that all correlations which are equally strong will have the same correlation coefficient, regardless of the apparent differences between the slopes of the regression lines based on the original raw data.

The significance of r may be estimated from the expression

$$t = r\sqrt{\frac{N - 2}{1 - r^2}}$$

where Table 5 is entered with (N − 2) DF. The null hypothesis tested in this way is that r estimated from the sample represents a true correlation coefficient of zero. Obviously, it makes no difference whether the test of significance of a correlation is performed on the actual slope, b, or on the standardized slope, r.

Example 4-6. Compute r and its significance for the data of Example 4-1.

$$r^2 = \frac{\left[3{,}730 - \dfrac{(97)(316.4)}{11}\right]^2}{\left[1{,}255 - \dfrac{(97)^2}{11}\right]\left[11{,}803 - \dfrac{(316.4)^2}{11}\right]} = 0.818$$

$$r = \sqrt{0.818} = 0.904$$

$$t = \sqrt{\frac{0.818(9)}{0.182}} = 6.36$$

Critical t = 3.25 at P = 0.01 with 9 DF.

Thus, the apparent positive correlation is real (P < 0.01) and it is very strong, since 82% of the total variance is due to regression.

The distinction between the significance and the strength of a correlation recalls the similar distinction between the significance and the magnitude of a difference between means (p. 29). Here, in a completely analogous way, a correlation may be significant, yet so weak as to be of no practical consequence. On the other hand, it may appear to be strong, yet because of the small sample size or large variability of measurements it may prove not to be significant.
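The computation of r and of its significance from the raw sums can be sketched as follows. The function name is hypothetical, and the sums in the call are the values for the data of Example 4-1 as they can be read from the worked example above; they are taken on trust rather than recomputed from the original observations.

```python
import math


def correlation_from_sums(n, sum_x, sum_y, sum_x2, sum_y2, sum_xy):
    """Return (r_squared, r, t) from the raw sums of a bivariate sample."""
    sp_xy = sum_xy - sum_x * sum_y / n      # sum of products about the means
    ss_x = sum_x2 - sum_x ** 2 / n          # sum of squares of x
    ss_y = sum_y2 - sum_y ** 2 / n          # sum of squares of y
    r_squared = sp_xy ** 2 / (ss_x * ss_y)
    r = math.copysign(math.sqrt(r_squared), sp_xy)   # r takes the sign of the slope
    t = math.sqrt(r_squared * (n - 2) / (1.0 - r_squared))
    return r_squared, r, t


r_squared, r, t = correlation_from_sums(11, 97, 316.4, 1255, 11803, 3730)
print(f"r^2 = {r_squared:.3f}, r = {r:.3f}, t = {t:.2f} with 9 DF")
```

The value of t is then compared with the critical value from the t table at (N − 2) DF, exactly as for a test on the slope itself.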


THE LOGARITHMIC TRANSFORMATION AND THE LOG DOSE-RESPONSE CURVE

Because linear regression is so easy to deal with, it is customary to transform nonlinear correlations into linear ones whenever possible. For example, x- or y-values may be plotted as their reciprocals, squares, square roots, ratios, or logarithms. The choice of a particular transformation may have a theoretical basis (as in the Lineweaver-Burk plot in enzymology11), or may be purely empirical on the grounds that an approximately linear correlation results.

For biological data the logarithmic transformation is most useful. Many measurements yield skewed frequency distributions when x-values are plotted directly but fit the symmetrical normal distribution better when logarithms of x-values are used. In some cases this can be attributed to a limitation of the possible range of variation in one direction or the other. For example, the mean heart rate in man is about 70 beats per min. Deviations to the left (lower rates) are restricted by a lower limit around 40, whereas possible deviations to the right may be much greater. On a logarithmic scale a deviation of 35 beats per min to the left (log 35/70 = −0.3) will correspond to a deviation of 70 beats to the right (log 140/70 = +0.3). It is also true that responses to drugs tend to vary proportionately to log dose rather than to dose, so dose-response correlations are routinely plotted with log-dose rather than dose on the x-axis.

An important type of correlation encountered in several kinds of biological experiment is that between the dose of a drug (or other treatment) and the response elicited. If dose is increased systematically in an isolated tissue or single animal, a graded response may be obtained. At first there will be a range of doses so low that no response is manifest. Then a higher range of doses elicits responses of increasing magnitude, and finally a maximal response may be attained which cannot be exceeded at any dose. If log dose is plotted on the x-axis and response on the y-axis, a symmetrical sigmoid curve is characteristically obtained (Fig. 4-7) whose central portion is nearly linear. This means that, over a considerable range of doses, increasing the dose by constant multiples causes equal linear increments of response.

Figure 4-7 shows three different ways of plotting dose on a logarithmic basis. The upper scale shows actual doses, spaced so that successive geometric increases (here doublings) are equally spaced. In the middle scale, actual logarithms are designated. The bottom scale would ordinarily be used only on a working graph, not for the final display of data; it illustrates a transformation that greatly simplifies computations. The ascending doses in the geometric series have been coded by assigning integer numbers beginning with zero. In the present example the actual logarithm of the lowest dose, 0.002, would be $\bar{3}.301$,12 and this is coded as zero. Since the successive doses are doublings, the coded log units must differ by log 2 = 0.301. Thus any point on the arbitrary log scale can be decoded by multiplying by 0.301, then adding $\bar{3}.301$. For example, 3.200 on the arbitrary scale would correspond to (3.200 × 0.301) + $\bar{3}.301$ = $\bar{2}.264$ in actual log units, and antilog $\bar{2}.264$ = 0.0184 mg/kg, the corresponding dose. The coding and decoding procedures are exactly analogous ... when ...

Figure 4-7. Graded response of cat nictitating membrane to epinephrine injection in vivo. Each point is the mean response in 5 cats; the same 5 cats were used for the entire curve. Actual contraction amplitude (after magnification) is shown on the left scale, percent of estimated maximum contraction on the right. (Data of Maxwell et al.*)
[Plot not reproduced. Three x-axis scales: Dose (mg/kg), 0.002 to 0.128 in successive doublings; Log dose; Arbitrary log.]
*R. A. Maxwell et al., J. Pharmacol. Exptl. Therap. 131: 355 (1961).

11 J. S. Fruton and S. Simmonds, General Biochemistry, 2 ed. (New York: John Wiley & Sons, Inc., 1958), p. 252.

12 Because it simplifies computations we shall employ the established convention of representing negative logarithms as the sum of a negative characteristic and a positive mantissa: log 2 = 0.301; log 20 = 1.301; log 0.2 = $\bar{1}.301$. Conversions to and from logarithms may be accomplished with the aid of Table 3.

Figure 4-8. Log dose-response curve as a cumulative normal frequency distribution of sensitivities of the individual responsive units.
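The decoding of an arbitrary (coded) log scale can be sketched in Python as follows. The function name and parameters are hypothetical; the sketch works with ordinary signed logarithms rather than the negative-characteristic notation of the text, so that $\bar{2}.264$ appears as −1.736.

```python
import math


def decode_coded_log_dose(coded, lowest_dose, dose_ratio):
    """Convert a point on the coded log scale back to (log10 dose, dose).

    coded       -- position on the arbitrary scale (lowest dose coded as 0)
    lowest_dose -- actual dose that was coded as zero
    dose_ratio  -- constant ratio between successive doses (2 for doublings)
    """
    log_dose = coded * math.log10(dose_ratio) + math.log10(lowest_dose)
    return log_dose, 10 ** log_dose


log_dose, dose = decode_coded_log_dose(3.200, lowest_dose=0.002, dose_ratio=2)
print(f"log dose = {log_dose:.3f}, dose = {dose:.4f} mg/kg")
```

Coding runs the other way: subtract the logarithm of the lowest dose and divide by the logarithm of the dose ratio.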

[Plot not reproduced. Curve labels: log dose-response curve; sensitivities of individual responsive units. A dose axis in μg/kg runs from 3 to 1000 on a logarithmic scale.]
*A. S. Kuperman et al., J. Pharmacol. Exptl. Therap. 132: 65 (1961).

... different, and to what extent the two ED50 values really differ.

If two drugs act by the same mechanism upon the same responsive units, but differ only in potency, the slopes of their curves must be the same. Conversely, different slopes imply different mechanisms of action. If the two curves are equidistant from each other in the horizontal direction at all response levels, it is meaningful to state the difference in potency without further qualification. If the two curves are not parallel, a potency comparison is meaningful only if the particular response level (e.g., the ED50) is specified. Figure 4-9 illustrates some parallel and non-parallel curve segments for different drugs acting upon the same biological system.


The relative potency of two drugs is obtained as a difference between their two log ED50 values, and since

$$\log x' - \log x = \log \frac{x'}{x}$$

the comparison is expressed as a potency ratio.

Example 4-8. In Fig. 4-10 are shown the effects of a single drug in depressing the amplitude of contraction of the turtle heart at two different pH values. Estimate by eye the ED50 at each pH and the potency ratio for the two pH values.

Figure 4-10. Effects of pH on the action of pentobarbital on the turtle heart. Each point is the mean value from 10 hearts. (Adapted from Hardman et al.*)
[Plot not reproduced. Two curves, one for each pH; the vertical scale runs to 100.]

[Plot not reproduced. x-axis: Dose (mg), logarithmic scale from 0.0001 to 10; the vertical scale runs to 100.]
*W. R. Bryan and M. B. Shimkin, J. Nat. Cancer Inst. 3: 503 (1943).

... unequal 95% confidence limits, 1.6 on the low side and 6.8 on the high side; likewise the confidence range of the ED50 for one of the drugs extended from 0.22 below to 0.36 above the estimated value of 0.60 mg/kg.

For many purposes it may suffice to draw the entire log dose-response curve by eye, and to estimate slope, ED50, or potency ratio directly from the approximate curve, without applying any statistical analysis at all. Indeed, biological data may sometimes be unavoidably poor, so that a few rough conclusions with which everyone can agree may be preferable to an elaborate statistical analysis which the experimental observations will not really justify. A good example is presented in Fig. 4-11. Here we see dose-response relationships for three carcinogens given subcutaneously to mice. Now it is quite evident that A and B have about the same slopes and very nearly equivalent potencies. C, on the other hand, is clearly less potent, but the data are so variable that it is hard to estimate a potency ratio in which we could have much confidence. If these self-evident conclusions are sufficient, then further statistical analysis would serve no useful purpose.

13 J. T. Litchfield and F. Wilcoxon, J. Pharmacol. Exptl. Therap. 96: 99 (1949).

NORMAL EQUIVALENT DEVIATIONS AND PROBITS

Except for rough approximations, the assumption of linearity over any considerable segment of the log dose-response curve is untenable, and a proper analysis requires that the data first be transformed in such a way as to make the curve linear over its entire extent. This can be accomplished by converting the y-values from percents of maximal response to units known as normal equivalent deviations (N.E.D.). A "N.E.D." is the response increment brought about by increasing (or decreasing) the log dose by one standard deviation, taking the ED50 as starting point (N.E.D. = 0). This is shown in Fig. 4-12. Centrally placed on the y-axis at the left is N.E.D. 0, corresponding to 50% of maximal response, or 50% of the cumulative area of the normal curve. Since in the normal distribution an increment of +σ from μ includes 34% of the area (Table 4), N.E.D. 1.0 corresponds to 50% + 34% = 84% of maximal response. N.E.D. −1.0 corresponds to 50% − 34% = 16% of maximal response. Theoretically, such a transformed scale has no upper or lower limit, just as the normal distribution itself is considered to extend from −∞ to +∞. Practically, however, ±2 N.E.D. are usually sufficient to include the extremes of meaningful data obtained in actual biological experiments. On the x-axis is a log-dose scale in which log ED50 is always chosen as the zero point. All log dose-response curves will therefore intersect at (0,0) in the center of the graph, regardless of relative potencies. Their slopes, however, will differ. In Curve A of Fig. 4-12, for example, increasing log dose by 0.3 (i.e., doubling the dose) raises the response to 84% of maximum, or 1 N.E.D. In other words, the standard deviation of sensitivities of the responsive units to this particular drug is 0.3 log units. Curve B has twice as steep a slope, indicating a standard deviation of only 0.15 log units, i.e., a more homogeneous population of responsive units.

Even more convenient than normal equivalent deviations are units known as probits. A probit is identical to a N.E.D. except that zero N.E.D. is defined as 5 probits, thus eliminating negative values.

The probit scale is shown at the right of Fig. 4-12. Table 17 permits direct conversion of any percent to the corresponding probit. Graph paper is available on which actual percentages are shown on the y-axis (as in Fig. 4-12), spaced according to the corresponding N.E.D. This is known as probability paper, or log-probability paper, according as the other coordinate scale is linear or logarithmic. Although such special paper is sometimes useful, it is often easiest to convert percent response to probits (by means of Table 17) and dose to log dose (by means of Table 3) or to an arbitrary log scale, and then to use the transformed data both for computations and for plotting on ordinary linear coordinates.

Figure 4-12. Transformation of the cumulative normal distribution (log dose-response curve) to normal equivalent deviations and to probits. The scale of percent response is also shown at the left, as it would appear on probability paper.
[Plot not reproduced. x-axis: Log Dose − Log ED50, marked at −0.3 and +0.3.]

The slope of the regression line of probit (or N.E.D.) on log dose is, as pointed out above, a direct measure of the standard deviation (σ) of logarithms of individually effective doses (i.e., of doses just effective on the individual responsive units).
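Where a probit table is not at hand, the conversion of percent response to N.E.D. and probit can be sketched with the inverse of the cumulative normal distribution. The helper names below are hypothetical; `statistics.NormalDist` is part of the Python standard library (version 3.8 and later), and its `inv_cdf` method accepts only proportions strictly between 0 and 1, so responses of exactly 0% or 100% cannot be converted.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def percent_to_ned(percent):
    """Normal equivalent deviation for a response given as percent of maximum."""
    return STANDARD_NORMAL.inv_cdf(percent / 100.0)


def percent_to_probit(percent):
    """Probit = N.E.D. + 5, which removes negative values."""
    return percent_to_ned(percent) + 5.0


for pct in (16, 50, 84):
    print(pct, round(percent_to_ned(pct), 2), round(percent_to_probit(pct), 2))
```

Values computed this way agree with a printed probit table to within rounding.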

... slope is equal to ... coordinates.

Example ...

... y-variance divided by the square of the slope,

$$s^2_{(m - \bar{x})} = \frac{s^2_{y \cdot x}}{b^2}\left[\frac{1}{N} + \frac{(m - \bar{x})^2}{SS_x}\right]$$

15 The method to be described here for graded responses merely uses the probit transformation to achieve a linear regression line. It may not be entirely valid if many responses are at the extremes of the curve, because the variances of the responses are not likely to be constant throughout. For very accurate results a method of weighting the responses must be used, as described by D. J. Finney in Probit Analysis (London: Cambridge University Press, 1952), pp. 185-188.


whence the approximate confidence interval is

$$(m - \bar{x}) \pm \frac{t\,(s_{y \cdot x})}{b}\sqrt{\frac{1}{N} + \frac{(m - \bar{x})^2}{SS_x}}$$

where t has (N − 2) DF. If m is reasonably close to $\bar{x}$, as is often the case in a well balanced experiment, the term $(m - \bar{x})^2 / SS_x$ in brackets would be negligible, and the variance simplifies still further, to

$$s^2_{(m - \bar{x})} = \frac{s^2_{y \cdot x}}{b^2 N}$$

whence the limits become

$$(m - \bar{x}) \pm \frac{t\,(s_{y \cdot x})}{b\sqrt{N}}$$

Although it is true that no exact variance of $(m - \bar{x})$ can be stated, exact confidence limits for the ratio of two random variables are given directly by the solution of a quadratic equation known as Fieller's Theorem. For its theoretical basis the reader should consult one of the basic texts cited in the list of references. A term g must first be computed:

$$g = \frac{t^2 (s^2_{y \cdot x})}{b^2 (SS_x)}$$

Then the lower and upper confidence limits (designated by L and U) are

$$(m - \bar{x})_L = \frac{(m - \bar{x})}{(1 - g)} - \frac{t\,(s_{y \cdot x})}{b(1 - g)}\sqrt{\frac{(1 - g)}{N} + \frac{(m - \bar{x})^2}{SS_x}}$$

and

$$(m - \bar{x})_U = \frac{(m - \bar{x})}{(1 - g)} + \frac{t\,(s_{y \cdot x})}{b(1 - g)}\sqrt{\frac{(1 - g)}{N} + \frac{(m - \bar{x})^2}{SS_x}}$$

Now g will be recognized as containing the relationship of the slope to its standard error, since $s_b^2 = s^2_{y \cdot x} / SS_x$ (p. 143). Thus, the more certain is the slope estimate, the smaller will be g. If g is small enough (less than about 0.1) it makes a negligible contribution to the above equations, and then
the exact equations for confidence limits simplify to those obtained from the approximate variance. Evidently, g will be small under all the conditions that reduce the slope variance: if the error variance is small; if the slope itself is steep; if the dose range is large; also, if the desired level of confidence is not too rigorous and/or a large number of observations has been made, so that t will be small. Inasmuch as the four terms comprising g have to be found anyway for use in other parts of any data analysis, the actual computation of g will entail almost no additional work and should be performed routinely. If it is found to be smaller than 0.1, it can be dropped; otherwise it is retained, and the full equations for exact confidence limits must be used. If g ≥ 1, the slope does not differ significantly from zero, and no confidence interval can be found.
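The routine computation of g, followed by the choice between approximate and exact limits, can be sketched as below. The function name, its arguments, and the numbers in the call are hypothetical; the exact limits follow the usual statement of Fieller's Theorem for a single regression line, and the function returns `None` for them when g is 1 or more, in which case no interval exists.

```python
import math


def ed50_confidence_limits(m_minus_xbar, b, s_yx, n, ss_x, t):
    """Return (g, approximate limits, exact limits) for (m - x_bar).

    s_yx is the square root of the error variance, ss_x the sum of squares
    of x, and t the tabulated value for (n - 2) DF.
    """
    scale = t * s_yx / abs(b)
    g = (t * s_yx) ** 2 / (b ** 2 * ss_x)
    approx_half = scale * math.sqrt(1.0 / n + m_minus_xbar ** 2 / ss_x)
    approx = (m_minus_xbar - approx_half, m_minus_xbar + approx_half)
    if g >= 1.0:
        # Slope not significantly different from zero: no interval can be found.
        return g, approx, None
    exact_half = scale * math.sqrt((1.0 - g) / n + m_minus_xbar ** 2 / ss_x)
    exact = ((m_minus_xbar - exact_half) / (1.0 - g),
             (m_minus_xbar + exact_half) / (1.0 - g))
    return g, approx, exact


# Invented values, for illustration only.
g, approx, exact = ed50_confidence_limits(0.5, b=0.4, s_yx=0.1, n=8, ss_x=42.0, t=2.45)
print(f"g = {g:.3f}; approximate {approx}; exact {exact}")
```

When g is below about 0.1 the two sets of limits nearly coincide, which is the justification for dropping g in that case.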

Example 4-10. Estimate the ED50 of a drug which produced the following contractions of a piece of rat small intestine suspended in a tissue bath and connected to a device for amplifying and recording the contractions.

Drug            Coded log       Recorded       Percent of
Concentration   Concentration   Contraction    Estimated     Probit
(μg/ml)         (x)             (mm)           Maximum       (y)

  0.09            0                8              12          3.82
  0.27            1               13              20          4.16
  0.81            2               19              30          4.48
  2.43            3               24              38          4.69
  7.29            4               40              62          5.31
 21.9             5               47              73          5.61
 65.7             6               54              84          5.99
197.              7               63              99           —
591.              8               64             100           —

Since the response increment between the last two doses was negligible, we estimate 64 mm to be the maximum contraction. All the other responses are therefore expressed as percent of 64, and converted to probits. It is wise to exclude responses that are very near zero or maximum, since they cannot be measured reliably, and their conversion to probits would give them a spurious weight in the analysis. For example, the difference between 99.0% and 99.9% of maximal response would hardly be distinguishable in most biological systems, yet the respective probits (7.3 and 8.1) differ considerably. In fact, this probit difference is as great as that between the 34% and 66% responses. For this reason the two highest doses in the example have been excluded. A coded log scale is shown in the second column, as discussed in connection with Fig. 4-7.

The transformed data are plotted in Fig. 4-13. The usual calculations lead to the following:

N = 7

$\sum x = 21$;  $\sum x^2 = 91$;  $(\sum x)^2 / N = 63.0$;  $\bar{x} = 3$

$\sum y = 34.06$;  $\sum y^2 = 169.51$;  $(\sum y)^2 / N = 165.73$;  $\bar{y} = 4.87$

$\sum xy = 112.42$;  $(\sum x)(\sum y) / N = 102.18$

$$b = \frac{112.42 - 102.18}{91.0 - 63.0} = 0.366$$

$$s^2_{y \cdot x} = \tfrac{1}{5}\left[169.51 - 165.73 - 0.366(112.42 - 102.18)\right] = 0.006$$

$$s_{y \cdot x} = \sqrt{0.006} = 0.077$$

$$\hat{y} = \bar{y} + b(x - \bar{x}) = 4.87 + 0.366(x - 3.00)$$

$$m = \frac{5.00 - 4.87}{0.366} + 3.00 = 3.36$$

For 95% confidence interval, we find g is only 0.01 so we proceed with the approximate equation

$$(m - \bar{x}) \pm \frac{2.57(0.077)}{0.366}\sqrt{\frac{1}{7} + \frac{(0.36)^2}{28}} = 0.36 \pm 0.21$$

$$m = 3.36 \pm 0.21$$

Then in coded log units the ED50 is 3.36 and its 95% confidence limits are ±0.21, as shown in Fig. 4-13. In order to reconvert the coded result to actual log dose, we first note that each dose differed from a previous one by a factor of 3, so the unit of the coded log scale must be log 3 = 0.478. The starting point, zero, on the coded log scale corresponds to log 0.09 = $\bar{2}.954$. Therefore we multiply by 0.478 and add $\bar{2}.954$; thus log ED50 = 3.36(0.478) + 0.954 − 2 = 0.560 and ED50 = antilog 0.560 = 3.63 μg/ml.

The 95% confidence limits decode to give ±0.21(0.478) = ±0.100 log units. Then 0.560 ± 0.100 = 0.460 to 0.660 log units, the confidence interval. Limits for the ED50 itself are then the antilogs, or 2.88 to 4.57 μg/ml.

Figure 4-13. Working graph for Example 4-10. Hypothetical data on dose-response for contractions of rat intestine. Each open circle represents a single experimental determination. The solid circle is the calculated general mean. Log ED50 and its confidence limits are shown at the bottom of the graph.
[Plot not reproduced. x-axis: Log Dose (Coded Units); the ED50 at 3.36 and its 95% confidence limits are marked.]
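The arithmetic of Example 4-10 can be checked with a short Python sketch. The variable names are hypothetical, and the sketch carries full precision throughout (including log 3 = 0.4771), so its results differ slightly from the hand computation, which rounds at each step.

```python
import math

# Coded log concentrations and probits for the seven doses retained in Example 4-10.
x = [0, 1, 2, 3, 4, 5, 6]
y = [3.82, 4.16, 4.48, 4.69, 5.31, 5.61, 5.99]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
ss_x = sum((xi - x_bar) ** 2 for xi in x)
ss_y = sum((yi - y_bar) ** 2 for yi in y)
sp_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

b = sp_xy / ss_x                                 # slope of probit on coded log dose
error_variance = (ss_y - b * sp_xy) / (n - 2)    # s^2 of y about the line
m = (5.0 - y_bar) / b + x_bar                    # coded log dose giving probit 5 (50%)

# Decode: each coded unit is log 3, and coded zero is log 0.09.
log_ed50 = m * math.log10(3) + math.log10(0.09)
ed50 = 10 ** log_ed50

print(f"b = {b:.3f}, m = {m:.2f}, ED50 = {ed50:.2f} ug/ml")
```

Carried at full precision, the error variance comes out nearer 0.008 than 0.006; the difference arises from the rounded intermediates used in the hand calculation and has little effect on the ED50 itself.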

PARALLEL-LINE BIOASSAY WITH GRADED RESPONSES

The purpose of a bioassay is to compare the potency of an unknown with that of a standard, by means of a biological response produced by both substances. The unknown may be the same material as the standard, only its concentration being unknown. This is the case when vitamins, hormones, or vaccines are assayed against preparations of standard activity. Then it is clear that when the unknown and standard are adjusted (by dilution) to give identical biological responses, they will contain the same concentration of the active agent. On the other hand, the potency of a different substance (or crude extract of unknown composition) may be compared with that of a standard material. In that case a generally valid potency comparison can only be made if it is first shown that the slopes of the two log dose-response curves are the same.

A typical bioassay by graded response does not require the use of a probit transformation, although probits will be useful if they can be employed. There may be good reasons, however, why it is not practical to estimate the maximal response of the system; then it will be impossible to find an ED50 (or ... known) and the probit transformation will be out of the question.

The method of parallel-line bioassay will be described here in its simplest terms, for a 2 × 2 assay, with direct measurement of a graded response (e.g., blood pressure increase, in mm of mercury), equal dose-ratios, and equal numbers of observations at each dose. This elementary type of assay has several important limitations, but application of the procedures developed here to more elaborate designs will present no special difficulties.16

The procedure is to choose two dose levels of the standard, $x_{S_1}$ and $x_{S_2}$, and two of the unknown, $x_{U_1}$ and $x_{U_2}$, in such a way that the ratio of the higher dose to the lower is the same in both cases,

$$\frac{x_{S_2}}{x_{S_1}} = \frac{x_{U_2}}{x_{U_1}}$$

The concentrations of S or U are adjusted so that, as nearly as possible, the responses will be matched,

$$y_{S_1} \cong y_{U_1} \quad \text{and} \quad y_{S_2} \cong y_{U_2}$$

The purpose of the statistical analysis is then to ascertain the potency of U in terms of S, by comparing the two regression lines of response on log dose. If the matching was very good, these two lines will be nearly identical; otherwise they will be parallel but somewhat separated on the x-axis. The problem is to estimate the true potency ratio, defined as

$$\frac{\text{dose of } S}{\text{dose of } U} \quad \text{for equal effect,}$$

and its confidence interval. Since lower effective dosage means higher potency, the magnitude of this ratio expresses the potency of U relative to S.

16 More sophisticated designs are well described in D. J. Finney, Statistical Method in Biological Assay (New York: Hafner Publishing Company, Inc., 1952).

Figure 4-14. 2 × 2 parallel-line bioassay. Each point is the mean of 5 responses at the given dose. Parallel regression lines have been drawn through the respective $\bar{x}$, $\bar{y}$ points with the mean slope calculated from all the data. (Hypothetical data from Example 4-11.)
[Plot not reproduced. x-axis: Log Dose (Coded Units); points S1, S2, U1, U2.]

The procedure is illustrated in Fig. 4-14. A standard pressor amine and an unknown extract of adrenal tissue were assayed by injecting intravenously in a cat, and recording the transient rise of blood pressure produced by each injection. Five observations were made at each dose of both preparations (the order of injections being randomized) and the mean responses were plotted as the four points shown on the graph. The two dosages of S and U, whatever they may actually be, are plotted on a coded log scale as −1 and +1. The doses of standard pressor amine represent definite weights of a pure chemical substance; the doses of unknown represent definite volumes of a certain dilution of the adrenal extract.

Suppose the matching had been perfect, so that the two regression lines were exactly superimposed. Then in coded units, equal doses produced equal effects, and decoding gives the estimate of the actual potency ratio. Thus, if $x_{S_1}$ were 5 μg of the standard pressor amine and $x_{U_1}$ contained 0.1 ml of the adrenal extract, we would conclude that each ml of extract contained the equivalent of 50 μg of the standard pressor amine. This estimate, of course, would be subject to some uncertainty, which we would want to quantify by giving a confidence interval for the potency ratio.

Now when (as in Fig. 4-14) the matching is not perfect, the estimate of the potency ratio will contain two components, one due to the horizontal separation of the two regression lines on the coded log-dose scale, the other due to the coding itself. Let M be the log of the true potency ratio, M its estimate from the sample data. Since the dose scale is logarithmic, the coded estimate $M_c$ will be coded log dose S minus coded log dose U for identical response. The decoded estimate M will be given by $M_c + (\bar{x}_S - \bar{x}_U)$. Antilog M will then estimate the true potency ratio.

Since the regression lines for S and U are assumed to have the same slope17 but may have different mean responses, we write two regression equations, as follows:

$$\hat{y}_S = \bar{y}_S + b(x_S - \bar{x}_S)$$

$$\hat{y}_U = \bar{y}_U + b(x_U - \bar{x}_U)$$

For the same response, $\hat{y}_S = \hat{y}_U$, and

$$\bar{y}_S + b(x_S - \bar{x}_S) = \bar{y}_U + b(x_U - \bar{x}_U)$$

Since $M = x_S - x_U$ for the same response,

$$\bar{y}_U - \bar{y}_S = b\left[M - (\bar{x}_S - \bar{x}_U)\right]$$

$$M = \frac{\bar{y}_U - \bar{y}_S}{b} + (\bar{x}_S - \bar{x}_U)$$

17 We shall test this assumption below in an analysis of variance (p. 169). Alternatively, it could be tested by the significance of the difference between the slope estimates ($b_S - b_U$) from the S and U data, by the method indicated on p. 144.

This equation would apply to the general (uncoded) case. In a symmetrical design with coding, such as represented in Fig. 4-14, $\bar{x}_S = \bar{x}_U$ on the coded scale, so in coded units,

$$M_c = \frac{\bar{y}_U - \bar{y}_S}{b}$$

and the actual value of M will be obtained after decoding. The common slope estimate b is given by

$$b = \frac{\sum (x - \bar{x}_U)(y - \bar{y}_U) + \sum (x - \bar{x}_S)(y - \bar{y}_S)}{\sum (x - \bar{x}_U)^2 + \sum (x - \bar{x}_S)^2}$$

in which the S and U data are pooled to yield the single slope best fitted by all the points.

The error variance is obtained by an equation strictly analogous to the usual one for data of a single regression line, except that there is now one less DF, and the symbols (as above) distinguish between data in the S and U sets.

$$s^2_{y \cdot x} = \frac{1}{N - 3}\left\{\sum y_U^2 - \frac{(\sum y_U)^2}{N_U} + \sum y_S^2 - \frac{(\sum y_S)^2}{N_S} - b\left[\sum (x - \bar{x}_U)(y - \bar{y}_U) + \sum (x - \bar{x}_S)(y - \bar{y}_S)\right]\right\}$$

These equations for slope and for the error variance may also be used with unequal numbers of observations in S and U. For $N_S = N_U$ the various terms may simply be pooled, as illustrated below in Example 4-11.

The estimation of an exact variance of $M - (\bar{x}_S - \bar{x}_U)$ is beset with the same difficulties already pointed out in the case of a single potency estimate (p. 157). An approximate variance is given by an expression of the same form as for $(m - \bar{x})$. Naturally, the error variance, slope, and $SS_x$ are based upon pooled data from both sets, S and U.

$$s^2_{[M - (\bar{x}_S - \bar{x}_U)]} = \frac{s^2_{y \cdot x}}{b^2}\left\{\frac{1}{N_S} + \frac{1}{N_U} + \frac{\left[M - (\bar{x}_S - \bar{x}_U)\right]^2}{SS_x}\right\}$$

whence approximate confidence limits are given by

$$\left[M - (\bar{x}_S - \bar{x}_U)\right] \pm \frac{t\,(s_{y \cdot x})}{b}\sqrt{\frac{1}{N_S} + \frac{1}{N_U} + \frac{\left[M - (\bar{x}_S - \bar{x}_U)\right]^2}{SS_x}}$$

Exact confidence limits are again given by Fieller's Theorem:

$$\frac{\left[M - (\bar{x}_S - \bar{x}_U)\right]}{(1 - g)} \pm \frac{t\,(s_{y \cdot x})}{b(1 - g)}\sqrt{(1 - g)\left(\frac{1}{N_S} + \frac{1}{N_U}\right) + \frac{\left[M - (\bar{x}_S - \bar{x}_U)\right]^2}{SS_x}}$$

where

$$g = \frac{t^2 (s^2_{y \cdot x})}{b^2 (SS_x)}$$

and may be neglected if it is smaller than about 0.1. In that case the expression reduces to that based on the approximate variance. In the symmetrical assay, where (in coded units) $\bar{x}_S = \bar{x}_U$, the term $(\bar{x}_S - \bar{x}_U)$ disappears. Moreover, if in a symmetrical 2 × 2 assay the lower dose is coded as −1 and the higher as +1, then $\bar{x} = 0$, each deviation $(x - \bar{x})^2 = 1$, and $SS_x = \sum (x - \bar{x})^2 = N$ (where N is the total number of observations that are divided equally between $N_S$ and $N_U$). Then the exact confidence limits become

$$\frac{M_c}{(1 - g)} \pm \frac{t\,(s_{y \cdot x})}{b(1 - g)}\sqrt{\frac{4(1 - g) + M_c^2}{N}}$$

and if g is small,

$$M_c \pm \frac{t\,(s_{y \cdot x})}{b}\sqrt{\frac{4 + M_c^2}{N}}$$

In all problems involving two dose-response lines, t has (N − 3) DF.18

18 One DF is lost in each ED50 estimate, and another in the estimate of the common slope.

Example 4-11. An extract of adrenal tissue, of unknown pressor amine content, was assayed against a standard pressor amine by injecting into a cat and recording the transient rise of blood pressure. A 2 × 2 assay was used, with a high/low dose ratio of 3 for both standard and unknown. Five observations were made at each dose of S and U, and the order of injections was randomized. From the data given below, calculate the potency of the extract (and its 95% confidence limits) in terms of the standard. Data are peak blood pressure increases in mm of mercury. Figure 4-14 is the working graph for this example.

                        x_S1        x_S2        x_U1        x_U2
Dose                    5 μg        15 μg       0.1 ml      0.3 ml
Coded x                 −1          +1          −1          +1

y                       15          41          19          51
                        17          47          25          42
                        17          35          20          47
                        13          50          16          56
                        18          38          23          54

N                       5           5           5           5
Σx                      −5          5           −5          5
Σx²                     5           5           5           5
(Σx)²/N                 5           5           5           5
Σy                      80          211         103         250
Σy²                     1,296       9,059       2,171       12,626
(Σy)²/N                 1,280       8,904.2     2,121.8     12,500
Σ(y − ȳ)²               16.0        154.8       49.2        126.0
ȳ                       16.0        42.2        20.6        50.0
Σxy                     −80         211         −103        250
(Σx)(Σy)/N              −80         211         −103        250

Pooled Data Pooled

Pooled

+

S+U

S1

N

S2

Pooled Ui

+ u2

10

20

10

10

20

10

10

20

10

291

644

353

2* X

I*

2

(Ixf

N I(x-x) 2

ly y

ly

2

(lyf

N I(y-y)

2

2 xy

29.1

32.2

35.3

10,355.0

25,152.0

14,797.0

8,468.1

20,736.8

12,460.9

1,886.9

4,415.2

2,336.1

131

278

147

131

278

147

N 2 (x - x)(y - y) Some of

the terms found above will not be required here, but will be used

later.

We

fnay

now compute

b and

M

c

from the pooled terms above.

2 (x-x)(y-y) 2 pooled 2 (* - x)

pooled

278 ~~

20

yu-ys = 35.3-29.1 — = 0.446 Me = b

13.9

Analysis of variance of bioassay data

W

= ^73 {2 (? - >0 - 6[2 2

= sy x .

r-os

at 17

2

g Since g

is

13.9(278)]

=

32.41

=V 3241=5.69 = 4.452 t\s y

_

2 .

x

)

~b 2(x-x) 2

very small, and the assay

may

-

- x)(y - >)]}

DF = 2.11 r

expression

iV [4,415.2

fcc

169

be used for the 95

2

is

~_

(4.452)(32.41)

=

0.04

(193.2)(20)

symmetrical

2x2{x s = xu),

% confidence limits of M

the simplest

c,

KSyx) b (2.11X5.69)

N

\

/4±

0.199

Mc = 0.446 ±0.396

Now the actual ratio of high-dose/low-dose was 3, corresponding to two coded units; thus, our coded unit corresponds to $\tfrac{1}{2} \log 3 = 0.239$. Then

$$M = (0.239)(0.446) \pm (0.239)(0.396) = 0.107 \pm 0.095 = 0.012 \text{ to } 0.202$$

The actual potency ratio is given by the antilogs of M and of its upper and lower confidence limits. The result is 1.28, with limits 1.03 to 1.59. Then 0.1 ml U is estimated to be 1.28 times more potent than 5 µg S; so 1 ml U contains the equivalent of 64 µg S, with 95% confidence limits 52 to 80 µg.
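The arithmetic of this example can be retraced with a short script. The sketch below is only an illustration under stated assumptions: the function and variable names are hypothetical, the value of t is the tabulated figure quoted in the text, and the confidence-limit expression is the simplified one for a symmetrical 2 x 2 assay with small g.

```python
import math

# Responses from the worked example, keyed by coded log dose x (-1 = low, +1 = high).
STANDARD = {-1: [15, 17, 17, 13, 18], +1: [41, 47, 35, 50, 38]}
UNKNOWN = {-1: [19, 25, 20, 16, 23], +1: [51, 42, 47, 56, 54]}
T_05_17DF = 2.11   # tabulated t for P = 0.05 at 17 DF, as quoted in the text
DOSE_RATIO = 3.0   # high dose / low dose


def mean(values):
    return sum(values) / len(values)


def potency_2x2(standard, unknown, t, dose_ratio):
    """Return (potency ratio, lower limit, upper limit) for a symmetrical 2 x 2 assay."""
    pairs = [(x, y) for prep in (standard, unknown)
             for x, ys in prep.items() for y in ys]
    n = len(pairs)
    y_bar = mean([y for _, y in pairs])
    sxx = sum(x * x for x, _ in pairs)            # coded x sums to zero, so no correction
    sxy = sum(x * (y - y_bar) for x, y in pairs)
    syy = sum((y - y_bar) ** 2 for _, y in pairs)
    b = sxy / sxx                                 # pooled slope
    y_s = mean([y for ys in standard.values() for y in ys])
    y_u = mean([y for ys in unknown.values() for y in ys])
    m_c = (y_u - y_s) / b                         # log potency ratio in coded units
    s = math.sqrt((syy - b * sxy) / (n - 3))      # residual standard deviation
    half = (t * s / b) * math.sqrt(4 / n + m_c ** 2 / sxx)
    unit = 0.5 * math.log10(dose_ratio)           # one coded unit in log10 dose
    return tuple(10 ** (unit * v) for v in (m_c, m_c - half, m_c + half))


ratio, lower, upper = potency_2x2(STANDARD, UNKNOWN, T_05_17DF, DOSE_RATIO)
print(f"potency ratio {ratio:.2f}, 95% limits {lower:.2f} to {upper:.2f}")
```

Multiplying the three results by 50 µg (ten times the low standard dose) gives the content of 1 ml of the unknown.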

ANALYSIS OF VARIANCE OF BIOASSAY DATA

As discussed earlier (p. 64), analysis of variance permits one to segregate and examine separately the several components that contribute to the total variability in a system. In parallel-line bioassays the total variance is made up of two components, that due to differences between doses, and that due to error (i.e., the residual within-doses variance). The between-doses variance can be broken down further into four components arising from (1) difference between preparations, (2) regression (i.e., dose-related differences in response), (3) departure from parallelism of the two regression lines, and (4) departure from linearity. Since two points determine a line, departure from linearity can only contribute in assays with more than two points per preparation.¹⁹

Analysis of variance for the data of Example 4-11 is presented below. The respective sums of squares are formed in a manner analogous to that explained on p. 67. It should be recalled that coding in a symmetrical design does not affect the analysis of variance in any way since all the data are changed by addition or subtraction of the same amount.

Calculation of Sums of Squares for Data of Example 4-11.

Grand Total per Observation for y:

$$\frac{(\sum y)^2}{N} = \frac{(80 + 211 + 103 + 250)^2}{20} = 20{,}736.8$$

Total SS (19 DF):

$$\sum y^2 - \frac{(\sum y)^2}{N} = 1{,}296 + 9{,}059 + 2{,}171 + 12{,}626 - 20{,}736.8 = 4{,}415.2$$

Between Doses (3 DF):²⁰

$$\sum \frac{(\sum y_m)^2}{N_m} - \frac{(\sum y)^2}{N} = \frac{(80)^2 + (211)^2 + (103)^2 + (250)^2}{5} - 20{,}736.8 = 4{,}069.2$$

Preparations (1 DF):²¹

$$\frac{(\sum y_S)^2}{N_S} + \frac{(\sum y_U)^2}{N_U} - \frac{(\sum y)^2}{N} = \frac{(291)^2 + (353)^2}{10} - 20{,}736.8 = 192.2$$

¹⁹ A defect of the 2 x 2 assay is that no information about linearity can be obtained from the data, and serious error may arise if the pairs of points are not in corresponding positions on the two log dose-response curves.
²⁰ $\sum y_m$ and $N_m$ refer to the individual dose-groups.
²¹ $\sum y_S$, $N_S$ and $\sum y_U$, $N_U$ refer to pooled data of both standard groups and both unknown groups, respectively.

Regression (1 DF) (Pooled slope SS):

$$\frac{\left[\sum (x-\bar{x})(y-\bar{y})\right]^2_{\text{pooled}}}{\sum (x-\bar{x})^2_{\text{pooled}}} = \frac{(-80 + 211 - 103 + 250)^2}{20} = 3{,}864.2$$

Parallelism (1 DF) (Difference between separate slopes and pooled slope SS):

$$\frac{\left[\sum (x-\bar{x})(y-\bar{y})\right]^2_S}{\sum (x-\bar{x})^2_S} + \frac{\left[\sum (x-\bar{x})(y-\bar{y})\right]^2_U}{\sum (x-\bar{x})^2_U} - \frac{\left[\sum (x-\bar{x})(y-\bar{y})\right]^2_{\text{pooled}}}{\sum (x-\bar{x})^2_{\text{pooled}}} = \frac{(131)^2}{10} + \frac{(147)^2}{10} - 3{,}864.2 = 12.8$$

Linearity: not applicable to this 2 x 2 assay.

Error (Within Doses) (16 DF):

Total SS - between-doses SS = 4,415.2 - 4,069.2 = 346.0

ANALYSIS OF VARIANCE (DATA OF EXAMPLE 4-11)

Source                  DF    SS         Variance Estimate
Preparations            1     192.2      192.2
Regression              1     3,864.2    3,864.2
Parallelism             1     12.8       12.8
Between doses           3     4,069.2
Error (within doses)    16    346.0      21.6
Total                   19    4,415.2

Regression/Error: F = 3,864.2/21.6 = 179 (1, 16 DF)
Preparations/Error: F = 192.2/21.6 = 8.9 (1, 16 DF)

The analysis is due ...

/>> ma x = 6.31 and ^ min = 3.70. Then the working probit (y) is 0.14(6.31) + 0.86(3.70) = 4.07, which is an adjustment upward toward

the line.

24 J. T. Litchfield and F. Wilcoxon, J. Pharmacol. Exptl. Therap. 96: 99 (1949).

Single log dose-response curve with quantal responses

Dose       Log Dose    Alive/    Proportion    Empirical    Expected
(µg/kg)    (x)         Total     Alive (p)     Probit       Probit
1,000      3.0         8/8       1.00
500        2.7         7/8       0.88          6.18         5.84
250        2.4         4/8       0.50          5.00         5.25
125        2.1         4/8       0.50          5.00         4.67
62.5       1.8         1/8       0.12          3.82         4.09

The empirical probits (from Table 17) are shown plotted against log dose in Figure 4-15, where a provisional regression line has been drawn by eye. Expected probits, read off from the provisional line, have been entered in the last column above. Weighting coefficients (w) are found by interpolation in Table 18, for each expected probit; Nw is obtained for each dose group, then successive multiplications give Nwx and Nwx².

Figure 4-15. Working graph for analysis of Example 4-12. (Probit plotted against log dose, showing the provisional line and the adjusted line.)

Expected
Probit      w        x      N    Nw      Nwx      Nwx²
5.84        0.490    2.7    8    3.92    10.58    28.57
5.25        0.622    2.4    8    4.98    11.95    28.68
4.67        0.611    2.1    8    4.89    10.27    21.57
4.09        0.468    1.8    8    3.74    6.73     12.11

ΣNw = 17.53    ΣNwx = 39.53    ΣNwx² = 90.93

$$\text{Weighted } \bar{x} = \frac{\sum Nwx}{\sum Nw} = \frac{39.53}{17.53} = 2.25$$

Working probits (y) are then obtained by entering Table 18 with each expected probit and using the observed sample proportions (not percents) as indicated. Then Nwy, Nwxy are also computed:²⁵

Expected    Observed
Probit      Proportion (p)    y       Nwy      Nwxy
5.84        0.88              6.13    24.03    64.88
5.25        0.50              4.99    24.88    59.64
4.67        0.50              5.01    25.00    52.50
4.09        0.12              3.86    14.44    25.99

ΣNwy = 88.35    ΣNwxy = 203.01

$$\text{Weighted } \bar{y} = \frac{\sum Nwy}{\sum Nw} = \frac{88.35}{17.53} = 5.04$$

Finally, the slope of the adjusted regression line is given by a weighted equation analogous to that on p. 136:

$$b = \frac{\sum Nwxy - \dfrac{(\sum Nwx)(\sum Nwy)}{\sum Nw}}{\sum Nwx^2 - \dfrac{(\sum Nwx)^2}{\sum Nw}} = \frac{203.01 - \dfrac{(39.53)(88.35)}{17.53}}{90.93 - \dfrac{(39.53)^2}{17.53}} = \frac{3.85}{1.79} = 2.15$$

²⁵ For some purposes Nwy² may also be required.
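One cycle of this weighted adjustment can be sketched in a few lines. The example below is an illustration only: it takes the weighting coefficients and working probits as already read from the tables (it does not reproduce Tables 17 or 18), the names are hypothetical, and because it recomputes every product from the tabulated columns, its results need not agree with the printed sums to the last digit.

```python
# One row per dose group: (weighting coefficient w, log dose x, group size N,
# working probit y), as tabulated for the worked example.
GROUPS = [
    (0.490, 2.7, 8, 6.13),
    (0.622, 2.4, 8, 4.99),
    (0.611, 2.1, 8, 5.01),
    (0.468, 1.8, 8, 3.86),
]


def weighted_line(groups):
    """Return (weighted mean x, weighted mean y, slope) of the adjusted probit line."""
    s_w = sum(n * w for w, x, n, y in groups)
    s_wx = sum(n * w * x for w, x, n, y in groups)
    s_wy = sum(n * w * y for w, x, n, y in groups)
    s_wxx = sum(n * w * x * x for w, x, n, y in groups)
    s_wxy = sum(n * w * x * y for w, x, n, y in groups)
    slope = (s_wxy - s_wx * s_wy / s_w) / (s_wxx - s_wx ** 2 / s_w)
    return s_wx / s_w, s_wy / s_w, slope


x_bar, y_bar, b = weighted_line(GROUPS)
print(f"weighted mean x = {x_bar:.2f}, weighted mean y = {y_bar:.2f}, slope = {b:.2f}")
```

A further cycle would read new expected probits from the line through the weighted means with this slope, look up new weights and working probits, and call the same function again.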

The adjusted regression line, with slope 2.15 and passing through the point $(\bar{x}, \bar{y}) = (2.25, 5.04)$, is shown in Fig. 4-15, together with the provisional line. One cycle of computation has clearly made some small difference in the line. Another cycle would begin with the adjusted line and proceed exactly as before, to obtain a still better estimate, based on the expected probits given by the new line. In the present case (and quite often with real data) another cycle would alter the adjusted line only by an insignificant amount.

Before accepting the adjusted line as a reasonable fit to the data, a $\chi^2$ test should be made, in order to make sure that the various sample responses are sufficiently homogeneous to be accepted as randomly drawn from a population represented by the adjusted regression line. This goodness-of-fit test is carried out by converting each expected probit (from the new line) back to proportion responding, and then (multiplying by N), to the expected number of subjects responding in each dose group. The difference between expected and observed numbers in each dose group ($E - O$) ...

[Figure: horizontal axis labeled Log Dose (Coded Units)]

The adjusted lines with slope 0.89 and passing through $(\bar{x}, \bar{y})$ in each case, are shown in Fig. 4-16. They differ but slightly from the provisional lines and although a high degree of accuracy would require that another cycle of approximation be performed, we shall ordinarily be content with a single adjustment. The test of homogeneity by $\chi^2$ is performed with each set of data and the adjusted lines, exactly as on p. 177; the result is an acceptably low value of $\chi^2$. Here, if there seems to be any doubt that the data describe two parallel lines, a test for parallelism should also be performed.²⁶

Once we have satisfied ourselves that the observations are reasonable samples from the populations represented by the two adjusted lines, implying also that a linear regression is a reasonable representation, and that parallelism is a reasonable assumption, we may proceed to the actual estimates required.

$$m_S = \frac{5.00 - 5.07}{0.89} + 0.362 = 0.283$$

$$m_U = \frac{5.00 - 4.98}{0.89} + 0.601 = 0.623$$

$$M = m_S - m_U = 0.283 - 0.623 = \bar{1}.660$$

Alternatively,

$$M = \frac{\bar{y}_U - \bar{y}_S}{b} + (\bar{x}_S - \bar{x}_U) = \frac{4.98 - 5.07}{0.89} + (0.362 - 0.601) = \bar{1}.660$$

First we compute the confidence interval for each LD50 separately.

For S (Drug A): (1.96)

2

0.164

2

(0.89) (29.6)

m - 0.362 =

——

0.079

0.936

=

+

m

is c

B):

^.J.

1

^

(-0.079)

2

< m < (+0.309 + 0.362)

0.362)

e

estimated to be 0.283, with

For C/(Drug

/0.936

+0.309

to

-0.116

So

1.96 0.8

[

-0.478

(-0.478

±

2 = 1,247, SS*,- 11.43, s y * = 1.905, jc = 5.10, ^ = 13.28, y/x =

results

2

y

same system

chosen for

5,

N

smaller.

187

were

assays and 10 others for protein determinations.

12, 15, 13, 14, 15, 12, 12

2.60. r

= 2.45

/2

Then

at

P = 0.05

with 6

DF

= 6.00

the limits are

(5.10)03.28)

±

(5.10)(13.28)

(5.10)

2

-6.00/

2 54 j

(5.10)

2

-

= 2.77 ±0.71 So the estimate of DNA/protein

29 As

met

is

is

)

(13.28)

2

-6.00^j

m

6.00|

=2.06

2.60, with 95

shown by W. G. Cochran (Biometrics

% confidence limits 2.06 to 3.48.

7:17, 195!), a requirement that must be

that

x 2 (Nx )

... in case (a), ...). A misleading experiment is worse than useless. Randomization would have avoided the difficulties.

P1-3.

Draw up a sequential list of 50 numbers that will represent the order in which mice are going to be drawn out of stock for assignment to cage A or B. To avoid having to discard all the numbers greater than 50 as meaningless, designate the first mouse to be drawn by 01 or 51, the second mouse by 02 or 52, and so on. Then find the 25 random assignments to cage A by entering Table 1 and reading out a set of 25 digit pairs. Suppose the numbers 02, 54, and 05 occurred among this set but 01 (or 51), 03 (or 53), and 06 (or 56) did not. Then the beginning of the list might look like this:

Mouse Order    1st    2nd    3rd    4th    5th    6th
               01     02     03     04     05     06
               51     52     53     54     55     56
Cage                  A             A      A

When 25 assignments to A have been entered on the list, as above, B is entered for the remaining 25. Then the first mouse is removed from stock, the list is consulted, and the animal is placed in the appropriate cage (in this instance, B). The distribution of mice continues in the same way until completed; this part of the procedure is a mere physical execution of the specified plan of randomization in which no characteristic of the mice plays any role whatsoever.
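The same plan can be drawn up with a pseudorandom generator instead of a printed table of random digits. The following is a minimal sketch under that assumption; the function name and the fixed seed are illustrative only and are not part of the original answer.

```python
import random


def assign_cages(n_mice=50, n_in_a=25, seed=None):
    """Return the cage ('A' or 'B') for each mouse in the order it will be drawn."""
    rng = random.Random(seed)
    # Choose which positions in the drawing order go to cage A; the rest go to B.
    to_a = set(rng.sample(range(1, n_mice + 1), n_in_a))
    return ["A" if order in to_a else "B" for order in range(1, n_mice + 1)]


plan = assign_cages(seed=1)
for order, cage in enumerate(plan[:6], start=1):
    print(order, cage)
```

As in the answer above, the list is fixed before any mouse is handled, so no characteristic of an animal can influence its assignment.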

P1-4.

The law of randomization was violated. The letter with which his last name begins is certainly a "characteristic of a subject," and it has in this case determined into which group he is placed. This might seem a ridiculous objection, but deeper thought will reveal that many experiences in life, which could influence one's response to a psychological test, may be related to one's name and its rank in the alphabet. It is not necessary to cite any particular respect in which a name is likely to influence an experimental outcome. The point is that as long as some characteristic of a person influences his assignment to a group, a suspicion is introduced, that the groups may not be as equivalent as they should be.

198

ANSWERS TO PROBLEMS

P1-5.

                  Experimental Day
Location     1     2     3     4     5
1            A     B     C     D     E
2            B     C     D     E     A
3            C     D     E     A     B
4            D     E     A     B     C
5            E     A     B     C     D

Here each letter represents a different attractant. Since there are five positions, if the experiment is to be balanced it must be conducted on five different occasions, so each position can be occupied by every attractant at some time. Many other Latin squares could have been chosen. The results obtained will be numbers of flies trapped, and there will be a number in each of the 25 boxes of the 5 x 5 table. The attractants will be compared by computing the mean numbers for all A, for all B, and so on. Any difference between locations will be given by a comparison between the row means, any difference between days by a comparison between column means.
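The particular square shown is the cyclic one, in which each row is the previous row shifted one place. A minimal sketch of its construction follows; the function name is hypothetical, and in practice one would also randomize rows, columns, and letters to pick among the many possible squares.

```python
def cyclic_latin_square(treatments):
    """Build a k x k Latin square by shifting the treatment list one place per row."""
    k = len(treatments)
    return [[treatments[(row + col) % k] for col in range(k)] for row in range(k)]


square = cyclic_latin_square("ABCDE")
for location, row in enumerate(square, start=1):
    print(location, " ".join(row))
```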

P1-6.

Hypothesis: The systolic blood pressure is not abnormally elevated. Decision rule: Measure systolic blood pressure. If it does not exceed 130, accept; otherwise reject.

Many definitions of the normal state of health are necessarily of a statistical nature. Large deviations from the norm imply underlying disease, but there is often a considerable region of uncertainty, where health and disease are both to be found. Thus in the present problem, some rule has to be adopted. Among all recruits who are rejected because of blood pressure above 130 there will be some who are really quite healthy. The decision rule will certainly reject 1% of all healthy recruits (since α = 0.01) and it may reject all recruits with disease manifested by high blood pressure (if so, then β = 0).

PI -7. If

you

see X, the chances are 2 to

there are three

X

symbols

in all;

1

that the other side

on the obverse

of two of them an X. The decision rule seen, reject.

Now

if

is

seen,

and we

is

reject,

therefore: If

we

because with that frequency the other side is

seen,

and we

accept,

we

will also

is

will

is

also X. This

side of

X

is

because

one of these

is

be wrong once in three

indeed X, so a

be wrong once

= 0.33.

in three trials,

so

if

0, is

trials,

But /S

a

is

seen, accept;

if

X

= 0.33.

P1-8.

The null hypothesis states that the samples of data from the alcohol-treated group and from the control group were drawn from the same population, i.e., that alcohol had no effect. Since we are dealing with score differences, the null hypothesis is that the true score difference is zero, i.e., that the sample of score differences was drawn from a population with mean difference equal to zero. The null hypothesis will be tested by comparing the observed mean difference with its sampling distribution (about which we will need some information) to see how rare it is. If the probability of its being drawn by chance from the hypothetical null population is less than our chosen value of P, we will reject the null hypothesis and conclude that alcohol was detrimental to driving ability. Otherwise we shall be unable to conclude that alcohol is detrimental.

P1-9.

When we reject a null hypothesis, we conclude that the two samples were drawn from different populations, i.e., that treated and control groups really differed. This is the same as finding that the confidence interval of a difference does not include zero. Accepting a null hypothesis, however, may merely mean we had insufficient data to reject it. It certainly does not mean it is true. While β may be reasonably small with respect to some specified alternative hypothesis, other alternatives can usually be postulated for which β is very large. A real but small difference between two parameters might only be detectable by samples too large to be practical. The confidence interval for the difference would, in this case, include zero, but the limits would specify how large a real difference might exist without being detected. The confidence interval approach is therefore much more informative.

P1-10.

Not necessarily. The level of significance should not be confused with an estimate of the true magnitude of an effect. The results mean that it is more certain that B is effective, in the sense that there is one chance in 20 that A may really be inert, but only one chance in 100 that B is inert. It is entirely possible, however, that A is the more effective vaccine. For example, it might have been tried on a smaller sample than B, so that the results were less decisively different from the control. Again a confidence interval for the potency difference would be more informative than the mere statement of effectiveness.

P1-11.

The defect in this experimental design is that each medication was labeled distinctively. It is true that the investigators, nursing staff, and patients could not know which medications were represented by A, B, and C. However, opinions soon form as to which is which, and these in turn bias further observations. The merits of A, B, and C are sure to be discussed among the patients and among the staff. Any observed effect (favorable, unfavorable, or neutral) of one drug, though it may be manifested in but a single patient, becomes common knowledge, and that drug (be it A, B, or C) acquires special properties throughout the experimental group and for the remainder of the experimental period.

For these reasons the drugs to be administered to each subject must be coded individually and without clue as to their relationship to any other subject's medications. Random serial numbers may be employed; or the entire sequence of medications for a given subject may be labeled with that subject's name, and numbered serially.

CHAPTER 2

P2-1.

Since the data are paired for each subject, the procedure should make use of the pairing. We therefore calculate the score difference, placebo minus drug, for each of the 10 subjects. This gives the following set of differences:

+14, +26, +19, -4, -26, -6, -29, +17, +5, +10

Inspection of the signs of these differences makes it seem unlikely that the drug causes any significant improvement. Certainly six positive and four negative signs in a group of 10 is entirely compatible with the 5+, 5- expected on a chance basis in a population with true difference zero.

The signed-ranks test would require a preliminary rearrangement in rank order without regard to sign, as follows:

d       -4    +5    -6    +10    +14    +17    +19    (+26, -26)    -29
Rank    1     2     3     4      5      6      7      8.5           10

Then the sum of the negative ranks is 22.5, the sum of the positive ranks is 32.5. Table 16 shows that at P = 0.05 for one tail and N = 10, the smaller sum may not exceed 10.8. We could not, therefore, assert a significant drug effect from this test either.

Had we chosen to employ a t-test, we would find, for the set of 10 differences,

$$\sum x = +26 \qquad \sum x^2 = 3{,}216 \qquad \frac{(\sum x)^2}{N} = 67.6 \qquad SS = 3{,}148.4$$

$$\bar{x} = +2.6 \qquad s^2 = \frac{3{,}148.4}{9} = 349.8 \qquad s_{\bar{x}}^2 = \frac{349.8}{10} = 34.98 \qquad s_{\bar{x}} = \sqrt{34.98} = 5.92$$

$$t = \frac{\bar{x}}{s_{\bar{x}}} = \frac{+2.60}{5.92} = 0.439$$

which Table 5 shows to be not significant. To answer how great an improvement or decrement might be attributable to the drug yet not discernible here (Type II error), we establish 95% confidence limits for µ, the true difference. Table 5 shows t = 2.26 for P = 0.05 at 9 DF.

$$\mu = +2.6 \pm 2.26(5.92) = +2.6 \pm 13.4 = -10.8 \text{ to } +16.0$$

These are the limits within which the true drug effect probably lies. Since zero is included in them, the drug may be entirely ineffectual as already shown. It is also clear, that if the drug does have a beneficial effect, this is on the average no greater than +16 points on a base score of about 300, in other words, no greater than about 6% improvement. Moreover, although the drug could also be having a detrimental effect, that effect would be no worse than about a 4% decrement from the original score.
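Both the signed-ranks sums and the t-test figures for these ten differences can be checked with a short script. This is a sketch only: the names are hypothetical, the critical t is the tabulated value quoted in the answer, and tied absolute differences are given their mean rank.

```python
import math

DIFFERENCES = [14, 26, 19, -4, -26, -6, -29, 17, 5, 10]   # placebo minus drug
T_05_9DF = 2.26   # tabulated t for P = 0.05 at 9 DF, as quoted in the answer


def signed_rank_sums(diffs):
    """Return (sum of negative ranks, sum of positive ranks), ties sharing mean ranks."""
    ordered = sorted((d for d in diffs if d != 0), key=abs)
    rank_of = {}
    i = 0
    while i < len(ordered):
        j = i
        while j < len(ordered) and abs(ordered[j]) == abs(ordered[i]):
            j += 1
        rank_of[abs(ordered[i])] = (i + 1 + j) / 2   # mean of ranks i+1 .. j
        i = j
    negative = sum(rank_of[abs(d)] for d in ordered if d < 0)
    positive = sum(rank_of[abs(d)] for d in ordered if d > 0)
    return negative, positive


def paired_t(diffs, t_crit):
    """Return (mean, standard error, t, lower limit, upper limit)."""
    n = len(diffs)
    mean = sum(diffs) / n
    ss = sum(d * d for d in diffs) - sum(diffs) ** 2 / n
    se = math.sqrt(ss / (n - 1) / n)
    return mean, se, mean / se, mean - t_crit * se, mean + t_crit * se


neg_sum, pos_sum = signed_rank_sums(DIFFERENCES)
mean_d, se_d, t_value, low, high = paired_t(DIFFERENCES, T_05_9DF)
print(neg_sum, pos_sum)
print(f"t = {t_value:.3f}, limits {low:.1f} to {high:.1f}")
```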

P2-2.

1005

=

Here C.V.

=

and x

15

x

= 282,

so

15(282) s

423

=^o^= = = S£ '2 = s2

1

89.45 since

5 2 (z-z')=

13.4

x-

282

x!

=

S(x-x')

heart rate

DF

is

240

= 42

value 2.55 for one

critical

so the

18

-

13.4

This exceeds

For the

told the C.V. did not change

- x') 42 '=-—=3.13

t=-

fall in

we were

178.9

S(z-x')=

(x

Then

,789

89.45 for the group initially

Sx 2

95%

is

tail at

^=

0.01 with 18

confidence limits of the true change

in

2.10 (x

-

x)

±

f-osCw-*')

P2-3.

Code by

DF

(Table

5),

highly significant.

subtracting 40.

Then

= 42 ± 2. 0( = 14 to 70 1

1

3.4)

heart rate,

t

05

with

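$$\sum x_c = +3.6 \qquad \frac{(\sum x_c)^2}{N} = 2.16 \qquad SS = 12.98$$

$$\bar{x}_c = \frac{+3.6}{6} = +0.6$$

$$s^2 = \frac{12.98}{5} = 2.60 \qquad s_{\bar{x}}^2 = \frac{2.60}{6} = 0.433 \qquad s_{\bar{x}} = 0.658$$

Decoding,

$$\bar{x} = +0.6 + 40 = 40.6$$

$$\text{C.V.} = \frac{100 s}{\bar{x}} = \frac{100\sqrt{2.60}}{40.6} = 4.0$$

For 99% confidence interval, $t_{.01}$ (5 DF) = 4.03

$$\mu = 40.6 \pm 4.03(0.658) = 37.9 \text{ to } 43.3$$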

P2-4.

The obvious question in this problem is whether or not caffeine affected the scores on each criterion. We begin by inspection of the data. It appears that the "alertness" scores are improved, and that the "relaxation" scores are decreased; but it is obvious that "nervousness" scores are not consistently changed and nothing would be gained by analyzing them. The first step, then, is to perform an analysis of variance on the "alertness" scores.

Placebo            150 mg             300 mg
x      x²          x      x²          x      x²

1

1

2

4

2

4

4

16

1

1

4

16

4

16

1

1

2

4

3

9

1

1

3

9

2

10

16

4

100

256

r=28 T2

= 784

Preliminary Calculations

Grand total per observation:    784/15 = 52.3
Between doses:                  360/5 = 72.0
Observations:                   82.0

ANALYSIS OF VARIANCE

Source                  SS                     DF    Variance Estimate    F
Between doses           72.0 - 52.3 = 19.7     2     9.85                 11.8**
Within doses (error)    By diff. = 10.0        12    0.833
Total                   82.0 - 52.3 = 29.7     14

Table 7 gives $F_{.01}$ = 6.93 at 2,12 DF, so the between-doses effect is highly significant. We are then justified in going further.

A summary of "alertness" score differences for each subject provides the following:

caffeine (150 mg)    caffeine (300 mg)    caffeine (300 mg)
minus placebo        minus placebo        minus caffeine (150 mg)

2,

1,

Since

no

all

3,

1,

one kind

in

/-test

is

2 (Z*) /W=7.2, SS =

3.21

and r

s

4

at

5,

from placebo.

N = 4,

of the zero leaves

However, a

a sample of

2,

3,

and

the differences are positive,

signs of

that both doses differ

=

4,

2,

1

3

s

1

there

is

shows that if there are no difficulty in observing

In the last set of data, however, elimination

and

or the signed-ranks

test

2

test.

^x = 6, = 1.2, 2* = 10, =0.140, s$ = 0.374, t= 1.2/0.374 2

yields

= 0.700, s £ 2 DF (Table 5) = 2.13 in 2.8,

0,

2,

1,

mg)

since Table 10

P < 0.05,

too small for the sign

appropriate,

(150

jc

a one-tail

test.

Thus we may

reject

the null hypothesis of zero difference between the two doses.

Proceeding similarly, we find that analysis of variance on the "relaxation" scores provides no evidence of heterogeneity, since variance estimate between doses = 2.45, error = 0.833, F = 2.94 and $F_{.05}$ (2,12 DF) = 3.89, which is not exceeded. We cannot, therefore, properly go further with comparison of the "relaxation" scores. There is, however, some clear indication that caffeine does reduce "relaxation" scores, and it is not unlikely that this effect would prove significant if a larger number of subjects were used.

We therefore conclude that caffeine increases "alertness" scores as compared to placebo, and that there is a real dose effect. Caffeine may reduce "relaxation" scores, but this could not be established in the present experiment. There appears to be no effect on "nervousness" scores.

P2-5.

The problem is solved by constructing the analysis of variance. It is convenient to code the data by subtracting 50. Decoding is not required.

            Column 1          Column 2          Column 3
            Obs    (Obs)²     Obs    (Obs)²     Obs    (Obs)²     Total    (Total)²
Row 1       Y -10  100        R  12  144        G -35  1,225      -33      1,089
Row 2       R  20  400        G -29  841        Y -12  144        -21      441
Row 3       G -21  441        Y   1  1          R   8  64         -12      144
Total       -11               -16               -39               T = -66
(Total)²    121               256               1,521             T² = 4,356

Colors      Total    (Total)²
ΣY          -21      441
ΣR          40       1,600
ΣG          -85      7,225
                     9,266

PRELIMINARY CALCULATIONS

Type of         Total of    Number of        Observations per    Total of Squares
Total           Squares     Items Squared    Squared Item        per Observation
Grand           4,356       1                9                   484
Rows            1,674       3                3                   558
Columns         1,898       3                3                   632.7
Colors          9,266       3                3                   3,089
Observations    3,360       9                1                   3,360

ANALYSIS OF VARIANCE

Source     SS                       DF    Variance Estimate    F
Rows       558 - 484 = 74           2     37                   1.52 N.S.
Columns    632.7 - 484 = 148.7      2     74.35                3.06 N.S.
Colors     3,089 - 484 = 2,605      2     1,302                53.6 *
Error      by difference = 48.6     2     24.3
Total      3,360 - 484 = 2,876      8

Table 7 shows at 2,2 DF that $F_{.05}$ = 19.0 and $F_{.01}$ = 99.0. Therefore the colors differ significantly in their attractiveness to the birds and there are no significant position effects in the experiment.
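The sums of squares for this 3 x 3 Latin square can be reproduced programmatically. The sketch below is illustrative, with hypothetical names; it works from the coded observations and carries more decimals than the hand calculation, so the error term comes out slightly different from the rounded figure above.

```python
# Coded observations (original value minus 50) and the color shown at each
# row/column position of the Latin square.
LAYOUT = [["Y", "R", "G"], ["R", "G", "Y"], ["G", "Y", "R"]]
OBS = [[-10, 12, -35], [20, -29, -12], [-21, 1, 8]]


def latin_square_anova(layout, obs):
    """Return sums of squares and F ratios for rows, columns, and treatments."""
    k = len(obs)
    correction = sum(map(sum, obs)) ** 2 / (k * k)
    row_totals = [sum(row) for row in obs]
    col_totals = [sum(col) for col in zip(*obs)]
    trt_totals = {}
    for i in range(k):
        for j in range(k):
            trt_totals[layout[i][j]] = trt_totals.get(layout[i][j], 0) + obs[i][j]
    ss = {
        "rows": sum(t * t for t in row_totals) / k - correction,
        "columns": sum(t * t for t in col_totals) / k - correction,
        "treatments": sum(t * t for t in trt_totals.values()) / k - correction,
        "total": sum(v * v for row in obs for v in row) - correction,
    }
    ss["error"] = ss["total"] - ss["rows"] - ss["columns"] - ss["treatments"]
    error_ms = ss["error"] / ((k - 1) * (k - 2))
    f_ratios = {name: ss[name] / (k - 1) / error_ms
                for name in ("rows", "columns", "treatments")}
    return ss, f_ratios


ss, f_ratios = latin_square_anova(LAYOUT, OBS)
for name, value in f_ratios.items():
    print(f"{name}: SS = {ss[name]:.1f}, F = {value:.2f}")
```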

P2-6.

The problem really asks for an upper tolerance limit for observations on individual body temperatures in this illness. We are required to find a temperature so high that if it were exceeded, a "serious doubt" would be raised that the observation came from the same population. The definition of "serious doubt" is somewhat arbitrary. Let us find that temperature below which 99% of the observations would be expected to lie, and let us make our assertion at P = 0.05. For N = 15, Table 6 indicates K = 3.52. The data themselves are readily coded by subtracting 100:

1.2, 0.6, 0.6, 1.8, 1.0, 1.2, 1.6, 1.0, 0.8, 2.0, 1.6, 1.6, 2.2, 0.4, 0.8

$$\sum x_c = 18.4 \qquad \sum x_c^2 = 26.80 \qquad \frac{(\sum x_c)^2}{N} = 22.57 \qquad SS = 4.23$$

$$\bar{x}_c = 1.23 \qquad \bar{x} = 101.23 \qquad s^2 = \frac{4.23}{14} = 0.302 \qquad s = 0.550$$

$$\bar{x} + K(s) = 101.2 + 3.52(0.550) = 103.2, \text{ the required upper limit.}$$
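A brief script reproduces the tolerance-limit arithmetic. It is a sketch under the assumptions of the answer: the factor K is taken as given from the table rather than computed, and the function name is hypothetical.

```python
import math

CODED_TEMPS = [1.2, 0.6, 0.6, 1.8, 1.0, 1.2, 1.6, 1.0,
               0.8, 2.0, 1.6, 1.6, 2.2, 0.4, 0.8]   # observed temperature minus 100
K_FACTOR = 3.52   # tabulated tolerance factor for N = 15, as quoted in the answer


def upper_tolerance_limit(coded, offset, k):
    """Mean plus k standard deviations, decoded by adding back the offset."""
    n = len(coded)
    mean = sum(coded) / n
    ss = sum(v * v for v in coded) - sum(coded) ** 2 / n
    s = math.sqrt(ss / (n - 1))
    return offset + mean + k * s


limit = upper_tolerance_limit(CODED_TEMPS, 100, K_FACTOR)
print(f"upper tolerance limit = {limit:.1f}")
```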

P2-7.

This problem is solved directly by the two-sample rank test. The number of items is small enough to make it practical to obtain U directly by counting. Evidently there is a preponderance of X preceding C, so the smaller U will probably be found by counting C preceding X. In the following diagram the number of C preceding each X is indicated.

C c

c c X X X C C X C X C X C X X

).5

3

c X c

xxxcxxccxcxcxcxcccc 3 3

5

6.5

9

6.5

9

1

The sum of C preceding X is therefore 56.5. We next confirm that this is the smaller value of U by $U' = NN' - U = 12(15) - 56.5 = 123.5$, which is larger than the value we obtained. Then consulting Table 15, we find that at P = 0.05 the value of U must be 55 or less for these sample sizes. Therefore we cannot quite assert, at the 5% significance level, that the supplemented diet improved pelt quality. The nearly significant result, however, suggests it might be unwise to accept the null hypothesis unequivocally. The farmer would have to decide whether or not the matter was worth further (and perhaps more refined) experimentation.

P2-8.

Code the data by subtracting 200, then dividing by 10. No decoding will be necessary because addition and subtraction do not affect variances; and although multiplication and division do affect variances, they will not change F, which is a ratio of two variances.

Light       10°               20°              30°                Total     (Total)²
I           -5, -4  (-9)      5, 3  (8)        0, 0  (0)          -1        1
II          -3, -5  (-8)      6, 8  (14)       -1, -5  (-6)       0         0
III         0, 1  (1)         7, 8  (15)       -9, -6  (-15)      1         1
Total       -16               37               -21                T = 0
(Total)²    256               1,369            441                T² = 0

PRELIMINARY CALCULATIONS

Type of         Total of Squares                          Number of        Observations per    Total of Squares
Total                                                     Items Squared    Squared Item        per Observation
Grand           0                                         1                18                  0
Light           1 + 0 + 1 = 2                             3                6                   0.333
Temperature     256 + 1,369 + 441 = 2,066                 3                6                   344.3
Combinations    (-9)² + (8)² + ... + (-15)² = 892         9                2                   446
Observations    (-5)² + (-4)² + ... + (-6)² = 466         18               1                   466

ANALYSIS OF VARIANCE

Source                 SS                       DF    Variance Estimate    F
Light                  0.333 - 0 = 0.333        2     0.167
Temperature            344.3 - 0 = 344.3        2
Light x temperature    by diff. = 101.4         4
Error                  466 - 446 = 20.0         9
Total                  466 - 0 = 466.0          17
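The partition of the total sum of squares in a two-factor design with replication can be sketched as follows. This is an illustration only: the cell values are the coded observations as read from the table for this problem (rows are the three light levels, columns the three temperatures), and the function name is hypothetical.

```python
# CELLS[light][temperature] holds the two coded observations for that combination.
CELLS = [
    [[-5, -4], [5, 3], [0, 0]],
    [[-3, -5], [6, 8], [-1, -5]],
    [[0, 1], [7, 8], [-9, -6]],
]


def two_way_anova(cells):
    """Return sums of squares for rows, columns, interaction, error, and total."""
    n_rows, n_cols, reps = len(cells), len(cells[0]), len(cells[0][0])
    values = [v for row in cells for cell in row for v in cell]
    correction = sum(values) ** 2 / len(values)
    row_totals = [sum(sum(cell) for cell in row) for row in cells]
    col_totals = [sum(sum(cells[i][j]) for i in range(n_rows)) for j in range(n_cols)]
    ss_rows = sum(t * t for t in row_totals) / (n_cols * reps) - correction
    ss_cols = sum(t * t for t in col_totals) / (n_rows * reps) - correction
    ss_cells = sum(sum(cell) ** 2 for row in cells for cell in row) / reps - correction
    ss_total = sum(v * v for v in values) - correction
    return {
        "rows": ss_rows,
        "columns": ss_cols,
        "interaction": ss_cells - ss_rows - ss_cols,   # found by difference
        "error": ss_total - ss_cells,                  # within-cell variation
        "total": ss_total,
    }


ss = two_way_anova(CELLS)
for source, value in ss.items():
    print(f"{source}: {value:.1f}")
```

Dividing each sum of squares by its degrees of freedom (2, 2, 4, and 9 here) gives the variance estimates from which the F ratios are formed.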

content

may

be calculated.

P3-6.

Counts: 153, 250, 220, 81, 306, 244
Σ = 1,254    Mean = 209

We may consider 209 to be a reasonably good estimate of λ, and examine the deviations from this expectation by $\chi^2$.

O         O - E      (O - E)²
153       -56        3,136
250       +41        1,681
220       +11        121
81        -128       16,384
306       +97        9,409
244       +35        1,225
                     31,956

$$\chi^2 = \sum \frac{(O - E)^2}{E} = \frac{\sum (O - E)^2}{E} = \frac{31{,}956}{209} = 153$$

At P = 0.01 and 5 DF, Table 9 gives 15.1 as the critical value of $\chi^2$. So it may be concluded that there is a marked heterogeneity in the distribution of particles over the electron micrograph field. In other words there has been local aggregation or dispersion so that the overall distribution is not at all what would be expected if the same number of particles had been uniformly sprayed over the same area.
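The same statistic is easily computed directly. The sketch below assumes, as the answer does, that every field has the same expected count (the mean of the observed counts); the names are hypothetical and the critical value is the tabulated one quoted above.

```python
COUNTS = [153, 250, 220, 81, 306, 244]
CRITICAL_01_5DF = 15.1   # tabulated chi-square for P = 0.01 at 5 DF


def chi_square_equal_expectation(counts):
    """Chi-square for counts that are all expected to equal their common mean."""
    expected = sum(counts) / len(counts)
    return sum((observed - expected) ** 2 for observed in counts) / expected


chi2 = chi_square_equal_expectation(COUNTS)
print(f"chi-square = {chi2:.1f}, heterogeneous: {chi2 > CRITICAL_01_5DF}")
```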

P3-7.

              Hallucinations    No Hallucinations    Total
Placebo       1                 6                    7
Treated       5                 1                    6
Total         6                 7                    13

Since this 2 x 2 table has boxes containing expectations smaller than 5, the special method must be used. We first write the most extreme table with the same marginal totals, then the next most extreme. Since the second table is identical to the one observed, we find the probabilities for both these tables, if the null hypothesis were true. For the observed table we have

$$P = \frac{7!\,6!\,6!\,7!}{13!\,1!\,6!\,5!\,1!} = \frac{7}{22(13)} = 0.0245$$

and for the more extreme one,

$$P = \frac{7!\,6!\,6!\,7!}{13!\,0!\,7!\,6!\,0!} = \frac{1}{(12)(11)(13)} = 0.0006$$

The sum of these, P = 0.025, is the probability we wish. So we reject the null hypothesis and conclude that the extract did have hallucinogenic properties.
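The two factorial expressions are hypergeometric probabilities and can be evaluated with binomial coefficients. A minimal sketch follows; the function name is hypothetical, and only the one-tailed sum used in the answer is computed.

```python
from math import comb


def table_probability(a, b, c, d):
    """Probability of the 2 x 2 table [[a, b], [c, d]] with all margins fixed."""
    return comb(a + b, a) * comb(c + d, c) / comb(a + b + c + d, a + c)


observed = table_probability(1, 6, 5, 1)   # placebo 1/6, treated 5/1
extreme = table_probability(0, 7, 6, 0)    # the one more extreme table
print(f"{observed:.4f} + {extreme:.4f} = {observed + extreme:.3f}")
```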

P3-8.

Let us rewrite the data as a standard 2 x 5 contingency table and apply the $\chi^2$ test.

                   A       B       C       D       E       Total
Acceptable         14      12      20      13      17      76
Not acceptable     22      26      16      24      17      105
Total              36      38      36      37      34      181

Expected Frequencies:

                   A       B       C       D       E       Total
                   15.1    16.0    15.1    15.5    14.3    76
                   20.9    22.0    20.9    21.5    19.7    105
Total              36      38      36      37      34      181

E - O:

                   A       B       C       D       E
                   1.1     4.0     -4.9    2.5     -2.7
                   -1.1    -4.0    4.9     -2.5    2.7

(E - O)²:

                   A       B       C       D       E
                   1.21    16.0    24.0    6.25    7.29
                   1.21    16.0    24.0    6.25    7.29

(E - O)²/E:

                   A       B       C       D       E
                   0.08    1.00    1.59    0.40    0.51
                   0.06    0.73    1.15    0.29    0.37

$$\chi^2 = \sum \frac{(E - O)^2}{E} = 6.18$$

At P = 0.05 with 4 DF, Table 9 gives 9.49 as the critical value of $\chi^2$. The null hypothesis may therefore not be rejected, and we cannot conclude that there is any real difference between the flavoring agents. In other words, the apparent superiority of C and inferiority of B may well be the result of the chances of random sampling from a population in which all the agents are equally acceptable.
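The contingency-table calculation above can be retraced with a few lines of code. This is a sketch with hypothetical names; because the expected frequencies are not rounded to one decimal, the statistic differs from the hand-computed value in the second decimal place.

```python
TABLE = [
    [14, 12, 20, 13, 17],   # acceptable, agents A to E
    [22, 26, 16, 24, 17],   # not acceptable
]
CRITICAL_05_4DF = 9.49      # tabulated chi-square for P = 0.05 at 4 DF


def chi_square_contingency(table):
    """Return (chi-square, degrees of freedom) for an r x c contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand
            chi2 += (observed - expected) ** 2 / expected
    return chi2, (len(table) - 1) * (len(table[0]) - 1)


chi2, df = chi_square_contingency(TABLE)
print(f"chi-square = {chi2:.2f} with {df} DF, significant: {chi2 > CRITICAL_05_4DF}")
```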

P3-9.

The important thing to recognize is that this problem calls for the two-sample rank test, and not a $\chi^2$ test, because the data form an ordered contingency table.

Data

4r,6