English Pages 1156 [1158] Year 1991
[Fig. 27.5 flowchart, sponsor review (box labels as recovered): Form reviewed by sponsor; Is form acceptable? (No / Yes); Investigator modifies form; Investigator submits form to IRB.]
Sponsors and investigators want to use an informed consent that does not adversely influence enrollment rates of patients. A negative influence has been reported (Epstein and Lasagna, 1969), especially when an informed consent accentuates the risks of a clinical trial and does not provide the balance of potential benefits.

Ethics Committees/IRBs
Many Ethics Committees/IRBs are in the position of having to “reinvent the wheel” each time they review an informed consent that raises new issues. Sometimes they debate issues without awareness of all relevant points of view. Ethics Committees/IRBs operate at different standards and consistency levels (Goldman and Katz, 1982), although one IRB reported a high degree of internal consistency over a several-year period (Grodin et al., 1986). Ethics Committees/IRBs have the benefit of a journal (IRB: A Review of Human Subjects Research, edited by Robert J. Levine and published by The Hastings Center) to present issues, either as legal cases arise or in another format. These cases could be used to establish precedents and to provide additional points of view, which could be openly debated by other Ethics Committees/IRBs facing similar situations. Some of the many important issues facing Ethics Committees/IRBs are listed in Table 27.6, and types of protocol changes requiring an Ethics Committee/IRB approval are listed in Table 27.7.

Patients
Summarizing the perspective of patients is more complex than summarizing those of the other groups: the makeup of patient populations, plus their motivation and behavior, varies greatly from trial to trial and within clinical trials as well. Normal healthy volunteers in Phase I clinical trials obviously have a different perspective about a trial than do patients who are mildly ill with a chronic illness or patients who are severely ill. In addition, a number of differing groups of patients deserve special consideration (e.g., see Table 27.8). A series of questions that are relevant for patients to ask before enrolling in a clinical trial and for evaluation has been presented (Nealon et al., 1985). The investigator-patient interaction process leading to the informed consent is illustrated in Fig. 27.6.

TABLE 27.7 Protocol changes that must be submitted to an Ethics Committee/IRB for approval (a)
1. An increase in medicine dosage or frequency of dosing or a change in the method of medicine administration; this generally includes any change in medicine formulation, since the bioavailability of the medicine may be affected
2. Significant increases in the number of patients to receive the clinical trial medicine; a 10% or greater increase is considered “significant” by some individuals
3. Use or inclusion of new groups of patients whose concurrent medical condition might significantly affect the scope and/or validity of the protocol
4. Use or inclusion of new groups of patients for whom special considerations are warranted in terms of care or other factors
5. Use or inclusion of new groups of patients whose concurrent therapy might confound clinical trial interpretation
(a) Each Ethics Committee/IRB may require submission of other protocol modifications for approval in addition to or in lieu of those listed in this table.

[Fig. 27.5 flowchart, IRB review (box labels as recovered): Form reviewed by IRB; Form acceptable? (No / Yes); Investigator notified; Investigator uses form; Investigator sends copy to sponsor.]
FIG. 27.5 Development and submittal of the informed consent by the investigator and review by the Ethics Committee/IRB. Sample informed consents, regulations, and elements to include may be furnished to the investigator by the sponsor.

[Fig. 27.6 flowchart not reproduced. Box labels as recovered: Investigator or staff discuss trial with patient; Does patient desire to enroll?; Patient completes screen; Does patient pass screen?; Patient repeats screen; Patient discusses trial with investigator and/or the staff; Patient reads informed consent and asks questions; Patient thinks about trial; Patient discusses trial with family and/or friends; Patient formulates questions and discusses them with investigator; Patient satisfied with response?; Patient willing to enroll?; Patient signs informed consent; Patient enrolls in trial and is given copy of I.C.; Patient dismissed (on each "No" branch).]
FIG. 27.6 Potential interactions of investigator and/or staff members with patients regarding the informed consent. The screen may be performed after the informed consent is signed. I.C., informed consent.

INFORMED CONSENT AND REVIEW PROCESSES
Patients Who Refuse to Sign Informed Consents

Using an informed consent may introduce bias into a clinical trial; this issue has not been adequately addressed in the literature. It was reported (Edlund et al., 1985) that psychotic patients who refused to sign an informed consent were generally more hostile than those who did sign. In addition, refusers were more likely to abuse alcohol and drugs. The literature reviewed by Edlund et al. (1985) revealed that only 4 of 232 studies evaluated presented characteristics of patients who refused to sign an informed consent form. These four papers were from epidemiological studies. In another study, patients were interviewed on the eve of elective gynecological surgery about enrolling in a hypothetical trial. Those willing to sign an informed consent had less anxiety than those who refused (Antrobus, 1988). Another study (Dahan et al., 1986) reported that the age and sex distribution of patients who refused to sign an informed consent differed from those who did sign. Refusal to sign might involve the patient’s (1) religious or philosophical beliefs, (2) overestimating the magnitude of risks involved, (3) underestimating the likely or potential benefits of a trial, (4) dislike of the physician, staff, or facilities, (5) believing that protocol demands in terms of time or other commitments are excessive, (6) not fully understanding factors such as the possibility of dropping out or of obtaining alternative treatment, (7) being influenced by family or friends, or (8) believing the clinical trial will interfere with proper treatment. It is usually desirable to discuss or consider these and other clinical-trial-related issues before patients undergo time-consuming and possibly costly screening tests. If the informed consent has a significant effect on the makeup of patients who agree to enroll, the trial population will differ as a group from the intended patient population, which could compromise the extrapolatability of the data obtained to other populations of patients.

TABLE 27.8 Selected patient populations with special informed consent considerations (a)
1. Mentally retarded patients
2. Psychotic patients
3. Severe trauma victims and medical emergency situations
4. Patients who are in a coma or are lethargic
5. Unborn embryos and fetuses
6. Newborn babies and infants
7. Children and adolescents
8. Prisoners
9. Pregnant and lactating women
10. Organ transplant donors and recipients
(a) These considerations (except for group 9) usually relate to the issue of competence or ability to provide an informed consent. Other groups such as students, employees, military, or police may be strongly encouraged (i.e., overinduced) to enter a clinical trial or they may be reluctant to refuse entry when asked.

FUTURE ISSUES

Four issues are briefly mentioned below that may be more widely debated in the coming decade.

Surgical Consent
The routine consent forms required in the United States to perform a surgical operation are perfunctory documents that in reality do not ensure that any information appropriate for decision making has been provided. In most other countries the standards are the same or lower. The document is so poor that in one study conducted 2 to 5 days after surgery 27 of 100 patients were unaware of what organ was operated on and 44 of the same 100 patients were unaware of the exact nature of the surgery (Byrne et al., 1988).

Providing Information on Medical Disagreements
The possibility of providing information on clinical variability and disagreements has been discussed (Anonymous, Lancet, 1989b). This topic is not likely to become a popular cause because of the enormous complexity and the knowledge needed (often by experts) to understand the nuances needed to make the best decisions.

Obtaining Informed Consent in Emergency Situations
This topic has long been a difficult one to deal with because of the patients’ stress and lack of time to discuss the alternatives at the very time that a rapid decision must be made. Traditional approaches, according to Grim et al. (1989), have been to (1) avoid clinical trials in emergencies, (2) omit the informed consent process, (3) obtain deferred consent, or (4) obtain customary consent. They propose a fifth possibility consisting of a two-step consent process that has more desirable characteristics than the other four procedures.
DEVELOPING AND WRITING CLINICAL PROTOCOLS
Informed Consent as a Source of Bias
There is no doubt that using an informed consent introduces bias into a clinical trial (Myers et al., 1987; Levine, 1987). There is no possibility of returning to
pre-informed-consent days, but, as more investigators attempt to understand this bias better, it is hoped that they will study the means of diminishing its impact on a clinical trial.
CHAPTER 28

Remote Data Entry

Types of Remote Data Entry
When To Use Remote Data Entry
Perspectives of Professionals Affected
  Monitor’s Perspective
  Investigator’s Perspective
How Remote Data Entry Functions
  Audit Trails
  Models of Data Flow Using Remote Data Entry
Advantages and Disadvantages
Regulatory Considerations of Remote Data Entry
Practical Pointers
  Remote Trial Monitoring
TYPES OF REMOTE DATA ENTRY

The term remote data entry generally conjures the image of clinical trial data transmission via computer and telephone line from an investigator’s site to the sponsor. Although this is the most commonly discussed type of remote data entry, there are numerous other types, including transmission of laboratory data from laboratory to sponsor or investigator. Specialized tests may be conducted offsite, or data may be interpreted offsite, and results subsequently transmitted as desired. Attributes of a desirable remote data entry system are listed in Table 28.1, and protocol elements that may be included are listed in Table 28.2.

TABLE 28.1 Attributes of a desirable remote data entry system
1. No vendor involvement is required
2. System is user friendly
3. Audit trails are present that are easy to use
4. Data are transferred to sponsor automatically
5. System is readily compatible with existing systems
6. System is relatively inexpensive
7. System allows an easy method of correcting errors to be used

WHEN TO USE REMOTE DATA ENTRY

Although there is no right or wrong time to use remote data entry, its use is generally more advantageous in certain types of trials. These include (1) pivotal clinical trials, (2) trials with large amounts of data, (3) trials with both a moderately large number of sites and patients at each site, and (4) trials with seasonal data. Criteria suggesting that remote data entry should be considered are listed in Table 28.3.

Remote data entry tends to be impractical for clinical trials of long duration (i.e., over 1 year) because it ties up equipment and may not be cost effective. Trials with low volumes of data are also generally impractical for remote data entry. Even in these two cases, however, remote data entry might be considered after assessment of relevant issues.

PERSPECTIVES OF PROFESSIONALS AFFECTED

Monitor’s Perspective

Electronic transmission of data from investigators to sponsors does not eliminate the need to monitor trials at the trial site. Although it is far better to assess the status of an ongoing clinical trial by reviewing data than by telephone contacts, the importance of direct contacts and visits should not be underestimated. Nonetheless, a lower frequency of monitoring visits may be possible if data of acceptable quality are received by the sponsor at an acceptable rate, and telephone and other contacts indicate that no problems requiring a monitoring visit exist. Generally it is not necessary to monitor both paper and electronic data, although this often occurs.

In-house data processors may wonder about how remote data entry will affect them in terms of job security. Usually, it only means that data arrive at a sponsor’s site somewhat faster and cleaner. The jobs of data processors are rarely, if ever, threatened.
TABLE 28.2 Selected protocol elements that may be built into the remote data entry computer program
1. Data collection forms
2. Patient entry criteria
3. Dosing schedules and acceptable modifications of dose
4. Patient visit schedules
5. Level of abnormal laboratory values that require a comment or action
6. Severity of adverse reactions or treatment that require notification of the sponsor
7. Adverse trends in laboratory or other values that require a comment or acknowledgment
8. Electronic mail for communication between site and sponsor
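
An element such as item 5 of Table 28.2, combined with the chapter's description of a system that signals when an entry lies outside accepted ranges or is not understood, might be sketched as follows. This is only an illustration under stated assumptions: the names `RangeRule` and `check_entry` are hypothetical, and the potassium limits are placeholder values, not limits taken from any protocol.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RangeRule:
    """Accepted range for one variable (hypothetical structure)."""
    variable: str
    low: float
    high: float


def check_entry(rule: RangeRule, raw_value: str) -> Optional[str]:
    """Return None if the entry is acceptable, otherwise an alert message.

    The alert message stands in for the light or sound signal that a
    remote data entry system would give to the person entering data.
    """
    try:
        value = float(raw_value)
    except (TypeError, ValueError):
        # The entry could not be interpreted as a number at all.
        return f"{rule.variable}: entry {raw_value!r} is not understood"
    if not (rule.low <= value <= rule.high):
        return (f"{rule.variable}: {value} lies outside the accepted "
                f"range {rule.low} to {rule.high}")
    return None


# Illustrative limits only; a real trial would take them from its protocol.
potassium = RangeRule(variable="serum_potassium_mmol_per_l", low=3.5, high=5.0)

if __name__ == "__main__":
    for typed in ["4.2", "6.1", "4,2"]:
        alert = check_entry(potassium, typed)
        print(typed, "->", alert if alert else "accepted")
```

Returning a message rather than raising an error keeps the check separate from how the alert is presented, so the same rule could drive a flashing field, a buzzer, or a required comment.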
Investigator’s Perspective
A company conducting clinical trials should only use remote data entry procedures when both the sponsor and the investigator desire it. Trying to convince a reluctant investigator to use this type of data transmittal is not an auspicious way to start a clinical trial. One approach that would be acceptable to most investigators who are not totally comfortable with computers (and even those who are) would be to hire a specialist to run and maintain the data entry and transmittal system. This method would free the investigator from that particular responsibility.

From the investigator’s viewpoint it is redundant (in terms of time and resources) to rekey data that have come from the hospital’s laboratory in a computerized format. Furthermore, investigators who are participating in medicine trials with several sponsors and are using remote data entry will need sufficient space to house multiple computer facilities. Cooperation between different sponsors resulting in a single system of computer equipment at one site would improve efficiency and decrease costs. This advance, however, does not appear to be forthcoming.

The cost of running a clinical trial increases if the data are to be sent by computer and telephone. These costs, as well as any incurred for hiring and training staff to operate and maintain the system, must be factored into the clinical trial budget. Investigators usually delegate computer work to others, who then handle the technical issues and problems that arise.

It should be emphasized that serious adverse reactions are handled the same way, whether or not remote data entry is used; serious and unexpected adverse reactions should be telephoned to the sponsor. Patients are unaffected by remote data entry, except in rare cases.

TABLE 28.3 Factors that suggest remote data entry should be considered
1. Trials are considered “fast track” (i.e., it is important for them to be conducted rapidly)
2. Trials are such that rapid decisions are needed
3. Medicine has particularly high commercial potential
4. Trials have a large patient enrollment
5. Multicenter trials are planned
6. Trials have a high volume of data suitable for remote entry (e.g., not an excessive amount of time or effort required by the site or sponsor)
7. Both the investigator and sponsor want to use remote data entry
HOW REMOTE DATA ENTRY FUNCTIONS
Security of a remote data entry system is usually addressed by requiring a password to enter the computerized system. Additional passwords are required before a person who has gained access can enter data. Systems are created so that sponsors are unable to change data from their own computers during the trial. In telephone conversations with investigators, both must use the same password if they desire to enter the system simultaneously. Finally, data are stored in non-word-processing format.
Audit Trails
A remote data entry system should include audit trails. An audit trail should include the password identifier of the person making any changes, plus the date that the change is being made. The patient number, visit number, and variable being changed must be included, as well as the old and new values. The reason for the change should be given whenever possible.
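
The fields named above can be gathered into a single record per change. The sketch below is a minimal illustration, not a description of any actual system: the class names `AuditEntry` and `AuditedData`, the field names, and the sample values are all assumptions introduced here.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Optional, Tuple

# (patient number, visit number, variable) identifies one data value.
Key = Tuple[str, int, str]


@dataclass(frozen=True)
class AuditEntry:
    """One audit-trail record holding the elements listed in the text."""
    user_id: str              # password identifier of the person making the change
    change_date: date         # date the change is made
    patient_number: str
    visit_number: int
    variable: str
    old_value: Optional[str]  # None when the value is entered for the first time
    new_value: str
    reason: Optional[str] = None  # given whenever possible


class AuditedData:
    """Stores current values and appends an AuditEntry on every change."""

    def __init__(self) -> None:
        self._values: Dict[Key, str] = {}
        self.trail: List[AuditEntry] = []

    def set_value(self, user_id: str, patient_number: str, visit_number: int,
                  variable: str, new_value: str, change_date: date,
                  reason: Optional[str] = None) -> AuditEntry:
        key = (patient_number, visit_number, variable)
        entry = AuditEntry(user_id, change_date, patient_number, visit_number,
                           variable, self._values.get(key), new_value, reason)
        self._values[key] = new_value
        self.trail.append(entry)  # entries are only appended, never edited
        return entry

    def history(self, patient_number: str, visit_number: int,
                variable: str) -> List[AuditEntry]:
        """Return every recorded change to one value, oldest first."""
        key = (patient_number, visit_number, variable)
        return [e for e in self.trail
                if (e.patient_number, e.visit_number, e.variable) == key]


if __name__ == "__main__":
    data = AuditedData()
    data.set_value("coord01", "P-001", 2, "systolic_bp", "210", date(1991, 3, 4))
    data.set_value("coord01", "P-001", 2, "systolic_bp", "120", date(1991, 3, 5),
                   reason="transcription error")
    for e in data.history("P-001", 2, "systolic_bp"):
        print(e.change_date, e.old_value, "->", e.new_value, e.reason)
```

Keeping the trail append-only lets a reviewer reconstruct how a value looked originally and follow each later change, which is the use the chapter describes for audits.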
Models of Data Flow Using Remote Data Entry
Figure 28.1 is a schematic representation of three models of data flow using remote data entry. In model A, both a hard copy and electronic copy of the data are available, whereas in model B only the electronic copy of the data exists. The electronic copy in model B serves both as the original data collection form and as the sponsor’s copy. Model C differs from the others in that the individuals who send the data to the sponsor are actually from the sponsor (or a contractor) and are not the investigator or his or her staff. Model C requires more visits from sponsors, more time spent at sites, and therefore more monitoring. The advantage of this model is that data entry is performed by those monitors who are usually the most motivated to have data as pristine and accurate as possible. Their job evaluation is usually more closely related to the accuracy of the data than is the evaluation of the clinical trial coordinator or whoever else enters and transmits the data.
[Fig. 28.1 flowchart not reproduced. Box labels as recovered, in extraction order:
Model A: Investigator interacts with patient; DCF filled in by investigator or staff; Data transmitted by staff to sponsor via computer; Sponsor edits data; Monitor verifies data.
Model B: Investigator interacts with patient; Data transmitted by staff to sponsor via computer; Sponsor edits data; Monitor verifies data; Data processed and analyzed.
Model C: Investigator interacts with patient; DCF filled in by investigator or staff; Data edited by sponsor's monitor at site; Data transmitted by monitor from investigator's site; Monitor verifies data; Data processed and analyzed.]
FIG. 28.1 Models of data flow using remote data entry. DCF, data collection form.
ADVANTAGES AND DISADVANTAGES

General advantages of remote data entry for different groups are listed in Tables 28.4 to 28.6 and practical pointers are listed in Table 28.7. Most remote data entry systems include a program that enables the computer to recognize when answers to a question (i.e., values on a data collection form) are unacceptable. When answers are typed into the system that lie outside accepted ranges, or are not understood by the computer, a signal (light or sound) is given. This method of alerting the data entry person to review the previous entry is a valuable means of saving time, both in data monitoring and in making corrections.

Technical problems can be substantial in a clinical trial using remote data entry, especially if the problems interfere with the trial’s progress. Transmission line noise is one example. Other problems that may arise are listed in Table 28.8. Despite attempts to prevent technical problems, innumerable “gremlins” seem to inhabit electronic equipment and delight in causing problems at inopportune times.

Determining the precise costs of remote data entry is complex, requiring, first, a decision on which costs to include. If only hardware and software costs are considered, then remote data entry would cost more than a paper data collection method. If other costs related to the system are included, and time saved in processing the data is factored in, remote data entry may provide a savings of approximately 15 to 25%, according to informal analyses conducted by various experts in this field.

TABLE 28.4 Advantages for the sponsor of using remote data entry
1. Earlier availability of interim data for decision making
2. Earlier completion of final data base with reduction of keypunching and a savings in data comparisons of double-entered data
3. More rapid completion of clinical trial reports
4. More rapid completion of a regulatory submission
5. Savings in printing, shipping, and storing of hard copies of data collection forms
6. Rapid feedback on protocol problems
Modified from a talk by Dr. Lionel D. Edwards of Schering-Plough Corp., with permission.

TABLE 28.5 Advantages for the monitor of using remote data entry
1. Higher percentage of valid patients in the clinical trial
2. Rapid feedback on patient enrollment, adverse reactions, and clinical-trial-related issues
3. Ability to review completed data collection forms prior to visiting the site
4. Ability to have more productive site visits
5. Savings in time and expense of correcting multiple data collection forms at different sites
6. Less chance for data collection forms to get lost at the site, in the mail, or at the sponsor
7. Fewer problems of illegible entries, missing data, and missed abnormal results
Modified from a talk by Dr. Lionel Edwards of Schering-Plough Corp., with permission.

TABLE 28.6 Advantages for the investigator of using remote data entry
1. Remote data entry identifies and alerts staff about missing, inappropriate, or abnormal data
2. It identifies and alerts staff about desirable, undesirable, or abnormal trends
3. It simplifies the correction of data
4. It may provide an algorithm to assess the association between a medicine and an adverse reaction
5. It provides a hard copy on site for data sent (if desired)
6. It provides electronic storage on disk, plus a backup disk
7. It is a state-of-the-art system for those who enjoy using the latest techniques available
8. It allows nonvalid patients to be eliminated more rapidly from a trial
Modified from a talk by Dr. Lionel Edwards of Schering-Plough Corp., with permission.

TABLE 28.7 Practical pointers in developing and using remote data entry
1. Develop a template to fit over the keyboard for data entry that identifies relevant functions to use in entering data
2. Use color monitors for ease of reading the screens
3. Use a single color for data areas that the investigator must complete and a different color or shade for other areas
4. Program a light to flash or buzzer to sound when errors are made in data entry
5. Use remote data entry for only some sites in a multicenter pilot trial; compare time and error rates in each group for every step of the process; compare overall costs

TABLE 28.8 Selected problems that may arise in clinical trials with remote data entry
A. Technical problems
1. Changes to protocol are often difficult once the study is initiated
2. Security may become an issue and is more complex than with other clinical trials
3. Frequent updating of software is often required to improve the system
4. Initial costs are usually significant
5. Technical and compatibility problems may be a major problem, particularly outside the United States and between countries (e.g., telephone installation times, types of telephones available)
6. The same modems cannot legally be used in all countries (e.g., United States and Germany)
7. The quality of local telephone companies and their lines (e.g., transmission noise) vary greatly within some countries and between countries
8. There may be multiple vendors responsible for hardware maintenance in some countries
9. Backup of data at the investigator’s site is essential
10. It may be difficult to standardize various entry and transmission procedures unless identical equipment is provided to each site
B. Personnel problems
1. There may be reluctance of some investigators, monitors, or sponsor’s staff to utilize procedures
2. There may be a lack of adequate training or available staff to serve as programmers, statisticians, and computer experts to establish, support, and maintain the system
3. Training may present problems, particularly if staff turnover is high
4. Backlogs of data may build up at transmitting sites
C. Political and administrative problems
1. Governments may restrict the hardware that may be used (e.g., locally manufactured machines may be required)
2. Some governments restrict the information that may be given to other governments and who has access to it
3. Trade or commerce regulations and tariff barriers may be major issues
4. Licenses to use telephone equipment may be restricted
5. Issues of who owns the data (e.g., hospital, investigator, sponsor) may create problems in the submission of data to regulatory agencies

REGULATORY CONSIDERATIONS OF REMOTE DATA ENTRY

Regulatory authorities are willing to accept data from trials that use remote data entry techniques, provided that data collection forms can be generated, audit trails exist, and supporting data are available for an audit conducted by the regulatory authority. The audit trail is extremely important from a regulatory perspective because it allows an individual to learn how the original data appeared and to follow their change over time through processing procedures or by intentional modifications. When a physician or other health professional sees a patient and then types results into a computer for remote data entry, there is no hard copy serving as a source document. The computer screen, with various fields to fill in, serves in the example given as source document, data collection form, and remote data entry. Although data collection forms are computer-generated, it is still important that the investigator sign them, as with hard copy forms.
PRACTICAL POINTERS
Careful planning is essential for a successful experience with remote data entry. Define what is desired from the experience and decide whether the goals are realistic, fit the time schedule and budget, and can be implemented. Equipment may be purchased or leased. Leasing is particularly valuable when a large number of computers are required for a relatively short period. Changes to the format of individual computer screens for data collection should be done centrally and then given to all sites. Although small changes may be made by staff at each site, the “gremlins” that often appear in the equipment are more likely to create problems if changes are made at each site. Software and hardware change rapidly, but older equipment may be satisfactory and it will be cheaper to run, will involve less training, often will be easier to repair, and will allow a group to start a trial faster. Newer programs and equipment should be evaluated for potential gains in an important area before they are adopted.
The flow of information between sponsor and investigator, or the movement of data at either site, may have to be modified to use remote data entry. The flow pattern often differs between groups and is specific to each sponsor. Data scrubbing may be done at the investigator’s site or at the sponsor’s. These aspects will lead a sponsor to choose a particular model of data flow (Fig. 28.1). The model chosen should allow the sponsor to achieve the “greatest byte for the least bucks.”

Remote Trial Monitoring
The process of remote data entry must be clearly differentiated from remote trial monitoring. In the latter, the monitoring process is primarily or entirely conducted by receipt of frequent, even daily, electronic transmission of data sent to the sponsor. Data are sent more frequently using electronic transmission than for most cases of remote data entry using telephone or courier. The monitor must still visit the site to compare the data sent with the source documents.
CHAPTER 29

Preparing the Introduction

In writing a protocol’s introduction, the purpose or intent of the clinical trial should be described as specifically as possible. Although the length of the introduction may vary from a single paragraph to many pages, it should be of minimal length to meet this goal sufficiently. Some of the possible specific goals for the introduction are to:

1. Present preclinical and clinical data on a new medicine that is being evaluated with the present protocol.
2. Present the background of a hypothesis that is being tested with the present protocol.
3. Present the background for a new methodology that is being evaluated or is being used to test a new medicine with the present protocol.

Adequate information must be presented on the safety, efficacy, and any other relevant aspects of the clinical trial medicine(s). Various points to consider for inclusion in an introduction are listed in Table 29.1.

TABLE 29.1 Information that may be included in a protocol’s introduction (a)
1. Synopsis of the disease to be studied
2. Background of standard medicines or other current therapy used to treat the disease
3. Limitations of currently available medicines and/or other treatments
4. Background of the medicine(s) being studied in the present protocol:
   Chemistry
   Pharmacology
   Toxicology
   Pharmacokinetics
   Clinical safety
   Clinical efficacy
5. Rationale for the present trial
6. Other information
(a) Published (or unpublished) references that support the material presented should be included.
CHAPTER 30

Standardizing Information Across Protocols

Protocol Format
  Title Page
  Table of Contents
  List of Abbreviations
  References and Appendices
Protocol Content
  Patient Enrollment and Duration of the Clinical Trial
  Location of the Trial Site
  Factors to Control Within or Outside the Trial Environment
  Shipping of Medicines To and From the Clinical Trial Site
  Obtaining, Handling, and Shipping Biological Samples
  Defining the Time of Patient Entry and Completion
  Missed Appointments
  Patients Lost to Follow-Up
  Medical Emergencies
Administrative Elements
  Administrative Responsibilities of the Investigator
  Confidentiality of Data
  Collection and Processing of Data
  Publishing of Data
  Monitoring the Clinical Trial
The term “boilerplate” refers to sections of information that are similar from protocol to protocol. Standardizing the format and content of these sections can save time for individuals who prepare numerous protocols. The term “standardization” does not mean that these parts of the protocol are identical in various protocols but that their format is often similar, and the material in the section can usually be easily modified to fit the specific nature of a new protocol. The various sections of the protocol that may be standardized are listed in Table 30.1. Several of these topics are discussed in other chapters within Section II. A summary of the processes of protocol preparation is given in Fig. 30.1.
PROTOCOL FORMAT

Title Page

There are many variations in the information presented on a title page. Table 30.2 presents a list of information that is usually present and sometimes present. Sponsors of clinical trials have usually established a preset format for the title page.

Table of Contents

A table of contents is an optional feature of a protocol, but it is often helpful, especially in longer protocols with distinct sections. A sample listing of possible items to include is shown in Fig. 30.2.

List of Abbreviations

If numerous abbreviations are used in the protocol, then it is useful to provide a listing. As a general rule, abbreviations should not be used at all or should be severely limited to a small number of commonly accepted ones.

References and Appendices

The need for references and appendices varies widely from trial to trial. Their usefulness in a protocol is dependent in part on the personal style of the individual writing the protocol as well as on the requirements of the sponsoring institution for sponsored clinical trials. There are few rules that can be used to evaluate the need for appendices, since styles and “accepted prac-
226 Randomization (if Required)
[Figure 30.1: flowchart. Recoverable box labels: Prelim Discussion; Clinical Trial Plan/Outline; Trial Feasible? (statisticians, scientist, potential investigators); Draft Clinical Protocol; Protocol Review/Critique (statisticians, scientist, investigators); Protocol OK?; Sponsor Review and Critique; Revision of Clinical Plan (author); Revision of Protocol (author); Third Pass: Drop/Modify Plan; Clinical Trial Protocol; Clinical Investigator; Ethics Committee Approval?; Revision of Documents; Any Regulatory Documents Pro Forma; Reg. Authority; Complete Regulatory Submission; 30-Day or Other Wait Period; Trial Disapproved?; Initiate Clinical Trial. The final stage is marked "Add this section for investigational medicines."]
FIG. 30.1 Processes involved in protocol development and approval for an investigational or marketed drug. The total process is arbitrarily divided into six stages. For unsponsored trials, stages C and D would be modified. For clinical trials using marketed medicines without need for regulatory documents, the last stage (F) would be omitted. Each Ethics Committee/IRB in a multicenter trial must approve any changes made to the protocol.
4) or somewhat broader patient populations are evaluated. The lower figure represents most Phase III clinical trials in which extremely broad (A) or very narrow patient populations are usually studied.
6. Analyze all clinical data available from patients who dropped out or were discontinued and compare with results of those patients who completed the trial.
7. The authors of a publication should state to which populations of patients their results do and do not apply.
It is interesting to note that a well-known painkiller evaluated in thousands of patients during the 1960s revealed no problems of abuse. The sponsor was certain that the medicine was free of abuse potential. After the medicine was marketed, however, this problem began to surface. After marketing the medicine was used by an entirely different population, i.e., habitual drug abusers, who had been systematically excluded from all of the clinical trials.
PROBLEMS OF CLINICAL DATA INTERPRETATION
Formal Evaluation for Generalizability
In some cases it may be important to test formally the extrapolatability of data from a clinical trial. If resources are available, an external committee of experts should have the clinical trial audited. The audit will provide information on the conduct of the trial (e.g., were randomizations conducted as planned?, were patient exclusions appropriate?), design of the trial (e.g., was the power adequate?), and data processing and statistical analyses. The group can then determine how representative the patients are of all patients with the disease. This would involve a comparison with patients not enrolled in the clinical trial as well as with refusers, dropouts, nonqualifiers, and discontinued patients. The checks on extrapolatability that could be conducted are listed in Table 90.3.
Homogeneous versus Heterogeneous Trial Populations
Investigators sometimes choose between conducting a clinical trial in a narrowly defined homogeneous population or a broadly inclusive heterogeneous group of patients. The former group may yield data that are hard to extrapolate to a wider group of patients. The other extreme is to conduct a trial in a broad heterogeneous group of patients, which will be more variable but will provide data of more widespread clinical applicability (Fig. 90.3). The solution to this issue relates to the objectives of the clinical trial. In Phases I and II of medicine development, a narrow homogeneous population is almost always preferable, whereas in Phases III and IV more heterogeneous populations have many advantages. When specific objectives are posed, the group chosen should be the population that will best address the objectives and provide the “extrapolatability” desired. Table 90.4 lists the ability to extrapolate data from normal volunteers to patients.
Another perspective is to view all individuals as differing from each other and conclude that we can only know and evaluate some of the pertinent factors in any clinical trial. Summary data of an adequate-sized group of dissimilar people will be obtained. It is reasonable to expect that another similar or slightly different group will react in a similar manner. In some diseases there is a continuum of patients with differing types of the disease or with differing severity of illness. One treatment may be best for one type of disease and another treatment best for those with a different type. How should one determine the midrange, and what treatment should be given to those patients? It is not possible to extrapolate data from patients with one type of disease to another without at least some evidence that this procedure is likely to be clinically meaningful.
TABLE 90.3 Aspects to evaluate in assessing extrapolatability
1. Baseline characteristics of prognostic variables
2. Demographics
3. Results and patient profiles across clinical research centers
4. Subgroup analyses of apparent relevance
5. Comparison with patients who refused entry^a or did not qualify for entry
6. Hypotheses of suspected problems
7. Sensitivity of results for analysis deviations
8. Literature results
^a This is sometimes called a volunteer effect, because patients who volunteer to enter a clinical trial are usually different from those who do not.
TABLE 90.4 Ability to predict therapeutic efficacy of various types of medicines based on data from normal volunteers
Disease or condition     Parameter used     Usefulness
Hypercholesterolemia     Cholesterol        Fair
Hypertension             Blood pressure     Fair
Insomnia                 Sedation           Good
Anxiety                  Sedation           Good
Beta blockade            Heart rate         Excellent
Anticoagulants           PT/PTT^a           Excellent
^a PT, prothrombin time; PTT, partial thromboplastin time.
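One of the checks in Table 90.3, the comparison of enrolled patients with those who refused entry or did not qualify, can be sketched numerically. The example below is only an illustration under stated assumptions: the ages are invented, the function name is hypothetical, and the standardized mean difference is one possible summary chosen here, not a method prescribed by the text.

```python
from statistics import mean, stdev


def standardized_mean_difference(group_a, group_b):
    """Difference in means divided by the pooled standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_variance = (
        (n_a - 1) * stdev(group_a) ** 2 + (n_b - 1) * stdev(group_b) ** 2
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / pooled_variance ** 0.5


# Invented baseline ages (years) for illustration only.
enrolled_age = [34, 41, 38, 45, 39, 42]
refused_age = [52, 60, 47, 58, 55, 49]

smd_age = standardized_mean_difference(enrolled_age, refused_age)
print(f"Standardized difference in age (enrolled vs. refusers): {smd_age:.2f}")
```

A large absolute value would suggest that the enrolled patients differ from the refusers on that baseline variable, which is a reason for caution before extrapolating the trial results to the wider group.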
Direction of the Extrapolation
Another issue relating to extrapolation concerns the direction of extrapolation. This means that one may extrapolate results from an entire group or population to one patient or extrapolate results from a single patient to an entire population. Most of the foregoing discussion assumed that the results of a single clinical trial are being extrapolated to the entire population of patients with the particular disease. This is mainly a theoretical consideration, since extrapolation by practicing physicians usually goes in the opposite direction, i.e., toward the single patient they are treating. Clinical relevance for specific patients with the disease is based on interpretations of data obtained in trials using a larger patient population. Extrapolation to an entire population from a single patient usually represents a pitfall to be avoided.
CHAPTER 91
Data That Are Difficult to Interpret
Reasons Why Data May Be Difficult to Interpret, 703
Seeking Clues, 703 Nonresponders, 703 Dirty or Noisy Data, 704
Techniques to Use in Difficult Situations, 704
Prevention and Approaches, 704 Developing New Hypotheses or Interpretations, 704 Creating a Differential Diagnosis, 705 Questions to Pose, 705 Statistical Considerations, 705 Use of Lateral Thinking, 706 Use of the Delphi Technique, 706
Situations in which There Are Both Positive and Negative Responses in Efficacy, 706
Incomplete Data Sets, 707
Why Do Incomplete Data Sets Occur? 707 Publication of Fragments, 707 A Complete Clinical Trial That Yields Fragments: The Jigsaw Puzzle Problem, 707
Anecdotal Observations, 708
Observations and Comments from Patients, 708 Observations and Comments from Investigators, 708 Interpreting Anecdotal Observations, 708
REASONS WHY DATA MAY BE DIFFICULT TO INTERPRET
Despite the most assiduous attention to detail in designing a clinical trial and adhering to high standards in its conduct, situations will still arise in which the data obtained are difficult to interpret. When this situation occurs, the initial step is to ascertain the primary reason for the difficulty. A few common reasons are listed in Table 91.1.
Seeking Clues
In analyzing data one must often seek clues that might help to explain the results. Clues such as changes in important staff personnel, clustering of data, and changes in the protocol may all be important in this process. Unfortunately, a single factor is often not able to unravel the complexities of difficult data on its own. With many complex clinical situations and trials, it is often not possible to tease out a single thread. If an entire clinical trial yields data that are difficult or “impossible” to interpret, the reason(s) may be primarily factors relating to the protocol or trial design, patients entered in the trial, or the investigator and staff who conducted the trial. Almost any aspect of the trial may be responsible, and it is not possible to present a simple checklist to determine how this problem may have occurred. A list of some methods that can be used to identify why a set of data is difficult to interpret is presented in Table 91.2.
Nonresponders
Data may be difficult to interpret if there were a relatively large number of nonresponders in the clinical trial. The data may be broken down and analyzed by degree of patient response. If two groups of patients receiving treatment may be identified as responders and nonresponders, various clues may be considered and evaluated as possible reasons why some patients were nonresponders. This technique is useful for hypothesis generation but is not suitable for drawing a definitive interpretation. Specific factors that may be evaluated include (1) duration of illness, (2) severity of illness, (3) response to previous medication, (4) reason for current exacerbation, (5) compliance with trial, (6) differences in blood levels, and (7) differences in dosages given or taken. Chapter 98 discusses the concept of responders and nonresponders in more detail.
TABLE 91.1 Selected reasons why a specific data set may be difficult to interpret
A. Reasons related to the protocol
1. Too few patients entered
2. Too many patients entered (overpowered tests will detect small, clinically unimportant differences)
3. Trial design flawed (e.g., blinding, randomization, efficacy measures, or controls used were inappropriate)
4. Inadequate methodology exists to measure the endpoints accurately and reproducibly
5. Inappropriate statistical tests used to analyze the data
B. Reasons related to the investigator not following the protocol
1. Errors in patient assignment to treatment
2. Trial blind not maintained
3. Unacceptable biases entered the trial
4. Irregular compliance with protocol
C. Reasons related to the patient
1. Patients were not compliant and did not take medicine as instructed or missed numerous appointments so that insufficient or inaccurate data were collected
2. Patient characteristics that related to the response differed in different treatment groups
3. Patients may not have been accurately diagnosed or may have represented different subgroups of the population
4. A large number of nonresponsive patients were entered in the trial
D. Reasons related to the disease or problem being evaluated
1. The disease process is particularly complex or not well understood
2. The disease process was too advanced in some patients or too mild in others
E. Reasons related to the outcomes of the trial
1. The placebo response may have been much larger than anticipated
2. The active control response may have been much smaller than anticipated
F. Other reasons
1. Only one or a few case studies are available for interpretation
2. Only retrospective data are available
3. Large gaps exist in the data collected (i.e., the data are incomplete)
4. Too many anecdotal observations, hearsay, or conjecture were included in the data collected
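Items A.1 and A.2 of Table 91.1 (too few and too many patients) can be illustrated with a rough power calculation. The sketch below is not taken from the text: it assumes two equal-sized groups, a known common standard deviation, and a two-sided z-test under the normal approximation. The function name and all numeric inputs are hypothetical.

```python
from statistics import NormalDist


def approx_power(delta, sigma, n_per_group, alpha=0.05):
    """Approximate power of a two-sided z-test comparing two group means.

    delta: assumed true difference between group means
    sigma: common standard deviation, assumed known
    n_per_group: patients in each of two equal-sized groups
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    standard_error = sigma * (2 / n_per_group) ** 0.5
    shift = abs(delta) / standard_error
    # Probability that the test statistic lands in either rejection region.
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)


# Illustrative inputs only: a moderate difference with few patients,
# and a very small difference with a very large number of patients.
few_patients = approx_power(delta=0.5, sigma=1.0, n_per_group=10)
many_patients = approx_power(delta=0.05, sigma=1.0, n_per_group=20000)
print(f"power, 10 per group, difference 0.5 SD: {few_patients:.2f}")
print(f"power, 20000 per group, difference 0.05 SD: {many_patients:.3f}")
```

Under these assumptions the small trial has low power to detect even a moderate difference, whereas the very large trial will almost always declare a difference of 0.05 standard deviations statistically significant, although such a difference may be clinically unimportant.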
Dirty or Noisy Data
One of the most common reasons why some data are difficult to interpret concerns the concept of “clean” and “dirty” data. Dirty or noisy data have that term applied because the data (1) are incomplete and fragmentary (see discussion of incomplete data sets in this chapter), (2) were obtained under suboptimal conditions and have a great deal of variability, inconsistency, or lack of ascertainment (i.e., the validity is questionable), or (3) were obtained under close to optimal conditions but still have a great deal of variability, inconsistency, or lack of ascertainment.
TECHNIQUES TO USE IN DIFFICULT SITUATIONS
Prevention and Approaches
A number of approaches can help to prevent situations in which data are difficult to interpret or to deal with such situations after they have occurred. Most of the situations that lead to collecting data that are difficult to interpret may be avoided through careful planning of the design and conduct of the clinical trial and careful data editing. The reason(s) for difficulty in data interpretation may also relate to problems in data analysis or to another factor that can be corrected or reconsidered without requiring duplication of the entire trial. A few approaches to consider are listed in Table 91.3.
Developing New Hypotheses or Interpretations
Dividing the data according to new factors (e.g., age, sex, weight of patient) and reanalyzing the data of the new groups obtained may lead to new hypotheses or interpretations. If an attempt to demonstrate the positive association between two or more events, medicines, or effects is unsuccessful, it might be useful to evaluate whether qualitative differences among the test parameters may be demonstrated. It is also sometimes possible to focus on the inverse of what is being examined as a means of achieving an adequate interpretation of previously “uninterpretable” data. See Chapter 76 on Developing Hypotheses.
Creating a Differential Diagnosis
There is a clinical analogy in the approach to be followed in solving problems of interpretation. The person who is interpreting the data should establish a differential diagnosis of the possible interpretations or problems preventing the interpretation and then systematically evaluate each until the correct one(s) is identified. The old clinical cliche applies that “when you hear hoofbeats think of horses, not of zebras” (i.e., common reasons rather than esoteric ones are usually the true cause of the problem).
TABLE 91.2 Techniques or approaches to identify reasons why certain data are difficult to interpret
1. Evaluate the difficulty in an overall manner and systematically attempt to identify techniques that may resolve the difficulty
2. Consult with statisticians, consultants, colleagues, and peers in your own clinical field and occasionally in a different one
3. Use techniques of “lateral thinking”^a
4. Use a “devil’s advocate” approach
5. Repeat the statistical analyses
6. Evaluate each of the individual factors that might have affected and complicated the data, including steps conducted in data collection, editing, processing, and analysis
7. Consider various pitfalls that may have occurred and could have created problems
8. Consider various biases that may have influenced the data
9. Reanalyze the data with new methodologies or by subgroups and attempt to create hypotheses based on the new results
10. Evaluate the trial design for possible flaws that led to the problems in developing a suitable interpretation
11. Use flow diagrams, algorithms, and other related techniques to analyze one’s thought processes and to identify new questions or relationships
12. Evaluate the conduct of the trial for possible problems that led to the difficulties in interpretation
13. Determine whether interpretation at a different level (e.g., molecular, cellular, physiological, pharmacological) is more appropriate
14. Determine whether subclassifications of the disease, receptors, or activities are more useful to consider than broader classifications
^a See text and references by de Bono (1967a, b; 1969).
TABLE 91.3 Techniques or approaches to interpret difficult data^a
1. Use appropriate caveats in the interpretations reached
2. Present an interpretation of “grays” even though a “black-and-white” (e.g., all-or-none) answer was sought
3. Modify the problem to make it even more difficult to solve. For example, the use of an extreme case or situation may suggest an approach or interpretation not previously considered
4. Discuss only those parts of the data for which a reasonable interpretation may be achieved. Comment on data where a reliable interpretation is not possible
5. Discuss the interpretation in terms of the degree of association between two events (i.e., is a cause and effect highly unlikely, possible, probable, almost definite, unknown, or virtually impossible)
6. State what the data do not mean. This may be a useful starting point in developing the interpretation, especially for subjective data
7. Suggest what additional information would be required to reach a more definitive interpretation and then attempt to obtain that information
8. Utilize an “if-then” exercise whereby a point of view, hypothesis, or statement follows “if.” The consequences of the statement are then developed starting with the term “then.” These consequences are then looked for, to determine their existence, nature, and/or validity
^a See Table 91.2 for additional methods.
Questions to Pose
When data are difficult to interpret, a concerted attempt should be made to identify the source of the problem. One approach to identifying the specific factors involved is first to identify (if possible) whether the problem is related to the (1) patient, (2) tests used, (3) measurements obtained, (4) investigator, (5) professional and ancillary staff, (6) environment, (7) test medicine itself, or (8) concomitant treatment. Other possible categories may also be considered. After the broad category is identified, a search for more specific factors may be considered and conducted. Another approach is to identify any unusual aspects of the data. One should attempt to determine why these aspects occurred and focus on those elements that may provide a basis for interpreting the data. This type of analysis may demonstrate the need to conduct a new clinical trial that is better designed and/or better conducted. The number of specific questions to pose is almost without limit. A small sample of the types of questions to ask that relate to any aspect of the trial (the medicine, the patients, or the data) is presented in Table 91.4. Finally, after consultation with peers and consultants, one may reluctantly have to conclude that no suitable explanation or interpretation of the data is possible.
TABLE 91.4 Specific questions to pose when an interpretation of difficult data is sought
1. Do data include different lengths of treatment for different patients?
2. Are patients remaining in the clinical trial for different lengths of time in different groups?
3. Are blood levels different in different groups of patients?
4. Is the chemical stability of the medicine known with assurance?
5. Are impurities or different isomers present in some batches of medicine?
6. Did each patient receive the correct medicine? Has this been confirmed with analysis of a sample of their medication?
7. Are all medicines getting to the site of action?
8. Does the medicine act differently in different species?
9. May animal studies be conducted to address questions about the problem (e.g., does medicine cross the blood-brain barrier)?
10. How did the patient’s behavior change as a result of being in the trial?
11. Are many patients nonresponders?
12. Can a more detailed knowledge of all patients’ medical histories provide clues?
13. Can a detailed analysis of one patient’s history provide hints for the entire trial?
14. If the medicine does not cause an effect of its own, perhaps it modulates the same effect caused by other medicines?
Statistical Considerations
The individual who is interpreting data should have a collaborative and interactive relationship with a statistician. They should review the data analyses to ascertain that the appropriate methods were used to address the clinical questions. It may be decided that it is relevant to perform additional statistical analyses. This situation may occur if statistical tests that were not used to analyze the data are equally valid as those tests that were used. It is also possible that the original data may be analyzed with different statistical tests to evaluate the data from a different perspective. For example, trials are often conducted in which a medicine is given to patients and data are obtained each hour for a given number of hours. The data obtained may be analyzed in several ways. A few of the more straightforward methods are shown below.
For between-treatment evaluations:
1. Data obtained at each individual hour of the trial may be compared for two or more groups.
2. Algebraic differences between a baseline and each individual hourly result may be compared for two or more groups.
3. Data obtained at the average of all trial hours after
treatment may be compared for two or more groups.
4. Algebraic differences between the baseline and the average of all trial hours after treatment may be compared for two or more groups.
5. Analysis of variance or other more sophisticated tests may be used to evaluate the data.
For within-treatment evaluations, each of the five processes described may be conducted with data of one treatment group. Variations include (1) only choosing a single or a few representative time points to evaluate, such as the initial or final response, (2) averaging all baseline values or only averaging the last n values, (3) averaging baseline with posttreatment follow-up values as a control, or (4) defining baseline in a different manner.
Use of Lateral Thinking
Use of lateral thinking in interpreting data should be considered, although this approach is often difficult to implement effectively. The four major principles of lateral thinking (de Bono, 1967b) are:
1. Recognize dominant polarizing ideas
2. Search for different ways of looking at situations or problems
3. Relax the rigid control of vertical (i.e., logical) thinking
4. Use chance in arriving at a solution
These concepts cannot be adequately summarized in a few sentences, and interested readers are referred to de Bono’s books, in which he describes four types of thinking: natural thinking, logical thinking, mathematical thinking, and lateral thinking. Edward de Bono has described lateral thinking as low-probability sideways thinking. He presents its use in The Five-Day Course in Thinking, The Use of Lateral Thinking, and The Mechanism of Mind (see references). Lateral thinking may enable a person to search for novel solutions to a problem or to rephrase the question being asked in a way that would allow a solution to be achieved. The second principle may be approached by intellectually viewing the data in a different perspective that allows an answer to be achieved. One example is when a specific relationship is consciously reversed. For example, a glass of water may be viewed as being either half full or half empty. Another example is to view the walls of a building as suspended from the roof rather than viewing the walls as supporting the roof. Finally, an object could be viewed as moving in a curve through space, or space itself may be viewed as curved. One must be cautious when using this approach, because a measure such as pain intensity may not simply be the inverse of pain relief.
Use of the Delphi Technique
This technique is applicable to many disciplines but should be carefully evaluated before it is used. It may be most useful if it is desired to have a group, rather than a single individual, address the problem of interpretation. Interested readers are referred to a concise review article by Duffield (1988) for details.
Situations in which There Are Both Positive and Negative Responses in Efficacy
A potentially difficult situation in interpretation of data may arise if some parameters measured are positive and others either do not change or are negative. In this situation it is almost impossible to rank order the importance of parameters after a clinical trial is complete without bias, to determine whether the overall interpretation is more positive than negative or vice versa. There are, however, a few choices to consider. One is to present all of the data and results without attempting to compare the importance of parameters that are positive with those that are negative. This is an “objective” approach, but clearly it is unsatisfactory to most people.
Creating a Hierarchy of Parameters
The most satisfactory approach is to anticipate this potential dilemma prior to the clinical trial and initially to decide which specific tests and parameters are most important. Additional parameters or tests may be identified that will be used for objectives of secondary importance or as supportive data. A variation of this approach is to establish a definition of medicine activity prior to the trial. Then, if the primary parameter is negative but other parameters are highly positive and the medicine achieves the definition of activity, a convincing overall interpretation may be possible. A situation may occur in which two parameters were rated (prior to the clinical trial) of primary importance and two others of secondary importance. The data may indicate that only one parameter of each group was positive and the other from each group negative. If both parameters that were negative in this example were neutral (i.e., unchanged), then the trend of the trial would probably be viewed as positive. But every situation encountered will present a unique set of data and analyses and therefore will have to be judged in the context of many factors. This approach often raises the problem that various efficacy variables can often not be ordered or directly compared.
Apples versus Oranges
Parameters evaluated often differ from each other, and it may be difficult to compare them adequately. For example, some efficacy parameters that may be difficult to compare accurately and rank in an arthritis trial are (1) patient’s clinical global impression, (2) swollen joints, (3) physician’s assessment of a patient’s functional ability, (4) quality of life, (5) various biochemical tests, and (6) number of painful joints. The most satisfactory solution in many cases is to create an overall index that includes consideration of several parameters.
INCOMPLETE DATA SETS
Why Do Incomplete Data Sets Occur?
Data are usually difficult to interpret when only fragments of interpretable data are available. This situation may arise (1) when a retrospective trial is conducted, (2) when an excessive number of patient dropouts or missed appointments have occurred, or (3) when an interim analysis is being conducted. An interim analysis of data that is difficult to interpret should not present a serious problem to the investigator, who will either be blinded to the outcome of the analysis or will continue the clinical trial regardless of the results. If the interim analysis suggests that real or potential issues have arisen in the trial that must be considered, then these should be addressed with the assistance of a statistician. The most common reason for occurrence of fragmented data is when a trial has been conducted that is markedly flawed for some reason (e.g., too few patients, too many dropouts, too great a quantity of missing or flawed data). Data may be incomplete because of an excessive number of dropouts, noncompliers, poorly randomized patients, incorrectly diagnosed patients, or many other problems that sometimes arise in clinical trials. There are statistical conventions and techniques for dealing with such trials. Another source of data fragments is when an event is observed in a single or small number of patients or in a single clinical trial (e.g., a published report that medicine Z caused toxicity X in Y number of patients), but the background information and sequence of events that led to the adverse reaction (or other problem) are unknown or unreported. The data will be incomplete and may not indicate how long the event has occurred, whether it had a gradual, sudden, or stepwise onset, whether it was worse in the past and is now being measured or observed during a phase of gradual or rapid improvement, or if a cycling phenomenon of improvement and subsequent deterioration is present. A great deal of caution is needed in interpreting fragments of data without knowledge of the natural history and resolution of the event in question.
Publication of Fragments
Publication of fragments of a large clinical trial (e.g., one site’s data from a multicenter trial) is counterproductive when it is not possible to develop an adequate and accurate interpretation. Thus, any interpretation of such data presented will introduce bias in the literature and lead to a situation that may create significant problems for others to explain or resolve. This type of clinical data is usually best kept out of the medical literature through high standards of publication adhered to by journal editors. A statistician may provide advice to investigators prior to a clinical trial that will help prevent this type of data from being generated. For example, if the power of a trial is extremely low, then the possibility of the data representing a false-negative or -positive event becomes high. Many individual and multicenter clinical trials with insufficient power to address the clinical objectives are published in the literature. These trials contain a generally unreliable set of data and may inadvertently provide false information on a topic and also may mislead medical thought for many years. The opposite perspective also has merit and should be noted. Case reports or incomplete data may present clinical information that has clinical significance even though the clinical trial is incomplete or only a few patients were involved. Publication of such data may also alert other clinicians and scientists of adverse reactions that should be sought, monitored, and evaluated.
A Complete Clinical Trial That Yields Fragments: The Jigsaw Puzzle Problem
Even clinical trials that are perfectly conducted may be difficult to interpret for reasons other than those listed in Table 91.1. The most common reason would be that the data cannot all be gathered together in a neat grouping to evaluate the treatment effects, and the data therefore consist of many fragments, possibly analogous to pieces of a jigsaw puzzle. Dealing with only a few pieces of a jigsaw puzzle is difficult because they can often be arranged in several different ways. The interpreter often has to fill in the missing pieces. If most or all of the important pieces of a puzzle are available, then the missing ones may be recreated with relative assurance of little bias. On the other hand, if
708 /
PROBLEMS O F C L I N I C A L D A T A I N TERPRETATION
only a few of the major pieces are present, then creating the entire picture will require an excessive amount of conjecture and educated guesses, and the result will probably be more open to challenge. When the data appear to be highly fragmented, there are several questions to ask. Some of these are presented in Table 91.5. A major question relates to the importance of data obtained in a flawed trial. The nature and degree of all significant flaws in a trial are essential to determine. At that point it is possible to assess whether the trial is mildly compromised or whether its interpretation will have virtually no validity. Discussions with statisticians or consultants may offer possible paths out of this dilemma.
TABLE 91.5 Pertinent questions to ask about data fragmentsᵃ
Will the data fragments be more clear if:
1. More patients are entered in the clinical trial
2. The trial is continued for a longer duration
3. A higher dose of trial medicine is added
4. An additional control group is added
5. The trial is conducted double blind instead of single blind or open labelᵇ
6. The protocol is adhered to more closely by the investigator or patientsᵇ
7. New measurements or tests are incorporated into the trialᵇ
8. The trial design or trial conduct is modified in a different mannerᵇ
ᵃ Data fragments are incomplete data sets from a clinical trial.
ᵇ Note that these changes in the trial must generally be adjusted for in the analyses.
ANECDOTAL OBSERVATIONS
Most clinical trials elicit a number of anecdotal comments from both patients and investigators, and these are often placed on the data collection forms or in various reports. These comments or observations are often tidbits of information that may (1) assist in the interpretation of data, (2) clarify results obtained, or (3) provide information that may have usefulness at a later time. All anecdotal comments should be collected and reviewed at some point during or after a trial for possible insights or suggestions.
Observations and Comments from Patients
It may be extremely difficult for the investigator to interpret anecdotal observations and comments from patients in a clinical trial. If the patient’s comments fit a well-established pattern, then additional questioning may develop a better understanding of the report. But if the report is unusual, then it is often best to describe it succinctly under “Investigator’s Comments” in the data collection forms for consideration at a later time.
Observations and Comments from Investigators
Anecdotal observations from investigators who have conducted a clinical trial may be indicated in the appropriate place in the results and possibly discussion sections of a published report. If a brief or full communication is published describing experiences with only one or a few patients, then it is extremely important to consider summarizing or describing the anecdotal comments or observations and to place them in proper perspective. Anecdotal observations often relate to adverse reactions. When novel adverse reactions to a medicine begin to appear in the literature, additional case studies may provide extremely valuable data and a better perspective on the clinical significance of the potential problem. This is especially so when an understanding of the incidence of the adverse reaction is discussed along with details of clinical severity, degree of association with the medicine, and other factors.
Interpreting Anecdotal Observations
In preparing reports on limited clinical trials, anecdotal observations and comments may be informative in gaining clues that will help understand the nature of a medicine’s clinical profile. On the other hand, all anecdotal (i.e., testimonial) information and data must be viewed with at least some skepticism. Recall the often quoted comment that if testimony could establish a fact, then one of the best established facts in the Middle Ages was the existence of the devil, since he was frequently seen and was carefully described by many individuals.
CHAPTER 92
Reconciling Different Interpretations from Different Trials
Types of Differences
  Among Animal Experiments
  Among Clinical Trials
Factors to Evaluate
  Who Is Doing the Reconciliation
  Perspective and Aims of the Clinical Trial
  The Myriad of Details
  Lazarus Phenomenon
  Examples of Different Interpretations
Perceptions of New Therapies
Approaches to a Solution
  General Approaches
  Specific Approaches
TYPES OF DIFFERENCES
Among Animal Experiments
Many investigators who conduct scientific studies in animals often attempt to replicate previous experiments conducted by themselves and others. Despite careful attention to even the most minute detail and experimental condition, many animal experiments are not easily replicated. Even when one attempts to follow identical conditions, results often turn out differently on different occasions. Evaluating the reason(s) for the differences in the results obtained is usually extremely difficult. This often makes it impossible to choose the “correct” interpretation of the experiment. The experiment is sometimes repeated a third time in the belief that if two of the three studies reach a similar conclusion, the probability is that the majority interpretation is correct. A fourth or even larger number of studies may be conducted to search for and confirm the correct interpretation. The rationality of this approach may be questioned.
Among Clinical Trials
Clinical trials are also often repeated, but they rarely utilize the same experimental conditions or the identical protocol. Each investigator who wishes to repeat a trial usually adds variations to the protocol that either (1) are designed to “improve the trial” or (2) serve to obtain data of particular interest. The variations commonly introduced in developing clinical trial designs and protocols are only one aspect of the many differences between animal studies and clinical trials. The use of awake, active humans in most trials rather than inbred (often anesthetized) animals or in vitro studies creates many other significant differences. There are even more differences between chronic studies conducted in humans and animals, since many additional factors may influence the data. Chronic studies represent a minority of animal studies conducted but a relatively large percentage of human clinical trials. These considerations indicate that it is rare for multiple clinical trials, but not for animal studies, to be conducted under similar experimental conditions.
FACTORS TO EVALUATE
Who Is Doing the Reconciliation
The first factor to examine in reviewing different interpretations is clarification of the relationship between the investigator(s) and the person who is evaluating the interpretations. Although differences in clinical interpretation may arise when a single investigator has performed two or more related clinical trials, differences in interpretation are more common when trials have been performed by two or more investigators or groups of investigators. The person attempting to reconcile the interpretation may be:
1. An investigator whose interpretation of his or her present clinical trial differs from a previous trial
2. An investigator whose interpretation of his or her trial differs from the interpretation of one (or more) trials conducted and reported by others
3. A “third party” who is attempting to reconcile two (or more) published trials that differ in interpretation
Perspective and Aims of the Clinical Trial
The second factor to evaluate is to determine the point of view or perspective expressed within each clinical trial. If one trial was reported by regulatory agency personnel, it might be slanted toward a view reflecting their mission to protect the nation’s health. A trial from certain academicians might reflect a highly theoretical perspective, and at least some industrial physicians use a practical perspective for developing an interpretation. If one trial report presented data without attempting to propose or defend a hypothesis, whereas another trial proposed a hypothesis, and a third trial defended one, these varying orientations might well have had a marked influence on the interpretations advanced.
The Myriad of Details
Other differences in interpretations relate to the myriad of factors that make up the design, conduct, and analysis of the clinical trial. Whenever a trial is conducted, it means that many factors and conditions will differ from those in a trial conducted at an earlier time, even when an attempt is made to keep conditions constant. Differences are usually difficult to delineate fully and will often affect data collected and interpretations reached. Obtaining information on many of the relevant factors to evaluate (see Chapter 83 for identification of many of these factors) will prove exceedingly difficult if the individual conducting the evaluation was not the investigator of all clinical trials being compared. If one must rely on published reports for some or all of the information needed to compare interpretations, one is at a marked disadvantage. Although one may write or telephone the original investigators to obtain relevant information, the response to this approach is generally limited.
[Figure 92.1. Vertical axis: Perception of Medical Value (Extremely Low, Low, High, Extremely High). Horizontal axis: Time, marked by four stages: New Idea Proposed and/or Initial Efficacy Reported; Efficacy Clearly Demonstrated and Publicized; Adverse Reactions and Drawbacks Reported; Substantial Data and Experience Available. Labels in the plot: Period of Extravagant Claims and Substantial Optimism; View of Proponents; View of Critics; Preferred Therapies of Choice; Balanced Perspective for Many Therapies; Period of Disillusionment; Therapies Rejected or Out of Favor.]
FIG. 92.1 Perception of a new medical treatment’s value and the change of that perception over time. Although there are always various perceptions of the medical value of a treatment at any time, only the major consensus points are shown.
Lazarus Phenomenon
One reason why different clinical trials yield different responses is that nonresponders are entered in some trials (sometimes in large numbers). Expecting patients who have been unresponsive to all other medicines to respond to the latest one is expecting the Lazarus phenomenon. Fortunately for some patients, this does occur from time to time. However, even when one or two examples of the Lazarus phenomenon occur, the clinical trial itself may still be negative if many other “burned-out” patients were entered or if the medicine itself is relatively inactive in most patients.
Examples of Different Interpretations
Horwitz (1987) collected 36 topics in gastroenterology and cardiology for which conflicting results were obtained in over 200 randomized clinical trials. The interested reader is referred to his paper for details. He found nine major methodological sources of variation to account for differences that were related to either clinical trial design or interpretation. His results are summarized in Table 92.1. Horwitz purposely avoided clinical trials in which differences were a result of small sample size or obvious bias, the two most common reasons for incorrect outcomes reported in the literature of clinical trials.
TABLE 92.1 Variations that accounted for differences in clinical trial results in a survey conducted by Horwitzᵃ
1. Eligibility criteria and selection of study groups (e.g., some clinical trials in hypertension included high-risk patients with evidence of end-organ damage)
2. Baseline state differences in patient population evaluated (e.g., in evaluating recurrent gastrointestinal bleeding in patients with cirrhosis, one group had a much larger population in which alcoholism was responsible)
3. Protocols for use of the principal therapy (e.g., large variations in dose and treatment duration may explain different results of the use of steroids to treat alcoholic liver disease)
4. Protocol requirements (e.g., corticosteroids in patients with septic shock work best when given early, but some clinical trials required time-consuming tests prior to enrollment that delayed administration of the medicine and affected results)
5. Use of concomitant therapy (e.g., diuretic treatment as concomitant therapy in one of two nitroprusside clinical trials is believed to have accounted for differences in outcome)
6. Managing patients in more artificial versus realistic protocols (e.g., hypertension is often studied by withholding treatment from the control group, which tends to enhance the difference from the treated group; when control patients are treated if their diastolic blood pressure exceeds 105 to 110 mmHg, the treatment strategy more closely resembles actual practice)
7. Regulatory requirements in data analysis and interpretation (e.g., intention-to-treat analyses that are de rigueur at regulatory authorities are often unacceptable to clinicians)
8. Poorly blinded double-blind trials (e.g., decreases in blood pressure or cholesterol in one treatment group effectively unblind many clinical trials, even if adverse reactions or laboratory data do not)
9. Use of different efficacy endpoints (e.g., use of overall mortality or prevention of ventricular fibrillation to assess efficacy in lidocaine prophylaxis trials in patients with acute myocardial infarction)
ᵃ From Horwitz, 1987. The wording of his nine categories has been modified.
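The dilution described under the Lazarus phenomenon above can be made concrete with a little arithmetic. The sketch below is only illustrative: the function name and every number in it are hypothetical, and it assumes that the overall response rate is simply a weighted average of the rate in previously unresponsive ("burned-out") patients and the rate in all other patients.

```python
def expected_response_rate(frac_refractory, rate_in_refractory, rate_in_others):
    """Overall expected response rate for a mix of refractory and other patients."""
    if not 0.0 <= frac_refractory <= 1.0:
        raise ValueError("frac_refractory must be between 0 and 1")
    return (frac_refractory * rate_in_refractory
            + (1.0 - frac_refractory) * rate_in_others)


# Hypothetical rates: the medicine helps 50% of ordinary patients
# but only 5% of patients who failed all previous medicines.
for frac in (0.0, 0.5, 0.9):
    rate = expected_response_rate(frac, 0.05, 0.50)
    print(f"{frac:.0%} refractory patients -> expected response rate {rate:.1%}")
```

Under these assumed numbers, a trial that enrolls mostly refractory patients would show a low overall response rate even though a few individual patients respond.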
PERCEPTIONS OF NEW MEDICAL THERAPIES
There are often wide swings in the medical literature in how a new medicine, modality, diagnostic test, or technique is viewed. An illustration of this phenomenon is shown in Fig. 92.1. Initial reactions tend to be skeptical, but almost unbridled enthusiasm often occurs at a later time, often followed by a period of disbelief and almost sarcasm. Finally a more balanced evaluation is achieved. Some of these swings are a result of the changing views of the organization’s leaders, plus the project’s proponents. Media involvement (if present) also plays a major role. Another factor is the time necessary before sufficient data and experience are collected to allow a balanced perspective to be obtained. The faddish nature of therapeutics is not a recent phenomenon. Osler has been reported to have said “Make haste to use a new remedy before it is too late” (quoted by Alderson, 1974). Miller and Melmon (1972, p. 409) also describe this phenomenon and briefly discuss the “wild swings of popularity and debasement” experienced by dimethylsulfoxide (DMSO). Lum and Beeler (1983) observe that this phenomenon is also seen with new diagnostic tests, such as carcinoembryonic antigen, α-fetoprotein, and acid phosphatase (by radioimmunoassay). Laurence (1973) and Steiner and Dince (1981) also discuss this topic. This kind of evaluation occurs when investigational medicines are proceeding through the phases of development, as well as after they are marketed. In the former case, opinion within the company often changes radically, whereas in the latter case opinion within the larger medical community changes.
APPROACHES TO A SOLUTION
General Approaches
This section describes both broad general approaches and a few specific steps that may be used in reconciling interpretations of two different studies. A few of the broad approaches are:
1. Evaluate and compare the quality of the clinical trial designs.
2. Evaluate and compare the quality of the conduct of the trials.
4. Combine (pool) data from all (or most) comparable trials to obtain an overall interpretation of results from trials other than those being compared.
5. Evaluate each possible interpretation of the data and choose the most reasonable single interpretation.
6. Look for both general and specific differences between patients.
7. Look for both general and specific differences between trials.
In regard to the last two approaches, it is necessary to go through the protocol, data, analyses, and other details of each trial step by step, looking for differences in trial design, methodology, results, patient management, and other areas. After important differences are identified, one must attempt to determine which trial or trials yielded the most reliable data.
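The pooling approach in the list above can be sketched in a few lines. The text does not say how data should be combined, so the inverse-variance (fixed-effect) weighting used here is an assumption, chosen because it is a common way to pool effect estimates; the function name `pool_fixed_effect` and the trial values are hypothetical.

```python
import math


def pool_fixed_effect(estimates, std_errors):
    """Inverse-variance (fixed-effect) pooled estimate and its standard error."""
    if len(estimates) != len(std_errors) or not estimates:
        raise ValueError("need one standard error per estimate, and at least one trial")
    # Each trial is weighted by the reciprocal of its variance,
    # so more precise trials count for more.
    weights = [1.0 / se ** 2 for se in std_errors]
    total = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, estimates)) / total
    pooled_se = math.sqrt(1.0 / total)
    return pooled, pooled_se


# Hypothetical treatment-minus-control differences from three comparable trials.
effects = [4.0, 1.0, 2.5]
std_errs = [2.0, 1.0, 2.0]

pooled, se = pool_fixed_effect(effects, std_errs)
low, high = pooled - 1.96 * se, pooled + 1.96 * se
print(f"pooled effect = {pooled:.2f} (approximate 95% CI {low:.2f} to {high:.2f})")
```

A fixed-effect calculation of this kind presumes the trials are estimating the same underlying effect, which is why the text restricts pooling to comparable trials.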
Specific Approaches
In establishing a specific approach to use with problem situations, a number of the following questions may be relevant to address:
1. Do the trials meet current standards of design and conduct? Those that do not probably should be deleted from serious consideration, or they should be considered in a separate group.
2. Can the reasons for the differences be identified? This usually involves consideration of many specific factors.
3. Are the differences in interpretation irreconcilable, or do they represent minor variations?
4. If a resolution is impossible at this time, is it possible to design a new trial that will resolve the differences between or among the trials?
5. If a solution is not apparent, see the approaches listed in Chapter 91 on Data That Are Difficult to Interpret.
Group meetings may be held to attempt a resolution. These may be large or small, informal or formal, private or public. Whatever type of meeting is adopted, it is useful if discussions are held ahead of time to determine people’s views and the types of arguments, beliefs, and reasons they will advance to support their view or to challenge others. This is analogous to discovery depositions used by lawyers in advance of a trial. One difference from legal depositions, however, is that the premeetings are excellent opportunities to lobby for one’s cause. Finally, if desired, the Delphi technique may be used. This approach is nicely summarized by Duffield (1989) and may be used to resolve many types of issues arising in medicine and science.
CHAPTER 93
Interpreting Placebo Data
Types and Uses of Placebo
  Placebo Medication or Treatment
  Uses of Placebo
  Placebo Effect
  Placebo Control Group and No-Treatment Control Group
Factors Affecting the Placebo Response
  Personal Contact and Interaction with Physicians as a Basis of the Placebo Response
  Effect of Color, Dosage Form, and Size of the Capsule on Patient Perception and Response to Placebo
  Color of the Placebo
Evaluating the Magnitude of the “True Medicine” Response
  How Is the True Medicine Effect Obtained?
  Efficacy: Are Placebo and Efficacy Responses Simply Additive?
  Safety
Placebo Responses That Are Less or Greater Than Expected
  Ceiling Effect
  Placebo Responses That Are Greater Than Expected
  Do Placebo Effects Wear Off?
  Interpretation When a Medicine Is Not Better Than Placebo
  Possible Interpretations When Placebo Is Found To Be More Effective Than Active Medicine
TYPES AND USES OF PLACEBO
Placebo Medication or Treatment
The term placebo has been applied to many types of substances used in a variety of different ways. The most common definition refers to a material that does not contain any active medicine and is pharmacologically inert. Various materials may be used to make a placebo in the form of a tablet, capsule, or other dose form. This type of placebo has sometimes been called a “pure placebo.” An “impure placebo” refers to any substance used as placebo that is not believed to be totally inert. For example, a homeopathic dose of an active medicine is an impure placebo. A list of various types of placebos is given in Table 93.1 and includes nonmedicine placebos and sham techniques.
Uses of Placebo
The major uses of a placebo medication in a clinical trial are (1) to serve as a control for psychological factors, (2) to maintain the double-blind design of the trial, (3) to evaluate spontaneous variations in the disease, (4) to control the sensitivity of the measurements, and (5) for deception. This last practice cannot be condoned except in exceptional circumstances and when the patient’s rights have been carefully considered by an Ethics Committee/Institutional Review Board (IRB).
Placebo Effect
Deception by Physicians
The effect observed after a placebo is given or used is referred to as the placebo effect. Placebos are sometimes given by physicians to their patients to simulate a therapeutic medicine and to evaluate the placebo effect. The physician’s intention is to deceive the patient into believing that an active medicine is being given. Although the physician’s purpose is often to differentiate between a “real” problem and an “imagined” or “functional” problem, this distinction is often not possible to make and may represent an artificial characterization (Goodwin et al., 1979).
TABLE 93.1 Types of placebo treatments
A. Medicine-like substance
  1. Active medicine given to a patient for its placebo effect in treating a different disease
    a. Knowingly (e.g., many cases of antibiotics given to patients for viral illnesses)
    b. Inadvertently (e.g., patient has been incorrectly diagnosed)
  2. Homeopathic dose of an active medicine given to a patient for a correctly diagnosed disease
  3. Blank medicine given as a control in a clinical trialᵃ
  4. Blank medicine given to “treat” a patient in a medical practice outside of a clinical trialᵃ
B. Nonmedicine placebo
  1. Care given patient by physician and the interaction between physician and patient
  2. Interactions between patient and nonphysician health personnel
  3. Additional care and attention given by physicians to patients in a clinical trial above that provided in the “usual” practice of medicineᵇ (e.g., more frequent clinic visits, longer clinic visits, more careful patient evaluation, training, and personal discussions)ᶜ
C. Other techniques
  1. Sham surgical operations
  2. Sham proceduresᵈ
ᵃ Lactose-filled placebos may cause adverse reactions in some patients with lactase deficiency (Havard and Pearson, 1977). A blank medicine is one without an active ingredient to treat the condition for which the medicine is used. Ideally, and in most cases, it also does not have any ingredient that would be active in other conditions.
ᵇ This is related to the “Hawthorne effect,” where increased efficiency was observed in factory workers as a result of the increased attention provided during a study.
ᶜ Wilhelmsen (1979) and Reiser and Warner (1985).
ᵈ See Chapter 85 for a listing of several sham procedures.
TABLE 93.2 Factors that influence the placebo effect of the physician-patient interaction
1. Attitude of the physician (or other health personnel) toward the patient. This refers to the degree and nature of a physician’s interest, warmth, friendliness, liking, sympathy, empathy, neutrality, disinterest, rejection, and hostilityᵃ
2. Attitude of the physician (or other health personnel) toward the treatment. This refers to factors such as faith, belief, enthusiasm, conviction, commitment, optimism, positive and negative expectations, skepticism, disbelief, and pessimismᵃ
3. Attitude of the physician (or other health personnel) toward the results. This refers to the introduction of observer bias that often affects the data, usually caused inadvertently by the physician (or other health personnel).ᵇ Factors include (1) the bandwagon effect, (2) nonverbal behavior and communication between investigator and patient, (3) expectations, (4) motivation, (5) prestige, (6) visual and verbal cues, (7) sex and personality characteristics of patient and investigator
4. Attitude of the patient toward the physician or other health personnelᶜ
5. Attitude of the patient toward the treatmentᶜ
6. Attitude of the patient toward the resultsᶜ
ᵃ Shapiro (1969).
ᵇ Examples of this effect are discussed by Shapiro (1969).
ᶜ These include the influence of outside individuals and events on the patient, which in turn affect the patient’s attitude. The patient’s attitudes are a reflection of fears, hopes, previous experiences, expectations, prejudices, ideas, and concepts.
In some hospitals, physicians sometimes give a small syringe full of saline to a patient complaining of pain as a test of the objectiveness of the pain. A positive response (i.e., benefit to the patient) is seen as evidence of the patient’s either falsifying symptoms or having psychologically induced (i.e., functional) symptoms. If the patient has no response to the saline, it supposedly confirms the authenticity of the patient’s pain. Apart from the arguments given above against this interpretation, the enthusiasm of a physician as well as other factors of the physician-patient relationship described in Table 93.2 often lead to a positive response in the patient.
Objective Changes Elicited by Placebo
Many authors have demonstrated objective physiological changes in patients given placebos. There appear to be few responses in conscious individuals that have not been observed to occur to some degree after a placebo is given. Beneficial effects caused by placebos have frequently been shown as changes in objective efficacy parameters relating to the disease, not just subjective feelings of improvement. Some of the objective parameters of efficacy that have responded to placebo include lowering of blood pressure in a double-blind clinical trial (Gould et al., 1981), relief of ulcer symptoms (Sturdevant et al., 1977), decreased exercise-induced bronchospasm in 40% of asthmatics, and a 50% reduction of the rapid eye movement phase of sleep (Vogel et al., 1980). Adverse reactions caused by placebo have also been shown to be related to those caused by the active medicine. Schindel (1968) reviewed the literature and quoted references reporting the following adverse reactions (among others) attributed to placebo: (1) watery diarrhea, urticaria, and angioneurotic edema of the lips after a mephenesin placebo, (2) loss or impairment of hearing and eosinophilia after a streptomycin placebo, and (3) hallucinations, loss of vision, paresthesias, constipation, and stuffy nose after a reserpine placebo. Schindel (1968) has reported that the frequency of placebo-induced adverse reactions in the literature ranges from 1% to 61% and averages about 10% to 20%. Thus, both safety and efficacy responses to placebo often demonstrate objective changes of the type expected with an active medicine. It is impossible to state that the 35% of patients who are believed to respond to placebo (Beecher, 1962) are not experiencing a genuine physical response.
In a specific patient given an active medicine it is usually not possible to differentiate between that part of the response caused by the pharmacological properties of the medicine and that part of the response that would have occurred if the patient had only been given a placebo. Figure 71.2 illustrates that the overall clinical response observed is usually a combination of several factors. Because the concept of placebo effect is so intertwined with both the patient’s and physician’s belief system, a full understanding of the nature and magnitude of the placebo effect requires at least some data on this topic. This information is not easily obtained.
An Ideal Clinical Trial in Which to Evaluate the Placebo Effect
One approach to obtaining a full understanding of the placebo effect is to obtain clinical data in a trial from four groups of patients:
1. Patients given a placebo who believe they received a placebo
2. Patients given a placebo who believe they received the active medicine
3. Patients given the active medicine who believe they received a placebo
4. Patients given the active medicine who believe they received the active medicine
This approach is only a hypothetical goal since it is not ethically acceptable to deceive patients in clinical trials unless there are special extenuating circumstances that are approved by an Ethics Committee/IRB (see Chapter 64).
Placebo Control Group and No-Treatment Control Group
A control group in a clinical trial that receives no treatment differs from a control group that receives a placebo medication. The no-treatment approach fails to account for the pill-giving ritual and any effects that this event may have on efficacy. Effects from taking placebos have often been shown to be greater than effects in a no-treatment group. The concept of the placebo control group is discussed in several chapters.
FACTORS AFFECTING THE PLACEBO RESPONSE
There is a placebo response relating to efficacy measures in a clinical trial and also a placebo response relating to safety measures. The most complex safety measures relating to the placebo effect concern adverse reactions. The number, intensity, nature, and other characteristics of adverse reactions are highly subject to wide variation and depend on many factors. A few of the more important factors to consider are listed in Table 93.3. These factors are based on characteristics of the patient, the investigator, and the medicine. Placebo responses relating to efficacy measures are also dependent on many factors. Those relating to the physician-patient relationship are summarized in Table 93.2. Many other factors that may affect placebo responses are identical or similar to those that affect responses to active medicine (see Chapter 83).
TABLE 93.3 Selected factors to consider in interpreting placebo data on adverse events
1. Determine who obtained the adverse event data. It is widely believed that the incidence of adverse events varies depending on whether a physician, nurse, or other individual obtains the data as well as on the manner in which the information is obtainedᵃ
2. Determine if a list of possible adverse events was read to the patient, or if a general probe was used, or if the patient spontaneously volunteered all information. There is general consensus that the incidence of adverse events is greater when a specific list of possible adverse events is read to patients than when patients respond to a general probe
3. Determine if the data were obtained from a clinical trial (Phase I to Phase IV) or from monitoring of usual clinical practice. It has been reported that the incidence of adverse events is greater during a clinical trial than in a routine monitoring of the usual practice of medicine (e.g., postmarketing surveillance)ᵇ
ᵃ It is not agreed, however, whether physicians usually obtain a larger or smaller number of adverse events than their staff.
ᵇ Rossi et al. (1984).
Personal Contact and Interaction with Physicians as a Basis of the Placebo Response
A study of 97 patients demonstrated that those assigned to intensive contact with anesthesiologists both pre- and postoperatively were prescribed about 50% less analgesics by surgeons who were unaware of the clinical trial than the group who only received routine pre- and postoperative procedures (Bourne, 1971). In addition, the patients who had more contact with physicians were sent home by the surgeons an average of 2.7 days earlier than their counterparts. It would be interesting and potentially important to evaluate this hypothesis in other clinical situations.
Effect of Color, Dosage Form, and Size of the Capsule on Patient Perception and Response to Placebo
There have been several clinical trials to evaluate associations of medicine color, form, and size with expectations of effect. Capsules are generally perceived as being stronger than tablets, larger capsules are perceived as stronger than smaller ones, and certain colors are associated with expectations of certain effects (Buckalew and Coffield, 1982a). A second trial by these authors demonstrated some marked differences in how different racial populations associate colors with anticipated clinical activity (Buckalew and Coffield, 1982b). Although these clinical trials revealed interesting results that relate to a patient’s race, they were inadequately controlled and cannot eliminate the possibility that environmental factors were responsible. It is the author’s opinion that larger and better designed trials must be conducted if their conclusion is to be supported. Factors such as socioeconomic status, age, medicine experience, and environmental factors must be included to arrive at more definitive data. Even then, the results will probably vary from group to group within any country and will depend on numerous cultural factors. Any conclusions reached will also be likely to change as environmental, social, and other conditions change.
Color of the Placebo
Other investigators have conducted clinical trials in which the color of the placebo (and active medicine) was evaluated in actual clinical studies (as opposed to an experiment conducted in a clinical laboratory setting). Lucchelli et al. (1978) evaluated a standard hypnotic (heptabarbital) versus placebo in 96 hospitalized insomnia patients using two colors of each medication. They observed a significant interaction of color and sex of the patient. The same group of authors evaluated the sedative effects of different color placebos (no active medicine was given) and observed significant differences in the color preferences of each sex (Cattaneo et al., 1970). Schapira et al. (1970) performed a different type of experiment in which they gave patients oxazepam to treat anxiety. The medicine was prepared and dispensed in different colors (green, yellow, and red), and each color was used for 1 week in a Latin-square design. Patients and physicians were asked to rate the effectiveness of each “medicine.” Color preferences were noted for treating anxiety with green medicine and for treating depressive symptoms with yellow medicine, but these differences did not reach statistical significance. Other studies have also confirmed the influence that color has on patient responses to placebo (Huskisson, 1974) or patient perceptions of their activity (Jacobs and Nordan, 1979).
EVALUATING THE MAGNITUDE OF THE “TRUE MEDICINE” RESPONSE
How Is the True Medicine Effect Obtained?
The complexities of assessing a placebo response have never been adequately described, nor have all of the possible interactions been evaluated. It has conveniently been assumed in most clinical trials that medicine-induced clinical effects and placebo-induced clinical effects are simply additive. This means that the response obtained by the placebo group (or during the placebo period) may be subtracted from the active medicine response to derive the “true” magnitude of the medicine response. This assumption has not been proven. Moreover, it defies common sense and clinical judgment for several reasons.
Efficacy: Are Placebo and Efficacy Responses Simply Additive?
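The additive assumption described above amounts to a single subtraction. The following minimal Python sketch is illustrative only: the function names, the `interaction` term, and all numbers are hypothetical and are not taken from any trial. It shows that the subtraction recovers the medicine's own contribution only when no interaction is present.

```python
def observed_active_response(medicine, placebo, interaction=0.0):
    """Response seen in the active arm under a simple (assumed) model:
    the medicine's own contribution, plus the placebo contribution,
    plus a hypothetical interaction term (zero under pure additivity)."""
    return medicine + placebo + interaction


def subtracted_estimate(active_observed, placebo_observed):
    """The usual 'true medicine effect': active response minus placebo response."""
    return active_observed - placebo_observed


# Hypothetical percentages of patients responding.
medicine_only = 40.0
placebo_only = 30.0

# Pure additivity: the subtraction returns the medicine's contribution.
additive = observed_active_response(medicine_only, placebo_only)
print(subtracted_estimate(additive, placebo_only))

# With an interaction (e.g., carryover from previous treatment), it does not.
interacting = observed_active_response(medicine_only, placebo_only, interaction=-10.0)
print(subtracted_estimate(interacting, placebo_only))
```

In the second case the subtraction yields the medicine's contribution plus the interaction, and the two cannot be separated from these two arms alone.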
The process of subtracting the placebo response from the total response to yield a “true” medicine effect has been questioned by Lindahl and Lindwall (1982). They provide a number of examples that demonstrate an interaction between the placebo and the “real” effect of a medicine. Some of the factors that may interact with placebo and thus may invalidate the assumption that the placebo and “real” effects are additive are (1) carryover of previous treatment, (2) continuation of other concomitant medicines, (3) continuation or initiation of nonmedicine modalities, (4) withdrawal effects of previous treatment, and (5) changes in magnitude of the placebo response over time.
Understanding Placebo Responses
If an investigator reports that placebo elicited a 50% response, this value must be qualified in several ways to indicate the following:
1. What is the variation among patients (i.e., did some patients have a 100% response and others a 0% response, or did most cluster around 50%)? A standard deviation, range of values, and confidence limits are all methods of illustrating the variability observed.
2. How did the placebo response change with time? At what point in the clinical trial did it develop, for how long a period during the trial did it persist, and did the effect wear off?
3. Did the changes in placebo response with time mirror the changes in efficacy noted with the trial medicine? What are the implications for the interpretation of the data and for data from other trials (as well as for designing future trials)?
4. Was the magnitude of the placebo response related to a factor of the trial (e.g., age or sex of patient, severity of disease)?
Ceiling Effect
One can consider circumstances in which a placebo gives an all-or-none effect. One example is in evaluating an analgesic, when pain relief may be graded as 100% with placebo. In this situation the active medicine cannot elicit any additional activity, and a “ceiling effect” has been reached.
Safety
Some specific adverse reactions are more closely associated with placebos than others. It was reported that of all symptoms, nausea and vomiting were most closely associated with placebo in a well-designed crossover trial of women receiving oral contraceptives (Goldzieher et al., 1971). All other symptoms reported with the oral contraceptives decreased markedly in occurrence when placebo was initiated. Nonetheless, there is a large background frequency of adverse reactions in the general population who are not receiving medicines. A survey of over 400 university students and hospital staff who were without illness and who had not taken any medicines within at least 3 days found that only 19% were symptom-free over the preceding 3 days. The median number of symptoms reported was two (from a predetermined list of 25 symptoms), and 30 people reported six or more symptoms (Reidenberg and Lowenthal, 1968). Similar results were reported by Spilker and Kessler (1987).
There are several reasons why patients receiving placebos experience adverse events characteristic of the test medicine or the active control. The most common reason may be that the adverse reaction was listed in the informed consent. This could heighten patient awareness and anticipation of that particular reaction and somehow lead to its occurrence, or the belief that it occurred. A second reason may be that the adverse reaction is characteristic of an entire class of medicines and the patient had previously experienced the adverse reaction while on a related medicine. A third reason could be a contact reaction that the patient on placebo experienced in a waiting room (for an outpatient trial) or anywhere in a clinic or hospital (for an inpatient trial). A contact reaction occurs when close proximity to another patient experiencing an adverse reaction leads, through the power of suggestion or psychological contact, to the elicitation of the reaction in another patient. Fourth, the patient on placebo may learn about the adverse event of trial patients on the active medicine in a clinic or elsewhere. Finally, the adverse event may be part of the disease itself.
PLACEBO RESPONSES THAT ARE LESS OR GREATER THAN EXPECTED
Placebo responses may be less than expected for many reasons, some of which are presented in Table 93.4. Reasons for this occurrence can be grouped into two basic categories. The first is that the actual placebo response was less than expected or was less than previously observed in other studies. The second category relates to spurious conclusions about the placebo response that were based on operational definitions, ambiguities in the study, or other reasons unrelated to the placebo effect itself. A few of the reasons from the second category relating to small placebo responses can be described in more detail.
TABLE 93.4 Possible interpretations of a placebo response that is less than expected
1. Patients were extremely ill and less responsive to medicine therapy in general
2. Patients had a high frequency or number of nonresponders, and a placebo response was expected based on previous data
3. Disease/problem is not likely to improve substantially with treatment, and thus a larger placebo effect should not have been expected
4. Small response observed represents the lower end of normal variation
5. Trial blind was not effective, and patients knew they were receiving the placebo
6. Placebo medication was not identical to the trial medication, and patients knew they were receiving the placebo
7. Inadequate period was allowed in the trial for the placebo response to develop
8. Excessive period was allowed for the trial and the placebo response wore off
9. Patients in the placebo group were receiving concomitant medication that was partially effective, and patients had diminished ability to show any medicine (or placebo) effect
10. The placebo was always given after medicine in a crossover trialᵃ
11. No placebo response should have been expected (e.g., a placebo effect is not believed to be present in comatose patients)
ᵃ See Cormia and Dougherty (1959) for data that support this conclusion.
Ceiling Effect
It is assumed that there is a sigmoid-shaped dose-response relationship for most measured medicine effects on efficacy. The plateau of this curve represents the maximal response (activity) possible. Response to placebo is measured in terms of the same efficacy parameter. As the placebo response increases toward 100% effect, there is a decrease in the ability of an active medicine to demonstrate its effect. It is assumed that it is easiest to demonstrate an effect of a medicine’s activity either near the threshold or lower part of the dose-response relationship and that it becomes progressively more difficult to demonstrate positive activity as the plateau is approached. This is also referred to as the “ceiling effect,” which indicates that the plateau of the dose-response curve represents a maximum or near-maximum level of activity above which it is (usually) not possible to go.
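The ceiling effect can be made concrete with a small numerical sketch. The Python below is an illustration under stated assumptions, not a model taken from the text: it uses the standard Hill (sigmoid Emax) equation as one possible sigmoid curve, and the parameter values (`emax`, `ed50`, `hill`) and the helper `headroom` are hypothetical.

```python
def hill_response(dose, emax=100.0, ed50=10.0, hill=2.0):
    """Sigmoid dose-response (Hill equation): response in percent of maximum.
    Parameter values are hypothetical and chosen only for illustration."""
    return emax * dose**hill / (ed50**hill + dose**hill)


def headroom(placebo_response, ceiling=100.0):
    """Room left (in percentage points) for an active medicine to add
    to the placebo response before the plateau is reached."""
    return max(ceiling - placebo_response, 0.0)


# Doubling the dose low on the curve versus near the plateau.
gain_low = hill_response(10.0) - hill_response(5.0)
gain_high = hill_response(80.0) - hill_response(40.0)
print(round(gain_low, 1), round(gain_high, 1))

# A placebo response of 100% leaves nothing for the medicine to show.
print(headroom(30.0), headroom(100.0))
```

Under these assumed parameters, doubling the dose near the threshold adds far more response than doubling it near the plateau, and a placebo response at the ceiling leaves no headroom at all.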
Placebo Responses That Are Greater Than Expected
If a placebo response occurs that is greater than expected, it is usually an unwelcome occurrence because of the increased difficulty in differentiating between active and placebo treatments. A number of reasons for this situation are presented in Table 93.5.
Do Placebo Effects Wear Off?
It is difficult and often impossible to evaluate the presence or magnitude of interactions between the placebo effect and true medicine effect in any one individual, let alone an entire group of patients. When an initially dramatic clinical effect wears off in time, it is possible to speculate that the residual effect is the true medicine effect, but there is little evidence to support this view. There is, however, evidence that demonstrates the development of tolerance to medicine effects, which is another possible interpretation when diminished efficacy is observed over time.
TABLE 93.5 Possible interpretation of a placebo response that is greater than expected
1. Most patients who were entered into the clinical trial had mild (or moderate) disease and were more sensitive to a placebo response
2. Patients entered the trial with severe disease, and as a result of the natural variations in the course of chronic disease (i.e., phenomenon of regression to the mean), the disease yielded an apparently large placebo effect
3. Expectations for a smaller placebo response were not justified
4. Patients were more responsive to treatment than previously studied groups and were at the upper end of the normal variation observed
5. The population studied was different from previously studied groups in an important manner (e.g., they were more susceptible to psychological suggestion)
6. The blind was better maintained than in previously conducted trials, and a more accurate placebo response was observed
7. The optimal time points for measuring placebo response were used, whereas previous trials had missed detecting the peak response and had observed a smaller placebo effect
8. Patients had been withdrawn from all treatment for a relatively long period and were especially sensitive to "treatment"
9. Patients in the placebo group were receiving concomitant medicines unbeknownst to investigator
10. Patients in the placebo group received active medicine as a result of a packaging error
11. Placebo was always given prior to medicine in a crossover study, and a greater response (relative to the medicine effect) was observedᵃ
12. Environmental settings may influence the magnitude of the placebo effect
ᵃ See Cormia and Dougherty (1959) for a discussion on the order of medicine and placebo presentation and data that support this conclusion.
Interpretation When a Medicine Is Not Better Than Placebo
A failure to differentiate statistically between the effects of a trial medicine and placebo in a clinical trial may seem to indicate that the trial medicine is not clinically useful. This failure generally should not be viewed as positive evidence against the possibility that the trial medicine is active. Rather, the trial’s outcome is not evidence in favor of the medicine being active. Such a perspective is especially applicable in clinical trials of psychoactive medicines. There are many logical and sound clinical and scientific reasons why a trial medicine may fail to demonstrate activity: (1) the dose studied was too low, (2) the trial was too short, (3) the patients were noncompliant, (4) the patients did not have the disease they were supposed to have, (5) the patients were nonresponders, and (6) the patients improved on both treatments, illustrating the phenomenon of regression to the mean. Thus, a study failing to demonstrate medicine activity for an explainable reason should not adversely affect the interpretation that a medicine is active, provided that the view is supported by findings in well-designed, well-controlled, and relevant clinical trials.
Possible Interpretations When Placebo Is Found to Be More Effective Than Active Medicine
A number of controlled trials have reported that patients receiving placebo responded better than patients receiving active medicine treatment. Results were statistically significant. What are the possible interpretations of these incidents? In some cases the data interpretation was simply that the patients on active medicine did less well than those on placebo and that the medicine should not be used in those patients. An example is idoxuridine for herpes simplex encephalitis (Boston Interhospital Virus Study Group, NIAID-sponsored cooperative antiviral clinical study, 1975). Other interpretations are listed in Table 93.6. In a clinical trial evaluating whether dexamethasone shortened the duration of coma in patients with malaria, it was found that the effect of dexamethasone was significantly worse than that of placebo (Warrell et al., 1982). This finding should lead to cessation of that treatment in medical practice, since the trial was apparently well designed, conducted, and analyzed. This conclusion is not always warranted when a placebo is statistically better than active medicine treatment. In some situations a placebo may be better than active medicine in one trial but not in others using the same trial design (e.g., comparison of placebo and ketanserin in intermittent claudication; Bounameaux et al., 1985).
TABLE 93.6 Possible interpretations if placebo responses are statistically significantly better than the active treatment
1. The medicine and placebo supplies were reversed
2. The active medicine makes the disease worse
3. There is a difference between the two groups, either in baseline values or in inherent differences
4. There is confounding by other factors (e.g., surgery, prior treatment)
5. The sample size used was too small
6. The parameters measured were inappropriate
7. The clinical trial was conducted over too short a period
8. Adverse reactions influenced the results of treatment
9. Results occurred as a chance event
CHAPTER 94
Interpreting Data from Active Medicine Control Groups
Types of Active Medicine Control Groups
Dose-Response Relationships
Combination Medicines
Problems with a Two-Arm Active Medicine Control Trial
When May Two-Arm Active Medicine Trials Be Conducted?
Patient-Specific Differences
Choosing an Active Medicine to Use as a Control
TYPES OF ACTIVE MEDICINE CONTROL GROUPS
A number of types of active medicine control groups may be used in clinical trials. The strength of the interpretations drawn varies to a large degree and depends on the specific type of active medicine trial used. Within either the parallel or crossover category of trial design, the most common types of active medicine trials may be divided broadly into two groups. These categories may be used to describe clinical trials conducted in any clinical phase of development.
1. Two-arm trial. Comparison of (1) a test medicine with (2) an active medicine control in an open-label, single-blind, or double-blind manner
2. Three-arm trial. Comparison of (1) test medicine, (2) placebo, and (3) active medicine control in a single-blind or double-blind manner
Within parallel trials a fourth or additional arm is often used. This usually involves either a second or third dosing group, whose members receive the test medicine or a combination medicine. This type of trial is included in the discussion of a three-arm trial. The term group could be substituted for arm in this operational definition. Interestingly, some authors use the term leg instead of arm. In general, there are more problems in interpreting data from two-arm than from three-arm clinical trials. In three-arm trials, especially when conducted in a double-blind manner, data from the active medicine group can be compared with placebo responses to confirm that the active medicine control yielded a positive response. This avoids one of the major problems associated with two-arm studies, i.e., the difficulty of demonstrating that either medicine was effective if they gave statistically indistinguishable results. This assumes that one is not evaluating a clinical problem in which the evidence of efficacy is absolutely incontrovertible, such as when a test medicine immediately brings patients out of a coma.
DOSE-RESPONSE RELATIONSHIPS
It is possible to develop variations of clinical trial designs for most active medicine trial comparisons that avoid the problem of not being able to demonstrate that the active medicine was actually effective. A good example is when a dose-response relationship of the trial medicine is determined and one dose of an active medicine control is evaluated. If the doses chosen for the dose response are able to elicit a wide range of efficacy responses (i.e., from a small response at low doses to a greater response at a higher dose), it would provide a framework against which the active medicine could be compared. Moreover, there would be relatively high assurance (if the clinical trial were well designed, performed, and analyzed) that the comparisons between active and test medicine were valid. It is also possible to perform dose-response evaluations for the active medicine instead of the test medicine or for both the active and study medicines. Although not strictly two-arm trials because of multiple treatment groups, these variations still utilize just two medicines to arrive at a potentially strong conclusion.
COMBINATION MEDICINES
Another example in which assurance of the validity of a clinical trial’s interpretation could be achieved in evaluating an active medicine versus a trial medicine involves combination medicines. Each component of the combination could be tested separately in a parallel trial design. A crossover trial design, though rarer, could be used in certain diseases wherein each patient would receive during separate periods in the trial the test medicine as a combination, each of the entities that make up the combination, and the active medicine control (which could either be a single entity or combination medicine). In this situation it is assumed that either one or more than one of the separate medicines in the combination being tested would demonstrate less efficacy than the combination (or no efficacy). Thus, there would be a possibility of confirming that the active medicine possessed activity through comparison with the single entities and combination. These alternatives (e.g., a third or fourth arm) would also address the concern that using an active medicine control (in the absence of placebo) does not provide adequate incentive to the investigator and sponsor to have the clinical trial succeed (i.e., show a difference between treatments). A statistically significant difference between at least two groups in the trial would be important to strengthen the conclusions reached. If the response to an active control is less than anticipated, a number of interpretations are possible (Table 94.1).
TABLE 94.1 Possible interpretations when the response to an active medicine control is smaller than expected
1. Inadequate dose given
2. Inadequate compliance by patient
3. Medicine was not biologically active because of chemical decomposition (e.g., medicine was photosensitive, medicine sat on a heated radiator, medicine was left out of the refrigerator)
4. Medicine was not effective in population studied, possibly because it was distributed or metabolized or eliminated in an atypical manner, perhaps resulting from inclusion of an unusual patient population
5. Medicine was not adequately absorbed (e.g., resulting from interactions with food)
6. Patients were relatively nonresponsive (i.e., because of normal variation)
7. Patients had mild disease and did not have much of a therapeutic range in which to demonstrate improvement
8. The placebo response was much greater than expected (or usual), and the magnitude of the response caused by the active medicine control was thus proportionally less
PROBLEMS WITH A TWO-ARM ACTIVE MEDICINE CONTROL TRIAL
Temple (1982) has pointed out three major categories of problems with the two-arm type of active medicine control (i.e., in the absence of a placebo treatment):
1. It is more difficult to prove statistically that two results are the same than to prove that they are different. If both treatments yield the same effect, there is no test to establish that a statistically significant similarity exists. If the test medicine is statistically better than the active control (or vice versa) in a two-arm clinical trial, then this issue does not arise.
2. Since the investigator does not wish to observe a difference between treatments, there is no incentive to conduct the trial well. In fact, the more poorly it is conducted, the more likely the data will be the same with both medicines. Thus, poor technique represents an inadvertent means to obtain data that demonstrate that the test medicine is as efficacious as the standard medicine.
3. There is no accepted statistical means of demonstrating that either medicine worked if there is no statistically significant difference between them in the results obtained. If both medicines are approximately equal in the effect they elicit, it does not prove that either medicine is truly efficacious. Either both medicines were efficacious in the trial or neither was.
Temple points out that in any particular study a standard medicine may be inactive even though it is generally well established to be an active medicine for treating patients. Many active medicines have been shown to be no better than placebo in at least some well-controlled clinical trials in analgesia and other areas. Although Temple’s third point is that both medicines were efficacious in the trial or neither was, the real situation is more complex, and many interpretations are possible (see Table 94.2 for selected examples).
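Temple's second point, that a poorly conducted trial tends to make two treatments look alike, can be illustrated numerically. The Python sketch below is a simplified illustration, not a method from the text: it assumes that poor technique appears only as extra variability in the measurements, uses a plain two-sample statistic with equal standard deviations and arm sizes, and all names and values are hypothetical.

```python
import math


def two_sample_statistic(mean_difference, sd, n_per_arm):
    """Difference in means divided by its standard error
    (equal SD and equal arm sizes are assumed for simplicity)."""
    standard_error = sd * math.sqrt(2.0 / n_per_arm)
    return mean_difference / standard_error


# Hypothetical values: the same real 5-unit difference, 50 patients per arm.
careful_trial = two_sample_statistic(5.0, sd=10.0, n_per_arm=50)
sloppy_trial = two_sample_statistic(5.0, sd=20.0, n_per_arm=50)
print(round(careful_trial, 2), round(sloppy_trial, 2))
```

With these assumed numbers the statistic falls from about 2.5 to about 1.25 when the measurement variability doubles, so the same real difference would probably go undetected in the noisier trial. The medicines have not become more alike; the trial has simply lost the ability to tell them apart.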
TABLE 94.2 Possible interpretations when the average clinical response is equal for a group of patients on study medicine and active control medicineᵃ
A. Both medicines are clinically effectiveᵇ
1. Each medicine was active in the same types of patients
2. One medicine was highly active in a few patients, and the other medicine was less active in a larger number of patients
3. One medicine was active in one type of patient, and the other medicine was active in different types of patients
4. One medicine had a false-positive response and the other medicine was actually active
5. Both medicines had false-positive responses
B. Neither medicine is clinically effective
1. Clinical changes were nonspecific and resulted from a placebo-like effect
2. One medicine caused a false-negative response, and the other medicine was actually ineffective
3. Both medicines had false-negative responses
C. It is impossible to evaluate whether the medicines elicited a clinical response
ᵃ This table assumes that only a single dose of each medicine is tested and no placebo group is included.
ᵇ The conclusion of “effectiveness” depends on well-established historical data demonstrating activity of the active control medicine or clear clinical evidence of activity.
WHEN MAY TWO-ARM ACTIVE MEDICINE TRIALS BE CONDUCTED?
The author agrees with the validity of Temple’s points in most cases. Nonetheless (as Temple agrees), two-arm active medicine control trials cannot be avoided in certain situations. For example, in patients with terminal cancer, severe bacterial infections, or numerous other severe conditions for which adequate therapy currently exists, it would not be ethically acceptable to test a new medicine against a placebo. Active controls are not the only solution to designing a trial in these situations, however, since a historical control group could be used, but active controls are often preferable. Likewise, in situations in which a test medicine is being evaluated for its value in long-term therapy in a chronic disease, it is often unethical to use a placebo medication. In these situations an active control has some value. Some of the techniques to overcome the limitations of two-arm trials were previously mentioned and are also described at the end of Chapter 6.
Patient-Specific Differences
An additional issue that may arise in either two- or three-arm active medicine trials is that two medicines may yield equivalent overall effects, but the responses they cause in specific types of patients may differ enormously. This issue may be addressed by a subgroup analysis of the data. A variation of this issue is that if medicine X is statistically superior to medicine Y, there still may be a group of patients who will respond to medicine Y, and not to medicine X. This possibility should be considered, although it is not always straightforward to test for this. If a crossover trial design is used in a chronic disease, it should be possible to evaluate this possibility. A parallel design could also address this issue by evaluating subgroups of patients, although this approach is not as efficient as using a crossover trial when that design is suitable. This type of potential problem may be particularly prevalent in some clinical trials of psychotropic medicine. For example, the characteristics of groups of schizophrenic patients often vary widely between different trials, as do their responses to specific medicines (Chassan, 1979).
Choosing an Active Medicine to Use as a Control
Finally, the choice of an active control medicine may be based on many factors including (1) therapeutic category, (2) mechanism of action, (3) chemical class of the medicine, (4) regulatory considerations, (5) medical reputation of the medicine, (6) medical use of the medicine, and (7) marketing considerations.
CHAPTER 95
Interactions Among Medicines
Medicine-Medicine Interactions
Classification of a Medicine’s Interactions with Other Medicines
When Do Interactions Occur?
Results of Interactions
Clinical Sequelae of Interactions
Intended Interactions
Other Factors to Consider
Nonmedicine Interactions
Medicines interact with numerous factors or substances both within and without the body. Outside the body medicines may interact with air, light, water, heat, and humidity in the environment or with part of the container in which the medicine is stored. These reactions often inactivate the medicine, although they sometimes lead to the formation of a more active (or toxic) metabolite(s). There are also physical interactions that occur when mixtures of two or more medicines are put in solution or when a single medicine is put in solution. These interactions may cause precipitation or otherwise inactivate or modify the medicine through physicochemical changes. Within the body innumerable types of interactions are possible. These include interactions both with other medicines (i.e., medicine-medicine interactions) and with nonmedicine factors (e.g., foods).
MEDICINE-MEDICINE INTERACTIONS
Classification of a Medicine’s Interactions with Other Medicines
Classifications may potentially be based on (1) mechanism-of-action, (2) types of sequelae, (3) clinical significance of the outcome, (4) degree of assurance that the interaction is well established, or (5) other categories. The first approach (mechanism-of-action) is important in understanding the event, but the mechanisms of many interactions are unknown and others are speculative. The second approach is difficult to use because various sequelae could occur for even a single interaction, depending on many factors, not all of which are known. The third approach is reasonable, but the significance of many interactions is unknown. The fourth approach raises many questions because it is often uncertain that a real interaction has occurred, or if it did, that it will recur on a regular basis. Assurance that an interaction has occurred can be described in three major categories: (1) definite association, (2) probable association, and (3) anecdotal reports. It is difficult without conducting one or more controlled clinical trials to demonstrate an association adequately. The discussion above illustrates the difficulty of deriving an acceptable basis for classifying the types of interactions among medicines.
When Do Interactions Occur?
The term medicine-medicine interaction is usually applied to antagonistic or synergistic (potentiation) interactions between two medicines, and this is the major focus of this chapter. Interactions occur during various processes, including (1) absorption, (2) distribution, (3) transport (active or passive), (4) combining of a medicine with receptors, (5) eliciting a biochemical, pharmacological, or other effect, (6) conversion of the biochemical, pharmacological, or other effect to a clinical effect, (7) inactivation of the medicine through reuptake or other processes at or near the receptor site, (8) metabolism, and (9) elimination. Brodie and Feely (1988) divided interactions between medicines into those caused by the pharmacokinetics of the affected medicine and those influencing the pharmacodynamic response to it.
Results of Interactions
A medicine-medicine interaction may result in either synergistic (potentiated) effects or a diminution of response. The two medicines that interact may each have the same therapeutic effect and may be combined purposely or used together to take advantage of their interactions. Their mechanism of action may differ, and this factor may allow for a synergistic action (e.g., use of multiple hypotensive medicines). Several mechanisms that may lead to synergistic effects are listed in Table 95.1 and those that may lead to diminished effects are listed in Table 95.2. Some mechanisms (e.g., receptor blockade) may lead to either enhanced or diminished responses. Some medicine-medicine interactions do not affect responses.
Clinical Sequelae of Interactions
Medicine-medicine interactions may affect efficacy or safety. In terms of safety, adverse reactions are often the primary area of interest. Many factors are known that may influence adverse reactions resulting from medicine-medicine interactions. These factors relate primarily to the patient, medicines, or trial design, and a few are listed in Table 95.3.
TABLE 95.1 Mechanisms involved in medicine-medicine interactions that may lead to synergistic clinical activitiesᵃ
1. Displacement of one medicine by another from protein binding sites in plasma (e.g., salicylates may displace phenytoin, thus increasing the concentration of free phenytoin in plasma)
2. Inhibition by one medicine of an enzyme that normally metabolizes and inactivates the other (e.g., inhibition of an enzyme’s synthesis or activity)
3. Competition of two medicines for the same enzyme (e.g., noncompetitive inhibition)
4. Modifying the pH of urine can increase or decrease excretion of certain medicines (e.g., aspirin and sodium bicarbonate)
5. One medicine may cause biochemical or pharmacological changes, leading to conditions that increase the effect of another medicine (e.g., hydrochlorothiazide leads to decreased potassium levels, which render patients receiving cardiac glycosides more liable to have a toxic reaction)
6. Medicines with similar biochemical, pharmacological, or physiological effects may produce synergistic clinical actions
7. A medicine may affect a pharmacological receptor so that another medicine’s action is enhanced
8. A medicine may prevent reuptake of another medicine into a storage site (e.g., cocaine’s potentiation of sympathomimetics)
9. A medicine may prevent metabolic inactivation of another medicine
10. A medicine may modify gastrointestinal absorption of another medicine
ᵃ Resulting effects may be beneficial or toxic.
Intended Interactions
The two medicines that interact may be combined purposely so that one medicine prevents or slows a process such as renal elimination or metabolism that would inactivate or eliminate the other medicine (e.g., penicillin and probenecid). In this situation only one of the two medicines produces a therapeutic effect, although the effect is enhanced by the presence of the other medicine. It is generally important to differentiate in the interpretation of data between those interactions that were a result of a purposeful step and those that occurred inadvertently.
Other Factors to Consider
The time between the administration of two (or more) medicines that interact is usually important in determining the magnitude and clinical significance of the response. Evaluating doses of the two medicines that interact is also essential to understanding the clinical significance (if any) of the purported interaction. Many hundreds or even thousands of medicine-medicine interactions have been reported, but the number that have been established as probably true is far smaller, and the number with clinical significance is smaller still.

TABLE 95.2 Mechanisms involved in medicine-medicine interactions that may lead to a diminished therapeutic effectᵃ
1. Physicochemical inactivation prior to absorption (e.g., chelation of one medicine by another, adsorption of one medicine by another)
2. Modification of gastrointestinal absorption
3. Physicochemical inactivation subsequent to absorption (e.g., protamine sulfate reacts with heparin in the vasculature)
4. Microsomal enzyme formation may be induced in the liver by one medicine. The liver then more rapidly metabolizes and inactivates another medicine
5. Specific or nonspecific receptor antagonism by one medicine may prevent or diminish the action of a second medicine. The receptor antagonist may elicit an opposite effect, no effect, or may be a partial agonistᵇ
6. Alterations in urine pH may increase the excretion of certain medicines or may lead to crystalluria (e.g., methenamine plus sulfonamides)
7. Displacement of one medicine by another from plasma proteins. The displaced medicine is then more rapidly distributed to tissues, metabolized, or excreted
8. Direct physical action of one medicine on another (e.g., a topical medicine that prevents a second medicine from penetrating the skin)
9. Direct chemical action of two medicines (e.g., mixture of two medicines in a syringe or infusion apparatus)
ᵃ Resulting effect(s) may be beneficial or toxic.
ᵇ Numerous types of receptor antagonism may occur (e.g., competitive or noncompetitive, reversible or irreversible).

TABLE 95.3 Selected factors that may influence adverse reactions resulting from medicine-medicine interactions
A. Factors relating to the medicines
   1. Dose of each medicine. Larger doses are more likely to cause adverse reactions
   2. Dosage form (e.g., sustained-release forms usually yield lower peak plasma concentrations than immediate-release forms of a medicine and may be less likely to interact with other medicines)
B. Factors relating to trial design
   1. Route of administration. This parameter can best be evaluated in the context of each medicine's mechanism of action
   2. Time of administration. Interactions relating to medicine absorption are more likely to occur when the time between administration of the two medicines is reduced
   3. Order of administration. This may be important to consider depending on the mechanism of the interaction
C. Factors relating to the patient's physiological function
   1. Renal function often has a major role in affecting the occurrence and magnitude of medicine-medicine interactions via effects on the excretion of medicines
   2. Hepatic function insofar as metabolic capacity and capabilities are concerned may influence adverse reactions
D. Factors relating to the patient’s inherent and acquired characteristics
   1. Genetic history
   2. Disease state(s)
   3. Age
   4. Diet
   5. Smoking

NONMEDICINE INTERACTIONS
A few of the nonmedicine interactions with medicines that may occur outside the body were indicated in the introduction (e.g., decomposition as a result of air, water, or other environmental factors). Within the body, nonmedicine interactions include effects on medicines resulting from (1) the pH of fluids the medicine encounters, (2) food that is ingested, (3) cigarette smoke, and (4) various other factors. Another area that might be considered as a medicine interaction is the effect of a medicine on diagnostic laboratory tests. Alterations in laboratory test results may occur either because the medicine affects the patient or because the medicine interferes with a step in the analytical laboratory test procedure. In interpreting data it is often difficult to identify each of the unanticipated interactions that may have occurred in a clinical trial. Maintaining an index of suspicion for interactions whenever unanticipated events occur will help uncover them.
CHAPTER 96
False-Positive and False-Negative Results
Rates
Sensitivity and Specificity
   The Ideal World
   Statistical and Clinical Definitions
Types of False-Positive and False-Negative Results
This chapter discusses definitions and general concepts of false-positive and false-negative responses as well as methods to probe why they occur. Related concepts of rates, specificity, and sensitivity are also described. False-positive and false-negative results occur with diagnostic tests (e.g., laboratory measurements) and with efficacy tests used to evaluate medicines. Occasionally, the results of an entire clinical trial may be reported as yielding data that are either falsely negative or falsely positive.
RATES
Rates are used to present both safety and efficacy data and are determined on a response-per-unit basis. Thus, both a numerator and denominator are required. The numerator is usually the number of times an event occurred or the number of patients who are affected by or otherwise involved in a particular issue. The denominator could be presented in terms of patients, problems, physicians, or medical practices. Patient populations can be viewed in terms of a single clinical trial, group of trials, all trials on a medicine, or in terms of a patient characteristic or characteristics. Within a single trial the denominator may refer to patients who (1) are potentially available to enter the trial, (2) are considered for entry, (3) complete screens successfully and are eligible, (4) sign an informed consent, (5) enter the baseline, (6) enter the treatment period, (7) complete treatment, (8) are analyzed, or (9) have completed adequate follow-up. A detailed discussion of incidence and prevalence is given in other chapters.

SENSITIVITY AND SPECIFICITY
The Ideal World
In measuring a medicine’s efficacy or safety many different types of tests are used. The ideal test would have a 1:1 correlation in accuracy with the desired results. There would be no positive results obtained that were truly negative (false-positive response) and no negative results obtained that were truly positive (false-negative response).
Since the ideal situation is not observed, it is important to know the rates of false-positive and false-negative results. This information allows one to determine how much reliance should be placed on results from a particular test.

Statistical and Clinical Definitions
A statistical definition of sensitivity of a test is the true positive rate. A clinical definition is that sensitivity relates to how finely detailed data may be and still be recognized as positive, i.e., if one test can discern responses of 1 cm and another test can discern responses of 1 mm, the latter test is much more sensitive than the former. Definitions are shown in Table 96.1. The true negative rate is known in statistics as the specificity of a test. Clinically, specificity relates to how well the test can differentiate between the test medicine and other medicines. If one test can distinguish among five separate medicines and identify only the test medicine as positive and note the others as negative, then it has a higher degree of specificity than
TABLE 96.1 Definitions of various terms relating to false positives and false negatives

Sensitivity = Probability of correctly classifying a diseased patient as diseased
$$\text{Sensitivity (\%)} = \frac{\text{number of diseased persons with a positive test}}{\text{total number of diseased persons tested}} \times 100$$

Specificity = Probability of correctly classifying a nondiseased patient as nondiseased
$$\text{Specificity (\%)} = \frac{\text{number of nondiseased persons with a negative test}}{\text{total number of all nondiseased persons tested}} \times 100$$

False-positive rate = Probability of incorrectly classifying a nondiseased patient as diseased
$$\text{False positives} = \frac{\text{number of false positives}}{\text{total number of all nondiseased persons tested}}$$
$$\text{False positives}^{a} = 1 - \text{specificity}$$

False-negative rate = Probability of incorrectly classifying a diseased patient as nondiseased
$$\text{False negatives} = \frac{\text{number of false negatives}}{\text{total number of all diseased persons tested}}$$
$$\text{False negatives}^{a} = 1 - \text{sensitivity}$$

ᵃ Assume that all diseased patients are classified either correctly or incorrectly. The probability of being correct in the diagnosis plus the probability of being incorrect = 1.0. The probability of being correct in the classification of diseased patients = sensitivity. Therefore the probability of being incorrect (i.e., a false negative) = 1 minus the sensitivity. A similar description can be used to illustrate that the false positives = 1 minus the specificity.
a test that cannot distinguish among any of the five medicines. One means of categorizing positive and negative results is shown in Table 96.2. There are numerous publications that evaluate the sensitivity and specificity of (1) a particular test performed by different methods (e.g., see Steinberg et al., 1985), (2) groups of clinical trials using different types of controls (e.g., see Sacks et al., 1983), and (3) reports of diagnostic tests (e.g., Sheps and Schechter, 1984).
TYPES OF FALSE-POSITIVE AND FALSE-NEGATIVE RESULTS
False-positive or false-negative results generally fit one of three types. The occurrence of a false result may be (1) known, (2) unknown, or (3) suspected. If it is unknown, then this brief chapter will not provide definitive or even likely means to reveal this fact. If the problem is suspected, one immediate goal will be to determine whether the suspicions are correct or not.
If the problem is known to exist, then further steps toward uncovering the reason(s) for its existence should be undertaken. The suspicion of a false-negative (or -positive) result is extremely common during some parts of many clinical trials. Steps that may be followed to evaluate this possibility are outlined in Table 96.3. Some of the reasons for false-positive results occurring in a trial are listed in Table 96.4, and reasons for false-negative results are listed in Table 96.5. These tables list only a small portion of the reasons that may affect data obtained in a trial in either a positive or negative direc-
TABLE 96.2 Categorization of positive and negative results

                  True state
Test results      Disease is present      Disease is absent
Positive          True positive           False positive
Negative          False negative          True negative
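The definitions in Tables 96.1 and 96.2 can be checked with a few lines of Python. The sketch below is illustrative only: the class name, field names, and patient counts are hypothetical and are not taken from the text or from any trial.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Counts from a 2 x 2 table of test result versus true state (Table 96.2)."""
    true_positive: int    # disease present, test positive
    false_negative: int   # disease present, test negative
    false_positive: int   # disease absent, test positive
    true_negative: int    # disease absent, test negative

    def sensitivity(self) -> float:
        # Diseased persons with a positive test / all diseased persons tested.
        return self.true_positive / (self.true_positive + self.false_negative)

    def specificity(self) -> float:
        # Nondiseased persons with a negative test / all nondiseased persons tested.
        return self.true_negative / (self.true_negative + self.false_positive)

    def false_positive_rate(self) -> float:
        # False positives / all nondiseased persons tested.
        return self.false_positive / (self.false_positive + self.true_negative)

    def false_negative_rate(self) -> float:
        # False negatives / all diseased persons tested.
        return self.false_negative / (self.false_negative + self.true_positive)


# Hypothetical counts chosen only to illustrate the arithmetic.
table = TwoByTwo(true_positive=45, false_negative=5,
                 false_positive=30, true_negative=120)

print(f"Sensitivity:         {table.sensitivity():.1%}")
print(f"Specificity:         {table.specificity():.1%}")
print(f"False-positive rate: {table.false_positive_rate():.1%}")
print(f"False-negative rate: {table.false_negative_rate():.1%}")
```

With these counts the false-positive rate equals 1 minus the specificity and the false-negative rate equals 1 minus the sensitivity, which is the relationship stated in the footnote to Table 96.1.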
TABLE 96.3 Possible actions and considerations if a false-positive or false-negative result is suspected
1. Identify the suspected factor(s)
2. If the identity of the suspected factor(s) is not totally clear, discuss with colleagues which methods could be used to identify and evaluate possible factors
3. After suspected causes are identified, list and evaluate possible reasons why those causes could have led to false-positive or -negative results
4. Discuss the suspicion with a statistician
5. Conduct appropriate reanalyses of data
6. Develop algorithms or a decision tree that may be tested
7. Evaluate relevant articles in the medical literature
8. Determine if any additional laboratory, efficacy, or other tests on some (or all) of the patients in the clinical trial could resolve the issue
9. Determine if a protocol modification for ongoing or future trials could resolve (or address) the issue
10. Discuss the suspicion with colleagues, peers, and consultants
TABLE 96.4 Selected reasons for false-positive dataᵃ
A. Patient related
   1. Patients were not as ill as the investigator believed, and the medicine was more effective in mildly ill patients than in moderately or severely ill patients
   2. Patients were much more ill than the investigator believed, and the medicine was more effective in severely ill patients than in mildly or moderately ill patients
   3. A number of patients had unsuspected renal or liver disease, which prevented rapid metabolism and/or elimination. Blood levels were maintained at a therapeutic level for a longer period than in patients who had normal renal and liver function
   4. Too few patients were evaluated, and activity was noted by chance
   5. A few patients had a large response, which skewed the overall data
   6. Patients tend to enroll in clinical trials when they are most severely ill and often gradually improve independent of any therapeutic interventions. This illustrates the phenomenon of regression toward the mean and when not controlled gives an impression of a beneficial effect
   7. Patients are sometimes more easily recruited for a clinical trial when they have a high risk of developing a disease that is not the primary focus of the trial but is being evaluated. This yields a falsely elevated incidence value for the diseaseᵇ
   8. More medicine was absorbed than anticipated or would normally occur (e.g., because of the presence of food, which delayed gastric emptying)
   9. Patients were not compliant with the trial protocol and took an excessive amount of trial medicine or concomitant medicines
   10. Patients may feel a strong allegiance to the investigator or feel strong pressure to demonstrate a positive medicine effect
   11. Patients received concomitant nonmedicine treatment modalities, which were responsible for improvement noted (e.g., bed rest in patients with lower back pain)
B. Trial design and medicine related
   1. The blind used in the trial was broken or was ineffective, and the physician or patients inadvertently biased the results in favor of the test medicine
   2. In an open-label trial a larger response occurred than was anticipated, but there was no placebo control group to interpret properly the magnitude of the apparent placebo response
   3. The medicine decomposed to a more active metabolic product
   4. An error occurred in dosing patients, who received a greater dose than intended
   5. Inadequate washout period was used, and patient’s previous treatment had a carryover effect
   6. Inappropriate clinical endpoints, tests, or parameters were used
C. Investigator and staff related
   1. The investigator and staff demonstrated great enthusiasm and stimulated the expectations of the patients
   2. The investigator and staff used an aggressive therapeutic approach
D. Results and data related
   1. The data illustrated “clumping,” where a preponderance of events occurred in a nonrandom manner
   2. A high percentage of nonresponders and partial responders dropped out of the trial, leaving a relative preponderance of responders
   3. Data analysis did not appropriately account for all patient dropouts
   4. Only a portion of the baseline or treatment period data were analyzed
ᵃ Data demonstrated a statistically significant effect of the medicine, whereas more convincing data obtained subsequently or prior to the trial clearly demonstrated inactivity. Most factors listed in Chapter 83 may be responsible for false-positive data.
ᵇ Lavori et al. (1983) reported this effect in a 15-year followup trial of patients who were evaluated for developing breast cancer, even though that was not the primary objective of the trial.

TABLE 96.5 Selected reasons for false-negative dataᵃ
A. Patient related
   1. Patients were either much less or much more ill than the investigator believed and did not respond to the medicine as would patients with severity of disease intended as the trial group
   2. The patient group included a relatively large number of nonresponders
   3. The patients had unsuspected renal or hepatic problems, which caused rapid excretion or inactivation of the trial medicine
   4. The patients were not compliant with the trial design and did not take their doses as directed
   5. Patients took concomitant medicines that interacted with the test medicine and prevented full expression of its activity
   6. Patients were persistently exposed to conditions that prevented improvement (e.g., bacteria or protozoa that continually reinfected the patients)
B. Medicine related
   1. The medicine had lost its potency through (1) lack of chemical stability, (2) improper storage, (3) improper preparation, (4) exposure to light, or (5) for other reasons
   2. The medicine was not adequately absorbed
   3. The medicine’s metabolism was different in the study population than in other patient groups
C. Trial design related
   1. Too few patients were entered to demonstrate a medicine effect (i.e., there was a lack of power)
   2. An inappropriate trial design was chosen (e.g., efficacy was measured at the wrong time, in the wrong manner, with the wrong equipment)
   3. An insufficient dose of the medicine was tested
   4. Efficacy tests or parameters were used that were unable to demonstrate a medicine effect
   5. Inadequate washout period was used, and previous treatment interacted with the trial medicine, preventing full expression of efficacy to be observed
   6. Concomitant nonmedicine therapy interfered with full expression of the trial medicine’s efficacy
D. Investigator related
   1. The investigators may have used a too conservative approach to patient treatment or to the data (e.g., data analysis, clinical interpretation)
   2. Patients were given the wrong medicine or wrong size tablet or insufficient quantity of medicine, or the label on the medicine container was incorrectly printed
E. Results and data related
   1. Patients who improved dropped out of the trial, and a higher percentage of nonresponders remained
   2. Data analysis did not appropriately account for all patient dropouts
ᵃ Data did not demonstrate a clinical effect that was shown in previous or subsequent trials. Most factors listed in Chapter 83 may be responsible for false-negative data.
tion. It is important to evaluate as many factors as possible in attempting to determine which ones were responsible. If the presence of false-negative or false-positive results is detected, then different mechanisms should be utilized to discern the reason(s) for their existence. Once the causes are discovered and dealt with, the factors can be eliminated and the data reanalyzed. In a worst-case scenario, the clinical trial may have to be repeated utilizing an improved methodology that eliminates or minimizes the likelihood of obtaining a false-negative or false-positive result.
CHAPTER 97
Interpreting Quality of Life Data
Measures of Quality of Life
   Is It Important To Measure Quality of Life?
   Definition
   Quality versus Quantity of Life
   Spectrum of Measuring Instruments
   Choosing Tests and Scales
   Relevance for National Policy
   Relevance of Disease and Personality Types
   Reference Time of the Measurement
   When Should Quality of Life Be Measured?
Criteria to Evaluate Quality of Life Tests
Quality of Life Assessments in Clinical Trials
Presenting Results
MEASURES OF QUALITY OF LIFE
Since the primary goal in treating most patients with chronic disease is to improve their function and quality of life, it is important to measure and interpret this concept. The difficulties of interpreting subjective quality of life data are generally much more complex than those involved in interpreting more objective data. There are many controversies in this area, and identifying a few will indicate the types of difficulties and complexities encountered. A number of factors that may be measured with quality of life scales are listed in Table 97.1.

Is It Important to Measure Quality of Life?
An important controversy in the past was whether or not it is worthwhile to measure quality of life. Proponents stress that it is the major goal of health care and that placing attention on a patient’s functional outcome is important. Opponents used to stress the scientific value of objective tests and treatments for obtaining new data on major (and minor) diseases and the rigorous (and objective) scientific approaches that led to these medical advances. Another argument concerned the vagueness and difficulty of assessing and interpreting quality of life measures. The importance of quality of life assessments is now generally accepted.

Definition
Another controversy is how the term quality of life is defined. There is no universally accepted definition of this term. A definition should include consideration of physical, psychological, economic, and social well-being plus a measure of a patient’s ability to perform daily activities. See Chapter 52 and Spilker (1990a).

Quality versus Quantity of Life
The balance between the value of therapeutic measures that increase a patient’s life by a short period at the expense of its quality is becoming more widely debated both within and without the medical profession, especially in terms of terminally ill patients who sometimes choose not to be treated or who choose hospice care. More patients and physicians are stating that it is essential to consider life’s quality in addition to its duration when a therapeutic regimen is established.

Spectrum of Measuring Instruments
A patient’s quality of life and its change as a result of treatment may be evaluated with subjective parameters (e.g., how a patient feels) or with objective parameters (e.g., the number of days worked per year or per month). Some authors and tests focus on objective criteria to define and measure quality of life, whereas others stress the measurement of subjective aspects of this concept. Using both approaches is best.

Choosing Tests and Scales
There are many different tests and scales used to measure the quality of life. Although some are better validated than others, it is generally difficult to compare data obtained with different scales. Numerous scales are described in Quality of Life Assessments in Clinical Trials (Spilker, 1990a), plus the Bibliography on Health Indexes (U.S. Dept. of Health and Human Services, Public Health Service, 1983). This bibliography was issued several times. See also Spilker et al. (1990).
It is possible to measure the patients’ ability to function in their daily activities in terms of either their actual behavior or their potential ability. Many types of problems may act as variables to confound the data. These problems may arise from many of the factors described in Chapter 83 (e.g., social environment, physical environment, genetic inheritance).
Parameter measures may be weighted or unweighted. The use of unweighted parameters means that each parameter is considered to be equal. Weighting means that the parameters are not considered equal, making it desirable (or necessary) to create an imbalance in their relative importance to make their combination together more fair or correct. For example, assume there are three measures (A, B, C); A is twice as important as B and a third (C) is only one-half as important as B. The weighting of the values prior to combining them means that A is 4 times C, B is 2 times C, and C is unity. Another way to view it is that 7 Cs in value (4 plus 2 plus 1) equal 100%, and of the total, A is 57%, B is 29%, and C is 14%. Methods for combining and presenting data from multiple tests are given in Table 97.2 and Fig. 97.1.
Zimmerman (1983) reported that there was a high correlation (0.94) between weighted and nonweighted totals across 18 studies that evaluated the stress-illness relationship. Of course, other weighting procedures would generate different results and may be useful in other types of quality of life scales.

TABLE 97.1 Factors that may be measured or assessed with quality of life scales
A. Function in daily life
   1. Ability to bathe, dress, and feed oneself
   2. Ability to control physiological functions (e.g., urine, bowel movements)
   3. Ability to ambulate and move in and out of furniture and/or cars
   4. Ability to achieve satisfactory sleep and rest
B. Productivity and economic status
   1. Ability to work productively at the patient's desired (or other) vocation
   2. Ability to support oneself, family, and/or others at a satisfactory standard of living
C. Performance of social role
   1. With relatives, friends, and others
   2. Family relationships
   3. Community relationships
   4. Recreation and pastimes
D. Intellectual capabilities
   1. Memory
   2. Ability to communicate
   3. Ability to make decisions
   4. Overall ability to think, act, and react
E. Emotional stability
   1. Mood stability and swings
   2. Beliefs about the future
   3. Emotional levels
   4. Religious and philosophical beliefs
F. Assessment of satisfaction with life
   1. Level of well-being
   2. Perception of general health
   3. Outlook for the future
G. Signs and symptoms of illness
   1. Nature of problems
   2. Severity of problems
   3. Duration and frequency of problems
   4. Amount of treatment required
   5. Adverse reactions

TABLE 97.2 Potential methods and approaches to combining data from three or more separate testsᵃ
1. Present raw data for all patients and sum all scores so that the contribution of each test depends on the potential range of that test plus actual score achieved.
2. Transform raw data for all patients to a scale of 1 to 100 for each test and then sum all scores so that each test contributes one-third of the total.
3. Weight the results of each test by a numerical factor that is based on its importance in the overall total score achieved or on another basis described in a footnote (or in the text).
4. Instead of transforming or weighting individual patient scores, conduct these statistical tests on the overall scores for each treatment group.
5. Use other methods to transform data (e.g., see Chapter 81) before combining them.
ᵃ It is assumed that all tests were conducted in a single clinical trial and that the possible range of raw scores that could be achieved differs among the tests.

Relevance for National Policy
The rapidly escalating cost of national health care services is one reason why greater attention is directed toward quality of life measures. Evaluation of expensive medical therapies and procedures sometimes requires that an assessment of quality of life be made. This assessment is often conducted as part of an economic analysis of the costs and benefits of such procedures to the whole society as well as to the individual patient. For example, an evaluation of quality of life in patients with end-stage renal disease was reported (Evans et al., 1985). The personality of the patient evaluated, his or her disease severity, and prognosis will each affect quality of life scores obtained. Demographic characteristics may also be important. If these factors are not controlled or evaluated, then data obtained with any test may prove to be difficult to interpret.

Relevance of Disease and Personality Types
Popular beliefs relating to the psychosocial aspects of disease and coping mechanisms are beginning to be analyzed more systematically than in the past. The hypothesis that the psychological status of chronically ill patients does not differ according to specific diagnoses was tested by Cassileth et al. (1984) in patients with arthritis, depression, diabetes, cancer, renal disease, and nonmelanomatous dermatologic disorders. Their data supported this hypothesis. They observed that patients with recently diagnosed disease (in all groups) had poorer scores of mental health than did those patients who had had their diagnosis made 4 or more months previously. These results challenge the once popular view that individuals with certain specific personality types or emotional traits are more at risk of becoming ill with a particular disease or problem. Older patients (above 60) had better mental health scores than did those in the 40- to 60-year range, whose scores in turn were better than those who were still younger (below 40).
Quality of life measures in some illnesses may need to be tailored to the disease/illness measured. Although Cassileth et al. (1984) reported similarities in responses of six groups of patients with different chronic diseases, specific aspects of each disease may be relevant. A number of articles have evaluated the quality of life as it relates to cardiovascular disease (Wenger et al., 1984), oncology (Edelstyn et al., 1979; Sugarbaker et al., 1982), renal disease (Evans et al., 1985), and lung disease (McSweeny et al., 1982). These papers and those in Spilker (1990a) point out the wide variety of ways in which this subject is currently interpreted.

FIG. 97.1 A revised form of the data from Croog et al. (1986) illustrating comparative quality of life (QOL) data for three medicines in eight categories. White circle, improved; gray circle, stable; black circle, worsened. [Chart not reproduced. Columns: Capoten (N = 181), methyldopa (N = 143), propranolol (N = 162). Rows: general well-being, physical symptoms, sexual function, work performance, sleep dysfunction, cognitive function, life satisfaction, social participation.]
Reference Time of the Measurement
Measurements of the patient’s quality of life may be made in reference to the precise moment the test is being completed. In addition to or in lieu of this time frame, it is possible to measure the previous day, week, or some other time period. Another approach is to compare the present instant, day, or week to the pretrial baseline. A third approach is to measure both the present time and the best previous point during the trial (or the previous week, month, or other time). If the quality of life is determined for a cancer patient, it may be debated whether the measurements should be made at the time of treatment, a few weeks later, during remission, or at a different time or times.

When Should Quality of Life Be Measured?
It is generally not necessary to measure and evaluate the quality of life when a medicine is life saving or has a much higher benefit-to-risk ratio than other medicines for the same disease. The question of quality of life measurements becomes important when a treatment or medicine (1) has a small or moderate benefit-to-risk ratio, (2) is only partially curative, (3) is extremely expensive, or (4) relieves disease symptoms but elicits moderate or severe adverse reactions. The field that is concerned with these issues is growing rapidly, and attempts to validate various tests and measures will continue.

CRITERIA TO EVALUATE QUALITY OF LIFE TESTS
Six criteria were described by Deyo (1984) to evaluate tests used to measure quality of life. These should be considered by investigators and sponsors who must choose tests to include in clinical trials.
1. Applicability to the purpose and group being tested
2. Practicality in terms of effort and cost to conduct the tests
3. Comprehensiveness of the questions posed as related to each important area
4. Reliability of the test in terms of its reproducibility from time to time
5. Validity of the test. There are various types of validity and means of establishing this. Deyo (1984) describes some of these issues
6. Sensitivity of the test to detect important changes or differences in magnitude
QUALITY OF LIFE ASSESSMENTS IN CLINICAL TRIALS
The discussion above touches briefly on a few issues relating to the interpretation of quality of life data. Sixty authors recently contributed to a book entitled Quality of Life Assessments in Clinical Trials (Spilker, 1990a), which presents a detailed discussion on these and other topics. The book is organized into five sections: (1) introduction to the field (chapters discuss concepts, choosing approaches, validating models); (2) standard scales, tests, and approaches to quality of life assessments (individual chapters discuss the various domains: economics, social interactions, psychological well-being, and physical function); (3) special perspectives (e.g., ethics, culture, marketing, industry, regulatory); (4) special populations (e.g., pediatrics, geriatrics, substance abuse, rehabilitation, chronic pain, cardiovascular surgery, gastrointestinal surgery); and (5) specific disease (e.g., cardiovascular, neurologic, psychiatric, inflammatory bowel, renal replacement, pulmonary disease, cancer, and chronic rheumatic disease).

PRESENTING RESULTS
The results of a complex quality of life test that has several parts may be expressed with a single number, whereas a simpler test with multiple parts may yield multiple numbers (i.e., one for each part). Batteries of tests invariably lead to multiple numbers/results that cannot be combined. The major approaches to use when comparing medicines with multiple tests are to (1) create a hierarchy of test importance prior to a comparative trial, (2) express summaries of all data (e.g., medicine A had a higher score in 4 of 6 tests), or (3) merely list results obtained in each test. In the first case, an indication should be given of how much of a change in the major test(s) would be considered clinically significant, in addition to stating which test(s) is most important. In the second case, sufficient data would have to be presented so that readers could determine if they agreed with the implicit interpretation suggested by stating that 4 of 6 tests showed medicine A to be superior. The data in Fig. 97.1 lend themselves to this type of presentation. Merely listing results of each test enables readers to make their own comparisons and interpretations and to reach their own conclusions. The first method is the best approach because it minimizes bias while indicating which test(s) is deemed most important. The question of which medicine leads to a better quality of life depends on its attributes, the specific patients studied, and the test(s) chosen for use in a clinical trial. Numerous presentations of data on quality of life are given by Spilker and Schoenfelder (1990).
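The difference between the first two approaches can be made concrete with a minimal Python sketch. Everything in it is an illustrative assumption rather than trial data: the six test names, the scores for two hypothetical medicines, the choice of primary test, and the threshold for a clinically significant difference. Higher scores are assumed to mean a better quality of life.

```python
# Hypothetical quality of life scores (higher = better) for two medicines.
# All names and numbers are illustrative assumptions, not trial data.
scores_a = {"physical": 72, "psychological": 65, "social": 70,
            "economic": 58, "pain": 61, "sleep": 66}
scores_b = {"physical": 68, "psychological": 67, "social": 64,
            "economic": 55, "pain": 63, "sleep": 60}


def summarize_wins(a, b):
    """Approach 2: count the tests in which medicine A scored higher than B."""
    wins = sum(1 for test in a if a[test] > b[test])
    return wins, len(a)


def primary_test_verdict(a, b, primary, min_difference):
    """Approach 1: judge only the prespecified primary test against a
    prespecified clinically significant difference."""
    difference = a[primary] - b[primary]
    if difference >= min_difference:
        return "A superior"
    if difference <= -min_difference:
        return "B superior"
    return "no clinically significant difference"


wins, total = summarize_wins(scores_a, scores_b)
print(f"Medicine A had a higher score in {wins} of {total} tests")
print(primary_test_verdict(scores_a, scores_b, "physical", min_difference=5))
```

With these made-up numbers the summary reads "4 of 6 tests" in favor of medicine A, yet the prespecified primary test shows no clinically significant difference. This is the kind of implicit interpretation that readers can only check if the underlying data are presented.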
CHAPTER 98
Responders and Nonresponders
Definitions and Background
  Partial Responders
Reasons for Nonresponse
  Ceiling Effect
  Influence of Time
Responses and Plasma Levels
  A Practical Example
How to Turn Nonresponders into Responders
Calculating Patient Survival Curves
DEFINITIONS AND BACKGROUND
The operational definitions of a responder and a nonresponder are often arbitrary in many diseases, especially chronic diseases for which a complete cure is not expected. A nonresponder may be operationally defined as a patient who has not responded sufficiently (or adequately) to treatment whether the treatment was a study medicine, active medicine, placebo, no treatment, or a specific previous treatment. Criteria may be established to define a positive response more specifically. This may include reference to the duration of response. In some areas of medicine (e.g., oncology), the term partial responder (i.e., incomplete responder) is often used in addition to the other two terms. This may help clarify the definitions in some situations, since responses of chronic diseases are rarely all or none.
An investigator may be asked to rate a patient as being either a responder or nonresponder, and to record that rating on the data collection forms. The sponsor may overrule the investigator’s decision if the sponsor’s action is based on objective criteria that it established or decided to use prior to initiating the clinical trial. In certain clinical situations (e.g., many acute problems), responses occur in an all-or-none manner. All patients who have received treatment in this type of clinical situation may be classified as being either responders or nonresponders.
Partial Responders
The definition of partial response in oncology is usually based on either improvement in nonmeasurable lesions for at least 1 month or a 50% decrease in size of measurable lesions. Measurable refers to a direct physical measurement of a patient’s lesion(s). Unfortunately, there is a wide disparity in how this definition is applied. In some clinical trials size refers to volume, and a 50% decrease is reflected by a 21% reduction in a linear dimension. In other clinical trials size refers to area, and a 50% decrease is reflected by a 29% reduction in a linear dimension. Moreover, some investigators apply the definition to each measurable lesion and others apply it to the sum of all lesions.
REASONS FOR NONRESPONSE
The nonresponder’s lack of response may have resulted from a high degree of refractoriness that was present prior to the clinical trial, although this may only be known in hindsight. In refractory patients, it is sometimes concluded that virtually no treatment with the characteristics of the trial medicine could have elicited a positive response. On the other hand, patients entered in a trial may have been able to respond to therapy with the characteristics of the trial medicine but for some reason(s) did not. It is not always possible to determine which category of nonresponse a specific patient fits. In some cases a high degree of refractoriness may be predictable based on a patient’s clinical history. The main method to eliminate such patients from a trial is through carefully developed inclusion criteria. It is, however, generally impossible to eliminate all nonresponders of the first type (those who will not respond to any treatment) prior to the trial. When results of a clinical trial are negative or are less significant than anticipated, it may be that an excess number of nonresponders were entered. A number of reasons why a patient may be classified as a nonresponder are listed in Table 98.1 and Fig. 98.1.
TABLE 98.1 Selected reasons for a patient being classified as a nonresponder^a
A. Patient is noncompliant because of dissatisfaction related to
   1. Time taken away from job to attend clinic
   2. Cost of meals and transportation to get to clinic
   3. Excessive demands of the clinical trial in terms of time, effort, or other factors
   4. Lack of dramatic clinical benefit
   5. Lack of rapid onset of clinical benefit
B. Problems related to trial medicine or protocol
   1. Dose of medicine is inadequate
   2. Duration of treatment is inadequate
   3. Medicine composition has deteriorated as a result of storage, handling, or method of preparation
   4. Definition of response is narrow
   5. Measurements of response are taken at inappropriate times
   6. Inadequate or unclear directions are given to patients, many of whom then take the medicine improperly
C. Patient characteristics or responses
   1. Patient develops tolerance to the medicine’s effect
   2. Patient eats a diet containing food that inactivates the medicine
   3. Blood levels are inadequate (e.g., variable absorption between patients)
   4. Patient stores the medicine improperly
^a The reasons given for a false-negative result (Table 96.5) include reasons why a patient may be classified as a nonresponder. Also see Chapter 15.
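The linear-dimension percentages quoted under Partial Responders follow from simple geometry if a lesion is assumed to shrink uniformly in every direction, so that area scales with the square and volume with the cube of a linear dimension. The short Python sketch below checks those figures. The function name and the uniform-shrinkage assumption are illustrative and are not part of any formal response criterion.

```python
def linear_reduction(size_decrease, dimensions):
    """Fractional reduction in a linear dimension that corresponds to a given
    fractional decrease in area (dimensions=2) or volume (dimensions=3),
    assuming the lesion shrinks uniformly in every direction."""
    remaining = 1.0 - size_decrease
    return 1.0 - remaining ** (1.0 / dimensions)


# A 50% decrease in size, interpreted as area or as volume.
area_case = linear_reduction(0.50, dimensions=2)
volume_case = linear_reduction(0.50, dimensions=3)
print(f"area:   {area_case:.0%} reduction in a linear dimension")
print(f"volume: {volume_case:.0%} reduction in a linear dimension")
```

Under these assumptions the same 50% criterion corresponds to roughly a 29% linear reduction when size means area and roughly 21% when size means volume, so trials that read the criterion differently are not classifying partial responders on the same basis.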
Ceiling Effect
When the difference in response is small between placebo and treated groups, more patients in the treated group will be considered as nonresponders than when the difference between the groups is great. If patients receiving placebo have a much larger response than expected, the number of patients receiving active medicine who will be defined as nonresponders may be increased artificially. One reason is that there is often a maximum response possible (“ceiling effect”), and this limit is being approached by patients in the placebo group, allowing less potential for medicine recipients to demonstrate a greater effect. This concept is discussed further in other chapters.
Patient Does Not Respond to Medicine
Patient-Related: (1) noncompliance, (2) rapid metabolizer, (3) adverse reaction, (4) inability to absorb medicine
Medicine-Related: (1) medicine-medicine interaction, (2) medicine-nutrient interaction, (3) medicine is inactivated (e.g., light, moisture), (4) medicine is inactive
Protocol-Related: (1) wrong parameter measured, (2) dosing too infrequent, (3) dose too low
Trial Conduct-Related: (1) equipment problem, (2) technician problem in measurements, (3) patient given wrong medicine by investigator or staff
Other Reasons (listed in the caption)
FIG. 98.1 A number of reasons why a patient may not respond to the active medicine in a clinical trial. Other reasons include wrong type of disease diagnosed, inappropriate clinical trial design used, incorrect data analysis conducted, parameters measured incorrectly, and patient given wrong medicine because of a packaging error.
Influence of Time
Although the above concepts are described as if they are independent of time, that is generally not true. Clinical responses may be permanent, but especially with chronic diseases, many symptoms and problems tend to recur in a fixed or variable cycle. Problems usually recur either with a waxing and waning pattern or with discrete episodes, although other patterns exist. Some examples are described below as methods that can help determine the basis of a patient’s response and thus differentiate between responders and nonresponders. These examples focus on blood levels of a trial medicine and survival curves.
RESPONSES AND PLASMA LEVELS
It is usually straightforward to determine whether responders and nonresponders may be differentiated on the basis of a medicine’s blood level rather than dose. The most direct approach to this issue is to graph the blood level (or concentration) versus clinical response to determine if higher blood levels are associated with superior responses. A positive correlation suggests a relationship, although toxicity may affect this relationship for patients who have higher blood levels. Additional evaluations should be made if data suggest a possible relationship. A negative correlation may also be observed. For instance, if a medicine’s bioavailability differs widely between different individuals at the same dose level, some may have a blood level in the therapeutic range and be classified as responders, whereas others may have lower blood levels and be termed nonresponders. This pattern may suggest a relationship between response and blood level if the numbers of nonresponding patients with a blood level in the therapeutic range and of responding patients with a blood level below the therapeutic range are relatively small. Other clues that may be useful in differentiating responders and nonresponders are listed in Table 98.2.
TABLE 98.2 Selected characteristics to differentiate between responders and nonresponders
1. Patients studied in the morning versus afternoon
2. Patients evaluated by Dr. A versus those evaluated by Dr. B
3. Patients entered in the clinical trial at its inception versus those entered toward the end of the trial
4. Differences in blood level peaks or troughs of the trial medicine or its metabolites
5. Differences in various pharmacokinetic parameters (e.g., areas under the curve, half-life of medicine)
6. Differences in physiological function
7. Differences in metabolites or concentrations of metabolites
8. Differences in past medical history or any specific patient characteristic^a
^a See patient characteristics described in Chapter 83 and listed in Table 83.25.
A detailed description is given below of the process through which it can be evaluated if patient responses are related to the blood level of the trial medicine. All patients who have received medicine in the groups to be compared should be classified as responders or nonresponders. If a relatively small number of patients cannot be easily classified, they may be omitted from this analysis, placed in a separate category, or divided as responders or nonresponders based on other criteria established. Patients who have received placebo should be omitted from this analysis, since whether or not they were responders is not pertinent to an evaluation of blood levels. If the medicine’s bioavailability at one (or more) dose is highly variable, then blood
levels will probably be related to individual patient bioavailability. Patients receiving different doses may have equivalent (or divergent) blood levels.
A Practical Example
It is desirable to examine blood levels of responders and nonresponders separately at each dose or dose range studied. A simple hypothetical example to illustrate the reason for this approach is given below. Assume that the following data are obtained in a clinical trial.
1. Responders (n = 21) average 100 units of medicine/ml of blood.
2. Nonresponders (n = 21) average 40 units of medicine/ml of blood.
These patients received doses from 80 to 400 mg/day. Although these data make it appear as if the patients’ responses are related to the blood levels, when the data were subdivided on the basis of dose of medicine the following were obtained:
1. Patients (n = 21) who received medicine in the dose range of 80 to 150 mg/day average 31 units of medicine/ml of blood.
2. Patients (n = 21) who received medicine in the dose range of 200 to 400 mg/day average 109 units of medicine/ml of blood.
These data make it appear as if blood levels are related to the dose of medicine. If the responders and nonresponders are then evaluated separately at each dose range:
1. At a dose range of 80 to 150 mg/day, responders (n = 3) average 40 units of medicine/ml of blood, and nonresponders (n = 18) average 30 units of medicine/ml of blood.
2. At a dose range of 200 to 400 mg/day, responders (n = 18) average 110 units of medicine/ml of blood, and nonresponders (n = 3) average 100 units of medicine/ml of blood.
Thus, the apparent relationship that was initially shown between blood levels and patient response was shown to be false when the data were examined on the basis of dose ranges the patients received.
TABLE 98.3 Selected mechanisms that may turn nonresponders into responders^a
1. Change the definition of the terms responder and nonresponder in relation to the degree or duration of improvement
2. Add a category of partial responder and define this term
3. Provide counseling to patients to improve motivation, attitude, or desire to improve
4. Utilize mechanisms to turn poor compliers into better compliers^b
5. Raise dose of medicine or modify dosage regimen within the limits of the protocol or modify the protocol (or initiate a new clinical trial)
6. Involve the patient’s family in the trial to a greater degree
7. Improve investigator and staff relationships with the patient
^a Each approach initially involves the determination of the reason(s) why a patient or group of patients are not responders and addressing that particular reason. Patients or staff may be interviewed to elicit possible reasons for lack of adequate patient response.
^b See Chapter 15.
HOW TO TURN NONRESPONDERS INTO RESPONDERS
Most investigators attempt to determine and understand the reasons why patients improve. This allows the investigator to continue effective therapy in that patient and possibly to try the same techniques in others. One of the most common reasons for therapeutic improvement in some patients but not in others relates to a medicine’s blood level. It is often a useful exercise to attempt to correlate clinical efficacy with the blood level of a therapeutic medicine. If a strong correlation is achieved, then it is possible that a range of medicine levels in blood or plasma may be specified within which the therapeutic response should occur. This would allow physicians to dose patients to a given blood level in seeking a therapeutic effect, and it should prevent underdosage of patients, which can impart a falsely negative label to the medicine’s therapeutic potential. Another advantage of this approach is that toxic effects of a medicine should be minimized, since physicians would not dose patients to blood levels above the therapeutic range. A number of mechanisms that may turn nonresponders into responders are listed in Table 98.3.
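Before relying on a correlation between blood level and response, it is worth checking that it persists within dose ranges, which is the point of the hypothetical data in A Practical Example. The Python sketch below recomputes the pooled averages from the four dose-by-response subgroups given there. The data structure and function name are illustrative assumptions; the subgroup sizes and means are those of the hypothetical example.

```python
# Subgroups from the hypothetical example:
# (dose range, response) -> (number of patients, mean blood level in units/ml)
subgroups = {
    ("80-150 mg/day", "responder"): (3, 40),
    ("80-150 mg/day", "nonresponder"): (18, 30),
    ("200-400 mg/day", "responder"): (18, 110),
    ("200-400 mg/day", "nonresponder"): (3, 100),
}


def pooled_mean(groups, position, value):
    """Patient-weighted mean blood level over the subgroups whose key matches
    `value` at `position` (0 = dose range, 1 = response)."""
    selected = [(n, mean) for key, (n, mean) in groups.items()
                if key[position] == value]
    total_n = sum(n for n, _ in selected)
    return sum(n * mean for n, mean in selected) / total_n


for response in ("responder", "nonresponder"):
    print(response, round(pooled_mean(subgroups, 1, response)))
for dose in ("80-150 mg/day", "200-400 mg/day"):
    print(dose, round(pooled_mean(subgroups, 0, dose)))
```

The pooled means reproduce the 100 versus 40 figures by response and the 31 versus 109 figures by dose range, while within each dose range responders and nonresponders differ by only 10 units. Most of the pooled difference therefore reflects the concentration of responders in the higher dose range.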
CALCULATING PATIENT SURVIVAL CURVES
Several authors (Weiss et al., 1983; Oye and Shapiro, 1984) make a strong case that patient survival in cancer trials should not be compared between groups of patients defined as “responders” or “nonresponders” based on tumor shrinkage after a trial is completed. These articles present a variety of reasons to support the conclusion that comparison groups should be created prior to the trial, since responders may primarily include patients with the most favorable prognosis.
1. Many pretreatment factors influence patient survival (e.g., extent of disease, functional status, histology of the cancer), and these would not be expected to be evenly distributed in patients classified as responders and nonresponders.
2. Individual variation occurs and may not yield equivalent groups when selection occurs based on patient response.
3. A preponderance of the most severely ill patients would be expected to be in the nonresponder group, and thus this group would have the poorest prognosis for survival.
4. Radiation, surgery, or additional chemotherapy would be expected to be more prevalent in nonresponders and may shorten their survival.
Data from a study by Payne et al. (1981) comparing survival of responders and nonresponders to chemotherapy (Fig. 98.2) apparently demonstrated improved survival of responders to chemotherapy (left panel), and that is what the authors concluded (Payne et al., 1981). A concurrent group of untreated patients was included in this clinical trial, and Oye and Shapiro (1984) compared the survival of all treated patients with all untreated patients (right panel of Fig. 98.2). It is clear that the supposed advantage of treatment disappears when this more appropriate comparison is made. This same point was illustrated by Tannock and Murphy (1983) with data from a different trial.
FIG. 98.2 Survival rates of cancer patients in a clinical trial (Payne et al., 1981) evaluating 5-fluorouracil. Data in the left panel compare responders and nonresponders. Data in the right panel compare treated and untreated patients. (Reprinted by permission of the Journal of the American Medical Association from Oye and Shapiro, 1984. Copyright 1984 by the American Medical Association.)
Left panel: responders (N = 10) versus nonresponders (N = 20). Right panel: treated patients (N = 30) versus untreated patients (N = 19). Horizontal axis: months.
CHAPTER 99
Benefit-to-Risk Assessments and Comparisons
Assessing Benefits at Different Levels
  Types of Benefits
  Certainty of Benefits
  Levels of Benefits
  Measuring Benefits
Risks
  Definition
  Types of Risks
  Risks to Populations of Patients
  Baseline Risk and Excess Risk
  Risk Factors
  Therapeutic Index
  Two Essential Questions to Address
  Offsetting Risks with a Single Medicine
  Risk Perception
Assessing Risks in Six Steps
Determining Benefit-to-Risk Ratios
  Definition
  Describing the Benefit-to-Risk Concept
  Use of Benefit-to-Risk Ratios
  High-Risk Groups
  Establishing the Ratio for Different Groups of Patients
  How Benefit-to-Risk Ratios Change Over Time
Comparing Benefit-to-Risk Ratios for Different Medicines
Expressing Measurements Relating to Benefit and Risk
Conclusions
It would be desirable if the assessment of benefits and risks could be readily quantitated and expressed as a single overall number to be compared between medicines. This is unfortunately impossible today. To achieve this goal both benefits and risks would have to be measured in the same units. It is currently necessary to assess independently numerous types of both benefits and risks for a single medicine, each of which may be expressed differently. Then, it is necessary to weigh these factors using a global introspection technique in a manner that depends on the use to which the data are being applied. Multiple perspectives may be considered, including those of the patient, physician, and society. Benefits that are often imperfectly understood must be compared to the universe of risks and hazards, both known and unknown, to create a relative concept of benefit to risk for a single medicine. This theoretical concept for a single medicine is usually complex, and comparisons with other medicines usually raise additional complexities.
The comparison of benefit-to-risk considerations for two or more medicines in actual medical or economic situations is often much simpler and more straightforward than the discussion above might suggest. This is because most factors that affect benefits or risks are either not major considerations in specific situations or are so similar between alternative therapies that they do not greatly impact the overall benefit-to-risk concept or decisions based on this concept. A single factor may represent the total basis for choosing a medicine for a particular patient. For example, in choosing among similar medicines for a patient with renal failure, it is possible that only one medicine is excreted by the hepato-biliary route and would not be expected to lead to toxicity for a specific patient. Benefit-to-risk considerations would thus favor choosing that medicine. Another example would be a patient whose bacterial infection was sensitive to four antibiotics, and the patient was allergic to three of them. Benefit-to-risk considerations suggest choosing the fourth medicine, unless other extremely important factors must also be considered that could override the potential risks.
ASSESSING BENEFITS AT DIFFERENT LEVELS
Types of Benefits
The term benefits includes numerous concepts beyond that of a medicine’s efficacy. Efficacy relates to how well a treatment achieves its objectives, in a comparative way (e.g., medicine A improves symptoms 40% better than medicine B) or in a noncomparative way (medicine A improves patients’ disease measures by 25%). Benefits also include quality of life considerations at the individual patient level or at the collective patient level of various groups [e.g., patients in a Health Maintenance Organization (HMO)]. For a society at the city, province, country, or world level, benefits are often considered in terms of improved public health and improved utilization of resources (e.g., efficient allocation of money to the health care of a population).
Certainty of Benefits
Benefits may be either predictable (almost definite), probable, possible, or unlikely. Each of these four types is described below, and arbitrary probabilities in terms of percent likelihood are assigned to each category.
Predictable benefits relate to clinical effectiveness of a medicine that is almost certain to occur in a patient or group of patients (e.g., for preventing disease). A greater than 95% likelihood of the predicted response is expected.
Probable benefits refer to a situation in which a medicine has not yet been given to a specific patient or has been given, but it is too early in the treatment period for a positive effect to be observed. Based on previous clinical data and experiences, the likelihood of a benefit is judged to be excellent or highly probable (i.e., greater than 50% likely). Giving a diuretic to a volume-overloaded patient who previously responded to diuretics is an example in which the probability of clinical benefit is extremely high.
Possible benefits are those in which the degree of uncertainty of patient response is greater than in the two previous situations, and the chance of benefits is less than 50%. One example is of a new medicine tested in animals, shown to be safe in humans, but not yet demonstrated to be efficacious in humans. Even though the new medicine might eventually save human lives, the benefit-to-risk ratio often requires a cautious approach in early testing, because the benefits are only possible or potential. Because of the many factors that influence a patient’s response, almost all medicines can only be said to offer possible benefits in most clinical situations. The situations in which a predictable (i.e., almost definite) benefit is expected are far fewer than those in which benefits are judged probable or possible.
Benefits may be judged as unlikely (i.e., less than 5%) in certain situations when clinical judgment dictates that use of the treatment is still worthwhile. This sometimes occurs in life-threatening situations or with severely ill patients, when alternative therapy is either not available or has been used without success. The risk of the treatment may range from negligible to substantial, and the benefit-to-risk ratio may be unsatisfactory. Nonetheless, the “long-shot” treatment in such situations may be entirely justified on both a medical and an ethical basis.
Levels of Benefits
At each of the levels discussed in this section there are numerous perspectives that are often applied. As one moves from the level of individual patients to the level of society as a whole, the perspective and role of the individual patient becomes progressively less important, while the importance of health policy planners and administrators becomes greater. The impact of physicians and other medical professionals is greater at the individual and group level than at the society level. Specific perspectives are those of (1) the patient, (2) the medical professional, (3) the health policy planner, and (4) various types of administrators.
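Returning briefly to the likelihood bands assigned under Certainty of Benefits, they can be written as a small lookup, sketched below in Python. The function name is hypothetical, and the handling of the boundaries is an assumption: the text gives only "greater than 95%", "greater than 50%", "less than 50%", and "less than 5%", so a likelihood of exactly 50% is placed in the possible category here and exactly 5% is also treated as possible.

```python
def classify_benefit(likelihood):
    """Map an estimated probability of benefit (0 to 1) to one of the four
    certainty categories. Boundary handling is an assumption; see lead-in."""
    if not 0.0 <= likelihood <= 1.0:
        raise ValueError("likelihood must be between 0 and 1")
    if likelihood > 0.95:
        return "predictable"
    if likelihood > 0.50:
        return "probable"
    if likelihood >= 0.05:
        return "possible"
    return "unlikely"


for p in (0.99, 0.80, 0.30, 0.02):
    print(f"{p:.0%}: {classify_benefit(p)}")
```

The percentages are described in the text as arbitrary, so a lookup of this kind is only a convenient way of applying the categories consistently, not a substitute for clinical judgment.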
Level One: An Individual Patient
At the level of an individual patient, defined as level one, the term benefit refers to improvements in both efficacy and quality of life. Although it is theoretically possible to describe benefits as a decrease in adverse reactions of one treatment versus another, this topic is usually discussed in terms of risks. Improvement in symptoms relating to a disease, however, is described as a benefit of treatment. One special caveat applies to medicines developed primarily to decrease adverse reactions caused by other medicines. One example is a new medicine designed to decrease emesis caused by cis-platinum, and another example is a medicine that is used to decrease anticholinergic effects of some antiparkinsonian agents. In these or other examples, the degree to which the test medicine diminishes or eliminates the target adverse reactions is a measure of its efficacy (benefits).
Level Two: Groups of Patients
The same types of benefits described above also apply at the level of groups or subsets of patients with a disease (level two). The way in which an HMO, hospital, or other medical organization allocates resources to different treatments depends on how aggregate data of these treatments compare in benefit-to-risk terms, as well as on a consideration of costs and utilization of the group’s resources (e.g., health care professional time required, equipment needs).
Level Three: All Patients
The level of society as a whole (level three) considers benefits in various public health terms for all patients. This includes judgment of which medicines and therapies are best for present and future patients. A major difference between this level and the other two is that at this level benefit-to-risk ratios, costs of treatment, and utilization of overall health care resources are usually compared across many, if not all, diseases, rather than within a single disease. This means that a society must decide how much of its available money and resources to place on research and treatment for each specific disease and problem, and not just among the different treatments for a single disease. This allocation of money is largely determined by which treatments are available for use, and their benefits and risks, plus pressures placed on decision makers by various groups. It is at this third level that the greatest impact of nonobjective factors occurs in decision making. These factors and pressures are often referred to as politics. The choices that a society must make in allocating its health care money (or funds within other areas) are sometimes sarcastically referred to as determining whose ox gets gored.
Measuring Benefits
To determine and measure clinical benefits objectively, the most common and accepted approach is to conduct clinical trials.
Two other methods used to evaluate clinical benefits are to conduct Phase IV observational studies and meta-analyses. At the society level, government agencies, large trade associations, foundations, institutes, and other organizations conduct many surveys in which they collect data that are relevant for these assessments. Social, health, and economic data and statistics obtained are most often utilized at the third level. Unintended medical or nonmedical benefits may occur with medicines or other treatments. For example, if aspirin is being compared with nonsteroidal antiinflammatory medicines, the additional effect of aspirin in decreasing the incidence of myocardial infarctions is an unintended benefit that should be considered. If different modalities used to treat cancer (e.g., radiotherapy, surgery) are being compared and greater use of one modality would also help alleviate an unemployment problem, then that impact may be considered as an unintended benefit for society.
RISKS
The public is being barraged with information about risks, often without the proper perspective to understand these important concepts (e.g., news coverage by certain media of risks of immunizing children). Although society is increasingly informed that there are no totally safe medicines or medical treatments, many individuals appear to be intolerant of and unwilling to accept risk. Many of these (and other) people are requesting more information about risks of treatment and a greater role in choosing their own treatment. This is a positive event because each treatment involves some degree of risk, and it is usually important for physicians and patients to compare and discuss the magnitude and possible severity of a treatment’s risk along with its anticipated benefits. There is an enormous difference between risks that people voluntarily accept (e.g., smoking cigarettes, abusing drugs, sunbathing, drinking alcohol) and risks that are involuntary (e.g., cosmic rays, depletion of ozone). There is increasing discussion in the scientific literature about assessment of both involuntary and voluntary risks (Dinman, 1980; Spilker and Cuatrecasas, 1989), but this topic cannot be explored in this book.
Definition
In medicine the term risk usually refers to the probability that a given patient will experience a deleterious reaction to the specific treatment under study or, more generally, that something bad will happen. It is the “bad” aspect that differentiates risk from uncertainty, since it is usually uncertain whether a pregnant woman will have a boy or girl, but this is not bad.
The deleterious reaction of which there is a risk could be an adverse reaction, physical sign or symptom, laboratory or physiological abnormality (e.g., abnormal electrocardiogram, pulmonary function test), or other problem. Risk may relate to (1) a single patient, (2) a group of patients (such as those in a clinical trial), (3) the entire population of patients who have been given a medicine, or (4) all patients who may be given a medicine. The risk of an event may be described in terms of the probability of a future problem occurring if the patient takes a medicine, or it may be the probability of a future problem resulting from the fact that the patient already took the medicine. In both situations the event is a potential one, but in the first case the probability of the event’s occurrence and its severity will be used to reach a decision as to whether the medicine should be taken. The two spectra, therefore, along which a risk is judged are the probability of the undesirable event occurring, and the severity of the event or its occurrence.
Types of Risk
There are several different types of risks, and not all can be accurately measured. For example, an idiosyncratic reaction such as an allergic or anaphylactic reaction cannot be predicted with accuracy for a patient with limited prior exposure to the medicine, and incidence figures for these events are often difficult to acquire. Some medicine risks are dose related, and different incidence figures must be obtained for individual doses or dose ranges. Even a patient’s failing to improve on an experimental treatment could be viewed as a type of risk, since an alternative treatment might have provided the patient with a greater likelihood of achieving a therapeutic benefit. A different type of risk is incurred by patients who refuse treatment.
Risks to Populations of Patients
During the course of a medicine’s development, progressively more information is gathered to help determine its risk assessment profile. At the time of a medicine’s initial marketing, however, there is usually little estimation available on population risks. This is a result of the small number of patients evaluated at the time when a new medicine is approved (see Chapter 119). This information is either gathered or becomes available during Phase IV when the medicine is given to larger numbers of patients.
Baseline Risk and Excess Risk
There is almost always a baseline risk of a specific adverse event, even if a patient is not treated with a medicine. This is estimated by determining the incidence of the event in the general population. The additional risk related to a medicine is called the excess risk. The excess risk may be estimated from Phase IV cohort studies, large field studies, or by evaluating large multipurpose automated data bases with record linkage. This last method is rapidly gaining favor as the method of choice. However, the cardinal rule of
validating patient diagnoses must be followed, regardless of the specific method adopted.
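The relationship between baseline risk and excess risk described above can be illustrated with a small calculation. The following is only a sketch: the function names and the patient counts are hypothetical and are not taken from any study, and it assumes that the baseline incidence and the incidence in treated patients were measured in comparable populations with validated diagnoses.

```python
def incidence(events: int, patients: int) -> float:
    """Proportion of patients in a group who experienced the event."""
    if patients <= 0:
        raise ValueError("patients must be positive")
    return events / patients


def excess_risk(treated_incidence: float, baseline_incidence: float) -> float:
    """Additional risk attributed to the medicine: treated minus baseline."""
    return treated_incidence - baseline_incidence


# Hypothetical counts, chosen only for illustration.
baseline = incidence(events=20, patients=10_000)  # general population
treated = incidence(events=50, patients=10_000)   # patients given the medicine
excess = excess_risk(treated, baseline)

print(f"baseline risk: {baseline:.4f}")
print(f"risk in treated patients: {treated:.4f}")
print(f"excess risk: {excess:.4f}")
```

With these invented counts, 30 additional events per 10,000 treated patients would be attributed to the medicine, over and above the 20 per 10,000 expected without treatment.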
Risk Factors
Risk factors usually relate to specific characteristics of populations, which are by inference sometimes reflected to patients. Risk factors relate to the increase of the patient’s probability of experiencing an adverse reaction, disease, or other medical problem as a result of medicine treatment. Risk factors are also used to describe some nonmedicine treatment situations (e.g., risk factors exist and may be identified for having a myocardial infarction or developing hypertension). There is no acceptable manner of accurately calculating the magnitude and/or interaction of two or more risk factors for different diseases. Clinicians are often in the dilemma of trying to use overall population risk data and relating this to a specific patient. Most diseases or syndromes have risk factors associated with them. These factors may be (1) well established, (2) incompletely established, (3) unknown, and/or (4) controversial. Standard medical textbooks and references enumerate and describe many of these factors. The importance of particular risk factors is described in different ways. They may be presented on a high-to-low risk scale or broadly classified as major or minor (e.g., for having a myocardial infarction). Some risk factors are well known and well researched, such as those for hypertension or myocardial infarction, whereas others are speculative or are unknown. Deriving the risk factor in benefit-to-risk and other analyses may be straightforward or extremely complex. Major factors to consider in identifying and assessing risk are (1) the probability of the risk occurring, (2) the magnitude or clinical severity of the event if it occurs, (3) the ability to reverse the problem if it occurs, (4) whether any residual effects will remain, and (5) potential effects in the specific patient being considered. 
Risks associated with medicines may sometimes be described in absolute terms (e.g., X grams of medicine A will always cause a fatal reaction, Y tablets of medicine B will always make the patient unconscious). Almost all risks for medicines, however, are relative. These may be defined as the risk for patients in a specific treatment group (or those exposed) divided by the baseline risk (or the risk in those unexposed). A relative risk of approximately 1.8 to 2.0 and above is considered clinically significant. This means a twofold greater risk exists. A relative risk of 2 to 4 is usually described as moderate and risks above that as strong. Examples of relative risks include the association of maternal use of diethylstilbestrol and vaginal cancer
in the offspring (relative risk about 250), smoking and lung cancer (about 10), and reserpine and breast cancer (2 or less) (Stolley, 1990). Each treatment group may have different risk factors that usually increase their chance of having a specific problem, i.e., increase their risk of the event(s) in question. Risk factors determine the degree of risk. Risk factors often include evaluation of age, sex, weight, genetic makeup, physiological function, and numerous factors related to the disease, plus characteristics of past and present treatment (e.g., total number of courses of therapy, total dose). Risks for specific patients of adverse events differ depending upon the particular subgroups to which they belong and their particular risk factors. Basic questions to pose about risks are listed in Table 99.1.
Therapeutic Index
A single number that partially expresses the overall risk of a medicine is the therapeutic index. This is also known as the therapeutic ratio. In general terms this number is an approximation of the relative safety of medicines. A large number denotes greater safety and vice versa. However, the therapeutic index may vary widely for individual patients, because of unknown or known (e.g., genetics, risk factors, disease severity) factors. The value does not provide sufficient information about either risks or benefits. For example, a large therapeutic index for a potentially highly toxic medicine (if misused or overdosed) may create more anxiety in physicians and patients than a small therapeutic ratio for a relatively safe medicine. This ratio should be as large as possible for all new medicines. In reality, the lowest ratio acceptable for new medicines depends on the seriousness of the disease and also on the ratio for existing treatments for the same indication. In the therapeutic area of cancer, the ratio may even be less than unity for some treatments. This means that toxicity is virtually always observed at the same or at lower doses than those necessary to produce benefits. For cardiac glycosides the ratio is about 2, and for a relatively safe medicine (e.g., aspirin) the ratio is quite high for most patients.
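The relative risk defined earlier in this section (the risk in treated or exposed patients divided by the baseline risk) may be computed and labeled as in the sketch below. The function names are hypothetical. The cutoffs follow the ranges quoted in the text (about 1.8 to 2.0 and above as clinically significant, 2 to 4 as moderate, above 4 as strong); treating 1.8 as a hard lower boundary is an assumption made only to keep the example simple.

```python
def relative_risk(risk_exposed: float, risk_baseline: float) -> float:
    """Risk in the exposed group divided by the risk in the unexposed group."""
    if risk_baseline <= 0:
        raise ValueError("baseline risk must be greater than zero")
    return risk_exposed / risk_baseline


def describe_relative_risk(rr: float) -> str:
    """Label a relative risk using the approximate ranges quoted in the text."""
    if rr < 1.8:  # assumed hard cutoff; the text says "approximately 1.8 to 2.0"
        return "below the clinically significant range"
    if rr < 2.0:
        return "borderline clinically significant"
    if rr <= 4.0:
        return "moderate"
    return "strong"


# Hypothetical risks: 0.005 in treated patients versus 0.002 at baseline.
rr = relative_risk(0.005, 0.002)
print(f"relative risk {rr:.1f}: {describe_relative_risk(rr)}")

# Relative risks quoted in the text (Stolley, 1990).
for label, value in [("diethylstilbestrol and vaginal cancer", 250.0),
                     ("smoking and lung cancer", 10.0)]:
    print(f"{label}: {describe_relative_risk(value)}")
```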
TABLE 99.1 Questions to consider about risk
1. Is the risk well established?
2. Is the risk’s outcome immediate or delayed?
3. Is exposure to the risk voluntary?
4. Is the risk related to the dose of a medicine?
5. Are specific risk factors known?
6. Is the risk avoidable?
7. Are the outcomes fixed or variable?
8. How severe is the outcome?
9. Is the risk associated with strong emotions?
Two Essential Questions to Address
In considering risks it is often important to address and differentiate between the following two questions:
1. What is the likelihood that a specific adverse event experienced by a patient was related to a specific medicine?
2. What is the likelihood that a particular patient will experience a specific adverse reaction if treated with a specific medicine?
The first question is one of cause and effect attribution and looks to the past. It may be discussed at all three levels described under benefits, while the second question looks to the future and is only appropriate to consider at the first level (i.e., the individual patient). One may compare both questions only at the first level. The major issue in question one is to establish whether or not the event was caused by the medicine. This is discussed at length in Chapter 73. The answer to the second question depends on previous data gathered in that patient in particular, and on the medicine in general. Other questions to consider are listed in Table 99.1.
Offsetting Risks with a Single Medicine
One seldom simply weighs benefits and risks to arrive at an overall assessment. Most risks that accompany a medical treatment do not simply increase or decrease with its use. Rather, some may be increased at the same time others are decreased, or those of one medical therapy may be immediate or short-term, while others are longer-term (i.e., do not occur for a long time). Two types of offsetting risks are described below, based on their time of occurrence.
Type 1. Risk A is decreased and risk B is increased at the same time. This is observed with many medicines, particularly when comparing risks of the disease itself (i.e., risk A), which are presumably decreasing while the risks associated with therapy (i.e., risk B) are increasing. To evaluate how the two balance each other, one must compare the overall benefit-to-risk ratio of the treatment versus other treatments or no treatment. In determining this aspect it is usually desirable to evaluate quality of life issues, as well as certain practical issues (e.g., costs). Examples would include the risks of not treating certain fungal diseases versus risks of the medicine used (e.g., amphotericin); or, the risk of not treating a urinary tract infection versus risks of the antibiotic used.
Type 2. The treatment causes risk A to decrease in the present and risk B to increase at a later time in the patient’s life. For example, assume that the risk of death from cardiovascular disease is decreased with therapy and patients are shown to live longer; however, a number of patients develop a specific type of cancer at a later stage in their life if they are given the treatment.
Risk Perception
A substantial amount of psychological research has demonstrated that the ways in which risk is perceived depend on the way in which problems are presented. Presenting the same problem requiring a decision in different ways (i.e., emphasizing either negative or positive attributes of the outcome) elicits different responses (Tversky and Kahneman, 1981; Slovic et al., 1982). People usually make decisions to avoid present risks and choose a sure thing over a probability. McNeil et al. (1982) reported that preferences of both lay people and physicians depend on whether probabilities are expressed in terms of death rates or survival rates. Patient perceptions of risks of prescription medicines are generally low (Slovic et al., 1989), which should influence their compliance plus attitudes about accepting adverse reactions. These data were obtained in Sweden and it is uncertain if the same results would be obtained in most other countries. The public is often confused because of media hype (either playing up small risks, or ignoring relatively large risks). As a result people often overestimate the likelihood of rare events (e.g., death by tornadoes, vaccines, or lightning) and underestimate the likelihood of frequent events (e.g., death from heart attacks). Another issue is that the public is often given information about suspected risks when no one knows the true magnitude, and known risks when the magnitude is known. Many adverse reactions exist of both types (i.e., suspected and known risks). It is difficult to combine and often to compare data on voluntary risks (i.e., those risks an individual person accepts and brings on him- or herself), such as smoking cigarettes, driving a family car, or driving a racing car, and involuntary risks (i.e., those beyond a single person’s control) such as living in an area with nuclear power plants, cosmic rays, or disappearance of ozone.
Assessing Risks in Six Steps
The process of assessing risks of medicines may be divided into six separate steps, as follows:
1. Identify known risks and also potential risks.
2. Quantitatively assess known and potential risks in terms of their probability of occurrence (i.e., along the spectrum from 0 to 1).
3. Determine the degree of exposure of patients to each relevant risk.
4. Assess the known and potential risks in terms of their severity (i.e., along a spectrum).
5. Determine which risks are additive, synergistic, mutually exclusive, or otherwise interact.
6. Derive an overall assessment of both individual and collective risks for one or more patients, considering data from the first four steps and information from the fifth.
This assessment may include determining whether for a specific treatment (1) the severity of risks is low and probability of occurrence is also low, (2) the severity of risks is low, but the probability of occurrence is high, (3) the severity of risks is substantial, but the probability of occurrence is low, or (4) the severity of risks is substantial and the probability of occurrence is also high.
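The four combinations of severity and probability listed above can be sketched as a simple classification. Everything in this example other than the four categories themselves is an assumption: the `Risk` record, the 0-to-1 severity scale, the two cutoff values, and the example risks are all hypothetical, and real assessments would rarely reduce to two fixed thresholds.

```python
from dataclasses import dataclass


@dataclass
class Risk:
    name: str
    probability: float  # probability of occurrence, from 0 to 1
    severity: float     # hypothetical severity score, from 0 (trivial) to 1 (fatal)


def classify(risk: Risk, probability_cutoff: float = 0.1,
             severity_cutoff: float = 0.5) -> str:
    """Place a risk in one of the four severity/probability categories."""
    if not 0.0 <= risk.probability <= 1.0:
        raise ValueError("probability must lie between 0 and 1")
    severity = "substantial" if risk.severity >= severity_cutoff else "low"
    probability = "high" if risk.probability >= probability_cutoff else "low"
    return f"severity {severity}, probability {probability}"


# Hypothetical risks for a single treatment.
risks = [
    Risk("mild rash", probability=0.20, severity=0.10),
    Risk("anaphylaxis", probability=0.001, severity=0.95),
]
for r in risks:
    print(f"{r.name}: {classify(r)}")
```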
DETERMINING BENEFIT-TO-RISK RATIOS
Definition
The benefit-to-risk ratio for a particular patient is really the answer to the question: what is the value of giving a specific treatment to a specific patient at a specific time under a defined set of conditions? A comparable definition of benefit-to-risk ratio exists for a group of patients or for all patients with a specific disease.
Describing the Benefit-to-Risk Concept
The benefit-to-risk ratio relates the potential or actual benefit that a patient derives from a medicine to the risks incurred by the patient through using the medicine. A medicine that is life saving provides an enormous benefit. If the medicine is relatively safe and thus has a low risk, the patient will have a high benefit-to-risk ratio. If, on the other hand, the risk of severe adverse reactions is also high, then the ratio may be low. The risks may be so great that even the potential life-saving qualities of the medicine may make its use unacceptable (e.g., a terminally ill patient who does not want a medicine that may prolong life a short period but with major adverse reactions). The benefit-to-risk ratio is not expressed as a finite number but in qualitative terms such as high, low, equal, very low, and so forth. The benefit-to-risk concept includes consideration of multiple types of risks. There is the type of risk of adverse reactions described above. The likelihood of one of these deleterious events occurring may often be quantitated. The data base for this is derived from both pre- and postmarketing studies. The likelihood of the adverse reaction’s occurrence will assist a decision as to whether possible clinical benefits appear to be sufficiently greater than risks involved in undergoing treatment. This judgment is based on both the severity of the potential risks that may be anticipated and their likelihood of occurring. Among those risks that may be anticipated, many (if not most) will be minor or moderate in importance. If potentially serious or life-threatening risks are possible, then consideration must be given to the expected frequency of their occurrence. It is clear that the more severe the potential complication is, the less likely that it will be acceptable, unless the potential benefit derived by the patient is expected to be even more substantial.
Use of Benefit-to-Risk Ratios
The concept of the benefit-to-risk ratio is often used by physicians as part of their decision-making process to determine the precise therapy to use in a particular situation. A medicine with potential for both high benefits and high risks is often less desirable than a medicine with moderate benefits and low risks because the benefit-to-risk ratio is greater in the latter case. Clinical judgment must be used when choosing among multiple medicines or nonmedicines, all of which involve risks and benefits that cannot easily be compared. A common medical practice is to initiate therapy with medicines or nonmedicine treatments that have the least risk and to switch to other medicines or treatments if the initial ones are ineffective, in a manner that progressively increases the risks to the patient. This approach has the advantage that a medicine with a low risk may have greater efficacy than anticipated in the particular patient being treated. High-Risk Groups
Some patients have elevated risks because of one or more characteristics that place them in a high-risk group. Common factors that place many patients in a high-risk group are compromised renal, hepatic, or immunological function; age (e.g., infants, children, elderly); pregnancy; and other factors described in Chapter 87. High-risk groups for a specific medicine may be identified on the basis of medical history or laboratory findings. The identification of specific risk factors is usually made during clinical investigations, either through a systematic evaluation or serendipitously. Some of the risk factors identified will probably be related to the characteristics of the patient’s disease, whereas others will depend on the biochemical, phar-
macological, and other activities of the medicine. Risk factors may also be based on a patient’s previous or current clinical treatment as well as most of the patient factors listed in Table 83.25. Establishing the Ratio for Different Groups of Patients
Benefits must be determined in the context of a patient’s clinical condition or disease and the availability of alternate forms of treatment. Thus, the benefit-to-risk ratio is usually established for each patient based on his or her particular medical history and present medical situation. Alternatively, benefit may be established for a given group of patients based on the total range and characteristics of their situation. This evaluation is usually in the province of epidemiology. Finally, the benefit-to-risk ratio may be established at the level of an entire society and/or at all levels of the health care system (e.g., formularies, practicing physicians). This is usually in the province of health policy planners, legislators, and regulatory agencies.
How Benefit-to-Risk Ratios Change Over Time
As new information about a medicine emerges during a patient’s treatment, in terms of increased risks or anticipated clinical improvement, it may be necessary to reassess the benefit-to-risk ratio. If a particularly severe and serious adverse reaction or safety problem occurs, then the benefit-to-risk ratio may be questioned for all patients in a clinical trial and not solely for the patient receiving the medicine. In the clinical trial environment, revising the informed consent for all patients or even discontinuing the trial may be necessary. In the environment of clinical practice, the physician must disclose substantial new information on risks to the patient, especially in our litigious society. In some cases it may be possible to assess the best treatment for a particular patient through using an N of one (n = 1) clinical trial design (see Chapter 38). A benefit-to-risk ratio often changes over time for a single patient, all patients, or a society. For a single patient, prior to ever receiving a medicine the ratio might be positive and the physician may decide to give the medicine to the patient. Assume that the patient experiences a mild adverse reaction, but it is not as bad as the response the patient experienced with other related medicines. The benefit-to-risk ratio falls, but remains positive (i.e., in favor of using the medicine). Now, the adverse reactions become worse, and the benefit-to-risk ratio becomes negative. The patient is therefore switched to another treatment. Assume another patient takes the same medicine and a skin rash
develops. If the medicine is intended for a serious disease and no other medicines of equal efficacy are available, the physicians may decide to (1) continue treatment and watch the patient carefully, (2) lower the dose, (3) stop the medicine and begin desensitization procedures, (4) switch to another medicine, (5) treat the problem and continue the medicine, or (6) follow another treatment plan. The benefit-to-risk ratio usually depends on the severity of an individual patient’s disease and its prognosis. The more severe a disease and the more negative its prognosis, the greater the risks that most patients and physicians are willing to take. When a much safer new medicine is introduced the benefit-to-risk ratio changes for all patients with the disease and in some cases most patients will receive the new medicine. For an entire society the information on benefit-to-risk ratios of many medicines and treatments for many diseases is often used to help make decisions on allocating limited health care resources. Improved benefits for patients through discovery of new treatments that have greater benefit-to-risk ratios encourage more resources to be allocated to those treatments.
COMPARING BENEFIT-TO-RISK RATIOS FOR DIFFERENT MEDICINES
As indicated in the introduction, benefit-to-risk ratio comparisons for two or more medicines may focus on a single factor or multiple factors. The comparison may be simple or complex, easy or difficult to conduct, or may be impossible because of lack of sufficient information. The results of a specific comparison may or may not have importance in medical practice. If there is to be an impact on medical practice it is essential to express this information in a way that practicing physicians can easily understand.
Expressing Measurements Relating to Benefit and Risk
Laupacis et al. (1988) reviewed four methods of measuring the consequences of treatment and recommended one (i.e., number needed to be treated) as providing the clearest information for practicing physicians. The four measures are described below.
Relative Risk Reduction
Relative risk reduction is an expression of the amount of reduction of adverse events. This term is presented as a proportion of the control rates (i.e., control rate minus treatment group rate, divided by the control rate). The major disadvantage of this approach is that it does not reflect the magnitude of the risk without therapy, and this may be misleading or, at best, incomplete in the information provided. Relative risk reduction is often expressed as a percent (e.g., medicine X reduces the risk of stroke by 32%).
Odds Ratio
The odds ratio expresses the relative likelihood of an outcome. It is frequently used in meta-analyses, but for expressing risks it has the same disadvantages as relative risk reduction. An odds ratio of 6, 11, or 15 means that the odds of a target outcome (e.g., adverse reaction) are 6, 11, or 15 times greater in one group than in the comparison group.
Absolute Risk Reduction
Absolute risk reduction is the difference in adverse event rates for two groups, usually control and treatment. The number is usually a decimal and does not make inherent sense to practicing physicians as a basis for making a choice among therapies (e.g., medicine X reduces the risk of stroke by 0.14 compared with placebo).
Number Needed To Be Treated
Mathematically, the number of patients who must be treated to prevent one major adverse event is the reciprocal of the absolute risk reduction. This number has a readily understood meaning to physicians and has numerous statistical advantages over the other expressions described above (Laupacis et al., 1988). For example, for every seven patients treated with medicine X one fewer patient will have a stroke.
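The four measures reviewed by Laupacis et al. (1988) can be compared on one set of numbers. The sketch below is illustrative only. The function names are hypothetical, and the control and treatment event rates are invented so that they reproduce the figures quoted in the text (a 32% relative risk reduction, an absolute risk reduction of 0.14, and about seven patients needed to be treated); they are not taken from a real trial. The odds ratio is computed in the conventional way, as the odds of the event in the treatment group divided by the odds in the control group.

```python
def relative_risk_reduction(control_rate: float, treatment_rate: float) -> float:
    """(control rate - treatment rate) / control rate."""
    return (control_rate - treatment_rate) / control_rate


def absolute_risk_reduction(control_rate: float, treatment_rate: float) -> float:
    """Difference in event rates between control and treatment groups."""
    return control_rate - treatment_rate


def number_needed_to_treat(control_rate: float, treatment_rate: float) -> float:
    """Reciprocal of the absolute risk reduction."""
    arr = absolute_risk_reduction(control_rate, treatment_rate)
    if arr == 0:
        raise ValueError("no difference between groups; NNT is undefined")
    return 1.0 / arr


def odds_ratio(control_rate: float, treatment_rate: float) -> float:
    """Odds of the event on treatment divided by odds of the event on control."""
    def odds(p: float) -> float:
        return p / (1.0 - p)
    return odds(treatment_rate) / odds(control_rate)


# Hypothetical stroke rates chosen to match the figures quoted in the text.
control, treatment = 0.4375, 0.2975

rrr = relative_risk_reduction(control, treatment)
arr = absolute_risk_reduction(control, treatment)
nnt = number_needed_to_treat(control, treatment)
oratio = odds_ratio(control, treatment)

print(f"relative risk reduction: {rrr:.0%}")
print(f"absolute risk reduction: {arr:.2f}")
print(f"number needed to be treated: {nnt:.1f}")
print(f"odds ratio (treatment vs. control): {oratio:.2f}")
```

The same 32% relative reduction would arise from much lower event rates (e.g., 0.04375 versus 0.02975), but the number needed to be treated would then be about ten times larger, which is the information the relative measure conceals.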
CONCLUSIONS
In reaching a decision on which treatment (1) a physician should use for a particular patient, (2) an HMO should keep on their formulary, or (3) a health policy advisor should recommend for their society, three questions should be considered.
1. Is the net benefit [i.e., sum of all benefits (including quality of life) minus the sum of all risks] greater than zero? If not, then giving no treatment may be the preferable choice.
2. Is the net benefit of the treatment in question greater than the net benefit of other alternative therapies (or no therapy)?
3. If both of the answers to these questions are “yes,” then is the net benefit worth the cost in terms of additional money, equipment, professional time, and administration associated with the treatment?
Additional discussions on benefit-to-risk ratios are presented in references by Pochin (1981), Lowrance (1988), and von Wartburg (1988).
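The three questions above form a sequence that can be written out explicitly. The sketch below is a deliberately simplified illustration and not a validated decision rule: it assumes that all benefits and risks have somehow been expressed on one common numerical scale, which the chapter notes is seldom possible in practice, and the function names, scores, and cost limit are all hypothetical.

```python
def net_benefit(benefits: list[float], risks: list[float]) -> float:
    """Sum of all benefits minus the sum of all risks, on an assumed common scale."""
    return sum(benefits) - sum(risks)


def recommend(candidate_net: float, alternative_nets: list[float],
              cost: float, acceptable_cost: float) -> str:
    """Apply the three questions in order and report where the treatment stands."""
    # Question 1: is the net benefit greater than zero?
    if candidate_net <= 0:
        return "no treatment may be preferable"
    # Question 2: is it greater than the net benefit of the alternatives?
    if any(alt >= candidate_net for alt in alternative_nets):
        return "an alternative has equal or greater net benefit"
    # Question 3: is the net benefit worth the associated cost?
    if cost > acceptable_cost:
        return "net benefit may not justify the cost"
    return "treatment is a reasonable choice"


# Hypothetical scores on an arbitrary common scale.
candidate = net_benefit(benefits=[5.0, 2.0], risks=[3.0])
print(recommend(candidate, alternative_nets=[1.0], cost=100.0, acceptable_cost=500.0))
```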
CHAPTER 100
Coordination and Integration of Statistical Analyses with Clinical Interpretations
Introduction
  Roles of Statisticians and Clinicians
  May a Single Professional Fill Both Roles?
  Integration in an Ideal Situation
Situations in Which Integration Is Required
  General and Specific Types of Examples
  Practical Approaches
Establishing and Maintaining Effective Relationships Between Statisticians and Clinicians
INTRODUCTION
Roles of Statisticians and Clinicians
There have been many comments throughout this book on the collaborative activities of a clinician and statistician in planning clinical trials and analyzing results. The analyzed results are interpreted by the clinician, who may request additional analyses to be performed. It is the author’s contention that the role of the statistician is not usually to interpret data clinically or extrapolate data, although statisticians may provide valuable advice to the clinician who has these roles. Clearly, the statistician does provide the analyses on which much or all of the interpretation(s) and extrapolation(s) are based. One of the differences in approach between statisticians and clinicians is that statisticians usually discuss the methods of treating large numbers of patients in a clinic, whereas physicians treat a single patient at a time. This difference is often reflected in different attitudes toward individual patients and groups of patients.
May a Single Professional Fill Both Roles?
Some individuals have been trained in the disciplines of both statistics and medicine. This does not affect the preceding comments, since it is possible for an individual to fulfill both roles adequately. When an individual is not adequately trained in both disciplines, a variety of problems may arise, involving both “invasion of the other’s turf” and a failure to plan clinical trials or interpret data adequately. An incorrect application of statistics in the trial design or data analyses is also possible.
Integration in an Ideal Situation
In an ideal situation, no formal integration process of statistical analyses with clinical interpretations should be necessary. All of the necessary statistical analyses should be completed prior to initiation of the clinical interpretation of a trial. The process of interpretation may be conducted immediately after the analyses are complete. The major point is that they are two separate processes, even if performed by one individual in a sequential manner. Various clinical hypotheses or questions may be posed, which may suggest further exploratory analyses or additional evaluations of the data. Nonetheless, if there is one set of accepted statistical analyses and they are generated prior to development of their clinical interpretation, the two processes should be able to be conducted without the need for formal integration of efforts beyond clinical input into the format and presentation of results.
SITUATIONS IN WHICH INTEGRATION IS REQUIRED
General and Specific Types of Examples
Sometimes the two processes require a large degree of integration, as illustrated by a few hypothetical situations. In the first example, only one type of standardized and validated statistical technique is suitable for analyzing the clinical trial data, but two or more different types of clinical interpretations are possible. Each interpretation can be supported to some degree by the statistical analysis. The second example is the (opposite) situation in which two or more statistical approaches to the data are possible but only one clinical interpretation is plausible regardless of which set of statistical analyses is used. The final example is one in which both multiple statistical analyses and multiple clinical interpretations are possible. Specific examples include:
1. A one- or two-tailed t-test may be used, but data are clinically interpreted as positive regardless of which test is used.
2. Data may be analyzed by last observation carried forward or by simply using actual data obtained. Data may be analyzed by using intent-to-treat or by defining an “efficacy population.” The clinical interpretation will be positive only when one (or two) of these analyses is used.
3. Data are only analyzed one way, but depending on whether a global clinical impression of improvement or a global clinical impression of disease severity is used, the outcome will be positive or negative.
4. Data are only analyzed one way. The clinical interpretation of the most important test narrowly misses statistical significance, but the second, third, and fourth tests in order of importance are positive. Is the trial positive?
5. Data are only analyzed one way, but can support two or more clinical hypotheses to explain the results.
Practical Approaches
In order to evaluate whether any of these situations exist, it is often necessary to conduct what the statistician and clinical interpreter believe to be the most reasonable analyses of the data and then to examine the statistical results from a clinical perspective. It should then become clear whether the analyses support more than one interpretation, hypothesis, or model. If the answer to this question is uncertain, then additional statistical analyses may be performed, and a subsequent evaluation made of whether an alternative clinical interpretation may be supported. It should be noted that the present discussion refers to an interpretation of data in regard to the primary and secondary objectives of the clinical trial. Various subgroup analyses may be useful and important for developing hypotheses to test in future studies, but
these are not part of the primary clinical interpretation of data described in this section. Once it has been established that multiple statistical analyses or clinical interpretations are possible, it becomes important to integrate the analyses and interpretations. If there is either a single clinical interpretation (and multiple statistical analyses) or a single statistical analysis (and multiple clinical interpretations), there is a straightforward means of integrating them. That procedure is to choose the best clinical or statistical possibility among the multiple possibilities and to marshal all arguments in support of this choice. Reference to the other possibilities should be made, but strong arguments in favor of the chosen possibility will serve to tie together and integrate the statistical analyses and clinical interpretation. In the process of choosing one statistical analysis among many, reference may be made to reasons based in either the field of statistics, clinical medicine, or both. Insofar as a choice of statistical procedures is supported with clinical arguments or vice versa, there will be a strong connection between the two. The other situation occurs when both multiple statistical analyses are possible to conduct and multiple clinical interpretations are possible to defend. A prioritization within each group may be a useful means of identifying which analyses and interpretations are most reasonable. Alternatively, the single set of data analyses and interpretations may be chosen that fit best with each other. The question may be raised as to whether the analyses (or the interpretations) are actually the best within their own group and also when considered as a pair. In substantiating the choices made among analyses or interpretations, references to published literature are often useful, as is evidence showing that the other choices (or other pairs) lead to less meaningful interpretations of data. 
When the sample size in a clinical trial is large, there is a greater likelihood that differences will be observed that are statistically but not clinically significant. This occurs because the large statistical power allows the trial to detect small differences between two groups. One means of avoiding this potential dispute is to determine criteria of clinically meaningful medicine activity prior to the trial. ESTABLISHING AND MAINTAINING EFFECTIVE RELATIONSHIPS BETWEEN STATISTICIANS AND CLINICIANS
Some of the most important techniques for effective collaboration between statisticians and clinicians are for them to discuss and agree on (1) goals, (2) work assignments to achieve these goals, (3) providing information in both directions, and (4) eschewing use of
jargon in favor of clear and open communications. These procedures should be operational before, during, and after a clinical trial if optimal data analyses and interpretations are to result. Finally, statisticians and clinicians who are working as part of a team on one project should not operate totally independently but should communicate frequently. It is generally useful after the clinical trial is completed for the statistician to describe what happened in the trial from a statistical perspective in addition to providing tables of raw data and analyses. This information will assist the clinician in interpreting the data. Furthermore, the statistician should review the interpretation(s) and any extrapolations reached by the clinician to ensure that they are properly supported by the best possible analyses of the data. Statisticians should help clinicians distinguish between the statistical and clinical significance of data.
CHAPTER 101
Misinterpretation of Data
Misinterpretation of Data in Daily Life, 750
Example One: Student Test Scores, 750
Example Two: Quality of Education, 751
Misinterpretation From Accepting Statements in Abstracts, 752
Functions of Abstracts, 752
Variety of Abstracts, 752
Interpretation of Abstracts, 752
References to Abstracts in the Medical Literature, 752
Misinterpretations Based on Publication of Pilot Trial Data, 752
Paradoxes, Misconceptions, and Distortions of Numbers, 752
Averages, 752
Jumping to Conclusions, 753
Misinterpretation Based on Mispresentations or Confusing Presentations of Data, 753
Misinterpretation Based on Presenting Different Analyses and Data Than Those Designed to Address the Primary Objective, 753
Misinterpretation of data is a common event that pervades most aspects of one’s existence. Just open the pages of any newspaper or magazine and there are innumerable examples of data being distorted, mispresented, and misinterpreted. Most examples result from naivete, inexperience, incomplete reviews, or poor presentation of data, rather than deliberate attempts to distort data. Two hypothetical examples from the world of education (based on actual reports) are briefly described to illustrate this point.
MISINTERPRETATION OF DATA IN DAILY LIFE
Example One: Student Test Scores
State A’s average scores for a national college entrance examination are reported in the newspaper to have fallen compared with those of its neighboring states. State A now ranks 45th out of 50, compared with its previous ranking of 43rd. Articles and editorials are written in many state newspapers about declining educational standards and results within the state. Loud calls are heard from politicians for more money to be spent on education and for the quality of the schools to improve. Many other strong statements are made at public meetings, in the state legislature, and elsewhere.
These comments are primarily based on conclusions that usually represent a misinterpretation of the data (apart from the insincere posturing that usually occurs among certain political groups and individuals). A few representative questions that should be asked about the data before drawing any firm conclusions include:
1. What is the percentage of high school students taking the test in each of the states? A state in which poor or borderline students are directly or indirectly discouraged from taking the test will have higher average scores than a state in which a greater percentage of the students are encouraged to take the test.
2. Did the percentage of students who took the test in the state increase or decrease compared with the previous year? A concerted effort by teachers to encourage more students to go to college would result in more borderline and poor students taking the test. This would lead to lower scores, although more students might attend college and potentially have a more productive future (which is clearly a major goal of education).
3. Did the absolute values on the test increase or decrease? A state’s position relative to other states could decrease because (1) its scores increased, but to a lesser degree than did other states, (2) its scores did not change, but those of other states increased, or (3) its scores decreased, but those of other states increased, did not change, or decreased to a lesser degree.
4. Are other states designing their educational curriculum to help students do well in the tests? “Teaching to the test” is becoming a more common practice. If the neighboring states are doing this and their scores surpass those of state A, it does not necessarily indicate that the standards of education in state A have changed.
5. Are more students in other states taking preparatory courses outside of school to help with their tests? If there is a large economic difference in the population of the states and state A is poorer (or wealthier), this factor would be important.
6. Are a greater percentage of students enrolled in private schools in one state than another? There are various reasons why educational tests should not always be compared between public and private schools, without caveats.
7. What social groups of students are taking these tests? If the often quoted fact is correct that black children and poor children are at a significant disadvantage in taking the test because it includes words and knowledge that are more familiar to the middle class rather than being class neutral or balanced, then this factor could have great impact on the state’s scores. Have more poor and black students entered the state? Is a higher percentage of this category of students taking the test compared with other states?
If other states’ scores are rising faster than state A’s scores, it is necessary to determine the reasons before making interpretations about the conclusion. This rational approach usually is drowned out by reports in the media and at meetings by a loud chorus of charges, defenses, and entrenched positions. Each side in an argument often focuses its efforts on reaching decision makers who have political clout, rather than discovering and presenting an objective view based on logic and reason.
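The point made in question 3 of the student test score example, that a state's rank can fall even though its own scores rise, can be checked with a few lines of arithmetic. The following Python sketch uses five hypothetical states and invented scores (none of these numbers come from the text); it is only meant to show that rank and absolute score are separate quantities.

```python
# Hypothetical mean test scores for five states in two consecutive years.
# All names and numbers are invented for illustration.
scores_last_year = {"A": 500, "B": 495, "C": 505, "D": 490, "E": 510}
scores_this_year = {"A": 504, "B": 509, "C": 512, "D": 506, "E": 515}


def rank_of(state, scores):
    """Return the 1-based rank of `state`, where rank 1 is the highest score."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return ordered.index(state) + 1


# State A's own score went up ...
change_in_score = scores_this_year["A"] - scores_last_year["A"]

# ... yet its position relative to the other states went down,
# because the other states improved by a larger amount.
rank_before = rank_of("A", scores_last_year)
rank_after = rank_of("A", scores_this_year)

print(f"State A score change: {change_in_score:+d}")
print(f"State A rank: {rank_before} -> {rank_after}")
```

In this invented data set, state A improves while dropping in the ranking, which is case (1) of question 3. A report that quotes only the rank would hide that distinction.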
The public seldom learns enough details (without hard work) to reach an independent judgment based on all available data on an issue. Even motivated individuals are seldom able to obtain sufficient information on any topic to reach a truly informed decision. Many additional questions and issues should be considered for almost any item reported in the news. That is probably why so few reasonable interpretations are provided by the media for world news events, statistical results, or other information one is exposed to in daily life. Few people want to ask continually the pointed questions necessary to achieve a better interpretation of the events presented. Those that do make a great effort usually learn that the answers received to their appropriate questions are unclear or may be viewed in various ways.
Example Two: Quality of Education
The second example concerns a school system that is always bragging about how fine the education is that its students receive, because more students become National Merit Finalists each year than from any other school in the entire state. It is unreliable to reason backwards from an event to its purported cause in this manner. A more likely reason that the school leads the state in the number of finalists relates to the pool of students from which the school system draws. In this case it was observed that the parents of the students are primarily highly educated professionals, who are associated with one of two fine universities in the area, work at a nearby research park, or provide professional services to the community. This pool of students would be highly motivated and educated at home. They would make any school system look like a winner based on standardized tests, almost independently of the level of the education provided by the school. It is possible, however, for a school or school system to test the hypothesis that it has the best educational program in a region or state. Students in most grades are currently given national standardized tests. The differences in each student’s score from year to year could be easily calculated. If a school offered superior education, then the differences in a longitudinal educational trial over the 12 years of public education for each baseline stratum of students (e.g., students with high, medium, and low scores at the start of first grade) should be greater in that school system than in others. Even then, it would be impossible to conclude that the education those students received was outstanding. First, the teachers could be “teaching to the test,” particularly in the higher grades, when students (and schools) are thinking more about entrance into universities and colleges.
Second, tests are an imperfect measure of education, although relative improvements (in percentiles) from year to year over a 12-year period might be expected to have some relationship to the quality of education. Third, both social and economic factors would have to be considered in comparing different school systems. Other questions and issues involve possible assessment of (1) the percent and number of students graduating, (2) the number who attend or eventually graduate from college, and (3) all students or just a subset (e.g., those intending to go on to college). Major and minor decisions must often be made on a medical issue without answers to most of the important questions necessary to interpret the data fully. Nonetheless, it is essential that attempts be made to obtain answers to major questions.
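The longitudinal comparison proposed in Example Two (gain in score within each baseline stratum, rather than final scores) can be sketched as follows. The school systems "X" and "Y", the strata, and every percentile value are hypothetical and were chosen only to show how final scores and within-stratum gains can point in opposite directions; only two strata are used to keep the sketch short.

```python
from collections import defaultdict
from statistics import mean

# Each record: (school system, baseline stratum,
#               first-grade percentile, twelfth-grade percentile).
# All values are invented for illustration.
records = [
    ("X", "high", 85, 88), ("X", "high", 90, 91),
    ("X", "high", 88, 90), ("X", "low", 20, 26),
    ("Y", "high", 80, 89), ("Y", "low", 15, 33),
    ("Y", "low", 22, 41), ("Y", "low", 18, 38),
]


def mean_final_score(rows):
    """Mean final percentile per school system (the 'bragging' statistic)."""
    finals = defaultdict(list)
    for system, _stratum, _start, end in rows:
        finals[system].append(end)
    return {system: mean(values) for system, values in finals.items()}


def mean_gain_by_stratum(rows):
    """Mean change in percentile per (school system, baseline stratum)."""
    gains = defaultdict(list)
    for system, stratum, start, end in rows:
        gains[(system, stratum)].append(end - start)
    return {key: mean(values) for key, values in gains.items()}


finals = mean_final_score(records)
gains = mean_gain_by_stratum(records)

print("Mean final percentile:", finals)
print("Mean gain by stratum:", gains)
```

In this invented data set, system X has the higher final scores because it enrolls mostly high-baseline students, while system Y shows the larger gain in both strata. As the text cautions, even the gain comparison would not by itself establish that the education was superior.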
MISINTERPRETATION FROM ACCEPTING STATEMENTS IN ABSTRACTS
Functions of Abstracts
Abstracts published in reputable journals as part of professional meetings serve several valuable functions. First, by reading them prior to a meeting, people can decide whether or not to attend. Second, they enable people at the meeting to decide which talks to attend, which poster sessions to visit, and which people to contact. Subsequent to a meeting they allow people to obtain a flavor of what transpired or to refresh their memories if they attended the meeting.
Variety of Abstracts
The relationship between the contents of abstracts and scientific truth is often tenuous. Many abstracts are written before experiments or clinical trials are complete and abstracts often present interim data or are even written without any actual results. Others project what the authors anticipate will be found in their clinical trials or experiments. This practice is clearly unethical. Still others create or round out stories or experimental results to develop a whole picture that contains various distortions. Interpretations reached are often inadequately supported by the data.
Interpretation of Abstracts
Readers almost never have sufficient data to agree with or challenge the results and interpretation reported in an abstract. This issue is not a major problem when the full results are published within a year or so in a peer-reviewed journal. At that point the reader may more completely judge the quality of the work conducted, its presentation, and interpretation. Abstracts should be viewed as having the functions mentioned above. If the data are important the authors have a responsibility to publish a complete paper. The failure to find a full paper in the literature within 3 or so years after an abstract with important results is published is definitely not proof that the abstract was false, or that later results went in a different direction. Nonetheless, suspicions will be raised. Many authors freely admit that some of their abstracts were prepared prematurely in that initial results were not confirmed, or that final results were not worthy of full publication.
References to Abstracts in the Medical Literature
Another problem regarding abstracts is that they are sometimes quoted as equal references to full papers in support of, or to refute, a certain point. The author is aware of several abstracts that have been widely quoted to support claims of a medicine’s efficacy, but efficacy has not been supported by data in full publications. Specific examples are not quoted to avoid embarrassing those authors. The conclusion is that abstracts should be used for purposes described and not as references to scientific truth.
MISINTERPRETATIONS BASED ON PUBLICATION OF PILOT TRIAL DATA
It is unfortunately true that many (if not most) pilot trials are generally viewed as “quick and dirty” trials (i.e., designed, conducted, and possibly analyzed at a lower standard than that applied to controlled trials). Thus, the data obtained are often considered to be useful to apply to subsequent (better controlled) trials but are rarely convincing on their own. Pilot trials have a higher incidence than controlled trials of arriving at false-positive or false-negative conclusions and thus may mislead clinicians into believing that a given therapy has less or greater value than it truly has. The proper responses to this dilemma are to (1) improve the standards of pilot trials so that data obtained are more reliable, (2) view reports of pilot trials with some skepticism, and (3) understand what types of data can validly be obtained from pilot trials.
PARADOXES, MISCONCEPTIONS, AND DISTORTIONS OF NUMBERS
There are many paradoxes about numbers, logic, statistics, and other areas that may sometimes influence, affect, or interfere with the interpretation of clinical data. A humorous collection of paradoxes is presented in the book aha! Gotcha by Martin Gardner (1982), the author of Scientific American's “Mathematical Games” column. Many of the points he makes about statistical concepts are important to consider when one is interpreting data. A few of these will be summarized.
Averages
When there are a few extreme values, information about the “average” (arithmetic mean) may be highly misleading. If 100 patients have two lesions each and 5 patients have 200 lesions each, the average number of lesions is 11.4. This number can be a misleading figure if only the average number of lesions is stated in a report or publication. If a study on drowning incidents reports that one person drowned in a river that has an average depth of 15 centimeters, it may conjure an image of someone who was inebriated, toxic, or had a heart attack and fell face down in a shallow pool of water and died. If it turns out that the person drowned in a place where the river is 10 meters deep, this information will assist in developing a more accurate overall impression. The word average is sometimes used to denote the median (the middle value in a list in which all values are placed in order of their magnitude) or mode (the value that appears most often). The average value given may be imaginary and not denote a possible situation. A well-known example describes the average number of children in an American household as 2.4. Thus, the term average is ambiguous primarily because it may have three definitions, mean, median, and mode, and also because it does not provide any indication of variance of the numbers used to generate the average.
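The lesion example can be reproduced directly. The short Python sketch below uses the patient counts given in the text (100 patients with 2 lesions, 5 patients with 200 lesions) and the standard-library `statistics` module; the variable names are illustrative only.

```python
from statistics import mean, median, mode, pstdev

# 100 patients with 2 lesions each and 5 patients with 200 lesions each,
# as in the example in the text.
lesions = [2] * 100 + [200] * 5

mean_lesions = mean(lesions)      # arithmetic mean: 1200 / 105
median_lesions = median(lesions)  # middle value of the ordered list
mode_lesions = mode(lesions)      # most frequent value
spread = pstdev(lesions)          # population standard deviation

print(f"mean   = {mean_lesions:.1f}")
print(f"median = {median_lesions}")
print(f"mode   = {mode_lesions}")
print(f"standard deviation = {spread:.1f}")
```

The mean rounds to 11.4, a value that no patient in the group actually has, whereas the median and mode are both 2. Reporting a measure of spread alongside the mean is one way to signal that a few extreme values are present.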
Jumping to Conclusions
Assume that it is reported that more people proportionally die of cancer in Florida than in any other state. Does this mean that there are environmental factors in Florida that predispose patients to contracting cancer? The answer is that many elderly people go to Florida because of the environment, and since the overall incidence of cancer is related to age, Florida has a higher incidence of cancer than other states. Similarly, patients with tuberculosis often move to certain states because of their climates. This makes it appear that the incidence is highest in those states and may incorrectly raise the suspicion that something about that state contributes to the disease.
Many events, major, minor, and even trivial, change over time, and if two or more events are each measured and compared, the changes observed in each may appear to be related. This does not necessarily mean that there is a true connection between the two events. Assume that the annual number of patients in a state who die from a particular type of cancer had been increasing over a number of years. If one also measured the consumption of chewing gum, the number of bingo halls, or hours of television watching, and one or more of those events also showed a similar increase over the same period, no one would seriously associate the increase in cancer with the other event(s) as cause and effect. As an aside, it is possible that there could have been an additional factor such as increased radiation emitted by defective television sets or toxins found in the chewing gum that might raise questions about their association with cancer. Even then, however, a true connection with increased numbers of cancer cases in the short term would probably be far-fetched. If, however, there was a prolonged increase in pollution levels, radiation leaks from a nuclear power plant, chemical additives in foods, or changes in a factor that had been previously related to cancer, the association with the increased number of cancer deaths would be more likely to be related as cause and effect. Murphy (1982) described a number of bizarre associations that turned out to be true.
Acquiring adequate information prior to reaching a conclusion is essential but difficult if the issue is one of great public interest and there is pressure for a rapid resolution of a problem. The point of this discussion is that it is important to avoid jumping to a conclusion.
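The Florida example turns on the difference between a crude rate and age-specific rates. The Python sketch below assumes two hypothetical states with identical age-specific cancer death rates but different age distributions; the state names, populations, and rates are all invented and are not real epidemiological data.

```python
# Identical age-specific annual cancer death rates (per 100,000 people)
# are assumed for both states. All numbers are hypothetical.
age_specific_rate = {"under_65": 100, "65_and_over": 1000}

population = {
    "retirement_state": {"under_65": 700_000, "65_and_over": 300_000},
    "younger_state": {"under_65": 900_000, "65_and_over": 100_000},
}


def crude_rate(pop_by_age, rates_per_100k):
    """Overall deaths per 100,000, ignoring the age structure."""
    deaths = sum(
        pop_by_age[group] * rates_per_100k[group] / 100_000
        for group in pop_by_age
    )
    total_population = sum(pop_by_age.values())
    return deaths / total_population * 100_000


crude = {
    state: crude_rate(pop, age_specific_rate)
    for state, pop in population.items()
}

for state, rate in crude.items():
    print(f"{state}: {rate:.0f} cancer deaths per 100,000")
```

Under these assumptions the state with more elderly residents shows roughly twice the crude death rate, even though a person of a given age faces exactly the same risk in either state. The difference comes entirely from who lives there, not from anything about the place.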
MISINTERPRETATION BASED ON MISPRESENTATIONS OR CONFUSING PRESENTATIONS OF DATA
Chapter 105 discusses mispresentation of clinical data and Chapter 13 in Presentation of Clinical Data (Spilker and Schoenfelder, 1990) illustrates a number of overly complex and confusing presentations. An index of suspicion is often the only clue that the presentation contains flaws. The golden rule to follow in this area is that interpretations should be based on actual raw and tabulated data whenever possible, rather than graphical presentations (unless a complete set of data are graphically presented) or data summaries. The reader should be satisfied that all major questions about the data have been addressed before accepting an interpretation as valid.
MISINTERPRETATION BASED ON PRESENTING DIFFERENT ANALYSES AND DATA THAN THOSE DESIGNED TO ADDRESS THE PRIMARY OBJECTIVE
Most, if not all, clinical trials generate data that are analyzed in a variety of ways. Even when a single statistical test is planned to be used for the analysis, a variety of other tests and approaches are often explored. This is generally acceptable. However, some authors find that the data are more interesting when new clinical questions (i.e., trial objectives) are asked after the trial is completed, rather than simply analyzing the data that address the primary objective. If these results are published and no mention is made of the original objective, then it is unethical.
FIG. 101.1 Frequency of pushups per hour and mean values for each of the three phases. Axes: number of pushups per hour versus daily assessment probes; phases: baseline, signal device implemented, signal device withdrawn. (Reprinted from Gouvier et al., 1985, with permission of the American Congress of Rehabilitative Medicine.)
FIG. 101.2 Mean duration of pushups (seconds) within each experimental phase. (Reprinted from Gouvier et al., 1985, with permission of the American Congress of Rehabilitative Medicine.)
FIG. 101.3 Mean duration of up-time per hour (seconds) within each phase. (Reprinted from Gouvier et al., 1985, with permission of the American Congress of Rehabilitative Medicine.)
FIG. 101.4 Percentage of 15-minute time periods across conditions in which a pushup of adequate duration occurred. (Reprinted from Gouvier et al., 1985, with permission of the American Congress of Rehabilitative Medicine.)
A classic paper by Gouvier et al. (1985) describes how the data of a single patient trial gave one interpretation when the original objective was addressed, but totally different interpretations when variations of the original objectives were assessed. This is shown in Figs. 101.1 to 101.4. Imagine how much distortion is possible in a more complex clinical trial. Plotting the number of pushups per hour at each session (Fig. 101.1) shows an effect of the timing device but no lasting effect after it was removed. Plotting the mean duration of pushups (Fig. 101.2) shows that there was a lasting effect, but it was only observed after the intervention was stopped. The mean time per hour spent doing pushups combines aspects of frequency and duration and shows different results (Fig. 101.3). This shows a dramatic improvement when the signal device was implemented that partially persisted after it was withdrawn and implies a training effect. The primary objective of the clinical trial was to train the patient to do pushups of at least 10 seconds duration every 15 minutes. When these data were plotted (Fig. 101.4) the results show an effect while the signal device was implemented, but no training or lasting effect occurred.
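How one set of observations can support four different stories is easier to see with numbers. The Python sketch below computes the four outcome measures discussed above from the same hypothetical pushup records. The records are invented for illustration and are not the data of Gouvier et al.; only the criterion of at least 10 seconds every 15 minutes is taken from the text.

```python
from statistics import mean

# One hypothetical one-hour observation session per phase. Each pushup is
# recorded as (minute of the hour at which it started, duration in seconds).
sessions = {
    "baseline": [(5, 12), (50, 4)],
    "device_on": [(3, 12), (18, 11), (33, 12), (48, 13), (55, 5)],
    "device_off": [(10, 40), (12, 6)],
}

ADEQUATE_SECONDS = 10   # minimum duration named in the primary objective
INTERVAL_MINUTES = 15   # required spacing named in the primary objective


def summarize(pushups):
    """Compute four outcome measures from the same list of pushups."""
    durations = [duration for _start, duration in pushups]
    # Set of 15-minute intervals containing at least one adequate pushup.
    adequate_intervals = {
        start // INTERVAL_MINUTES
        for start, duration in pushups
        if duration >= ADEQUATE_SECONDS
    }
    intervals_per_hour = 60 // INTERVAL_MINUTES
    return {
        "pushups_per_hour": len(pushups),
        "mean_duration_s": mean(durations),
        "up_time_s": sum(durations),
        "pct_intervals_adequate": 100 * len(adequate_intervals) / intervals_per_hour,
    }


summary = {phase: summarize(pushups) for phase, pushups in sessions.items()}

for phase, measures in summary.items():
    print(phase, measures)
```

With these invented records, frequency rises only while the device is in use, mean duration is highest after withdrawal, total up-time partially persists, and the measure tied to the primary objective returns to its baseline value. Which measure is reported therefore determines which conclusion the reader draws.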
PART X
Publishing Clinical Data and Evaluating Published Literature
A man’s judgment cannot be better than the information on which he has based it. Give him the truth and he may still go wrong when he has the chance to be right, but give him no news or present him only with distorted and incomplete data, with ignorant, sloppy or biased reporting, with propaganda and deliberate falsehoods, and you destroy his whole reasoning processes, and make him something less than a man. — Arthur Hays Sulzberger, American journalist
I equate the function of critical journals of opinion with the spirit and method of science. Dissent is the journalist’s way of asking the scientist’s question: “Who says so?” “Can you prove it?” or, simply, “I don’t believe it.” It is the way by which individuals and societies protect themselves not only against oppressive orthodoxies but against foolish fallacies. — Carey McWilliams, American writer, editor
CHAPTER 102
Preparing Articles for Publication
Choosing a Journal, 759
Preparing an Article, 760
Summary and Introduction of an Article, 761
Materials and Methods Section, 761
Results Section, 762
Discussion Section, 766
Illustrations, 767
Reviewing a Manuscript, 767
Proofing Manuscripts, 769
Editing (i.e., Compiling) a Book, 770
At some point after data have been interpreted, it is often desirable to prepare an article for publication. In addition or alternatively, a summary report may be written for distribution to a limited group of individuals, or an extremely detailed final medical report may be prepared as part of a regulatory submission. The detail and scope of a report (as opposed to an article) vary enormously, from a cursory synopsis to a highly detailed report that includes most (or all) of the raw data obtained plus detailed statistical analyses. This chapter considers the example of an article that is prepared for publication, and Chapter 125 discusses final medical reports. Many different types of articles may be prepared to present or discuss data and/or results from clinical studies. A number of types of published articles are listed in Table 102.1. Although most comments in this chapter are applicable to most categories listed in this table, the specific example used is an article that presents data or new information. Before preparing the article, all data to be presented, interpretations, and extrapolations should generally have been worked out and should be clear to the author(s). Some parts of the article (e.g., introduction, methods) may be prepared while the clinical trial is still progressing. If more than one person prepares the actual document, a meeting to review each step in the process is usually essential. Assignments may be made at that time so that each individual understands and accepts his responsibilities in helping to prepare the final draft. It should be stressed that the basic approach, as well as contents and conclusions of the article, should be clear to all authors before they start to write the first draft. It is important to agree on the order of authorship or at least to identify the primary author to prevent problems from developing.
CHOOSING A JOURNAL
It will be advantageous to have chosen the specific journal that will be receiving the article prior to writing the article. This allows the author to fit the format and sometimes even the style to match the journal’s requirements. Many biomedical journals have agreed to review articles prepared according to a standardized format (see International Committee of Medical Journal Editors, 1982, for a complete description). This convention has been accepted by more than 150 journals and should save a great deal of time for authors in revising manuscripts. Another alternative is initially to write the article in a general standardized format and to revise it after one or more drafts have been completed to accommodate the desired format of a specific journal. The choice of journal usually involves many factors, some of which are listed in Table 102.2. Any one of these factors may be paramount in influencing an author to choose a particular journal, or a combination of factors may be involved. Once a prospective journal or journals are chosen, it is useful to obtain their “instructions to authors.” These are usually published in each issue or in selected issues of a journal. A document titled A Compilation of Journal Instructions to Authors (U.S. Department of Health and Human Services, Public Health Service, 1980) is a collection of many such guidelines to authors from biomedical journals.
TABLE 102.1 Categories of published articles in clinical medicine (a, b)
1. Article on methodology or procedure (new or previously described)
2. Article on a hypothesis (new or previously proposed)
3. Article on a mathematical model to describe events (e.g., ideal or real model that is new or was previously proposed)
4. Report to present an observation
5. Report to present data or new information (c)
6. Report on case study or studies, with or without substantial commentary
7. Review of a specific or general area or issue
8. Discussion of pros and cons about a current controversy or issue of medical interest
9. Editorial or commentary
10. Surveys of various groups (e.g., investigators, Institutional Review Boards, regulatory agencies, pharmaceutical companies)
11. Discussion from professional meetings, presented either verbatim or summarized
12. Proceedings of a professional meeting (e.g., symposium)
13. Abstracts and/or posters to be presented at a professional meeting
14. Discussion of public policy
15. Letter written to a journal (1) to present data on a hypothesis or proposal, (2) to comment on an article, or (3) to respond to another letter written to the journal
16. Theses
(a) The articles in most of these categories may support or challenge previous publications or provide new information or interpretations. Numerous other types of medical reports that are published do not generally present original data and are not described (e.g., book reviews).
(b) Various combinations of these types of articles are often published.
(c) This is probably the most common type of clinical publication.
PREPARING AN ARTICLE
Order of Writing Sections
There are many different approaches used by authors. An individual author’s approach will often vary from article to article, depending on the particular data, type of article to be prepared, mood of the author, and other factors. There is no ideal method of “where to start” and “how to proceed,” but a few general comments are relevant about the usual order in which sections are written:
1. The introduction, methods, and results sections may be written in any order, but the results must be presented in a way that addresses the objectives of the clinical trial as described in the introduction.
2. The discussion section is usually not written until after the above three sections are complete or close to completion.
3. The summary and/or abstract may be written at any time during preparation of the manuscript but is often prepared at or near the end of the process.
TABLE 102.2 Selected factors to consider in choosing a journal to publish a medical paper
1. Authors’ assessment of the importance of the article
2. Reputation of journal
3. Circulation of journal (a)
4. Nature of the audience who read the journal (i.e., is it primarily a general audience with varied scientific interests or a specialized audience with highly focused interests)
5. Previous publications on the same topic by other authors in the journal
6. Previous publications by one or more of the current authors in the same journal
7. Journal’s reputation for rapidity of reviewing manuscripts
8. Time required to communicate with the editors (e.g., a journal with offices on another continent may slow communications to an unacceptable degree)
9. Journal’s reputation for rapidity of publishing manuscripts after an article is accepted
10. Cost of each page published and whether these charges may be waived if the author(s) cannot afford them
11. Desire to choose the journal of a particular society
12. Requirements posed by submitting an abstract on the same material to a society that requires that the full article first be sent to a specific journal
13. Desire to present the material at a specific meeting or conference, the proceedings of which will appear in a predetermined journal
14. Relationship of one of the authors with the editor or others associated with a particular journal
15. Desire for widespread publicity for the article, since certain journals are carefully read each week by news services
16. Frequency of publishing (e.g., quarterly, monthly, weekly)
(a) Journals published in the United States include this information in one issue each year (usually toward the end of the year).
PREPARING ARTICLES FOR PUBLICATION
Use of Checklists
It is often helpful to use a checklist to ensure that all relevant information is included in an article that is being prepared. This may be utilized either before, during, or after a draft is prepared. Even experienced investigators and writers who may not have any difficulties in “dashing off” a polished manuscript may fail to include information that would be useful to the reader, especially in the methodology section. At present, it is relatively uncommon for journal editors or reviewers to utilize checklists to ensure that articles submitted for publication contain all of the elements that should be included.
Word Choice
The choice of words used in any scientific publication may vary from those with highly specific, concrete meanings to those with broader or more general denotations. Words or expressions that are vague should be expurgated from any draft. The most precise word or phrase should be chosen to convey the most specific meaning possible with the fewest connotations. Words with numerous connotations are more likely to be misinterpreted. In choosing the words to describe the interpretation of data, it is not always possible to use the most precise terms that one would like to choose, because the data may be uninterpretable, incomplete, or subject to more than one interpretation.
Choosing Tenses
In many aspects of article preparation, a few simple rules generally suffice to guide an author successfully through the maze of potential problems. For example, the question of which tense to use may readily be addressed in most situations with the following few rules.
1. Use the past tense to present your results.
2. Use the past tense to attribute results to others.
3. Use the present tense to refer to parts of your article.
4. Use the present tense to state facts originally reported by others.
Sources of Guidance
There are a number of excellent books that present details and approaches used to write a scientific article. Some of the best are by Huth (1982), CBE Style Manual Committee (1983), and Day (1983). The book by Day presents numerous practical suggestions for dealing with journal editors and reviewers.
SUMMARY AND INTRODUCTION OF AN ARTICLE
The relevant components of a summary are given in Fig. 102.1. The terms abstract and summary often have specific and distinct meanings but are also often used interchangeably and may refer to a variety of different types of synopses. If a separate abstract and summary must be written for an article, it is best to consult the instructions to authors as well as examples in the journal to distinguish accurately between the two.
A clear description of the clinical trial objectives and rationale for conducting the trial are the most important parts of the introduction. If the objectives are not clearly and adequately presented, it will be impossible for readers to know whether the trial design was appropriate and whether the data obtained are relevant. It is unfortunately true that the background and rationale for conducting trials are sometimes created “after the fact.” This technique makes it appear that the authors were “clever fellows” when, in reality, the trial was conducted for totally different reasons and the explanation given in the article is a purely teleological and convenient explanation. The rationale for conducting many trials is not necessarily of central importance to the data obtained. The important point is whether the objectives were reasonable and were satisfactorily addressed by the trial design and protocol.
A checklist of the relevant components that may be included in an introduction is given in Fig. 102.2. Some authors prefer long introductions, but the current vogue in most journals is for short ones. The introduction should present a synopsis of background information that allows readers to understand the results without having to refer to other articles. It is generally worthwhile to include mention of important results in the introduction rather than trying to build suspense through the paper until the reader reaches the conclusion.
MATERIALS AND METHODS SECTION
Most shortcuts are taken in the materials and methods section. Every scientist and clinician who has tried to reproduce the work of another scientist (or clinician)
TOPICS FOR SUMMARY OR ABSTRACT
  Name and doses of medicine(s) studied plus route(s) of administration
  Essential background information
  Number of sites and number of patients studied
  Major objective(s) of trial
  Major method(s), techniques, apparatus used to obtain data
  Major parameter(s) used
  Synopsis of results
  Important findings or conclusions that are described in the article
  Major interpretation(s) and hypotheses
  Major implications and extrapolations of results
FIG. 102.1 Checklist of topics to consider for inclusion in the summary or abstract of a publication.
PUBLISHING DATA AND EVALUATING LITERATURE
TOPICS FOR INTRODUCTION
  Historical background and pertinent literature
  Statement of the problem
  Methodological approach taken in this trial
  Rationale for the approach taken
  Previous experience with the medicine, medical device, or other treatment
  Description of the primary and secondary objectives
  Major results of the trial
  Reference to any preliminary publication or abstract
FIG. 102.2 Checklist of topics to consider for inclusion in the introduction of a publication.
based solely on information in the methods section knows how woefully inadequate it often is. It is rare for authors to include the necessary amount and level of detail required to explain fully the results and discussion and/or to repeat the clinical trial if desired. Of course, when authors send a manuscript to journal editors with detailed methodological descriptions, it is likely that a great deal of material will be cut. This issue is discussed in Chapter 106.
The specific information that is often helpful to include in the methods section is listed in Fig. 102.3. The ten categories are subdivided to indicate better the specific types of information that may be included. Several of these categories are rarely presented in currently published articles but would allow readers to evaluate better the quality of the trial and data obtained. Subheadings are often useful to divide this section into smaller parts for ease of reference.
The number of patients who are described as “entered” in a clinical trial or who are involved at different stages should be defined clearly in either the materials and methods or results section. A number of possible categories are listed in Table 79.3 for types of involvement and in Table 79.4 for definitions of enrollment (see Chapter 79).
TABLE 102.3 Representative types of illustrations
1. Drawings
2. Photographs
3. Line graphs
4. Histograms
5. Pie charts
6. Flow charts
7. Decision trees
8. Algorithms
9. Schematic diagrams
10. Maps with various identifications
11. Four-quadrant distribution charts
12. Gantt charts
13. Computer graphics
14. Three-dimensional line graphs
15. Three-dimensional histograms
16. Scatter diagrams
17. Vector analyses
TABLE 102.4 Parameters usually plotted along the X or Y axis
Parameters that are usually plotted along the X axis    versus    Parameters that are usually plotted along the Y axis
Dose of medicine    versus    Response to medicine (i.e., activity)
Time    versus    Concentration measured
Time    versus    Response to medicine
RESULTS SECTION
Problems
Two major problems often encountered in the results section of publications are insufficient quantity of data on both safety and efficacy parameters and inadequate quality in terms of the data presentation. These problems may be addressed by critiquing one’s own work and by discussing this issue with statisticians or clinical peers.
Expressing Results as Percent Change
An example of the latter problem is observed when data are only presented as “percentage change” or in other vague terms that do not provide sufficient detail to convert the percentage figures to values with real units. There are occasions, however, when expressing data as percentage change is acceptable (e.g., decrease in neuromuscular twitch tension). It is also relevant to indicate the percentage (or number) of patients who improved according to established criteria. If a report fails to provide these data and only presents average changes of the groups, it may be difficult to understand the clinical significance of the results. Summary tables and figures should be presented in the units measured, and a careful description of all manipulations of these data should be provided.
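The difficulty with reporting only percentage change can be shown with a small worked example. The following is a minimal sketch in Python; the patient labels and blood pressure values are hypothetical and are not taken from any trial. Two patients with different baselines show the same percentage change but different changes in measured units, so the percentage alone cannot be converted back to real values.

```python
def percent_change(baseline, final):
    """Percentage change from the baseline value to the final value."""
    return 100.0 * (final - baseline) / baseline


# Hypothetical diastolic blood pressure readings (mm Hg): (baseline, final).
patients = {
    "patient_A": (100.0, 90.0),
    "patient_B": (120.0, 108.0),
}

for name, (baseline, final) in patients.items():
    change = percent_change(baseline, final)
    absolute = final - baseline
    # Report the measured units alongside the percentage figure.
    print(f"{name}: {change:.1f}% change, {absolute:+.1f} mm Hg "
          f"(baseline {baseline:.0f}, final {final:.0f})")
```

Both patients show a 10% decrease, yet one falls by 10 mm Hg and the other by 12 mm Hg. Presenting the baseline and final values in the units measured keeps that distinction visible.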
TOPICS TO INCLUDE IN A MATERIALS AND METHODS SECTION
Trial Design
  Basic design (parallel, crossover) and length of each period (baseline, dose ascension, treatment, taper, follow-up)
  Type of blind used and how it is defined (e.g., in a double-blind trial were the pharmacist, monitor, statistician, and other staff also blind?)
  Verification of the blind
  Type of randomization used
  Number of trial sites
  Number of patients planned per group
  Number of clinic visits per patient
  Type(s) of control groups used
  Conditions where concomitant medicines or other therapies were allowed or required
  Doses of medicines studied and route of administration
  Frequency and time of medicine administration
  Method(s) to adjust doses during ascension or taper, for adverse reactions, or for other reasons
  Frequency of allowing dose adjustments
Inclusion Criteria (includes “exclusion criteria”)
  List or description of all pertinent inclusion criteria including the methods and criteria used to establish the diagnosis
  Replacement criteria for patients discontinued by investigator
  Replacement criteria for patients who drop out of their own volition
Efficacy Measurements
  Parameters that were measured
  Was global impression evaluated by the investigator and/or patient
  Frequency of measuring each efficacy variable
  Method(s) of measuring each efficacy variable
Safety Measurements
  Parameters that were measured
  Frequency of measuring each safety parameter
  Method(s) of measuring each safety parameter
Equipment
  Identification of all equipment used
  Type, manufacturer, manufacturer’s address of relevant equipment
  Model number(s) of relevant equipment
  Number of tests, samples, or repeated determinations at each session
  Information on calibration of equipment
Medicines and Chemicals Used
  Identification of all medicines and chemicals, including the type of salts used
  Source or supplier, location
  Relevant chemical information (e.g., molecular weight of the heparin used)
  Storage conditions
  Dispensing techniques
  Methods used to modify dosages
  Concurrent medicines allowed or required
  Characteristics of placebos evaluated, to demonstrate that they were “identical” to trial medicines
Pharmacokinetics/Medicine Levels
  Parameters that were measured and evaluated
  Methods used to collect and store samples, measure and evaluate the parameters
  Frequency of collecting samples, measuring and evaluating the parameters
  Influence of pharmacokinetic values on patient treatment or trial design (e.g., did medicine levels above a certain value require dosing to be modified?)
  Assays used to measure medicine levels
Trial Management
  Dates of the trial (initiation and completion)
  Location(s) of the trial (especially if different from sites listed for authors)
  Operational definitions used. Confirm that important terms are defined and used the same way as in the literature, unless there are reasons to use different definitions
  Individuals who participated in the trial (i.e., investigator’s staff, sponsor, consultants, others)
  Discussion of any pretrial roundtable meetings
  Description of the major roles of staff who assisted in the trial
  Relevant qualifications of the staff (e.g., did anesthesiologists, board-certified anesthesiologists, or nurse anesthetists participate?)
  Description of monitoring procedures used
  Indicate if the trial monitor was blind to patient randomization
  Methods of data collection, processing, and quality assurance
  Measurement(s) of patient compliance
  Measurement of investigator compliance
  Method(s) of handling patient and/or investigator deviation from acceptable compliance
  Method(s) used to verify the trial blind
  Instructions given to patients
  Indicate whether a written informed consent was obtained from every patient
  Indicate whether monitors were blind to patient randomization and results of any interim analyses
  Indicate if and how the protocol changed during the trial
  Indicate if an identical protocol was followed at each site. Describe any differences
  Indicate if an end-of-trial questionnaire was used
  Indicate if the protocol was approved by an Ethics Committee/Institutional Review Board
Patient Demographics and Related Factors
  Number of patients screened, accepted, enrolled (plus reasons for failing screen or refusing to enter trial), and completing each period of the trial
  Number of patients discontinued from the trial by the investigator (plus reasons for discontinuation and demographic summary)
  Number of patients who dropped out of the trial (of their own volition) and demographic summary
  Source(s) of patients
  Number of patients in each treatment group at each site
  Summary statistics of demographics for (1) all patients entered, (2) all patients completed, and/or (3) all patients to be evaluated
    1. Age of patients enrolled of each sex (plus range or standard deviation)
    2. Sex of patients enrolled
    3. Pertinent physical characteristics (e.g., weight, height, race)
    4. Pertinent social, political, or economic characteristics (e.g., education, marital status, place of residence, income)
    5. Pertinent baseline/screen characteristics (e.g., vital signs)
  Compliance of patients with taking medicines as prescribed
  Compliance of patients with other aspects of the trial (e.g., keeping scheduled visits)
Data Analysis
  Describe how data were processed (e.g., double entry, quality-assured procedures) and analyzed
  Indicate power of the trial
  Describe which statistical tests were used
  Describe how the statistical tests were applied
  Discuss how data were handled for patient dropouts
  Discuss how data were handled to account for protocol violations
  Provide criteria used for patient improvement and/or medicine activity
  State number and type of any interim analyses performed
  Discuss any effect of interim analyses on the trial
FIG. 102.3 Checklist of potential topics for inclusion in the materials and methods section of a publication. Consult the instructions of the journal for additional information that may be required.
Quantitating Results
Authors should generally quantitate all important results presented (e.g., inflammation) even if only a qualitative scoring system is used (e.g., one plus to four plus). Relying on vague descriptive terms (e.g., inflammation was generally improved) is less meaningful to readers. Another alternative is to specify the number of patients with improvement or deterioration in addition to presenting means or other values. It is well known that mean values can be quite misleading even when standard errors or standard deviations are included. Median values are often more appropriate to use and often provide information that is meaningful clinically.
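The remark about misleading means can be made concrete with a short calculation. The sketch below uses Python's standard `statistics` module; the score reductions and the improvement criterion (a reduction of at least 3 points) are hypothetical values chosen only for illustration. A single extreme responder pulls the mean well above what most patients experienced, while the median and the count of patients who improved describe the group more faithfully.

```python
from statistics import mean, median

# Hypothetical reductions in a symptom score for eight patients.
# One patient (30) responded far more strongly than the rest.
reductions = [1, 2, 2, 3, 3, 3, 4, 30]

# Hypothetical improvement criterion: a reduction of at least 3 points.
IMPROVEMENT_THRESHOLD = 3
n_improved = sum(1 for r in reductions if r >= IMPROVEMENT_THRESHOLD)

summary = {
    "mean": mean(reductions),
    "median": median(reductions),
    "n_improved": n_improved,
    "n_patients": len(reductions),
}

print(f"mean reduction   = {summary['mean']}")
print(f"median reduction = {summary['median']}")
print(f"patients improved = {summary['n_improved']} of {summary['n_patients']}")
```

Here the mean reduction is 6 points although seven of the eight patients improved by 4 points or fewer; the median of 3 and the count of 5 improved patients convey that directly.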
Information To Include
A number of specific types of information and data that may be included in the results section are listed in Fig. 102.4. When only a small number of patients are studied, the presentation of raw data is usually particularly helpful in understanding the results. It would be beneficial to interested readers if more data than could be included in a paper were sent to a group that would serve as a repository and provide these data to interested readers for a reasonable fee. This concept is discussed further in Chapter 108.
Patients Who Were Discontinued or Who Dropped Out
The interpretation of a clinical trial may be entirely different if the data of all patients who were discontinued from a trial are included. An example of this is illustrated by the Department of Clinical Epidemiology and Biostatistics, McMaster University (1981). On the other hand, there are a large number of reasons for patient dropouts, and the investigators may believe that it is not scientifically correct to include any (or all) of the dropouts in the analyses. The statistician may be in agreement. One solution is to analyze and present (or discuss) the data both including and excluding patient dropouts and discontinuations. It is important to state what decisions were made regarding dropouts and discontinuations and why they were made. Chapter 31 discusses this topic in detail.
Protocol Violations
An additional complication in presenting the results concerns patients who took doses that were protocol violations or were inappropriate: (1) the dose prescribed was inadequate to elicit the desired effect fully, (2) the dose given violated the protocol by being given at the wrong time or in the wrong amount, or (3) concurrent medicines were given that were protocol violations and were believed to contaminate the data. Approaches to these issues and presentation of other protocol violations should be discussed with a statistician.
A movement away from p values and toward the use of confidence intervals is apparent in numerous medical journals (Braitman, 1988; Gardner and Altman, 1988). This is a positive step because it provides readers with more information for interpreting results. For example, merely stating that a comparison is not significant does not tell you whether the p value is 0.055 or 0.20; illustrating confidence intervals gives a still better view of how different the data are that are being compared.
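To illustrate what a confidence interval adds to a bare statement of significance, the following Python sketch computes a 95% confidence interval for the difference between two response rates. The patient counts are hypothetical, and the normal-approximation (Wald) interval used here is only one of several possible methods; it is an assumption of this example, not a method prescribed in the text, and the choice of method for a real trial should be made with a statistician.

```python
from math import sqrt
from statistics import NormalDist


def diff_in_proportions_ci(x1, n1, x2, n2, level=0.95):
    """Difference in two proportions with a normal-approximation (Wald) interval.

    x1, x2 are the numbers of responders; n1, n2 are the group sizes.
    Returns (difference, lower limit, upper limit).
    """
    p1, p2 = x1 / n1, x2 / n2
    diff = p1 - p2
    standard_error = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    return diff, diff - z * standard_error, diff + z * standard_error


# Hypothetical trial: 30 of 50 patients respond on treatment, 20 of 50 on placebo.
diff, lower, upper = diff_in_proportions_ci(30, 50, 20, 50)
print(f"difference = {diff:.2f}, 95% CI {lower:.2f} to {upper:.2f}")
```

For these hypothetical counts the interval runs from roughly 0.01 to 0.39. The reader sees both that the difference is barely distinguishable from zero and that it could be as large as 39 percentage points, which a statement of "significant" or "not significant" alone would conceal.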
TOPICS FOR RESULTS SECTION
  Patient accountability (number entered, dropped, completed; reasons for patient dropout and for investigator discontinuation)
  Modifications of the protocol
  Violations of the protocol
  Safety data (e.g., laboratory examinations, adverse reactions, electrocardiograms, physical and plasma examinations, plasma levels of medicine)
  Efficacy data (analyses of data from all completed patients and from other defined groups of patients)
  Pharmacokinetic data
  Placebo data (e.g., magnitude of the response, effects observed)
  Data from other control groups
  Problems encountered in the trial
  Missing data (e.g., quantity, areas where it occurs)
FIG. 102.4 Checklist of potential topics for inclusion in the results section of a publication.
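The patient accountability topic in the checklist above lends itself to a simple arithmetic check before a manuscript is submitted. The sketch below is a hypothetical helper written in Python; the function name, its parameters, and the counts are illustrative assumptions, not part of any published standard. It confirms that every patient entered is accounted for and flags any difference between the number entered and the number analyzed so that the text can explain it.

```python
def accountability_gaps(entered, completed, dropped_out, discontinued, analyzed):
    """Return a list of discrepancies in the reported patient counts.

    All arguments are patient counts taken from the manuscript draft.
    An empty list means the counts are internally consistent.
    """
    gaps = []
    accounted_for = completed + dropped_out + discontinued
    if accounted_for != entered:
        gaps.append(
            f"entered ({entered}) does not equal completed + dropped out "
            f"+ discontinued ({accounted_for})"
        )
    if analyzed != entered:
        gaps.append(
            f"{entered} patients entered but {analyzed} analyzed; "
            "state the reason for the difference"
        )
    return gaps


# Hypothetical counts from a draft manuscript.
for problem in accountability_gaps(
    entered=100, completed=80, dropped_out=12, discontinued=5, analyzed=92
):
    print(problem)
```

A difference between the number entered and the number analyzed is not in itself an error; the purpose of the check is only to make sure the manuscript states it and explains it.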
DISCUSSION SECTION
Purposes
From the perspective of presenting the interpretation of data, the discussion is the most important part of an article. This is the section where (1) data are interpreted, (2) interpretations are extrapolated, (3) additional hypotheses are presented or modified, (4) other clinical trials are critiqued, and (5) speculation (within limits) is presented. Some authors propose other experiments or trials that are suggested by their outcomes and may even indicate which trials they will undertake next.
Responsibilities To Readers
Regardless of the many possible directions and approaches an author may adopt in the discussion section, the author(s) have certain responsibilities to readers of the article. Some of these are:
1. Findings in the present clinical trial should be compared with those of previous studies that are related, and significant points of difference (and agreement) should be noted and discussed.
2. If the objectives of the trial were not achieved, the reason(s) for this should be described.
3. In addition to the interpretations discussed, alternative reasonable and plausible interpretations should also be presented. The pros and cons of each may be included.
4. Data from all treatments and groups evaluated should be compared and any unexpected findings discussed.
Methods To Create a Discussion Section
Various aspects of a discussion are presented in Fig. 102.5. A few techniques that may be helpful in developing discussions are described below.
TOPICS FOR DISCUSSION SECTION
Within This Trial
  Brief summary (but not a recapitulation) of the major result(s)
  Discuss which aspects are of statistical significance
  Discuss which aspects are of clinical significance (both theoretical and practical)
  Discuss which aspects have statistical significance but do not have clinical significance
  Discuss how the data might affect future scientific/clinical trials
  Discuss how the data might affect future clinical practice of medicine
  Present conclusions and interpretations
  Present exceptions to the interpretations or conclusions
  Present aspects of the trial that are unclear or questionable
Comparison with Other Trials
  Discuss how major results compare with those reported by others
  Comment on reports that do not agree with the present trial and indicate possible reasons for differences
  Trace historical development of an idea, model, hypothesis, treatment, or other aspect relating to the trial
  Relate the current trial to the relevant field of medicine
  Comment on the methodologies and equipment used by others versus those used in the present trial (e.g., validation, state-of-the-art, limitations)
  Comment on the statistical analyses used by others versus those used in the present trial (e.g., power, number of patients, types of tests used)
Implications
  Discuss questions raised that cannot be answered
  Propose new hypotheses, models, or questions to be studied in the future
  Identify which comments reflect the author’s view and which reflect “general consensus.” Reference the latter views (if possible)
  Present any extrapolations
Conclusion(s)
  Present pros and cons of all major conclusions as fairly as possible
  End the article with a brief synopsis
FIG. 102.5 Checklist of potential topics for inclusion in the discussion section of a publication.
1. Develop algorithms to sort mentally through complex trial objectives, results, interpretations, possible hypotheses, or extrapolations.
2. Utilize “if, then” exercises in which the author begins a statement with “if” and then adds a phrase beginning with “then.” This encourages speculation about various alternative or more detailed (or broad) interpretations.
3. Determine if other trials can be envisioned that would either confirm or deny a new hypothesis or interpretation.
4. If the current interpretation differs from previous results, evaluate possible reasons for the discrepancy. Examine each of the experimental conditions, parts of the trial design, and manner in which the trial was conducted.
5. If comparing one’s results with other trials appears to be difficult or complex, consider presenting a table of all other published trials (or a selected subset) with pertinent parameters, results, conclusions, interpretations, and other information.
6. Present a table of various possible interpretations of the present trial with the pros and cons of each.
7. Present a table with pertinent details on each of the methods used in the present or other trials. Choose categories that highlight important similarities and differences. Other aspects of a trial may also be presented in this type of comparative framework.
ILLUSTRATIONS
The basic types of illustrations used in medical articles are generally well known. Most published illustrations are examples of the types shown in Table 102.3. Many other types of illustrations are used in special circumstances. Cleveland and McGill (1985) describe many newer statistical methods that may be used to present data, such as box plots, two-tiered error bars, scatterplot smoothing, dot charts, and graphing on a log base-2 scale. An informative reference on creative illustrations of data was written by Tufte (1983).
A recent book, Presentation of Clinical Data (Spilker and Schoenfelder, 1990), presents over 650 figures, graphs, and tables that are intended as prototypes of major formats. Variations of most of these formats are illustrated and the reason why each is included is clearly indicated.
Selected Conventions
There are usually conventions as to which parameters should be on the abscissa (horizontal or X axis) and which should be along the ordinate (vertical or Y axis) in a line graph. A few examples are shown in Table 102.4.
Figures must be used judiciously to illustrate results accurately in an optimal manner. Care should be taken to avoid putting too much information in an illustration, or it will confuse rather than enlighten the reader. Inappropriate types of illustrations or inappropriate scales along the ordinate or abscissa will tend to distort the data and make reader comprehension more difficult. Most individuals are familiar with the cliche of “Figures don’t lie, but some liars figure.” Yankelowitz (1980) showed in a humorous manner how data may be distorted in figures prepared for publication.
REVIEWING A MANUSCRIPT
A simpler checklist than those described in this chapter was proposed by Dixon et al. (1983). Their list is shown in Fig. 102.6. After a completed draft of an article is prepared and reviewed with (or without) checklists for completeness, there are usually a number of additional steps to follow before it is submitted to a journal. In addition to seeking critiques from peers, there are a number of considerations related to content and format that are described in Table 102.5.
TABLE 102.5 Criteria to use in reviewing a manuscript draft (a, b)
A. Related to content
  1. Consider overall flow and content in terms of the article’s objectives
  2. Evaluate each section of the article to ensure that it contains sufficient information and attains an appropriate balance of specific information, interest, and readability
  3. Evaluate each paragraph in terms of its content and placement within the section
  4. Confirm that sufficient data and measurements of variability are included
B. Related to format
  1. Evaluate the format and organization of the overall article and within each section. Consider the use of subdivisions within each section
  2. Determine if there should be additional (or fewer) tables or figures
  3. Determine if some tables could be made into figures (or vice versa)
  4. Confirm that all tables and figures are referred to and adequately described in the text
  5. Confirm that all references are listed in the text and that they are listed in a consistent manner according to the journal’s requirements
  6. Confirm that all references in the text to other publications are listed in the bibliography and that they are all in the same format. Confirm exact details with original sources
  7. Review tenses of all words and confirm that the correct tenses were used
  8. Confirm that all tables, figures, footnotes, and references adhere to the journal’s requirements
  9. Confirm that marginal notes are included to indicate where tables and figures should be inserted in the article
  10. Confirm that all pages are numbered consecutively
(a) It is advantageous to reread some articles several times, each time concentrating on different criteria or issues.
(b) A number of general considerations in writing and polishing a draft protocol are listed in Table 34.1.
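The cross-checks between citations in the text and entries in the reference list (items B.5 and B.6 of Table 102.5) can be partly mechanized. The following Python sketch is a rough illustration only: the sample sentences are invented, the reference list is represented as hypothetical (surname, year) pairs, and the regular expression assumes author-year citations of the forms "Name (1983)", "Name et al. (1983)", or "(Name and Name, 1988)". It would miss other citation styles, so it supplements a manual check rather than replacing it.

```python
import re

# Matches the first author's surname and the year in author-year citations.
CITATION_PATTERN = re.compile(
    r"([A-Z][A-Za-z'\-]+)(?: et al\.| and [A-Z][A-Za-z'\-]+)?(?:, | \()(\d{4})"
)


def cited_keys(text):
    """Return the set of (first-author surname, year) pairs cited in the text."""
    return set(CITATION_PATTERN.findall(text))


# Invented manuscript text for illustration.
manuscript = (
    "Guidance is given by Huth (1982) and Day (1983). Confidence intervals "
    "are increasingly used (Braitman, 1988; Gardner and Altman, 1988)."
)

# Hypothetical reference list, reduced to (first-author surname, year) pairs.
reference_list = {
    ("Huth", "1982"),
    ("Day", "1983"),
    ("Braitman", "1988"),
    ("Tufte", "1983"),
}

cited = cited_keys(manuscript)
cited_but_not_listed = cited - reference_list
listed_but_not_cited = reference_list - cited

print("Cited in text but missing from reference list:", sorted(cited_but_not_listed))
print("In reference list but never cited:", sorted(listed_but_not_cited))
```

In this invented example the check reports one citation with no matching reference entry and one reference entry that is never cited, which are the two situations the table asks the author to rule out.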
A CHECKLIST OF DATA THAT SHOULD BE CONSIDERED FOR INCLUSION IN A CLINICAL TRIAL REPORT
TITLE
  Indicate design of study and drug(s)
  Incorporate indexible words
  Maintain a balance between detail and attention-seeking qualities
SUMMARY
  First sentence should catch the reader’s attention
  Must be factual
INTRODUCTION
  Why do the study? (background)
  What is the principal question being asked?
PATIENTS
  Entry criteria (disease and disease activity)
  Numbers
  Exclusions
  Methods of randomization
  Source
  Ethics committee approval, patient consent
DRUGS
  Dosages
  Duration
  Times of administration
  Additional therapy allowed
METHODS
  Clinical measurements: objective, subjective, reproducibility, monitoring of side effects, radiology
  Laboratory measurements: hematology, immunology, routine clinical biochemistry, specialized biochemistry, reproducibility
  Drop-outs: replacement criteria, handling of data from drop-outs
  Escape clauses
  Statistics: tests to be used, power
RESULTS
  Changes in clinical and laboratory assessments
  Drop-outs: numbers, reasons
  Report other side effects
  Statistics: pretreatment matching of groups, treatment group comparisons, improvement within each treatment group, power
  Individual patient responses to therapy
  Was ‘blindness’ maintained?
DISCUSSION
  Advantages and disadvantages of the therapy under investigation
  Side effects: serious, minor, unusual
  Future work?
  Therapeutic implications
CONCLUSIONS
  Answer the question set out in the introduction
ACKNOWLEDGEMENTS
REFERENCES
  Include references for non-routine methods
  Refer to review articles in preference to a series of papers
FIG. 102.6 Checklist for preparation of articles for publication by Dixon, Smith, and Evans. (Reprinted by permission of British Journal of Rheumatology from Dixon et al., 1983.)
Common Deficiencies
In reviewing one’s own manuscript, it is important to confirm that common problems are not present. A 10% sample of English language clinical papers published in 1980 was analyzed (Meinert et al., 1984), and the major deficiencies found were:
1. The rationale for the sample size used was not stated.
2. The primary outcome measure was not designated.
3. There was an inadequate description of the method of treatment assignment.
4. There was a discrepancy between the number of patients enrolled and the number used in analysis.
5. The source of support was not stated.
Each of these deficiencies occurred in over half the clinical trials analyzed. Many stylistic rules differ from journal to journal, but must be considered and followed before the manuscript is sent for editorial review. Failure to do this may bias the editor or reviewer against the paper.
Some of the points usually described in the Instructions to Authors are indicated in Table 102.6.

TABLE 102.6 Selected details to check in a manuscript and cover letter sent to a journal for publication
A. Overall
1. Name of journal listed in cover letter
2. Pages are numbered consecutively
3. All typing is double spaced
4. The appropriate number of copies is sent
5. A copy is kept by the author
6. Any sections to be set in small type should be marked
7. Running title and key words included
8. Footnotes included with indications in text of where they belong
9. All measurements are provided in appropriate units
B. Tables and figures
1. Each is on a separate sheet
2. Each is numbered consecutively using appropriate numbers (e.g., arabic, roman)
3. Appropriate formats are used according to journal’s style
4. Places to insert each are shown in text
5. High-quality photographs are enclosed with the first author’s name written on the back along the edge. Consider putting a small piece of tape over the name to prevent smudging or transfer of ink
6. Top of photographs are marked, if necessary
7. Color figures or photographs are discussed with editors before mailing
8. All technical details in the Instructions to Authors are followed
9. Separate figure legends are prepared and all symbols and marks are explained
10. Permission for reproduction is obtained and indicated in the text, and a copy is enclosed with the manuscript
C. References
1. Correspondence of names and years with those given in the text, or, all numbers in the text correspond with correct reference citations
2. No references are listed that are not cited and vice versa
3. Abbreviations of journal titles follow the journal’s policy
4. Sequence and style of references follow journal’s policy

TABLE 102.7 Objectives and techniques of reading galley proofs
Objectives
1. Confirm that all paragraphs, tables, figures, titles, and major sections are included and are sequenced in the correct order
2. Confirm that all sentences, words, and specific details (e.g., footnotes, table legends, references) of the original manuscript are included in the proof
3. Determine that the placement and size of tables and figures are acceptable
4. Confirm that there are no typographical errors present
5. Determine if any modifications are required to the proof
Techniques to achieve objectives
1. Skim original and proof to confirm that all parts of the article are included
2. Read proof for content (meaning) and compare with original after every few lines or sentences
3. Review layout of tables and figures for appropriateness and suitability
4. Read galley proof line by line. Pronounce each word out loud or silently, mentally dividing multisyllable words into individual syllables to confirm that their spelling is correctᵃ
5. Add clarifications, information, references, “notes added in proof,” or modify statementsᵇ
ᵃ It is often useful to cover all galley proof lines that are either above or below the line being read. Read each line of the galley proof forwards and/or backwards (i.e., read the words in each line from right to left) or alternate between lines, reading one forwards and the other backwards. Reading the lines forward should not give the reader any meaning of the words read that relate to the subject of the text. This reading should be of isolated words or groups of words. If this type of reading is not possible or easy for the reader to achieve, then it is preferable to read each line backwards.
ᵇ Publishers desire to keep changes by the authors to an absolute minimum.

PROOFING MANUSCRIPTS
Proofing the final copy of an article prior to printing is an activity that requires concentration and attention to detail. Therefore, it should be done in a suitable environment. For most people, this means relative quiet and freedom from distractions or other competing thoughts.

Objectives
The objectives of proofing an article are to confirm (and correct if necessary) that (1) nothing is deleted, (2) sentences flow smoothly and achieve the desired clarity, (3) incorrect words or typographical errors are
PUBLISHING DATA AND EVALUATING LITERATURE
not present, (4) tables and figures are inserted in the correct place and are of appropriate size, and (5) there is consistency throughout the article in punctuation, grammar, and details of format. Under exceptional circumstances, a “Note Added in Proof” is appropriate. This should only be included if it provides important new information for understanding the article. Three useful approaches to proofing are described:

Method A: Thrice Through
Read the proof three times using a different technique each time. The first time the proof is read, confirm inclusion of all paragraphs, sentences, tables, and figures. The second time, read each sentence for meaning and inclusion of all words, confirming that punctuation is correct. The third time read each word individually or in groups of two to three words, checking only for spelling. This last reading may be performed by reading each line of the text (and tables) either forwards or backwards. Ensure that if the words are read forward the person proofing the manuscript focuses on the spelling of isolated words rather than on the meaning or context of the words. One of the most effective means of ensuring that each word is spelled correctly in the proof is to read each line backwards, mentally dividing multisyllable words or even reciting the words out loud. This time-consuming technique is not usually necessary in most situations.

Method B: Twice Through
Read the proof twice. During the first reading think about both of the objectives described above for the first two separate readings. During the second time through the proof, read each word or pair of words as described in the third reading described above.

Method C: Twice Through
Read the proof twice. The initial two readings described in the first approach listed should be followed and the third approach for reading the proof is omitted.

As indicated above, there are various techniques used for proofing an article, and it is usually advantageous to use several of them. Different types of articles often benefit from use of different proofing techniques or other combinations of these techniques. Several are listed in Table 102.7. Many authors regard proofing as a bothersome chore that should be handled by their assistant, secretary, or publisher. This process is certainly also performed by publishers, although the quality of their proofing varies. More importantly, proofing is a final opportunity for the author to confirm that each word chosen is correct and appropriately conveys each thought with clarity and accuracy. Large parts of an article should not be rewritten during proofing, but changing a single word or phrase where necessary may substantially improve the clarity of an important point.

EDITING (i.e., COMPILING) A BOOK
Although most medical books identify one or more editors, their major function is to compile chapters written by various authors. The editors serve as content
TABLE 102.8 Forms for tracking progress of a book project
A. Soliciting authors
Chapter number
Prospective authors (in order of desirability)
Chapter title
Telephone number
Called (date)
Issues
Date to respond
Letter with prospectus sent?
Editor’s to do
Date to complete
B. Handling draft manuscripts (or chapter outlines)
Chapter number
Chapter title
Senior author
Draft received
Status of draft
Status of chapter
C. Handling final manuscript
Chapter number
Final manuscript rec’d (date)
Acknowledgement sent (date)
Name and title OK? (date)
Is the content OK?
Are the references OK?
Communication with authors
Is style OK for publisher?
Copy made for publisher
PREPARING ARTICLES FOR PUBLICATION
editors in that they review and evaluate the content of each chapter and may request changes in content or even style. Even though some editors correct grammatical errors, this function is appropriately handled at a later stage by copy editors.

Many approaches may be used to compile a book including (1) collecting manuscripts presented at a conference, (2) transcribing talks presented at a conference, (3) asking one’s friends and colleagues to contribute manuscripts on a general or specific theme, and (4) creating a table of contents and single approach to be used in organizing and writing each chapter before approaching prospective authors. If the last method is being considered, then obtaining an outline from each author is strongly advised. It is easier for authors to modify their approach if they have only written a one- or two-page outline than if they have already prepared a 20- or 30-page typewritten manuscript. To expedite progress on the preparation of a book, a number of forms may be created to follow the progress of various components. Three related forms are shown in Table 102.8.
CHAPTER 103
Systems to Evaluate Published Data

Using Personal versus Standard Approaches
  Critiquing One’s Own Data versus Those of Others
Common Problems
Informal Approaches to Evaluating Publications
  Questions to Pose
Systems to Evaluate Publications
  Simple Systems
  More Formal Systems
There are several reasons why most individuals closely involved with clinical trials read and evaluate publications of other clinical investigators. These reasons include the desire (1) to evaluate how reliable and relevant others’ data and conclusions are, (2) to learn new information, hypotheses, and developments, and (3) to determine how other trials may affect the interpretation of their own trials that have been previously conducted or may be conducted in the future. Other trials may provide information on problems to avoid, methods to use, pointers to consider, and possible hypotheses or interpretations to test. It is therefore important to judge the quality of trials and to differentiate between trials that are well and poorly conducted, interpreted, and reported.

USING PERSONAL VERSUS STANDARD APPROACHES

Most people who read the medical literature develop individual skills in evaluating articles through direct experience, discussions with colleagues, and information acquired in other ways. Few individuals ever approach this topic in a systematic manner. With time, experience, and practice, the need for formal training on this subject usually disappears. Nonetheless, there are occasionally reasons why experienced individuals desire to perform a systematic evaluation of published reports. This chapter presents a series of published systems to evaluate medical reports. Proposals that relate to a coding system for clinical trials (e.g., Bellamy, 1984) or to a classification of clinical trials (e.g., Bailar et al., 1984b) are not discussed. The approaches presented can be used by newcomers to medical science in critiquing the literature. It is hoped that this presentation will also provide pointers to experienced readers.

Critiquing One’s Own Data versus Those of Others

There are two major differences between critiquing one’s own interpretation of data and critiquing the published results of another investigator. The first is the bias one usually has when evaluating one’s own work and the difficulty of viewing it objectively. The second concerns the limitation in quantity of data and ancillary information that is available about published trials. Apart from these factors, the approaches one uses are generally similar. As a result of the general similarity, most of the techniques, methods, and checklists presented in previous chapters about how to develop and critique one’s own interpretation, plus how to prepare an article for publication, are also relevant for this chapter on evaluating published data and results.
COMMON PROBLEMS
In reading articles in any area of medicine, it is helpful to be familiar with common problems that occur in publications. Since there is an extremely large number of possible problems that could be discussed, only a few broad areas are mentioned. Serious shortcomings may be found in any section of a publication and often reflect an absence of information rather than a distortion, bias, or error in what was presented. Table 103.1 lists problems that begin when the authors initially decide to prepare an article. Few problems are described that relate to clinical trial design (e.g., inadequate power) or to the conduct of a trial (e.g., excessive amount of missed patient visits and data). Problems relating to data analysis are not presented in this table. Trials published in prestigious journals have been shown to include a high percentage of weak trial designs (Fletcher and Fletcher, 1979). It is crucial to differentiate the quality of a clinical trial’s design from the quality of its conduct and also from the quality of the interpretation of data obtained. Any one or two of these elements may be excellent, but unless all three are at a high standard the clinical trial results will not be completely convincing. Various permutations of the adequacy of a clinical trial are possible, based on assessment(s) of the components of these three elements.

TABLE 103.1 Common problems with clinical publications that relate to their preparation
A. Introduction
1. Insufficient description of the objectives
2. Excessive amount of background information
B. Methodology
1. Insufficient detail or inadequate description of information presented
2. Omitting information on certain topics (e.g., trial management)
C. Results
1. Presenting derived data without adequate information on actual numbers or raw data
2. Presenting data in an ambiguous manner
3. Presenting insufficient efficacy data to allow the reader to arrive at his or her own interpretation
4. Presenting insufficient data on adverse reactions to address many basic questions
D. Discussion
1. Some discussions lack conciseness and a clear organization
2. Results may only be compared with other papers that are favorable to the author's interpretation
3. All of the factors expected to influence the results obtained may not be considered
4. Data may be extrapolated to patient populations with an insufficient basis for such statements
5. Too many tangential issues may be presented

INFORMAL APPROACHES TO EVALUATING PUBLICATIONS
Most people who read the medical literature follow an informal approach. Each clinical trial requires that a different set of questions be kept in mind by the reader. Many of these questions are basic ones that the reader brings to any article, and others are suggested by the nature of the article itself. This section comments only on the former (generic) type of questions. Each reader has probably formulated a personalized series of generally similar questions that are posed (either consciously or not) before, during, and after reading an article. In fact, reaching a decision to read (or even skim) an article means that it has already passed a significant hurdle, since scientists and clinicians are only able to read a relatively small number of available articles. Most readers use multiple systems to decide which articles to read. There are many sources of information listing which articles are available. Some of these sources also present abstracts of the articles.

Questions To Pose
When reading an article, and often after finishing it, most readers pose questions. Some general questions that apply to most papers are listed in Table 103.2. These questions refer to the evaluation of the article’s contents, to how the article affects the reader, and to a decision about the procedures that the reader will follow after completing the article. The most general aspects of a clinical trial to evaluate are:
1. What is the value of the trial in terms of new knowledge?
2. What is the overall quality of the data obtained?
3. Is the quantity of data obtained sufficient to address the objectives and to support the conclusions?
4. Were the analyses appropriately chosen and performed?
5. Are the interpretations and conclusions justified?
6. Are the extrapolations reasonable?
7. Will this article affect the reader’s research, clinical practice, or other activity?
To obtain the answers to these questions each person develops his or her own approach. Rennels et al. (1987) developed a computer-based model. In this approach an expert was asked to think aloud as he read an article. He related certain basic elements to the context in which the clinical trial was conducted. For example, (1) “what type of patients seek care at the hospital where the research was done? (2) what is the track record of the author? (3) how qualified are the allied specialities that are involved in patient care but are not the subject of investigation, for example, postoperative nursing care? (4) what are the exact technical details for the treatments being compared (e.g., two trials may compare the same medicines but the dose and dosing schedules might differ)?”

SYSTEMS TO EVALUATE PUBLICATIONS
Numerous systems have been proposed and used to evaluate published articles. These systems vary greatly in complexity, intended purpose, and usefulness; they are described here as simple and more formal systems. A spectrum is shown in Fig. 103.1. It is useful to use one (or more) systems to judge or evaluate publications when:
1. It is done as a learning experience and the goal is to familiarize oneself (or one’s students) with one or a few scales.
2. One (or more) scales are used to evaluate grants.
3. One is interested in placing a few codes or comments on each paper or report to assist in filing or to help remember one’s assessment at a later date.
4. The purpose is to judge the quality of the papers to be included in a meta-analysis or traditional type of review.
5. Editors of journals wish to evaluate manuscripts submitted.

All scales used should be validated. Validation methods are discussed in Chapter 43.

TABLE 103.2 Selected questions to consider while reading and evaluating clinical publications
A. Relating to the clinical trial design, methodology, and conduct
1. What type of trial was reported (e.g., case report, historical control, double-blind randomized control)?
2. Was the primary objective reasonable and worthwhile to evaluate?
3. Was the trial design appropriate to address the primary objective?
4. Was the protocol well conceived?
5. Were the assumptions reasonable?
6. Were the definitions used acceptable?
7. Were the endpoints used clinically relevant?
8. Is the method measuring what it claims to have measured?
9. Were the methods and research tools used validated, and are they well accepted?
10. Were the methods and research tools used appropriately, and was their degree of variability presented?
11. How was the trial conducted (e.g., careful attention to detail, sloppily), and can the technical quality be assessed?
12. Were there any glaring problems in the trial?
13. What was the magnitude of protocol violations, and how much influence does it have on the interpretation(s)?
14. Was the trial protocol changed during its conduct?
B. Relating to the data collection, analysis, and interpretation
1. Were the data collected, processed, and analyzed well statistically?
2. If some data were collected from two or more sources, did they differ, by how much, and why? What are the probable reasons for this, and are values for variability given?
3. Were sufficient data collected, and were the data well presented in the article?
4. Is there agreement between results obtained by various techniques, people, and methods?
5. Are the results similar to those from other trials?
6. Are there any significant omissions in the data collected?
7. Were positive results related to the major trial objective(s) or only to secondary ones or to a subgroup analysis?
8. Were reasonable interpretations considered, and do they make sense clinically?
9. Were the extrapolations that were made reasonable?
10. Overall, was the paper convincing?
C. Relating to the reader
1. Are there findings that impact on the reader's previous trials?
2. Are there findings that impact on the reader’s planned (or current) trials?
3. Are there conclusions or findings in the paper that should affect the reader’s present medical practice or other activities?
D. Relating to the reader’s processing and storage of the paper
1. Is there anyone I know who should read this material? If so, send a copy with a note or letter or merely identify the reference in a letter
2. Do I want to have a copy of this paper in my files? If so, make a photocopy, tear out the article, or request a reprint
3. Do I want to write comments on my copy or elsewhere?
4. What is the best way to file this paper? Is it necessary to cross index this reference for ease of later retrieval?
FIG. 103.1 Spectrum of methods used to evaluate articles in the medical literature. [Figure labels: Informal Readings; Informal Critique Plus Questions Posed; Personal System of Reading & Evaluation*; Formal Methods Without Rating Scale Proposed**; Formal Methods With Numerical Rating Scale of Quality**; General Method Suitable for Any Paper; Specific Method for One Medicine, Type of Medicine, or Disease.] *, also varies along a similar continuum from informal to formal approaches; **, these methods may or may not be validated. The degree of validation also varies along a continuum.

Simple Systems

There are a few simple systems that can be used to review papers qualitatively. These approaches may be used informally, or they may be printed on separate forms, written in the margins of the paper itself, or entered in a computer file. The specific checklists presented in Chapter 102 to plan one’s own publication may also be used in evaluating published data to ensure that all relevant parts are included. Journal reviewers could be provided with checklists if journals establish a policy to include certain types of information or presentations.
1. A few simple types of evaluations that may be used in reading the literature include an assessment of the overall value of the paper for the reader based on a scale of (1) “relevant” or “not relevant” (i.e., a two-point scale), (2) “great importance, some importance, little importance, no importance” (i.e., a four-point scale), or (3) “1 to 10” (i.e., a ten-point scale). Other scales may be used. The score may be written on the front of the paper, on index cards, or entered in a computer. Scores may be categorized and then filed by the article’s title, subject, author, or another category. Computer filing systems allow rapid cross indexing through use of multiple key terms, whereas filing in drawers or cabinets usually restricts an individual to use of only the single most appropriate term chosen at that time.
2. A few comments may be handwritten on the article or on a separate card, or comments may be filed on a computer disk, without any reference to a scoring system.
3. A simple evaluation form may be preprinted on blank pages. This approach may be useful to some authors who wish to retrieve information when preparing or reviewing an article or for other purposes. Two examples of a simple form that may be used in conjunction with a computer (or without one) are:
A.
Reference
Key terms
Objectives of clinical trial
Were objectives achieved?
Implications for my research
Implications for my medical practice
Implications for medical practice (in general)
Problems with paper
How important is the paper for me?
Category used for filing
B.
Area (aspect) of study reviewed
Key terms
Points made
Problems/shortcomings/qualifications
Impression(s)

Among the most common problems encountered in the literature are clinical trials in which the number of patients enrolled is too small to have adequate power. Young et al. (1983) have published sample size nomograms that easily allow someone who is interpreting clinical data to determine whether negative clinical studies have enrolled an adequate number of patients.
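The question of whether a negative trial enrolled enough patients can also be checked with a short calculation. The sketch below is not the nomogram method of Young et al.; it substitutes the standard normal approximation for comparing two proportions with equal group sizes. The function name, the response rates (30% versus 50%), and the group sizes are hypothetical values chosen only for illustration.

```python
from math import sqrt
from statistics import NormalDist


def power_two_proportions(p_control, p_treated, n_per_group, alpha=0.05):
    """Approximate power of a two-sided z-test comparing two proportions.

    Assumes equal group sizes and the normal approximation; this is an
    illustration, not a reproduction of any published nomogram.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    p_pooled = (p_control + p_treated) / 2
    # Standard error under the null hypothesis (no difference).
    se_null = sqrt(2 * p_pooled * (1 - p_pooled) / n_per_group)
    # Standard error under the alternative (the assumed true rates).
    se_alt = sqrt(
        (p_control * (1 - p_control) + p_treated * (1 - p_treated)) / n_per_group
    )
    difference = abs(p_treated - p_control)
    # Probability of a significant result in the direction of the true
    # difference; the negligible opposite tail is ignored.
    return std_normal.cdf((difference - z_crit * se_null) / se_alt)


if __name__ == "__main__":
    # Hypothetical example: 30% response on control, 50% on treatment.
    for n in (20, 50, 100):
        print(n, round(power_two_proportions(0.30, 0.50, n), 2))
```

Under these assumed rates, a trial with 20 patients per group has only a small chance of detecting the difference, so a "negative" result from such a trial says little about whether the medicine works.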
More Formal Systems

There are numerous published checklists, series of questions, or criteria that various authors have used or proposed for use in evaluating the clinical literature. A number of these systems (see Table 103.3) are presented in their entirety.

TABLE 103.3 Selected authors who have proposed criteria, checklists, or other methods to evaluate clinical studiesᵃ
1. Mahon and Daniel (1964), see text
2. Lionel and Herxheimer (1970), Fig. 103.2
3. Horwitz and Feinstein (1979), Table 103.4
4. Levine (1980), Fig. 103.3
5. Chalmers et al. (1981), Fig. 103.4
6. University of Rochester Clinical Pharmacology Group (Weintraub, 1982), Fig. 103.5
7. DerSimonian et al. (1982), Table 103.5
8. Haynes et al. (1983), Fig. 103.6
9. Bailar et al. (1984a), Table 103.6
10. Evans and Pollock (1985), Table 103.7
11. Meinert (1986), Table 103.8
ᵃ Suggestions are also presented by the author in this chapter and in the checklists shown in Chapter 102. Other checklists and methods to evaluate published trials are discussed in the text.

Method of Mahon and Daniel. Mahon and Daniel (1964) proposed a four-step method to evaluate reports of medicine trials. Their process consists of applying the following four criteria:
1. Were adequate controls used?
2. Were treatments randomized?
3. Were medicine effects measured objectively? This usually involves double-blind techniques.
4. Were results analyzed statistically?
Their publication states that these questions are meant to provide a simple method for practicing physicians and others who are relatively unskilled in analyzing clinical publications. They used this straightforward approach to evaluate 203 articles published in the Canadian Medical Association Journal over a 5-year period. Their method has been used by others in assessing the quality of clinical publications (e.g., Reiffenstein et al., 1968).

Method of Lionel and Herxheimer. Lionel and Herxheimer (1970) presented a checklist (Fig. 103.2, pg. 779) that they used to evaluate 141 clinical trials in four medical journals. The conclusion of their checklist provides a final assessment of whether the article is definitely acceptable, probably acceptable, or unacceptable. This method has been used by others (e.g., Ravikiran et al., 1980).

Method of Horwitz and Feinstein. Horwitz and Feinstein (1979) proposed a set of 12 standards to apply to retrospective case-control research. These standards are listed in Table 103.4 and were used to evaluate 85 clinical trials.

Trial Assessment Procedure Scale (TAPS) Method of Levine. This method was proposed by Dr. J. Levine in 1980 to evaluate the quality of a study (J. Levine, personal communication). The form used is more detailed than most of the others and is one of the few that contains a scoring system. It is relatively easy and rapid to use, especially after some practice. This form is shown in a slightly abridged form (Fig. 103.3, pg. 780) and may readily be used to rate protocols as well
TABLE 103.4 Methodological criteria of Horwitz and Feinstein for judging case-control research
1. Predetermined method (i.e., the established method should be chosen prior to obtaining and analyzing the data)
2. Specification of the agent (i.e., the precise definition of what constitutes exposure to the purported causal agent must be identified, and this should be done prior to obtaining and analyzing the data)
3. Unbiased data collection [i.e., the individual who collects the data should be unaware of the objective(s) of the clinical trial as well as the identity of the patient as case or control]
4. Anamnestic equivalence (i.e., the differences in a patient's ability or incentive to recall previous exposure to the purported causal agent should be minimized between the cases and controls)
5. Avoidance of constrained cases (i.e., bias is to be prevented in the choice of cases by applying standards that may affect the makeup of cases more than controls. Criteria for exclusions should be applied to both cases and controls)
6. Avoidance of constrained controls (i.e., bias is to be prevented in the choice of controls by applying standards that may affect the makeup of controls more than cases. Criteria for exclusions should be applied to both cases and controls)
7. Equal diagnostic examination (i.e., diagnosis of the disease should be made with similar procedures and criteria in both cases and controls)
8. Equal diagnostic surveillance (i.e., diagnosis of the disease should be made in patients who have been exposed to similar prehospital surveillance procedures and criteria in both cases and controls)
9. Equal demographic susceptibility (i.e., confirmation should be made of comparable demographic characteristics in both cases and controls)
10. Equal clinical susceptibility (i.e., confirmation should be made of equal number or magnitude of risk factors for the disease in both cases and controls)
11. Avoidance of protopathic bias (i.e., it is possible that a disease is present in a subclinical and unrecognized form prior to use of the suspected causative agent. This agent may be wrongly associated with the disease at a later date. Alternatively, if an early, unrecognized manifestation of the disease leads to a certain treatment, then the eventual manifestation of the disease may be blamed on the treatment)
12. “Community control” for Berkson's biasᵃ (i.e., this bias occurs because people who are both exposed and diseased are more likely to be admitted to hospital than other groups)
Headings are reprinted by permission of American Journal of Medicine from Horwitz and Feinstein (1979).
ᵃ This criterion was stated to be optional.
as published clinical trials. Copies are available from Dr. Jerome Levine, Maryland Psychiatric Research Center, University of Maryland, P.O. Box 21247, Catonsville, MD 21228.

Method of Chalmers, Smith, Blackburn, Silverman, Schroeder, Reitman, and Ambroz. A checklist was proposed by Chalmers et al. (1981) that can be used to evaluate clinical trials (Fig. 103.4, pg. 787). This checklist is more detailed than that of Lionel and Herxheimer (Fig. 103.2). In addition to this list, the authors proposed an index of randomized clinical trial (RCT) quality that yields a single number that is indicative of the overall quality of the trial evaluated. This score is based on the checklist answers for their forms numbered 2, 3, and 4. Readers are referred to their paper for information on their detailed method. A variation based on this approach by Poynard et al. (1989) uses fewer items (n = 14) and fewer responses for each item (n = 3).

Method of The University of Rochester Clinical Pharmacology Group. The checklist of the University of Rochester Clinical Pharmacology Group (Weintraub, 1982) is shown in Fig. 103.5, pg. 790 and provides a column for evaluation/comment on most items. The overall assessment of an article is in terms of four important questions: (1) are there any “fatal” errors that invalidate results? (2) are the conclusions justified? (3) are results “significant” or “extrapolatable”? and (4) do benefits outweigh risks? This checklist was originally based on that of Lionel and Herxheimer (Fig. 103.2) but was greatly modified.

Method of DerSimonian, Charette, McPeek, and Mosteller. DerSimonian et al. (1982) selected 11 specific criteria by which to evaluate published clinical trials. These criteria are listed in Table 103.5 and are directed toward assessing the methods sections of clinical reports, in particular the design of the trial and the analyses performed. DerSimonian et al. reviewed 67 trials published in four journals within an 18-month span. This method has been used by others (Emerson et al., 1984). The major point about this method is that it is limited to evaluating the methodology of a clinical trial and does not evaluate the manner in which a trial was actually performed, nor does it evaluate the data interpretations themselves.

Method of Haynes, Sackett, and Tugwell. Haynes et al. (1983) published a flow diagram of various steps involved in evaluating a clinical trial (Fig. 103.6, pg. 792). Their approach is directed toward busy practitioners and academicians who must rapidly process and sift through many articles and can only read a small number.

Method of Bailar, Louis, Lavori, and Polonsky. There are often sound reasons why some clinical trials cannot include adequate or even any internal controls.
TABLE 103.5 Criteria proposed by DerSimonian, Charette, McPeek, and Mosteller to evaluate the design and analysis of clinical trials
1. Eligibility criteria (i.e., information explaining the criteria for admission of patients to the clinical trial)
2. Admission before allocation (i.e., information used to determine whether eligibility criteria were applied before knowledge of the specific treatment assignment had been obtained)
3. Random allocation (i.e., information about random allocation to treatment)
4. Method of randomization (i.e., information about the mechanism used to generate the random assignment)
5. Patients' blindness to treatment (i.e., information about whether patients knew which treatment they were receiving)
6. Blind assessment of outcome (i.e., information about whether the person assessing the outcome knew which treatment had been given)
7. Treatment complications (i.e., information describing the presence or absence of adverse reactions or complications after treatment)
8. Loss to follow-up (i.e., information about the numbers of patients lost to follow-up and the reasons why they were lost)
9. Statistical analyses (i.e., analyses going beyond the computation of means, percentages, or standard deviations)
10. Statistical methods (i.e., the names of the specific tests, techniques, or computer programs used for statistical analyses)
11. Power (i.e., information describing the determination of sample size or the size of detectable differences)
Reprinted by permission of New England Journal of Medicine from DerSimonian et al. (1982).
Pollock (1985) used their method to rate 56 randomized controlled clinical trials and found that only 16 papers scored over 70 points. Their system of 33 rules is shown in Table 103.7.

Method of Meinert. Meinert’s (1986) method poses “questions to consider when assessing a published report,” and is less formal than some of the other recent methods. No rating scale is proposed (Table 103.8).

Other Methods. A number of other forms and checklists have been proposed to evaluate clinical studies. Some of these systems have been proposed or used to evaluate a particular type of clinical trial, such as trials concerned with contrast media (Andrew, 1984), perinatal medicine (Tyson et al., 1983), or lung cancer (Nicolucci et al., 1989). Other checklists are general (Hines and Goldzieher, 1969; Nyberg, 1974). A series of 29 separate reviews of the quality of clinical trials was presented by Hemminki (1982). Of these 29 reviews, 10 were general in regard to overall quality,
TABLE 103.7 Method of Evans and Pollock for
evaluating controlled clinical studies Yes
No
2 2 3 5 5 5 2 5 5 2 1 3 4 4 2
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
3 4 4 3 3 3
0 0 0 0 0 0
3 2 1
0 0 0
4
0
2 3 3 2 2 3 3 2
0 0 0 0 0 0 0 0
1. Does it appear that the intervention was applied with the primary intent of affecting the outcome reported, such as cure, survival, or the incidence of complications? 2. Is it clear that the authors’ intent to analyze and report their findings preceded the generation of the data (though the data may have been gathered for a different primary purpose)? 3. Have the authors shown that they had a plausible rationale for their interpretation of the data before the data were inspected or the analysis was undertaken? 4. Would the results have been interesting (i.e., publishable) if they had been different in some important sense from those actually obtained? Would “negative” findings have had a chance of being reported? 5. Do the authors present reasonable grounds for generalizing their results?
Design and conduct Is the sample defined? Are exclusions specified? Are known risk factors recorded? Are therapeutic regimens defined? Is the experimental regimen appropriate? Is the control regimen appropriate? Were appropriate investigations carried out? Are endpoints defined? Are endpoints appropriate? Have numbers required been calculated? Was patient consent sought? Was the randomization blind? Was the assessment blind? Were additional treatments recorded? Were side effects recorded? Analysis Withdrawals: Are they listed? Is their fate recorded? Are there fewer than 10%? Is there a comparability table? Are risk factors stratified? Is the statistical analysis of proportions correct? Is the statistical analysis of numbers correct? Are confidence intervals reported? Are values of both test statistics and probability given? In negative trials, is the type II error considered? Presentation Is the title accurate? Is the abstract accurate and helpful? Are the methods reproducible? Are the sections clear-cut? Can the raw data be discerned? Are the results credible? Do the results justify the conclusions? Are the references correct?
Reprinted by permission of New England Journal of Medicine from Bailar et al. (1984a).
Reprinted by permission of British Journal of Surgery from Evans and Pollock (1985).
Such trials often rely on external controls from outside the trial (e.g., historical controls). Bailar et al. (1984a) proposed a series of five questions to be used in assessing the value (i.e., strength of evidence) provided by externally controlled trials. Their questions, which were used to evaluate a group of 20 publications in The New England Journal of Medicine, are listed in Table 103.6. Method of Evans and Pollock. This method was based on that of Chalmers et al. (1981) and utilizes a score system with 100 points maximum. Evans and TABLE 103.6 Method of Bailar, Louis, Lavori, and
Polansky for evaluating clinical trials with weak or absent internal controls
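The Evans and Pollock method awards a fixed number of points for each "Yes" answer, up to a maximum of 100. The arithmetic can be sketched as follows. This is only an illustration: the function name and data layout are hypothetical, and the pairing of each weight with its question assumes the weights in Table 103.7 are listed in the same order as the questions.

```python
# Points for a "Yes" answer, in the order the questions appear in Table 103.7.
# ASSUMPTION: the weight-to-question pairing follows the printed column order.
EVANS_POLLOCK_WEIGHTS = {
    "Design and conduct": [2, 2, 3, 5, 5, 5, 2, 5, 5, 2, 1, 3, 4, 4, 2],
    "Analysis": [3, 4, 4, 3, 3, 3, 3, 2, 1, 4],
    "Presentation": [2, 3, 3, 2, 2, 3, 3, 2],
}


def evans_pollock_score(answers):
    """Return the total score (0 to 100) for one published report.

    `answers` maps each section name to a list of booleans
    (True = "Yes", False = "No"), one per question in that section.
    """
    total = 0
    for section, weights in EVANS_POLLOCK_WEIGHTS.items():
        replies = answers[section]
        if len(replies) != len(weights):
            raise ValueError(f"{section}: expected {len(weights)} answers")
        # A "No" answer scores 0, so only "Yes" answers add points.
        total += sum(w for w, yes in zip(weights, replies) if yes)
    return total


# Hypothetical report answering "Yes" to every question.
all_yes = {s: [True] * len(w) for s, w in EVANS_POLLOCK_WEIGHTS.items()}
print(evans_pollock_score(all_yes))
```

A report answering "Yes" to every question reaches the 100-point maximum; the 70-point level mentioned in the text is simply a threshold applied to this total.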
PUBLISHING DATA AND EVALUATING LITERATURE
TABLE 103.8 Questions to consider when assessing a published report
A. General
  1. Does the manuscript indicate the purpose of the trial and rationale for the treatments studied?
  2. Does the trial address a relevant question?
  3. Is the paper in a peer review journal?
B. Investigators
  1. Have the investigators done any previous work related to the trial being reported? If so, do you consider the work to have been of good quality?
  2. Does the paper indicate the location and institutional affiliation of the various members of the team responsible for carrying out the trial?
  3. Does the team include people with appropriate training and expertise for conduct and analysis of the trial?
C. Sponsorship and structural
  1. Does the paper indicate how the trial was funded?
  2. Is the role of the sponsor in designing, directing, or analyzing the trial indicated? (especially important in trials involving proprietary products)
  3. Are the key investigators, especially those responsible for analyzing the results and for writing the paper, independent of the sponsor?
  4. Did responsibility for data collection and analysis in the trial reside with a group of people who were independent of the sponsor?
  5. Did the authors recognize the possibility of conflicts of interest for study members (especially important if the report concerns a proprietary product) and do they indicate steps taken to avoid such conflicts?
  6. If the trial involved multiple centers, does the paper list all affiliated centers and the functions performed by each?
  7. For multicenter trials, does the paper list committees, along with their membership and a brief description of their functions?
D. Trial design
  1. Outcome measure
    a. Is the primary outcome measure identified?
    b. Does it have clinical relevance?
    c. If multiple outcomes are used, is it clear which one is of primary importance in the trial?
  2. Treatments
    a. Is there a defined test treatment?
    b. Is the test treatment of any interest and does the administration of it correspond roughly to the way it would be used in general practice?
    c. Is there an appropriate control treatment?
  3. Trial population and sample size
    a. Are the eligibility and exclusion criteria for patient entry into the trial stated?
    b. Is there a discussion of the types I and II error protection provided with the observed sample size?
  4. Allocation
    a. Is the method of treatment allocation described?
    b. Does it appear to have been free of selection bias?
    c. Does it meet the general conditions specified in Section 8.4 of the book? (a)
  5. Data collection procedures
    a. Is the data collection schedule described?
    b. Are the patients in the test and control-treated groups enrolled and followed over the same time frame?
    c. Does the design include adequate provisions to protect against bias in the administration of the treatment and in measurement of the outcome, as evidenced by the use of appropriate masking procedures or other safeguards?
E. Trial performance
  1. Was a recruitment goal for the trial stated? Was it achieved?
  2. Was the missed examination rate low?
  3. Was the dropout rate low?
  4. Was the dropout rate among the treatment groups about the same?
  5. Was it possible to locate all patients, including dropouts, at the end of the trial to update key morbidity and mortality data? If not, was the number who could not be located small and about the same for each treatment group?
  6. Did all the patients enrolled meet the eligibility criteria of the trial? If not, was the number who did not small?
Reprinted from Meinert (1986) with permission of Oxford University Press.
(a) Section 8.4 of Meinert’s (1986) book.
6 were specific to a particular disease(s), 6 were specific to a particular medicine(s), and 7 to a particular medicine(s) in a given disease or group of diseases. Some of these reviews utilized their own systems for reviewing the literature.
Hines and Goldzieher (1969) concluded their article with the statement: “The Romans had a motto for the marketplace: Caveat emptor. The reader of reports of clinical investigation can have one, too: Caveat lector.”
CHECKLIST FOR ASSESSING A THERAPEUTIC TRIAL REPORT
Response codes: Y = Yes; N = No, or not clear; D = Doubtful

Author and Journal reference:
Title:

1. AIM: specific □, or not clear □; single □, or multiple □

2-4. DESCRIPTION OF SUBJECTS, DRUG ADMINISTRATION, ETC. ARE THE FOLLOWING SPECIFIED? (Y / N)
2-1 Healthy subjects or patients?
2-2 Volunteers or not?
2-3 Age
2-4 Sex
2-5 Race
2-6 Criteria of selection
2-7 Contraindications
2-8 Presence of disease other than that treated
2-9 Whether additional treatments were given
    If they were, are they described?

3-1 Daily dose
3-2 Frequency of administration
3-3 Hour(s) o’clock when given
3-4 Route of administration
3-5 Source of drug (e.g., name of manufacturer)
3-6 Presentation (e.g., tablet, syrup, etc.)
3-7 Timing of drug administration in relation to factors affecting absorption (e.g., meals)
3-8 Checks that drug was taken
3-9 Other therapeutic measures (if drug was not used)
    If yes, are they described?
3-10 Total duration of treatment

4-1 Persons who made the observations
4-2 Inpatient/outpatient
4-3 Setting (e.g., one or several hospitals/clinics/wards)
4-4 Dates when trial began and was completed

5. METHODS AND DESIGN (Y / N)
5-1 Are the methods of assessing therapeutic effects clearly described?
5-2 Were these standardized methods?
5-3 Were control measures used to reduce variation that might influence the results?
    If yes, specify: Identical ancillary treatment; Run-in period; Concurrent controls; Patient his own control; Stratification or matched subgroups; Other
5-4 Were controls used to reduce bias?
    If yes, specify: “Blind” patients; “Blind” observers; Random allocation; Matching dummies

6. ASSESSMENT OF THE TRIAL
6-1 Were the subjects suitably selected in relation to aims (see sections 1 and 2)?
6-2 Were the methods of measurement valid in relation to the aim?
6-3 Were they adequately standardized?
6-4 Were they sufficiently sensitive?
6-5 Was the design appropriate?
6-6 Were enough subjects used?
6-7 Was the dosage appropriate?
6-8 Was the duration of treatment adequate?
6-9 Were carry-over effects avoided or allowed for?
6-10(a) If no controls were used were they unnecessary?
6-10(b) If controls were used were they adequate?
6-11 Was comparability of treatment groups examined?
6-12 Are the data adequate for assessment?
6-13(a) If statistical tests were not done were they unnecessary?
6-13(b) If statistical tests are reported
    (i) Is it clear how they were done?
    (ii) Were they appropriately used?

ARE THE CONCLUSIONS JUSTIFIED? No / Partially / Completely
Is the trial ACCEPTABLE? No / Probably yes / Definitely yes
COMMENTS

FIG. 103.2 Checklist of Lionel and Herxheimer to evaluate published articles. (Reprinted by permission of British Medical Journal from Lionel and Herxheimer, 1970.)
DEPARTMENT OF HEALTH AND HUMAN SERVICES
PUBLIC HEALTH SERVICE
ALCOHOL, DRUG ABUSE, AND MENTAL HEALTH ADMINISTRATION
NATIONAL INSTITUTE OF MENTAL HEALTH

TRIAL ASSESSMENT PROCEDURE SCALE (TAPS)

Coding fields: STUDY; TYPE; TREATMENT; STATUS; PHASE; RATER; DATE
NAME OF RATER:
AFFILIATION:
ADDRESS:
PHONE:
Trial Title and/or Identification Number:
Type, Source and Date of Report:
Name of Investigational Drug(s) or Treatment(s):
Trial Status: □ Planned □ Completed
Trial Phase: □ Early II □ Late II □ III □ IV
No. of Treatment Groups:
No. of Subjects in Trial:
INSTRUCTIONS

OVERVIEW. The Trial Assessment Procedure Scale (TAPS) is a systematic technique for evaluating the quality of a clinical trial. The technique involves an analysis of the report (e.g., protocol, completed study report, or journal article) in terms of many descriptive characteristics or attributes which reflect trial quality. The attributes are logically clustered into eight categories so that the quality of various components of the trial can be independently assessed. The intent is to rate the quality of the trial without regard to findings concerning treatment efficacy or safety. Each category is composed of two to five related attributes. For example, the first category, RESEARCH PROBLEM, is composed of two attributes labeled Background and Rationale and Objectives and/or Hypothesis. A separate rating page is provided for each attribute category, which lists the constituent attributes, along with examples of the kinds of factors that should be taken into account in evaluating each respective attribute. Examples are representative and do not exhaust all of the factors that may be considered when rating a given attribute.

RATING PROCEDURE. The TAPS rating procedure is identical for all attributes. The rating is made on a five-point scale: Totally Satisfactory, Satisfactory, Marginal, Unsatisfactory, or Totally Unsatisfactory. For each attribute, the rater is asked to check the appropriate box in the QUALITY RATING column, reflecting how well the trial under evaluation measures up on that attribute. The rater is encouraged to comment on the basis for the rating of any attribute, and space has been provided for this purpose.

RATING PROBLEMS. Because of differences in the nature and content of clinical trial reports, it may be difficult to meaningfully rate a given attribute. Thus, under the column heading RATING PROBLEMS AND EXPLANATION, two boxes are provided. The first box, Not Applicable, pertains to the applicability of the attribute with respect to either the type of trial or type of report (e.g., protocol, journal article) through which the trial is being evaluated. This box is checked only when the attribute does not have meaning either in the context of the given trial or type of report, and therefore a quality rating should not be made. The second box, Poor Documentation, concerns the availability and clarity of the written description needed to make a meaningful judgment about the given attribute; this box should be checked when the appropriate information is either lacking or inadequate. Even after making a quality rating in response to the attribute, this box can be checked to indicate that the relevant documentation is poor.

Raters are advised to keep the following rules in mind. Lack of confidence about expertise in a specific area should not prevent the rater from making a rating. If the rater is having difficulty because the attribute is not applicable to the trial itself or to the type of report, the Not Applicable box is checked and explanation made in the space provided. No quality rating is given to that attribute. If the rater is having difficulty because no information is given in the report to permit a judgment on a given attribute (when it would have been applicable), the Poor Documentation box is checked. If the information given is unclear, incomplete, or has to be inferred, a quality rating is made and the Poor Documentation box is checked. Thus, whenever an attribute quality rating cannot be provided, either the Not Applicable or Poor Documentation box must be checked and an explanation given. If a quality rating is made, and the rater wishes to indicate that the documentation is poor, the box Poor Documentation is checked.

GLOBAL RATING. Once ratings have been completed for all attributes, a global rating is made indicating the rater’s assessment of the OVERALL QUALITY or “goodness” of the entire trial. The overall rating should take into account all the individual ratings across the attribute categories, as well as any other considerations that may have been noted. For example, if the quality rating of an attribute is so unacceptable that the trial is “fatally flawed,” then a very low global rating would be given even though other attributes had been judged Satisfactory or better. As shown on the global rating sheet, the rating is made on a 0-to-100 scale, where 0 is “Very Poor” and 100 is “Very Good.” Any number between 0 and 100 can be assigned. Below the global rating scale, space is available for comments about the overall rating for the trial. This area can also be used for additional comments about the trial, any of the ratings, or other relevant considerations.

RATING THE TRIAL REPORT. After the rater has become familiar with the format and content of TAPS, it is recommended that the report of the trial to be rated first be read in its entirety. Then, the attributes should be rated in the sequence presented in TAPS, referring back to various sections of the trial report as often as necessary.

SCORING TAPS. After ratings have been completed, a series of numerical scores can be derived according to the SCORING INSTRUCTIONS.
FIG. 103.3 Trial Assessment Procedure Scale (TAPS) system of Dr. Jerome Levine to evaluate published articles.
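The numerical scoring that TAPS derives from its ratings (set out in the SCORING INSTRUCTIONS at the end of the form) can be sketched as follows. The point values and the averaging steps are taken from the form itself; the function names, the use of None for an attribute that received no quality rating, and the example ratings are assumptions made for illustration.

```python
# Point values stated in the TAPS SCORING INSTRUCTIONS.
RATING_POINTS = {
    "Totally Satisfactory": 100,
    "Satisfactory": 75,
    "Marginal": 50,
    "Unsatisfactory": 25,
    "Totally Unsatisfactory": 0,
}


def category_score(ratings):
    """Mean Attribute Score over the attributes actually rated.

    `ratings` is a list of rating labels; None marks an attribute with no
    quality rating (Not Applicable or undocumented). Returns None when no
    attribute in the category was rated, i.e., there is no Category Score.
    """
    scores = [RATING_POINTS[r] for r in ratings if r is not None]
    if not scores:
        return None
    return sum(scores) / len(scores)


def total_score(categories):
    """Mean of the available Category Scores (usually eight of them)."""
    available = [
        score
        for score in (category_score(r) for r in categories.values())
        if score is not None
    ]
    if not available:
        return None
    return sum(available) / len(available)


# Hypothetical ratings for three of the eight categories.
example = {
    "I. Research Problem": ["Satisfactory", "Marginal"],
    "II. Research Management": [None, None, None, None],
    "III. Design Characteristics": ["Totally Satisfactory"] * 5,
}
print(total_score(example))
```

Averaging within each category before averaging across categories gives every category the same weight in the Total Score, regardless of how many attributes it contains. The Global Rating is a separate judgment by the rater and is not computed from these scores.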
SYSTEMS TO EVALUATE PUBLISHED DATA
I. RESEARCH PROBLEM

ATTRIBUTE
A. Background and Rationale: appropriate presentation of previous relevant research findings; justification of research need and basis/rationale for hypothesis to be tested.
B. Objectives and/or Hypothesis: clarity of objectives, meaningfulness and precision of research question; relevancy of hypothesis for claims to be made.

QUALITY RATING (for each attribute): □ Totally Satisfactory □ Satisfactory □ Marginal □ Unsatisfactory □ Totally Unsatisfactory
RATING PROBLEMS AND EXPLANATION (for each attribute): □ Not Applicable □ Poor Documentation
COMMENTS:
II. RESEARCH MANAGEMENT

ATTRIBUTE
A. External Review/Monitoring: adequacy of scientific and ethical review by a qualified independent group; use of external monitoring of research practices, conditions, and progress.
B. Site Selection: explicitness and objectiveness of clinical site selection; appropriateness of treatment and assessment setting.
C. Personnel: appropriateness of staff organizational structure, e.g., adequacy of supervision; professional skill of staff members for performing patient care, assessment ratings and data analysis functions.
D. Trial Period: appropriateness of length of planning phase, data collection period and analysis period, as well as the appropriateness of the intervals between completion of data collection and initiation of analysis.

QUALITY RATING (for each attribute): □ Totally Satisfactory □ Satisfactory □ Marginal □ Unsatisfactory □ Totally Unsatisfactory
RATING PROBLEMS AND EXPLANATION (for each attribute): □ Not Applicable □ Poor Documentation
COMMENTS:
FIG. 103.3 (continued)
III. DESIGN CHARACTERISTICS

ATTRIBUTE
A. Independent Variables: choice of factors included in the design, e.g., treatments and choice of drugs within treatments (experimental drug, standard drug, placebo), patient diagnosis, periods of administration, etc.
B. Design Configuration: appropriateness and precision of experimental design, e.g., use of a crossover or independent groups design, as required; avoidance of effects which may be confounded with the treatment factor, e.g., treatment order, time, setting, etc.
C. Subject Assignment: adequacy of sample size (within each treatment group) for testing hypothesis; appropriate assignment of subjects to treatment groups by proper randomization, matching, sequential procedures, etc.
D. Control of Treatment-Related Bias: adequacy of treatment and assessor blinding (e.g., double or triple blind); comparability of dosage schedule, dosage form, time of administration; provision to break blind for individual patient without breaking blind for all patients; utilization of explicit rules or criteria for dealing with marked improvement or worsening of subject illness, or occurrence of treatment-emergent side effects of toxicity.
E. Control of Extraneous Variables: suitability of research environment, e.g., absence of marked investigator bias, “Hawthorne Effect,” hopeless atmosphere, etc.; reduction of pretreatment bias by, e.g., the use of stratification, avoidance of carry-over effects, etc.; limitation or control of other concurrent therapies including drugs other than those under study; utilization of explicit rules or criteria for dealing with such problems as intercurrent illness, change of residence, etc.

QUALITY RATING (for each attribute): □ Totally Satisfactory □ Satisfactory □ Marginal □ Unsatisfactory □ Totally Unsatisfactory
RATING PROBLEMS AND EXPLANATION (for each attribute): □ Not Applicable □ Poor Documentation
COMMENTS:
FIG. 103.3 (continued)
IV. TREATMENT CHARACTERISTICS

ATTRIBUTE
A. Description: specification of relevant characteristics of experimental and control treatments, e.g., presumed clinical actions, side effects, duration of actions, pharmacological profile; rationale for choice of comparison agent, i.e., standard drug or placebo.
B. Dosage: adequacy of dosage levels, equivalence of dosage across standard and test drugs, criteria for dosage adjustment; appropriateness of schedule and pattern (fixed or variable) of administration with respect to duration of action, research design and phase of the trial; appropriateness of form or route of administration, degree of consistency with pharmacological properties.
C. Duration: necessity for, and appropriate length of, drying-out (pre-treatment) period, drug administration (treatment) period, and follow-up (post-treatment) period.

QUALITY RATING (for each attribute): □ Totally Satisfactory □ Satisfactory □ Marginal □ Unsatisfactory □ Totally Unsatisfactory
RATING PROBLEMS AND EXPLANATION (for each attribute): □ Not Applicable □ Poor Documentation
COMMENTS:
V. SUBJECT CHARACTERISTICS

ATTRIBUTE
A. Selection Criteria: clarity, explicitness, appropriateness and general acceptance of criteria used to diagnose patients and to include or exclude them in study.
B. Sample Representativeness: correspondence between sample and population in terms of illness-related characteristics (e.g., pattern and severity of psychopathology) and demographic/situation-related characteristics (e.g., age, sex, acuteness of illness, inpatient/outpatient status, etc.).
C. Subject Induction: appropriateness and consistency of subject recruitment procedure, and degree of adherence to requirements for obtaining informed, voluntary subject consent.

QUALITY RATING (for each attribute): □ Totally Satisfactory □ Satisfactory □ Marginal □ Unsatisfactory □ Totally Unsatisfactory
RATING PROBLEMS AND EXPLANATION (for each attribute): □ Not Applicable □ Poor Documentation
FIG. 103.3 (continued)
V. SUBJECT CHARACTERISTICS (CONTINUED)

ATTRIBUTE
D. Subject Compliance: adequacy of techniques to check for and assure medication ingestion as well as compliance with assessment procedures and schedule.

QUALITY RATING: □ Totally Satisfactory □ Satisfactory □ Marginal □ Unsatisfactory □ Totally Unsatisfactory
RATING PROBLEMS AND EXPLANATION: □ Not Applicable □ Poor Documentation
COMMENTS:
VI. DATA COLLECTION

ATTRIBUTE
A. Scope of Assessment: adequacy of breadth of measures for assessing areas such as: sample (identification and recording of degree of illness, demographic information, etc.); efficacy (assessment to demonstrate improvement or worsening of patient illness); side effects (assessment to detect expected and unexpected treatment-emergent symptoms); dosage (recording of dosages actually administered); safety (use of appropriate laboratory tests which are specific to drug under study, general for assessing bodily functions, etc.); other relevant areas (depending upon illness, drug, or special assessment techniques, e.g., EEG, blood levels, behavioral measures, etc.).
B. Assessment Measures: appropriateness of measures and instruments selected with respect to areas being assessed; extent that rating scales and recording forms have been previously shown to be sensitive, reliable, and valid; degree to which measures have been generally used and accepted.
C. Assessment Schedule: appropriateness of frequency and schedule of ratings, collection of baseline measures prior to start of trial, etc.
D. Conduct of Assessment: consistency throughout trial of application of rating and assessment techniques; attempt to maintain same rater for any given patient throughout trial; evidence to establish interrater reliability and rating validity within context of this trial.

QUALITY RATING (for each attribute): □ Totally Satisfactory □ Satisfactory □ Marginal □ Unsatisfactory □ Totally Unsatisfactory
RATING PROBLEMS AND EXPLANATION (for each attribute): □ Not Applicable □ Poor Documentation
COMMENTS:
FIG. 103.3 (continued)
VII. DATA ANALYSIS

ATTRIBUTE
A. Data Preparation: adequacy of data collection techniques (case report and recording forms, etc.); data checking, editing, and verification techniques; use of standard computer analysis program vs. hand calculations, checking of intermediate data processing steps.
B. Data Presentation: clarity, meaningfulness and utility of data description, organization, and display; appropriate level of data detail or summarization.
C. Statistical Analysis: correctness of application of statistical procedures, e.g., data transformations, pooling of data, handling of dropouts and missing data, etc.; use of statistical model appropriate to research design, e.g., parametric vs. nonparametric, analysis of variance vs. analysis of covariance, etc.; degree of statistical follow-through, e.g., use of multiple comparisons after finding a significant F ratio, etc.
D. Data Synthesis: demonstration of systematic approach to answering research question; appropriate analysis of functional relationships (e.g., dose-response curves).

QUALITY RATING (for each attribute): □ Totally Satisfactory □ Satisfactory □ Marginal □ Unsatisfactory □ Totally Unsatisfactory
RATING PROBLEMS AND EXPLANATION (for each attribute): □ Not Applicable □ Poor Documentation
COMMENTS:
VIII. CONCLUSIONS AND INTERPRETATION
ATTRIBUTE
A. Focus: degree that conclusions (findings) and interpretation (explanation) presented are specific and clear; meaningful correspondence between conclusions and research hypothesis.
B. Logic: extent that conclusions are unambiguously supported, both logically and statistically, by the data collected in the study.
RATING PROBLEMS AND EXPLANATION
QUALITY RATING □ □ □ □ □
Totally Satisfactory Satisfactory Marginal Unsatisfactory Totally Unsatisfactory
□ □ □ □ □
Totally Satisfactory Satisfactory Marginal Unsatisfactory Totally Unsatisfactory
□ Not Applicable
□ Poor Documentation
□ Not Applicable
□ Poor Documentation
FIG. 103.3 (continued)
VIII. CONCLUSIONS AND INTERPRETATION (CONTINUED)
ATTRIBUTE
C. Application: appropriateness of generalization of conclusions from sample in study to the larger population (i.e., claims are not overstated or over-generalized beyond the supporting data); appropriateness of interpretation with regard to other published findings.
QUALITY RATING □ □ □ □ □
Totally Satisfactory Satisfactory Marginal Unsatisfactory Totally Unsatisfactory
RATING PROBLEMS AND EXPLANATION □ Not Applicable
□ Poor Documentation
COMMENTS:
OVERALL QUALITY OF TRIAL
100 Very Good
75 Good
50 Borderline
25 Poor
0 Very Poor
Assign any number between 0 and 100 in accordance with the scale at the left.
GLOBAL RATING
COMMENTS:
SCORING INSTRUCTIONS
This page provides instructions and space to derive numerical scores from the TAPS ratings. The specific scoring procedures are as follows:
CATEGORY SCORES
These are derived from the attribute Quality Ratings previously recorded.
(1) Convert each attribute Quality Rating in every category (I-VIII) into an equivalent numerical Attribute Score as follows: Totally Satisfactory = 100; Satisfactory = 75; Marginal = 50; Unsatisfactory = 25; and Totally Unsatisfactory = 0. Record these scores on the appropriate lines in the table below.
(2) Sum the Attribute Scores for each category and enter in the table. Note and enter the number of attributes rated in each category. Divide the Attribute Score Sum by the Number of Attributes Rated and record the resultant Category Score on the designated line in the table. (If no attributes were rated in a given category, there would be no Category Score.)
Table columns: CATEGORY; ATTRIBUTE A, B, C, D, E; Attrib. Score Sum; No. Attrib. Rated; Category Score
TOTAL SCORE
Sum all Category Scores derived above and divide by the number of Category Scores (usually eight):
Sum of Category Scores ÷ Number of Categories = Total Score ____
DIFFERENCE SCORE
Compute the numerical difference (i.e., absolute value) between the Global Rating and the Total Score. Record the Difference Score ____
POOR DOCUMENTATION SCORE
Count the total number of attributes where the Poor Documentation box was checked and write the number ____
FIG. 103.3 (continued)
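The scoring arithmetic in the instructions above can be sketched in a few lines of Python. This is only an illustration and not part of the published form: the function names (`category_score`, `taps_scores`) and the example ratings are hypothetical, and the Poor Documentation count is simply passed in by the caller.

```python
# Numerical equivalents of the Quality Ratings, as listed in the scoring instructions.
RATING_SCORES = {
    "Totally Satisfactory": 100,
    "Satisfactory": 75,
    "Marginal": 50,
    "Unsatisfactory": 25,
    "Totally Unsatisfactory": 0,
}


def category_score(ratings):
    """Mean Attribute Score for one category, or None if no attribute was rated."""
    if not ratings:
        return None
    return sum(RATING_SCORES[r] for r in ratings) / len(ratings)


def taps_scores(ratings_by_category, global_rating, poor_documentation_count=0):
    """Derive Category Scores, Total Score, and Difference Score.

    ratings_by_category maps a category label (e.g. "VII") to the list of
    Quality Ratings recorded for its attributes (hypothetical input layout).
    """
    category_scores = {
        cat: category_score(ratings) for cat, ratings in ratings_by_category.items()
    }
    # Categories with no rated attributes have no Category Score and are skipped.
    rated = [s for s in category_scores.values() if s is not None]
    if not rated:
        raise ValueError("no attributes were rated in any category")
    total_score = sum(rated) / len(rated)
    return {
        "category_scores": category_scores,
        "total_score": total_score,
        "difference_score": abs(global_rating - total_score),
        "poor_documentation_score": poor_documentation_count,
    }


if __name__ == "__main__":
    example = {
        "VII": ["Satisfactory", "Marginal", "Satisfactory", "Totally Satisfactory"],
        "VIII": ["Marginal", "Unsatisfactory", "Totally Unsatisfactory"],
    }
    print(taps_scores(example, global_rating=60, poor_documentation_count=2))
```

Because each Category Score is an average, a category with many attributes carries the same weight in the Total Score as a category with few.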
Form 1: BASIC DESCRIPTIVE MATERIAL ID#
Study#
Reader
Title
Journal or publication
Peer reviewed: Yes
No
Unknown
Year of publication
1.1 Biostatistician: 1. Author 2. Credits 3. Neither 4. Unknown
1.2 Country: 1. U.S. 2. U.K. 3. Scand. 4. Other 5. Unknown
1.3 Center status: 1. Single Center 2. Cooperative study < 5 grps. 3. Cooperative study > 5 grps.
1.4 Source of financial support (multiple items possible)
A. N.I.H. or M.R.C.: 1. Yes 2. No
B. V.A.: 1. Yes 2. No
C. Drug Co.: 1. Yes 2. No
D. Other: 1. Yes 2. No
E. None given
1.5 Source of patients (multiple items possible)
A. University: 1. Yes 2. No
B. Public: 1. Yes 2. No
C. Private: 1. Yes 2. No
D. Clinic (no hosp.): 1. Yes 2. No
E. Industry: 1. Yes 2. No
F. None given
1.6 Number in: Controls; Tr. grp. 1; Tr. grp. 2; Tr. grp. 3
1.7 Type of trial: 1. Simple comparative 2. Restricted (blocking) 3. Stratified 4. Crossover 5. Factorial 6. Other 7. Unknown
1.8 Significance of findings
A. Major endpoints
1. + + Statistically significant (treatment)
2. + Trend (treatment)
3. 0 No difference
4. - Trend (control)
5. - - Statistically significant (control)
6. Significant in author’s opinion but no statistical test with probability stated.
B. Minor endpoints
1. + + Statistically significant (treatment)
2. + Trend (treatment)
3. 0 No difference
4. - Trend (control)
5. - - Statistically significant (control)
6. None
1.9 Side effects, statistical finding
1. + + Statistically significant
2. + Trend
3. 0 No side effects
4. N.A.
FIG. 103.4 Checklist of Chalmers, Smith, Blackburn, Silverman, Schroeder, Reitman, and Ambroz to evaluate published articles. N.I.H., National Institutes of Health (US); M.R.C., Medical Research Council (UK). (Reprinted by permission of Elsevier North Holland, Inc., from Chalmers et al., 1981.)
Form 2: THE STUDY PROTOCOL
ID# ____________ Study# ____________ Reader ____________
2.1 Selection description: 1. Adequate 2. Fair 3. Inadequate
2.2 Number of patients seen and reject log: 1. Yes 2. Partial 3. No 4. Unknown
2.3 Withdrawals: 1. List given 2. No withdrawals 3. No list 4. Unknown 5. > 15% withdrawals for long-term studies and > 10% for studies lasting less than 3 months
2.4 Therapeutic regimens definition: 1. Adequate 2. Fair 3. Inadequate
2.5 A. Control regimen (placebo) appearance: 1. Same 2. Different 3. Unstated 4. N.A.
B. Control regimen (placebo) taste: 1. Same 2. Different 3. Unstated 4. N.A.
2.6 Randomization blinding3: 1. Yes 2. Partial 3. No 4. Unknown
Method of random blinding: 1. Envelope 2. Pharmacy 3. Other 4. Unknown 5. N.A.
2.7 Blinding of patients3: 1. Yes 2. No 3. Unknown 4. N.A.
2.8 Blinding of physicians re therapy3: 1. Yes 2. Partial 3. No 4. Unknown 5. N.A.
2.9 Blinding of physicians and patients re results: 1. Yes 2. Partial 3. No 4. Unknown
2.10 Prior estimate of numbers (endpoints selected, diff. of clinical interest, α and β estimated): 1. Yes 2. No 3. Unknown
2.11 Testing randomization: 1. Yes 2. Partial 3. No 4. Unknown
2.12 Testing blinding: 1. Yes 2. Partial 3. No 4. Unknown 5. N.A.
2.13 Testing compliance: 1. Yes 2. Partial 3. No 4. Unknown 5. N.A.
2.14 Biological equivalent: 1. Yes 2. No 3. Unknown 4. N.A.
FIG. 103.4 (continued)
Form 3: STATISTICAL ANALYSIS
ID # ____________ Study # ____________ Reader ____________
3.1 On major endpoints
____1. If possible, test statistic and observed probability value are stated
____2. If observed probability level given, but test statistic value not stated
____3. If test statistic but not observed probability value given
____4. If neither test statistic nor observed probability level given
3.2 Posterior β estimates of observed difference for negative trials
____1. Yes
____2. Mentioned and necessity for more patients
____3. No
____4. N.A.
3.3 Statistical inference
A. Confidence limits
____1. Yes ____2. No ____3. N.A.
B. Life-table or time-series analysis
____1. Yes ____2. Shown but incorrect ____3. No ____4. N.A.
C. Regression analysis correlation
____1. Yes ____2. No ____3. N.A.
3.4 Appropriate statistical analysis
____1. Excellent ____2. Good ____3. Poor ____4. Inadequate
3.5 Handling of withdrawals
____1. Analyzed several ways
____2. Included in original randomization
____3. Counted as end result at time of withdrawal
____4. Discarded
____5. Changed groups
____6. Unknown
____7. No withdrawals/N.A.
3.6 Side effects, statistical discussion
____1. Adequate ____2. Fair ____3. Poor ____4. N.A.
3.7 Proper retrospective analysis
____1. Good ____2. Partial ____3. None
3.8 Blinding of statistician or analyst re results
____1. Yes ____2. No ____3. Unknown ____4. N.A.
3.9 Multiple looks considered
____1. Yes ____2. Fixed sample size, no look ____3. No
Form 4: PRESENTATION OF RESULTS
ID # ____________ Study # ____________ Reader ____________
4.1 Dates of starting and stopping accession
____1. Yes ____2. No
4.2 Results of prerandomization
A. Data Analysis
____1. Adequate ____2. Fair ____3. Inadequate
B. Prognostically favoring
____1. Treatment ____2. Control ____3. Equivocal ____4. Unknown
4.3 Tabulation of events employed as endpoint for each treatment
____1. Presented ____2. Not presented
4.4 Timing of events
____1. Complete ____2. Available to reader ____3. Neither
FIG. 103.4 (continued)
CHECKLIST FOR ASSESSING CLINICAL DRUG TRIALS
Title, author, and journal (or book)
General Aspects
Aim: Efficacy, toxicity, both:
Explanatory or pragmatic:
Phase: □ I □ II □ III □ IV □ Other
Type: □ Experiment □ Survey: prospective or retrospective (case-control)
□ Therapeutic □ Prophylactic □ Symptomatic
Design: □ Within patient (crossover, Latin square, randomized blocks) □ Between patient (non-crossover, one way, parallel groups)
Objective: Major
Subsidiary
Specific Characteristics (columns: Discussed; Evaluation/Comment)
Population
Type (patients, healthy subjects)
Criteria for inclusion
Criteria for exclusion
Comparability of treatment groups:
Demographic (age, race, sex)
Prognostic criteria
Stage of disease
Response to therapy
Associated disease
Generalizability (on basis of similarity of participants to patient population)
Consent
Treatments compared
Rationale for dose:
Based on weight, amount/time
Fixed or flexible (If flexible, what is rule of dosage adjustment?)
One dose level or dose-response?
Dosage form and/or route of administration
Interval between doses and/or hours of administration
Ancillary therapy (Forbidden? Or if permitted, standardized, measured?)
Duration of therapy
Setting (hospital, home)
Source of drug (ie, lot, bioavailability, changed formulation of standard medication, resemblance of control and test preparation)
FIG. 103.5 Checklist of The University of Rochester Clinical Pharmacology Group to evaluate published articles. (Reprinted by permission of Drug Therapy from Weintraub, 1982.)
Specific Characteristics, continued (columns: Discussed; Evaluation/Comment)
Experimental design details
Controlled?
Controls:
Active or inactive
Concurrent or historical
Assignment of treatments:
Randomized (balanced?)
Matched
Stratification or minimization
“Run-in” or “washout” period
Timing (schedule of visits, laboratory tests, assessments)
Starting and stopping of treatment
Compliance
Participant (with treatments)
Investigators (with protocol)
Data collection
Measurements used to assess goal attainment (Appropriate type? Sensitive enough? Done at appropriate time?)
Observers (Who? Same or variable?)
Method of collection (Standard? Reproducible?)
Adverse effects
Subjective (Volunteered or elicited?)
Laboratory tests for toxicity (Done at appropriate time?)
Control of bias
“Blind” observers
“Blind” subjects
Evaluator blind but physician treating participant not blind
Statistician blind (analysis done with treatment groups unidentified)
Data analysis
Comparability of treatment groups (at beginning and at end of study)
Missing data
Drop-outs (drop-ins)
Reasons
Effect on results
Compliance taken into account?
Statistical tests
If differences observed, are they clinically meaningful?
If no difference shown, is this due to statistical power of study?
Overall Assessment
Any “fatal” errors that invalidate results? □ Yes □ No
Conclusions justified? □ Yes □ No
Results significant or extrapolatable? □ Yes □ Perhaps □ No
Do benefits outweigh risks? □ Yes □ Perhaps □ No
General comments: _________________
FIG. 103.5 (continued)
Look at the title: interesting or useful?
If No, go on to the next article. If Yes, continue.
Review the authors: good track record?
If No, go on to the next article. If Yes or Don’t know, continue.
Read the summary: if valid, would these results be useful?
If No, go on to the next article. If Yes, continue.
Consider the site: if valid, would these results apply in your practice?
If No, go on to the next article. If Yes, continue.
What is your intent?
To find out whether to use a (new) diagnostic test on your patients: Was there an independent, “blind” comparison with a “gold standard” of diagnosis?
To learn the clinical course and prognosis of a disorder: Was an “inception cohort” assembled?
To determine etiology or causation: Were the basic methods used to study causation strong?
To distinguish useful from useless or even harmful therapy: Was the assignment of patients to treatments really randomized?
If No, go on to the next article. If Yes, read the “Patients” and “Methods” sections and ask the questions below.
Diagnostic Test
Did the patient sample include an appropriate spectrum of disease?
Was the referral pattern described?
Was reproducibility and observer variation determined?
Was the term “normal” sensibly defined?
If part of a cluster of tests, was the test’s overall contribution assessed?
Was the test described well enough to permit its exact replication?
Was the “utility” of the test determined?
Prognosis
Was the referral pattern described?
Was complete follow-up achieved?
Were objective outcome criteria used?
Was the outcome assessment blind?
Was adjustment for extraneous prognostic factors carried out?
Causation
Is there evidence from true experiments in humans?
Is the association strong?
Is the association consistent from study to study?
Is the temporal relationship correct?
Is there a dose-response gradient?
Does the association make epidemiologic sense?
Does the association make biologic sense?
Is the association specific?
Is the association analogous to a previously proved causal association?
Therapy
Were all clinically relevant outcomes reported?
Were the study patients recognizably similar to your own?
Were both statistical and clinical significance considered?
Is the treatment feasible in your practice?
Was complete follow-up achieved?
FIG. 103.6 Flow chart of Haynes, Sackett, and Tugwell t o evaluate published articles. (Reprinted by permission of Archives of Internal Medicine from Haynes et al., 1983.)
CHAPTER 104
Meta-Analysis

Introduction
Definitions
Types of Data Pooling
Purposes of Meta-Analysis
Steps to Follow in Conducting a Meta-Analysis
Step 1: Develop a Protocol for Conducting the Meta-Analysis
Step 2: Identify Sources of Information To Be Used
Step 3: Define the Criteria To Be Used for Selecting Trials to Include in the Meta-Analysis
Step 4: Read, Classify, Code, Score, Evaluate, and Choose Papers for Inclusion
Step 5: Adjudicate any Differences Among Readers
Step 6: Develop Questions, Procedures, and Analyses to Pose of Trials Included in the Meta-Analysis
Step 7: Read Papers and Answer Questions Listed on the Checklist
Step 8: Adjudicate Differences Among Readers on Quantitative Measurements
Step 9: Combine Results Obtained and Quality-Assure the Data
Step 10: Analyze the Results
Step 11: Interpret the Results
Step 12: Report the Results
The Case Against Meta-Analysis
Heterogeneity of Clinical Trials
Approaching the Truth
Can Tons of Garbage Yield a Single Diamond?
Potential and Actual Problems of Meta-Analysis
Examples of Meta-Analyses
Future of Meta-Analysis

INTRODUCTION
Meta-analysis is a relatively new method for reviewing and combining results from multiple clinical trials. Whereas other review methods usually involve narrative discussions of individual trials, a meta-analysis systematically aggregates and quantifies results.
Definitions
Meta-analysis is described (Sacks et al., 1987) as “a new discipline that critically reviews and statistically combines the results of previous research.” A meta-analysis, therefore, is the process of systematically combining and evaluating the results of clinical trials that have been completed or terminated. A further simplification states that meta-analysis is a quantitative summary of research.
Types of Data Pooling
The term “pooling” is often observed in the meta-analysis literature, but it is used with two separate meanings. One refers to combining actual raw data, so that the combined numbers have more power and are more convincing. The second meaning refers to combining the conclusions (e.g., odds ratios) of individual trials to create an overall averaged odds ratio. The author proposes that there is an intermediate level of pooling data that relates to combining summary data of specific groups of patients from multiple trials. Thus, the first level of pooling is that of individual patients’ raw data, the second level is that of summary data of specific groups of patients, and the third level is of overall trial results. Saying that eight of ten clinical trials show medicine A is more effective than placebo is a type of pooling at level three.
Purposes of Meta-Analysis
A scientist or clinician who conducts a meta-analysis may have one or several purposes in mind. These include:
1. To answer research or clinical questions in a systematic way that have not previously been posed. To allow examination of interstudy variations as well as any patterns that may be present in the results.
2. To increase the statistical power for addressing important research or clinical questions. Individual trials with a small sample size are often characterized by low power for detecting meaningful differences; the overall power is increased when data are pooled. This approach also allows subgroup analyses to be conducted with higher power.
3. To serve as a tool to address research or clinical questions when controversy or conflicting data exist. This may take the form of looking at old trials or planning new clinical trials that have the best possibility of addressing an important question using the best methods. A meta-analysis helps determine whether new clinical trials are warranted, and if so, the sample size required to answer definitively the research question.
4. To quantify better the magnitude of certain clinical or nonclinical responses.
5. To decrease biases in addressing specific clinical issues by being more objective, thorough, and systematic.
6. To learn more about which treatments are optimal for different types of patients.
7. To strengthen regulatory submissions for new indications of older medicines or to support an old indication.
Meta-analyses are relevant to conduct when ethical reasons preclude the conduct of additional placebo-controlled clinical trials to address an important research question. Practical reasons, including cost factors also encourage use of a meta-analysis in selected situations.
STEPS TO FOLLOW IN CONDUCTING A META-ANALYSIS
Meta-analysis takes place in the following twelve steps, which will be described in greater detail below.
1. Develop a protocol for conducting the meta-analysis that specifically identifies the objective and describes the methods to be used throughout the study.
2. Identify sources of materials to be searched for relevant clinical trials (e.g., computerized data bases, ad hoc searches, bibliographies from the first two sources). Also determine whether specific scientists or clinicians will be contacted to provide information on unpublished trials or trials published in nonindexed journals.
3. Define the criteria for selecting trials to be included in the meta-analysis. Then develop a checklist to score papers for possible inclusion. When the quality of papers or trials will be rated, choose a scale for this rating.
4. Have two or three independent readers read, classify, code, score, and finally evaluate and choose the group of trials that will be included in the meta-analysis.
5. Adjudicate any differences between readers on whether specific trials should be included in the meta-analysis.
6. Develop a checklist of questions, procedures, and analyses to which papers included in the meta-analysis will be subjected. This step is often conducted as part of step 1 (i.e., protocol development) and the methods to be used are included in the protocol. This step differs from step 3, which is the evaluation of trials for inclusion in the meta-analysis.
7. Have two or three independent readers read the papers and answer questions listed on the checklist that was developed in step 6. These readers are extracting the data to be used in the meta-analysis and should be either all the same as or all different from those who conducted step four.
8. Adjudicate any differences between readers (if appropriate) on their quantitative measurements in their specific trials.
9. Combine results obtained and quality-assure the data.
10. Analyze results of the meta-analysis. Create odds ratios if appropriate.
11. Interpret results.
12. Report results. Adequate documentation should be included to help readers of the report interpret the results, and also to enable future reviewers to repeat the meta-analysis, if desired.
A number of these steps will be described in more detail.
Step 1: Develop a Protocol for Conducting the Meta-Analysis
It is essential to identify specifically the question(s) to be addressed and to determine which measures will be used to address those questions. If any data are to be pooled at levels one or two, the parameters involved must be identified. Creating a protocol for a meta-analysis will help to minimize the introduction of biases into the procedure. Determine, insofar as possible, the statistical tests that will be applied to the data obtained. The protocol should provide general information on many of the subsequent steps.
Step 2: Identify Sources of Information To Be Used
Major sources used to identify potential studies are (1) literature searches of computerized data bases, (2) colleagues, (3) references included in published trials, and (4) unreferenced reports and fugitive literature from academic, private, and government authors (e.g., pharmaceutical company files, government reports). If colleagues or others are contacted, determine if this will be via telephone, letter, or by formal written request sent to members of one or more societies. A published request may also be made either as a letter to the editor or as an editorial in an appropriate journal(s).
Some professionals who conduct a meta-analysis want to fine-comb the medical community to locate all possible studies and data for inclusion in the meta-analysis, including unpublished results. Their reasoning is based on the well-known publication bias toward positive studies. Other professionals choose to limit meta-analyses to published data. A third approach (favored by the author) is to obtain as many trials (published and unpublished) as may be obtained without “superhuman” efforts. All studies (both published and unpublished) should be rated as to quality and then stratified into low- and high-quality categories. Separate meta-analyses of each group of trials should be conducted as well as combined meta-analyses to determine any influence of trial quality on outcomes.
Step 3: Define the Criteria To Be Used for Selecting Trials to Include in the Meta-Analysis
There are several approaches to judging the quality of clinical trials for possible inclusion in a meta-analysis.
Choose a scale for rating quality of papers and trials or develop a checklist for identifying trials to include in the meta-analysis. The simplest approach is to use a scale that has been derived ahead of time. Several standard scales have been proposed as valid indicators of a trial’s quality (see Chapter 103 for examples of many of these scales). Scales may be disease specific, medicine specific, or general. These scales may yield descriptive or general assessments, or a specific numerical value. There is no consensus that one type of scale or any specific scale is best. Nonetheless, it is important to determine if the scoring system used to evaluate a clinical trial’s quality is validated.
1. Does the scale give reproducible results when different people use it to evaluate the same paper?
2. Can the scale pick up differences both within papers or trials and between papers or trials that correlate with accepted differences in medical treatments?
3. How does the scale compare on the above points with other scales used to rate clinical trials?
A drawback of the rating scale approach is that some or many of the trial characteristics that the scale is rating are presented in papers in an extremely brief manner, or not at all. Journals often cut large amounts of material from the very parts of a paper (i.e., methods and results) that are mainly used to judge the quality of a paper. This means that the factor(s) being rated might have been cut from a paper and a rating of “not present” could be made in many categories. Alternatively, the authors may not have submitted adequate details on their trial for it to be fairly judged.
If a scale is used to rate papers, it should be established beforehand whether only papers achieving a score of X will be included in the meta-analysis. Alternatively, the top Y% of papers may be included or the top Z number of papers.
A further approach to quality rating is to speak to the authors of each published paper or to write them a letter indicating those questions that are unanswered in their publication. This approach requires a substantial effort and responses are usually returned slowly, making this method impractical for obtaining additional data on more than a few studies. It is possible to conduct a meta-analysis both with and without poorer quality trials, to see if the outcome is affected.
Typical criteria used in a checklist for determining which clinical trials to include in a meta-analysis are:
1. Published in a journal
2. Published in specific journals
3. Original research
4. Human research
5. Minimum number of patients
6. Minimum age of patients
7. Minimum duration of treatment
8. Minimum dose of medicine
9. Minimum quality score of publication or unpublished report
10. Type of blind (e.g., double blind)
11. Type of clinical trial design (e.g., randomized controlled trial)
Reasons why each rejected paper was not included should be tabulated. This information is often important in interpreting the results of the meta-analysis, as well as in understanding the quality of the trials conducted in a particular therapeutic area.
If one combines a class of medicines in a specific meta-analysis, it may increase any bias present. For example, if anticoagulants are being evaluated, they vary in activity, safety, and other effects. Anticoagulants include aspirin, dipyridamole, warfarin, and ticlopidine.
Step 4: Read, Classify, Code, Score, Evaluate, and Choose Papers for Inclusion
People who rate clinical trials and decide on whether or not to include them in a meta-analysis should be blind both to the identity and type of institution of authors and to the source of funding for the reported research. Such information could bias those who are evaluating papers. Strict criteria for inclusion of trials in the meta-analysis must be established prior to acquiring and reviewing the papers.
Two (or more) independent readers of the papers should use a checklist for each step. When they are not in agreement about whether to include the paper in the evaluation, either one of them is incorrect, or the paper is vague on a particular point(s). The lack of complete information in most published trials is one of the greatest obstacles to conducting an accurate meta-analysis. Any differences between readers should be settled by adjudication.
Ensure that the number of articles evaluated at each part of step 4 is indicated in the final report. This could be accomplished by presenting data for each of the following.
1. Total number of papers read:
2. Number of papers ineligible for inclusion because of X, Y, Z reasons:
3. Total number of papers scored for quality:
4. Total number discarded for A, B, C, reasons:
5. Final number of papers included in the meta-analysis:
Step 5: Adjudicate any Differences Among Readers
The one person (usually) chosen to fulfill this role should be chosen ahead of time and be acceptable to all readers whose differences may require adjudication. An alternative is to have the readers themselves settle any differences that arise.
Step 6: Develop Questions, Procedures, and Analyses to Pose of Trials Included in the Meta-Analysis
This step relates to the methods that are used to address the objective of the meta-analysis. The methods may be simple or complex, loose or rigorous, involve subjective judgment or be totally objective. The methods chosen depend on the nature of the objective and the state of the art in the therapeutic area being evaluated. A typical assessment relates to whether active medicine or placebo is preferable and by how much.
Step 7: Read Papers and Answer Questions Listed on the Checklist
This step should be done by two or three people who work independently. Some or many papers will not furnish answers to all of the questions used in the meta-analysis. In some cases this may be because the clinical trial did not obtain data to address those questions, but in many cases it will be because the data were not published. Therefore, a table or answer sheet of extracted data plus unanswered questions are often sent to the authors, who are requested to confirm data extracted and to provide missing data.
Step 8: Adjudicate Differences Among Readers on Quantitative Measurements
Guidelines may be created so that only differences greater than a specific amount (or percent) require adjudication. This process may be conducted by one or more independent professionals unassociated with the meta-analysis, or it may be conducted by the readers themselves.
Step 9: Combine Results Obtained and Quality-Assure the Data
Numerous approaches to combining data may be considered. The correct approach depends on the protocol, the specific questions posed, and the nature of the data obtained for the meta-analysis. One common analytic approach is to pool all data and evaluate the combined results. Data pooling may be in terms of raw or averaged data from individual patients, groups of patients, or individual studies. However, saying that 9 of 13 clinical trials had negative results and 4 of 13 had positive results ignores the number of patients in each trial and the quality of each trial. Pooling individual patient data is usually impossible because of marked differences between trials in patient inclusion criteria, treatments offered, and trial design. Another approach is to pool the mean results of numerous trials. The following formula may be used for this purpose:
\[
\text{effect size} = \frac{(\text{mean size in the control group}) - (\text{mean size in the treatment group})}{\text{standard deviation in the control group}}
\]
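As a rough illustration of the effect size formula above, the following Python sketch computes it for one trial from raw measurements. The function name `effect_size` and the example numbers are hypothetical, and the use of the sample standard deviation (n - 1 denominator, via `statistics.stdev`) is an assumption, since the text does not say which form of the standard deviation is intended.

```python
from statistics import mean, stdev


def effect_size(control, treatment):
    """Effect size for one trial, following the formula as printed:
    (mean in control group - mean in treatment group) / SD of control group.

    Assumes the sample standard deviation (n - 1 denominator); each group
    needs at least two measurements.
    """
    sd_control = stdev(control)
    if sd_control == 0:
        raise ValueError("control group has zero variance; effect size undefined")
    return (mean(control) - mean(treatment)) / sd_control


if __name__ == "__main__":
    # Hypothetical measurements from a single trial.
    control_values = [10, 12, 14]
    treatment_values = [8, 9, 10]
    print(effect_size(control_values, treatment_values))
```

With this sign convention a positive value means the control group mean is larger than the treatment group mean. Dividing by the control group's standard deviation expresses each trial's difference in standard deviation units, so trials that measured the outcome on different scales can be compared.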
Step 10: Analyze the Results
Analyses may vary greatly between meta-analyses. If studies utilizing different doses or other design aspects are pooled, there probably will be a bias in the type and number of adverse reactions reported. Thus, aggregating adverse reactions (or other outcomes of a trial) may introduce biases or confounding factors. Many published papers claim either positive or negative results, but careful evaluation by the readers participating in the meta-analyses may define outcomes differently. Another factor to consider in analyzing results is that two separate analyses used for one (or more) trial could yield different results. Confidence intervals are often used to present results of multiple clinical trials. This method has the advantage that confidence intervals illustrate the degree of agreement among the trials. This perspective is not obtained when outcomes of clinical trials are described as statistically significant, nonsignificant, or even highly statistically significant. Is it appropriate to average the means of the incidence reported for adverse reactions in different clinical trials (even if weighting is used to consider the numbers of patients evaluated)? Such pooling is impossible if the quality (or design) of each trial is different. How does one include consideration of the quality of each trial when combining results? It is possible to test whether poorer quality trials had different outcomes than better quality trials, in which case the poorer ones can be discarded. It may be, however, that some trials were rated as having a lower quality because the data desired could not be extracted from the publication or report, and not because the trials were poorly designed or conducted.
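To make the point about confidence intervals concrete, here is a minimal Python sketch that computes an odds ratio and an approximate 95% confidence interval for each trial. It is only one possible approach and is not taken from the text: it assumes the common log odds ratio (Woolf) approximation, and the function name `odds_ratio_ci` and the trial counts are hypothetical.

```python
import math


def odds_ratio_ci(events_treatment, n_treatment, events_control, n_control, z=1.96):
    """Odds ratio with an approximate confidence interval for one trial.

    Assumes the log odds ratio (Woolf) approximation; z = 1.96 gives roughly
    a 95% interval. Trials with an empty cell in the 2 x 2 table are rejected
    rather than corrected.
    """
    a = events_treatment
    b = n_treatment - events_treatment
    c = events_control
    d = n_control - events_control
    if min(a, b, c, d) <= 0:
        raise ValueError("every cell of the 2 x 2 table must be positive")
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


if __name__ == "__main__":
    # Hypothetical trials: (events on treatment, n treatment, events on control, n control).
    trials = {
        "Trial A": (10, 100, 20, 100),
        "Trial B": (30, 400, 55, 400),
    }
    for name, counts in trials.items():
        or_, lo, hi = odds_ratio_ci(*counts)
        print(f"{name}: OR = {or_:.2f}, 95% CI {lo:.2f} to {hi:.2f}")
```

Listing the intervals side by side shows how far the trials agree and how precise each one is, which a simple count of significant and nonsignificant trials does not.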
Step 11: Interpret the Results
The clinical interpretation of meta-analyses has sometimes been almost impossible. For example, if all clinical trials are grouped together regardless of their quality, there may be major problems in interpretation. There are a number of commonly encountered reasons for difficult interpretation. Treatment dosages may have been inadequate to yield an effective response. Clinical trials may have been uncontrolled. Combining results from open-label and double-blind controlled trials often obfuscates the overall results. Patients may not have been stratified by either the nature or the severity of their disease. Treatments often work differently in severely ill and mildly ill patients. Furthermore, patients may not have been stratified by their degree of risk for events being measured (e.g., risk of having myocardial infarction). It would not be possible to combine myocardial infarction results from a clinical trial of patients who had a low risk of myocardial infarction with those from a trial where patients had a high risk. A more general problem is that trial designs may have been poor. Obviously, such trials tend to generate invalid data. The endpoints evaluated in the included trials may have been inappropriate or may not have represented state-of-the-art measurements. Also, different endpoints may have been used in different trials. Finally, it may also be important to attempt to determine how much data are missing from the meta-analysis that might influence the results.
Step 12: Report the Results
Reporting involves traditional methods of publication and dissemination of information.
THE CASE AGAINST META-ANALYSIS
Meta-analysis is a sound concept with many practical applications. The following comments are not made as an attack on meta-analysis, but as a statement about its limitations.
Heterogeneity of Clinical Trials
Assume that someone who wishes to conduct a meta-analysis works extremely hard and locates all clinical trials ever conducted on the research question (i.e., the objective of the meta-analysis), both published and unpublished. Examination of those trials reveals that they are quite heterogeneous and addressed different questions. Combining their results may be less valid than identifying the single (or two or three) best clinical trials and accepting its (their) data as the answer to the question. The counterargument is that the best trials cannot always be identified, and even if one or more are identified as best by one person or group, others will not necessarily agree with this assessment. Moreover, the counterargument states, lack of agreement about which is best (i.e., highest validity) is the very reason to do a meta-analysis.
Approaching the Truth
A fallacy pervading this field is that an answer closer to “the truth” will be achieved by combining data from all existing clinical trials on a specific question or issue. The value and validity of a meta-analysis in addressing a specific question has more to do with the quality of the meta-analysis and the quality of the trials combined than with the number of trials, patients, or any other factor.
PUBLISHING DATA AND EVALUATING LITERATURE
Can Tons of Garbage Yield a Single Diamond?
Given the great inaccuracies within clinical trials in methods, presentations, and interpretations, it is absurd to believe that combining poor or mediocre trial results will yield summaries that approach the truth. Combining garbage does not yield diamonds. However, quality is only one factor, and one would not want to include even extremely high-quality trials in a meta-analysis without regard to other criteria. A good example is the assessment of the operative mortality of coronary artery bypass grafts. The standards of surgical technique have improved over the last 20 years, so that older surgical trials should be excluded from a meta-analysis even though the trials may have been extremely well done. However, these trials would be important to include if one were comparing current and older results or plotting mortality trends.
POTENTIAL AND ACTUAL PROBLEMS OF META-ANALYSIS
Most criticisms of meta-analyses in the literature are directed toward the inherent problems of combining data from multiple trials. Some of these problems are listed in Table 104.1, and others are briefly mentioned below. A more complete discussion of methodological issues is given by Furberg and Morgan (1987) and in other articles presented at the same conference (Workshop on Methodologic Issues in Overviews of Randomized Clinical Trials), whose proceedings are published (Yusuf et al., 1987). Boissel et al. (1989) also present a major review of various issues.
TABLE 104.1 Potential problems in conducting a meta-analysis
Important differences may exist among trials combined in terms of the:
1. Question posed in the clinical trial
2. Diagnosis of patients
3. Severity of the patients’ problem(s)
4. Concomitant medicines and diseases
5. Prior treatment of the patients’ problems
6. Doses of medicines used
7. Dosing schedules of treating patients
8. Duration of treatment
9. Quality of the clinical trials’ design (e.g., open-label versus double-blind)
10. Quality of the clinical trials’ conduct
11. Quality of the clinical trials’ statistical analysis
12. Quality and completeness of the clinical trials’ publication or report
13. Parameter(s) used to measure efficacy
14. Definition of a successfully treated patient
15. Types of statistical analyses used
Publication Bias. Investigators are less likely to write up negative than positive results of clinical trials. A survey of 58 investigators who conducted 921 randomized controlled trials found that 96 (21%) of the trials were unpublished and that their positive studies were more likely to be published (77% vs. 42%, p = 0.001) (Chan et al., 1982). Journals, too, are less likely to publish negative trials. Simes (1986, 1987) proposed that this bias could be eliminated by establishing an international registry of all clinical trials that could serve as a primary source of data for future meta-analyses. This subject is discussed further in Chapter 106.
Selection Bias of Articles for Review. Clinical trials excluded from the meta-analyses may be critical to addressing the question posed. Some individuals advocate including only randomized clinical trials, whereas others are not as adamant on this point.
Retrospective Approach. The nature of meta-analysis is retrospective, and such research is less likely to lead to valid results than is prospective research.
Incomplete Data in Source Documents. Few reports or papers of clinical trials have sufficient information to answer fully the questions posed for conducting the meta-analysis. One approach to solve this problem is to contact the authors directly to supply missing data.
Observer Bias. Those who read and evaluate papers may introduce their own biases.
Quality of Data. If any data or analyses are substandard, then the quality of the meta-analysis will be affected.
Interpretation Bias. Reviewer’s bias may affect quality.
Nonindependence of Results. Nonindependence is a statistical issue.
False Sense of Security. The fact that a mathematical approach has been used to combine data across clinical trials makes some people believe that the outcome has greater validity than it truly does.
Data from Abstracts. Given the relatively lower validity of abstracts in general compared with full papers, data from abstracts should not be included in meta-analyses. If included, the meta-analysis should be conducted both with and without those data.
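The recommendation to run the meta-analysis both with and without abstract-derived data amounts to a simple sensitivity analysis. A minimal sketch follows, assuming hypothetical trial records and a plain pooled event rate as the summary measure; the record layout and labels are illustrative only.

```python
# Hypothetical trial records: (source type, events, patients evaluated)
records = [
    ("full_paper", 10, 100),
    ("full_paper", 18, 150),
    ("abstract", 2, 50),
]

def pooled_rate(rows):
    """Total events divided by total patients across the given trials."""
    events = sum(e for _, e, _ in rows)
    patients = sum(n for _, _, n in rows)
    return events / patients

with_abstracts = pooled_rate(records)
without_abstracts = pooled_rate([r for r in records if r[0] != "abstract"])

# A large shift suggests the conclusion depends on the lower-validity data.
shift = with_abstracts - without_abstracts
print(f"with: {with_abstracts:.3f}  without: {without_abstracts:.3f}  shift: {shift:+.3f}")
```

Reporting both figures lets the reader judge how much the abstract-only data move the pooled result.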
Although some authors have challenged the processes and concepts of meta-analyses, this approach is a significant advance toward reaching answers to major medical questions. The procedures used to conduct meta-analyses will undoubtedly change and improve, but the process is here to stay. A study of pooled results was conducted by Baber and Lewis (1982) to address a specific question relating to the time of initiation of β-receptor blocker treatment after a myocardial infarction. The authors pooled clinical trial results separately for those authors who initiated treatment “early” or “late” after onset of pain. They observed a greater reduction in mortality in clinical trials in which treatment was initiated “late.” There are a number of arguments against the practice of pooling data from separate clinical trials. Some of these reasons are listed in Table 104.2. Each of these points has some validity, and they should be considered when this type of exercise is conducted. Nonetheless, each of these points may be strongly debated, and it is the author’s view that the balance is often on the side of pooling and reanalyzing data, providing that the potential drawbacks are considered. Pooling data from clinical trials that consider important therapeutic questions may eliminate ineffective therapies from medical practice more rapidly than would otherwise occur. Nonetheless, many interesting medical questions are not amenable to this approach.
EXAMPLES OF META-ANALYSES
Two specific examples are used to illustrate the methodologies and approaches used in tertiary interpretations. These examples involve the questions:
1. Should antibiotic prophylaxis be used in patients during surgery of the colon?
2. Should anticoagulants be used in patients who have had a recent myocardial infarction?
The question addressed in the first example was considered by Baum et al. (1981). They found that pooling mortality data (one of their two endpoints) from multiple trials enabled them to improve on the original analysis of the data. Each of the individual trials had too few deaths to conduct a valid statistical analysis. They examined 26 trials published from 1965 to 1980. In the most recent publications in these series (14 from 1976 to 1980), three advocated against the use of prophylactic antibiotics, whereas 11 advocated the use of prophylactic antibiotics. The other endpoint they evaluated was abdominal wound infection. They observed that 19 of the 26 trials defined the criteria for wound infection but that differences existed in the operational definitions used. Sixteen trials required the presence of pus in the wound to state that the wound was infected, 5 trials required a positive culture, and 12 specified the time period during which wound infections could be detected. An example in which all available articles on a topic are collected but the data are not pooled is in the same area of antibiotic prophylaxis in surgery (Di Piro et al., 1984). These authors listed trials that presented data on 13 parenteral cephalosporins, but they did not pool and reanalyze the data.
The second example involves the evaluation of anticoagulant medicines in patients who have had a recent myocardial infarction. The efficacy of this treatment was said to be unsettled in 1969 (Gifford and Feinstein, 1969). Between 1969 and 1973, four randomized controlled trials were conducted and demonstrated a lower rate of deaths in treated patients, but only one of these trials had a difference that was statistically significant (see Chalmers et al., 1977). These four trials plus 28 other controlled trials were reexamined, and all data from randomized control trials were pooled. Pooled data on fatality rates were statistically significant in favor of using anticoagulants (Chalmers et al., 1977).
TABLE 104.2 Potential problems in pooling data from clinical trials that used different protocols
1. Definitions of important terms may differ between clinical trials
2. Inclusion criteria and the types of patients enrolled may markedly differ between clinical trials
3. Clinical trial designs or particular aspects of the design (e.g., medicine dosages, medicine regimens, controls, concomitant medicines) differ between clinical trials, and it is difficult to combine data from differently designed trials
4. Endpoints used may differ between clinical trials in the disciplines and methods used to measure effects (e.g., clinical, biochemical, physiological, pathological)
5. Even when the same endpoint is used, the definition of that endpoint may differ, or the tests and parameters used to evaluate the same endpoint may have differed
6. Unpublished clinical trials may have been performed that gave negative results and were not included in the pooled data. This tends to skew combined results in a positive direction
7. The quality of published clinical trials varies, and unless an evaluation is made of each, there is a risk of having one or more substandard trials have too great an influence on the pooled data
8. Combined data obtained at different points in time may invoke an additional confounding factor
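The examples above describe pooling fatality data from trials that were individually too small to give a significant result. The text does not say which statistical method those authors used; the sketch below swaps in the Mantel-Haenszel pooled odds ratio, one common way to combine 2 x 2 tables, purely as an illustration. All counts are hypothetical and do not come from the cited trials.

```python
# Hypothetical trials:
# (deaths in treated, treated patients, deaths in control, control patients)
trials = [(5, 100, 10, 100), (8, 200, 14, 200), (3, 50, 6, 50)]

def mantel_haenszel_or(tables):
    """Mantel-Haenszel pooled odds ratio across several 2 x 2 tables."""
    numerator = 0.0
    denominator = 0.0
    for deaths_t, n_t, deaths_c, n_c in tables:
        a, b = deaths_t, n_t - deaths_t   # treated: died, survived
        c, d = deaths_c, n_c - deaths_c   # control: died, survived
        total = n_t + n_c
        numerator += a * d / total
        denominator += b * c / total
    return numerator / denominator

pooled_or = mantel_haenszel_or(trials)

# Crude fatality rates per trial, shown next to the pooled estimate.
for deaths_t, n_t, deaths_c, n_c in trials:
    print(f"treated {deaths_t / n_t:.3f} vs control {deaths_c / n_c:.3f}")
print(f"pooled odds ratio = {pooled_or:.3f}")
```

An odds ratio below 1 indicates fewer deaths among treated patients. A confidence interval or significance test would still be needed before drawing a conclusion, and the cautions listed in Table 104.2 apply to any such pooling.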
FUTURE OF META-ANALYSIS
A meta-analysis differs from the traditional narrative review article of a field of medicine or of a particular scientific topic. The popularity of meta-analysis has grown substantially during the 1980s and is expected to continue to grow during the 1990s. The method, unfortunately, can be easily applied to inappropriate questions or may be poorly executed. Either problem will result in a poor meta-analysis, one that could mislead and certainly would not enlighten. This potential pitfall can be avoided through use of appropriate standards by journal editors. Journal editors are interested in the topic of meta-analysis and several have written editorials on the topic (e.g., Simon, 1987b; Cunningham, 1988; Naylor, 1988). Other journals have had “special communications” (Thacker, 1988) or “perspectives” (Wachter, 1988) on this topic. Reviews of meta-analyses are given by Sacks et al. (1987) and by Gerbarg and Horwitz (1988). A symposium on meta-analysis was published in an edition of Statistics in Medicine (Yusuf et al., 1987).
If, in the future, more investigators provide data on individual patients to archives, the pooling of raw data from different clinical trials will be possible. This would be an almost ideal way to conduct meta-analyses. Additional details on patients could be provided through archives, publications, or by authors who are contacted directly.
Interpretation of meta-analyses requires a great deal of common sense and a careful judgment of the clinical significance of the results. Results may be statistically significant, but one should question whether the difference in efficacy parameters presented is large enough to justify altering (or confirming) a clinical practice.
CHAPTER 105
Mispresentation of Clinical Data
Methods Used to Mispresent Clinical Data
Format Used
Transformation Used
Choice of Scales or Omitting Error Measurements
Complexity and Confusion
Using Nonconventional Presentations
Illustrating Part of the Data
Labeling Scales
Incomplete Expressions
Motivation of Individuals Who Mispresent Data
Presenting a Single Set of Data in Several Ways
Correct presentations of data are shown in Presentations of Clinical Data (Spilker and Schoenfelder, 1990). This chapter illustrates a number of presentations that misrepresent the data, whether done purposely or inadvertently. Additional examples are given by DeJonge (1983). Many other methods that illustrate how statistics may be used to deceive intentionally rather than enlighten were presented in the books How to Lie with Statistics (Huff, 1954), Flaws and Fallacies in Statistical Thinking (Campbell, 1974), and aha! Gotcha (Gardner, 1982).
METHODS USED TO MISPRESENT CLINICAL DATA
Format Used
Inappropriate formats (i.e., table, graph, figure) are sometimes used. If the reader should be shown details of the data to support a conclusion and only a summary graph is shown, it may be concluded that the wrong format was used.
Transformation Used
Data are transformed (e.g., to percent change), the raw data are not shown, and there is no method for the reader to determine the values of actual data obtained. This is a common error in the medical literature. See Chapter 117.
Choice of Scales or Omitting Error Measurements
This type of misrepresentation involves the use of graphs to illustrate data that have an obviously inappropriate scale (e.g., logarithmic scale for time), or an inappropriate scale that is less obvious (e.g., a logarithmic scale for certain variables that should be expressed arithmetically). The comparison of trends in two different groups may be greatly distorted if a logarithmic scale is used, or if two separate scales are used for either the abscissa or ordinate (Fig. 105.1).
1. A histogram without error bars or confidence limits can give the appearance of a large change having occurred that is probably significant. Nonetheless, not all histograms require error bars or confidence limits.
2. A narrow scale (e.g., diastolic blood pressure graphed from 90 to 95 mmHg to show medicine effects) can exaggerate minor changes.
3. A broken scale can also exaggerate minor effects.
4. A scale of clinically unimportant values can make a meaningless effect seem important.
5. A scale of values that are never observed clinically may make an unrealistic effect seem relevant.
A graphic illustration of how the presentation of data may influence interpretation is shown in Fig. 105.2. The plot of deaths from diphtheria using a logarithmic scale makes it appear as if the introduction of diphtheria immunization led to a rapid and major decline in childhood deaths. The arithmetic plot on the right illustrates that the death rate was markedly falling at the time when immunization was introduced and that
[Figure residue: plot comparing Medicine A and Medicine B; axis labels and caption not recovered.]
[Table fragment; title and row headings not recovered. Entries for approaches A through F, in order:]
«2; >2; >2; >2; 1; 1
No; No; Yes; Yes; No; No
Cumulative; Random; In order of ascending dose; Random; Single dose or fixed range per group; Single dose or fixed range per group
a The letters refer to dose-response relationships illustrated in Fig. 116.1.
b The “*” in Fig. 116.1 indicates that time was allowed for a washout period between doses.
PLANNING/CONDUCTING MULTIPLE TRIALS
FIG. 116.1 Types of clinical trials that may generate data for creating dose-response relationships. The same dose-response curve may be created from any of these six graphs (graphs not drawn precisely to scale). Asterisks denote periods of washout. [Panels A through F plot response or activity (ordinate) against time (abscissa); graphics not reproduced.]
A: Cumulative doses administered to a group of patients in a stepwise manner without washout (in one or multiple trials).
B: Doses administered in a random order to patients, without washout (in one or multiple trials).
C: Progressively increasing doses administered in a stepwise manner with washout periods between each dose (in one or multiple trials).
D: Doses administered in a random order to patients with washout periods between each dose (in one or multiple trials).
E: Each patient in a group receives a single dose, and several groups in one trial each receive different doses.
F: Each patient in a group (or trial) receives a single dose and several groups in different trials each receive different doses.
COMBINING EFFICACY DATA FROM MULTIPLE TRIALS
If they are too low they may be below or at the threshold of the dose-response curve; likewise, if the doses are too high they may be at (or beyond) the peak or plateau phase of the dose-response curve. In addition, they must cover an appropriate range so that a dose-response relationship may be observed.
Dose-Escalation Trials
In dose-escalation trials that are conducted to develop a dose-response relationship, only those patients who do not respond to a given dose are given a higher dose. This is a highly ethical design because all data may be used and each patient contributes to the dose-response evaluation, even if they drop out of the clinical trial. This approach is also similar to the way patients are often treated in a clinical practice. In addition, inter- and intraindividual variability are distinguishable and one can obtain a population dose-response curve. On the other hand, there are some potential problems with this method, primarily because simple pooling of data may produce biased results. A statistician should be consulted for techniques to minimize or eliminate such biases.
Approaches A through D in Fig. 116.1 may be utilized in a single patient, a single group, a single clinical trial, or in multiple trials. The choice between these approaches depends on several factors, such as how rapidly a medicine’s effect wears off and whether it is practical to give cumulative doses. Panels B and D show approaches to studying the dose-response relationship that are strong tests for whether a true dose-response exists. These approaches minimize the chances of a false-positive result, as compared to the approaches in panels A and C, in which a stepwise progression of doses is used.
If a patient is responding positively to treatment the protocol should specify whether he should be given still higher doses. Forced escalation is not usually done because physicians rarely would increase the dose further in clinical practice unless the patient’s response was inadequate, and one of the major goals of Phase III clinical trials is to approximate usual clinical practice. In addition, it could be considered below ethical standards to raise the dose of a patient already benefiting from therapy to a point at which new adverse reactions or more intense ones could occur.
Two additional cautions about using the dose-escalation techniques are that (1) the natural history of the disease may change over time and (2) effects may be attributed to medicine when, in fact, these effects are not medicine related. Also, it is possible to increase the dose when no effect is observed, but the patient may not yet be at a steady state pharmacokinetically or clinically at that particular dose. Therefore, it is advisable to have a minimum period required before a patient’s dose may be raised to the next higher dose or dose range.
Absence of a Dose-Response Relationship
If no dose-response relationship is observed with a medicine it may be a true effect or it may mean that (1) inappropriate efficacy parameters were used, (2) the efficacy parameters were inappropriately measured, (3) the data demonstrate great variability, (4) the doses chosen to study were all at the plateau part of the dose-response relationship, (5) the doses were too close together to illustrate the shape of the curve, or (6) human homeostatic mechanisms compensated and blunted dose-related effects.
Are Dose-Response Relationships Always Possible?
Dose-response relationships are sometimes difficult to demonstrate in humans. Outside of all-or-none effects, this is particularly true when a large placebo effect occurs and efficacy itself is difficult to establish, as in psychiatric diseases, in which a large subjective component is present. In many (if not most) of these diseases it has not been possible to demonstrate dose-response relationships to a medicine. Nonetheless, individual patients with those same diseases have shown dose responses to specific medications. When their data are combined or when groups of patients are given different doses, however, the combined data fail to show a dose response. Given the small increments over most of the active dose range, it should be possible to show a dose response by either enrolling very large numbers of patients or just exploring threshold responses. Neither of these approaches is realistic either in the context of developing new medicines efficiently or conducting academically oriented clinical trials. The most realistic approach to this issue is to titrate patients individually and to illustrate individual dose-response relationships. Another approach is to examine a secondary measure of efficacy. In depression, for example, the onset of therapeutic benefit is delayed, and it may be possible to demonstrate a faster onset of activity with a higher dose than with a lower one, even though the amount of activity observed (i.e., the response) may be the same with each dose.
Dose-Responses for Safety Parameters
In addition to dose-response relationships to illustrate and evaluate a medicine’s efficacy, dose-response relationships may be used to illustrate safety parameters.
A dose-response curve may be constructed for each adverse reaction, using each of the following as a parameter on the ordinate: percent of patients affected, median effect observed, maximum effect observed, or another measure of the intensity of the response. A different approach is to present all significant or frequently observed adverse reactions on a single dose-response curve by indicating the dose at which each of the effects is known to occur. This approach would be useful to illustrate that different adverse reactions usually occur at different doses. A recent book edited by Lasagna et al. (1989) focuses on dose-response relationships.
Threshold Effect of Dose-Response Relationships in Toxicology Studies
The issue of the shape of the dose-response relationship at low doses of carcinogens or teratogens is actively debated in the fields of toxicology and oncology. Indications of the true shape are quite significant at very low doses of medicines. If the true shape of the dose-response curve was linear and without a threshold dose (i.e., the curve passed through the origin), it would mean that there is always some risk of that medicine causing cancer, even at relatively low doses. A supralinear dose-response curve (i.e., curved above a straight line, but meeting the line at its two endpoints), starting at the origin, would have the same interpretation. A linear dose-response curve with a threshold dose would mean that there is a dose below which there is no risk of the medicine causing cancer. A sublinear dose-response curve (i.e., curved below a straight line, but meeting the line at its two endpoints) with a threshold dose would have the same interpretation (see Chapter 88).
FIG. 116.2 Time course of placebo response. Five patterns illustrate selected means to separate placebo and medicine effects. This information, if known historically, may help determine the appropriate design and duration of a clinical trial. When the situation in the left of part E is expected, all patients may be placed (single blind) on placebo and evaluated at a predetermined time. All responders are then discontinued. Remaining patients are then placed double blind on either trial medicine or placebo to give the curves on the right. [Panels plot effect (ordinate) against time of treatment (abscissa) for active medicine and placebo; the right side of part E is labeled “After Deleting Placebo Responders from the Clinical Trial”; graphics not reproduced.]
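The low-dose curve shapes discussed in the toxicology section above can be compared numerically. The sketch below uses assumed functional forms: a straight line through the origin, a straight line starting at a threshold, and square-root and squared curves standing in for the supralinear and sublinear shapes. These forms, the parameter names, and all numbers are illustrative assumptions, not models taken from the text.

```python
from math import sqrt

def linear_no_threshold(dose, slope):
    """Risk rises from the origin, so any dose above zero carries some risk."""
    return slope * dose

def linear_with_threshold(dose, slope, threshold):
    """No risk at or below the threshold dose; linear rise above it."""
    return max(0.0, slope * (dose - threshold))

def supralinear(dose, max_dose, max_risk):
    """Curve lying above the straight line joining (0, 0) and (max_dose, max_risk)."""
    return max_risk * sqrt(dose / max_dose)

def sublinear(dose, max_dose, max_risk):
    """Curve lying below the straight line joining (0, 0) and (max_dose, max_risk)."""
    return max_risk * (dose / max_dose) ** 2

low_dose = 1.0
risk_no_threshold = linear_no_threshold(low_dose, slope=0.01)
risk_threshold = linear_with_threshold(low_dose, slope=0.01, threshold=5.0)

# At a quarter of the maximum dose, the straight line would give 0.25.
risk_supra = supralinear(25.0, max_dose=100.0, max_risk=1.0)
risk_sub = sublinear(25.0, max_dose=100.0, max_risk=1.0)

print(risk_no_threshold, risk_threshold, risk_supra, risk_sub)
```

The comparison mirrors the interpretation in the text: without a threshold some risk remains at every dose, whereas with a threshold the modeled risk is zero below it.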
PLACEBO EFFECT
Another goal of combining data from multiple clinical trials is to evaluate the nature of the placebo effect. The magnitude and time course of this effect are important to evaluate but may not be apparent in a single trial. Various types of time courses of a placebo response are illustrated in Fig. 116.2. If this information is known or estimated prior to a trial, it can help determine the optimal duration of a trial that is necessary to separate medicine and placebo effects. There are obviously several other factors that influence the duration chosen for a trial. More extensive discussions of the placebo effect are included in Chapters 12 and 93.
DETERMINING WHY PATIENTS FAIL THERAPY
A potential means of improving efficacy may result from careful analysis of why patients fail therapy. It is therefore important not only to determine the number and percent of patients who fail therapy but also to determine why they fail and the amount of time it takes for them to fail after the medicine is initiated. Reasons given may relate to the rigors of the protocol and not to the medicine. Reasons may also relate to factors that may be modified (e.g., taste of the medicine, frequency of dosing, adverse reactions from rapid peak concentrations), which will allow the clinical trials to be completed successfully. In an extreme case, a suitable means of evaluating a medicine may be elucidated. It is important to explore this issue for clues to successful conduct of both current and future trials and to better data interpretation from completed patients and trials.
CHAPTER 117
Combining Safety Data from Multiple Trials
How Much Safety Data Should Be Collected?
Collecting Laboratory Data
Categorizing Abnormal Data
Clinical Significance
Presenting Data on Laboratory Abnormalities
Presenting Laboratory Data Using Transformations
Types of Data Transformations
Transformations of Laboratory Data: Proposals in the Literature
The Case against the Above Types of Laboratory Transformations
International Units
Adverse Reactions
Obtaining Adverse Reaction Data
Categorizing Adverse Reactions
Presenting Data on Adverse Reactions
Calculating the Incidence of Adverse Reactions
Collecting Laboratory Data
HOW MUCH SAFETY DATA SHOULD BE COLLECTED?
Detailed laboratory data from normal volunteers and patients should be relatively extensive. These examinations include evaluation of clinical chemistry and hematology analytes in blood (even for topically applied medicines) and routine urine analytes. Specialized tests suggested by the specific actions of a medicine or by its chemical relationship to other known medicines should also be evaluated. Except in unusual circumstances, a minimum of approximately 100 patients should be evaluated with broad screening tests, in addition to the normal volunteers who are evaluated in Phase I clinical trials. There are exceptions to this generalization. For example, some orphan medicines for a rare disease will never be able to enroll 100 patients, and if over-the-counter status is sought for a marketed medicine then many more than 100 patients will be required to establish a reasonable level of safety. If the above data suggest that abnormalities may occur as a result of a medicine, then those parameters should continue to be closely monitored, until the reason for the abnormality is determined (e.g., a concomitant medicine) and evaluated. If the reason is not found, a sufficient characterization of the abnormality
It is important to determine how much safety data are adequate and necessary to collect in each clinical trial in terms of both scope (i.e., the number of different parameters and tests to evaluate) and depth (i.e., the amount of data required for each parameter). The answer to this question depends on the ( I ) characteristics of the medicine, (2) previous clinical experience and data available on the medicine, (3) disease being treated, (4) regulatory considerations, (5) safety of alternative therapy, and (6) the time in history that is being discussed. Requirements in the quantity and quality of data necessary to demonstrate safety of a new medicine have steadily increased over recent decades, and the requirements of many national regulatory authorities differ. One may address the question of how much data to collect by establishing the scope and magnitude of the safety data that a prudent physician in the relevant country would desire to know prior to prescribing a new medicine for a patient. It is also possible to address this question by attempting to second guess the relevant regulatory authorities on the amount of data required for approval.
COMBINING SAFETY DATA FROM MULTIPLE TRIALS
and its clinical importance should be achieved. Issues regarding benefit-to-risk considerations are discussed in Chapter 99. It is important during Phase I clinical trials of most new medicines to evaluate normal individuals who do not abuse drugs or alcohol. Individuals who abuse these substances generally have a higher rate of abnormal laboratory results and are usually less reliable in complying with a protocol. Nonetheless, it is common for normal individuals to have one or more laboratory parameters fall outside the reference range and for this event to be clinically unimportant. The abnormality observed is often not reproducible.
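The frequency of isolated out-of-range values in healthy individuals follows from simple arithmetic. As a rough sketch, assume that each reference range is defined to contain 95% of healthy individuals and that the tests are statistically independent. Both are simplifying assumptions that are not stated in the text, and real analytes are often correlated. Under those assumptions the chance that at least one of n tests falls outside its range is 1 - 0.95^n. The function name is illustrative only.

```python
def prob_any_outside(n_tests: int, coverage: float = 0.95) -> float:
    """Probability that at least one of n_tests independent results falls
    outside its reference range, when each range contains `coverage` of
    healthy individuals (an assumed value, not taken from the text)."""
    if n_tests < 0:
        raise ValueError("n_tests must be non-negative")
    if not 0.0 <= coverage <= 1.0:
        raise ValueError("coverage must lie between 0 and 1")
    return 1.0 - coverage ** n_tests


# Show how the probability grows with the size of the test panel.
for n in (1, 10, 20):
    print(n, round(prob_any_outside(n), 3))
```

Under these assumptions a 20-test screening panel yields at least one out-of-range value in roughly 64% of healthy individuals, which is consistent with the observation that such findings are common and often not reproducible.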
acterized by the investigator as probably or definitely medicine related
7. Abnormalities that have been confirmed by repeating duplicate samples or have been reported as abnormal on at least two separate occasions. Any of the above categories may be used with this additional criterion
8. Duration of abnormality after medicine treatment is discontinued. The time for the return of an abnormal value to the normal range (or to baseline) may be used as a parameter
9. Association of the degree of abnormality with (1) dose of medicine given, (2) duration of treatment, (3) total quantity of medicine given, or (4) another parameter
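Criteria 7 and 8 can be expressed as simple rules over a series of dated measurements. The following is a minimal sketch, not a validated method. The `Measurement` record, the function names, the example values, and the reference range of 10 to 40 are all hypothetical, and "separate occasions" is assumed here to mean distinct sampling dates.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional, Sequence


@dataclass(frozen=True)
class Measurement:
    taken_on: date   # sampling date
    value: float     # result in the analyte's own units


def is_abnormal(value: float, low: float, high: float) -> bool:
    """True when the value lies outside the reference range [low, high]."""
    return value < low or value > high


def is_confirmed(measurements: Sequence[Measurement],
                 low: float, high: float) -> bool:
    """Criterion 7: abnormal on at least two separate occasions
    (assumed to mean two distinct sampling dates)."""
    abnormal_dates = {m.taken_on for m in measurements
                      if is_abnormal(m.value, low, high)}
    return len(abnormal_dates) >= 2


def days_to_normal(measurements: Sequence[Measurement], stopped_on: date,
                   low: float, high: float) -> Optional[int]:
    """Criterion 8: days from discontinuation to the first in-range value
    sampled on or after that date; None if no return was observed."""
    later = sorted((m for m in measurements if m.taken_on >= stopped_on),
                   key=lambda m: m.taken_on)
    for m in later:
        if not is_abnormal(m.value, low, high):
            return (m.taken_on - stopped_on).days
    return None


# Hypothetical series for one patient; treatment stopped on 29 January.
series = [
    Measurement(date(2024, 1, 1), 35.0),
    Measurement(date(2024, 1, 15), 62.0),
    Measurement(date(2024, 1, 29), 71.0),
    Measurement(date(2024, 2, 12), 48.0),
    Measurement(date(2024, 2, 26), 33.0),
]
stopped = date(2024, 1, 29)

print(is_confirmed(series, 10.0, 40.0))             # True
print(days_to_normal(series, stopped, 10.0, 40.0))  # 28
```

Returning `None` when no in-range value has been observed keeps an unresolved abnormality distinct from a fast recovery, which matters when durations are pooled across trials.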
CATEGORIZING ABNORMAL DATA Clinical Significance
In comparing laboratory data from several clinical trials it is possible and usually meaningful to categorize the data by degree of abnormality, in addition to the generally all-or-none question of whether the data are defined as abnormal. One means of categorizing abnormalities is by the percentage deviation both above the top of the normal range and below the bottom of the normal range. Various categories that classify degrees of abnormality may be created. One such example is to choose ranges of 20 to 40% above normal values, 41 to 60% above normal values, and so on. Alternatively, the actual laboratory units of measure may be used (e.g., 201-400 mg/deciliter, 401-600 mg/deciliter, and 601 mg/deciliter and above). Similar categories would be created for percentages or values below normal values for most laboratory analytes. For a few biochemical parameters, abnormalities are not considered to exist when a value is below the reference range (e.g., serum cholesterol).
Apart from establishing broad categories as described above, there are numerous other ways to classify laboratory results. These are described by Spilker and Schoenfelder (1990) and include categorizing:
1. Values that extend beyond an expanded normal range, so as to include only those values that are clinically significant abnormalities
2. Differences of before and after medicine values that are greater than a fixed percentage of the normal range
3. Differences of before and after medicine values that are greater than a fixed percentage of the baseline value
4. Values greater or less than a fixed number (e.g., platelets >500,000, platelets