English Pages 71 [72] Year 2023
Bradley K. Weiner
A Scientific Approach to Improving Animal Research in Biomedicine: Giving Animals a Chance
Bradley K. Weiner Orthopaedic Surgery Houston Methodist Hospital, Texas A&M University, and Weill Cornell College of Medicine Houston, TX, USA
ISBN 978-3-031-24679-1 ISBN 978-3-031-24677-7 (eBook) https://doi.org/10.1007/978-3-031-24677-7 © The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. This Springer imprint is published by the registered company Springer Nature Switzerland AG The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
To Dennis Weiner, MD “Without whom, not”
Preface
The impetus for this book was a paper that I reviewed a dozen years ago in my work as an editor for a major journal in my field. The paper presented the results of a ‘new’ intervention being tested using rabbits as an animal model. But the intervention was not really new; rather, it was a spin-off of similar interventions that had been clearly shown to be ineffective in humans over a decade earlier. The methods used in the study were equally odd: bad statistics, clear ‘mining’ to find significant data, no potential clinical significance of the mined data, etc. Thirty rabbits were used and euthanized with no chance of providing valid information and no chance of translating the intervention to humans, and these ‘no chance’ conditions should have been obvious, from the methods employed and the background information available, before the research was done. Simply, the animals had been used and euthanized without any chance of providing useful information. As a result, with the help of Paige Vargo, a medical student who was with me at the time, we began to look at other published studies. We pulled the 100 most recent animal studies in my field. We pulled the most recent studies using animals to test kidney failure treatments. We pulled papers regarding diabetes. In the end, only about ten percent of the papers we assessed that used animal models to test therapeutic interventions had methodologies sufficient to provide valid/useful information. Nine out of every ten papers failed to do so. The animals used numbered in the thousands. Digging further, it became clear that others, too, had recently recognized the same thing. Papers were trickling out that systematically reviewed far more studies, using far more elegant techniques than ours, and showed that much/most of the published preclinical literature was invalid/irreproducible. While their focus was on lost money, brainpower, and time, innumerable animals were also lost.
This book represents an effort to understand and explain this problem, to delineate its sources, and to explore ways in which it might be addressed. The goal is to move toward limiting the use of animals in this type of research to cases where there are no reasonable alternatives, and only to predetermined research scenarios in which the information provided will be scientifically valid and might directly impact human health.
Houston, TX, USA
Bradley K. Weiner
Introduction
The purpose of this book is to make those involved in animal research aimed at improving human health aware of a problem and to provide potential solutions to that problem. ‘Those involved’ includes many, perhaps most, of us: scientists undertaking the animal research; physicians using its results in the hope of improving human healthcare; committee members overseeing the research to ensure it is done properly (from small institutions to large government funding agencies); authors, reviewers, and editors of scientific publications; all of us who might benefit from the results of such research (patients and their families); and members of the general public who care about the animals involved. Historically, and rightfully, thinking about animal research has focused primarily on ethics: If we believe that the use of animals in research to improve human health is justifiable (and, obviously, there are those who do not), then we must ensure that it is carried out as humanely as possible. ‘The Problem’ addressed in this book, however, is only secondarily ethical in this sense and is primarily one of scientific methodology. Simply: Most of the animal research currently undertaken with the goal of eventually improving human health uses underlying assumptions and research methodologies which are insufficient to provide the appropriate foundation needed to reasonably translate/apply the results to humans. The subsequent ethical impact of this is: If a solid foundation cannot be obtained from the majority of animal research because of assumptive and methodological failures, then animals are being used and sacrificed without even the hope of making an impact on human health. And estimates suggest that about 115 million animals are used in research each year.
Note that, in this book, the focus will be on animal research ‘with the goal of improving human health’: research in which animals serve as models of human disease and interventions are tested with the aim of eventually treating that disease in humans, as part of the ‘translational’ research process. For instance, a new drug for diabetes may be developed and tested in a strain of mice with diabetes; if it is effective and safe, it may be tested in larger animals; if success continues, it may be tested in humans; and, again if successful, it may then become a treatment option for human patients with diabetes.
There are many other ways in which animals are used in research, such as gaining knowledge about physiological and pathological processes or studying diseases that affect animals and finding treatments for them (veterinary applications). These will not be addressed directly in this book, although many of the considerations covered here will apply indirectly. The ideas in this book will often be explicated by answering a series of questions; through these questions, ‘The Problem’ and its consequences will come to be understood (Chap. 1, ‘Critique’) and potential solutions to ‘The Problem’ will be explored (Chap. 2, ‘Constructive’). (A note about references: In order to limit disruption of the flow of the text, references are presented at the end. Specific references noted early in the text that help delineate the problem are numbered in the text, and general references are collected at the end in sections that match the book’s sections on the Problem, Justification, Methodology, Performance, Translatability, and Systemic concerns.)
Acknowledgments
I am grateful to Paige Vargo MD and Olivia Weiner MD, who inspired and helped me early on to pursue the issues covered in this book; to Ennio Tasciotti PhD and Francesca Taraballi PhD, brilliant young scientists who have run our lab and taught me much; to my children (Jacob, Olivia, and Simon), who are my inspiration to try to make a difference; to my wife, Sherry, whose support and positivity are boundless; to Jim Sheridan PhD and Mark Notturno PhD, who years ago showed me the value of critical thinking; to Houston Methodist Hospital for supporting the mission of academic medicine; to Merry Stuber and her colleagues at Springer for their support and patience(!); and to my dogs, cats, donkeys, and horses for obvious reasons.
Contents
1 Critique
  1 Failures of Justification
  2 Failures of Scientific Methodology
  3 Failures of Performance
    A sample “Materials and Methods”
  4 Failures of Translatability
  5 Systematic Failures
  6 Summary of the Critique
2 Constructive
  1 Justification
  2 Scientific Methodology
  3 Performance
  4 Translatability
  5 Systematic Failures
  6 Summary of Constructive Section
3 Putting It All Together: Hope Moving Forward
References
Chapter 1
Critique
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 B. K. Weiner, A Scientific Approach to Improving Animal Research in Biomedicine, https://doi.org/10.1007/978-3-031-24677-7_1
In the Introduction, it was stated: Most of the animal research currently undertaken with the goal of eventually improving human health uses underlying assumptions and research methodologies which are insufficient to provide the appropriate foundation needed to reasonably translate/apply the results to humans. Undoubtedly, this statement will appear odd to many readers, within both the scientific and lay communities. They might reply: “Certainly, a scientist or group of scientists may occasionally make an unrecognized mistake and end up with findings/data that aren’t helpful. Or, maybe, they’ll get negative results that show that their intended treatment was ineffective. But surely both of these scenarios are uncommon, and most animal research is truly helpful.” Unfortunately, the available literature on the subject strongly suggests that our statement is true: Most animal research undertaken is not useful due to assumptive or methodological faults, and as a consequence, many, many animals are sacrificed with no potential benefit to human health. The literature delineating the “problem” is impressive and extensive. Here are just a few of many examples:
1. In three important reviews published in Nature journals in the early 2010s, preclinical and animal studies aimed at the discovery of new drugs were found to be irreproducible (and, thus, useless for the translational process) in 75–90% of papers published in high-profile journals [1–3].
2. In a series of papers published in The Lancet during the same time frame, the proportion of “wasted” (useless) biomedical research was estimated at 85% [4–9].
3. In two major review papers looking at multiple studies that used small and large animals to assess potential treatments for cerebral stroke, analysis of the data suggested that fewer than 10% of studies could provide valid data; that is, more than 90% of the animal studies provided no useful information [10].
4. In oncology, a review of 53 papers showed that 90% of the papers reviewed did not/could not produce useful data for translation [2].
5. In another Nature article looking at treatments for amyotrophic lateral sclerosis (ALS), not one out of one hundred studies using mice to test potential treatments was found to provide useful/valid data that might be translatable. Not one [11].
6. In a now classic paper assessing thousands of published animal studies aimed at discovering treatments for neurological disorders, only about 5% of studies were estimated to provide valid/useful information [12].
7. In several studies from our group assessing these issues in a variety of disease processes, from animal studies aimed at treatments for spinal disc degeneration to those assessing acetaminophen effects on the liver, we have found that well under 10% provide valid/useful information.
The list of papers and reviews documenting “the problem” could go on. In nearly every medical subspecialty, biomedical animal research is found to provide the valid information needed for potential translation perhaps 10–15% of the time. Thus, 85–90% of biomedical animal studies produce information that is of no use for translation to humans. So, what accounts for this high rate of animal research that is not useful? The sources of failure fall into five broad, and often overlapping, categories: (1) Failures of Justification; (2) Failures of Scientific Methodology; (3) Failures of Performance; (4) Failures of Translatability; and (5) Systematic Failures. In what follows, each of these sources of failure will be explored in depth.
1 Failures of Justification
Over the past one hundred years, justification for the use of animals in biomedical research has included only two requirements: potential benefits to human health and humane treatment of the animals. These minimal requirements no longer suffice. Rather, justification for the use of animals, especially in light of “the problem” outlined above, requires more.
Are there good alternatives to using animals?
In other words, can we get the answers we seek using scientific techniques that don’t require animal use at all? This is the first and most important “justification” question. And, increasingly (and thankfully), the answer is becoming “yes.” The first and oldest options are tissue culture techniques. These include immortalized cell lines, explant cultures, and organ cultures.
Immortalized Cell Lines are continuously dividing cells derived either from cancer cells or from non-cancer cells that have been artificially manipulated. They can be purchased by researchers for use in their labs and have the distinct advantage of being (in theory) genetically identical, so they are generally well characterized and reproducible, and they allow many researchers to address a similar problem, enhancing the likelihood of success and of providing useful information. In addition, they multiply rapidly and continuously, providing sufficient material for detailed analysis. And, finally, they are often human-derived, so their responses are more likely than those of animal-derived cells/tissues to mirror what happens in humans. There are many cell lines available, the most famous of which are HeLa cells, originally derived from Henrietta Lacks, a patient with cervical cancer. Simply, animal models are often not needed for processes that can be studied using these disease-specific cell lines, that is, when the effects of interventions at the cellular or molecular level are being studied. A potential problem with these cells is that, with repeated division, they can occasionally express unique gene patterns, requiring repeated characterization/validation over time. An additional problem with these cell lines, as regards animal replacement, is that they are often used in 2D, where the cells are grown/maintained in a monolayer on glass or plastic. Thus, the cells do not fully behave as they would in vivo, where they are in a three-dimensional setting, surrounded by extracellular matrix, supplied by vasculature, and exposed to the immune system. Gene expression of cells in 2D is lessened. The morphology/shape of the cells does not mimic that seen in vivo (and there is a connection between morphology and function). Combined, these strengths and limitations allow cell lines to replace animals for very specific types of studies but do not allow replacement beyond those specific instances.
Explant Culture is a technique wherein a portion of tissue is carefully and sterilely removed from a human or animal and then placed in an environment that supports it, mimicking the in vivo situation while being ex vivo. Done properly, the tissues generally remain viable for several weeks, allowing experimentation and visualization of the tissues’ responses to therapeutic interventions.
This model requires an ex vivo environment that carefully matches the in vivo milieu in humans. The advantage is that both explanted cells and tissues (cells with their extracellular matrix) can be used for experiments with this technique, and culture and expansion can be undertaken, allowing further experimentation. The tissues can come from humans undergoing surgery (cartilage explanted during joint surgery, for instance), thereby affording a reasonable alternative to animals: human tissues in a simulated in vivo setting. The technique can also be undertaken with animal tissues, but sacrifice can be avoided when only a limited amount of tissue is resected and then cultured, thereby reducing the number of animals that may be needed. Limitations, including repeatability, lack of vasculature, and lack of exposure to the immune system, are similar to those of 2D cellular systems.
3D Cell Culture techniques are the next step “up” toward better mimicking the in vivo situation. Here cells are grown/cultured and studied on 3D scaffolds that can be formed to mimic properties of the tissues to which the cells are exposed in the human in vivo setting. The permeability, surface chemistry, porosity, mechanics, degradation, etc. of the scaffolds can be controlled to closely mimic the cells’ in vivo microenvironment. Thus, the cells are more likely to behave as they would in vivo in humans, improving their usefulness as animal substitutes for research. The scaffolds can be created from metals, ceramics, polymers, polysaccharides, proteins, natural extracellular matrices, hydrogels, etc., providing a good niche for near “normal” cellular behavior and study. Disadvantages include problems with reproducibility of the scaffolds’ complex structures (which can alter cellular function once cells are on the scaffolds); the difficulty of rapid/high-output studies, given the complexity of the scaffolds and their production; and persistent vasculature and immune limitations (or only rudimentary attempts to address them). Thus, while providing more potentially valuable information than 2D systems, they are still imperfect and cannot wholly replace the in vivo features of animal studies.
Organoids are unique 3D culture systems. Using pluripotent stem cells or adult stem cells that are induced down specific pathways (induced differentiation), the differentiating cells may mimic organ regeneration in vitro when grown in a manner that allows expansion and self-organization in three dimensions. This is often accomplished without the use of scaffolds, and the resulting “organs” have form and function that mimic human organs in vivo. The technique has proven useful to study the intestines, stomach, lung, thyroid, liver, kidney, blood vessels, pancreas, prostate, etc. A significant potential advantage of this technique is that, using a patient’s own adult stem cells, “organs” can be produced that are “their own,” allowing study for personalized (patient-specific) medicine.
Organs-on-Chips are microdevices (about the size of a flash drive) engineered with human cells to mimic organ function, allowing the assessment of tissue functions as well as reactions to external conditions such as stressors (hypoxia, toxic chemicals, etc.) or medications. Like organoids, they use pluripotent stem cells (with induced differentiation), but, rather than relying on self-organization, they use 3D printing, microfluidics, and cell sensors to model organ function and responses.
This has proven useful for drug development, and newer techniques that link multiple organs-on-a-chip (body-on-a-chip) are being developed to better imitate the interplay of multiple organ systems. Importantly, multiple cell lines can be integrated to mimic vasculature, etc., and personalized medicine approaches are possible.
In Silico Modeling is a computer modeling technique that uses vast stores of information, gathered from multiple sources and retrieved from databases, which are then synthesized to make predictions regarding chemical (drug) toxicities and bioactivities. The premise is that predictions can be derived from the structures of the chemicals being explored for potential medical applications: the molecular structure of a chemical allows prediction of its intrinsic properties, potential interactions, and functions. Thus, simply by knowing a chemical’s structure (given wide background knowledge from multiple sources), we can learn a great deal about how it might behave in humans without the use of animal testing.
Summary: Animals should not be used in the first place if there is a reasonable alternative available that can provide the answers we seek.
If no good alternatives exist, is the human physiological or pathophysiological process for which the study is being undertaken significant enough to justify the use of animals?
This question is best addressed via a cost/benefit type of analysis. The “costs” in this case often include animal stress, pain, and eventual sacrifice. The “benefit” is the potential to improve some aspect of human health. The weighing of costs and benefits will be subjective, in that individuals will differ in opinion. There are those for whom the costs (any animal stress, pain, and sacrifice) can never be justified, no matter the potential benefits for human health: a rejection of any level of anthropocentrism, so to speak. There are others who (disturbingly) hold a nearly opposite stance. (Once, early in my academic career, I attended a presentation at a large international conference in which a lecturer presented an animal study that appeared to be far from ethically sound in regard to animal use. When asked about it during the question-and-answer period, the lecturer’s response was: “They’re rats!”) While acknowledging that these extremes of opinion exist, we recognize that the opinions of the great majority of biomedical scientists and of the general public fall between them: They are willing to accept some costs if the animal stress, pain, and sacrifice (and the numbers used) are minimized and the human pathophysiological processes being addressed are deemed “important” enough. Like the “costs” side of the equation, the “benefits/importance” side is also subjective. At one extreme, there is considerable agreement on the importance of life-threatening diseases for which the individual suffering from the disease bears little or no responsibility. Childhood cancers, many adult cancers, amyotrophic lateral sclerosis, spinal cord injuries/paralysis, genetic diseases, infectious diseases, etc. are examples. There is also general agreement on the importance of processes that harm or kill large populations of humans even when the patients bear some responsibility, such as hypertension, cardiovascular disease, Type 2 diabetes, etc.
Most people feel comfortable with properly done animal research aimed at therapeutic interventions for these issues. Controversies generally arise when researchers explore problems that are less worrisome to the general public. Justifying the costs to animals of research on uncommon, non-life-threatening, or non-maiming diseases (or, further, on preventing wrinkles, regrowing hair, or lessening the aches and pains of aging) is more difficult. There is considerable animal research in my field aimed at preventing or halting early spinal disc degeneration, despite the fact that spinal disc degeneration (without significant concomitant neurological compression) rarely causes health problems that an aspirin and a good walk in the park wouldn’t take care of. On the benefits-to-human-health continuum, most people would consider these problems relatively “minor,” and it is at best unclear (and certainly controversial) whether minor problems are worth the costs to animals, despite the abundance of animal research directed at many such issues. Many would say: “Address these problems if you like, do your research; but don’t use animals for these minor problems.”
Summary: The use of animals should be reserved for “important” problems.
Is the intervention being tested reasonable given the disease process?
While the previous question asked whether the medical problem was important enough to justify animal use, here we ask whether there is a close enough “match” between intervention and disease to justify the use of animals. This concept is best explored by example: Many laboratories around the world are experimenting with targeted nano-drug-delivery systems. For those unfamiliar, these are tiny devices that can be injected into animals/humans, travel to a desired location in the body, and deliver a drug there with the aim of providing a local therapeutic effect. These tiny devices are extremely complex, difficult to produce, expensive, and high risk/high reward (their likelihood of success is small and they are potentially dangerous, but if they succeed, they could prove revolutionary). For simplicity’s sake, we can think of interventions such as these as “high intensity.” The use of such systems in animal studies aimed at deadly diseases for which there is no current good treatment or cure (aggressive brain, lung, and certain breast cancers, for example) is generally considered justifiable: “high-intensity” therapies for “high-intensity” diseases. However, using these “high-intensity” therapies for processes that are less threatening (“low-intensity” diseases) and/or for which a more reasonable (less risky, less expensive, “less intense”) potential treatment is available may well render the use of animals unjustifiable. Studying nano-drug-delivery systems to treat simple osteoarthritis of the knee just doesn’t make sense (an intensity mismatch between therapy and disease), and animals should not be used (exposed to stress, pain, and sacrifice) when this is the case. (As another example, we don’t need to test high-powered new antibiotics in animals with a minor infection known to be amenable to a dose of penicillin.) There are, of course, counter-examples. Some significant chronic diseases are thought to be complicated by the accumulation of senescent cells in specific tissues.
Senolytics are a class of drugs capable of selectively “killing” these potentially harmful senescent cells, and they are easy and inexpensive to produce and administer (simple pills). Here the profile is low risk, low cost, and simple production, with potentially high reward: a “low-intensity” intervention with potential in a “high-intensity” disease. A favorable mismatch.
Summary: When justifying the use of animals in research, any mismatch between therapeutic “intensity” and disease “intensity” should be carefully evaluated.
Is the hypothesis being tested reasonable/important enough to justify the use of animals?
So far, it has been suggested that justifying the use of animals requires that there be no reasonable alternatives, that the research be aimed at important health problems/diseases, and that the intensity of the therapy under investigation be appropriate for the intensity of the disease. In addition, justification should include an appropriate hypothesis. Again, this is best explored by example: Many years ago, in my role as an editor, I reviewed a paper in which the authors aimed to treat mice with spinal disc degeneration with anti-prion medications. (The discs are the cushions between the vertebral bones in the spine. Prions are altered (misfolded) proteins that can cause rare and unusual infections such as “Mad Cow Disease” and neurodegenerative diseases in humans.) The problem with the study was that there is simply no evidence to support the concept that disc degeneration is a prion disease, and, as a consequence, there was no reason to believe that taking the blind step of administering the medication would have an impact. The hypothesis lacked foundation, and the paper was rejected; the animals, however, had already been sacrificed.
Summary: Justification should include a scientifically reasonable hypothesis.
Is there a sufficient likelihood of success of the experiment to justify the use of animals?
A researcher may have no reasonable alternatives, be addressing an important problem, have a therapy of appropriate intensity in mind, and have a reasonable hypothesis, yet still be set up for failure: the likelihood of the intervention working may be too small to warrant the use of animals. While taking chances and hoping for a cure are part of many/most great medical discoveries, when animals are being used something more than a “shot in the dark” is warranted. If we anticipate a 2% chance of success, is it worth sacrificing animals? How about 10%? The answer will depend on the intensity of the disease, the costs/benefits, etc., but the question needs to be addressed as part of the justification process. Small pilot/exploratory studies may have a role here in better delineating the likelihood of success.
Has the question already been answered in other studies?
So, even if a researcher has satisfied the other criteria for justification, she must make sure the answer is not already known. If it is, she will be sacrificing animals unnecessarily. Avoiding this requires a thorough assessment of previously completed research on the subject via a systematic review of similar available animal studies. While this used to be remarkably difficult, involving hours in the library, phone calls to colleagues, etc., internet search capabilities and databases of biomedical research have made it far easier.
The process involves a search of available databases for animal studies on the same/similar subject; assessing the studies using clear criteria (similarity to the planned study, scientific quality of the study); combining the information provided in studies with similar aims and of sufficient quality; and allowing the findings of the systematic review to guide the proposed study. The conclusions from the systematic review can vary: (1) The question has already been answered in high-quality studies. So, performing a similar study will add little to the available knowledge base and animals should not be used. (2) There are similar studies, but the quality of them is poor; so moving forward with a carefully planned, high-quality study may be reasonable. (3) There are high-quality studies available, but their conclusions conflict; so further exploration of these previously performed studies is warranted to explore the sources of divergent conclusions prior to moving forward with animals. If the more accurate answer is failure of the intervention, one should not move forward. (4) There appear to be very few similar studies published, despite the researchers knowing that others are working on similar projects. This one is especially concerning since failures/negative results often go unpublished (authors don’t like to publish their failures… nor do journals). In this case, the scientist may need to search deeper—directly contacting other researchers in the field for insight, searching through conference abstracts and not just published articles, looking at materials for grants given to similar projects that
1 Critique
have not come to publication. Failure of publication often signals failure of the intervention. Summary: A well-done systematic review of prior, similar animal studies is necessary in order to justify the use of animals. Overall summary to this point: If there are no viable alternatives to animal use and the other requirements for justification can be met, then a carefully planned animal study may be considered.
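The “combining the information” step of the systematic review described above is often carried out as a meta-analysis. The sketch below shows one common approach, an inverse-variance (fixed-effect) pooled estimate, in Python. The function name and the three study results are hypothetical, made up for illustration, and a real review would also weigh study quality and heterogeneity before pooling anything.

```python
import math

def pool_fixed_effect(estimates, standard_errors):
    """Inverse-variance (fixed-effect) pooling of per-study effect estimates.

    Each study is weighted by 1 / SE^2, so more precise studies count more.
    Returns the pooled estimate and an approximate 95% confidence interval.
    """
    if not estimates or len(estimates) != len(standard_errors):
        raise ValueError("need one standard error per estimate")
    weights = [1.0 / se ** 2 for se in standard_errors]
    total_weight = sum(weights)
    pooled = sum(w * est for w, est in zip(weights, estimates)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return pooled, (pooled - 1.96 * pooled_se, pooled + 1.96 * pooled_se)

# Hypothetical results from three prior animal studies judged similar and of
# sufficient quality: mean blood-pressure reduction (mmHg), treated minus control.
estimates = [8.0, 12.0, 10.0]
standard_errors = [2.0, 4.0, 2.0]

pooled, (low, high) = pool_fixed_effect(estimates, standard_errors)
print(f"pooled effect {pooled:.2f} mmHg (95% CI {low:.2f} to {high:.2f})")
# pooled effect 9.33 mmHg (95% CI 6.72 to 11.95)
```

A fixed-effect model assumes every study estimates the same underlying effect. When high-quality studies conflict (scenario 3 above), exploring why they differ, or using a random-effects model, is usually the more appropriate route.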
2 Failures of Scientific Methodology
The aim of the current section is to flesh out how animal research might ideally be performed from a methodological standpoint and how methodology can—and quite often does—deviate from this ideal. The assumption—actually, the requirement—is that the use of animals has been fully justified before arriving at this point, as outlined in the previous section. It is also important to note that “failures of scientific methodology” means failures in following the ideal method, not a failure of the scientific method itself. This is a human-level “failure to do the research right.” What is the ideal methodology to determine if a particular treatment works for a particular disease? For over half a century, the Randomized Controlled Trial (RCT) has been recognized as the ideal method to determine if a particular intervention works as a treatment for a particular pathological process. (i) What is an RCT? An RCT involves taking a group of similar subjects (in this case, animals); randomly assigning them to undergo different interventions; otherwise treating them identically (before and after the intervention); and measuring how they do using measures that are pre-determined. Since the participants are as similar as possible and are otherwise treated alike, differences noted in outcomes (how they do) can reasonably be attributed to the differing interventions. As an example, we can envision a group of genetically similar mice that have been altered to have high blood pressure. We can divide them randomly into two groups. We can give one group a new medication meant to decrease blood pressure to a healthier level, and give the other group a placebo (a substance known to have no physiological effect on blood pressure in mice). If the group that received the medication lowered their blood pressure to a pre-defined level that is believed to make an important difference health-wise, then we may be
able to say that the treatment worked to lower blood pressure in mice, may improve their health, and may someday provide a similar benefit to humans. Note the “may” in the previous sentence. While we noted that RCTs provide the best available evidence on efficacy, a positive finding from an RCT does not guarantee an accurate answer to the question of whether a particular treatment actually works. As with most (all?) things in life, there is more to the story. (ii) How can we get closest to an accurate answer using an RCT? This is a question of validity. What makes an RCT valid—one that we can trust, with a high degree of likelihood, to provide an accurate answer? First, the research question and the way we will measure outcomes need to be well-delineated a priori (before we start). Second, we need to ensure that the groups being compared are indeed similar. Third, we need to truly randomize the participants. Fourth, it is best if the participants do not know whether they have received the treatment or not. Fifth, it is best if the scientists assessing the outcomes and doing the statistical analysis of those outcomes are unaware of which group was treated with the medication/intervention. Sixth, we need to make sure we have the “right” number of participants in each group. Seventh, using statistics, we need to determine whether the observed differences between groups following the treatments are “real.” Eighth, we need to determine if any difference is clinically important. And ninth, we need to ensure that the intervention is done nearly identically in all participants. I will refer to this list as we move forward in the text as the “Big Nine of Validity.” Next, we’ll explore the Big Nine. (iii) The Big Nine (a) The research question and the measured outcomes The research question and measured outcomes should be specific and stated before the study is started. Using our example: Does the new drug lower blood pressure to a healthier level?
We should know a great deal about the drug and expect that it should be safe and will most likely reduce blood pressure. We should also define what degree of blood pressure reduction counts as satisfying the “healthier level” criterion. (b) Similar groups The goal here is to make every effort to eliminate differences between the two groups going into the study—differences that might influence the measured outcome and thereby falsely lead us to believe that the intervention did or did not work. Using our example: We might know that the drug is less effective in female mice or too strong for young ones. If we know there are factors that might bias the results toward one group in the study (in this example, female sex and youth, if they are not evenly distributed between the groups)—these are termed “known confounders”—then we can limit the participants before we start to a particular population via strict inclusion/exclusion criteria: middle-aged male mice only.
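Restriction by strict inclusion/exclusion criteria amounts to filtering the candidate animals before anything is randomized. A minimal Python sketch follows; the record fields, function names, and the 26 to 52 week “middle-aged” window are hypothetical choices made for illustration, not recommendations.

```python
from dataclasses import dataclass

@dataclass
class Mouse:
    animal_id: str
    sex: str          # "M" or "F"
    age_weeks: int

# Hypothetical inclusion criteria: male mice within an illustrative age window.
INCLUDED_SEX = "M"
MIN_AGE_WEEKS, MAX_AGE_WEEKS = 26, 52

def meets_criteria(mouse: Mouse) -> bool:
    """True if the animal satisfies every inclusion criterion."""
    return (mouse.sex == INCLUDED_SEX
            and MIN_AGE_WEEKS <= mouse.age_weeks <= MAX_AGE_WEEKS)

def restrict(cohort):
    """Split candidates into included and excluded lists before randomization."""
    included = [m for m in cohort if meets_criteria(m)]
    excluded = [m for m in cohort if not meets_criteria(m)]
    return included, excluded

cohort = [
    Mouse("A1", "M", 30),
    Mouse("A2", "F", 30),   # excluded: female
    Mouse("A3", "M", 8),    # excluded: too young
    Mouse("A4", "M", 40),
]
included, excluded = restrict(cohort)
print([m.animal_id for m in included])  # ['A1', 'A4']
print([m.animal_id for m in excluded])  # ['A2', 'A3']
```

Because the criteria are fixed before any animal is allocated, whether an animal enters the study cannot be influenced by knowledge of which group it would join.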
(c) Randomization Randomization is another method to (hopefully) avoid being misled. By truly randomizing participants to the two groups, we hope to spread the confounders (“known,” such as age and sex in the example; “unknown,” such as cryptic, subtle genetic differences that may make the drug ineffective in some) evenly between the groups, thereby making it likely that the groups are truly similar and that the differences in outcomes noted are due to the intervention and not to these other factors (confounders). (d) Blinding Ensuring that participants don’t know what treatment they’ve received (in our example, medication versus placebo), and that those evaluating the outcomes and doing the statistics don’t know which group is which, helps to limit psychological and social factors that can impact outcomes. Needless to say, blinding the animals is not necessary, but blinding the humans involved in interpreting the results is important. (e) Power When comparing one group to another, it is vital to have the “right” number of participants in each group. Too few in each group can lead to an inability to detect real differences between the groups, and too many can lead to the detection of tiny (but “statistically significant”) differences of no real consequence. The “right” number should be determined by a power calculation before starting the study. (f) Statistics Once the results come in, it is vital to evaluate them properly. Is the difference noted between the groups “real” (i.e., due to the intervention)? How confident are we in that finding? How big a difference is there between the outcomes in the groups? (g) Clinical Significance If there appears to be a real difference between the two groups, that difference should be clearly important to the health of those receiving the treatment. (h) Intervention Consistency The administration of the intervention should be close to identical throughout.
If a drug is administered intravenously, so should the placebo be. If the study compares surgery with placement of a device against the same surgery without a device, the surgical procedure should be identical apart from the device placement. It is also best to have the same person perform all interventions or to ensure that everyone performing the interventions is trained to do so in a nearly identical fashion.
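Requirements (c) and (d) can be sketched in Python as a minimal illustration under stated assumptions. The function names, seeds, and “drug”/“placebo” labels are hypothetical, and instead of tossing a coin for each animal the sketch shuffles a list holding equal numbers of each label, a common variant that keeps the two groups the same size. In practice the allocation would be produced and held by someone not involved in the procedures or the outcome assessment.

```python
import random

def allocate(animal_ids, seed):
    """Randomly assign each animal to 'drug' or 'placebo' in equal numbers.

    The returned key should be kept concealed (e.g., by a colleague, or in
    sealed envelopes) and revealed for each animal only at the pivotal point.
    """
    rng = random.Random(seed)  # seeded so the allocation can be audited later
    n = len(animal_ids)
    arms = ["drug"] * (n // 2) + ["placebo"] * (n - n // 2)
    rng.shuffle(arms)          # random order = random assignment
    return dict(zip(animal_ids, arms))

def blinding_codes(animal_ids, seed):
    """Give each animal an opaque code so assessors never see group labels."""
    rng = random.Random(seed)
    codes = [f"S{i:02d}" for i in range(1, len(animal_ids) + 1)]
    rng.shuffle(codes)
    return dict(zip(animal_ids, codes))

animals = [f"rat{i}" for i in range(1, 9)]
allocation = allocate(animals, seed=2023)   # held by a third party
codes = blinding_codes(animals, seed=7)     # only these codes reach the assessors

# The assessor's worksheet lists codes only; the allocation is unblinded
# after all outcomes have been scored.
worksheet = sorted(codes.values())
print(worksheet)
```

Scoring outcomes against the codes, and merging them with the allocation key only once scoring is complete, is what keeps the assessors blinded.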
How Can Studies Deviate from the Ideal? If a randomized trial in animals is not performed appropriately, its results will be unclear or inconclusive. Simply, we will not know whether the findings observed are “real” or not; accordingly, they cannot give us any insight that will be useful for translation. And if the studies performed cannot give us insight, then animals will have been used, and likely sacrificed, to no benefit: the methodological concerns and the ethical consequences are inextricably tied. This point was hinted at earlier but will be explored in depth throughout the rest of the book. With that in mind, let us look at how failure to closely attend to the Big Nine can invalidate the findings of a study. (a) The research question It has been the experience of many researchers that some of their best research questions have emerged casually: over a beer while talking with smart people, hearing a lecture at a local conference, or browsing through published research papers stored electronically (PubMed, etc.). Over a decade ago, I serendipitously stopped into a research building dedicated to molecular medicine (primarily because I knew nothing about molecular medicine), which housed a laboratory dedicated to the use of novel technologies for the treatment of cancer. Commonly in such buildings/labs, posters of research recently presented at conferences are hung in the hallways—“walls of fame” meant to intrigue passers-by and potential donors, and to inspire young researchers within. In this case, I discovered technologies that I thought might be used in my (quite unrelated) field of orthopedic surgery. Over time, significant grants and a large program emerged from my collaborations with these scientists, but it all began with a simple, casual question: “I wonder if I can use some of these concepts/technologies to encourage bone formation in people with osteoporosis or broken bones?”
One might address this question with animal research, assuming it could be adequately justified. However, when it comes to actually doing the animal research, such casual questions should have no direct role. It would be wildly inappropriate to obtain materials from the lab and place them into mice with weakened or broken bones. Rather, animal research questions need to be formal, in the sense that they are founded upon very solid background information. Using my novel technology example, before embarking on research in animals, we need to know that the material is safe in mice; that once there, it is unaltered (is stable); that it can encourage bone formation when we need it, where we need it, and in the quantity we need it; that once it does its job, it breaks down safely (into non-toxic byproducts); etc. In other words, we need to be quite confident that what we are going to do has a good chance to work safely and effectively before undertaking the study in animals—otherwise animals may be sacrificed without foundation (that ethics thing again). So, in order to be valid and ethical, animal studies need, before they are undertaken, well-documented, well-described, solid background information that supports the research question at hand. “Test-tube” proof
long before animal experimentation (if you will). Unfortunately, this does not always occur. (b) Measured outcomes The answer one gets depends wholly on the question one asks. This is especially true when it comes to measuring outcomes. For any pathological process and any treatment aimed at it, there are many possibilities that could count as (measured) outcomes. For instance, imagine an animal model of stroke in mice or rats. One might acutely clamp a vessel going to the brain of the animal, and the lack of blood supply may result in focal brain injury seemingly akin to stroke in humans. If a treatment aimed at reversing or limiting the vascular insult (a surgery, a stent, a drug, a stem cell treatment, etc.) is undertaken, there are often many different outcomes we might measure and then compare with a placebo (non-treatment) group. Was damage reversed or prevented at the molecular level? How many brain cells were salvaged or regenerated by the intervention compared to the placebo? What was the difference in the area or volume of injury between the groups on MRI or 3D CT reconstruction? What changes were noted in the animals’ behavior? How about their neurological reflexes on examination? What about their function measured by walking tests or physical tasks? Or tasks of memory? Their strength and sensation? What was the cost of the intervention? As with the formulation of the research question, the choice of outcome measure is vital; it needs to be made formally rather than informally, and before embarking on the study. The “right” choice of outcome measure will vary with the aim of the treatment and the particular interests of the researchers. If the aim of an intervention in our example is to limit the spread of cell death following the vascular insult, then a cell count or an MRI study may be best. If our primary concern is whether physical function is impacted, then a functional measure (walking ability, strength, etc.)
is better suited as our outcome of choice. Often, we may want to measure several outcomes, but the key is to make sure that our desired/specific outcomes are indeed measured and declared beforehand (so that we don’t go “mining” for anything we can lay claim to after we have all the data; if one searches hard enough, something publishable will be found(!)). Unfortunately, it is often the case that they are not—and this failure to ask the right question results in answers that do not help us, and animals are used without benefit. (c) Similar groups Confounders are factors that can perturb and potentially invalidate the findings of research studies. As noted previously, these might include known factors: younger mice may be known not to respond to a medication as older mice do, and if we have more young mice in the treatment group than in the placebo group, we may falsely conclude that a medication or intervention was ineffective when it might actually work. The ways to control for known confounders are “restriction” and/or “matching.” Restriction is achieved via strict inclusion/exclusion criteria. The researcher may choose to include animals of only one sex and age group and exclude those outside of these categories,
thereby eliminating the risk that treatment and control groups show different outcomes based on these factors. She does not need to worry about equal distribution of these known confounders after randomization if they are simply eliminated from the study beforehand. (The remaining animals still need to be randomized to groups, as we’ll see below.) The second way a researcher might address the problem of known confounders is via “matching.” In this instance, she might make sure that there are equal numbers of young and old as well as female and male animals in each of the groups—an equal distribution of known confounders. Then, again, the animals within each subgroup will need to be randomized to treatment/non-treatment. (d) Randomization Just as confounders might be known, they may also be unknown. That is, common factors (age, sex, etc.) and less obvious ones (subtle genetic or physiological variations between animals) might impact the outcomes of treatment without our knowledge and, therefore, bias the results inappropriately if these confounders are not evenly distributed between the test groups. Randomization is the method of choice to provide the highest likelihood that confounders (known and unknown) are equally distributed and, therefore, to avoid (limit, actually) inadvertently invalid findings due to an unequal distribution of confounders—the discovered outcomes being the result of characteristics of the groups as opposed to characteristics of the intervention. That said, animal researchers may be tempted to assume that all the rats in their study are alike in all of the important ways and thus choose not to truly randomize them. After all, they look alike, come from the same genetic line, from the same animal supplier, etc.
But still, a particular diabetic rat may have subtle genetic/physiological differences, may have had a dissimilar diet before being delivered (or may have eaten differently), its delivery/transportation may have differed, it may have been housed in a different cage/setting from others, it may have come in contact with a human or another animal that altered it in some way, etc. Those who have worked with animals know how these seemingly minor differences can have great impacts on specific animals (they are fragile in that sense) and thus, in turn, may affect their response to therapies. So, randomization, with the hope of equally distributing these factors, is vital for the validity of therapeutic studies. “Proper” randomization is the equivalent of a coin toss just prior to the pivotal point of differentiation between the treatment and control groups. If a researcher is operating on a rabbit with the aim of fusing one spine bone to another using a new drug placed intraoperatively, the surgery should be taken down to the spine, the spine prepared appropriately, and the coin tossed then and there (heads = drug; tails = placebo), with the rest of the procedure carried out identically. She should not know the result of the coin toss beforehand, since this may bias the surgery (“coin tosses” are often computer-generated numbers in sealed envelopes, picked blindly by someone else in the room or obtained from a website, but the end result is equivalent). “This one is placebo, so I can be less concerned with my preparation of the spine bones.” “It’s a placebo,
my IV for injection of the drug isn’t particularly good, but that’s okay—it won’t work anyway.” (These are stated as conscious statements by imaginary sloppy researchers. But real, well-intended, non-sloppy researchers can have subconscious behaviors with similar effects.) The intervention should be randomized, and the allocation concealed, until the “pivotal point.” (e) Blinding Blinding of participants and scientists is important to avoid both conscious and subconscious bias. As noted, blinding of animals is not needed. (Blinding human subjects in RCTs, however, is remarkably important. Generally, if they didn’t get what they thought would work and they know it, they won’t do as well as if they didn’t know what they got.) Blinding of the scientists evaluating the outcomes, however, is important in animal studies too. Unblinded scientists who are invested in the results, whether professionally (“My promotion to Associate Professor depends on this”), financially (“I need a positive outcome to continue this grant funding”), or emotionally (“I know this works!”), have a way (conscious or subconscious) of finding the desired results. This is another type of “mining”: looking more closely for subtle changes on histology slides to detect differences in the treatment group, selecting a favorable statistical technique that shows a statistically significant result when the other statistical tests did not, etc. If the scientist does not know whether a histology slide or a functional outcome measure is from the treatment or placebo group, such “mining” can be minimized. It is best either to carefully conceal which group the results came from before handing them over for assessment, or to have an independent, blinded (outsourced) scientist assess the results. Failure to blind assessors can (and frequently does) result in biased and invalid findings. (f) Power Having the “right” number of animals in each group is vital.
Too few animals can result in finding no difference between groups when, in fact, there are important differences. Too many animals in each group can lead to the detection of minor/insignificant differences between groups and to thinking these differences “matter.” Power is a statistical concept used to help find the right number of animals per group (sample size). The most accurate way to detect a difference between groups would be to include all participants with a disease. For instance, if we could gather all humans with diabetes and properly randomize them into two groups, use a drug in one group and placebo in the other, stick to all aspects of the “Big Nine,” and find a difference in our desired outcome, we would know with great certainty that the drug worked. Gathering all humans or all animals for a study, however, is rather impractical(!), so we aim to gather a representative portion of the total (a “sample”) that will give us enough information to likely mirror the “all animals” scenario (or a specific subgroup of interest). The required sample size is calculated from the interplay of three variables: power, statistical significance (most commonly, the P-value threshold), and effect size. Power and the P-value threshold are usually “pre-set” by convention at 0.8 and 0.05, respectively. Loosely, power is “set” to make it highly likely we will detect a difference
between groups if there is one, and the P-value threshold is set such that we can be pretty confident that the difference we detect is “real.” Effect size is based upon our background information. Sample size, then, is derived via the interplay of these three factors. Simply, if the effect size is very large (that is, the differences in outcomes between treatment and control groups are anticipated to be dramatic), then, since power and the P-value threshold are fixed, the sample sizes needed will be small. It would take very few animals to demonstrate that specific antibiotics work better than no antibiotics at all in animals with bacterial meningitis, which is universally fatal without antibiotics. To include a thousand mice in such a study would be wildly inappropriate—many would be sacrificed when we already have enough information (have the answer to our question) after just a few animals. In contrast, a smaller effect size with fixed power and P-value threshold requires a larger sample size. A new chemotherapy drug that might be slightly (but importantly) better than the current drug may well need a thousand mice to provide solid information, and using only 500 (too few) would be equally “wildly inappropriate.” In both cases, “too many” or “too few,” animals are sacrificed without the possibility of providing useful information. Thus, a careful power analysis completed before starting an animal therapeutic study is vital and ethically mandatory. (g) Statistics When doing animal research, we want to make sure that any differences we see between the treatment group and control group are most likely “real.” The P-value noted above is the probability that a difference at least as large as the one observed between the groups could have occurred just by random chance if the intervention being tested actually had no effect. As the P-value becomes smaller, the statistical significance of the difference noted increases.
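The P-value definition just given can be made concrete with a permutation test. It is one of several ways to compute a P-value, chosen here because it mirrors the “random chance” idea directly: it shuffles the group labels many times and counts how often chance alone produces a difference at least as large as the one observed. The function name and the blood-pressure readings are hypothetical.

```python
import random

def permutation_p_value(treated, control, n_permutations=10_000, seed=0):
    """Two-sided permutation test for a difference in group means.

    Shuffling the labels simulates a world in which the intervention has no
    effect; the P-value is the share of shuffles giving a difference at least
    as large as the one actually observed.
    """
    def mean(xs):
        return sum(xs) / len(xs)

    rng = random.Random(seed)
    observed = abs(mean(treated) - mean(control))
    pooled = list(treated) + list(control)
    n_treated = len(treated)
    as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_treated]) - mean(pooled[n_treated:]))
        if diff >= observed - 1e-12:  # small tolerance for floating-point rounding
            as_extreme += 1
    # Adding 1 to both counts includes the observed labelling itself.
    return (as_extreme + 1) / (n_permutations + 1)

# Hypothetical systolic blood pressure (mmHg) in drug- and placebo-treated mice.
drug = [118, 121, 115, 119, 117, 120]
placebo = [131, 128, 135, 129, 133, 130]
p = permutation_p_value(drug, placebo)
print(f"P = {p:.4f}")
```

Here every drug-treated reading is lower than every placebo reading, so only 2 of the 924 possible ways of splitting the twelve animals into two groups of six are as extreme as the real split, and the estimate should land near 0.002.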
Loosely, again, the generally accepted significance threshold of 0.05 means that, if the intervention truly had no effect, a difference this large would arise by chance less than 5% of the time; this is often (imprecisely) read as our being 95% confident that the observed difference is not due to random chance but, rather, is “real” (using my previous terminology)—due to the intervention. It is a measure of psychological confidence that our data reflect actuality. Published animal research should provide P-values for differences noted between groups so that readers can get a sense of the confidence in the findings. Failure to provide this information can lead to inaccuracies in the interpretation of the findings: thinking an observed difference is “real” when it is not, and vice versa. While P-values/measures of statistical significance are mandatory for valid studies, they are not sufficient by themselves. For many years, the relationship between P-values and researchers was akin to the first few years of most marriages. P-values were the be-all and end-all for researchers. A low P-value was all one needed. Over time, like many marriages, the relationship with P-values has matured, and it has been recognized that more is needed to make it (research findings, life as a couple(!)) meaningful. The P-value is now recognized as a component of statistically valid research as opposed to the measure of validity. Two other components are important. Confidence Intervals are about how precisely the results from the sample we have used reflect the whole population. For
instance, in human studies, a researcher may have taken a group (a “sample”) of patients with diabetes in which to study a new drug. She would want that group to accurately represent the entirety of patients with diabetes, such that the conclusions drawn from her study can be applied widely to the population as a whole. A Confidence Interval gives the range of values within which the true effect in the whole population of interest plausibly lies, given the sample studied; a narrow interval indicates that the study group’s result is likely to be close to that of the population as a whole. Animal studies offer the advantage (ideally) of limited internal and external variability—the study group is, generally, quite representative of the whole; thus, this measure is less important in the lab. Effect Size (mentioned above in “Power”) is felt by many statisticians to be more important than P-values. While P-values provide confidence that the treatment makes an impact, Effect Size asks how big that impact is, allowing a researcher to ask other important questions: is that impact important, and is it worth the risk of the treatment? An excellent example of P-value versus Effect Size importance comes from the Physicians’ Health Study of aspirin to prevent myocardial infarction (MI). In this human study with more than 22,000 subjects, the use of aspirin was associated with a decrease in MI that was highly statistically significant using P-values. So significant, indeed, that the study was terminated early due to conclusive evidence of impact, and aspirin was recommended for general prevention. However, the Effect Size turned out to be remarkably small, and it soon became clear that the magnitude of impact was tiny while the risks of using aspirin daily were not. Further studies found even smaller impacts, and the recommendation to use aspirin has been greatly narrowed. In animal studies, information beyond P-values is equally important: effect size and a priori power are mandatory.
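The a priori power calculation can be sketched with the standard normal-approximation formula for comparing two group means: n per group ≈ 2 × (z for 1 − α/2 + z for power)² / d², where d is the standardized effect size (Cohen’s d, the expected difference in means divided by the standard deviation). The Python below is a minimal illustration; the function names and example effect sizes are hypothetical, and for small samples an exact t-test-based calculation (for example, statsmodels’ `TTestIndPower`) gives slightly larger numbers.

```python
import math
from statistics import NormalDist

def cohens_d(mean_treated, mean_control, sd):
    """Standardized effect size: expected difference in means divided by the SD."""
    return (mean_treated - mean_control) / sd

def n_per_group(effect_size_d, alpha=0.05, power=0.80):
    """Approximate animals needed per group for a two-sided comparison of
    two means (normal approximation), given the expected effect size."""
    d = abs(effect_size_d)
    if d == 0:
        raise ValueError("effect size must be non-zero")
    z = NormalDist()                      # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)    # about 1.96 when alpha = 0.05
    z_power = z.inv_cdf(power)            # about 0.84 when power = 0.80
    return math.ceil(2 * ((z_alpha + z_power) / d) ** 2)

# Dramatic expected effect (hypothetical): a 10 mmHg drop with SD 5, so d = -2.0.
print(n_per_group(cohens_d(120, 130, 5)))  # 4
# Moderate and small standardized effects for comparison.
print(n_per_group(0.5))  # 63
print(n_per_group(0.2))  # 393
```

The numbers follow the pattern described under “Power”: a dramatic effect needs only a handful of animals per group, while a small one needs hundreds.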
(h) Clinical Significance If a researcher has done a great job and satisfied the other eight components of the “Big Nine,” she has generated solid, valid data, but she must still answer the question of whether her findings truly matter. That is, she may have clear, valid evidence that an intervention makes an impact on the measured outcomes, but those “impacts” may not be clinically important. As an editor, I once reviewed a paper that demonstrated that a new laser technique decreased the cell count in the annulus of the intervertebral disc in mice to some small but statistically significant degree. (The annulus is the outer portion of the cushions (discs) between the bones in the spine.) The finding was statistically valid and the other components of the Big Nine were satisfied, but decreasing a cell count by some small amount in the annulus of the disc has no clinical application: there is no disease process that would benefit from this therapy, and even if there were, the findings were so small that they could not have made an impact (Effect Size—see above). While I’m sure the laser company was satisfied with the results (the laser did something) and the researcher achieved their goal (showing it did something), the clinical impact of the findings was nil, and animals were sacrificed without any chance of making an impact on healthcare. I rejected the paper and lamented the unnecessary use and
3 Failures of Performance
sacrifice of the animals involved. (I suspect it was published elsewhere; significant P-values tend to find publication somewhere.) Thus far, we have described the components of a methodologically ideal study and shown how animal studies can deviate from this ideal. Now we ask: Do Most Animal Studies Deviate from This Ideal? As was documented in the Introduction, the available literature on the subject (and our own studies) suggests that the answer is a resounding “yes.” About 85–90% of published animal studies deviate from the ideal in ways that make them useless for drawing valid conclusions that might lead to translation to humans. And these methodological failures are avoidable.
3 Failures of Performance
Thus far in the book, two “failures” have been discussed: the failure to justify the use of animals in the first place and the failure to follow methodology that would provide valid information. In this section, failures of performance will be covered. A researcher may well have justified the use of animals and set up a methodologically solid study, but things can still go wrong, resulting in useless information being produced and animals used without the chance to make an impact on human health. It is estimated that about half of irreproducible animal studies are due to failures of methodology (previous section) and that much of the other half are due to failures of performance (current section). Performance failures can be divided into failures of the “ingredients” used in the scientific experiments and failures of those performing the experiments: the researchers themselves. Ingredient Performance Failures Antibodies: If a vertebrate (animal, human) is exposed to an invader such as a bacterium, the body produces antibodies to fight off the invader. This phenomenon has been exploited for fifty years as a tool in animal research. As an example, a cancer cell may have a unique protein on its membrane; this protein can be injected into an animal; the animal’s white blood cells (B cells) may produce antibodies to the protein; and these antibodies can be collected from the animal’s blood or, better, the B cells producing the antibodies can be collected. These B cells can then be immortalized (typically by fusing them with myeloma cells to form “hybridomas”) and cultured to provide a long-term supply of antibodies specific to the protein of concern. Following the example, a group of animals altered to have the cancer of interest can be treated with an active chemotherapeutic and a second similar group with a placebo (an RCT). The tumor-specific antibodies can then be injected into the animals with a marker attached that allows them to “light up” when they attach to the tumor.
In this way, the response to the treatment can be measured, the groups compared, and conclusions drawn regarding treatment efficacy. The ideal antibody used should be highly sensitive and specific. Sensitivity refers to the ability of the
antibody to readily grab on to the protein of interest. Ideally, antibodies should grab on to even small amounts of the protein in a sample or tissue, and they should grab on in proportion to the “actual” amount of protein in that sample/tissue. This allows both the presence of the protein and its concentration to be measured. Specificity refers to the ability of the antibody to grab on to the protein of choice and only that protein/chemical. If it grabs on to other proteins, then the researcher may wrongly believe the protein of interest is there when it isn’t—fooled by the lack of specificity. When antibodies were first used this way in animal research, researchers would create the antibodies themselves, a long and tedious procedure that could be costly and time-consuming. Accordingly, the production of antibodies for use in research has grown into big business: a researcher can currently pick out the antibody they want from an online catalog, purchase it online from any number of companies (there are over 300 companies that do this), get delivery within a couple of days, and get to doing the “heart” of their research without delay—a remarkable convenience. The catch (and there is always a catch) is that the quality of the antibodies is widely variable, not only between companies but between batches from the same company. Lack of adequate sensitivity (failure of the antibody to grab on to the protein it is supposed to when that protein is actually present) and of specificity (failure to grab on to the protein of choice and only that protein) abounds. It is not unusual for the “same” antibody obtained from different companies, or even different batches from the same company, to perform differently. Further, there are several well-documented (and costly) cases where the antibody in the bottle is, simply, an antibody to some completely different protein. Each of these scenarios results in invalid data being collected.
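One way to put numbers on sensitivity and specificity before trusting a purchased antibody, or a new batch of one, is to test it on samples whose status is already known and borrow the standard diagnostic-test definitions: sensitivity is the share of protein-containing samples it detects, and specificity is the share of protein-free samples it correctly leaves unmarked. This captures detection only, not whether the signal is proportional to the amount of protein. The Python sketch is illustrative; the function name, the validation counts, and the 0.9 acceptance thresholds are hypothetical.

```python
def sensitivity_specificity(results):
    """Compute sensitivity and specificity from validation results.

    `results` is a list of (protein_present, antibody_signal) pairs for
    samples whose true status is already known.
    """
    tp = sum(1 for present, signal in results if present and signal)
    fn = sum(1 for present, signal in results if present and not signal)
    tn = sum(1 for present, signal in results if not present and not signal)
    fp = sum(1 for present, signal in results if not present and signal)
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    return sensitivity, specificity

# Hypothetical validation of one batch: 10 samples known to contain the
# protein (9 detected) and 10 known to lack it (3 give a false signal).
batch = ([(True, True)] * 9 + [(True, False)]
         + [(False, False)] * 7 + [(False, True)] * 3)
sens, spec = sensitivity_specificity(batch)
print(f"sensitivity {sens:.2f}, specificity {spec:.2f}")  # sensitivity 0.90, specificity 0.70

MIN_SENSITIVITY = MIN_SPECIFICITY = 0.9   # hypothetical acceptance thresholds
usable = sens >= MIN_SENSITIVITY and spec >= MIN_SPECIFICITY
print("batch usable:", usable)             # batch usable: False
```

Running the same check on each new batch, not just on the first purchase, addresses the batch-to-batch variability described above.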
And it is estimated that up to 50% of commercially available antibodies lack adequate sensitivity and/or specificity. That is, as many as half of the animal studies in which this “ingredient” is used may provide no useful information—and the animals used are sacrificed to no benefit. And this particular “ingredient”—antibodies—is the workhorse of experimentation, used almost universally in animal research.
Cells: Cells are commonly used in animal research. For instance, known lines of cancer cells may be injected into mice (or other animals) intravenously, where they circulate, deposit at distant sites, and grow, thereby creating a model of metastatic disease. Much like antibodies, however, cells can be fraught with problems that render them unreliable. Surprisingly, some cell lines are simply misidentified. That is, they are sold to researchers as a particular type of cancer cell but turn out to be something quite different. A well-known example is the MDA-MB-435 cell line, which had been thought to be a triple-negative metastatic breast cancer cell line but appears to have been a melanoma cell line. Over 1000 articles have been published using this cell line, and their results appear to be useless for the breast cancer of concern. (How much money lost? Time wasted? Animals used?) A second problem concerns cross-contamination, wherein cells from another line get into the culture of interest (and often overgrow it), rendering it functionally different from the intended cells. HeLa cells have been well documented to be a problem here. Thirdly, microbial contamination—another common occurrence—can render the cells useless. Accordingly, the only way to (hopefully) ensure the usefulness of the cells is by careful characterization within the individual labs using them. This would require genomic, proteomic, and phenotypic characterization, which is difficult, costly, and outside the capabilities of most individual labs. Thus, it is rarely completed, and trust is placed wholly (and blindly) in the purchased cells despite such trust often being misplaced. Even in cases when cell lines are accurate and free of cross-contamination or infection, technical factors such as passage number (the more often cells are divided, the more likely they are to acquire genetic mutations that inhibit function), problems with media/serum for culture (other universally used “ingredients”), and cell damage (cryopreservation, etc.) must be overcome. There is an old lab joke that says “One equals one; two equals none.” (If a researcher performs one experiment with positive results, they get one publication out of the results. If they do the same experiment twice, they get no publications, since the results so often differ/are irreproducible. Not a funny joke when animals are used.)
Product Performance Failures
The ultimate goal of translational research is to use various ingredients (some of the shortcomings of commonly used ingredients are noted above) to develop a novel “product” (material, chemical, device, etc.) that can be tested in animals and eventually used to improve the diagnosis and treatment of human disease. The development of such products is difficult, technically demanding, costly, and time-consuming; it demands a remarkable amount of patience and persistence, as well as cooperation between multiple individuals, often with differing areas and degrees of expertise.
The potential pitfalls in this process are numerous and are best explored via example. In what follows, the methods used in one of our lab’s experiments (published in the journal Biomaterials) are outlined in a “Materials and Methods” section that is annotated to highlight points of possible “failure.” Failure of any one of the steps might lead to failure of the project/program/findings as a whole. As a primarily orthopedic laboratory, one of our goals is to create a material that is capable of acting as a functional scaffold for bone formation. When a bone breaks, the body naturally lays down a scaffold between the broken parts, which provides a perfect niche for bone-forming cells to settle in and perform the job of bone healing. In some circumstances, fractures don’t heal (the injury is too severe, for instance) and bone grafting is needed, in which bone from a different site is transplanted to the injured bone to encourage healing. Often bone from the pelvis (iliac crest) is used and placed in the area of the bone defect to enhance/allow healing. Problems with such autograft transplants center on the donor site, which may be a source of lasting pain and may provide too little bone or too few cells to actually afford healing. Accordingly, substitutes for this autograft bone have been/are being developed. What follows describes one of the ways our lab has aimed to address the problem. Simply, a scaffold was created and seeded with mesenchymal stem cells (cells that can develop into bone-forming cells) to confirm function in vitro. As noted, the section below is annotated with questions, in parentheses, interspersed throughout the “Materials and Methods” to demonstrate all that can go wrong:
In this study, we aimed to manufacture and characterize a composite material able to recapitulate the chemical-physical and morphological cues of the young human osteogenic niche as a potential robust substitute for bone augmentation. Our working hypothesis was that mimicry of the composition and structure of the osteogenic niche within a composite scaffold would facilitate osteogenesis in vitro and bone formation at an ectopic site. In an attempt to mimic the newly formed bone niche and test our hypothesis, we developed a bio-inspired, nanocrystalline magnesium-doped hydroxyapatite/type I collagen composite scaffold (MHA/Coll). Its osteoinductive potential was initially determined in vitro using human bone marrow-derived mesenchymal stem cells (hBM-MSC) over a 3-week period; its effectiveness in promoting new bone formation was then assessed in vivo using an ectopic rabbit model.
A sample “Materials and Methods”
MHA Synthesis
Magnesium-doped hydroxyapatite (MHA) powders were prepared at 37 °C (Temperature accurate and controlled?) in air atmosphere (Characterized?) by dropping 300 mL of an aqueous solution containing 44.4 g of H3PO4 (Aldrich, 85% pure) into basic suspensions consisting of 50 g Ca(OH)2 (Aldrich, 95% pure) and different amounts of MgCl2·6H2O (Merck, A.C.S., ISO) in 500 mL of water (Ingredients pure and accurate in these purchased products?). The apatite was left aging for 24 h (Does timing matter if it’s off by an hour the next day due to a lab meeting?) at room temperature (Well controlled? If not, impact?), then washed with DI water (Non-contaminated? Not expired?) and freeze dried (Is the system for this working well?).
Scaffold Synthesis
Type I collagen from bovine tendon (Nitta Casing) (Quality of this store-bought ingredient?) was the organic matrix used for the synthesis of MHA/Coll. The collagen was dissolved at a concentration of 10 mg/mL in an aqueous acetic buffer solution at pH 3.5 (Clean? Accurate pH?). MHA nanocrystals (Characterized for inconsistencies that might impact function?) were directly nucleated on the collagen fibrils during their pH-driven self-assembly. Briefly, a 40 mM aqueous solution of H3PO4 was added to 40 g of the acetic collagen gel and dropped into an aqueous 40 mM basic suspension of Ca(OH)2 and MgCl2·6H2O. MgCl2·6H2O was added to the basic solution to obtain a theoretical 5% substitution of calcium in the final apatite lattice. The material then underwent wet crosslinking in an aqueous solution of 1,4-butanediol diglycidyl ether (BDDGE) (2.5 mM) at 4 °C (A very complex recipe. Were all the ingredients good? Was each step carried out meticulously and consistently?). After this step, the material was washed several times with distilled water to eliminate any residual solvent and crosslinking solution. The slurry was placed in 48-well plates (2 mm height) to fabricate the scaffolds for the in vitro studies, while for the in vivo studies we utilized a cylindrical mold (4 cm × 1 cm). The final porosity of the scaffold was generated by freeze drying. Briefly, the material was frozen from +20 °C to −20 °C in 3 h and then heated from −20 °C to +20 °C in 3 h under vacuum (80 mTorr) (Relying on complex machinery and exacting technique?). Non-mineralized collagen scaffolds (Coll) were also synthesized as described above from an acetic collagen slurry (10 mg/mL), which was precipitated to pH 5.5 with NaOH (1.67 mM). The collagen was washed with DI water as well, and scaffolds were prepared with the same freeze-drying process used for MHA/Coll. All chemicals were purchased from Sigma Aldrich (similar technical and store-bought concerns).
Compositional Characterization of MHA/Coll
X-ray diffraction (XRD) patterns were recorded with a Bruker AXS D8 Advance instrument in reflection mode with Cu Kα radiation (Machine validated for this application?). The samples were mounted on a customized support to obtain relatively uniform samples. Fourier-transform infrared spectroscopy (FTIR) was performed using a Nicolet 4700 spectrometer on pellets (10 mm in diameter), which were prepared by mixing 2 mg of ground sample with 100 mg of KBr in a mortar and pressing the mixture into the pellet to be analyzed. Spectra were analyzed with EZ OMNIC software (Nicolet) after baseline correction (Techniques and machines validated? Reliance on others for this technique and reading?).
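The MHA synthesis recipe can be sanity-checked with simple stoichiometry. The minimal Python sketch below assumes that the stated purities apply to the stated reagent masses and ignores the magnesium, which substitutes for part of the calcium. It compares the calcium-to-phosphorus molar ratio implied by the recipe with that of stoichiometric hydroxyapatite, Ca10(PO4)6(OH)2, whose Ca/P ratio is 10/6 ≈ 1.67. This is the same kind of ratio that the ICP-AES analysis in this section reports for the finished material.

```python
# Molar masses in g/mol (standard atomic weights).
M_H3PO4 = 97.994
M_CAOH2 = 74.093


def moles(mass_g: float, purity: float, molar_mass: float) -> float:
    """Moles of the active compound in a reagent of given mass and fractional purity."""
    return mass_g * purity / molar_mass


# Reagent quantities from the MHA synthesis step. Applying the purity to the
# stated mass is an assumption about how the recipe is reported; Mg is ignored.
n_p = moles(44.4, 0.85, M_H3PO4)   # one P per H3PO4
n_ca = moles(50.0, 0.95, M_CAOH2)  # one Ca per Ca(OH)2
ca_p = n_ca / n_p

# Stoichiometric hydroxyapatite, Ca10(PO4)6(OH)2, has Ca/P = 10/6.
target = 10 / 6
print(f"P: {n_p:.4f} mol, Ca: {n_ca:.4f} mol")
print(f"Ca/P = {ca_p:.3f} (target {target:.3f}, off by {abs(ca_p - target) / target:.2%})")
```

Under these assumptions the recipe works out to a Ca/P ratio of about 1.66, within roughly 0.1% of the hydroxyapatite value. A purity or weighing error in either reagent would shift this ratio directly.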
Inductively coupled plasma–atomic emission spectrometry (ICP-AES: Liberty 200, Varian, Clayton South, Australia) was applied to determine the content of Mg²⁺, Ca²⁺, and PO₄³⁻ ions constituting the mineral phase of the composite. Samples were first digested with 65 wt% nitric acid. The obtained values were expressed in terms of Ca/P and Mg/Ca molar ratios. Thermal gravimetric analysis and differential scanning calorimetry were performed with a TGA/DSC thermogravimetric analyzer (METTLER TOLEDO) by placing the samples in alumina pans and applying a heating ramp from 25 °C up to 1100 °C at 10 °C/min (Potential failures of technique or machines?).
Structural Characterization of MHA/Coll
Coll and MHA/Coll morphology was evaluated and compared to that of human trabecular bone by scanning electron microscopy (SEM) (Quanta 600 FEG, FEI Company, Hillsboro, OR) (Similar concerns?). Scaffolds were compared to human trabecular bone specimens, which were decellularized and dehydrated according to established protocols. Freeze-dried samples were sputter coated with 10 nm of Pt/Pd and imaged at a voltage of 10 mA. Atomic force microscopy (AFM, Bruker MultiMode 8) measurements were obtained using ScanAsyst-air probes, and the spring constant (nominal 0.4 N/m, radius 2 nm) and deflection sensitivity were calibrated. Some fibers were removed from the bulk scaffold, suspended in MQ water, and then allowed to dry on a mica surface. All presented AFM images are height and phase data; images within each figure are on the same height scale and have been subjected to first-order flattening. (Similar concerns and reliance on others’ expertise?) AFM images were collected from different samples by random spot sampling of the surface (at least five areas per sample). Quantitative roughness (Ra) data were obtained with Bruker software (NanoScope Analysis) on sampling areas of 100 nm², randomly analyzed on four different images. Quantitative mechanical characterization was performed using the same instrument, operated in peak-force tapping mode with a 1.0 Hz scan rate and a 200 mV amplitude set point. To calculate Young’s modulus, the retract curve of the force versus separation plots was fitted with the DMT model [33] (more machines and experts?):
$$F - F_{\mathrm{adh}} = \frac{4}{3}\,E^{*}\sqrt{R\,(d - d_0)^{3}}$$
where F − F_adh is the force on the cantilever relative to the adhesion force, R is the tip radius, d − d_0 is the deformation of the sample, and E* is the reduced modulus. Young’s modulus was calculated from 512 × 512 force–separation curves obtained over an area of 1.7 μm × 1.7 μm on each sample.
hBM-MSC Culture
hBM-MSC cultures were established following the manufacturer’s instructions (Gibco). Adherent cells were serially passaged using TrypLE™ Express (Invitrogen) upon reaching near confluence (80%) and then reseeded for culture maintenance (cell culture problems revisited?).
Cell Organization and Morphology in 3D Culture
To be seeded into scaffolds, hBM-MSC were harvested and resuspended in standard cell culture medium. A 30 μl drop containing 350,000 cells was seeded on the center of each scaffold (either collagen or MHA/Coll) and kept in the incubator for 10 min. Culture medium was then added to each well. Cell morphology in 2D culture was evaluated by staining with phalloidin and DAPI according to the manufacturer’s protocols (Life Technologies), while in 3D it was evaluated by staining with LIVE/DEAD®.
Images were acquired by confocal laser microscopy. The scaffold was imaged by exploiting its autofluorescence at 358/461 nm. Cell viability on the scaffolds was evaluated by LIVE/DEAD® cell viability assay, performed according to the manufacturer’s protocol. Samples were imaged with an Eclipse Ti fluorescence microscope (NIKON), and cell viability was calculated via an automated measurement tool of the NIS-Elements software (NIKON) (machines and ingredients?).
Cell Proliferation on Scaffolds
Cell proliferation in the scaffolds was evaluated by Alamar Blue assay (Invitrogen) according to the manufacturer’s instructions. Absorbance was measured at wavelengths of 570 and 600 nm. Three days after seeding, hBM-MSC were stained using a fluorescent Live-Dead Viability Assay (Molecular Probes, Eugene, OR) according to the manufacturer’s instructions and imaged on an A1 confocal laser microscope (NIKON) (similar concerns?).
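The DMT fit used for the AFM measurements in the structural characterization step reduces to a one-parameter problem. Written as F − F_adh = (4/3) E* √R (d − d0)^(3/2), the force is linear in the reduced modulus E*, so a least-squares fit through the origin has a closed form. The Python sketch below illustrates this on synthetic data. The adhesion force, the 1 GPa modulus used to generate the curve, the rigid-tip assumption, and the sample Poisson ratio of 0.3 are all illustrative assumptions. In the study itself, the fitting was done in Bruker’s NanoScope Analysis software, whose internals this sketch does not reproduce.

```python
import math


def dmt_reduced_modulus(deformation_m, force_n, f_adh_n, tip_radius_m):
    """Least-squares estimate of the reduced modulus E* (Pa) under the DMT model.

    Model: F - F_adh = (4/3) * E* * sqrt(R) * delta**1.5, with delta = d - d0 >= 0.
    Because the model is linear in E*, the best fit through the origin is
    E* = sum(x*y) / sum(x*x), where x = (4/3) * sqrt(R) * delta**1.5.
    """
    xs = [4.0 / 3.0 * math.sqrt(tip_radius_m) * d ** 1.5 for d in deformation_m]
    ys = [f - f_adh_n for f in force_n]
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


def youngs_modulus(e_star, sample_poisson=0.3):
    """Sample Young's modulus from E*, assuming a rigid tip: E = E* * (1 - nu**2)."""
    return e_star * (1.0 - sample_poisson ** 2)


# Synthetic retract-curve data generated from a known modulus (illustration only).
R = 2e-9             # nominal tip radius from the text: 2 nm
F_ADH = -1.0e-9      # hypothetical adhesion force: -1 nN
TRUE_E_STAR = 1.0e9  # hypothetical reduced modulus: 1 GPa
deltas = [i * 0.5e-9 for i in range(1, 11)]  # 0.5 to 5 nm of deformation
forces = [F_ADH + 4.0 / 3.0 * TRUE_E_STAR * math.sqrt(R) * d ** 1.5 for d in deltas]

e_star = dmt_reduced_modulus(deltas, forces, F_ADH, R)
print(f"E* = {e_star:.3e} Pa, E = {youngs_modulus(e_star):.3e} Pa")
```

Because the synthetic curve is noise-free, the fit recovers the modulus used to generate it. With real force curves, the result also depends on the calibrated spring constant, tip radius, and deflection sensitivity, which is why the text flags calibration.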
hBM-MSC Osteogenic Differentiation
The osteogenic potential of MHA/Coll was evaluated over a 21-day period (Why this time frame?). hBM-MSC at passage 3 were seeded onto Coll and MHA/Coll at a density of 10,000/cm². hBM-MSC cultured in 2D conditions, either exposed to inducing media (StemPro® Osteogenesis Differentiation Kit, Gibco) or kept in standard media, were used as positive and negative controls, respectively. Media changes were performed every 3 days (serum/ingredient concerns?).
Gene Expression Analysis
To confirm the osteogenic induction, total RNA was isolated from cells grown on scaffolds by homogenization in 1 mL of Trizol reagent (Life Technologies) according to the manufacturer’s instructions with a Power Gen 125 tissue homogenizer (Fisher Scientific). For each sample, RNA concentration and purity were measured using a Nanodrop spectrophotometer (Nanodrop® ND1000). Treatment with DNase (Sigma) was performed to avoid DNA contamination. cDNA was synthesized from 1 μl of total RNA using a TaqMan Reverse Transcription reagents kit (Applied Biosystems, Branchburg, NJ). Amplifications were set up on plates in a final volume of 10 μl and carried out using TaqMan® Fast Advanced Master Mix and TaqMan probes (Applied Biosystems). The expression of osteocalcin (BGLAP; Hs01587814_g1), osteopontin (SPP1; Hs00959010_m1), and alkaline phosphatase (ALP; Hs01029144_m1) was assessed. The expression of osteogenic-specific genes in hBM-MSC cultured on MHA/Coll was compared to that of cells on collagen scaffolds (Coll) and of cells cultured in 2D conditions with osteogenic media (MSC-induced) or without (MSC-ctrl). Results were normalized to the level of glyceraldehyde 3-phosphate dehydrogenase (GAPDH; Hs02758991_g1) and expressed relative to MSC-ctrl (more machines and experts?).
Subcutaneous Implantation of MHA/Coll
Skeletally mature New Zealand white rabbits were used for the purpose of this study.
All animals were maintained and used in conformity with the guidelines established by the American Association for Laboratory Animal Science (AALAS). With each animal in the prone position, under surgical anesthesia, and using sterile technique, three incisions were made on the back. Subcutaneous pockets were created by blunt dissection. One scaffold (1 cm diameter × 3 cm long) was implanted subcutaneously into each pocket, so that each rabbit received three scaffolds. After placement of the implants, the incisions were closed with staples (an experienced surgeon/vet?).
DynaCT Analysis of Specimens
Under sedation, lumbosacral computed tomography (DynaCT) scans were obtained from all animals at 24 h and at 2, 4, and 6 weeks post-surgery. Scanning was performed using a Siemens Axiom Artis C-arm (d)FC (Siemens Healthcare, Erlangen, Germany) with a 48 cm × 36 cm flat-panel integrated detector. Acquisition parameters for DynaCT were as follows: 70 kV tube voltage, automatic tube current of 107 mA, 20 s scan. Each scan entailed 222 degrees of rotation, with 1 image taken every 0.5° for a total of 444 images per acquisition (each digital acquisition had a matrix of 514 × 514 pixels). Three-dimensional bone deposition analysis was performed on every CT scan using Inveon Research Workplace 4.2 software (Siemens Medical Solutions USA, Inc.) (machines and other experts?).
Statistical Analysis
Statistics were calculated with GraphPad Prism software. Statistical comparisons were performed using a two-way ANOVA followed by Tukey’s multiple comparison test. In all cases, * was used for p < 0.05, ** for p < 0.01, *** for p < 0.001, and **** for p < 0.0001. All experiments were performed at least in triplicate. Data are presented as mean ± SD (proper statistical analysis?). ……” Etc.
The purpose of this exercise in annotation was to document all of the complex things that must come together for the production and testing of “products” when they are used in animal research in modern translational science. An expired serum, a mislabeled ingredient, a cell that is not quite right, a poorly calibrated detector/machine, an inattentive scientist at one step—one lousy link in a chain of a hundred events—can render the end result invalid, with animals used for no benefit. It is easy to see why reproducibility is so difficult.
Human Performance Failures
Beyond the difficulties with the ingredients (cells, antibodies, etc.) and the “products” of research lie the difficulties of human performance. Even when an experiment is justified, follows solid methodology, and is carried out in a technically exacting manner, things might not work out because of multiple uncertainties (outside the researchers’ control) in the ingredients or in the process of creating products. But science done properly is also hard, technically demanding, and exacting human work. Subtle changes or problems (potentially within the researchers’ control) at any point in the process can also render the result invalid, and it is not unusual for such failures to occur and to be recognized only after animal experiments have been completed.
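The point that one weak link anywhere in a long chain of steps can invalidate the result can be put in rough numbers. Purely for illustration, assume that an experiment consists of 100 steps and that each step succeeds independently with the same probability. The chance that every step succeeds is then that probability raised to the 100th power. The following is a short Python sketch of that calculation:

```python
def chain_success(p_step: float, n_steps: int) -> float:
    """Probability that all n_steps succeed, assuming each succeeds
    independently with probability p_step (an illustrative simplification)."""
    return p_step ** n_steps


# Illustrative per-step reliabilities; these are not measured values.
for p in (0.99, 0.995, 0.999):
    print(f"per-step reliability {p}: P(all 100 steps succeed) = {chain_success(p, 100):.2f}")
```

Even at 99% reliability per step, only a little over a third of such experiments would come out clean. Real steps are neither independent nor equally reliable, so this is a caricature, but it shows how small per-step error rates compound.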
In our “Materials and Methods” example above, each of the steps described is carried out by someone. And that someone has to have sufficient knowledge and skills to get it done properly—not to mention be “on their game” that day. Those who carry out these steps may include the professorial lead scientist of the lab, an associate or assistant professor who works under them, a post-doctoral (early career) PhD, a PhD student, a technician, or an undergraduate student. Many of the tasks along the way require the expertise of other collaborating labs and will be performed by their separate teams. The bottom line is that each task requires someone with the right expertise, skills, etc. to get that specific job done properly. And this is not easy. Given the demands on lab leadership (grant writing, fund-raising, providing guidance for the lab, writing papers, speaking at conferences, etc.), most specific tasks in a project will (out of necessity) be carried out by others—generally, those at the post-doctoral level or below. It is important to note that post-doctoral PhDs in the sciences are, generally, brilliant young scientists driven to make a difference. They are special people who quite often will evolve with time into leaders in their very difficult fields. But, first, they are young and still learning and are unlikely to have the detailed technical skills to do many of the required tasks from the start. They will require extensive training. As a general rule, six months of education and practice (with fine-tuning) of many of the tasks in a lab will be required to afford a researcher the skills necessary to perform the task independently and consistently. And there are many tasks to learn. Individuals will learn them within different time-frames and attain different levels of proficiency. “Expertise” in many of the complex tasks in a modern lab often takes four or five years to attain, yet the average tenure of a post-doctoral PhD in a biomedical laboratory is two or three years. At that point, they often move on to greener pastures or create their own program of research. Outside of a few major labs and major academic institutions, lack of sufficient funding and frequent turnover result in labs having to do the best they can with what they have—a difficult pill to swallow when animals are involved.
4 Failures of Translatability
Thus far, the prerequisites for the use of animals in biomedical experiments include appropriate justification for their use; solid methodology to provide valid information; and well-performing ingredients, products, and scientists during the research process. Translatability is another requirement for the use of animals. Simply, if the goal is to use animal models as a foundation for the eventual introduction of novel treatments to humans, then we should have great confidence that the animal models used closely approximate the human diseases and the human responses to treatment. Additionally, it should be possible to delineate a clear pathway from animal testing to human application—justifying the stepwise translational process. Are animal models accurate representations of human disease? Most often the answer is “no.” The top ten leading causes of death globally, in order, are: ischemic heart disease, stroke, chronic obstructive pulmonary disease, lower respiratory infections, neonatal conditions, lung cancers, Alzheimer’s disease and other dementias, diarrheal diseases, diabetes mellitus, and chronic kidney diseases; these are followed closely by other cancers (breast, prostate, etc.). Respiratory infections, neonatal conditions, and diarrheal diseases predominate where quality healthcare is not readily available and are often treatable if treatment is undertaken in a timely fashion. Death rates from these diseases are decreasing as access to quality care improves worldwide. The other conditions on the list are the major killers in developed countries; they are increasing in number and, accordingly, are the focus of the majority of translational research efforts using animals. A deeper dive into the pathophysiology of these diseases will provide greater insight into the adequacy of animal models: Ischemic heart disease involves a complex pathophysiology characterized by a mismatch between coronary blood flow and myocardial energy needs.
Four major pathologic factors characterize the problem: (1) Atherosclerosis (“hardening of the arteries”), commonly associated with aging, diabetes mellitus, arterial hypertension, dyslipidemia, cigarette smoking, and genetic predispositions (in various combinations). (2) Vasospasm (the harmful, self-squeezing behavior of abnormal vessels that limits blood flow), associated with organic vessel stenosis, abnormal vasomotor tone at rest, and coronary artery hyperreactivity. (3) Inflammatory abnormalities mediated by cytokines, angiotensin II, and oxidative stress. (4) Coronary microvascular dysfunction (where the small vessels within the heart muscle fail to provide sufficient blood flow). Stroke is a similar mismatch between blood supply and the needs of the tissues fed by that supply; in this case, the brain. Hypertension, smoking, alcohol use, drug use, degree of physical activity, hyperlipidemia, poor diet, diabetes, atrial fibrillation, blood disorders, and genetic factors can combine in multiple different ways to create an atherosclerotic, embolic (intra-vessel blood clots), or hemorrhagic (bleeding) event. Diabetes mellitus is a complex metabolic disease characterized by elevated blood glucose levels due to anomalies in insulin secretion or function. The result is a chronic disease with widely varied presentations affecting multiple different organs/tissues/systems, primarily on a microvascular and macrovascular level. Widespread metabolic and organ system changes of various degrees can occur and interact with varying genetic and risk-behavior factors. Lung cancer results from the remarkably complex and incompletely understood interplay of repeated exposure to carcinogens (smoking, asbestos, etc.) and genetic factors that lead to mutations affecting protein synthesis, the cell cycle, and carcinogenesis.
Alzheimer’s disease is a neurodegenerative disease caused by aberrant processing and polymerization of soluble proteins in the brain, resulting in the proteins taking abnormal conformations—again, characterized by the complex, long-term interplay of genetic, aging-related, and external factors. Etc. The key takeaway of this brief discussion of various pathophysiologies is that the major diseases affecting humans all have complex, heterogeneous, and incompletely understood pathogeneses and pathophysiologies that develop via the long-term, complex interplay of multiple factors. Contrast this, then, with the animal models used to study these diseases. Animal models are generally created by one of three pathways: genetic manipulation, surgical intervention, or injection of cells (cancer) or chemicals (drugs, toxins, etc.). Models of ischemic heart disease and stroke are created in animals by the injection of embolic substances or the clamping of vessels that lead to the heart or brain. Cancer models generally use the injection or placement of human cancers into mice that have been genetically altered to mount no immune response (so that the human cancers can grow without resistance). Diabetes models are often derived by single-gene manipulation.
It takes little effort to see that these simple manipulations cannot possibly recapitulate the complex, heterogeneous, chronic, multifactorial character of the pathogenesis and pathophysiology of human diseases. And the proof is simple: Even when novel drug treatments perform well in well-performed multispecies animal studies, between 92% and 96% will fail when tested in humans. Only 4–8% translate successfully(!). And, as always, there is more to the story. (Non-human) Animals aren’t human. Significant evolutionary, contextual (biopsychosocial and behavioral), genomic, phenotypic, physiological, and other differences are obvious. So, not only are the models used in animal research often poorly representative of the human disease/pathology being studied; the animals themselves are poorly representative of humans. (And this holds true for naturally occurring diseases of animals that mimic human diseases and for non-human primates. They are perhaps “closer” in some sense, but the ways in which they differ appear to outweigh these similarities, given the poor history of translatability of chemical therapeutics even in these settings.) Translatability can fail not only because of poor animal models but also because of a failure to appropriately assess the likelihood that the intervention will get from “bench to bedside.” Here, it is best to consider another real-world example: Nanoparticles for the diagnosis and treatment of cancers have received considerable attention over the past two decades. They offer the hope of selectively detecting and killing cancer cells. Under normal circumstances, chemicals injected into humans (contrast agents used for diagnosis, chemotherapeutic agents used to kill cancer cells) flow through the bloodstream and into the tissues in a wide distribution, with a small percentage, hopefully, making it to the cancer cells.
The far greater percentage never makes it to the tumor of interest and may cause toxicity where it ends up—the kidneys, liver, lungs, bone marrow, etc. We are all aware of the widespread and potentially life-threatening side-effects of chemotherapies, radiation treatment, etc. The system of delivery commonly used is remarkably inefficient. The hope of nanomedicine is that by housing the chemical (chemotherapeutic drug, marker for diagnosis) in a nanoparticle, the chemical might be delivered directly, selectively, and deeply into tumors, and in high concentrations, avoiding an array of biological barriers and “consumption” along the way by the immune system and filter organs. This would afford precise diagnosis and aggressive killing of the tumor cells without associated systemic toxicity. The concept is so appealing that billions of dollars have been poured into research, including over a billion dollars from the US government alone over the last decade; and this does not include private (corporate) or philanthropic investment, considered to be of similar magnitude. To date, however, the end result of such investment has been disappointing. While the techniques have afforded detection and killing of cancer cells in cell cultures (in vitro) and in some mouse experiments, further translation of nanoparticles has been minimal. In 2016, an important article was published that aimed to analyze the delivery of nanoparticles to tumors. (Its importance lies not only in the findings but also in the fact that the senior author is a recognized international leader in the field.) The authors evaluated several hundred published papers to assess what percentage of injected nanoparticles reached the target tumor in mouse models—the main goal of the intervention. They found that a median of 0.7% reached the tumor and that the nanoparticles likely remained on the surface rather than penetrating deeply (limiting their effectiveness even when they make it to the target). Thus, 99.3% of particles do not make it to the tumor and are instead distributed to other tissues or taken up by the filter systems (kidney, liver, lung, etc.), where they are clearly ineffective and a potential source of toxicity. While a multitude of strategies and types of particles have been tried, billions of dollars invested, and thousands of animals used, the goal of targeted chemical delivery via nanoparticles for diagnostic and therapeutic purposes in cancer has not been adequately achieved—even in mice, let alone humans. And, if we take a huge step and imagine the technology actually applied to humans, further problems with translatability (beyond not achieving the stated goals, even in mice) emerge. The dose of particles/chemicals needed to be potentially effective in humans would be massive. The production of nanoparticles is tedious and difficult. The massive doses of nanoparticles needed for human application would require a remarkable alteration in production times, and there is no evidence that most particles can be “scaled up” efficiently at this time. There is evidence, however, that efforts to scale up production appear to diminish the quality of the particles, further adding to the “ingredient” problems discussed in the prior section. Beyond scalability and quality, the production of nanoparticles is expensive, and the cost of the needed volume of particles would be prohibitive. And, finally, the uncontrolled distribution of nanoparticles into uninvolved tissues (even crossing endothelial barriers) makes them high risk for toxicity. Thus, FDA approval, except perhaps in cases of surely terminal diseases, is unlikely.
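The delivery figure discussed above is a percentage of the injected dose (%ID): the amount of nanoparticle material recovered in the tumor divided by the amount injected. The minimal Python sketch below shows how such a figure could be summarized across studies as a median. The per-study numbers are invented for illustration and are not the data from the 2016 analysis.

```python
from statistics import median


def percent_injected_dose(tumor_amount: float, injected_amount: float) -> float:
    """Delivery efficiency: share of the injected dose found in the tumor, in percent."""
    return 100.0 * tumor_amount / injected_amount


# Hypothetical per-study results: (amount recovered in tumor, amount injected),
# in any consistent unit. These numbers are invented for illustration.
studies = [(0.3, 100.0), (0.7, 100.0), (1.5, 100.0), (0.5, 100.0), (2.0, 100.0)]

efficiencies = [percent_injected_dose(t, inj) for t, inj in studies]
med = median(efficiencies)
print(f"median delivery: {med:.1f}% of injected dose; {100 - med:.1f}% ends up elsewhere")
```

A median is a natural summary here because a few studies with unusually high delivery would otherwise pull a mean upward.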
Translation from animals to humans requires many steps: accurate/appropriate animal models, success in these models, successful translation up an evolutionary pathway, a high likelihood of success in humans, scalability, likelihood of FDA approval, costs that are reasonable for the problem at hand, etc. There are certainly many success stories; a multitude of drugs and devices currently used effectively to improve human health began with animal studies. At this point in time, however, it is clear that the usual road taken to translation (the use of animals in almost all cases) needs to be seriously reconsidered. There are better roads to get there.
5 Systematic Failures
If alternatives to animal use are ever-emerging, if the methodologies currently employed in animal studies are poor and need to be tightened up, if the quality of the ingredients and products used needs to be improved, and if the road to translation is (at best) rocky, why is animal research still considered the bedrock of translational medicine?
It is estimated that about half of the $270 billion dedicated worldwide annually to biomedicine is used to support and conduct animal testing and research. That is $135 billion and a minimum of 115 million animals per year, for relatively little return in the form of translation to humans that makes a difference in human health. But “translation to humans that makes a difference in human health” is only one “return” on the massive investments made in animal research. There are other “returns” and incentives to be considered, and it is these returns and incentives that likely account for the persistence of the current research paradigm. It is fair to believe that the vast majority of scientists involved in animal research sincerely want to eventually make a positive impact on human health; a clear motivation is to do something good for humanity. There are many other motivations as well, and given that most efforts at translation will fail, these other motivations may emerge as the prime reasons to continue work in the field. While difficult, the design and performance of successful experiments in biomedical science are rewarding to those who undertake them. Even the earliest, smallest task (one step in a long series of steps toward finalizing a project) brings rewards: a challenge faced and overcome using scientific creativity. The fact that even these small successes are shared with other really smart people, coupled with the hope that better things are to come, makes the beer taste that much better at the end of the day. Many people have uninteresting jobs, are surrounded by uninterested coworkers, and will make little impact. Biomedical science is not one of those jobs, and most of those involved understand that and are motivated by it. Beyond being rewarding in this sense, it is also a “job” that provides an income and brings with it the potential for advancement, and this is a considerable motivator as well.
Advancement in biomedical science generally progresses in a step-wise academic fashion: from PhD or MD student, to post-doctoral work doing tasks for a more senior scientist’s programs, to Instructor, Assistant Professor, Associate Professor, and Professor. Each step affords greater autonomy to pursue individual research interests, coupled with increasing salaries, grants, and prestige. Such advancement through the ranks is inextricably tied to successful experimentation and subsequent publication of positive results. The more publications in quality journals, the better; publications are the primary measure of success in academic science. There are currently thousands of journals dedicated to the publication of biomedical science, and their number is increasing as “open” publication (often with less-stringent publication requirements) expands and electronic publication allows for rapid and less expensive publication processes. If doing biomedical experiments “right,” as defined so far in this book, requires time-consuming and costly alternatives to animals (organs-on-a-chip are not cheap or fast or readily available), time-consuming and costly methodologies (biostatisticians and methodologists are not cheap or fast), testing and validating ingredients (costly, time-consuming, and requiring special expertise), and ensuring translatability (the time and cost of a long-term, doable plan), and if the key to academic success is multiple rapid publications, why not simply use animals as is currently done? Animal experiments are relatively
inexpensive, results (valid or not) can be obtained relatively quickly, and those results can be published rapidly given the wide range of publication options available, even in many highly ranked journals. Why not six publications in less time and at less cost, compared to one publication done “right,” especially given that they may be published in equally “high-quality” journals regardless of the concerns? Why not do the best one can with what one has while remaining in line with the way things are currently done? Along with publications and advancement also comes prestige, and its impact is real. Many, perhaps most, scientists engaged in biomedical research were top students from grade school through college and graduate school. They are competitive types, and recognition matters to them. Publications are a common currency: they put their names in print, add qualifications to their stationery, and allow them to speak at the podium, win awards, and secure grants for future research. Beyond this, new “products” developed and tested in animal studies may be novel and patentable. Once patented and having shown promising results in animal studies (regardless of the risk of translational failure), such a product may provide the entrepreneurial researcher with an opportunity to make considerable money via collaboration with a biomedical company (or a buy-out of the patent). This opportunity for profit depends wholly on positive results; animal testing is still a gold standard for investment (despite the problems outlined); and the problems outlined often correlate with positive outcomes in animals. The risk of translational failure is often taken on by the company (which still measures its profits in the millions or billions). Just as scientists rely on journals to be homes for their publications, journal publishers rely on scientists to supply the volume necessary to fill the journals. It is a symbiotic relationship, and volume is key for both parties.
The major biomedical journal publishers measure their profits in the hundreds of millions of dollars per year. Income comes from many sources depending on the journal: subscriptions (especially from academic institutions and their libraries), purchase of individual articles by readers, publication fees paid by researchers or their institutions, advertising, academic institutions, non-profit foundations, specialty societies, etc. And costs to publish are decreasing as technology advances at a remarkable pace. Quantity trumps quality when profit trumps all. Academic medical centers also benefit greatly from research efforts using animals. NIH funding provides considerable support to these centers. Prior to the 1980s, researchers and their work were paid for almost exclusively by the institutions where they worked. NIH budgets then gradually increased (more money to fund the research) and then doubled between 1998 and 2003. Since then, NIH dollars have covered the majority of researchers’ salaries, supplemented by indirect funding (meant specifically to help support institutional expansion). This influx of money is often used to expand research facilities, which are then filled with more scientists who can bring in even more NIH funding. The benefit of this system is that biomedical research has seen remarkable expansion, but the risk is that it has created a deep reliance on
NIH/Federal funding to keep it all going. Now, as NIH funding has generally flatlined, concerns over how to pay for the plethora of new research institutions and scientists have arisen, and additional funding from philanthropic sources is generally insufficient except at specific centers. Accordingly, NIH funding is specifically and aggressively targeted by scientists and institutions, and animal studies, despite the shortcomings noted throughout this book, remain a foundation of the projects and programs most commonly funded. Thus, institutions invest heavily in animal research to ensure a pipeline of NIH funds, supporting further research, expansion, and the prestige (and the medical/hospital business in actual human healthcare) that comes with it. Currently, FDA approval of novel drugs, devices, and combinations almost universally requires animal testing despite its shortcomings. The use of animals in a progressive manner along the translation pathway has been so entrenched for so long that proceeding without these steps would be viewed by many (including members of the general public) as dangerous: “How did this medication that later proved dangerous to humans get through the FDA without first being tested in animals?” In order to get approval, pharmaceutical and device manufacturers are obliged to prove safety and efficacy in animals first, even though the resulting safety and efficacy data may be inaccurate when applied to humans. Finally, the societal financial impact of animal research is huge. It extends from the companies that supply animals, cages, food, and equipment, to the medications for animal anesthesia and care, to academic veterinary staffs, etc. There are many, many people (and their families) whose livelihoods currently depend on a robust, widespread animal testing enterprise. With radical changes to the system, what becomes of the displaced when the alternatives are so far removed from the current standard?
What are the human societal costs and benefits? These are difficult questions that pertain not only to animal research, but human healthcare, the energy sector, the military sector, etc. They are systematic problems that come with competitive, profit-driven sectors that provide necessary societal services.
6 Summary of the Critique
The conclusion to be drawn from this “Critique” section is that many “failures” can occur in the process of doing biomedical animal research, either in isolation or in combination, that can render the research undertaken of no value to humans or inappropriate in the first place (The Problem). Beyond wasting money, manpower, and intellectual abilities (opportunity costs), all of these “failures” result in unnecessary harm to the animals involved. Failures have been broken into five basic categories: (1) Failures of Justification. Did one really need to use animals? Were there good alternatives? Did it make sense in the first place to use them in this way? (2) Failures of Methodology. Did one closely follow the ideal scientific method (a randomized controlled trial) to ensure
that the results obtained were valid? (3) Failures of Performance. Did the ingredients and technologies one used for the research function as expected/needed? Did the scientists doing the experiments function as expected/needed? (4) Failures of Translation. Did the choice of animal make sense: did it accurately model human disease in the specific ways one was hoping for and allow for stepwise progression through species to humans? Did the intervention make sense in the first place: was it scalable, affordable, FDA-able? (5) Failures of the Research System as a Whole. Was the research carried out for reasons other than the one commonly assumed, that of positively impacting human health? What has been shown is that business-as-usual translational biomedical research using animals has a rather poor record of documented success, and that its failures may occur on many different fronts and in many different combinations. In the following major “Constructive” section, ways in which such research can be improved will be explored; some of these may be concrete and “easy,” others more theoretical and difficult. But what is clear is that for both the humans (researchers, eventual patients, the general public, etc.) and the animals involved, significant change is needed.
Chapter 2
Constructive
If the majority of biomedical animal research aimed at translation is flawed, and very few of the studies that are not flawed actually lead to translation to humans, why do things persist as they are, and how can change be initiated? As was noted at the end of the Critique section, addressing some problems/failures should be easy, while others will be more difficult. As it regards animal research, unfortunately, the difficult problems/failures have a trickle-down effect that creates and sustains what should be the more easily addressable ones. While incremental improvements can be made by addressing the easier failures, significant change will require addressing problems across the board.
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 B. K. Weiner, A Scientific Approach to Improving Animal Research in Biomedicine, https://doi.org/10.1007/978-3-031-24677-7_2
1 Justification
Addressing the problem of justification must begin at the lab and institutional level, the local level. As we will see later, justification may also be encouraged from “higher levels”: journals may require justification for the use of animals or reject papers that used animals despite readily available alternatives, funders may consider justification as part of peer review, etc. But such incentives and disincentives, corrections from the top down, are often slow and subject to considerable resistance from multiple stakeholders. The primary aim of journal publishers is to publish papers, fill the journals, and sell them, not to prescribe animal use justifications. The primary goal of funders is to distribute money to support the scientific research mission deemed important and most likely to succeed by the funders and their contributors. These entities have evolved over years to do things in a certain way, and through that evolutionary process they have generally done their specific jobs well. Journals have generated large profits for their publishers, and peer-review committees have distributed considerable amounts of money. Profound changes (aimed at “other” goals) within these established entities generally result only when
concerns become widespread and there is outside pressure for change. Consider the changes in journal policies and at government funding agencies over the last couple of decades resulting from concerns regarding conflicts of interest. Conflicts of interest were clearly recognized as a problem for many decades before the “noise” became loud enough to prompt change. It appears, at this point in time, that outside pressure (the “noise”) on these entities is insufficient to enact profound changes directed at animal research. Simply, papers are still routinely published and research programs routinely supported where clear alternatives to animal-research-as-usual were available and lack of translatability was obvious from the start. Thus, the quickest and most efficient way to initiate change in this setting is from the bottom up: in the labs and at the research project approval level, the institutional/local level.
Alternatives to Animals
The first requirement for animal use should be a convincing argument that no good alternatives exist. This requirement seems easy to meet: construct a checklist to be included as part of the submission process for animal research, trust that it is being filled out truthfully and correctly, and then verify that it is. The checklist should be long enough to specifically cover the available alternatives (“Are there available alternatives?” is an insufficient question, yet one commonly seen); short enough to be filled out by the researchers in a timely fashion (as they are already burdened with the never-ending cycle of applying for grant support and the other frustrations of research); and allow for explanation. Yes/no responses to questions like “Can the answer be determined by using established cell lines as opposed to animals?” are insufficient. Justification requires explanation.
“Cell lines will not allow us to answer our question because…” The weaknesses of the various substitutes as they pertain to replacement of animals are outlined in the Critique section. The construction of the checklist (and some are available) should be the relatively easy part: list the available alternatives and require explanation/justification. Filling it out “correctly” is more difficult. Researchers are most often experts in disease processes, pharmaceutical chemistry, biomedical technologies, etc., but not in the subtleties of the knowledge required to discern whether alternatives are sufficient to answer the questions their research proposes. So, while the questions in the checklist may be answered to the best of their knowledge, they may lack the knowledge required to answer them correctly. Thus, expertise from appropriately trained individuals within the institutions evaluating the responses will be required. Such individuals may be hard to find, will add to expenses, and may slow down the process. Our academic institution is relatively small but has over 700 scientific faculty and publishes roughly 1700 papers annually, perhaps a third of them involving animals. Manpower, expertise, cost, and efficiency issues, even to verify a simple primary checklist, are real. For all things that require effort, we must ask: “Is it worth it?” I, personally, believe the animals used for biomedical research are worth it.
Is the Disease/Process Being Studied Important Enough to Justify Animal Use?
This may be addressed at the institutional level as well. Many institutions have areas of specific focus that they deem important both for the institution and for the local or general population. Our institution was the home of Michael DeBakey, MD, whose surgical and technical innovations helped radically change the care of cardiovascular disease. As such, the institution has continued significant support for similar innovations (surgical and device-related technologies that may require some form of animal testing, especially in light of FDA requirements) and deems animal use justified in these cases. And, since the research is targeted, adequate disease-specific expertise can be employed to fairly assess the justifications proposed for animal use, and decisions made by committees whose interests lie far afield can be avoided. In contrast to the “alternatives” discussion above, as it regards “importance,” large funders can have a more direct effect in addressing this problem. As was noted earlier, translational research aimed at major societal medical problems (cardiac, cancer, infections, etc.) receives the greatest amount of funding. Accordingly, researchers will aim their research at what can be funded, institutions will invest in researchers who are most likely to obtain that funding, and, as a result, the “importance” issue is often resolved by funding sources (an offshoot of Sutton’s law: study potential treatments for this disease because that’s where the money is), generally in line with societal desires and needs. Proposed research programs that lie outside of major health issues should be carefully assessed by review committees prior to approval, and alternatives should be more aggressively considered until a clear translational pathway emerges.
Is There a Mismatch Between the Disease and the Suggested Treatment?
The medical literature contains many published animal studies with such a mismatch: complex, elegant solutions to simple problems with known simple (and existing) solutions, or new, exciting, and expensive drugs and technologies aimed at medical problems with existing simple, efficacious, and inexpensive solutions. This can occur at the lab level for a few reasons. First, there may have been no systematic review (or an insufficient one) done prior to moving forward with animals; the researchers simply did not know that the problem had been solved sufficiently via simpler interventions, or that their solution was unlikely to prove superior. Second, labs tend to “fall in love” with their own products (often their patented drug or device) and aim to use them for a multitude of medical problems less directly related to their primary intention. Enthusiasm impairs judgment, and institutional review before animal experimentation should aim to curb this. That said, labs and their institutions can jointly get caught up in the enthusiasm, wishing to put forth their novel drugs and technologies because of the potential profits under shared Intellectual Property agreements. Pharmacies are literally filled with the newest, cutting-edge, expensive medicines that were tested on animals despite good reason to believe that they would offer no improvement to human health beyond
existing medications, developed solely to capture market share and the profits that come with it. The mismatch commonly occurs in the opposite direction as well. A new drug is aimed at a single-step alteration of a complex biochemical pathway based on laboratory studies using mouse cells. That this pathway will prove important for translation to humans is highly unlikely: it is an imaginary simple solution, based on limited murine knowledge, aimed at a complex, multifactorial human disease. It is hoped that institutions and funding agencies will become aware of these (mostly obvious) mismatches and will ensure that such proposals genuinely warrant support. The mismatch concept needs to be on their checklists.
Is the Hypothesis Reasonable?
Here, background information is key. A scientifically solid “story” should be told that would lead other scientists and members of review committees to believe that there is a sufficient foundation for the hypothesis and that the research is at the point where the use of animals would be the next logical step. While the Critique section presented a simple example (anti-prion medications for disc degeneration), in practice cases are often much less clear-cut. An example is the nanotechnology for tumor targeting covered previously. Given that multiple current nanoparticles/delivery systems fail to deliver drugs well enough to be even remotely considered a translational weapon against human cancer, new nanoparticles/delivery systems should face a high hurdle before being approved for testing in animals. The background information demands greater scrutiny. How is this technology different? Is the purported difference enough to suggest that translation may actually occur? Or is the difference simply a subtle alteration that allows the new device to be patented (proprietary, but with no truly greater hope of translatability)? Or does the difference simply allow application for additional funding?
Here, again, at both the local and granting levels, committees need the strength to demand more justification, given the background information, before approving animal studies.
Has the Question Already Been Answered?
If there is solid evidence that the proposed treatment (or something substantively similar) is either clearly going to work or clearly not going to work, then animals should not be used: the answer is known, so the studies need not be undertaken. The “solid evidence” comes from a systematic review of the available information, which can be discovered from several sources. For positive outcomes (wherein a treatment is likely to be effective), the published literature is the primary source. The systematic review in this case would entail a critical appraisal of animal studies similar to the one being proposed, followed by synthesis of their findings to provide insight into whether the answer is already “out there.” If a nearly identical treatment has been found effective in valid studies, then animal research is not warranted. If similar treatments appear to be effective, then the proposed treatment may be better tested in a manner that does not involve
animals. For instance, a medication may have been developed that is sufficiently similar to other medications in its class that its actions and toxicities may be solidly assessed using human 3D cell or tissue cultures, organs-on-a-chip, or in silico. Or the gathered information can (at least) be used to decrease the number of animals needed to obtain appropriate power (perhaps using a Bayesian approach). In either case, the information gathered impacts the research significantly, either by decreasing the number of animals needed or by allowing the researcher to avoid the use of animals altogether. But if no reasonable systematic review is undertaken prior to the research, ignorance alone will account for the unnecessary use of animals. Thus, it is vital that such reviews are performed prior to approval of animal studies. This will require extra work for researchers and for reviewers. The benefit to researchers, however, may be a simplified research plan in which the answer needed can be achieved with less work and cost. Simply, they will not waste time searching for answers that are already out there (“reinventing the wheel”). For reviewers/committees, more work will be involved to ensure that the review presented by the researchers is accurate and complete. This process is loosely akin to the thorough institutional review that Intellectual Property submissions undergo prior to patent filing to ensure there is no prior art. Effort and some expertise are required, but extensive database and search tools are readily available to facilitate the work. What if, as part of the review of available literature, nothing similar is found? Then either the concept is wholly novel (rare) or similar research done in other labs has not been published (common). Journals publishing animal research are incentivized financially, and researchers are broadly incentivized (financially through grant support, academic advancement, etc.), to publish positive results.
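The point about using prior information to reduce the number of animals can be sketched numerically. The example below is a minimal sketch, not a full Bayesian design. It assumes a two-group comparison of means with a common, known standard deviation (normal approximation), and it assumes prior evidence can be summarized as worth a stated number of “prior-equivalent” animals per group. Under a conjugate normal model the precision of prior and new data adds, so only the shortfall needs to be studied. The function names and the prior-equivalent figure are hypothetical.

```python
import math
from statistics import NormalDist


def animals_per_group(effect_size: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Per-group sample size for a two-sided, two-group comparison of means.

    Normal approximation: n = 2 * (z_{1-alpha/2} + z_{power})**2 / d**2,
    where d = (difference in means) / (common standard deviation).
    """
    z = NormalDist().inv_cdf  # standard normal quantile function
    return math.ceil(2 * (z(1 - alpha / 2) + z(power)) ** 2 / effect_size ** 2)


def new_animals_needed(effect_size: float, prior_equivalent_animals: int,
                       alpha: float = 0.05, power: float = 0.80) -> int:
    """Hypothetical precision-matching rule: prior information worth n0 animals
    per group adds to the new animals, so only the shortfall must be studied.
    Assumes the prior evidence is commensurate with the new study."""
    n_total = animals_per_group(effect_size, alpha, power)
    return max(0, n_total - prior_equivalent_animals)


print(animals_per_group(0.8))       # 25 per group with no usable prior information
print(new_animals_needed(0.8, 10))  # 15 per group if prior studies are worth 10 animals
```

With a large standardized effect (d = 0.8), the normal approximation asks for 25 animals per group; if earlier, comparable studies are judged worth 10 animals per group, about 15 new animals per group would give similar precision. An exact t-test calculation gives a slightly larger number, and evidence from dissimilar studies should be discounted before it is counted this way.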
Negative results from animal studies can arise from poor methodology (lack of adequate power falsely suggesting no effect of the therapeutic intervention, etc.); from good methodology and performance of the study, but with no effect of the intervention; or from adverse events/side effects that rendered the intervention unusable. Whatever the source, negative findings are quite often “buried” and rarely published in the literature (“Back to the drawing board!”). The repercussions of the failure to publish (or otherwise make available to others) negative results are real. First, animal researchers who obtain a negative result may be unaware of methodological flaws, and a truly promising treatment may be misinterpreted as a failure. Editorial review after submission, assuming good editorial work, might have suggested tightening up the methodology and trying again. Failure to submit in the first place eliminates this possibility and may bury a potentially beneficial therapeutic intervention. Second, if a methodologically solid animal study finds a negative result and is not published, then other researchers may repeat the same or a similar study, wasting time, energy, and money, as well as unnecessarily harming animals. Third, if multiple similar animal studies are undertaken and only the positive results are published, then the suggested systematic review will provide inaccurate findings. If three positive studies are published and fifteen negative studies are not, the researcher and review committees are given false hope based solely on the published results.
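The distorting effect of unpublished negative studies can be illustrated with a small simulation. It is a sketch under stated assumptions: a treatment with no true effect, many small two-group studies of 8 animals per group with a known standard deviation, and a hypothetical publication filter that lets through only results with z > 1.96 in the hoped-for direction. The average over all studies sits near zero, while the average over the published ones looks like a substantial effect.

```python
import random
from statistics import mean


def simulate_studies(n_studies=2000, n_per_group=8, true_effect=0.0, sigma=1.0, seed=1):
    """Simulate many small two-group animal studies of the same treatment.

    Returns (all_estimates, published): the effect estimate from every study,
    and only those that pass the hypothetical "positive result" filter.
    """
    rng = random.Random(seed)  # fixed seed so the simulation is reproducible
    se = sigma * (2 / n_per_group) ** 0.5  # standard error of a difference in means
    all_estimates, published = [], []
    for _ in range(n_studies):
        treated = [rng.gauss(true_effect, sigma) for _ in range(n_per_group)]
        control = [rng.gauss(0.0, sigma) for _ in range(n_per_group)]
        estimate = mean(treated) - mean(control)
        all_estimates.append(estimate)
        if estimate / se > 1.96:  # "positive" result in the hoped-for direction
            published.append(estimate)
    return all_estimates, published


all_estimates, published = simulate_studies()
print(f"all {len(all_estimates)} studies: mean effect {mean(all_estimates):+.2f}")
print(f"{len(published)} published studies: mean effect {mean(published):+.2f}")
```

A systematic review that can see only the `published` list would conclude that the treatment works; with a registry exposing every estimate, the same review would find no effect.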
These problems of publication bias against negative results are increasingly recognized in human clinical trials, and steps have been taken to try to avoid them, including a priori registration of all trials with transparent submission of methodology and uploading of data and results as they “come in.” In this way, all results, both positive and negative, are potentially available to other researchers (and the public). Similar efforts for therapeutic animal studies would be a great first step to benefit researchers (focused time, money, energy), committees used for approval (ready access to available background information), the public (fine-tuning of translation), and the animals. But currently no well-developed animal registry exists, so the search for negative results must start elsewhere. Two main strategies may be employed. First, abstracts from scientific society conference proceedings can be reviewed. Research presentation and poster abstracts are often published online, either in journals affiliated with the society or on the society’s own website, and these can be searched for research similar to that proposed. For instance, a researcher may be interested in gene therapy for intervertebral disc degeneration: integrating a specific gene into cells from the discs might allow the long-term production of proteins that would decelerate or prevent the degenerative process. Before writing this sentence, I went online and was easily able to find over thirty abstracts of presentations or posters from major spine societies, from the last 15 years, that presented animal research on this subject. Most results were positive, although they all suggest “mining” phenomena. Needless to say, none of the approaches has progressed toward translation, and few went on to formal publication in journals.
They likely failed to work on further investigation, failed to be published because of methodological flaws, were abandoned as impractical (a clear case of “mismatch,” with a costly and risky intervention aimed at a physiological process that most commonly causes no morbidity and never causes mortality), or were abandoned as non-translatable (unlikely to gain FDA approval). The lesson, however, is that even when the published literature on a particular subject is limited, relevant information is often available through relatively easy online searches, and this information can be used to guide researchers and review committees. In our example, pushing forward with gene therapy for disc degeneration appears unlikely to head anywhere near translation for multiple reasons, and using animals should be discouraged unless a truly novel approach is presented. The second strategy for discovering negative results involves direct contact with other researchers. Via prior publications, research abstracts, and less formal communications (conversations at meetings, for instance), most researchers have some familiarity with other laboratories doing research similar to their own. A simple phone call or email might provide valuable information regarding how (or whether) to proceed. (While there is certainly active competition between laboratories aimed at similar problems, most often the spirit of collaboration toward a common goal prevails: most scientists are honest and forthcoming, are hoping to make a real difference in human health, and, simply, like working with others with similar interests.)
Summary: As it regards Justification, small steps at the lab and institutional levels can make an impact. A checklist requiring full, detailed answers to each of the questions above is a start. Are there available alternatives to animal use? List the alternatives and explain why they would be insufficient. Is the disease process being studied important enough to justify the use of animals? Explain why. Is there a good match between disease intensity and interventional intensity? Discuss. Is the hypothesis being tested reasonable/likely enough to justify the use of animals? Provide a well-documented story suggesting that the research is at a point where animal studies are the next logical step. Has the question already been answered? Provide a thorough systematic review of published literature, abstracts from scientific meetings, and personal communications to support moving forward. The committee at the institutional level will need members with sufficient expertise to assess the responses provided. Those with this expertise can be considered part of a “scientific core” provided by the institution to improve animal research; the payoff for the institution is higher quality research, less waste, lower costs, greater likely impact of the research produced (more likely to translate), enhanced reputation, and greater ability to recruit quality researchers and obtain grants. The transition will involve upfront costs for institutions (the experts needed), but the benefits are likely to far exceed these costs. Changes at higher levels (funding agencies and government) should also be implemented to ensure justification. Sections in grant proposals (checklists) similar to those at the institutional level should be mandatory, and unjustified animal studies should not be supported. The broader creation of animal research registries similar to those used for human clinical trials would be of great benefit: research plans, materials, methods, etc. would all be submitted transparently at the start of the research, and data and outcomes, be they positive or negative, would subsequently be uploaded transparently.
This would allow the research community as a whole to better understand which interventions might have the best chance of translating to humans and to focus its attention there, advancing healthcare for humans while having the secondary benefit of limiting unnecessary animal use.
2 Scientific Methodology
Currently, great care is taken to ensure that clinical research investigating the efficacy and safety of interventions used (or to be used) in humans follows careful scientific methodology: specifically, the prospective randomized controlled trial, or a cohort trial with tightly controlled matching. Both aim to minimize potential biases and improve likely validity through strict attention to the research question, choice of outcome measures, matching, randomization, blinding, power, statistical analysis, clinical significance, and interventional consistency, as outlined in the Critique section.
2 Constructive
It is also clear that animal studies aimed at translation currently rarely follow this methodology. This is despite the fact that randomized trials in animals are substantially easier to perform than human clinical trials: enrollment and consent are unnecessary, and lifestyle, upbringing, and genetic variables can be limited (if the study is done properly). Why, then, does poor methodology persist in animal research while it has improved greatly in human research? A brief history lesson will be helpful, showing how and why human clinical research evolved to this point and where animal research currently stands on this evolutionary road.
The Evolution of Human Clinical Research
The modern history of human clinical research begins with Ernest Amory Codman, a surgeon at Massachusetts General Hospital in Boston. In 1934 he authored a book entitled The Shoulder. The book is considered a “classic” of health services research and evidence-based medicine because of its unusual autobiographical Preface and Epilogue, some 70 pages long, which set out one very important concept. Codman had originally conceived this concept, The End-Result Idea, in the early 1900s. The End-Result Idea “was merely the common sense notion that every hospital should follow every patient it treats long enough to determine whether the treatment was successful and to inquire ‘if not, why not’ with a view to preventing similar failures in the future.” While this concept appears quite reasonable to us a hundred years on, it was extremely controversial at the time: many (most?) doctors did not want the outcomes of their treatment carefully scrutinized, and neither did their hospitals. Simply put, if outcomes were clearly poor, doctors and hospitals would lose business, so why risk it? Even then, healthcare was big business. Undeterred by resistance from colleagues and hospitals, Codman used his position as chairman of the local Boston medical society to push for reform in an unusual way.
He organized a panel on hospital efficiency that included hospital administrators, a few long-time colleagues still offering support, and Mayor Curley of Boston. He assigned himself as the final speaker, leading a “General Discussion.” The meeting was aggressively promoted and attendance was excellent: standing room only, with civic, hospital, and medical leaders shoulder to shoulder in the hall. When his time came to speak, he unfurled a large cartoon/poster prepared with the help of a local artist. The cartoon (published later in The Shoulder) depicted the wealthy citizens of Boston as an ostrich, head in the sand, blindly and ignorantly laying golden eggs that were collected by the physicians and the hospital as payment for the patients’ health care. President Lowell and the Board of Trustees of MGH are seen in the background pondering whether, if the ostrich knew the real End-Results of care, she would still lay golden eggs. The cartoon managed to offend nearly everyone: patients (depicted as ignorant), and doctors and administrators (depicted as greedy). As a result, Codman was considered “ruthless and radical, lacking respect for tradition and the medical profession.” He lost his position in the medical society, his instructor position at Harvard, and many friends. Codman responded by opening a private hospital in his own home, founded on and named for “End-Results.” Over a period of several years, on the strength of the quality of his care, he found his way partly back onto the staff at MGH, and he introduced several practical spin-offs of the End-Result Idea there. These included Morbidity and Mortality conferences (where physicians openly discuss their failures with colleagues), a national Bone Sarcoma Registry (where cases and outcomes of cancer were recorded as a resource for other physicians), early hospital Quality Assurance (data collected to ensure that hospitals are doing what is necessary to maintain patient safety), and basic monitoring of patient outcomes. Over many years, these initiatives would be incorporated at many hospitals throughout the nation, and they persist to this day. While Codman was causing turmoil over End-Results (outcomes) in Boston, J. Alison Glover, across the Atlantic in England, was making an observation that would eventually have an equal impact. In a paper published in 1938, Glover examined trends in the incidence of tonsillectomy performed among children in the UK from 1895 to 1937. Although he noted no change in the incidence of tonsillitis over this 40-year span, the incidence of tonsillectomy (the surgery used to remove the tonsils) had increased dramatically over that period, especially among the children of the wealthier classes living in upper-class areas. These findings led Glover to conclude that a lack of consensus among practitioners regarding the efficacy of surgery (outcomes) was the only logical explanation for such variation, and that “…tonsillectomy is performed as a routine prophylactic ritual for no reason and with no particular result.” This observation, together with a stream of analogous studies in other specialties that followed shortly thereafter (Lembcke’s work on appendectomies and pelvic surgery, Doyle’s assessment of hysterectomies, Sheps’ analysis of hospital quality, etc.), was the seed from which the dedicated study of Variation in Healthcare would grow.
“Variation” simply refers to similar patients, with similar disease processes, being treated in quite different ways. And these “different ways” tend to cluster in specific geographic regions. That is, children in a particular county in England were far more likely to be treated by tonsillectomy than children in the next county over. Glover’s observation, like Codman’s Idea, caused great concern. While Codman was appalled that physicians and hospitals would ignore the results of the treatments they rendered in order to maintain profits, Glover was equally appalled that the treatments rendered depended more upon a patient’s socioeconomic status and the treating physician’s preferences than on the patient’s underlying disease and the results of such treatments. How you were treated rested primarily upon who you were, where you lived, and whom you saw, and only secondarily on what you had and whether treatment was indicated. Less than 6 months after the publication of Glover’s findings, on September 1st, 1939, at 4:45 AM, the German Air Force (Luftwaffe) launched air attacks against Krakow, Lodz, and Warsaw in Poland. Within hours, Britain and France demanded a German withdrawal. Britain’s final ultimatum expired at 11 AM on September 3rd, and 15 min later British Prime Minister Neville Chamberlain announced on BBC radio that “consequently this nation is at war with Germany.” That evening, the British liner SS Athenia, en route from Glasgow to Montreal, was torpedoed by the German submarine U-30. Australia, India, and New Zealand declared war, and World
War II, following years of tension, erupted. Codman headed to Halifax to support the effort. Glover and the Royal Society turned all their efforts to the cause. Medicine and surgery in the Western world would be dedicated over the next 10 years to the active and continued care of soldiers and civilians injured, directly or indirectly, by the war. Issues of the outcomes of everyday care and of geographic variation in that care, rightfully, fell out of focus. The period following recovery from the war and extending into the mid-1960s was one of rebuilding and rapid expansion in medicine. Developments and breakthroughs burgeoned in every medical subspecialty. Novel pharmaceuticals to treat infections, diabetes, heart disease, and depression arrived on the scene. Innovative surgical interventions allowed for the replacement of arthritic joints and the bypass of clogged coronary arteries. Amid a generalized post-war enthusiasm, medicine was envisioned as having a solid scientific and technological foundation, and science and technology were emerging as pragmatic “kings” throughout society. Concerns about end-results and variation in care remained on a back burner. Medicine worked. And it worked not only for patients but also for institutions, which expanded their medical schools and built impressive research centers on NIH (government) and philanthropic dollars; for private investors, who found healthcare to be a vibrant sector for growth (for-profit hospital ownership, investment in companies whose foundation was healthcare, etc.); and for insurers, whose businesses were booming. These insurers were joined on a large scale by the government through Medicare and Medicaid, which took effect in 1966. Suddenly, 85% of the American population had insurance for care…and they were going to be served. By the early 1970s, however, the generalized societal enthusiasm of the 1950s and most of the 1960s had begun to wane.
Suspicion and activism, born of the civil rights and anti-war movements, grew and enveloped all institutional and professional arenas. Medicine was not immune. The “End-Result Idea” and “Regional Variation in Healthcare” were resurrected. Physicians and hospitals, quite simply, had failed for generations to accurately monitor the specific outcomes of the treatments they provided. While health status in general had improved over the previous 50 years (primarily as a result of public health efforts), it remained undocumented/unclear whether many of the most commonly used treatments, medical or surgical, actually worked as well as supposed. Regional variation persisted as well. In 1973, Wennberg and Gittelsohn found substantial variation in care between neighboring hospital service areas in Vermont, reigniting the fire Glover had lit some 35 years earlier, which had long since cooled. Multiple publications, both scientific and editorial, followed, highlighting poorly documented outcomes and persistent variation. This new critical attitude toward medicine, combined with uncertainty regarding the outcomes of many accepted therapies and with skyrocketing costs, led health services researchers to focus keenly upon the outcomes of medical interventions and on geographic variation. The reasoning was simple: if providers and hospitals are paid only for interventions that work, and care across the country can be standardized, we will not end up paying for ineffective treatments and overutilization. A remarkable amount of money would likely be saved, and patients would not be exposed to risk without a known likelihood of reward.
And what treatments work? Until the late 1980s, this question was answered through a physician’s personal experience treating the specific disease encountered, coupled with his or her education (the experiences of others, handed down). The physician would look back at cases previously treated successfully, consult the available literature to compare these with others’ experiences, check the textbooks to see what the acknowledged experts recommended, and think back to the methods their mentors used. Based on these considerations, the physician would do what was felt best for the patient. Needless to say, each physician’s experience and education is somewhat unique, and what one takes from the available literature and textbooks is colored, a priori and circularly, by that same experience and education. Thus, “what’s felt best for the patient” varied considerably between physicians. Comparative outcomes remained unclear, and variation persisted. What was needed was a neutral, objective method to determine which treatments are best, and a subsequent method to ensure that these best treatments are employed: methods that limit the potential biases that might inappropriately influence medical decisions regarding treatment. Health services researchers such as Gordon Guyatt of McMaster University first formally articulated the benefits of such a method in the 1980s, and Sackett later fleshed them out at the same institution. Given the name “Evidence-Based Medicine,” the proposed method aimed to shift the foundations of physician decision-making (as it pertains to treatments undertaken) away from “experience and education” and toward much firmer evidential ground. With the primary goal of limiting potential bias, a hierarchy of published evidential quality was established. That hierarchy is depicted below:
As can be seen, case series reports (a physician’s unique experience in treating a particular disease in a particular manner; his/her/their peers’ “experience”) and expert opinion are at the bottom of the pyramid, as the evidence most likely to be biased. At the top are randomized clinical trials and systematic reviews of randomized trials, which are felt to limit potential biases and to provide the highest-quality evidence regarding the efficacy of therapeutic interventions.
The impact of this hierarchy on medicine and on how it is practiced today cannot be overstated. Many physicians believe that randomized controlled trials (RCTs) and their systematic reviews provide the only valid evidence upon which to base treatment decisions, and these serve as the primary foundation for the development of treatment guidelines. Physicians are expected to follow the guidelines and to practice evidence-based medicine. Heeding the results of RCTs, it is thought, will maximize outcomes and minimize the use of less satisfactory treatments, thereby likely decreasing costs (do not pay for what lacks evidence of efficacy) and decreasing variation in care. Other evidence (cohorts using non-randomized and non-blinded controls, or case series reports) is considered inferior, to be used only if a lack of sufficient RCTs requires that we settle for less. That randomized clinical trials have become the gold standard for documenting the efficacy of therapeutic interventions is undeniable. Insurers/payers use guidelines based primarily on the results of RCTs to determine what does and does not get paid for and, thus, what is likely to be done for patients. So, then, what accounted for this evolution from poor to solid methodology in human research on therapeutic interventions, and where is animal research on this evolutionary pathway? First was the recognition of “the problem” by experts within the field. Voices from experts are key (to those outside a field, its details are often opaque, and outsiders engender less trust and offer opinions that are often poorly supported by evidence), and for human care, Codman and Glover stand out. For basic science research on therapeutic interventions using animals, John Ioannidis (an epidemiologist at Stanford) stands out, given his work, beginning in the early 2000s, on the irreproducibility of preclinical studies due to poor methodology.
Second, others within the biomedical sciences listened to the original leaders, further explored the issues, and published/publicized their findings. In the human health sciences, this step took quite a while because of World War II; but for studies using animals, multiple biomedical scientists from a variety of fields exposed the “problem” within their own fields of expertise within a relatively short time after Ioannidis’ papers, such that the “problem” is now generally recognized by most academic biomedical scientists. The third step toward change in human healthcare involved “noise.” The problem was recognized and publicized; health services researchers developed methods (the emergence of RCTs) to improve the situation; widespread discussion within the medical community followed (entire conferences were dedicated to the concept of evidence-based medicine); health services researchers, ethicists, and others began to work collaboratively with physicians to address concerns; and mainstream media (newspapers, television news, etc.) began to run stories. As regards biomedical animal research, however, things are different. The problem is becoming recognized in small pockets but is not often publicized. Discussion among animal researchers is generally hushed rather than open, so as not to invite scrutiny of one’s own lab (much like Codman’s colleagues, perhaps). Meta-researchers exploring animal studies are few and far between compared with the health services researchers involved in evidence-based medicine in humans. Media coverage is generally limited to
animal rights websites, which rightly address the ethical concerns but with little scientific/methodological consideration, and which are read/viewed only by those with similar interests (“preaching to the choir”). The fourth step is about money and is, perhaps, the most important. While nearly every field of human medicine now has a relatively well-developed evidence base (RCTs and other quality studies that suggest the best available treatments), it is still quite often the case that these evidence-based treatments are not used and other, less efficacious, treatments are. Old habits of relying on experience and expert opinion, resistance to change even in the face of solid research, lack of self-scrutiny of one’s relative outcomes of care, and the profit motive offer formidable resistance to the widespread implementation of evidence-based care. The solution has been to try to limit payment to the treatments showing the best evidence and outcomes. This shift by payers has had the most positive impact on efforts to improve human healthcare, but, of note, it still has a long way to go and is only one factor in the effort to improve the situation. As regards animals, just as physicians will tend to follow the evidence if the money guides them in that direction, scientists will likely follow solid methodology if there are clear financial (and other) incentives to do so and disincentives for not doing so. Currently, grant funders have not adequately led the charge for change. So, on the pathway to change in methodology, research into human therapeutic interventions is “there,” and the implementation of that research in the care provided is progressing, albeit slowly: the early fourth step. Methodology in animal research aimed at translation, by contrast, appears to be in the early third stage (at best). Some noise. And this book is part of that.
The obvious solution to the widespread failure of translational animal research to use solid methodology is to not approve and/or pay for translational animal research unless the methodology is solid. At the laboratory level, the key is education. The researchers involved (students, post-docs, principal investigators) should be educated/trained not only in their specific basic science and the specific techniques that will be used in their research but also in methodology; specifically, how to conduct a proper randomized controlled trial in animals. The “teachers” should be experts from the health services research department. These are the scientists who are experts in the RCTs used in human clinical research, and they are present at almost every academic institution. Cross-talk between animal researchers and human health services researchers, however, is uncommon; but change should not be difficult. These same health services researchers should be involved not only in the education of animal researchers but also as members of the committees overseeing the approval of research to be undertaken at the institutions. Concurrently, grant review committees should have similar experts actively involved on the funding side. While the expertise needed for the issues of justification noted in the prior section may be hard to find and develop, the expertise needed for methodology evaluation is readily available down the hallway or across the street. Institutions and funders can use human health services researchers to quite easily improve the translational
animal research situation. The approach would, again, be a simple checklist confirming that all the key components (the question, outcome measures, matching, randomization, blinding, power, statistical analysis, clinical significance, and consistency of interventions) are addressed adequately prior to approval or funding.
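Two items on such a checklist, power and randomization with blinding, have simple computational cores that a committee could ask to see. The Python sketch below is illustrative only and is not a method prescribed in this book. It assumes a two-group comparison of a continuous outcome with a common standard deviation and uses the standard normal-approximation formula for sample size per group, $n = 2\,(z_{1-\alpha/2} + z_{1-\beta})^2 / d^2$, where $d$ is the standardized effect size. The function names `sample_size_per_group` and `blinded_allocation` are hypothetical, and the allocation scheme is plain balanced randomization.

```python
import math
import random
from statistics import NormalDist


def sample_size_per_group(effect_size: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate animals needed per group (two-sided, two-sample comparison of means).

    Normal approximation: n = 2 * (z_{1-alpha/2} + z_{power})**2 / d**2,
    where d = (difference in means) / (common standard deviation).
    """
    if effect_size <= 0:
        raise ValueError("effect_size must be positive")
    z = NormalDist()  # standard normal distribution (mean 0, SD 1)
    z_alpha = z.inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_power = z.inv_cdf(power)          # about 0.84 for 80% power
    return math.ceil(2 * (z_alpha + z_power) ** 2 / effect_size ** 2)


def blinded_allocation(animal_ids, arms=("treatment", "control"), seed=None):
    """Randomly assign animals to equal-sized arms, hiding each arm behind a code.

    Returns (coded, key):
      coded -- animal ID -> opaque code, given to those treating and assessing animals
      key   -- code -> arm, held by someone else until outcomes are recorded
    """
    ids = list(animal_ids)
    if len(ids) % len(arms) != 0:
        raise ValueError("number of animals must be a multiple of the number of arms")
    rng = random.Random(seed)
    # Balanced list of arm labels, shuffled: each arm gets the same number of animals.
    labels = [arm for arm in arms for _ in range(len(ids) // len(arms))]
    rng.shuffle(labels)
    # Codes are shuffled too, so a code reveals nothing about cage order or arm.
    codes = [f"A{i:03d}" for i in range(1, len(ids) + 1)]
    rng.shuffle(codes)
    coded = dict(zip(ids, codes))
    key = dict(zip(codes, labels))
    return coded, key


if __name__ == "__main__":
    n = sample_size_per_group(effect_size=1.0)
    print(n)  # 16
    animals = [f"rat-{i:02d}" for i in range(1, 2 * n + 1)]
    coded, key = blinded_allocation(animals, seed=2023)
    print(coded["rat-01"])  # an opaque code; the arm is recorded only in `key`
```

Separating `coded` from `key` is the point of the design: the people handling animals and scoring outcomes see only codes. The normal approximation slightly understates the numbers an exact t-test calculation would give, and real protocols typically also allow for attrition, so a statistician’s review would still be needed.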
3 Performance
In the Critique section, failures of performance were divided into two broad groups: failures of the ingredients used in research and failures of the humans doing (and assisting with) the research. As regards ingredients, the story of antibodies is representative of the problems seen with nearly all ingredients used in translational animal research. Accordingly, the focus here will be on antibodies and on ways to address their problems, but similar problems and solutions can be extrapolated to other ingredients (cells, products, etc.). Commercial antibodies purchased and used by researchers are often assumed to accurately detect the specific proteins/molecules that serve as markers, but quite often they do not. Several factors account for this. First, there may be considerable variability between companies in antibodies aimed at the same target. Attention to detail in production, production methods, storage techniques, methods of distribution, and other factors (once the antibodies are in the laboratory) can affect quality, and these factors are often opaque to researchers who simply read a label and use the antibodies to test their samples. Second, there may be batch-to-batch variability within a single company’s production, such that the quality and accuracy of antibodies aimed at the same target vary. Simply put, purchased antibodies are assumed to be sensitive (able to detect a particular protein/molecule when it is present in a sample) and specific (able to detect only the targeted protein/molecule and not mistakenly detect another). If they are not sensitive and specific, they cannot provide useful information. And it is estimated that 50% of commercially available antibodies lack sensitivity and/or specificity straight out of the package. Add to this the fact that researchers often use antibodies in ways that have not been validated (a different tissue, method of application to the tissues, etc.)
by the companies producing them, and the likelihood of inaccurate/useless information increases greatly. Glenn Begley, one of the authors of a study showing that 47 of 53 “landmark” cancer research papers could not be reproduced, felt that antibody failures were the most likely source of the failures. Again, the persistence of the problem is inextricably tied to a lack of incentives and disincentives. Currently, there are over 300 companies making antibodies for researchers, and the market is worth about $2 billion. While medications sold in the US are closely assessed, monitored, and regulated by the FDA, no such oversight exists for the ingredients used in animal research aimed at translation. Companies producing the antibodies measure their outcomes in profits, their primary incentive. Researchers judge the companies on speed of delivery and often assume the
ingredient does what it is purported to do. This is not to suggest that the companies or the labs do not care about quality, but ensuring the highest quality requires significant time and money from both parties, and time and money matter greatly to both and are hard to come by. The proof is “in the pudding”: more than 50% of products lack sufficient quality, suggesting that both the companies and the laboratories skip validation steps at points along the line. And there is no fair argument to be made for maintaining the current situation and simply allowing for market corrections. In the past, when various automobiles were shown to readily catch fire after collisions, corrections could be made easily: the source of the problem was clearly identified and corrected by companies seeking to avoid massive class-action lawsuits and the loss of consumers who would no longer buy the dangerous vehicle, and the government intervened, through mandatory crash testing, to ensure that changes were made. With antibodies, there is far greater opacity. It may be unclear to companies at which step in production, packaging, delivery, etc. the antibodies lose their sensitivity and specificity; a lot can go wrong, and the materials are fragile/finicky. If the likelihood that an antibody is accurate is only 50%, then when labs using it get a negative result they cannot determine whether it is because their proposed intervention does not work or because the antibody failed (in which case the intervention might work). And the problem exists not only within companies but across companies, so users cannot rationally choose another source. Thus, “crash testing” becomes the key to addressing the problem. “Crash testing” in the case of antibodies is the independent validation of their quality, sensitivity, and specificity.
The government could serve a role here, providing strict standards for the production and testing of antibodies at the company level using commonly employed “trust, but verify” techniques, wherein the companies self-test and report, and agencies then “visit” to ensure that production is done properly and testing is accurate, with disincentives for failure at either point. Currently, this is not carried out to any great extent, likely because the connection to human public safety is indirect. A second option is testing by independent entities that could certify quality, sensitivity, and specificity via the best available validation: a “stamp of approval” giving individual laboratories some guidance on whom to buy from. This would need to be antibody-specific, as subtleties in production may make a company stronger or weaker at producing particular antibodies. A third option is for laboratories to use registries/online portals that provide systematic reviews of validation studies of various antibodies from various providers. A couple of these are currently available (Antibodypedia, Antibodies-Online). Much of the validation information comes from the companies themselves, raising concerns over the reliability of the reported data. And the validation data from independent labs found on these sites continue to suggest that fewer than 50% of tested antibodies pass. While available and valuable on a relative scale, these resources are only occasionally used by the labs using the antibodies.
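To make the antibody “crash test” and the 50% figure discussed above more concrete, the following Python sketch computes sensitivity and specificity from a validation run on known positive-control and negative-control samples, then uses a deliberately simplified Bayesian model to show why a negative result from an unvalidated antibody is hard to interpret. Everything here is an assumption for illustration: the `ControlPanel` class, the 95% pass thresholds, the example counts, and the toy model in which a valid antibody always reports the truth while an invalid one never produces a signal. None of it is a published validation standard.

```python
from dataclasses import dataclass


@dataclass
class ControlPanel:
    """Counts from a hypothetical validation run on samples of known status."""
    true_pos: int   # positive-control samples in which the antibody gave a signal
    false_neg: int  # positive-control samples with no signal
    true_neg: int   # negative-control samples with no signal
    false_pos: int  # negative-control samples that nonetheless gave a signal

    @property
    def sensitivity(self) -> float:
        # Share of samples containing the target in which the target was detected.
        return self.true_pos / (self.true_pos + self.false_neg)

    @property
    def specificity(self) -> float:
        # Share of samples lacking the target that correctly showed no signal.
        return self.true_neg / (self.true_neg + self.false_pos)

    def passes(self, min_sensitivity: float = 0.95, min_specificity: float = 0.95) -> bool:
        # Illustrative thresholds only; they are not an established standard.
        return self.sensitivity >= min_sensitivity and self.specificity >= min_specificity


def p_works_given_negative(p_antibody_valid: float, prior_works: float) -> float:
    """P(intervention works | experiment reads negative) under a toy model.

    Assumed model: a valid antibody reports the truth perfectly; an invalid
    antibody never gives a signal, so any experiment using it reads negative.
    """
    v, p = p_antibody_valid, prior_works
    # A negative arises if (antibody valid and intervention fails) or (antibody invalid).
    p_negative = v * (1 - p) + (1 - v)
    # Among those negatives, the intervention actually works only when the
    # antibody was invalid (and the intervention worked).
    return (1 - v) * p / p_negative


if __name__ == "__main__":
    panel = ControlPanel(true_pos=18, false_neg=2, true_neg=19, false_pos=1)
    print(panel.sensitivity, panel.specificity, panel.passes())  # 0.9 0.95 False
    print(round(p_works_given_negative(0.5, 0.5), 3))   # 0.333
    print(round(p_works_given_negative(0.95, 0.5), 3))  # 0.048
```

Under these assumptions, starting from even odds that the intervention works, a negative experiment with a coin-flip antibody lowers that probability only to about 33%, whereas with an antibody that is valid 95% of the time it falls below 5%. That gap is one way to see what independent validation buys.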
That so many companies currently produce antibodies offers some benefit. Strict validation and documentation of quality, done either in-house or externally at independent labs (but at company cost), can serve as a marketing tool in a competitive field. A couple of companies currently do this, but whether buyers (the laboratories) pay attention is unclear. It is too early to determine whether it has made an impact (which would be reflected in significant relative increases in sales). A fourth, “radical” suggestion was published in Nature in 2015. The authors and 100 co-signatories from prestigious laboratories suggested defining antibodies down to the level of the DNA sequence that produces them and then producing the antibodies from human genetically engineered recombinant cells. But specific sequencing and the avoidance of animals in production do not guarantee that the antibody produced will work. Validation would still be required. And costs would be 100 times those of current production methods, limiting profits for producers and generally unaffordable at the lab level on grant funding. Coupled with the impact on existing companies in the business and their thousands of employees, this “ideal” appears impractical and unlikely to take hold. The final option rests with the laboratories and institutions at the local level: whether antibodies are produced in-house or purchased externally, validate them in-house. “How-to” algorithms for this purpose have been produced that test the antibodies in cells known to contain and known not to contain the target protein/molecule (positive and negative controls).
If the antibody accurately detects true positives (“lights up” on cells known to have the target) and true negatives (does not “light up” on cells known not to have the target), then further testing can be undertaken to confirm sensitivity (it “lights up” to a degree that mirrors the amount of antibody used and the amount of protein/molecule present) and reproducibility (using different sets of cells and antibodies). Failure at any point suggests that the antibody will not provide valid/useful information. Only antibodies that provide valid information in testing should be used in animal research, both for the sake of possible translation to humans and for the sake of the animals used for testing. Few laboratories have incorporated these algorithms, but some major journals are moving toward requiring documentation of some form of validation as part of the submission process. Failures of human performance are very hard to address. As was noted in the Critique section, science done properly is difficult, technically demanding, and exacting human work. It requires special skills to carry out the multiple scientific tasks that go into a research program, and these skills require intensive training. By far the bulk of the tasks on any particular project is carried out by researchers at the post-doctoral or student level. But post-doctoral researchers and students are most often temporary workers with a life span in the lab of 1–3 years, while the tasks they must perform can take 6 months to learn at a fundamental level and up to 5 years to master. This mismatch creates a high risk of human performance failure. What accounts for the mismatch, and how might it be rectified? The answer overlaps with the discussion of systematic failures that will follow later. But
a short response is worthwhile here. The turnover of post-docs and students is a function of two factors: professional advancement and soft money. Post-docs and students are meant to “move on”: students advance to post-docs, and post-docs advance to develop their own ideas in their own laboratories. “Soft money” refers to how they are paid and supported while working in the lab. Rather than being paid directly by the institution/university, they are paid from money set aside in grants specifically for their support. They are temporary workers, then, because they either graduate to the next level or lose their financial backing when the grant funding expires. The solution has simple and complex components. The simple portion is greater investment by institutions in both technicians and “cores.” Technicians with expertise in specific tasks can be hired into a lab as long-term employees. They can train the post-docs and students coming through while ensuring that the tasks are performed properly. Where multiple labs perform similar tasks, the technicians can be made available through a central “core” that provides services to various labs as needed. The catch, of course, is that these changes require institutions/universities to invest financially in manpower they have become accustomed to not paying for, relying instead on temporary workers funded with someone else’s (the grant funders’) money. They have allowed grants to pay for these necessary workers, have assumed that the money to support them will keep coming, and have shifted money that might have been used for this purpose elsewhere: most commonly, to investment in more research space (buildings) to be filled with more researchers, bringing more indirect money to the institution, etc.
Quite a cycle, and one that leaves day-to-day tasks within laboratories at risk for failures of human performance, and animals at risk of being used in experiments where poor technique may render the findings invalid. This subject is discussed further in the Systematic Failures section.
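The antibody validation sequence described earlier in this section (true positives and true negatives first, then sensitivity, then reproducibility, with failure at any point ruling the antibody out) can be sketched as a short series of ordered gates. This is a minimal Python illustration under stated assumptions: the `StainResult` record, the 1.0 signal threshold, the use of a strictly increasing dose-response as the sensitivity test, and the 20% allowed spread across batches are hypothetical choices made for illustration, not published validation criteria.

```python
from dataclasses import dataclass
from statistics import mean

# NOTE: both thresholds are hypothetical illustration values,
# not published antibody-validation criteria.
SIGNAL_THRESHOLD = 1.0   # staining intensity counted as "lights up"
MAX_BATCH_SPREAD = 0.2   # allowed (max - min) / mean across batches


@dataclass
class StainResult:
    cell_line: str
    has_target: bool  # ground truth: is the target known to be present?
    signal: float     # measured staining intensity (arbitrary units)


def specificity_ok(results: list[StainResult]) -> bool:
    """Gate 1: lights up on known positives and stays dark on known negatives."""
    return all((r.signal >= SIGNAL_THRESHOLD) == r.has_target for r in results)


def sensitivity_ok(amounts: list[float], signals: list[float]) -> bool:
    """Gate 2: signal rises with the amount of antibody/target present."""
    pairs = sorted(zip(amounts, signals))
    return all(later[1] > earlier[1] for earlier, later in zip(pairs, pairs[1:]))


def reproducibility_ok(batch_means: list[float]) -> bool:
    """Gate 3: similar results across different sets of cells and antibody lots."""
    centre = mean(batch_means)
    return centre > 0 and (max(batch_means) - min(batch_means)) / centre <= MAX_BATCH_SPREAD


def validate_antibody(results, amounts, signals, batch_means):
    """Run the gates in order; return (passed, name_of_first_failed_gate)."""
    gates = [
        ("specificity", lambda: specificity_ok(results)),
        ("sensitivity", lambda: sensitivity_ok(amounts, signals)),
        ("reproducibility", lambda: reproducibility_ok(batch_means)),
    ]
    for name, gate in gates:
        if not gate():
            return False, name  # failure at any point rules the antibody out
    return True, None
```

For example, an antibody that stains a known-positive line strongly and a known-negative line faintly, whose signal rises with increasing amounts, and whose batch means are close together passes all three gates; one that also lights up on the known-negative line stops at the first gate. Running the gates in order mirrors the text: there is no point measuring sensitivity for an antibody that cannot tell positives from negatives.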
4 Translatability
Translation, from the conception of a new therapeutic intervention to its widespread use in humans, currently most often includes testing in animal models. In the Critique section, it became clear that there are many problems with using animals for this purpose. Animals are often poor models of human diseases: the most important human diseases are complex, multifactorial, and chronic in their pathogenesis and pathophysiology, whereas the methods used to alter animals in an attempt to mimic these disease states are rather simplistic. Animals are also often poor models of response to therapeutic interventions, given significant differences in physiology compared with humans. Contextual differences are also profound. A rat in a cage bears little resemblance to a human construction worker, especially in light of the importance of biopsychosocial factors and their impact on human response to medical interventions. It is even clear that rats in a cage bear little resemblance to rats in the wild, given their differing biopsychosocial settings. Given this knowledge, it should come as no surprise that a successful intervention in mice only occasionally makes its way successfully to humans. That said, there are certainly cases where this occurs: for instance, a specific drug alters a specific chemical pathway in a manner that has a positive impact on health, and that pathway is similar enough between animals and humans for the effect to carry through translation. Many drugs (and other interventions) work in humans and got there via information provided through animal models.
Let us assume, then, that a researcher has adequately justified the use of animals: they are a reasonable model of the disease and of response to interventions, no reasonable alternatives exist, and the background and theory are solid. Let us also assume that the researcher has established a framework in which the methodology will be solid, affording the production of valid information; that all “ducks are in a row” regarding the performance of the ingredients and materials to be used; and that the researchers undertaking the tasks are skilled at them. Even when all these conditions are met, the researcher should be able to demonstrate, prior to starting any animal research, that the intervention has a chance to reach patients. Simply put, translation requires more than “everything going well” prior to human studies. That “more” includes many factors both within and outside the bounds of science itself. Examples:
(1) If a new drug for use in heart attacks is proposed and developed: Is it chemically stable enough to be packaged, stored, and used when and where it is needed in humans? Or is it so unstable that production and use would need to coincide? Is that remotely possible?
(2) If a scaffold has been produced to allow regrowth of liver tissue after damage, is it similarly stable, with an adequate shelf life? Is it “scalable”: can it be produced efficiently in sufficient quantity for wide distribution? Will it maintain its functional integrity given the differences between production in a small lab and large-scale production techniques?
(3) If a novel nanoparticle is to be used for drug delivery in lung cancer, would it have a chance to achieve FDA approval? Is it remotely scalable? If so, would the costs be reasonable?
These questions regarding translation are rarely addressed prior to embarking on animal research. Entire “translational medicine” programs at multiple research institutions/universities routinely produce novel drugs, products, and devices with no hope of reaching humans in any reasonable capacity, and then test them in animals. These questions should be addressed a priori, as further items on the pre-experimental checklist.
5 Systematic Failures
The most difficult problems resulting in failure are systematic, and they stem from the incentives that currently drive biomedical research. Nothing runs without fuel, and the fuel of biomedical animal research at the level of individual labs is grant funding. Doing animal research is expensive, and funding is required. Currently, the vast majority of funding comes primarily from the federal government (NIH, etc.) and secondarily from a combination of institutional (university) and philanthropic sources. Other sources of funding, such as industry, make up a smaller portion, are usually targeted, and often seek out safe programs aimed at incremental change or capturing market share. How do scientists aiming to create novel translational solutions to improve health (or to improve the solutions currently available) get funding from these sources to support their work?
Grant funding in the United States, with only occasional exceptions, is founded upon the “peer review” system. In peer review, proposals for research using animals are submitted for potential funding to NIH, other federal agencies, philanthropic organizations, and universities. The proposals are then evaluated by “peers” for quality, and the “best” of them receive funding, the fuel of research. The scientists’ proposals are akin to a sales pitch, and they are hoping the funders will buy. And which sales pitches currently get funded most often? The funders (NIH, philanthropy, universities) like “winners.” They want to know, as well as possible, which proposals are likely to succeed (as measured by positive results of the experiments or research program undertaken) and then preferentially support those. This is similar to investors aiming to put their money in the right stock, or a gambler betting on the right horse. There is risk involved for the funders. They don’t want to waste precious research money that they’ve been charged with distributing on programs that are unlikely to succeed. So what are safe bets for funders? How do they identify the proposals most likely to succeed? Researchers and programs with a proven track record of success are safer bets.
Thus, the funding system is biased toward “winners” with a proven track record and preliminary data predicting likely success of their proposed research program. The track record is established by preliminary data in the form of publications in the peer-reviewed literature. Prior publication is key: it is the ticket that most often affords important future funding, and publications in highly ranked journals are even better. At the same time, journals want to publish powerful success stories (positive results), especially first-of-their-kind success stories, so that the journals’ prestige, “impact factor,” and sales increase. The pressure on researchers to obtain external (NIH, philanthropic) funding is significant. They need to publish “winners” to be eligible for significant funding. They need to do their research, aimed at disease states of interest to the funders, show that their interventions work, and get the news out. And they need to do it quickly.
The pressure to succeed quickly is a circular result of the funding system itself. Americans are greatly concerned about health and healthcare, and their voices (those of individuals and of lobbies such as cancer societies and medical schools) are heard loudly and clearly by politicians and policy makers. As a result of a surge of concern about health care in America in the early 2000s, the NIH budget doubled, and the restrictions on the use of such funding by institutions and medical schools were loosened, allowing NIH money to be used for scientists’ salaries, purchase of scientific equipment, and a rapid expansion of research buildings, to encourage growth and innovation. (The new buildings were needed to house more labs and more researchers to bring in even more NIH money, and to keep pace with the other institutions and medical schools that were going all in. They needed to keep up or be left behind.) This cycle was coupled with a decreased commitment by institutions and universities to their researchers and labs: why pay for these if the government funders will? The institutions have beautiful buildings, but the researchers and individual labs now have to pay for themselves, both for salaries, via grant funding, and for their space in the research building. This rent is divvied up using a formula relating square footage to dollars of federal grant support. This way of doing things, of course, assumed that NIH funding would keep increasing at the pace of those “golden years,” when the budget doubled in about five years, but this did not occur. Funding from the NIH and other federal agencies plateaued and has since seen small incremental decreases or increases year to year, depending on the overall economy.
So, the current system most commonly works like this: For those with promising or established research programs (winners and likely winners), the institution/university will provide start-up funds for 3 years. Most biomedical research labs engaged in animal research have seven or eight people: a director (Principal Investigator, PI); two or three post-doctoral fellows (PhDs starting out, who will work there for a while and then, hopefully, head out as PIs with their own labs); three graduate students working toward their PhD; and an administrator who oversees the nuts and bolts of the lab. Most labs in the field doing animal research cost around $600,000 per year for people, equipment, animals, research “ingredients,” etc.
During the 3-year start-up, in addition to running the research, the PI will be working on grant proposals. After 3 years, the lab is meant to be self-supporting, with grant funding (preferably government funding, which brings with it indirect funds meant to support the institution’s continued research endeavors) covering the great majority of costs: salaries, equipment, “ingredients,” animals, etc. Only occasionally does the institution/university continue to provide sufficient support beyond 3 years if adequate grants have not been obtained and none are pending. Failure to obtain funding within a short time frame may result in closure of the lab.
As was suggested before, doing biomedical experiments “right,” as defined so far in this book, requires time-consuming and costly alternatives to animals (organs on a chip are not cheap, fast, or readily available); time-consuming and costly methodologies (biostatisticians and experts on methodology are not cheap or fast); testing and validating ingredients (costly, time-consuming, and requiring special expertise); and ensuring translatability (the time and cost of a long-term, doable plan). If, at the same time, the key to academic success is multiple rapid publications, why not simply use animals as is currently standard? Animal experiments are relatively inexpensive, results (valid or not) can be obtained relatively quickly, and those results can be published rapidly given the publication options available, even in many highly ranked journals. Why not six publications in less time and at less cost, compared with one publication done “right”? Why not do the best one can with what one has within the current system?
Simply: The system incentivizes research that is carried out relatively quickly and cheaply and that yields strongly positive results, which can be published and used as fuel to obtain grants. Too often, the result of these incentives is loosely performed research and, beyond that, the potential for knowingly submitting potentially flawed research, or even outright fraud. In contrast, high-quality research using animals requires more time, more money, and close attention to details on all fronts. It lets the answers declare themselves through solid, rigorous scientific methodology, knowing that the intervention may well not work but that, if it does, the likelihood of successful translation is greatly improved. So, what can be done at this systematic level to help the situation?
Change the Incentives Away from Positive Results at the Funding Level
Obviously, there are significant benefits from positive results of biomedical research using animals that culminate in successful human translation. But in the current system, positive results are not only necessary but also often sufficient to obtain funding. For truly positive change, funders must take care to ensure, before funding, that a proposed research program adequately addresses justification, methodology, ingredients, staffing, and the pathway to translation. Additionally, funders should carefully review the background information and publications submitted to support the proposed research. Is the proposed research founded upon solid, valid information derived from high-quality studies? Or is it founded upon flawed papers and, thereby, likely to fail?
Change the Incentives Away from Positive Results at the Publication Level
Published articles using animals for biomedical research should satisfy all of the requirements regarding justification, methodology, etc., and those that do should be published whether the results are positive or negative. Ideally, this would be achieved by requiring submission for publication immediately before the research itself begins. Submissions would be judged wholly on their merits as regards justification, methodology, etc., and those that satisfy these prerequisites would be guaranteed publication regardless of whether the results are positive or negative. Given this commitment on the publishing side, labs would also commit to publish regardless of results. There is no one in biomedical science who would not want to read these articles: one is guaranteed high quality, and one learns what might work and what does not with minimal bias, which can appropriately shape one’s own research program moving forward.
Require All Animal Studies to Be Uploaded into a Fully Transparent Registry
Similar to what is currently done for human clinical trials, mandatory participation in a central registry could be required for funding. Materials and methods would be uploaded at the inception of the research, all data generated during testing uploaded in real time, results posted as they come in, and final outcomes summarized, whether positive or negative. All of this would be done transparently, with open access for the research community.
Require Institutions/Universities to Have More “Skin in the Game”
Limit the percentage of support for salaries, lab space, etc. coming from funders, and require institutions to document, as part of the submission for funding, that they will support the difference. No institutional support, no funding. Institutions should believe in the research programs undertaken in their space and in their faculty, and they should be invested in them, not simply filling space with the aim of garnering indirect funds to continue a flawed cycle.
Provide Greater Support for Novel Ideas and Brilliant People to Build from the Ground Up
Shifting the incentives away from immediate results in publication and grant acquisition, and instead empowering really smart people to attack problems in novel ways, seems obvious but is rarely undertaken. The current “way things are done” works well for funders, institutions/universities, and many labs, unless one looks at the outcomes. So outcomes are generally discussed at length only when they are clear translational successes; the other 95% are tolerated and supported as “works in progress.” It is currently quite difficult for truly novel ideas and young, smart people to get funding from either the government or their institutions. They are not the best bets for the quick positive results built into the current system. Fewer than 20% of R01 and equivalent-level grants go to researchers under the age of 40, despite the fact that it is well recognized in nearly all scientific fields that those under 40 are the most likely to have novel, impactful, paradigm-breaking ideas. This disconnect is concerning. The system currently encourages, or requires, the young to “pay their dues” in more senior researchers’ labs and chasing grants over a period accounting for a third to three quarters of their productive careers before attacking their own ideas.
Many end up at age 35–40 burnt out and cynical, and abandon what might have been ground-breaking ideas and programs to enter “industry,” where (with some notable exceptions) safe, incremental change driven by profits and market share trumps all. They acquiesce and take up golf. Programs within the NIH and at some institutions are recognizing these important issues and are incrementally beginning to address them.
6 Summary of Constructive Section
Justification of the need to use animals, the use of solid methodologies, and a clear plan outlining a reasonable pathway to translation in humans can each be improved through relatively simple measures: specifically, checklists that address each of the issues raised and require a complete and clear explanation for each, that are completed before any animal work starts, and that are reviewed and approved by individuals with sufficient expertise to conduct a “trust, but verify” examination of proposals. This checklist can and should be implemented at each level: in the lab, with the PI in the lead, prior to submission; at the institutional animal research review committee; at the funding level by grant reviewers; and at the publication level by reviewers and editors. Of note, checklists aimed at some of these issues are already used at each of these levels and, in some cases, required; but they are often inconsistent and incomplete, generally use check-boxes without further justification or explanation, and may be leaky nets that allow unjustified and invalid animal studies to move forward. Performance issues and systematic problems, again, are recognized by some but are unrecognized or simply overlooked by many more. Significant restructuring and investment in materials and expertise will be required to improve performance, and a change in incentives will be required at the systematic level.
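The checklist proposed in this summary, in which every item needs a written explanation rather than a bare check-box and sign-off passes from level to level, can be sketched as a small data structure. This is a minimal Python illustration under stated assumptions: the item wording paraphrases topics from this chapter, and the names `Checklist`, `ITEMS`, and `LEVELS`, as well as the rule that a level may sign off only after the levels before it, are hypothetical choices rather than an existing system.

```python
from dataclasses import dataclass, field

# Hypothetical item wording that paraphrases the topics of this book;
# a real checklist would be worded and expanded by the reviewing bodies.
ITEMS = (
    "Animals are a reasonable model of the human disease",
    "Animals are a reasonable model of response to the intervention",
    "No reasonable non-animal alternative exists",
    "Methodology (design, sample size, analysis) is fixed in advance",
    "Ingredients (antibodies, cell lines, etc.) have been validated",
    "Staff skilled in each task are available",
    "Pathway to patients (stability, scalability, approval, cost) is addressed",
)

# Review levels, in the (assumed) order a proposal passes through them.
LEVELS = ("lab (PI)", "institutional committee", "funder", "journal")


@dataclass
class Checklist:
    explanations: dict = field(default_factory=dict)  # item -> written explanation
    approvals: list = field(default_factory=list)     # levels that have signed off

    def explain(self, item: str, text: str) -> None:
        if item not in ITEMS:
            raise KeyError(f"unknown checklist item: {item}")
        self.explanations[item] = text.strip()

    def missing(self) -> list:
        # A ticked box is not enough: every item needs a non-empty explanation.
        return [item for item in ITEMS if not self.explanations.get(item)]

    def approve(self, level: str) -> None:
        if level not in LEVELS:
            raise KeyError(f"unknown review level: {level}")
        gaps = self.missing()
        if gaps:
            raise ValueError(f"{len(gaps)} item(s) still lack an explanation")
        # "Trust, but verify": a level may sign off only after all earlier levels have.
        for earlier in LEVELS[: LEVELS.index(level)]:
            if earlier not in self.approvals:
                raise ValueError(f"{level} cannot approve before {earlier}")
        if level not in self.approvals:
            self.approvals.append(level)

    def fully_approved(self) -> bool:
        return not self.missing() and all(level in self.approvals for level in LEVELS)
```

The explanation field is what separates this sketch from the check-box forms described above: an empty or whitespace-only answer leaves the item open and blocks every sign-off, so no level can approve a proposal with unexplained gaps.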
Chapter 3
Putting It All Together: Hope Moving Forward
Readers will note, having read this book, that it is not about several equally concerning issues in science. It is not about scientific fraud. It is not an attack on the scientific method. It is not an attack on the use of animals in biomedical research on purely ethical grounds. But the critique and constructive analyses undertaken have broad implications for these areas as well. Scientific fraud is the intentional falsification of scientific information, presented in a manner that appears legitimate. What is it about scientific fraud that bothers other researchers the most? What makes them angry? First and foremost, it is that the scientific community (and these are smart folks) has been duped. No one wants to be lied to and then believe that lie for a while; they look the fool. Anger is the natural reaction. Once the anger at being fooled cools, they begin to ask other questions. How did this happen? Filters at the lab, institutional, funding, and publication levels failed. What are the consequences of it having happened? Wasted money. There are famous (infamous?) examples in areas ranging from stem cells to Alzheimer’s disease in which labs have falsified “groundbreaking” results and have been rewarded with remarkable amounts of funding for further research into their falsified findings. Almost universally, they then “double down” by producing more false information, consuming scarce grant money until they are discovered; by then, the money is gone. What is the consequence for others in the field? Often, other labs, believing the falsified results, start to follow the path presented in the falsified papers. For falsified “landmark” papers, the repercussions are serious: numerous labs begin programs into novel ways to diagnose and treat disease processes, built on falsified foundations published by other labs.
And these spin-off research programs consume brainpower, money, and time that could have been used for truly promising alternatives. A recent example, currently being scrutinized in the scientific literature, documents what appears to be falsification of information on an oligomer of amyloid purported to play a key causal role in Alzheimer’s disease. It appears, at this point, that the original and subsequent research from the same researchers on the subject was falsified. The results were published in major journals in the early 2000s; awards were won; and innumerable other researchers began down similar paths based on the information. It is estimated that researchers around the world may have wasted 17 years and $290 million in research funds on a path to nowhere. Years, brainpower, and money can’t be recouped. Neither can the animals used in the studies.
Let’s now get back to the animal research covered in this book: not falsified, but completed in a manner that leads to the original “Problem,” namely that most translational animal research published in the literature is invalid or unjustified. How does that differ from cases of outright scientific fraud? Time is still lost, brainpower still misused, money still spent, and animals still used uselessly. Cases of fraud, however, are greeted with anger, whereas cases of research using animals with no real chance, a priori, of providing valid information are greeted with occasional articles about irreproducibility. One suspects that the lack of anger arises because the “Problem” is so widespread (85–95%) that it is simply part of how things are done, especially when coupled with concerns that making significant changes, such as those suggested in this book, will take considerable effort, time, and money across the board, from labs to institutions to funders to publishers. I believe that the use of animals with no hope of making an impact on human health should be a greater source of concern, if not anger. This book is also not an attack on the scientific method. On the contrary, the scientific methods of choice leading to translation are known. This book is a “how-to” for doing it properly (or, at least, significantly better) when using animals, from conception to commercialization. This book is also not directly about the ethics of using animals in research. It is about the many points of potential failure that can render biomedical translational research useless, and the ways it can be cleaned up. The animals involved, and all of us humans with our ills and diseases, stand to benefit substantially from such a cleaning. I have hope moving forward that we are starting to head in a positive direction.
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 B. K. Weiner, A Scientific Approach to Improving Animal Research in Biomedicine, https://doi.org/10.1007/978-3-031-24677-7_3
References
A. Specific References Delineating the ‘Problem’ Referred to in the Opening Section
1. Prinz F. Believe it or not. Nat Rev Drug Discov 10: 712, 2011.
2. Begley CG. Drug development. Raise the standards for preclinical research. Nature 483: 531, 2012.
3. Peers IS. In search of preclinical robustness. Nat Rev Drug Discov 11: 733, 2012.
4. Chalmers I. Avoidable waste in the production and reporting of research evidence. Lancet 374: 86, 2009.
5. Chalmers I. How to increase value and reduce waste. Lancet 383: 156, 2014.
6. Ioannidis JP. Increasing value and reducing waste in research design. Lancet 383: 166, 2014.
7. Glasziou P. Reducing waste from incomplete or unusable reports of biomedical research. Lancet 383: 267, 2014.
8. Al-Shahi Salman R. Increasing value and reducing waste, regulation and management. Lancet 383: 176, 2014.
9. Chan AW. Addressing inaccessible research. Lancet 383: 257, 2014.
10. Sena ES. Publication bias in reports of animal stroke studies. PLOS Biol 8: 344, 2010.
11. Perrin S. Make mouse studies work. Nature 507: 423, 2014.
12. Tsilidis K. Evaluation of excess significance bias in animal studies of neurological diseases. PLOS Biol 11: 1609, 2013.
B. General References Delineating the ‘Problem’
Begley C. Improving the standard for basic and preclinical research. Circ Res 116: 116–126, 2015.
Sena E. Publication bias in reports of animal stroke studies. PLOS Biol 8: 344, 2010.
Prinz F. Believe it or not. Nat Rev Drug Discov 10: 712, 2011.
Begley C. Drug development: raise standards for preclinical research. Nature 483: 531–533, 2012.
Perrin S. Make mouse studies work. Nature 507: 423–425, 2014.
Tsilidis K. Evaluation of excess significance bias in animal studies of neurological diseases. PLOS Biol 11: 1609, 2013.
Macleod M. Biomedical research. Lancet 383: 101–104, 2014.
Ioannidis J. Increasing value and reducing waste in research design. Lancet 383: 166–175, 2014.
Gunn W. Reproducibility. Fraud is not the big problem. Nature 505: 483, 2014.
Chase J. The shadow of bias. PLOS Biol 11: 609, 2013.
Perel P. Comparison of treatment effects between animal experiments and clinical trials. BMJ 334: 197, 2007.
van der Worp H. Can animal studies of disease reliably inform human studies? PLOS Med 7: 245, 2010.
Pound P. Where is the evidence that animal research benefits humans? BMJ 328: 514–517, 2004.
Hartung T. Look back in anger: what clinical studies tell us about preclinical work. Altex 30: 275–291, 2013.
Kilkenny C. Survey of the quality of research using animals. PLOS One 11: 7824, 2009.
Jarvis M. Irreproducibility in preclinical biomedical research. Trends Pharm Sci 37: 1016, 2016.
Hay M. Clinical development success rates for investigational drugs. Nat Biotech 32: 40–51, 2014.
C. General References Regarding Justification
Mignini L. Methodological quality of systematic reviews of animal studies. BMC Med Res 6: 10, 2006.
Dance A. Building bench top human models. Proc Nat Acad Sci 112: 6773–6775, 2015.
Clark J. The 3Rs in research. Cambridge Core, October 2017.
Huh D. Reconstituting organ-level lung functions on a chip. Science 328: 1662–1668, 2010.
de Vries R. The potential of tissue engineering for alternatives to animal experiments. J Tis Eng 9: 771–778, 2013.
Kinter L. A brief history of use of animals in biomedical research and perspective on non-animal alternatives. ILAR 1: 1–10, 2021.
Bedard P. Innovative human 3D tissue-engineered models as an alternative to animal testing. Bioeng 7: 115, 2020.
Madden J. A review of in silico tools as alternatives to animal testing. Alt Lab An 48: 146–172, 2020.
Low L. Organs-on-chips: into the next decade. Nature Reviews 20: 345–361, 2021.
Cheluvappa R. Emerging alternatives to animal experimentation. Pharm Res 5: 332, 2017.
Brom FW. Science and society. Altex 19: 78–82, 2002.
Huh D. Microfabrication of human organs on a chip. Nat Protoc 8: 2135–2157, 2013.
Kostomitsopoulos N. The ethical justification for the use of animals in biomedical research. Arch Biol Sci 62: 781–787, 2010.
Guhad F. Introduction to the 3Rs. Contemp Top Lab Anim Sci 44: 58–59, 2005.
Lloyd M. Refinement. Lab Anim 42: 284–293, 2008.
Nevalainen T. Training for reduction in animal use. Altern Lab Anim 32: 65–57, 2004.
Olsson L. Ethics and refinement in animal research. Science 317: 1680, 2007.
Russell W. The principles of humane experimental technique. Methuen, London, 1959.
Robinson N. The current state of animal models in research. Int J Surg 72: 9–13, 2019.
Ericsson A. A brief history of animal modeling. Mo Med 110: 201–205, 2013.
Loeb J. Human vs. animal rights. JAMA 262: 2716–2720, 1989.
D. General References Regarding Methodology
Ioannidis J. Why most published research findings are false. PLOS Med 2: e124, 2005.
Sullivan G. Using effect size. J Grad Med Educ 4: 279–282, 2012.
Kyriacou D. The enduring evolution of the p value. JAMA 315: 1113–1115, 2016.
Button K. Power failure. Nat Rev Neurosci 14: 365–376, 2013.
Chavalarias D. Evolution of reporting p values in the biomedical literature. JAMA 315: 1952, 2016.
Goodman S. Evidence and scientific research. Am J Public Health 1558–1574, 1988.
Goodman S. A dirty dozen. Sem in Hematol 45: 485–496, 2008.
Altman D. The scandal of poor medical research. BMJ 308: 283–284, 1994.
Altman D. Poor quality research. JAMA 2765–2767, 2002.
Arrowsmith J. A decade of change. Nat Rev Drug Discov 10: 87, 2012.
Lang T. Twenty statistical errors even you can find in biomedical research articles. Croat Med J 45: 361–370, 2004.
MacCallum C. Reporting animal studies. PLOS Biol 8: 413, 2010.
Alberts B. Rescuing biomedical research from systematic flaws. Proc Nat Acad Sci 111: 5773–5777, 2014.
Jager L. An estimate of the science-wise false discovery rate. Biostat 15: 1–12, 2014.
Begley C. Six red flags for suspect work. Nature 497: 433–434, 2013.
Macleod M. Risk of bias in reports of in vivo research. PLOS Biol 13: 2273, 2015.
Peers I. In search of preclinical robustness. Nat Rev Drug Disc 11: 733–734, 2012.
E. General References Regarding Performance
Lazic S. Improving basic and translational science by accounting for litter-to-litter variation in animal models. BMC Neurosci 14: 37, 2013.
Bradbury A. Standardize antibodies in research. Nature 518: 27–28, 2015.
Lorsch J. Fixing problems with cell lines. Science 346: 1452–1453, 2014.
Buehring G. Cell line cross-contamination. Cell Dev Biol 40: 211–215, 2004.
Langdon SP. Cell culture contamination. Methods Mol Med 88: 309–318, 2003.
Macleod R. Widespread interspecies cross-contamination of human cancer cell lines. Int J Cancer 83: 555–563, 1999.
Stacey C. Cell contamination leads to inaccurate data. Nature 203: 356, 2000.
Baker M. Blame it on the antibodies. Nature 521: 274–276, 2015.
Arrowsmith C. The problems and perils of chemical probes. Nat Chem Biol 11: 536–541, 2015.
Freedman L. Changing the policies and culture of cell line authentication. Nature Methods 12: 493–497, 2015.
F. General References Regarding Translatability
Begley C. Drug development. Nature 483: 531–533, 2012.
Ioannidis J. Extrapolating from animals to humans. Sci Trans Med 4: 151, 2012.
Roberts I. Does animal experimentation inform human healthcare? BMJ 324: 474–476, 2002.
van der Worp H. Preclinical studies of human disease. J Mol Cell Cardiol 51: 449–450, 2011.
Pound P. Is it possible to overcome issues of external validity in animal research? J Trans Med 16: 304, 2018.
McGonigle P. Animal models of human disease: challenges in enabling translation. Biochem Pharm 87: 162–171, 2014.
Hackam D. Translation of research evidence from animals to humans. JAMA 296: 1731, 2006.
Wilhelm S. Analysis of nanoparticle delivery to tumors. Nature Reviews 2016.
G. General References Regarding Systematic Issues
Bourne H. Expansion fever and soft money plague the biomedical research enterprise. PNAS 115: 8647–8651, 2018.
Korn D. The NIH budget in the post-doubling era. Science 296: 1401–1402, 2002.
Alberts B. Overbuilding research capacity. Science 329: 1257, 2010.
Bienenstock A. Have universities overbuilt biomedical research facilities? Issues Sci Tech 31: 3, 2015.
Begley C. Institutions must do their part for reproducibility. Nature 525: 25–27, 2015.
Freedman LP. The economics of reproducibility in preclinical research. PLOS Biol 13: 1–9, 2015.
Collins F. NIH plans to enhance reproducibility. Nature 505: 612–613, 2014.
Baker D. Two years later. PLOS Biol 12: 1756, 2014.
Fang FC. Misconduct accounts for the majority of retracted scientific publications. Proc Nat Acad Sci 109: 17028–17033, 2012.
Landis SC. A call for transparency to optimize predictability of preclinical research. Nature 490: 187–191, 2012.
MacLeod M. Why animal research needs to improve. Nature 477: 511, 2011.
Freedman L. The increasing urgency for standards in basic biologic research. Cancer Res 74: 4024–4029, 2014.
Simeon-Dubach D. Quality really matters. J Pathol 228: 431–433, 2012.
Casadevall A. Reforming science. Infect Immun 80: 891–896, 2012.
Nosek B. Restructuring incentives and practices to promote truth over publishability. Perspect Psychol Sci 7: 615–631, 2012.
Nicholson J. Research grants: conform and be funded. Nature 492: 34–36, 2012.
Ioannidis J. More time for research. Nature 477: 529–531, 2011.
Freedman D. Lies, damned lies, and medical science. The Atlantic, 2010.
Kilkenny C. Improving bioscience research reporting. PLOS Biol 8: 412, 2010.
Flier J. Irreproducibility of published bioscience research. Mol Met 6: 2–9, 2017.
Drucker D. Never waste a good crisis. Cell Metab 24: 348–358, 2016.
Akhtar A. The flaws and human harms of animal experimentation. Camb Quart Health Ethics 24: 407–419, 2015.
Keen J. Wasted money in United States biomedical research. Book Chapter, 2019.
Stephan P. How economics shapes science. Harvard University Press, 2015.