Research for Development
Alessandro Ferrero Veronica Scotti
Forensic Metrology An Introduction to the Fundamentals of Metrology for Judges, Lawyers and Forensic Scientists
Research for Development Series Editors Emilio Bartezzaghi, Milan, Italy Giampio Bracchi, Milan, Italy Adalberto Del Bo, Politecnico di Milano, Milan, Italy Ferran Sagarra Trias, Department of Urbanism and Regional Planning, Universitat Politècnica de Catalunya, Barcelona, Barcelona, Spain Francesco Stellacci, Supramolecular NanoMaterials and Interfaces Laboratory (SuNMiL), Institute of Materials, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Vaud, Switzerland Enrico Zio, Politecnico di Milano, Milan, Italy Ecole Centrale Paris, Paris, France
The series Research for Development serves as a vehicle for the presentation and dissemination of complex research and multidisciplinary projects. The published work is dedicated to fostering a high degree of innovation and to the sophisticated demonstration of new techniques or methods. The aim of the Research for Development series is to promote well-balanced sustainable growth. This might take the form of measurable social and economic outcomes, in addition to environmental benefits, or improved efficiency in the use of resources; it might also involve an original mix of intervention schemes. Research for Development focuses on the following topics and disciplines: Urban regeneration and infrastructure, Info-mobility, transport, and logistics, Environment and the land, Cultural heritage and landscape, Energy, Innovation in processes and technologies, Applications of chemistry, materials, and nanotechnologies, Material science and biotechnology solutions, Physics results and related applications and aerospace, Ongoing training and continuing education. Fondazione Politecnico di Milano collaborates as a special co-partner in this series by suggesting themes and evaluating proposals for new volumes. Research for Development addresses researchers, advanced graduate students, and policy and decision-makers around the world in government, industry, and civil society. THE SERIES IS INDEXED IN SCOPUS
Alessandro Ferrero Department of Electronics, Information and Bioengineering Politecnico di Milano Milan, Italy
Veronica Scotti Department of Electronics, Information and Bioengineering Politecnico di Milano Milan, Italy
ISSN 2198-7300 ISSN 2198-7319 (electronic) Research for Development ISBN 978-3-031-14618-3 ISBN 978-3-031-14619-0 (eBook) https://doi.org/10.1007/978-3-031-14619-0 © Springer Nature Switzerland AG 2022 This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. This Springer imprint is published by the registered company Springer Nature Switzerland AG The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
Preface
Justice and science have fairly different ultimate goals, which philosophers have widely discussed over the centuries from many different perspectives, changing according to the different cultural backgrounds and the evolution of human knowledge. Without entering into philosophical details that are outside our competence, we dare to state that the ultimate goal of justice is administering fairness, so that the different communities of people may live peacefully. This is generally done by punishing crimes and misconduct and granting fair compensation to the victims. Science aims at explaining reality by defining suitable theoretical models and validating them with universally recognized and reproducible experiments, in order to provide a quantitative representation of the different phenomena, allowing us to predict their evolution with an accuracy that depends on how well the available model can represent them and on how accurately we can measure their present state. Metrology, that is, the science of measurement, plays an important role in assessing the validity of the experiments and, consequently, in assessing how well models can represent reality and how well we can express our knowledge in numbers. As Lord Kelvin stated: “I often say that when you can measure what you are speaking about, and can express it in numbers, you know something about it; but when you cannot express it in numbers your knowledge about it is of a meager and unsatisfactory kind; it may be the beginning of knowledge, but you have scarcely, in your thoughts, advanced to the stage of science, whatever the matter may be.” So, therefore, if science is measurement, then without metrology there can be no science.
Despite their different ultimate goals, justice and science share a common need: ascertaining the factual truth. Justice, and especially, but not only, criminal justice, needs to reconstruct facts to identify the perpetrator of a crime or misconduct. Science needs to understand facts to build a theoretical, general model capable of explaining and describing all similar facts and understand all possible cause-effect relationships. It is quite natural that justice has sought the help of science in the solution of its cases, since modern science is recognized to be rigorous in its findings, and
judgments based on science are supposed to be more reliable than purely human judgments. Forensic science represents the application of scientific methods in support of justice. However, science is also the result of human observations and reasoning and, therefore, cannot provide absolutely certain knowledge. The history of science proves that it has developed along the centuries through a constant refinement of its findings, making and correcting errors, and the development of new and more accurate instruments and measuring methods has also largely contributed to its advancement. An important, though often neglected by historians and philosophers, contribution to the advancement of science and knowledge has been given by metrology, mainly since the eighteenth century, when modern metrology started. It is often and erroneously thought that the science of measurement deals only with developing new methods and instruments. While this is surely one extremely important task of this science, a second task is equally important. Metrology knows its limits and knows that it can never provide the true value of the measurand, that is, the quantity being measured. Any measurement method and any measuring instrument, no matter how accurate, can only provide limited information about the measurand. From a practical point of view, this means that whenever a measurement result is used as an input element to a decision process (is a quantity above or below a threshold? does a measured pattern match a reference one? …), the risk of making a wrong decision is always present. The consequences of a wrong decision, when that decision is about guilt or innocence, may be quite dramatic. The second, important task of metrology is that of quantifying how reliable a measurement result is, that is, how well the result of the measurement represents the value of the quantity being measured. This is achieved by defining, evaluating, and expressing measurement uncertainty, an important part of any measurement result. Measurement uncertainty is therefore an important concept also in the forensic sciences, whenever the results of experimental tests are submitted to the trier of fact as pieces of evidence. Not only does it prove that science cannot provide absolutely certain elements but, more importantly, it gives a quantitative evaluation of the doubt about how well the result of the measurement represents the value of the quantity being measured. It might become an extremely useful element for the trier of fact in deciding whether a verdict based on scientific evidence can be rendered beyond any reasonable doubt. This is the important task of forensic metrology, a branch of the forensic sciences that had been neglected until a few years ago, when this term was first proposed. Unfortunately, the term used to express an important concept in metrology (uncertainty) apparently clashes with an important concept in jurisprudence (certainty of law), so that forensic metrology still finds opposition among judges, lawyers and, more generally, the main actors of justice. This book aims at explaining to judges, lawyers, and forensic scientists, in as simple a way as possible, the main concepts of metrology and how they can be usefully employed in administering justice. It will explain how the most important concepts of metrology are fully compatible with the foundational principle of justice
and how uncertainty in measurement is an important piece of evidence that helps the trier of fact ascertain the facts. It will show that not considering uncertainty is the same as keeping an important piece of evidence hidden from the trier of fact, and is not dissimilar from false testimony. Examples of cases where metrology played an important role will be considered as well, to give evidence of the dramatic miscarriages of justice that may be caused by neglecting the fundamental concepts of metrology when scientific evidence is treated as conclusive evidence.

Milan, Italy
June 2022
Alessandro Ferrero Veronica Scotti
Contents
Part I  Justice, Science and Their Methods

1  Justice and the Law Systems
   1.1  The Origin of Justice
   1.2  The Legal Systems
        1.2.1  The Common Law Systems
        1.2.2  The Civil Law Systems
        1.2.3  Common Principles to Common Law and Civil Law Systems
   1.3  In Search of Truth
   1.4  Substantive Truth and Procedural Truth
   References

2  Science and the Scientific Method
   2.1  What Is Science?
   2.2  The Scientific Method
        2.2.1  Experimental Science
        2.2.2  Theoretical Science
   2.3  Advancing Knowledge
   2.4  The Limits of Science
        2.4.1  The Model Contribution
        2.4.2  The Experiment Contribution
   2.5  The Role of Metrology
   References

3  Forensic Science: When Science Enters the Courtroom
   3.1  Science and Justice. Same Goal with Different Methods?
   3.2  Scientific Evidence and Its Evolution in Time
   3.3  The Role of Technical Experts
        3.3.1  The Role of Technical Experts in Civil Law Systems
        3.3.2  The Role of Technical Experts in Common Law Systems
   3.4  Validity of Forensic Methods
   3.5  Forensic Metrology: A Step Forward
   References

Part II  Metrology and its Pillars

4  The Measurement Model
   4.1  Why a Model?
   4.2  Measurement and Metrology
        4.2.1  Terminology
   4.3  The Model
        4.3.1  Identification
        4.3.2  Modeling
        4.3.3  Experimental Processes
        4.3.4  Decision
   References

5  Measurement Uncertainty
   5.1  The Background
   5.2  The Origin of the Doubt
        5.2.1  Definitional Uncertainty
        5.2.2  Instrumental Uncertainty
   5.3  The Effects of the Doubt
        5.3.1  Systematic Errors
        5.3.2  Random Errors
        5.3.3  Measurement Repeatability
        5.3.4  Measurement Reproducibility
        5.3.5  A First Important Conclusion
   5.4  The Uncertainty Concept
   5.5  How to Express Uncertainty
        5.5.1  Random Variables and Probability Distributions
        5.5.2  Standard Uncertainty
        5.5.3  Expanded Uncertainty and Coverage Factor
        5.5.4  Methods for Evaluating Standard Uncertainty
        5.5.5  Combined Standard Uncertainty
        5.5.6  Monte Carlo Method
   5.6  Final Remarks
   References

6  Calibration
   6.1  An Important Problem in Metrology
   6.2  Comparison of Measurement Results
   6.3  Calibration: The Definition
   6.4  Calibration: The Implications
   6.5  Calibration: How Often?
   6.6  Calibration Versus Adjustment
   6.7  Calibration Versus Verification
   References

7  Traceability
   7.1  Where Are the Measurement Standards?
        7.1.1  The SI and Its Standards
        7.1.2  The Calibration Hierarchy
   7.2  Metrological Traceability
        7.2.1  Definition
        7.2.2  The Accreditation System
   References

8  Uncertainty and Conscious Decisions
   8.1  Measurement Results in Decision-Making Processes
   8.2  Decisions in the Presence of Uncertainty
   8.3  The Probability of Wrong Decision
   8.4  Concluding Remarks
   References

Part III  Practical Cases

9  Breath and Blood Alcohol Concentration Measurement in DUI Cases
   9.1  DUI Cases
   9.2  The Measurement Principle
   9.3  Definitional Uncertainty Evaluation
        9.3.1  An Important Influence Quantity: The Presence of Mouth Alcohol
   9.4  Instrumental Uncertainty Evaluation
   9.5  Practical Cases
        9.5.1  Verdict 1759/2016 of Genova Criminal Court
        9.5.2  Verdict 1574/2018 of Brescia Criminal Court
        9.5.3  Verdict 1143/2019 of Vicenza Criminal Court
   9.6  Conclusions
   References

10  Forensic DNA Profiling
    10.1  Genetics and DNA Profiling
    10.2  The Scientific Basis of DNA Profiling
    10.3  Some Useful Concepts in Probability
          10.3.1  Conditional Probability and Bayes' Theorem
          10.3.2  Probability in Forensic Testimony
    10.4  Foundational Validity of DNA Profiling
          10.4.1  Single-Source DNA Samples
          10.4.2  Simple-Mixture DNA Samples
          10.4.3  Complex-Mixture DNA Samples
    10.5  Validity as Applied of DNA Profiling
          10.5.1  STR Typing
          10.5.2  Possible Profile Irregularities
          10.5.3  Probability of Identification Errors
    10.6  An Emblematic Case: The Perugia Crime
          10.6.1  The Crime
          10.6.2  The Investigation
          10.6.3  The Trials
          10.6.4  Conclusive Remarks
    References

Index
Part I
Justice, Science and Their Methods
Chapter 1
Justice and the Law Systems
1.1 The Origin of Justice

Since humans became aware of the importance of living together to better face problems and achieve a more comfortable way of life, they developed organized systems based, in ancient times, mainly on religious dogmas and obligations. It is worth noting that, in ancient times, the King was often (if not always) also the supreme religious leader, so that the power to govern was founded not only on a political basis but was also assigned by God or a superior being, exploiting the atavistic fear of humans, who were obliged to observe this order to avoid the gods' revenge. It is easy to conceive how justice was administered in this scenario: few decisions were based on (objective) evidence, because the case and the consequent conviction were stated by the King (or his delegates) on irrational bases such as ordeals or tortures, through which the defendant should have proven his or her innocence. However, examples can be found, even in ancient times, of the application of scientific knowledge to a trial, though they are limited to a few specific and particular cases, not relevant enough to qualify them as a paradigm. An interesting example is offered by the case of the golden crown commissioned by Hiero¹ to a goldsmith. After the goldsmith delivered the crown, rumors led Hiero to doubt the goldsmith: had he cheated him by replacing some of the gold he gave him with silver? So he asked Archimedes to investigate and, eventually, discover the fraud. Some days later, while Archimedes was taking a bath, he noted by chance that when he entered the bath the water overflowed; from this experience, he deduced his famous buoyancy principle, which allowed him to answer Hiero's query. Eventually, he discovered, by applying this scientific principle, that the crown was partially made of gold and, consequently, the goldsmith was sentenced to death.

1. Hiero was the tyrant of Syracuse, in Sicily, from 270 BC until 215 BC. Although his reign was troubled by the war with Carthage, it is also remembered for some praiseworthy actions, such as encouraging the advancement of sciences, at that time symbolized by Archimedes, who could freely attend to his work.
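As a rough numerical illustration of how the displacement principle settles such a question (a hedged sketch: the crown mass and the silver fraction below are assumed purely for the sake of the example and are not taken from the historical account; only the approximate densities of gold, 19.3 g/cm³, and of silver, 10.5 g/cm³, are physical data), recall that an immersed body displaces a volume of water equal to its own volume, V = m/ρ. A 1000 g crown of pure gold and a 1000 g crown in which 300 g of gold have been replaced by silver therefore displace different volumes:

\[
V_{\text{pure}} = \frac{1000\,\mathrm{g}}{19.3\,\mathrm{g/cm^3}} \approx 51.8\,\mathrm{cm^3},
\qquad
V_{\text{alloy}} = \frac{700\,\mathrm{g}}{19.3\,\mathrm{g/cm^3}} + \frac{300\,\mathrm{g}}{10.5\,\mathrm{g/cm^3}} \approx 36.3\,\mathrm{cm^3} + 28.6\,\mathrm{cm^3} \approx 64.8\,\mathrm{cm^3}.
\]

Under these assumed figures, the adulterated crown displaces about 13 cm³ more water than an equal mass of pure gold: a disputed question of fact is answered by a measurement, which is precisely the point of the anecdote.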
This short story shows that scientific knowledge has been employed in legal matters, or to pursue a goal related to justice, since ancient times, but on a rare and occasional basis and without ever being regulated to this aim. During the Roman Empire, the law system was improved by introducing clear rules and more reliable proofs in proceedings, such as those based on testimony, but not on scientific knowledge, which was almost totally absent from legal proceedings, modeled on legal schemes defined in written laws issued by the authority. However, the Roman trial allowed the defendant to offer many elements supporting his or her theory, which could be admitted or rejected even in the case of a lack of proof, because of the basic principle defined as iuxta alligata ac probata.² This legal structure was quite advanced, considering that the majority of the cultures at that time were marked by the divine origin of the authority, so it was really difficult to admit people's complaints, which were, most of the time, disregarded in favor of the arbitrary decision of the authority. Indeed, an interesting element of this period was the nature (structure) of the trial which, surprisingly, was accusatorial and not inquisitorial, as could be reasonably expected. Thus, a crime was not prosecuted without a formal notification to the authority by someone who took the responsibility for this action, aware of the risk of being charged with slandering the defendant in case of acquittal. After the dissolution of the Roman Empire, a new concept of law, rougher and less complex, became predominant among the barbaric peoples who inherited the former empire. This state of things continued through the centuries, until new philosophical theories started to develop a different concept of the human being, also thanks to new discoveries and a newly flourishing scientific approach. In this new framework, the authority could not be considered as a divinity any longer, but was asked to justify its power on a different basis, mainly on people's consent. This evolution can be easily found in the major works of some philosophers. For instance, Thomas Hobbes [1] considered human beings as always conflicting and struggling against each other and depicted this situation by using the well-known Latin expression "Homo homini lupus".³ Under this perspective, human beings are only aimed at pursuing their own goals, without caring for their fellow creatures, whose lives are considered useless and worthless. This situation is called the state of nature, where people are totally and completely free to do what they want, without limits, since they feel entitled to do that. It can be readily recognized that this way of life leads to destruction; consequently, according to Hobbes, human beings decided to renounce their freedom and entrust their "rights" to an authority appointed by a social contract. Hobbes' theory was
2. The judge must judge according to what is submitted and proved.
3. This expression means that men attack each other like wolves.
considered the philosophical background for absolutist systems of government, such as kingdoms ruled by plenipotentiary kings. However, this was not the only philosophical approach to social order. Other coeval authors, such as John Locke [2], proposed a different view of the state of nature and considered legal systems, in a more democratic way, as the result of a rational decision taken by people who were conscious of the necessity of a solid order, not only to avoid wars and conflicts but also to better ensure and protect the freedom of each individual through a consensual social contract. This opened the way to the modern concept of authority and justice, developed by many other philosophers, whose theories are far beyond the scope of this book. Here, we simply wish to add that the evolution of the social order is also the result of the development of science, which yielded a better knowledge and awareness of the mechanisms and scientific laws that rule physical events. While a brief survey of the scientific evolution and the role played by science in creating knowledge will be presented in Chap. 2, let us anticipate here that the increased scientific knowledge has had a major impact also on the way justice is administered, since it is believed that science can provide objective proofs. Therefore, the few cases solved in the past thanks to scientific knowledge, such as the one involving Hiero, the goldsmith, and Archimedes, have now grown to such an extent that the application of science to justice represents an important branch of science. To understand how science can be usefully exploited to help justice, it is important to briefly analyze the legal systems and how they operate.
1.2 The Legal Systems

According to the above considerations, as well as to history, it is possible to state that law and legal systems originated from the primordial and fundamental human need to live in a peaceful and organized society, so that conflicts among individuals can be prevented and the potential consequent chaos avoided, so that the social order is not overthrown and individual welfare is not endangered. It is also easy to realize that legal systems have undergone significant modifications during the last centuries—especially the last two—as a consequence of the geopolitical changes that have led to the establishment of new states, as well as of the dissemination of fundamental principles that have established basic rights that are nowadays internationally recognized:
• Equality
• Freedom
• Fair judgement.
Taking into account the high number of legal orders (over 200), and considering that their analysis goes well beyond the scope of this book, we will briefly consider only the two most relevant ones: the common law and the civil law systems [3].
Although these two legal orders are significantly different, from both a substantive and a procedural point of view, they share some fundamental principles [4] that are quite useful in identifying common elements toward a consistent application of scientific knowledge, regardless of the considered order. Both systems are aimed at ensuring legal certainty, one of the cornerstone principles of many legal orders, which allows individuals to foresee the consequences implied by different behaviors and acts, so that each individual can orient his or her behavior toward activities that are legal and do not imply any sanction or disapproval. Although this is the main common point of the two legal orders, it also highlights a difference between them, since civil law systems put an even stronger emphasis on this point, given that the behavioral paradigm to follow can be more easily identified, being codified in a written rule. In today's globalized world, where scientific knowledge has no borders but the different fields are becoming more and more specialized, technical experts may be called to provide their expertise in different countries. It is therefore important to briefly consider and discuss the major traits of the two main legal systems, to fully understand how they consider scientific knowledge.
1.2.1 The Common Law Systems

The common law system is the legal order of the Anglo-Saxon countries and originated in England after it was conquered by William the Conqueror (Duke William II of Normandy) in 1066 AD [5]. Having unified the kingdom, King William tried to enact a legal system better organized and more uniform than the one in force at that time, in order to clearly define illegal facts and treat them in a uniform way. The result was a system that, according to our present way of thinking, cannot be considered totally logical and functional, since judicial methods based on popular or mystic beliefs still coexisted within it.⁴ It happened, indeed, that the solution of a conflict was based on ordeals, or on oaths taken by fellow mates of the defendant, who vouched for his or her trustworthiness and reliability so that his or her statement could be considered as true. Several changes took place in the following years, related both to the trier of fact (traveling courts, clerical courts, …) and to the trial procedure, since a jury started to be considered, more and more often, with the role of representing the community to which the defendant belonged, in order to ensure a fair judgment. Many centuries passed, during which several jurisdictions coexisted, such as the Crown jurisdictions and local jurisdictions, applying different decision methods, before the present system was established by the Judicature Acts of 1873 and 1875 [6]. A new principle—stare decisis—was affirmed, which bound lower rank courts to adopt the same decisions taken by higher rank courts on similar cases.
4. Assize of Clarendon, 1166.
Without entering into too many technical details, it is worth noting here that the common law systems are mainly based on the body of precedents, thus assigning to the judicial activity the fundamental role of providing a solution to a case on the basis of rules, or better of the ratio decidendi,⁵ previously applied by other judges to similar cases. By following this way of reasoning, justice is given a strict logic, also considering that the general principle that nothing that is against reason can be lawful is hardly adequate to generate a whole jurisprudence on its own, but serves, case by case, to weed contradictions out of the law and thus to make the law a reasonable whole. The method followed to formulate a verdict is inductive, since the judge considers all elements of a given case and checks whether, in the past, similar cases have been treated and a ratio decidendi can be found in previous verdicts rendered by same rank or higher rank courts and applied to the given case. Moreover, other useful elements can be found in the so-called obiter dictum, that is, considerations of an incidental kind found in previous verdicts, although such elements are not binding on subsequent verdicts. It is also worth noting that the respect of precedents—the stare decisis principle—can be observed on two distinct layers: the horizontal stare decisis, when the judge has to follow the decision taken by a judge of a same rank court, and the vertical stare decisis, when the judge has to follow the decision rendered by a judge of a higher rank court. The common law system, being bound to precedents, might appear prone to immobilism and unable to cope with novelty, including the application of new scientific findings. However, this system is not as rigidly unalterable as it may appear under a superficial perspective, because two ways can be followed to deviate from previous decisions, while still observing them, or to override them, allowing new elements or lines of reasoning to be considered. These two ways are referred to as:
• Distinguishing. This principle applies when a court decides that the reasoning followed to solve a precedent case cannot be wholly applied to the present case, due to materially different facts between the two cases. This decision must be supported by two formal points: the reasoning of the earlier case must be applied again but for an additional fact not relevant to that case, and the ruling in the present case must not explicitly criticize the decision rendered in the precedent case.
• Overruling. Overruling is the procedure followed by a higher court to set aside a legal ruling established in a previous case. Overruling is typically followed when the principles underpinning the previous decision are found erroneous in law or are considered overtaken by new legislation or developments.
These two mechanisms are the key to overcoming outdated theses and decisions that, if fully embraced, might become anachronistic and conflict with present knowledge.
5. This Latin locution indicates the rationale of a decision.
1.2.2 The Civil Law Systems

The civil law systems are presently the most widespread law systems and find their roots in the law system of ancient Rome. Differently from the common law systems, they are based upon statutory laws and codes enacted by the legislature. This approach implies the definition, in abstract and general terms, of rules that identify the essential behavioral elements (legal or illegal, according to the different ways the rule can be expressed) aimed at orienting the judge. In civil law systems, legal proceedings develop according to abstractions, in a deductive way, since the case submitted to the judge is compared with the behavioral elements considered by the law, to assess whether it can be considered illegal and, hence, punished. In the legal orders belonging to civil law systems, the trier of fact is in charge of applying and interpreting a rule of law, in its broad meaning,⁶ identifying and employing it, whenever needed, to solve the treated case. In most countries adopting a civil law system, the trier of fact—usually a judge—is subject to the law and applies it, being always bound to it. In some countries this represents, at least from a formal point of view, a strict bond, and any broad interpretation of the law coming from a personal elaboration of the judge is excluded, while in some other countries the judge is allowed some discretionary power, though within boundaries defined by the law. This framework, already rooted in the ancient Roman juridical tradition, finds its modern origin in the principles asserted by the French Revolution, which referred to the well-known Montesquieu's theory about the sovereignty of the people: according to this theory, the people manifests its will through the laws, and the judge merely applies them (“le juge est la bouche de la loi”⁷ [7]) without any personal contamination, so that all cases are treated in a uniform and right way, that is, according to justice. From the point of view of legal practice, civil law systems are often considered simpler to apply than common law systems, because the parties are given the opportunity, on one side, to define the rules through their representatives in the lawmaking bodies and, on the other side, to foresee the consequences of any given behavior, which can be well identified in terms of those rules. However, there are systems (for instance, the European Union and its member states) where several superstructures are present, as well as different sources of law, that are characterized by a considerable complexity, mainly related to the possibility of knowing all the rules, which are often not fully known and understood by the subjects
6. In some legal orders, the term law refers to a specific act, approved according to a well-defined procedure and issued by a definite state entity. Here, the same term is used with a broader meaning, including any source of law, as well as any act enforceable as law or rule that must be compulsorily observed.
7. The English meaning of this French expression is that “the law speaks through the mouth of the judge”.
that shall comply with them and that cannot plead ignorance of the rules as an exemption from liability for their violation, according to the well-known Latin legal principle ignorantia legis non excusat.⁸
1.2.3 Common Principles to Common Law and Civil Law Systems

Despite their differences and their different origins, the Common Law and Civil Law systems share some of the principles on which, generally speaking, the modern law systems and judicial proceedings are based, even though the procedures adopted by the different countries may differ in the details and in the actions and rights of the parties. Three common elements can be outlined:
• Principle of legal certainty
• Principle of presumption of innocence (in criminal law)
• Principle of fair trial.
1.2.3.1 The Principle of Legal Certainty
Legal certainty is a pivotal principle of the rule of law in the modern law systems [8]. Its modern roots can be found in the European Enlightenment, though it can be traced back to the Roman law system, when rationalism emerged in philosophy together with positive law.⁹ Due to its mostly European origin, legal certainty is nowadays considered a pillar of law in the European countries, while American jurists prefer to refer to it as legal indeterminacy [9]. Independently of the terms used to refer to it, this principle requires that all laws be sufficiently precise to allow individuals subjected to law to foresee the consequences that their behavior and actions may entail, at least to a degree that is reasonable in the circumstances within which a given action or behavior developed and, possibly, without or with minimal need for advice from experts. This principle can be put into practice by ensuring that [8]:
• laws and decisions are public;
• laws and decisions are definite and clear;
• the decisions of courts are considered as binding;
• laws and decisions have no or very limited retroactivity;
• legitimate interests and expectations are protected.

8. Ignorance of the law is not an exemption.
9. From the Latin ius positum.
As seen in the previous sections, both the Common Law and the Civil Law jurisdictions ensure the above points, though following different paths. Legal certainty can be seen as a way to ensure that decisions are made in a consistent, logical, and foreseeable way, which can be attained by referring to precedents (in Common Law systems) or to laws and procedural codes (in Civil Law systems). The references considered are different, but the result is pretty much the same.
1.2.3.2 The Principle of Presumption of Innocence
This core principle of criminal law can also be expressed according to an old legal maxim: a defendant must be considered innocent until proven guilty. The origin of this principle can be found, as for many others, in Roman criminal law. While it is attributed to Emperor Antoninus Pius, it has come down to us through its later inclusion in the Code of Justinian as Ei incumbit probatio qui dicit, non qui negat¹⁰ [10]. After the fall of the Roman empire, it survived, throughout the Middle Ages and the Renaissance, in the different states and cultures that originated from the dissolution of the empire, even though justice was often subjected to absolute power [11]. In the modern age, this principle was reaffirmed during the French Revolution and encompassed in the Declaration of the Rights of Man and of the Citizen of 1789, which states that any individual is presumed innocent until he or she has been declared guilty. This notion is nowadays present in practically every legal system and emphasizes that the prosecution has the obligation to prove each element of the offense, while the defendant cannot be charged with the burden of proof. The presumption of innocence involves three main procedural rules that must be followed to prevent the risk that an innocent person is found guilty.
• The prosecutor has the entire burden of proof with respect to the critical facts of the case. In other words, the prosecutor has the burden to prove whether a crime was committed and whether the defendant was the perpetrator.
• The defendant does not have any burden of proof with respect to the critical facts of the case. The defendant does not have to testify, call witnesses or present any evidence, although he or she has the right to do so. However, if the defendant chooses not to testify or present evidence, this decision cannot be used against him or her.
• The trier of fact shall decide the case solely on the evidence presented during the trial, without being influenced by the fact that the defendant has been charged with a crime.
10. The translation from Latin reads: Proof lies on him who asserts, not on him who denies.
Despite a few differences, including the terminology adopted,¹¹ the presumption of innocence is a pivotal principle of criminal law in both the Common Law and the Civil Law jurisdictions.
1.2.3.3 The Principle of Fair Trial
The term fair trial can be found in both the Common Law and Civil Law systems and is generally considered a necessary element to grant justice. A fair trial is generally obtained by implementing judicial procedures designed to grant the holders of subjective rights or legitimate interests that have been violated the right of acting and defending themselves before all levels of jurisdiction. This implies some other rights, which are briefly summarized here.
• The right to due process of law. This principle prevents an individual's rights from being arbitrarily infringed without some type of formal procedure. This means that the trier of fact cannot decide on a case if the individual charged in the case has not been formally informed about his or her implication and has not been given the right to appear before a court.
• The right to an impartial judge or jury. This principle ensures that the decision is taken impartially and that the trier of fact does not deliberately favor one of the parties. While the general principle is the same, it is implemented in different ways in Common Law and Civil Law systems, and in criminal cases and litigation, mainly because of the different triers of fact involved in the different situations. In Civil Law systems, the trier of fact is a single judge or a panel of judges, depending on the discussed case. In this case, impartiality, as well as competence, is granted by the recruitment and appointment process and by some prerogatives and prohibitions inherent in their status. In Common Law systems, when criminal cases are discussed, the trier of fact is a jury. The defendants are entitled to a trial with an impartial jury of their peers, that is, a jury composed of jurors not having a stake in the outcome of the case and not approaching the case with any bias against the defendant. The competence of the jurors, especially when scientific evidence is presented, is ensured, to some extent, by granting the defense lawyers the right to prepare the jurors to treat it as science and to give them basic scientific tools to evaluate it in the case [12].
• The right to confront and call witnesses. Defendants are given the right to confront their witnesses, in both Common Law and Civil Law systems. In Civil Law systems, the court has the right to appoint technical or scientific experts whenever the case requires technical or scientific competences, and the expert appointed by the court has the rights and duties of
11. The term presumption of innocence is typical of the European law systems, while the American law system refers to this same principle by stating that the decision of guilt must be taken beyond any reasonable doubt [11].
an assistant to the judge. Whenever such an expert is appointed by the court, defendants (in criminal cases) and the parties (in litigation) have the right to appoint their own experts, who must be given the right to take part in all activities performed by the court's expert. Defendants and parties are also given the right to present evidence and call witnesses who support their case.
• The right to legal counsel. In both Common Law and Civil Law systems, defendants and parties have the right to legal counsel, which means that they have the right to have an attorney representing their legal interests and also the right to an adequate defense. In some Civil Law countries, such as Italy, it is mandatory that defendants be represented by an attorney. If they do not appoint any attorney, a public defender is assigned and, if their financial situation does not permit them to pay the attorney's honorarium, the state takes charge of it.
• The right to a fair duration of the trial. Many countries have set forth this principle in their constitutions. While its general meaning of not involving defendants and parties in proceedings of unreasonable duration can be easily understood, its practical implementation is quite controversial, since the duration cannot be predetermined independently of the specific case and its complexity. For this reason, and to avoid excessively long proceedings, a prescriptive period is defined in many countries, especially those where a Civil Law system is adopted, to set the maximum time after an event within which legal proceedings may be initiated, so that, when the specified time runs out, a claim may no longer be filed or, if it is filed, it may be subject to dismissal. Generally, particularly serious crimes, such as murders, are not subject to a prescriptive period and, consequently, these claims may be brought at any time.
1.3 In Search of Truth

It is believed that ascertaining truth is one of the goals of justice. While this is a substantially correct statement, to fully perceive its meaning the difference between substantive law and procedural law should be considered. Substantive law encompasses the reference rules and obligations that every individual belonging to a social entity must comply with to avoid sanctions, and has the ultimate and most important aim of granting peaceful coexistence among individuals. Independently of the considered legal system, it differs from procedural law, which is aimed at identifying and sanctioning violations of substantive law. Indeed, law plays a dual role, static and dynamic, in both the Common Law and the Civil Law systems. It is static with respect to the definition of the behavioral rules with which individuals must comply. These are the reference rules whose adoption grants the smooth
functioning of a social system and prevents conflicts. It is dynamic when a conflict arises due to a violation of the established rules and gives origin to a judicial procedure. It can be regarded as the pathological condition of law, in which the violated rule is identified and the conflict is solved by sanctioning the perpetrator and indemnifying the victim. Is it possible to state that this dynamic role of law is aimed at ascertaining truth on the basis of the critical facts of the case? Even if law is often perceived as a separate and distant system, indifferent to human events, it is nevertheless grounded on facts and occurrences, related to individuals, that are taken into account and ruled according to a distinctive logic that, even if it is not always immediately perceived, is aimed at ensuring the very existence not only of the human beings but also of the system itself, and at preventing conflicts, or solving them if they arise. The very origin of the law systems, as briefly recalled at the beginning of this chapter, was the attempt to find an answer to the need of building a civil society ordered in such a way as to avoid conflicts and aimed at preserving human life and social relationships, in the interest of collective and individual long-term peace and security. This goal could not be achieved without implementing law systems specifically created for this purpose and not yet existing, since their implementation required the preliminary design of an organized system in which the fundamental rights and powers could be defined [13]. The modern law systems are the result of human historical and social evolution and tend to assimilate the external reality, reflecting, compatibly with their ultimate goal, the social changes that have occurred in the meantime. However, different schools of thought have different views on the future evolution of the law systems. According to Kelsen's theory [14], law should not be influenced by social or political changes, since it should develop in an autonomous way, based on fundamental principles. Schmitt's theory [15, 16], on the other hand, assumes the existence of a sort of natural law, above any other law system defined by men, that serves as a reference for all other systems. In our opinion, law exists and evolves in combination with society and the actual socio-cultural context, as proven by the many existing examples of laws issued on those bases. Several cases exist, indeed, in which the law has acknowledged existing situations and codified rules already accepted by the community or set by the evolution of science, framing them into the law system and assigning them a legal meaning in favor of a clearer and more efficient discipline that would be impossible to achieve absent a legal framework. A typical example of laws generated by pre-existing situations is given by commercial practices, which are generally adopted well before the law considers and regulates them. Indeed, laws aimed at defining their legal force are introduced only after having acknowledged their existence and common practice. Similarly, technical standards of voluntary application—such as, for instance, the ISO Standards—often represent the regulatory framework for a given field and become mandatory as a consequence of a legislative action.
1.4 Substantive Truth and Procedural Truth

As mentioned in the previous section, lay persons generally believe that the task of justice is that of ascertaining the truth about the facts of the considered case. While this is certainly the ultimate desired goal of judicial proceedings, the previous Sect. 1.3 has also briefly shown that the need to ensure a fair trial requires following a number of procedural rules that might not disclose some documents or pieces of evidence to both parties or to the trier of fact, thus providing an incomplete representation of the critical facts. The opening question of Francis Bacon's essay “Of Truth” appears to be quite pertinent here: What is truth? This question is a rather philosophical one and has probably not yet found a definite answer. For sure, when law is involved, finding an answer is even more difficult [17, 18], considering that the debate started at the beginning of the modern jurisdictions, back in the early nineteenth century, as reported in [19], and has not yet ended [20]. There is no doubt that any judgment, both in criminal and in civil law, must be based on facts, and fact-finding should represent the main task for courts. On the other hand, the principle of fair trial is one of the pivotal principles of justice, both in Common Law and Civil Law jurisdictions, as shown in the previous Sect. 1.2.3.3, and sets rules of law and discretions designed to protect other public values which may constrain the fact-finding process [21]. Consequently, restrictions can be set on truth finding; they may vary according to the different jurisdictions, but generally include:
• Legal professional privilege
• Confessional privilege
• Journalists' privilege
• Exclusion of illegally obtained evidence
• Privilege against self-incrimination
• Exclusion of involuntary or unknowing confession.
This implies that facts cannot always be proved during judicial proceedings, for the simple reason that the trier of fact is precluded from accessing some relevant pieces of evidence and must take a decision only on the basis of the available evidence, that is, the evidence that can be found and submitted to the court. This is generally the case in the adversarial system typical of Common Law jurisdictions, where, in criminal cases, the judicial decision-maker acts as a neutral arbiter between the parties, who determine what evidence will be called [4]. Differently, in the inquisitorial system typical of Civil Law jurisdictions, the judicial decision-maker plays a more active role and determines which additional evidence will be called, always within the limits set by the fair trial rules [4]. In civil procedures, the differences between Common Law and Civil Law systems are less evident and tend to slowly converge toward stricter procedural rules that bind the trier of fact to take a decision only on the basis of the documentation filed by the parties in the initial stages of the procedure. Therefore, if one party decides not
to disclose part of the facts, because they can be detrimental to its claims, and the other party cannot prove them, the procedural truth that will emerge from the judicial proceedings may substantially differ from the substantive truth. It can therefore be concluded that, despite the efforts made to unveil the truth, procedural truth, that is, the facts found during the judicial proceedings, may differ from the real facts, and hence substantive truth may remain hidden, at least partially. This is due to the procedural constraints outlined above and also, especially in criminal procedures, to investigative errors or incorrect interpretations of the investigative results, as the example considered in Chap. 10 will clearly show. For this reason, the role of scientific evidence must be carefully evaluated and its validity carefully considered.
References

1. Hobbes, T.: Leviathan. In: Malcolm, N. (ed.) Clarendon Edition of the Works of Thomas Hobbes. Oxford University Press, Oxford, UK (2012)
2. Locke, J.: Two Treatises of Government. In: Laslett, P. (ed.) Cambridge University Press, Cambridge, UK (1988)
3. Head, J.W.: Great Legal Traditions: Civil Law, Common Law, and Chinese Law in Historical and Operational Perspective. Carolina Academic Press, Durham, NC, USA (2015)
4. Pejovic, C.: Civil law and common law: two different paths leading to the same goal. Vic. Univ. Wellingt. Law Rev. 32, 817–841 (2001)
5. Plucknett, T.F.T.: A Concise History of the Common Law. Liberty Fund, Indianapolis, IN, USA (2010)
6. Perkins, W.B.: The English judicature act of 1873. Mich. Law Rev. 12, 277–292 (1914)
7. Spector, C.: La bouche de la loi? Les figures du juge dans L'Esprit des lois. Montesquieu Law Rev. 3, 87–102 (2015)
8. Maxeiner, J.R.: Some realism about legal certainty in the globalization of the rule of law. Hous. J. Int'l L. 31, 27–46 (2008)
9. Maxeiner, J.R.: Legal certainty: a European alternative to American legal indeterminacy? Tulane J. Int. Comp. Law 15(2), 541–607 (2007)
10. Kearley, T., Frier, B.W., Corcoran, S., Dillon, J.N., Connolly, S., Kehoe, D.P., McGinn, T.A.J., Crawford, M., Salway, B., Lenski, N., Pazdernik, C.F.: The Codex of Justinian: A New Annotated Translation, with Parallel Latin and Greek Text. Cambridge University Press, Cambridge, UK (2016)
11. Pennington, K.: Innocent until proven guilty: the origins of a legal maxim. Jurist: Stud. Church L. Minist. 63, 106–124 (2003)
12. Vosk, T., Sapir, G.: Metrology, Jury Voir Dire and scientific evidence in litigation. IEEE Instrum. Meas. Mag. 24(1), 10–16 (2021)
13. Glenn, P.H.: Legal Traditions of the World. Oxford University Press, Oxford, UK (2014)
14. Kelsen, H.: Pure Theory of Law. The Lawbook Exchange Ltd, Clark, NJ, USA (2009)
15. Schmitt, C.: Political Theology: Four Chapters on the Concept of Sovereignty. The University of Chicago Press, Chicago, IL, USA (1985)
16. Schmitt, C.: On the Three Types of Juristic Thought. Praeger Publishers, Westport, CT, USA (2004)
17. Fernandez, J.M.: An exploration of the meaning of truth in philosophy and law. Univ. Notre Dame Aust. Law Rev. 11, 53–83 (2009)
18. Ho, H.L.: A Philosophy of Evidence Law—Justice in the Search for Truth. Oxford University Press, Oxford, UK (2008)
19. Strier, F.: Making jury trials more truthful. Univ. Calif. Davis Law Rev. 30, 95–182 (1996)
20. Garda, V.: Theories of truth in legal fact-finding. In: Klappstein, V., Dybowski, M. (eds.) Theory of Legal Evidence—Evidence in Legal Theory. Law and Philosophy Library, vol. 138, pp. 149–165. Springer, Cham, CH (2021)
21. Summers, R.S.: Formal legal truth and substantive truth in judicial fact-finding—their justified divergence in some particular cases. Law Philos. 18, 497–511 (1999)
Chapter 2
Science and the Scientific Method
2.1 What Is Science?

What is science? This question has probably been waiting for a satisfactory answer since the first time it was asked. Scientists and philosophers have written countless essays, books, and articles on it, and new texts are quite likely being published while we are writing these lines. This book is obviously not aimed at answering this question, but only at trying to understand what can be expected from science, and in particular what people involved with the administration of justice can expect from science. It is generally assumed that science is aimed at advancing knowledge. However, this is not the prerogative of science only, since philosophy, for instance, has this same goal. So, the question may become: what distinguishes science from other disciplines? Until a recent past, the answer to this question was that science deals with the physical world and aims at explaining it. The old distinction between the world of concepts and abstractions and the empirical world, which is attributed to Socrates and dates back to the fifth century B.C., pops up again in this definition and seems to confine science to the mere explanation of facts and events, assigning philosophy the task of finding an answer to more noble and existential questions. However, nowadays, science does not match this definition any longer. Not only does it try to investigate the very beginning of existence, glancing at a few nanoseconds (a few billionths of a second) after the Big Bang, but it also covers subjects, such as neuropsychology, cognitive psychology, and cognitive science, that are far from the usual concept of the physical world. The distinctive prerogative of science, therefore, seems to be related not to its object, but rather to the method it follows in its analysis. It is a common belief that the scientific method is based on experiments, so science is often referred to as the experimental way to knowledge. Can we agree upon this? Or is this, once again, too superficial an assumption?
2.2 The Scientific Method

Everybody agrees that the modern scientific method is due to Galileo Galilei, the famous Italian scientist who lived across the sixteenth and seventeenth centuries [1]. The core of Galileo's thought was that every scientific theory must be validated by experimental and measurable evidence. Moreover, all experiments shall feature repeatability and reproducibility. An experiment is repeatable if it provides the same results when it is repeated by the same operator, in the same location, following the same procedure, and using the same equipment under the same operating conditions. On the other hand, an experiment is reproducible if it provides the same results when it is repeated by different operators, in different locations, and using different equipment under different operating conditions. We will see in the following that the meaning we shall assign to same results is not really the literal one, and that we would have better said compatible results. However, for the moment, let us suppose that we can assign the literal meaning to same results and let us consider the innovation that Galileo's thought brought into science. Up to Galileo's time, philosophy was dominated by Aristotle's thought [2] and its degeneration into scholasticism. While Aristotle considered observations as a tool to build knowledge, he nevertheless assigned the highest importance to logic and human reasoning and did not consider experiments as a tool to validate or confute theories [3]. It is well known that he considered that heavy bodies fall faster than light ones, and nobody confuted this theory until Galileo proved, with his famous experiments in Pisa, that bodies fall at the same speed. Galileo showed the importance of experimental physics and its experiments in understanding phenomena. Does this allow us to state, as the empiricists did, that knowledge comes primarily from sensory experience and, hence, from experiments? And that, as Popper stated, a theory, or a result, cannot be considered science unless it can be refuted or, more generally, falsified [4]? Centuries of science history have proved that experiments are one of the two sides of the same coin, the other being theoretical models. This gave origin to the dichotomy between experimental science and theoretical science, as if different sciences were possible. For the sake of simplicity, let us consider them separately, to show that they are two different, though complementary, ways of advancing knowledge.
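To give a numerical feel for the difference between the two properties, the following short Python sketch (not taken from any real experiment; all numbers are invented for illustration) simulates ten readings taken under repeatability conditions, where only random fluctuations are present, and ten readings taken under reproducibility conditions, where each instrument also contributes its own small systematic deviation. The typically larger spread of the second set is the reason why, strictly speaking, we can only ask for compatible, rather than identical, results.

```python
import random

random.seed(1)
TRUE_VALUE = 10.0          # value of the quantity (unknown, in a real experiment)

# Repeatability: same operator, instrument, and conditions; only random noise.
repeatability = [TRUE_VALUE + random.gauss(0.0, 0.05) for _ in range(10)]

# Reproducibility: different instruments, each with its own systematic offset.
offsets = [random.gauss(0.0, 0.2) for _ in range(10)]
reproducibility = [TRUE_VALUE + b + random.gauss(0.0, 0.05) for b in offsets]

def mean_and_spread(values):
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / (len(values) - 1)
    return mean, var ** 0.5

for label, data in (("repeatability", repeatability),
                    ("reproducibility", reproducibility)):
    mean, spread = mean_and_spread(data)
    print(f"{label:16s} mean = {mean:.3f}   spread = {spread:.3f}")
```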
2.2.1 Experimental Science

It is often thought that experiments are the trademark of science, and that experimental results can, by themselves, fuel the progress of science. In the present era of Big Data, when a huge amount of data can be collected, stored, and made easily available to practically everybody, this concept has become stronger and stronger. It is often assumed that knowledge can advance just by digging into this huge amount of data and looking for correlations.
However, it can be easily proven that experimental data that are not supported by a valid model can lead to wrong interpretations, such as, for instance, interpreting a correlation between the presence of storks in a given area and the contemporary increase in births as if babies were delivered by storks [5]. The Ptolemaic model of the universe is an extremely clear example that observations alone may lead to the formulation of a wrong theory. If we observe the sky, we do obviously observe the relative motions of the Sun, planets, and other stars with respect to the Earth. However, such an observation does not imply that the Earth is still and all other bodies move around it. This may only come from the false assumption that we are an absolute observer and not a relative one. How could the problem have been solved? The correct analysis of the available experimental data could have challenged the model: indeed, the planets appeared to circle backward, with respect to the other stars, in some periods of the year. This phenomenon, known as retrograde motion, should have warned that the model assumption of the Earth placed at the center of the universe was not the correct one. In such a case, Popper's approach would have been beneficial. On the other hand, cultural and religious taboos, totally alien to science, prevented the recognition that the model was incorrect, and we had to wait until Newton stated his universal gravitation law, in 1687, to finally have a model that could correctly interpret the observed data. A more recent example of how incorrectly interpreted experimental data may induce a wrong model is that of cold fusion. In 1989, two scientists, Fleischmann and Pons, claimed they had obtained nuclear fusion at room temperature [6]. When other scientists tried to reproduce the same experiment, different results were obtained, proving that measurement uncertainty on the original experimental result had not been correctly taken into account, thus leading to an incorrect interpretation of the experimental data. A totally different example, proving that experimental data may be extremely useful in refining a theory, is the discovery of the transistor. A team of scientists was working on a semiconductor PN junction (a diode), trying to obtain a high-performance diode. They were making several measurements on the semiconductor prototype, connecting several instruments to the junction through different electrodes. They observed that some of the obtained measurement results did not match the diode model they were using. Since the measurement results that did not match the model were not many, they could have been considered outliers and simply discarded. Luckily, the scientists realized that the measured values differed from the expected theoretical ones by more than the measurement uncertainty (once again we step into this important concept!) and decided to investigate what could have caused them. They realized that the experimental data matched a different theoretical model, that of a double PNP junction, and that the second junction could have been realized unintentionally when connecting the instruments to the single PN junction. The transistor (a double PNP junction) was discovered thanks to the careful analysis of unexpected experimental results. And its countless applications changed our lives [7]!
These few examples give clear evidence that experiments, by themselves, are not enough to fully and correctly explain and interpret a phenomenon. They also need a correct model.
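The transistor anecdote hides a small quantitative criterion that is worth making explicit: a measured value can be considered compatible with a model prediction when their difference does not exceed the measurement uncertainty (a concept treated in detail in Part II). The following sketch, with purely invented numbers, flags the readings that disagree with a hypothetical model by more than an assumed expanded uncertainty; such points deserve investigation rather than automatic rejection as outliers.

```python
# Invented (measured value, model prediction) pairs, in the same arbitrary unit.
data = [(1.02, 1.00), (2.05, 2.00), (3.01, 3.00), (4.31, 4.00), (5.02, 5.00)]

U = 0.10   # assumed expanded measurement uncertainty for every reading

for measured, predicted in data:
    deviation = measured - predicted
    verdict = "compatible" if abs(deviation) <= U else "NOT compatible -> investigate"
    print(f"measured {measured:4.2f}  predicted {predicted:4.2f}  "
          f"deviation {deviation:+.2f}  {verdict}")
```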
2.2.2 Theoretical Science

Theoretical science develops in a way that recalls Aristotle's way to knowledge. It builds a theoretical model, mostly based on mathematical relationships, that aims at explaining reality through logical concepts and logical reasoning. Models are refined and redefined according to logical and mathematical rules. Probably the best-known recent example of theoretical science is Einstein's theory of relativity [8]. Einstein started from the theoretical and experimental results achieved mainly by Michelson, Lorentz, and Poincaré, and developed his own theory as a logical and mathematical construction capable of framing those results into a single framework and providing a solution to the still open problems. Although relativity appeared to be logically and mathematically flawless, it disrupted so many well-assessed concepts and beliefs that it was not universally accepted. Its problem was that, when it was conceived, no experimental validation was possible, due to the technical limitations of the available measuring equipment. It was only several decades later that atomic clocks accurate enough, and jet airplanes fast enough to move them at a high relative speed, became available to appreciate the time variations predicted by relativity [9]. Today, we apply the correction predicted by relativity, though unconsciously, in the everyday use of GPS navigators. Our position is measured by measuring the time of flight of timestamps transmitted by a constellation of artificial satellites whose orbits are known. Since the satellites and their onboard atomic clocks are flying at a much higher speed than that of the Earth's surface during rotation, the timestamps they transmit must be corrected by the time variation predicted by relativity. Therefore, being capable of locating our position with good accuracy is another proof that Einstein was right. Another example of a model that was successfully developed on a theoretical basis is given by Maxwell's equations, which model the electromagnetic field and the related interactions [10]. In the early days of the study of electricity and magnetism, Maxwell analyzed the equations already available to describe their interactions under specific conditions, unified them into his famous equations, and deduced that light is an electromagnetic wave. One of the consequences of Maxwell's equations is that the electromagnetic field travels in space at the speed of light. Therefore, in principle, it can be used to transmit information in a wireless way. Thirty years later, also thanks to Hertz's experiments, Marconi finally proved that Maxwell's equations had correctly predicted that electromagnetic waves travel in space, and the radio was invented. These two well-known examples show that theoretical models, to be accepted, need to be validated by experimental results that confirm what the models predict.
On the other hand, experiments must be carefully designed, and the obtained results critically discussed, so that possible contradictions between model predictions and experiments may prompt a refinement of either the model or the experiments.
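The GPS example mentioned above lends itself to a simple order-of-magnitude check. The following sketch is a rough back-of-the-envelope estimate, using approximate orbital parameters and ignoring further effects such as the rotation of the receiver on the Earth's surface: it computes the two relativistic contributions to the drift of a satellite clock, the special-relativity slowdown due to the orbital speed and the general-relativity speedup due to the weaker gravitational potential at orbital altitude. Their sum, a few tens of microseconds per day, is enormously larger than the nanosecond-level timing needed for positioning, which is why the correction cannot be ignored.

```python
import math

C = 2.998e8          # speed of light, m/s
GM = 3.986e14        # Earth's gravitational parameter, m^3/s^2
R_EARTH = 6.371e6    # mean Earth radius, m
R_SAT = 2.656e7      # approximate GPS orbital radius, m
DAY = 86400          # seconds per day

v_sat = math.sqrt(GM / R_SAT)                 # orbital speed, about 3.9 km/s

# Special relativity: the moving clock runs slow by roughly v^2 / (2 c^2).
sr_per_day = -(v_sat ** 2) / (2 * C ** 2) * DAY

# General relativity: the clock higher in the potential well runs fast.
gr_per_day = (GM / C ** 2) * (1 / R_EARTH - 1 / R_SAT) * DAY

print(f"special relativity : {sr_per_day * 1e6:+.1f} microseconds/day")
print(f"general relativity : {gr_per_day * 1e6:+.1f} microseconds/day")
print(f"net effect         : {(sr_per_day + gr_per_day) * 1e6:+.1f} microseconds/day")
```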
2.3 Advancing Knowledge

The previous paragraph showed that the scientific method needs both a valid theoretical model and valid experimental data to advance knowledge. The aforementioned examples also showed that theory and experimentation, alone, are quite likely to provide incorrect results. So, why do we need models and experiments to advance knowledge? The answer is apparently quite simple: models are needed to provide a correct interpretation of the experimental data, and experiments are needed to validate the models. On closer inspection, this answer may appear circular: how can a model, derived to interpret experimental data, be validated by the same experimental data? The key is in the fact that the experimental data should not be the same. If we consider how science has progressed along the centuries, we discover an interesting point that is somehow implicit in all existing philosophical interpretations of science: science has always progressed by means of successive approximations. Observations have been the initial trigger of any theory. The resulting theory has provided a model, aimed at describing, interpreting, explaining, and predicting the observed events. Prediction is a key point in the scientific method, since this is the true essence of scientific knowledge: finding a cause-effect relationship, so that we can predict that an event, or a sequence of events, will cause effects that we can quantify starting from our theoretical model. The Ptolemaic model of the universe was quite good at predicting the relative position of the Sun, the planets, the satellites, and the stars in the different seasons of the year, thus allowing one to predict, with astonishing accuracy, the sequence of the seasons and a number of astronomical events, including eclipses which, when not yet explained by science, were rather frightening events. Today we know that the Ptolemaic model of the universe is not correct, and we know this thanks to other experimental data, coming from the observations of Galileo, Copernicus, and Kepler, who showed that the model, while good at predicting the relative positions, could not satisfactorily explain the motion of those bodies according to the newly available experimental data. Then Newton came and proposed a new model: universal gravitation. How can we be sure that this model is correct? Indeed, when the accuracy of the available instruments improved, observations of Uranus' orbit showed that it was not in agreement with the predictions provided by Newton's model, and could have proved that it was flawed. On the other hand, the same model explained the detected anomalies in Uranus' orbit by the presence of another planet and predicted its mass and position. More accurate observations detected this new planet, Neptune, and proved that Newton's model was correct.
So, once again, we have proof that science advances because theory and experiments support each other, and discrepancies between theoretical and experimental results, if correctly analyzed, lead to the refinement of the models or to the improvement of the measuring methods and instruments. When models cannot satisfactorily describe, interpret, explain, and predict events, or the predicted events cannot be observed, we are probably on the way to advancing knowledge, but we are still far from true scientific results. This is the case of seismology: we can observe earthquakes, we can explain them, and we can even state that an earthquake might be more probable in one specific region than in another, but we cannot predict when and where it will hit, because the available model of the Earth, of its internal movements, and of the forces that can develop in the different layers of the crust is not yet accurate enough to do so.
2.4 The Limits of Science

According to the different examples reported in the previous paragraphs, the hallmark of science is its continuous evolution. While this means that human knowledge has constantly advanced, it also means that scientific information is never absolute and cannot provide complete information about the phenomenon it tries to describe. This does not necessarily mean that a claim is not scientific if it cannot be refuted, as Popper stated [4], but, more likely, that we should always expect that every scientific claim or result might be, sooner or later, at least refined, if not refuted. In this respect, Nobel Laureate Richard Feynman appears to better depict the limits of science when he wrote [11]:

It is scientific only to say what is more likely and what is less likely, and not to be proving all the time the possible and the impossible.
Scientists are well aware that science can provide only limited knowledge. Indeed, their work is to discover these limits and try to overcome them. The problem, when scientific knowledge has to be used to explain facts, as in forensic applications, is to assess the epistemological robustness of the scientific results and, consequently, of the decisions made on the basis of those results. The problem can be reformulated in a simpler way: if science cannot deal with absolutes and provides only limited knowledge and limited information about the phenomena it deals with, where should the reasons for such limited knowledge be sought? Is it possible to estimate, in a quantitative way, how limited the provided information is? Is it possible, according to Feynman, to state how likely a claim or a result is? The first step toward a possible answer to these questions is to understand where the reasons for such limited knowledge originate. The previous paragraphs told us that science achieves knowledge by building theoretical models that are both inspired and validated by experimental data. Therefore, since scientific knowledge is
based on models and experiments, we may assume that the very reasons that make such knowledge incomplete originate there too.
2.4.1 The Model Contribution

Models are, generally, a set of equations aimed at providing a mathematical relationship between one or more input quantities and one or more output quantities that are assumed to describe a given phenomenon, as graphically shown in Fig. 2.1. The input quantities, represented by mathematical variables, describe the quantities that may influence the phenomenon described by the model. On the other hand, the output quantities, also represented by mathematical variables, describe the effects that the given input quantities are supposed to cause on the considered phenomenon. Such a model may deliver an incomplete representation of the considered phenomenon, that is, values of the output variables that do not correspond to the true ones, for a number of reasons, such as those included in the following non-exhaustive list.
• One or more input quantities that might influence the phenomenon have not been considered, because their effects were erroneously considered insignificant, or because including them would have made the model too complex.
• The considered equations cannot fully describe the relationship between the input and output quantities, because of the intrinsic limitations of the employed mathematical theory.
• Although the set of equations used to describe the phenomenon is known to admit a solution, this solution cannot be computed in closed form, so that only approximate solutions can be obtained.
• Finite-precision arithmetic (a finite number of digits) is used to represent all employed variables, thus yielding approximations in the computed results that depend on the number of digits and on the way the variables are processed (a small numerical sketch of this last point is given after Fig. 2.1).
Fig. 2.1 Mathematical model showing the relationship between the N input variables {I1, I2, ..., Ii, ..., IN} and the M output variables {O1, O2, ..., Oi, ..., OM}
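As anticipated in the last item of the list above, the effect of finite-precision arithmetic can be observed directly in any programming language. The short fragment below (purely illustrative) shows that even an exactly known decimal number cannot, in general, be represented exactly with the finite set of binary digits used by a computer, and that these tiny representation errors propagate through the computations of a model.

```python
from decimal import Decimal

x = 0.1
# The value actually stored in binary is only close to 0.1, not equal to it.
print(Decimal(x))              # 0.1000000000000000055511151231257827...

# The small representation errors propagate through repeated computations.
total = sum(0.1 for _ in range(10))
print(total)                   # 0.9999999999999999, not 1.0
print(total == 1.0)            # False
```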
2.4.2 The Experiment Contribution

When experiments are aimed at validating a given model, they are generally designed to trigger the phenomenon described by the model through the generation of a set of predetermined input quantities, and to measure the output quantities to verify whether they correspond to those predicted by the considered model. On the other hand, when experiments are aimed at collecting data related to a given, still largely unknown phenomenon, they are generally designed to observe the phenomenon and measure all quantities that appear to be somehow related to it and might be assumed to be the input (cause) and output (effect) quantities. The results of such measurements will then be analyzed and discussed to formulate a suitable model. In both cases, the data, used either to be compared with those provided by a given model or to formulate a suitable model, come from measurement processes. It is well known that measurement results can never provide the true value of the measurand, that is, the quantity intended to be measured, because of the ever-present imperfections in the measurement procedure, in the employed instruments, and in the operator performing the measurement. Therefore, measurement results are only an approximation of the measurand value, and they can provide only finite, limited information about the measurand. This means that even an ideal model, capable of providing a perfect mathematical description of the considered phenomenon, can be considered accurate only to the extent of the accuracy with which the experimental data used to validate it are measured. Conversely, when experimental data are used to build a model, this model cannot be considered more accurate than the measured values used to build it. In other words, the information provided by the model cannot be more complete than that provided by the experimental data employed to build it.
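As a purely illustrative sketch of why a measurement result never coincides with the true value of the measurand, the following fragment simulates an instrument affected by a small uncorrected systematic offset, a finite resolution, and random noise. All the numbers are invented, and the true value is known here only because this is a simulation.

```python
import random

random.seed(3)
TRUE_VALUE = 20.000     # known here only because this is a simulation
OFFSET = 0.023          # uncorrected systematic error of the instrument
RESOLUTION = 0.01       # smallest increment the instrument can display
NOISE_STD = 0.005       # random fluctuations (environment, electronics, ...)

def read_instrument():
    raw = TRUE_VALUE + OFFSET + random.gauss(0.0, NOISE_STD)
    return round(raw / RESOLUTION) * RESOLUTION   # quantized to the resolution

# The readings cluster around 20.02, not around the true value 20.00.
for i in range(5):
    print(f"reading {i + 1}: {read_instrument():.2f}")
```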
2.5 The Role of Metrology

Metrology is the science of measurement. According to the considerations made in the previous paragraphs, measurements play a relevant role in the evolution of science, since experiments require the measurement of several quantities and the experimental data come from measurement results. It is generally thought that metrology deals with the design and implementation of measurement methods and with the design and development of measuring instruments and measuring equipment. This is surely true, but it is only a partial view of this branch of science. Since measurement results can provide only finite, limited information about the measurand, an important task of metrology is to assess and quantify how incomplete the information associated with a measurement result is. The concept that defines the amount of information associated with a measurement result is called, in metrology,
uncertainty. The same term is also used to quantify the correctness of the stated result, that is, the doubt about how well the result of the measurement represents the value of the quantity being measured. Therefore, considering that measurement results are employed to assess the validity of scientific models and theories, and are the most significant source of the limits of science, metrology is capable, in principle, of providing a quantitative evaluation of these limits. It can hence be readily perceived that this particular part of metrology, often also referred to as the fundamentals of metrology, plays an extremely important role in assessing the validity of scientific methods and can, according to Feynman's statement, quantify how likely a result is. When forensic science methods are considered, it is necessary to assess whether the information they provide is complete enough for the intended scope, that is, supporting a decision taken beyond any reasonable doubt. The role of metrology, in this case forensic metrology, is of utmost importance, and uncertainty in measurement should always be considered an important piece of evidence supporting or weakening the collected scientific evidence. The next chapters will then consider the fundamental concepts of metrology, the pillars on which metrology is based [12], to show how useful and helpful it might be in the courtroom, since a basic understanding of these concepts, accessible also to non-scientist lawyers and judges, allows them to critically evaluate scientific evidence on a wide spectrum of possible cases, without the need to enter the peculiar details of every application. They will also be in the position to demand that technical experts provide a correct presentation of the experimental results, including an uncertainty statement, so as to proceed to a pondered evaluation of the considered scientific evidence.
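As a minimal sketch of how this quantification works in practice, the fragment below applies the so-called Type A evaluation described in the GUM, anticipating concepts treated in Part II, to a handful of invented repeated readings: the result is reported as the mean value, accompanied by the experimental standard deviation of the mean taken as the standard uncertainty, and by an expanded uncertainty with a coverage factor k = 2.

```python
import statistics

# Invented repeated readings of the same quantity (for example, a length in mm).
readings = [12.47, 12.52, 12.49, 12.51, 12.48, 12.50, 12.49, 12.53]

n = len(readings)
mean = statistics.mean(readings)
s = statistics.stdev(readings)      # experimental standard deviation
u = s / n ** 0.5                    # standard uncertainty of the mean (Type A)
U = 2 * u                           # expanded uncertainty, coverage factor k = 2

print(f"measurement result: {mean:.3f} mm")
print(f"standard uncertainty: {u:.3f} mm")
print(f"result with expanded uncertainty (k=2): ({mean:.3f} +/- {U:.3f}) mm")
```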
References

1. Sharratt, M.: Galileo: Decisive Innovator. Cambridge University Press, Cambridge (1994)
2. Ackrill, J.L.: Aristotle the Philosopher. Oxford University Press, Oxford (1981)
3. De Groot, J.: Aristotle's Empiricism: Experience and Mechanics in the 4th Century B.C. Parmenides Publishing, Las Vegas (2014)
4. Popper, K.: The Logic of Scientific Discovery. Psychology Press, Oxford (2002)
5. Höfer, T., Przyrembel, H., Verleger, S.: New evidence for the theory of the stork. Paediatr. Perinat. Epidemiol. 18, 88–92 (2004)
6. Fleischmann, M., Pons, S.: Electrochemically induced nuclear fusion of deuterium. J. Electroanal. Chem. 261, 301–308 (1989)
7. Riordan, M., Hoddeson, L.: Crystal Fire: The Invention of the Transistor and the Birth of the Information Age. W. W. Norton & Company, New York (1997)
8. Einstein, A.: Relativity: The Special and General Theory. Henry Holt and Company, New York (1920)
9. Ives, H.E., Stilwell, G.R.: An experimental study of the rate of a moving atomic clock. II. J. Opt. Soc. Am. 31, 369–374 (1941)
10. Maxwell, J.C.: A dynamical theory of the electromagnetic field. Philos. Trans. R. Soc. Lond. 155, 459–512 (1865)
11. Feynman, R.: The Character of Physical Law. The M.I.T. Press, Cambridge (1965)
12. Ferrero, A.: The pillars of metrology. IEEE Instrum. Meas. Mag. 18, 7–11 (2015)
Chapter 3
Forensic Science: When Science Enters the Courtroom
3.1 Science and Justice. Same Goal with Different Methods?

The previous chapters have briefly discussed the fundamental concepts of the law systems, in Chap. 1, and of science, in Chap. 2, their goals, and the methods followed to achieve such goals. Apparently, they share the same objective, although with different methods: investigating truth. However, we have also seen that truth may take different meanings [1], so that we may expect justice to refer to science, but we may also anticipate that their interaction is not immediate, and several issues, mainly related to how to assess the forensic validity of scientific methods, had to be discussed and are still being discussed. The goal of science is to explain the different phenomena through a suitable causal model that can anticipate an effect, given a cause. In this respect, science tries to discover universal laws starting from the observation of specific phenomena, and scientists are rather free in their investigations, generally subject only to ethics and their conscience. If a theory is proved incorrect, this has generally little or no consequence (unless it has been put into practice before being validated) and, on the contrary, its total or partial falsification can be beneficial to the advancement of knowledge, according to Popper's theoretical approach [2]. Justice is aimed at ascertaining the factual truth, under the constraints of the fair trial principle shown in Chap. 1. In this respect, it moves from universal laws (in this case, the substantive law), represented either by precedents (in the Common Law jurisdictions) or by codes (in the Civil Law jurisdictions), and tries to explain the specific fact, in this case a crime or a rule violation. It can be immediately recognized that failing to ascertain the factual truth or, even worse, depicting a fact different from the actual one may result in a miscarriage of justice, with dramatic consequences for the involved individuals. No wonder, hence, that the triers of fact have always searched for reliable means that could assist them in this critical task and, at the same time, have always regarded them, in particular the scientific ones, with a hint of distrust, probably because their
cultural background does not allow them to fully understand the principles on which scientific evidence is based. Consequently, the procedures adopted to admit such evidence into the courtrooms have significantly evolved, following the evolution of science and also as a countermeasure to the use of the latest scientific findings by criminals, in an attempt to hide their crimes and put the investigators off the scent. The following sections will briefly discuss the reasons for referring to scientific evidence, how the consideration given to experts' opinions has evolved over time, and how the validity of such opinions can be assessed.
3.2 Scientific Evidence and Its Evolution in Time

Except for the few flagrant crimes, ascertaining the facts related to a criminal offense, or the facts that caused accidents or damages, is a rather complex task, considering that the facts often occurred long before the judicial proceedings and that several factors may contribute to blurring the picture, from the criminal's attempts to hide their involvement in the case to environmental and weather influences. Testimonies rendered by eyewitnesses or percipient witnesses have always been considered potentially conclusive evidence, although their reliability could often be doubted, even when there was little doubt about the good faith of the witnesses. Memory, as clearly discussed in [3], may fail for different reasons, and the most dangerous ones, for the legal systems, are probably suggestibility, when memories are implanted as a result of questions or suggestions, and bias, when current knowledge and beliefs may influence the way the past is remembered or facts are interpreted and reported [4]. As a natural consequence of the potential lack of total trust in percipient witnesses' testimonies, triers of fact have always tried to find more objective evidence, starting from the most ancient times, when suspects were subjected to ordeals or judgments of God: if they could survive some painful, usually dangerous experience, such as walking through fire, or barefoot over coals, they were considered innocent and exonerated. Luckily, ordeals have been considered an irrational way of administering justice since the twelfth century and were completely discontinued during the fourteenth and fifteenth centuries [5]. It is no coincidence that ordeals faded out of trials when modern science affirmed, with Galileo, its rational method, validated on evidence coming from experiments and not on irrational beliefs. At that time science had not yet developed to a point useful to help justice, but a couple of centuries later knowledge in chemistry and toxicology was good enough to understand not only whether death was caused by poisoning but also what kind of poison was used [6]. The door letting forensic science enter the courtrooms was open and has never closed again since then. The decades across the nineteenth and twentieth centuries saw the celebration of the scientific method in the solution of crimes in the popular novels of G. K. Chesterton and A. Conan Doyle, testifying to the interest that those new methods excited in the general public. The same years saw the birth of the modern
forensic science with the work of Edmond Locard [7, 8], who also established the first police forensic laboratory. Locard formulated the exchange principle, which is still today one of the fundamental principles of criminalistics. In summary, it stated that every contact leaves a trace and that, therefore, it is impossible for a criminal to perpetrate a crime without leaving some trace of his or her presence. The task of forensic science and forensic scientists is hence finding those traces and associating them with a suspect [6]. What Locard never stated, however, is that it is always possible to find those traces unaltered and associate them with the perpetrator: Locard was a scientist and he was fully aware of the limits of science. Unfortunately, this part of his studies was not fully perceived by those who applied his principle, so that, at first, triers of fact expected full certainty in the results provided by forensic experts [9, 10]. Such an expectation is clearly evidenced by a resolution adopted in 1979 by the International Association for Identification (IAI), related to latent fingerprint identification. This resolution considered it professional misconduct, for its members, to provide courtroom testimony describing an identification through latent prints as possible, probable, or likely, rather than certain [9]. In those same years, the scientific and technical community started to build a widely recognized awareness that the results returned by any experimental activity are affected by errors whose magnitude can only be estimated in probabilistic terms. This awareness led to the formulation of the Guide to the expression of Uncertainty in Measurement (GUM) [11], whose first edition dates back to 1995 and whose recommendations will be discussed in Chap. 5. Similar to all other scientific findings, this new awareness has slowly started to permeate also other activities of everyday life, including justice. The first relevant sign of this paradigmatic shift is the famous Daubert case [12], when Justice Blackmun of the US Supreme Court wrote, in 1993: "[I]t would be unreasonable to conclude that the subject of scientific testimony must be 'known' to a certainty; arguably there are no certainties in science". The decision rendered in this case started a broad discussion on whether expert testimonies should be considered fully certain, or whether they should be expressed taking into account uncertainty or some kind of safety margin [9, 10]. A significant result of such a discussion can be found in a 2010 opinion rendered by a US federal court prohibiting a fingerprint examiner from testifying to a certain opinion [13]. Such a paradigmatic shift has somehow upset the total, almost blind trust that justice had laid in scientific evidence and Locard's method. If experts' opinions cannot be considered fully certain, but rather have to be reported by disclosing their degree of uncertainty, two new problems arise: how to assign validity to an expert's opinion and how to communicate uncertainty.
3.3 The Role of Technical Experts

Before discussing how scientific evidence is validated and admitted in the courtrooms, a brief analysis of the role played by technical experts in the different law systems is useful to highlight how their opinion may affect the final decision. There is little doubt that scientific evidence has represented a sort of game changer in judicial proceedings and that the trier of fact does not have, generally speaking, the competence to understand how it is formed and how reliable it is in reconstructing the factual truth. The triers of fact, as well as the prosecutors and lawyers, have great skill, supported by their experience, in evaluating percipient witnesses' testimonies. They are perfectly aware that witnesses may lie and that memory may fail [3], and they know how much trust can be assigned to these testimonies. On the other hand, the rapid evolution of scientific knowledge, especially in the genetic field, and of the available instrumentation requires, to be correctly exploited, a specific competence that does not belong to the background of jurists. Regardless of the specific jurisdiction, the presence of experts in courtrooms has become unavoidable.
3.3.1 The Role of Technical Experts in Civil Law Systems

In Civil Law systems, technical experts are appointed by the judge (who, as seen in Chap. 1, is the trier of fact), are considered proxies for the judge as long as technical matters are discussed, and are subject to the same obligations and prerogatives as the judge. To be appointed, experts must be certified or accredited by the court or by other authorities recognized by the court (in some countries, professional associations or professional orders are entitled to certify their members as judicial experts). Under special circumstances, when the competence required by the case cannot be found in the panel of accredited experts, the judge has the right to appoint an expert whose competence in the field is universally recognized. The appointed expert is given a deadline to file a report on the assigned task, in order to contribute to the judicial settlement of the case by providing technical or factual evidence. It is important to underline that, in all European judicial systems that adopt the Civil Law jurisdiction, experts' opinions do not bind the trier of fact to follow them. In these jurisdictions, judges are the sole authorities that may interpret facts and the rule of law [14]. However, in practice, judges seldom disregard the opinion of the experts they appointed, and this opinion often has a decisive influence on the interpretation of facts, the outcome of litigations, and the final decisions. The prosecutor and the defendants, in criminal cases, and the parties, in litigations, have the right to appoint their own experts, without being constrained to select them from the accredited panels. Experts appointed by the parties have the right to attend
all activities performed by the expert appointed by the court and to file their own reports. However, reports filed by party experts do not feature the same characteristics as the report filed by the court's expert. They are admissible in the same way as any piece of evidence submitted to the court by the parties. In some countries in which the Civil Law system is adopted, codes of conduct are established for technical experts, and these codes require the expert to be impartial, honest, and competent. Failing to comply with these requirements may result in the expert being disqualified and, in case of serious misconduct, sanctioned.
3.3.2 The Role of Technical Experts in Common Law Systems

In Common Law systems, technical experts are considered witnesses and are called to testify as expert witnesses by the parties. Their task is to provide a knowledgeable opinion about the specific scientific question debated in the court [15]. As witnesses, they take an oath to tell the truth and swear not to hide any circumstance useful for ascertaining the truth. Consequently, should they provide false information or wrong opinions, they may be convicted of perjury. Unlike technical experts in Civil Law jurisdictions, expert witnesses are not required to be certified or accredited, although they generally belong to professional associations, such as the International Association for Identification. When such associations exist, they establish rules and best practices for their members serving as expert witnesses. Similar to technical experts in Civil Law jurisdictions, expert witnesses are required to be impartial, honest, and competent. Should they fail, they can be sanctioned and expelled from the association. In some countries, such as the USA, expert witnesses' testimony must be admitted by the court, which shall establish whether the expert is reliable, thus qualifying him or her, and whether the testimony is relevant to the specific case. The expert witness is qualified by knowledge, skill, experience, or education, which must be considered relevant to the case. As mentioned above, the task of an expert witness is that of providing knowledgeable opinions. This means that, differently from percipient witnesses, expert witnesses are not subject to the hearsay rule. This rule binds a percipient witness to tell only what he or she actually knows about a case and prevents him or her from expressing opinions or conjectures. On the other hand, an expert witness is expected to testify on facts that are not personally known to him or her, and to provide his or her opinion on specific pieces of scientific evidence, referring not only to his or her personal experience but also relying on scientific articles, opinions of other reputed scientists, or discussions with colleagues. Therefore, the hearsay rule does not apply to expert witnesses.
3.4 Validity of Forensic Methods

The most direct consequence of having recognized the limits of scientific evidence, as discussed in Sect. 3.2, is that such evidence cannot be considered reliable per se. The main motivation for referring to scientific evidence, its intrinsic objectivity, appears to fail, thus bringing back an a priori unpredictable amount of subjectivity into the decision-making process. A wide discussion has then started, among jurists, on how to establish whether a scientific method is reliable enough to provide objective evidence about a specific fact. A great part of this discussion developed in the Common Law jurisdictions, mainly in the USA, probably because of the lack of those codes of law that, in Civil Law jurisdictions, may falsely reassure that only valid scientific methods are allowed into the courtrooms. A first case that, in the USA, led to significant changes in the use of experts' testimonies is the Frye case, back in 1923 [16]. In this case, the defense attempted to use the results of a polygraph test to prove Frye's innocence and attempted to introduce the testimony of an expert witness to explain those results. The court rejected the request by stating:

[W]hile courts will go a long way in admitting expert testimony deduced from a well-recognized scientific principle or discovery, the thing from which the deduction is made must be sufficiently established to have gained general acceptance in the particular field in which it belongs.
This statement is quite acceptable, in principle, but does not provide any clear indication on how to assess general acceptance. Having set a precedent, Frye was mentioned several times by other courts, though, absent a general consensus on the meaning of general acceptance, its application was not consistent. Its importance went far beyond the USA, and it was mentioned also in other jurisdictions whenever a doubt on the validity of a forensic method was raised. In 1975, in an attempt to regulate evidence in a clearer way, the United States Congress issued the Federal Rules of Evidence. In particular, Rule 702 is aimed at providing a standard for expert witness testimony. It states:

A witness who is qualified as an expert by knowledge, skill, experience, training, or education may testify in the form of an opinion or otherwise if:
(a) the expert's scientific, technical, or other specialized knowledge will help the trier of fact to understand the evidence or to determine a fact in issue;
(b) the testimony is based on sufficient facts or data;
(c) the testimony is the product of reliable principles and methods; and
(d) the expert has reliably applied the principles and methods to the facts of the case.
Once again, the principle stated by Rule 702(c) (the testimony is the product of reliable principles and methods) may lead to inconsistency and incorrect assessment. A typical case is that of Blood Pattern Analysis (BPA), which, in Lewis v. United States [17], was admitted under the motivation: MacDonnell's studies are based on
general principles of physics, chemistry, biology, and mathematics, and his methods use tools as widely recognized as the microscope; his techniques are neither untested nor unreliable. The weakness of such a motivation becomes apparent if we consider that the Ptolemaic model of the universe (see Sect. 2.3) was also based on the general principles of physics and mathematics of its time but was proved to be incorrect, just as the BPA method was proved to suffer from enormous uncertainties [18]. Strangely enough, a judgment of the Italian Supreme Court assigned high reliability to this method using these same words [19], thus proving that the difficulty in assessing the validity of scientific methods, and the errors made in doing so, cross borders and jurisdictions. The Daubert decision highlighted four factors that can be used as a sort of metric in the application of Rule 702 [12]:
1. Whether the expert's theory or technique can be (and has been) tested;
2. Whether the theory or technique has an acceptable known or potential rate of error;
3. The existence and maintenance of standards controlling the technique's operation;
4. Whether the theory or technique has attained "general acceptance".
While the first and fourth points reflect the general acceptance requirement without adding any tangible novelty, the second point has introduced a quite innovative element: the rate of error of the method. For the first time, it is stated that the rate of error shall be known and shall be acceptable. Even if it is not explicitly stated, the knowledge of the error rate can be mathematically related to the probability that the result provided by the employed forensic method is wrong and, consequently, it quantifies the probability that a decision taken on the basis of that result is wrong. The third point is also important, because it refers, indirectly, to the need for forensic labs to be accredited by an independent accreditation body according to a recognized standard (an example of such a standard is the ISO/IEC 17025 Standard, already adopted by several forensic labs around the world), thus ensuring that the method is applied following known and approved procedures. The importance of these concepts has been widely recognized also outside the USA and outside the Common Law jurisdictions. It is worth mentioning here an Italian murder case, which will be considered in more detail in Sect. 10.6 of Chap. 10: the judges of the Italian Supreme Court (the Corte di Cassazione) wrote, in their judgment: "[T]he result of a scientific evidence can be considered as reliable only if it is checked by the judge, at least in reference to the subjective reliability of those who report on it, the scientific acceptance of the employed method, the error rate and how acceptable it is, and the objective validity and reliability of the obtained result." (authors' translation from the official Italian text) [20]. The rules for assigning forensic validity to scientific methods are in constant evolution, to follow the evolution of science and jurisprudence. The federal judiciary's Advisory Committee on Evidence Rules has proposed two significant amendments
to Rule 702, which are expected to take effect on December 1, 2023, and would change Rule 702 as follows (changes highlighted in italics in the original):

A witness who is qualified as an expert by knowledge, skill, experience, training, or education may testify in the form of an opinion or otherwise if the proponent has demonstrated by a preponderance of the evidence that:
(a) the expert's scientific, technical, or other specialized knowledge will help the trier of fact to understand the evidence or to determine a fact in issue;
(b) the testimony is based on sufficient facts or data;
(c) the testimony is the product of reliable principles and methods; and
(d) the expert's opinion reflects a reliable application of the principles and methods to the facts of the case.
However, despite all attempts to regulate the validity and admissibility of scientific evidence and to exclude subjectivity in favor of objectivity, this goal is still far from being achieved. The US Congress, recognizing that significant improvements are needed in forensic science, asked the National Academy of Sciences for an opinion on how to strengthen forensic science, thus originating an important report in 2009 [18]. The academy's initial consideration is quite interesting and is worth quoting in its entirety:

For decades, the forensic science disciplines have produced valuable evidence that has contributed to the successful prosecution and conviction of criminals as well as to the exoneration of innocent people. Over the last two decades, advances in some forensic science disciplines, especially the use of DNA technology, have demonstrated that some areas of forensic science have great additional potential to help law enforcement identify criminals. Many crimes that may have gone unsolved are now being solved because forensic science is helping to identify the perpetrators. Those advances, however, also have revealed that, in some cases, substantive information and testimony based on faulty forensic science analyses may have contributed to wrongful convictions of innocent people. This fact has demonstrated the potential danger of giving undue weight to evidence and testimony derived from imperfect testing and analysis. Moreover, imprecise or exaggerated expert testimony has sometimes contributed to the admission of erroneous or misleading evidence.
Three points have been highlighted by the NAS report:
• Faulty forensic science, that is, methods whose scientific validity has not been demonstrated or accepted by the scientific community.
• Imperfect testing and analysis, that is, assigning a high degree of certainty to test or analysis results that show a non-negligible uncertainty, such as those of the BPA technique.
• Imprecise or exaggerated expert testimony, which focuses on the critical issue of how to assess the competence of the experts.
The NAS report has also identified some of the major problems facing the forensic science community, in particular [18]:
• Disparities in the forensic science community: a significant disparity in money, staff, training, equipment, certification, and accreditation was found among the
existing forensic science operations in federal, state, and local law enforcement jurisdictions and agencies. This disparity results in great disparities also in the reliability and overall quality of the obtained scientific evidence. The problem is not confined to the USA and can be generalized to the global forensic community.
• Lack of mandatory standardization, certification, and accreditation: several forensic methods are not standardized and, even when standards exist, practice is often non-compliant with them. Moreover, there is no uniformity in the certification of forensic practitioners or in the accreditation of crime laboratories, which are not required in many states (and also outside the USA). Therefore, there are no references against which a forensic method and a forensic result can be validated.
• Problems relating to the interpretation of forensic evidence: forensic evidence is often offered to support a matching attribution of a specimen to a particular individual or source, or a classification of the source of the specimen into one of several categories. However, there is no rigorous evidence that forensic methods (including cases of DNA profiling) have the capacity, with a high degree of certainty, to demonstrate a connection between evidence and a specific individual or source. Such a connection is generally obtained by means of the expert's interpretation, whose validity is not always assessed on the basis of scientific studies.
• The need for research to establish limits and measures of performance: the evaluation of the validity of a given piece of evidence in the specific context of a case is an open problem that, at the time the report was published, lacked a scientifically sound solution.
The NAS report [18] discussed the above points in depth, covering the different fields of forensic science, and formulated several recommendations to overcome the detected problems. A few years later, in 2016, under Obama's presidency, the President's Council of Advisors on Science and Technology (PCAST) gave a pondered response to the NAS recommendations in a report generally known as the PCAST report [21]. The main focus of this report is on ensuring the scientific validity of feature-comparison methods. In particular, the report considers seven feature-comparison methods:
• DNA analysis of single-source and simple-mixture samples;
• DNA analysis of complex-mixture samples;
• Bitemark analysis;
• Latent fingerprint analysis;
• Firearms analysis;
• Footwear analysis;
• Hair analysis.
Despite this focus, the PCAST report provides two definitions of scientific validity, foundational validity and validity as applied, that have a much broader meaning. It is worth quoting them in their entirety [21]:
Foundational validity for a forensic-science method requires that it be shown, based on empirical studies, to be repeatable, reproducible, and accurate, at levels that have been measured and are appropriate to the intended application. Foundational validity, then, means that a method can, in principle, be reliable. It is the scientific concept we mean to correspond to the legal requirement, in Rule 702(c), of "reliable principles and methods".
Validity as applied means that the method has been reliably applied in practice. It is the scientific concept we mean to correspond to the legal requirement, in Rule 702(d), that an expert "has reliably applied the principles and methods to the facts of the case".
The PCAST report explains the meaning of these two definitions in some additional detail. In particular, it states that the essential points of foundational validity include the following [21]:
(1) Foundational validity requires that a method has been subjected to empirical testing by multiple groups, under conditions appropriate to its intended use. The studies must (a) demonstrate that the method is repeatable and reproducible and (b) provide valid estimates of the method's accuracy (that is, how often the method reaches an incorrect conclusion) that indicate the method is appropriate to the intended application.
(2) For objective methods, the foundational validity of the method can be established by measuring the accuracy, reproducibility, and consistency of each of its individual steps.
(3) For subjective feature-comparison methods, because the individual steps are not objectively specified, the method must be evaluated as if it were a "black box" in the examiner's head. Evaluations of validity and reliability must therefore be based on "black-box studies", in which many examiners render decisions about many independent tests (typically, involving "questioned" samples and one or more "known" samples) and the error rates are determined.
(4) Without appropriate estimates of accuracy, an examiner's statement that two samples are similar—or even indistinguishable—is scientifically meaningless: it has no probative value and considerable potential for prejudicial impact.
According to the previous points (2) and (3), the PCAST report makes a distinction between objective methods and subjective methods. In particular, it states that "by objective feature-comparison methods, we mean methods consisting of procedures that are each defined with enough standardized and quantifiable detail that they can be performed by either an automated system or human examiners exercising little or no judgment" [21]. On the other hand, "by subjective methods, we mean methods including key procedures that involve significant human judgment—for example, about which features to select within a pattern or how to determine whether the features are sufficiently similar to be called a probable match" [21]. Moreover, the PCAST report explains that, in order to meet the scientific criteria for validity as applied, two tests must be met [21]:
(1) The forensic examiner must have been shown to be capable of reliably applying the method and must actually have done so. Demonstrating that an expert is capable of reliably applying the method is crucial—especially for subjective methods,
3.5 Forensic Metrology: A Step Forward
37
in which human judgment plays a central role. From a scientific standpoint, the ability to apply a method reliably can be demonstrated only through empirical testing that measures how often the expert reaches the correct answer. Determining whether an examiner has actually reliably applied the method requires that the procedures actually used in the case, the results obtained, and the laboratory notes be made available for scientific review by others. (2) The practitioner’s assertions about the probative value of proposed identifications must be scientifically valid. The expert should report the overall false-positive rate and sensitivity for the method established in the studies of foundational validity and should demonstrate that the samples used in the foundational studies are relevant to the facts of the case. Where applicable, the expert should report the probative value of the observed match based on the specific features observed in the case. And the expert should not make claims or implications that go beyond the empirical evidence and the applications of valid statistical principles to that evidence. The above definitions and points received a general consensus by the forensic community also outside the USA, although their application has not been overall consistent. Indeed, from a technical perspective, those definitions suffer from a rather significant critical missing point. While everybody may agree that a forensic method must be “repeatable, reproducible, and accurate, at levels that have been measured and are appropriate to the intended application”, and that it must be reliably applied in practice, as specified by the points mentioned in the PCAST report, yet the report fails to provide recommendations on how repeatability, reproducibility, accuracy, and reliability in practical applications shall be evaluated. Without clear and universally agreed indications on how to quantitatively evaluate these elements, any assessment of the scientific validity of a forensic method remains largely subjective, despite the efforts done to search for objective assessments [22].
3.5 Forensic Metrology: A Step Forward
According to the point discussed in Chap. 2, science expresses itself through numbers. The validity of any scientific theory or method is assessed when the quantitative results provided by suitably designed experiments confirm the theoretical results forecast by the formulated theory under the same conditions as those of the test. This principle was clearly expressed by William Thomson, Lord Kelvin, in his famous lecture of May 1883 [23]:
When you can measure what you are speaking about, and express it in numbers, you know something about it; but when you cannot measure it, when you cannot express it in numbers, your knowledge is of a meagre and unsatisfactory kind: it may be the beginning of knowledge, but you have scarcely, in your thoughts, advanced to the stage of science, whatever the matter may be.
Forensic sciences are no exception to this principle, and neither is the assessment of their validity. According to Lord Kelvin's thought, this means that if we want to know something about how valid a forensic method is, its validity must be expressed in numbers. Otherwise, not only is our knowledge about the validity of an employed forensic method of an unsatisfactory kind, but the same method may also be inconsistently considered valid by one court and rejected by another. The problem to solve, therefore, is not only that of finding a definition for the validity of a forensic method, as the PCAST report did, but also that of expressing those concepts—repeatability, reproducibility, accuracy, and reliability—in numbers. Even more importantly, this shall be done in a universal way, valid for every possible forensic method in use. Those methods share a unifying point: they all collect samples at the crime scene or, in the case of litigation, from the object that originated the litigation, and try to extract as much information as possible from them. This process is, actually, a measurement process and, as such, it is ruled by the science of measurement: metrology, as seen in Sect. 2.5 of Chap. 2. As already anticipated in that section, an important role of metrology is that of assessing, in a quantitative way, the validity of the obtained results, both in principle—foundational validity—and as applied to the specific measurement application—validity as applied [24]. The validity of a measurement result is expressed by a parameter—measurement uncertainty—that is obtained, as will be seen in Part II, by combining the different contributions of uncertainty affecting the result: those originated by the employed method and its limits in describing the true value of the quantity to be measured—validity in principle—and those originated by imperfections in the employed instruments, variations in the environmental properties (temperature, humidity, pressure, presence of chemical substances, electromagnetic interferences, . . .) with respect to the reference ones, and the contribution of the operator—validity as applied. The definition of such a parameter and the methodology employed to evaluate it are universal, as stated in the already mentioned Guide to the expression of Uncertainty in Measurement (GUM)5 [11]: the method should be applicable to all kinds of measurements and to all types of input data used in measurements.
5 The universality of the concepts expressed by the GUM is reflected by the seven organizations that supported its development: BIPM—Bureau International des Poids et Mesures, IEC—International Electrotechnical Commission, IFCC—International Federation of Clinical Chemistry, ISO—International Organization for Standardization, IUPAC—International Union of Pure and Applied Chemistry, IUPAP—International Union of Pure and Applied Physics, OIML—International Organization of Legal Metrology.
Therefore, metrology provides a unique and universal method to quantify all the concepts mentioned by the PCAST report [21], such as repeatability, reproducibility, and accuracy, as well as, through the implementation of intralaboratory and interlaboratory tests or similar ones, the capability of experts to actually and reliably apply the employed method to the case under examination. Moreover, as will be seen in Part II, uncertainty evaluation relies on probability. This means that, if correctly evaluated, uncertainty provides the probability distribution of the values that can reasonably be attributed to the quantity to be measured [11] and, consequently, it is possible to evaluate the probability that a measured quantity is actually above or below a given threshold. Hence, as will be shown in Chap. 8, it is possible to quantify the doubt that a decision based on a given piece of scientific evidence is incorrect, because of the uncertainty associated with that piece of evidence. Some triers of fact have already referred to metrology in rendering their decisions, as reported in [9] and in Chaps. 9 and 10, though only occasionally and in a rather inconsistent way. The next parts of this book aim to prove that the forensic application of metrology may represent an extremely powerful tool in helping the trier of fact assess the validity of scientific evidence in a quantitative, scientifically sound way, thus contributing to preventing miscarriages of justice caused by bad science, by an incorrect application of forensic methods, or by an incorrect interpretation of their results.
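To give a concrete, if simplified, idea of how repeatability and reproducibility can be expressed in numbers, the following short Python sketch applies the balanced-design logic of interlaboratory studies (along the lines of ISO 5725) to invented data; the laboratory names and all values are purely illustrative assumptions, not taken from any real study.

```python
import statistics as st

# Hypothetical balanced interlaboratory study: three laboratories, each
# measuring the same item four times.  All values are invented.
labs = {
    "Lab A": [0.512, 0.508, 0.515, 0.510],
    "Lab B": [0.521, 0.518, 0.524, 0.520],
    "Lab C": [0.505, 0.509, 0.503, 0.507],
}

n = 4                                            # replicates per laboratory
lab_means = [st.mean(v) for v in labs.values()]

# Repeatability: pooled within-laboratory variance (same lab, same conditions).
s_r2 = st.mean(st.variance(v) for v in labs.values())

# Between-laboratory variance, estimated from the spread of the laboratory means.
s_L2 = max(0.0, st.variance(lab_means) - s_r2 / n)

# Reproducibility combines within- and between-laboratory effects.
s_R2 = s_L2 + s_r2

print(f"repeatability standard deviation  s_r = {s_r2 ** 0.5:.4f}")
print(f"reproducibility standard deviation s_R = {s_R2 ** 0.5:.4f}")
```

However simplified, a computation of this kind turns "repeatable" and "reproducible" from qualitative adjectives into numbers that can be compared with the requirements of the intended application.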
References
1. Fernandez, J.M.: An exploration of the meaning of truth in philosophy and law. Univ. Notre Dame Aust. Law Rev. 11, 53–83 (2009)
2. Popper, K.: The Logic of Scientific Discovery. Psychology Press, Oxford, UK (2002)
3. Schacter, D.L.: The Seven Sins of Memory: How the Mind Forgets and Remembers. Houghton Mifflin, New York, NY, USA (2001)
4. Dror, I.E.: Human expert performance in forensic decision making: seven different sources of bias. Aust. J. Forensic Sci. 49, 541–547 (2017)
5. Leeson, P.T.: Ordeals. J. Law Econ. 55, 691–714 (2012)
6. Tilstone, W.J., Savage, K.A., Clark, L.A.: Forensic Science: An Encyclopedia of History, Methods, and Techniques. ABC-CLIO, Santa Barbara, CA, USA (2006)
7. Locard, E.: L'enquête criminelle et les méthodes scientifiques. Ernest Flammarion éditeur, Paris, France (1920)
8. Locard, E.: Traité de criminalistique. Desvigne, Lyon, France (1931)
9. Imwinkelried, E.J.: Forensic metrology: the new honesty about the uncertainty of measurements in scientific analysis. UC Davis Legal Studies Research Paper No. 317 (2012)
10. Scotti, V.: Forensic metrology: where law meets measurements. In: Proceedings of the 20th IMEKO TC4 International Symposium, pp. 385–389. Benevento, Italy (2014)
11. BIPM JCGM 100:2008: Evaluation of measurement data—Guide to the expression of uncertainty in measurement (GUM), 1st edn. (2008). http://www.bipm.org/utils/common/documents/jcgm/JCGM_100_2008_E.pdf
12. Daubert v. Merrell Dow Pharmaceuticals, 509 U.S. 579 (1993)
13. United States v. Zajac, 748 F.Supp.2d 1327 (D. Utah 2010)
14. European Parliament, Directorate-General for Internal Policies: Civil-law expert reports in the EU: national rules and practices (2015). http://www.europarl.europa.eu/supporting-analyses
15. Bronstein, D.A.: Law for the Expert Witness, 4th edn. CRC Press, Boca Raton, FL, USA (2012)
16. Frye v. United States, 293 F. 1013 (D.C. Cir. 1923)
17. Lewis v. United States, 737 S.W.2d 857 (Tex. App. 1987)
18. National Academy of Sciences, National Research Council, Committee on Identifying the Needs of the Forensic Sciences Community: Strengthening Forensic Science in the United States: A Path Forward, Doc. n. 228091 (2009). https://tinyurl.com/zsrzutar
19. Corte di Cassazione, Prima sezione penale: Sentenza del 21 maggio 2008, n. 31456
20. Corte di Cassazione, Quinta sezione penale: Sentenza del 27 marzo 2015, n. 36080/2015, pp. 1–52 (2015)
21. President's Council of Advisors on Science and Technology (PCAST): Report to the President—Forensic Science in Criminal Courts: Ensuring Scientific Validity of Feature-Comparison Methods (2016). https://tinyurl.com/abv5mufh
22. van der Bles, A.M., van der Linden, S., Freeman, A.L.J., Mitchell, J., Galvao, A.B., Zaval, L., Spiegelhalter, D.J.: Communicating uncertainty about facts, numbers and science. R. Soc. Open Sci. 6, 1–42 (2019). http://dx.doi.org/10.1098/rsos.181870
23. Thomson, W.: Electrical units of measurement. In: Popular Lectures and Addresses, vol. 1, pp. 80–143. MacMillan & Co., London, UK (1889)
24. Ferrero, A., Petri, D.: Measurement models and uncertainty. In: Ferrero, A., Petri, D., Carbone, P., Catelani, M. (eds.) Modern Measurements: Fundamentals and Applications, 1st edn., p. 400. Wiley-IEEE Press (2015)
Part II
Metrology and its Pillars
Chapter 4
The Measurement Model
4.1 Why a Model?
We have already seen, in Chap. 2, that models play an important role in the advancement of knowledge. Nowadays, every scientific theory and every technical activity is based on models that describe them in logical and mathematical terms. The measurement activity is no exception, and it too can be represented by a suitable model describing the different processes, both descriptive and experimental, that are involved in this activity. Understanding and evaluating how limited the information provided by a measurement result is, and how well it represents the measurand, requires understanding these processes and how they interact with each other, so that the reasons for the intrinsic limitations of measurement results become apparent. A good model is the initial step in understanding whether or not a measurement process is adequate for the intended application. As far as forensic metrology is concerned, the intended application of a measurement result is, generally, that of taking a decision: most cases require ascertaining whether a quantity, be it the amount of some substance or the number of features, is above or below a given threshold. Therefore, a good model is expected to help us understand whether a measurement process is capable, in principle, of providing the desired measurand value in an accurate enough way to allow one to quantify the risk that decisions taken on the basis of measurement results are wrong. In other words, it is the initial step of the long logical process aimed at providing the trier of fact with enough quantitative information to decide whether this risk is reasonable or not. In this chapter, after having introduced some basic terms of metrology, to avoid any confusion and misunderstanding about the employed terminology, a possible model for the measurement process is described and discussed to identify the origin of the limited information provided by a measurement result and, hence, the possible sources of measurement uncertainty.
4.2 Measurement and Metrology
4.2.1 Terminology
Measurement science, like every other branch of science, has its own terminology that must be known in order to avoid misunderstandings and to refer to the different processes correctly. The terms used in measurement science are defined by the International Vocabulary of Metrology (VIM) [1], a document issued by the Joint Committee for Guides in Metrology (JCGM), the international organization in charge of preparing and maintaining the official guides in metrology. Since this is an official document and represents a reference for metrology experts, it is worth considering here the most important terms it defines.
4.2.1.1 Metrology
Let us start with the term that appears in the title of this book: Metrology. The VIM defines it in its article 2.2 as: science of measurement and its applications.
It is also important to consider the note to this definition: Metrology includes all theoretical and practical aspects of measurement, whatever the measurement uncertainty and field of application.
This means that forensic metrology, that is, the application of measurement to forensic activities, can rightfully be considered part of metrology and, consequently, all theoretical aspects of metrology should be considered when applying it.
4.2.1.2 Measurement
The definition of metrology seen above refers to measurement, which is also defined by the VIM, in its article 2.1, as: process of experimentally obtaining one or more quantity values that can reasonably be attributed to a quantity.
The keywords here are experimentally, reasonably, quantity, and quantity value. The latter two are covered by specific VIM definitions, so let us concentrate on the first two, starting with experimentally. As we have already seen in Chap. 2, the scientific method advances knowledge also by means of experiments. This definition confirms that measurement belongs to the experimental sciences and, since it aims at assigning quantity values to the considered quantities, it is the most important knowledge tool of experimental science.
However, Sect. 2.4 in Chap. 2 has clearly proved that science cannot achieve complete knowledge of a given phenomenon, and this is reflected by the term reasonably in the above definition. While this term might appear non-scientific, we will see in the next chapters and sections of this book that it warns those who employ measurement results to make decisions that the value a measurement result attributes to a quantity lacks full certainty. Therefore, as will be proved in the following, every value attributed to a quantity by means of a measurement process must be associated with some additional information capable of quantifying how reasonable that value is. Although measurement is an experimental activity, the VIM does not disregard the importance of models, as it states in a note to the above definition: Measurement presupposes a description of the quantity commensurate with the intended use of a measurement result, a measurement procedure, and a calibrated measuring system operating according to the specified measurement procedure, including the measurement conditions.
This description can be provided by a suitable model, and all points in the above note will be taken into account in the following parts of this chapter to define such a model.
4.2.1.3 Quantity
The VIM defines, in its article 1.1, a quantity as: property of a phenomenon, body, or substance, where the property has a magnitude that can be expressed as a number and a reference.
An important note to this article explains what a reference is: A reference can be a measurement unit, a measurement procedure, a reference material, or a combination of such.
A strict interpretation of this definition, as also encouraged by another note to this article,1 may consider only physical, chemical, and biological quantities, which are surely the most important ones also in forensic activities. However, this definition has a broader meaning and can be applied to any kind of quantity, including non-physical ones, such as satisfaction, quality, perceived quality, ..., whose reliable measurement has nowadays become a need in many fields and has given rise to a new branch of metrology, called soft metrology [2–4]. Therefore, this definition will be considered in its broader meaning in the following.
1 This note states: The concept "quantity" may be generically divided into, e.g., "physical quantity", "chemical quantity", and "biological quantity", or base quantity and derived quantity.
4.2.1.4 Quantity Value
The VIM defines, in its article 1.19, a quantity value as: number and reference together expressing magnitude of a quantity.
According to this definition and the employed reference, a quantity value can be the product of a number and a measurement unit, as when the length of a given rod is specified as 5.34 m; a number and a reference to a measurement procedure, as when the Rockwell C hardness of a given sample (150 kg load) is specified as 43.5 HRC (150 kg); or a number and a reference material, as when the arbitrary amount-of-substance concentration of lutropin in a given sample of plasma (WHO international standard 80/552) is specified as 5.0 International Unit/l. According to the definitions mentioned above, since measurement is the process of experimentally obtaining one or more quantity values that can reasonably be attributed to a quantity, and since a quantity value is a number and reference together expressing magnitude of a quantity, we may expect that a quantity value also represents a measurement result. Is this correct?
4.2.1.5 Measurement Result
The definition of measurement result is given by the VIM in article 2.9, and states: set of quantity values being attributed to a measurand together with any other available relevant information.
The keywords, here, are set of quantity values and other available relevant information. Indeed, stating that a measurement result is represented by a set of quantity values, and not by a single quantity value, recognizes that a single measured value cannot fully represent the value of the measurand, that is, the quantity intended to be measured.2 Therefore, a set of quantity values is needed to represent the values that can reasonably be attributed to a quantity according to the definition of measurement given in the previous Sect. 4.2.1.2. On the other hand, a set of quantity values by itself does not provide useful information about the measurand. A number of questions may be asked and need a clear and reliable answer. For instance, is there a value, inside the given set, which is more likely to represent the measurand? Or, are there values that are more representative of the measurand than others? Can the given set of quantity values be considered as a coverage interval? If so, which coverage probability can be associated with such an interval? It is immediately apparent that additional relevant information is needed to obtain a measurement result from a set of quantity values. This additional information should be made available by the measurement process followed to obtain the set of quantity values and, in metrology, it is referred to as measurement uncertainty. Since it has to be sought inside the measurement process, a good model is once again needed to understand where this additional information originates and how relevant it is.
2 VIM, article 2.3.
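To make the idea of a set of quantity values, rather than a single value, more tangible, here is a minimal Python sketch that anticipates the treatment given in Part II: it builds a coverage interval from a handful of repeated indications, assuming the standard uncertainty of the mean and a coverage factor k = 2 (roughly 95 % for a normal distribution). The readings are invented for illustration only.

```python
import math
import statistics as st

# Repeated indications of the same measurand (invented values).
readings = [12.48, 12.52, 12.47, 12.50, 12.53, 12.49]

mean = st.mean(readings)
u = st.stdev(readings) / math.sqrt(len(readings))   # standard uncertainty of the mean
k = 2                                               # coverage factor, roughly 95 %

# The measurement result is not the single number `mean`, but a set of values
# that can reasonably be attributed to the measurand, summarized here as a
# coverage interval with its coverage probability.
print(f"coverage interval: [{mean - k * u:.3f}, {mean + k * u:.3f}]  (k = {k})")
```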
4.3 The Model
The definitions given in the previous section are quite useful in deriving a model capable of describing the measurement activity. We have seen that a measurement result is a set of quantity values assigned to a measurand, together with any other relevant available information. A quantity value is a number and a reference assigned to the magnitude of a quantity, which, in turn, is a property of a phenomenon, body, or substance. A property (for instance, weight, length, ...) manifests itself as a fact (for instance, the weight of a case, or the length of a table) that belongs to the empirical world and can be observed there. On the other hand, a number and a reference are abstractions that belong to the world of symbols. A first model for the measurement activity can hence be derived as a logical mapping of a single manifestation q of a given property onto a suitable symbol x [5–7], as graphically shown in Fig. 4.1. Therefore, the task of this model is that of describing the logical and physical processes that map a property manifestation onto a symbol. The definition of measurement given in Sect. 4.2.1.2 may lead one to assume that these processes are mainly experimental processes, since measurement is defined as the process of experimentally obtaining one or more quantity values. On the other hand, this same definition, as well as that of measurement result given in Sect. 4.2.1.5, refers to one or more quantity values, and to a set of quantity values. This takes into account that, in practice, the actual measurement process differs from the ideal one and, due to its finite sensitivity and resolution, is likely to map different, but indistinguishable, manifestations of the same property onto the same symbol [7], as graphically shown in Fig. 4.2. As a matter of fact, the mapping process cannot be a one-to-one mapping.
Fig. 4.1 Initial model for the measurement activity. q: single property manifestation. x: assigned symbol
Fig. 4.2 Representation of the mapping process, when different properties qi are mapped on the same symbol x
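The many-to-one nature of the mapping sketched in Fig. 4.2 can be illustrated with a few lines of Python; the resolution value and the property manifestations below are arbitrary, chosen only to show that manifestations differing by less than the instrument resolution end up on the same symbol.

```python
# Three distinct manifestations of the same property (lengths in metres) that
# differ from each other by less than the instrument resolution of 0.01 m.
manifestations = [1.2331, 1.2334, 1.2338]
resolution = 0.01            # smallest increment the instrument can display

def displayed_symbol(q, resolution):
    """Map a property manifestation q onto the symbol actually displayed."""
    return round(q / resolution) * resolution

for q in manifestations:
    print(f"manifestation {q:.4f} m  ->  displayed symbol {displayed_symbol(q, resolution):.2f} m")

# All three manifestations are displayed as 1.23 m: the mapping is not
# one-to-one, so the displayed symbol alone cannot tell us which of the
# indistinguishable manifestations produced it.
```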
The direct consequence of this conclusion is that a single numerical value and a reference cannot represent the value of a measurand, since it is impossible to know which one, among the different properties qi, is represented by the quantity value provided by the single symbol x. For this reason, a set of quantity values is needed to represent all the possible, different and indistinguishable property manifestations that are mapped onto x, but whose value is actually different from x. This confirms what was already anticipated in Sect. 2.5 of Chap. 2: a measurement result can provide only a finite amount of information about the measurand and cannot provide the true value of the measurand itself with full certainty. This is a rather scary conclusion, from a philosophical point of view and also from the more practical point of view of the validity of a measurement result employed in forensic activities, since it may lead to the conclusion that every measurement result is wrong and, as such, useless. The main question that arises is whether such a finite amount of information, and hence incomplete information on the measurand, still represents knowledge—at least knowledge adequate for the intended use—and can be used as an input element in a decision-making process. From a technical point of view, the answer is positive if the measurement result is capable of providing quantitative information also on the lack of completeness of the information it conveys about the measurand, because this additional quantitative information can be used to quantify the risk that the decision-making process returns the wrong decision [4, 8]. In metrology, this is provided by the other available relevant information mentioned by the VIM definition of measurement result reported in Sect. 4.2.1.5 and is generally expressed in terms of measurement uncertainty. The direct consequence, on the model, of this need for additional relevant information is that the measurement process cannot imply only mere experimental processes that describe the way the measuring equipment has to be connected to the measurand and operated. A number of additional descriptive processes must be considered as well, as shown in Fig. 4.3, that describe the way the mapping process is achieved and, consequently, the way the relevant information is identified and retrieved. The analysis of these processes allows us to understand also where and why the most important contributions to uncertainty originate, often outside the measuring equipment. A complete analysis can be found in textbooks aimed at covering the measurement process from a strictly technical point of view [5, 7]. Here, we will limit ourselves to considering the most important ones with respect to the specific forensic applications.
Fig. 4.3 Model of the measurement activity, showing the involved processes. q: single property manifestation. x: assigned symbol
In this respect, it is worth considering that models, too, are incomplete. George Box used to say that "all models are wrong, some are useful" [9], and we can interpret this somewhat paradoxical statement as an invitation to pay particular attention to identifying the limits of validity of the models we use. Having to deal with incomplete models means that their definition requires choices about what must be included and what can be neglected, that several different models can be defined for the same reality, and that their derivation is unavoidably affected by implicit or explicit goals, a priori knowledge, experience, and opinions. This implicit arbitrariness might appear quite harmful in forensic applications, since the border between a priori knowledge, experience, opinions, and cognitive bias is, or may appear to be, fuzzy. To avoid this, the main prerequisites for a good, trustworthy model are transparency and continuous updating. A transparent model is a model based on clear and explicit goals and assumptions, to avoid—or minimize—possible misuses or misinterpretations. A continuously updated model looks for and uses feedback from reality to continuously seek validation and to improve and adapt when the context changes. In metrology, good models also apply the fundamentals of metrology to quantitatively estimate their lack of completeness, that is, the amount of approximation with which they describe reality. On the other hand, bad models are opaque, unquestioned, and unaccountable. They appear to be fair, but they generally support distorted conclusions, as happens whenever a cognitive bias tends to hide or misinterpret pieces of evidence. Keeping this in mind, let us expand the model in Fig. 4.3 to make the descriptive processes explicit. To do so, it is worth noting that, regardless of the specific field of application, the measuring activity is not a self-motivating activity. Rather, it is a goal-oriented activity, in which measurement results are employed as relevant input elements in some decision-making process aimed at identifying the best actions needed to achieve established goals while satisfying given conditions. It can be readily perceived that, in forensic applications, according to whether criminal or civil law is considered, the decision-making process is the process aimed at identifying the culprit of a crime or the subject liable for a tort, the actions aim at identifying a fair punishment or a fair compensation, and the goal is to administer justice according to the law (the given conditions). According to these considerations, the descriptive processes in the model shown in Fig. 4.3 can be represented in a more explicit way, as shown in Fig. 4.4. Each block in Fig. 4.4 represents an important step in the measurement procedure, and all these steps must be performed in order to obtain a measurement result capable of providing as complete information as possible on the measurand. Let us then analyze these steps.
Fig. 4.4 Model of the measurement activity, showing the involved descriptive processes explicitly
4.3.1 Identification
Generally, a measurement result depends not only on the measurand but also on a number of other properties of the measurement context in which that specific measurement activity is performed. For instance, to mention only the most significant ones, the measurement context involves the employed references, a number of empirical properties related to the object under measurement and the experimental setting (time, temperature, electric and magnetic fields, operator skills, instrument properties, ...), and the operations performed during the measurement process. Hence, this identification step implies the identification of the whole measurement context, and not only that of the measurand, that is, the identification of all relevant properties that may affect the measurement result as well as the possible interactions among them. Three major critical points are relevant in this step.
• Measurand identification
We have seen in the previous Sect. 4.2.1.5 that a measurement result is a set of quantity values we assign to a measurand that, in turn, is a property of a phenomenon, body, or substance. Therefore, the first step to accomplish is that of identifying which property of the phenomenon, body, or substance we are analyzing is the best suited to achieve the established goals under the given conditions.3
3 In the most general case, more than one property must be identified, since more than one property is needed to retrieve complete enough information on the phenomenon, body or substance we wish to characterize. However, for the sake of simplicity, we are considering, in the following, the identification of only one property, since this step can be readily extended to the case where multiple properties must be identified.
This identification step, hence, is aimed at identifying which property conveys the most relevant information needed to accomplish the established goals. For instance, when a subject is suspected of driving under the influence (DUI), the property that conveys the most relevant information about the potential intoxication is the alcohol concentration in blood (BAC) [10].
• Measurement system identification
Having identified the measurand, the measurement system must be identified as the one that appears to best achieve the assigned goal under the given conditions. Given all conditions, a system that achieves the direct measurement of the measurand might not be the most suitable one. For instance, when BAC measurements are involved, the most accurate and direct way to measure it is through the analysis of a blood sample. However, this is an invasive method and is not the most suitable one when roadside analysis is one of the given conditions. In such a case, an indirect measurement, obtained through the measurement of alcohol concentration in breath (BrAC—Breath Alcohol Concentration), might become the optimal choice [10].
• Environment identification
The measurand, the measurement system and the environment are not separate entities. Both the measurand and the measurement system exist and operate in a given environment that manifests itself through environmental properties, such as temperature, pressure, humidity, gravity, and time [5, 7]. When a measurement is performed, measurand, measurement system and environment interact. For instance, when temperature is measured by placing a thermometer, or more generally a temperature probe, in close contact with a body, the probe changes the heat transfer of the body itself and, consequently, its temperature, in a way that depends on the respective thermal capacities. Similarly, electronic instruments dissipate energy in the environment, thus contributing to its temperature variations. The main consequence of these interactions is that variations in one or more environmental properties, including time, may have an effect on the measurement result. The effect of time cannot be directly controlled and, as we will see in the next chapters, can be monitored by performing periodic calibrations of the measurement system. On the other hand, the environmental properties (such as temperature, pressure, humidity, ...) can be kept under control and constant within a given tolerance, or their effects on the final measurement result can be identified and quantified. To do this, it is necessary to identify which environmental properties, called influence properties, or influence quantities, affect the measurement result in a significant way,4 so that suitable actions can be adopted to either control them during the measurement process or correct the obtained measurement result, as sketched in the example below. Once the influence properties have been identified, it is also necessary to identify all significant mutual interactions between measurand, measurement system and environment, so that they can be properly described to achieve an accurate representation of the whole measurement process.
4 The meaning of significant will be understood once the concept of measurement uncertainty is explained in Chap. 5. Here, it is sufficient to say that the measurement result is affected in a significant way if the uncertainty contribution associated with the considered influence property is non-negligible with respect to all other uncertainty contributions.
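As an illustration of how the effect of an influence quantity can be quantified and corrected, the following Python sketch corrects a length reading to a reference temperature of 20 °C using a simple linear thermal-expansion model; the expansion coefficient and the reading are assumed values for a generic steel part, not data from any actual case.

```python
# Correcting a length reading for one influence quantity (temperature), using
# a simple linear thermal-expansion model.  The coefficient below is an
# assumed, illustrative value for a generic steel part.
ALPHA = 11.5e-6      # linear thermal-expansion coefficient, 1/°C (assumed)
T_REF = 20.0         # reference temperature at which lengths are specified, °C

def length_at_reference(length_measured, t_measured):
    """Length the part would exhibit at the reference temperature."""
    return length_measured / (1.0 + ALPHA * (t_measured - T_REF))

reading_mm = 80.012          # reading taken at 31 °C, in millimetres (invented)
corrected = length_at_reference(reading_mm, 31.0)
print(f"length corrected to {T_REF} °C: {corrected:.3f} mm")

# Skipping this correction would leave an error of roughly 0.01 mm here, which
# may or may not be negligible compared with the other uncertainty contributions.
```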
4.3.2 Modeling
The identification of measurand, measurement system and influence quantities represents only the first step toward an accurate description of the measurement context. As already mentioned, they interact with each other and, therefore, the measurement context cannot be completely described if such interactions are not properly described by means of a mathematical model. In such a model, the identified properties are represented by mathematical variables, and the mutual interactions are represented by mathematical relationships. In the considered mathematical model, variables and relationships can assume different mathematical representations. Generally, they are assumed to be deterministic, since a deterministic model is assumed to convey the most accurate information about the context. However, this is not always true, and depends, mainly, on the amount of available information about the context itself. When the amount of available information is not enough to build an accurate deterministic model, different mathematical representations can be usefully employed, such as probabilistic, fuzzy, neural, and neuro-fuzzy models. The forensic domain is, in general, quite reluctant to accept non-deterministic models, since it is generally thought that only deterministic models can convey certain enough information. However, this is not really true, since, in the presence of only partial information, non-deterministic models might be more accurate than approximated deterministic ones. Modern quantum physics, totally based on probability, is a clear example of the effectiveness of non-deterministic mathematical models. Regardless of the adopted mathematical representation, the modeling step is probably the most critical element in the whole measurement process. The symbolic description that it provides becomes the basis of all subsequent steps and, therefore, it deeply affects the quality5 of the measurement result. The main point is that any model is capable of providing only incomplete knowledge of the measurement context. The reason lies in the incomplete knowledge of the measurand, in not having considered quantities and interactions that, individually taken, have a negligible effect, and in the unavoidable approximation of the mathematical representation of variables and relationships. The main consequence is that, no matter how good the employed instruments are, they cannot provide more complete knowledge than that provided by the model. Therefore, the model represents a sort of upper limit to the quality of the obtained measurement result: the knowledge provided by the measurement result about the measurand can never be more complete than that provided by the model. To fully understand this point, we can once again refer to the measurement of alcohol concentration in blood. We have seen that it is often convenient to quantify this concentration through the measurement of the alcohol concentration in breath. This is possible because a physical model exists that relates the alcohol concentration in an individual's breath to his or her alcohol concentration in blood. This relationship is fairly linear in single individuals, but the parameters of this linear relationship vary from individual to individual [10]. Therefore, if an average value, evaluated on a statistically significant population, is taken for these parameters, an approximated model is defined, and the amount of approximation may affect the measured BAC more significantly than the accuracy of the employed instrument, as will be shown in Chap. 9. This is a critical, often neglected point of forensic metrology. We will see, in Chap. 5, how this can and shall be considered in evaluating measurement uncertainty.
5 We are intentionally using, at this point of the book, the general-purpose term quality, and not other, more specific and technically correct terms, since it gives an immediate perception that the measurement result, not being ideal, or perfect, can be of good or bad quality. We will see in the next chapters how this quality can be quantitatively expressed and evaluated.
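The approximation just described, the use of a population-average conversion between breath and blood alcohol concentration, can be sketched with a few lines of Python. The 2100:1 average blood/breath ratio and the 1800:1 to 2400:1 individual spread used below are commonly quoted orders of magnitude [10], but they are treated here only as illustrative assumptions, not as reference values.

```python
# Converting a breath alcohol concentration (BrAC) into a blood alcohol
# concentration (BAC) with a single population-average blood/breath ratio.
# The 2100:1 average and the 1800:1-2400:1 individual spread are illustrative
# assumptions, not reference values.
brac = 0.40e-3            # measured BrAC, grams of alcohol per litre of breath
AVERAGE_RATIO = 2100      # population-average blood/breath partition ratio

bac_reported = brac * AVERAGE_RATIO      # BAC the device would report, in g/L

# If this particular individual's true ratio differs from the population
# average, the model itself introduces an error, whatever the instrument does.
for individual_ratio in (1800, 2100, 2400):
    bac_true = brac * individual_ratio
    model_error = bac_reported - bac_true
    print(f"ratio {individual_ratio}: true BAC {bac_true:.2f} g/L, "
          f"model error {model_error:+.2f} g/L")

# The resulting model error (up to about 0.12 g/L in this sketch) illustrates
# how the approximation of the model can outweigh the accuracy of the
# instrument, as noted in the text above.
```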
4.3.3 Experimental Processes
The experimental processes referred to by the corresponding block in Fig. 4.4, as well as by that of Fig. 4.3, are those directly related to the use of the measuring equipment. While the descriptive processes that have been considered up to this point are quite general and refer to any measurement process, the experimental ones are specific to the given application and depend on the given goals and conditions. It is quite obvious that the measurement of alcohol concentration in breath requires totally different measuring instruments than the measurement of a crack in a metallic rod, and that goals and conditions are probably quite different for the two measurements. In any case, a careful and accurate description of this process is of utmost importance, since the contribution given by the employed instruments to the quality of the provided measurement result highly depends on the operative conditions under which the instruments have been employed and on their present state (previous usage, age, time from the last calibration, ...). A clear specification of the experimental process is, therefore, needed to instruct the operator on how to perform the required measurement in order to satisfy the given goals under the given conditions. For instance, when roadside instruments (radars or lidars) are used to measure vehicle speed for traffic enforcement purposes—the so-called speed traps—the experimental process is more complex than generally thought. The instrument must be placed at a given distance from the roadside, with a given angle with respect
to the road axis, and a given angle with respect to the road surface; moreover, the local temperature must be within the operative range of the given instrument. Not respecting any of these conditions might affect the accuracy of the measured speed significantly and, if a fine is issued for speeding, it can be challenged on the grounds that the prescribed experimental process was not followed.
4.3.4 Decision
It has already been mentioned that measurement results are used as input quantities in a decision-making process. In forensic sciences, this decision generally implies the comparison of the measured quantity with a given threshold, and the decision is taken on whether the threshold is exceeded or not. Several examples can be given. The most immediate ones are related to the already given examples of BAC and speed measurements. The measured BAC value is compared with the thresholds imposed by the law and, if it exceeds the threshold, a sanction is issued. The same happens with the measured speed value. Less immediate examples are related to fingerprint or DNA identification, where a number of features (minutiae in fingerprint identification and alleles in DNA profiling) must correspond to a given pattern. This is usually done by defining and measuring a distance between the features in the detected samples and the comparison pattern: if the distance is below a given threshold, the sample is considered the same as the comparison pattern. The decision-making process is, apparently, very simple: everybody is capable of comparing two numerical values, the measured one and the threshold. The reality is much more complex. We have already seen, in Sect. 4.2.1.5, that a measurement result cannot be given as a single quantity value, but, instead, as a set of quantity values, together with any other available information. Chapter 5 will prove that this set of quantity values will be given as a coverage interval with a given coverage probability.6 This means that the decision-making process does not simply involve the comparison of two simple numerical values, but the comparison of a numerical value—the threshold—with a coverage interval—the measurement result. The main consequence of this point is, according to decision theory [8], that a risk of a wrong decision always exists. If the measurement result is correctly expressed in terms of a correct coverage interval with a correct coverage probability, the risk of a wrong decision can be computed as the probability that a measurement result considered to be above a threshold is, actually, below the threshold, and vice versa.7
6 The mathematical definition of such terms will be given in Chap. 5. Here, let us intuitively define such intervals as intervals, about the measured value, in which the unknown true value of the measurand is supposed to lie with a given probability, called coverage probability.
7 This point will be covered in more mathematical detail in Chap. 8.
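A minimal Python sketch, assuming that the values that can reasonably be attributed to the measurand follow a normal distribution centred on the measured value, shows how the risk of a wrong decision discussed above can be computed; the measured value, the threshold, and the standard uncertainty are invented for illustration.

```python
import math

def prob_below_threshold(measured, threshold, std_uncertainty):
    """Probability that the measurand is actually below the threshold, assuming
    a normal distribution of the values that can reasonably be attributed to it,
    centred on the measured value with the given standard uncertainty."""
    z = (measured - threshold) / (std_uncertainty * math.sqrt(2.0))
    return 0.5 * math.erfc(z)

# Invented example: a concentration measured slightly above a legal limit.
measured = 0.85       # measured value, e.g. in g/L
threshold = 0.80      # legal limit, same unit
u = 0.04              # standard uncertainty of the measurement

risk = prob_below_threshold(measured, threshold, u)
print(f"measured {measured} with u = {u}: probability of actually being "
      f"below {threshold} is {risk:.1%}")

# Here the risk of wrongly deciding "above the limit" is roughly 10 %; whether
# such a doubt is reasonable is for the trier of fact to decide.
```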
We will see that this is probably the most important result provided by forensic metrology: the risk of a wrong decision is quantified and submitted to the trier of fact, who can then weigh it and decide whether or not it is negligible and, hence, whether a verdict can be rendered beyond any reasonable doubt. Looking at Fig. 4.4, it can be noted that the output of the decision block follows two paths. The main one is the one discussed above, and results in a suitable action, so that the given goals can be achieved: in forensic metrology, this is generally the report to the trier of fact, who can base his or her decision also on factual data, being aware of the risk of a wrong decision. The second path represents a sort of feedback path going backward to the modeling and identification blocks, and is aimed at verifying the correctness of the adopted model and, hence, validating the whole measurement process. It can be intuitively perceived that, if the measurement context has been correctly modeled, and no significant environmental properties or interactions have been neglected, the quality of the measurement result will meet the initial expectation.8 Should this not be the case, the model has to be refined until the quality of the expected result is met, thus proving that all goals have been achieved without violating any of the given conditions.
8 Once again, these general-purpose terms will be abandoned in Chap. 5, where measurement uncertainty will be defined and explained, and the initial expectation will be expressed and quantified as the target uncertainty. Therefore, checking whether the initial expectation has been met will quite simply mean checking whether the obtained measurement uncertainty has not exceeded the target uncertainty.
References
1. BIPM JCGM 200:2012: International vocabulary of metrology—basic and general concepts and associated terms (VIM), 3rd edn. (2012). http://www.bipm.org/utils/common/documents/jcgm/JCGM_200_2012.pdf
2. Mari, L., Carbone, P., Petri, D.: Measurement fundamentals: a pragmatic view. IEEE Trans. Instrum. Meas. 61, 2107–2115 (2012)
3. Mari, L., Carbone, P., Petri, D.: Fundamentals of hard and soft measurement. In: Ferrero, A., Petri, D., Carbone, P., Catelani, M. (eds.) Modern Measurements: Fundamentals and Applications, vol. 7, p. 400. Wiley-IEEE Press (2015)
4. Mari, L., Petri, D.: The metrological culture in the context of big data: managing data-driven decision confidence. IEEE Instrum. Meas. Mag. 20, 4–20 (2017)
5. Sydenham, P.H., Hancock, N.H., Thorn, R.: Introduction to Measurement Science and Engineering. Wiley, Hoboken, NJ (1992)
6. Luce, R.D., Krantz, D.H., Suppes, P., Tversky, A.: Foundations of Measurement: Representation, Axiomatization, and Invariance, vol. III. Dover Publications, Mineola, NY (2006)
7. Ferrero, A., Petri, D.: Measurement models and uncertainty. In: Ferrero, A., Petri, D., Carbone, P., Catelani, M. (eds.) Modern Measurements: Fundamentals and Applications, vol. 1, p. 400. Wiley-IEEE Press (2015)
8. Peterson, M.: An Introduction to Decision Theory. Cambridge University Press, Cambridge, UK (2009)
9. Box, G.E.P.: Robustness in the strategy of scientific model building. In: Launer, R.L., Wilkinson, G.N. (eds.) Robustness in Statistics, vol. 12, p. 312. Academic Press (1979)
10. Jones, A.W.: Medicolegal alcohol determinations—blood- or breath-alcohol concentration? Forensic Sci. Rev. 12, 23–47 (2000)
Chapter 5
Measurement Uncertainty
5.1 The Background
The model of the measurement activity depicted in Chap. 4 highlighted some important concepts.
• Measuring does not involve only experimental processes, but also a number of logical steps—the descriptive processes—that must be carefully considered in order to achieve the desired result.
• Goals and conditions must be defined too, so that the most suitable processes can be defined and implemented.
• Every identified process implies imperfections, which affect the quality of the final measurement result: it will never, other than by chance, provide the exact true value of the measurand.
Chapter 4 proved that a measurement result will always provide incomplete information about the measurand. This last point appears to be critical in general, from a philosophical and ontological perspective, and, in particular, when forensic applications are involved. Indeed, if the measurement activity aims at being a bridge between the empirical world and the world of abstract concepts, and in particular a bridge that carries quantitative knowledge to abstract concepts, admitting that this quantitative knowledge is incomplete and, hence, only an approximation undermines the very concept of knowledge. The main question that is often asked is: how can incomplete knowledge be knowledge? When forensic activities are involved, this question becomes: if justice asks science and scientific methods for certainty, how can approximate knowledge about the investigated phenomenon help this search for certainty? On the other hand, we have seen, in Chap. 2, that science has its own limitations, so it can never answer a question with full certainty. However, science has also studied its limitations since the beginning of modern science, and has tried to quantify them. In particular, one of the tasks of metrology is that of identifying the sources of potential errors and quantifying them.
Actually, incomplete information about a measurand carries a lot of useful information if an estimate of how incomplete it is can also be provided. It can be stated that, in this case, it provides more useful information than the same result without this estimate, for the simple reason that, in the first case, if a decision is based on that information, it is possible to quantify the risk of a wrong decision, while this is not possible in the second case. In the past, this estimate of incompleteness was given in terms of measurement errors. A measurement error is defined as the difference ε between the measured value xm of a measurand x and its true value xt, and it is given by the following equation:
ε = xm − xt     (5.1)
However, such an approach is flawed by the fact that, according to the model discussed in Chap. 4, the true value is unknown and unknowable; therefore, (5.1) is meaningless in defining the error, since one term in the difference cannot be known. To solve this philosophical paradox, the incompleteness of the information provided by a measurement result has more recently been quantified in terms of measurement uncertainty. As usual, let us refer to the definition given by the VIM [1] in its art. 2.26: non-negative parameter characterizing the dispersion of the quantity values being attributed to a measurand, based on the information used
This definition, although absolutely correct, as we will see, is not really useful in understanding what this non-negative parameter is and how it can be obtained from the information used. To understand that, we can usefully refer to another extremely important official document issued by the BIPM: the Guide to the expression of Uncertainty in Measurement (GUM) [2], which explains very clearly what uncertainty is and provides instructions on how to evaluate it. The GUM states, in point 0.2 of its introduction [2]: It is now widely recognized that, when all of the known or suspected components of error have been evaluated and the appropriate corrections have been applied, there still remains an uncertainty about the correctness of the stated result, that is, a doubt about how well the result of the measurement represents the value of the quantity being measured.
We will come back to this sentence in the following, because it refers to a number of key concepts that are extremely useful in understanding the meaning of measurement uncertainty. Let us now focus on the first important concept expressed by this sentence, particularly important in forensic applications: uncertainty in measurement means doubt! Doubt is an important concept also in jurisprudence. It is known that, in both the Common and Civil Law systems, the trier of fact is requested to ponder any doubt he or she may have so that a verdict "beyond any reasonable doubt" can be rendered.
It is also known that, if a reasonable doubt is still present, the well-known “in dubio pro reo”1 principle of jurisprudence must be applied and the defendant exonerated. The doubt about the correctness of a measurement result is not dissimilar to this doubt and must therefore be carefully analyzed and, possibly, quantified, especially in forensic applications, when measurement results might be used to support the decision rendered by the trier of fact.
5.2 The Origin of the Doubt
The most natural starting point for this analysis is an investigation of where this doubt originates. While discussing the model defined in Fig. 4.4 of Chap. 4, each single block was considered responsible for introducing approximations into the measurement process. The blocks can, hence, be considered as the origin of the doubt or, more properly, of the different elements that contribute to the doubt. Let us see the most relevant ones from the uncertainty point of view.
1 This is a famous Latin phrase meaning that, if a doubt (dubio) exists, the verdict rendered must be favorable (pro) to the alleged culprit (reo).
5.2.1 Definitional Uncertainty
We have seen, in Sects. 4.3.1 and 4.3.2 of Chap. 4, that the identification and modeling blocks in Fig. 4.4 describe the measurand, the relevant environmental properties, the measurement system, and their mutual interactions. For the several reasons highlighted in those sections, the resulting description of the measurand and environment is imperfect [3]. Let us consider a simple example. Let us suppose that a mechanical part—for instance, a rolling bearing—is under investigation for having caused an accident due to out-of-tolerance dimensions. Let us also suppose that the incriminated dimension is the internal diameter: under the accident circumstances it was greater than specified, causing the shaft to slip, and the consequent overheating caused by friction ignited a fire. The apparently simple goal is, therefore, that of measuring the inner diameter of the rolling bearing. There is no doubt that the geometrical model of the bearing's internal ring is a hollow cylinder. If this is assumed to be the model of the measurand, one single measurement of the diameter at a randomly chosen point would provide the desired answer. However, do we have enough information on the measurand to exclude some eccentricity? If not, different measurements along different diameters would provide different values, and if we take only a single measurement we may introduce an error, due to our lack of detail about the measurand. A second problem might be caused by the interaction between the measurand and the environment. It is well known that the dimensions of a metallic part change as temperature changes. Therefore, if the diameter is measured at a temperature different from the actual operating temperature, and the relationship between temperature and thermal expansion is not considered, an error is introduced as well. This contribution to uncertainty is called definitional because it originates from an imperfect definition of the measurand and its interactions with the environment and the measurement system. It is considered and defined by the VIM [1] in its art. 2.27 as: component of measurement uncertainty resulting from the finite amount of detail in the definition of a measurand.
This definition confirms, once again, that the origin of uncertainty lies in the limited amount of information we have on the different steps of a measurement process. In this case, only a finite amount of detail is available about the measurand and its interaction with the environment and the measurement system. This means that full knowledge (an infinite amount of detail) about the measurand is not available. In particular, definitional uncertainty originates in the identification and modeling steps of the measurement activity, and these steps, as shown in Chap. 4, are the preliminary steps of any measurement activity. Thus, they affect all subsequent steps and, consequently, definitional uncertainty represents the lower bound of measurement uncertainty [3]. No matter how good and accurate the employed instruments are, the final uncertainty cannot be lower than the definitional one. This is also stated by the VIM, in note 1 to the above-mentioned article [1]: Definitional uncertainty is the practical minimum measurement uncertainty achievable in any measurement of a given measurand.
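The statement that the overall uncertainty can never fall below the definitional contribution follows directly from the way independent contributions are combined in quadrature, as the following Python sketch illustrates with assumed values and unit sensitivity coefficients (the general GUM rule, discussed in the following sections, also accounts for sensitivity coefficients and correlations).

```python
import math

def combined_standard_uncertainty(contributions):
    """Combine independent uncertainty contributions in quadrature (the GUM
    root-sum-of-squares rule for uncorrelated inputs with unit sensitivities)."""
    return math.sqrt(sum(u ** 2 for u in contributions))

u_definitional = 0.05   # from the imperfect definition of the measurand (assumed)
u_instrumental = 0.01   # from a very good instrument (assumed)

u_total = combined_standard_uncertainty([u_definitional, u_instrumental])
print(f"combined standard uncertainty: {u_total:.3f}")

# The result, about 0.051, is barely larger than the definitional contribution:
# even a far better instrument could not push the combined uncertainty below 0.05.
```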
It is now interesting to compare this definition with the definition of foundational validity for a forensic science method given by the US President’s Council of Advisors on Science and Technology in their 2016 report, known in the forensic field as the PCAST report [4], and already discussed in Sect. 3.4 of Chap. 3. This report defines foundational validity as Foundational validity for a forensic-science method requires that it be shown, based on empirical studies, to be repeatable, reproducible, and accurate, at levels that have been measured and are appropriate to the intended application. Foundational validity, then, means that a method can, in principle, be reliable. It is the scientific concept we mean to correspond to the legal requirement, in Rule 702(c), of “reliable principles and methods”.
It can be stated that definitional uncertainty can be used, once evaluated,2 to quantify how reliable a measurement method can be in principle. Indeed, the reliability of a method, intended as its capability of providing repeatable, reproducible, and accurate3 results, obviously depends on the amount of known detail in the definition of the quantity to be measured. Therefore, the foundational validity of a forensic science method can be usefully assessed by means of the definitional contribution to uncertainty.
2 The following sections of this chapter will show how the contributions to measurement uncertainty can be evaluated.
3 It will soon be shown that these terms have a definite meaning in metrology.
5.2.2 Instrumental Uncertainty

One of the core blocks of the measurement model shown in Fig. 4.4 in Chap. 4 is the one related to the experimental processes. It was shown that these processes involve the measuring system, that is, according to VIM [1] art. 3.2, a "set of one or more measuring instruments and often other devices, including any reagent and supply, assembled and adapted to give information used to generate measured quantity values within specified intervals for quantities of specified kinds". The imperfect behavior of this system and its components, and the interactions with the operator, the measurand, and the environment originate another contribution to the doubt, that is, another contribution to uncertainty, called instrumental measurement uncertainty, or simply instrumental uncertainty. It is defined by the VIM [1], in its art. 4.24, as the component of measurement uncertainty arising from a measuring instrument or measuring system in use.
Instrumental uncertainty is evaluated by means of a procedure, called calibration, that represents one of the fundamental pillars on which metrology is based, and that will be explained and discussed in Chap. 6. Let us here anticipate that calibration implies the comparison of the value measured by the employed measuring system with the value provided by a standard or a reference system. These values are also affected by uncertainty and, therefore, a part of the instrumental contribution to uncertainty originates outside the employed measurement system, inside the standard or reference system used in the calibration procedure. This means that the uncertainty contribution associated with the standard, or reference, represents the lower bound of instrumental uncertainty, since, as will be shown in Chap. 6, and as can be intuitively perceived, no measuring system can show a lower uncertainty than the system used to calibrate it. This also means that calibration is an important and critical part of the experimental processes, and must be carefully considered, since it has a direct impact on measurement uncertainty. As was done for definitional uncertainty, it is interesting to compare the definition of instrumental uncertainty with the concepts discussed in the PCAST report [4], already analyzed in Sect. 3.4 of Chap. 3. In particular, the definition of validity as applied for a forensic science method appears to be quite interesting. The report defines it as Validity as applied means that the method has been reliably applied in practice. It is the scientific concept we mean to correspond to the legal requirement, in Rule 702(d), that an expert "has reliably applied the principles and methods to the facts of the case".
In metrology, it can be stated that a measuring method has been reliably applied in practice if the provided measurement result satisfies the goals and conditions considered by the measurement model. It was shown, in Sect. 4.3.4, that this can be assessed in terms of measurement uncertainty. As stated in the previous section, definitional uncertainty allows us to quantitatively check whether the employed method
is capable, in principle, of meeting the goals while satisfying all conditions. On the other hand, instrumental uncertainty, being related to the specific measurement procedure, allows us to quantitatively check whether the employed measuring system is capable, in practice, of meeting the goals while satisfying all conditions. Interestingly, the PCAST definition of validity as applied also takes into account the role of the expert—the operator—who implements the forensic science method. In metrology, the operator is considered part of the measuring system and therefore contributes to the instrumental uncertainty. Evaluating his or her contribution is probably one of the most difficult tasks in metrology, but, once accomplished, it provides a complete and reliable evaluation of the validity of the measurement result.
5.3 The Effects of the Doubt

Section 5.2 has investigated the origin of the doubt, but has given only a few hints on how it manifests itself in the measurement result. To understand this, it is possible to refer to common experience: this tells us that, if the same measurement is repeated several times by the same operator, under the same measurement conditions, slightly different values will be obtained. It is possible to depict this situation by considering what happens, for instance, in archery. If an archer shoots several arrows into the same target, it is very unlikely that they will all hit the same point on the target. More likely, they will hit different points, as shown in Fig. 5.1, where the point where each shot hits the target is represented by a light gray circle. By observing this figure, it is possible to draw some interesting conclusions. All shots appear to hit the target in a region placed slightly above the center of the target and on its right side. The reason might be an incorrect setting of the bow's aim, or an incorrect shooting posture of the archer, causing the same average shooting error.
Fig. 5.1 The result of repeated shots on a target. Each shot is represented by a circle. The white diamond represents the center of gravity of the whole set of shots
This error can be graphically represented by the displacement of the center of gravity of all shots (the white diamond in Fig. 5.1) with respect to the center of the target. It can also be seen that the shots are randomly spread about this white diamond. The reason, in this case, might be some unpredictable random effect, such as a gust of wind or an uncontrolled movement of the archer, that changes from one shot to the other, causing the arrow to hit the target in a different place. It is therefore possible to identify two kinds of effects: a first kind that repeats itself in the same way at every new measurement (every new shot, in our example) and a second kind that repeats itself in a different way at every new measurement. In metrology, the effects of the first kind are called systematic effects and are responsible for the systematic errors, while the effects of the second kind are called random effects and are responsible for the random errors.
5.3.1 Systematic Errors

According to the above example, a systematic error is defined by the VIM [1], in its art. 2.17, as the component of measurement error that in replicate measurements remains constant or varies in a predictable manner.
The VIM, in a note to this definition, specifies also that: systematic measurement error, and its causes, can be known or unknown. A correction can be applied to compensate for a known systematic measurement error.
This is an important point, and it is worth discussing it in more depth. The systematic effects that cause a systematic error are generally due to the effect of some influence quantity that has not been taken into account in the identification and modeling processes, or the effect of some incorrect setting in the experimental process.4 If the measurement procedure is repeated under the same conditions, the influence quantities are supposed to remain unchanged, as well as the experimental setting. This means that the systematic effect does not manifest itself through variations in the measured values and, consequently, the systematic error may remain undetected and unknown. This might yield large and dangerous errors in the measurement result and, therefore, it is recommended to make all possible efforts to identify such errors and correct them. A way to do this is to change the values of the influence quantities and repeat the measurement, or modify the measurement procedure, to force the systematic effect to change and manifest itself through a different value of the systematic error. Another way is to compare the measurement results obtained by means of two independent measuring systems, one of which must ensure negligible systematic effects with respect to the other [3].
4 If we consider the archery example, this could be the incorrect setting of the bow's aim.
The interesting point about systematic errors is that, once they have been identified, they can be compensated for, as stated by the VIM. To understand this point, let us consider a simple example. Let us suppose that we have to measure the length of a steel rod, and that the temperature of the lab where the measurement is performed is 23 °C. Let us suppose that the measured value is 1980 mm. Let us also suppose that this rod was a critical part of a structure that collapsed when, due to a fire, its temperature reached 150 °C, and that we have been asked to evaluate the length of the rod at the moment when the structure collapsed. In this case, temperature is an important influence quantity. Indeed, it is well known that the length of a metallic part depends on its temperature and that, if l1 is its length at temperature T1, its length l at temperature T is given by

l = l1 · [1 + λ · (T − T1)]    (5.2)
where λ is called the linear expansion coefficient and depends on the material. Therefore, the measured value is correct only at the temperature at which the measurement was performed. Any other temperature will introduce a systematic error that must be corrected in order to get the correct length. The required correction is obtained from (5.2), taking into account that, for normal steel, λ = 1.7 · 10⁻⁵ °C⁻¹. The length of the rod at T = 150 °C is, then, l = 1984 mm. Therefore, if we measure the rod length at 23 °C, and we do not apply the correction for the systematic error due to the different temperature of 150 °C, we make an error of 4 mm! It is also worth noting that this correction takes into account the interaction of an influence quantity—temperature—with the measurand and is therefore part of the model considered in Chap. 4. However, as stated there, the model is also imperfect and therefore the correction can only reduce the impact of the systematic effects on the measurement result, but cannot cancel it totally. A residual uncertainty component remains due to the lack of complete knowledge of the systematic effects [2].
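The correction described by (5.2) can be translated into a few lines of code. The following Python sketch (ours, not part of the cited standards) simply applies the formula to the numbers of the steel-rod example; the variable names are chosen for illustration only.

lambda_steel = 1.7e-5   # linear expansion coefficient of normal steel [1/degC]
l_measured = 1980.0     # length measured in the lab [mm]
T_lab = 23.0            # temperature at measurement time [degC]
T_fire = 150.0          # rod temperature when the structure collapsed [degC]

l_corrected = l_measured * (1 + lambda_steel * (T_fire - T_lab))   # Eq. (5.2)
systematic_error = l_corrected - l_measured

print(f"length at {T_fire:.0f} degC      : {l_corrected:.0f} mm")  # about 1984 mm
print(f"uncorrected systematic error: {systematic_error:.1f} mm")  # about 4 mm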
5.3.2 Random Errors

The archery example is also helpful in defining the random errors, as the errors due to unpredictable or stochastic temporal and spatial variations of influence quantities, such as, for instance, power supply, vibrations, numerical quantization, noise, …, that manifest themselves as variations in the values returned by repeated observations of the measurand [3]. This concept can be found in the definition of random errors given by the VIM in its art. 2.19 [1]: component of measurement error that in replicate measurements varies in an unpredictable manner.
Of course, due to the unpredictable nature of the effects that generate random errors, they cannot be compensated for. However, it can be intuitively perceived that, thanks to their stochastic nature, if the measurement procedure is repeated a statistically significant number of times, under the same measurement conditions, the average value of all measured values (the center of gravity of the shots in Fig. 5.1) will not be affected by a significant random error.
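This intuition can be illustrated with a small simulation on purely synthetic numbers (the true value and the noise level below are hypothetical, chosen only for illustration): the mean of many repeated observations is far less dispersed than any single observation.

import random

random.seed(1)
true_value = 10.0   # hypothetical true value of the measurand
noise_std = 0.5     # hypothetical magnitude of the purely random effects

def measure():
    # one observation = true value + zero-mean random error
    return random.gauss(true_value, noise_std)

single = measure()
mean_of_100 = sum(measure() for _ in range(100)) / 100

print(f"single observation : {single:.3f}")      # may deviate by several tenths
print(f"mean of 100 values : {mean_of_100:.3f}") # typically within a few hundredths of 10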
5.3.3 Measurement Repeatability

The archery example is also useful in understanding another important concept that has also been mentioned by the PCAST report [4]: measurement repeatability. This concept is defined by the VIM [1] in its art. 2.21 as measurement precision under a set of repeatability conditions of measurement
where the repeatability conditions are defined in art. 2.20 as condition of measurement, out of a set of conditions that includes the same measurement procedure, same operators, same measuring system, same operating conditions and same location, and replicate measurements on the same or similar objects over a short period of time
In short, measurement repeatability quantifies the dispersion of the values returned by a measuring system under the same measuring conditions, including the same operator. The lower the dispersion—in the archery example, the closer the dots in Fig. 5.1 are to each other—the higher the precision5 of the measurement result.
5.3.4 Measurement Reproducibility

In general, the same measurand or similar measurands are measured in different locations, by different operators, at different times and with different measuring systems. The same measurement results are expected, since the measurand is the same. However, due to the imperfect description of the measurement model, the same measured values are very unlikely to be returned. This situation can be explained again with the archery example. Let us suppose that two archers are shooting at the same target, with different bows and at different times. The result is shown in Fig. 5.2, where the same shots as those in Fig. 5.1 are reported, together with the new set of shots (white circles) for the second archer.
5 It is worth recalling that measurement precision or precision is defined by the VIM [1], in its art. 2.15, as closeness of agreement between indications or measured quantity values obtained by replicate measurements on the same or similar objects under specified conditions.
Fig. 5.2 The result of repeated shots by two different archers on the same target. Each shot is represented by a circle, with the gray circles representing the shots of archer 1 and the white circles representing the shots of archer 2
It can be noted that the new set of shots is displaced with respect to the previous one, showing that a different systematic error is present and has not been compensated for, and it is slightly less dispersed than the previous one, showing a slightly better repeatability. The capability of a measurement method to provide similar measurement results under different measurement conditions is called measurement reproducibility and is defined by the VIM [1] in its art. 2.25 as measurement precision under reproducibility conditions of measurement
where the reproducibility conditions are defined in art. 2.24 as condition of measurement, out of a set of conditions that includes different locations, operators, measuring systems, and replicate measurements on the same or similar objects.
In short, measurement reproducibility quantifies the dispersion of the values returned by a measurement method under different measuring conditions, including different measuring systems and different operators. The lower the dispersion—in the archery example, the closer the two sets of dots in Fig. 5.2 are to each other—the higher the precision of the measurement method and, using the same terms as the PCAST report [4], the more reliable the method, both in principle and in practice.
5.3.5 A First Important Conclusion

The considerations reported in the previous sections allow us to draw a first significant conclusion. A measurement result cannot be expressed by a single value, because this value would be absolutely meaningless, since the same value can hardly be obtained by a new measurement, whether repeated under the same conditions or under different conditions. Moreover, a single measured value cannot be used in a comparison and, hence, in a decision-making process. Since all measured values of the same measurand would be
different, we would never be able to identify an individual through DNA profiling, because the results of different labs would be different, if taken individually. Furthermore, the archery example is again useful in showing that it would be meaningless to "measure" the capability of an archer from a single shot. Looking at Figs. 5.1 and 5.2 we can immediately understand that only the complete set of shots is representative of the ability of the archers. We cannot judge their ability from a single shot, since it might be the very lucky shot of a terrible archer, as well as a very unfortunate shot of an excellent archer. Similarly, we cannot assess the value of a measurand from a single measured value, because it might be very different from the true value. It may happen, for instance, that a single measured value is above a given threshold, while repeated measurements would show that the majority of the measured values are below the threshold: can we sentence a defendant on the basis of a single measured value, if there is a significant risk that this value does not correctly represent the measurand? It should now be clear that a measurement result is meaningful only if it is capable of expressing the whole set of values that can be attributed to the measurand. The reason lies in the ever-present doubt about how well the result of the measurement represents the value of the quantity being measured [2], a doubt that must be quantified in such a way that the measurement result can also quantify the dispersion of the values obtained by repeated measurements, under the same or different measurement conditions.
5.4 The Uncertainty Concept

The concepts explained in the previous sections are quite useful in fully understanding the meaning and implications of the GUM [2] sentence cited in Sect. 5.1. Let us recall it again, for the sake of clarity: It is now widely recognized that, when all of the known or suspected components of error have been evaluated and the appropriate corrections have been applied, there still remains an uncertainty about the correctness of the stated result, that is, a doubt about how well the result of the measurement represents the value of the quantity being measured.
We can now see that this sentence has a twofold meaning. On the one hand, it warns the operators that they have to identify the components of error and apply the appropriate corrections. It was shown, in Sect. 5.3.1, that the only components of error that can be compensated for are the systematic ones. On the other hand, it also states that this is not enough to ensure that the measurement result is correct: other effects—namely, the residual doubt on the correctness of the applied correction and the random effects—may affect the measurement procedure so that a doubt still remains about the correctness of the measurement result. The origin and, above all, the effects of this remaining doubt have been discussed in the previous sections. In particular, Sect. 5.3.5 summarized this discussion showing that, because of the remaining doubt about the correctness of any single measured
value, expressing a measurement result with a single value is meaningless. This is also in perfect agreement with the VIM [1] definition of measurement result recalled in Sect. 4.2.1.5, which refers to a set of quantity values. It was also clearly shown that the dispersion of the set of data about their mean value, once the proper corrections for all identified or suspected components of error have been applied, is a good indication of the quality of the measurement result. The VIM [1] definition of measurement uncertainty as a non-negative parameter characterizing the dispersion of the quantity values being attributed to a measurand, based on the information used, now appears clearer. We still need to clarify the properties of this parameter and the information we can use to evaluate it. Let us start, in this section, with the properties, since they are useful in understanding how to express uncertainty, and let the information used be considered in the next sections. The GUM is, once again, extremely useful in defining these properties. The GUM, in article 0.4 of its Introduction, defines the ideal method for evaluating and expressing uncertainty. It states [2]
The ideal method for evaluating and expressing the uncertainty of the result of a measurement should be:
– universal: the method should be applicable to all kinds of measurements and to all types of input data used in measurements.
The actual quantity used to express uncertainty should be:
– internally consistent: it should be directly derivable from the components that contribute to it, as well as independent of how these components are grouped and of the decomposition of the components into subcomponents;
– transferable: it should be possible to use directly the uncertainty evaluated for one result as a component in evaluating the uncertainty of another measurement in which the first result is used.
These three properties that measurement uncertainty has to satisfy are quite self-explanatory. The first one—universality—is, probably, the most important one. Indeed, the concept of uncertainty in measurement is universal, since, according to the points discussed in Chap. 4 and in Chap. 5, there is always a doubt about the correctness of a measurement result, regardless of the quantity to be measured and the considered data. However, to ensure universality also in practice, and not only in principle, a universal method must be defined to express and evaluate uncertainty, whose validity holds for every kind of measurement. While this is an important point in any scientific and technical application, it becomes critically important in forensic applications, where the applied methods must be universal, to ensure a fair treatment to all subjects and to ensure credibility of the obtained results. The last two properties are important from the evaluation point of view, since they ensure that the final result is not affected by the order in which the different contributions to uncertainty are evaluated or by the way they are grouped into subcomponents. These
points will be discussed in more depth in the following sections, where the methods for expressing and evaluating uncertainty are treated. Art. 0.4 of the GUM states another important point [2]: Further, in many industrial and commercial applications, as well as in the areas of health and safety, it is often necessary to provide an interval about the measurement result that may be expected to encompass a large fraction of the distribution of values that could reasonably be attributed to the quantity subject to measurement. Thus the ideal method for evaluating and expressing uncertainty in measurement should be capable of readily providing such an interval, in particular, one with a coverage probability or level of confidence that corresponds in a realistic way with that required.
GUM articles 0.2 and 0.4 imply much more than their direct and most immediate meaning. Art. 0.2 implies that, after correction of all systematic effects, the only remaining effects are the random ones. Therefore, the most immediate mathematical way to express a quantity that varies in a stochastic way is to express it as a random variable inside the probability theory. Under this assumption, probability is a very powerful tool to represent how the measured values distribute and to quantify how large this distribution is, as well as how probable it is that the measurand value lies inside an interval of the measured values. The terms coverage probability and level of confidence used in art. 0.4 are also typical terms of the theory of probability and confirm that this theory is the one adopted by the GUM [2] and the scientific and technical community of metrologists to evaluate and express measurement uncertainty. Let us see how.
5.5 How to Express Uncertainty

According to the above considerations, we have to refer to probability and statistics to evaluate and express measurement uncertainty. Therefore, we need to introduce a few basic concepts of probability needed to understand how it is used to express uncertainty. In doing so, we will try to limit the mathematical derivations to a minimum, and explain, in a way as simple as possible, the concepts that are behind the equations, so that non-technical people can also perceive their meaning. The readers who wish to understand the mathematical concepts in deeper detail are referred to the bibliography.
5.5.1 Random Variables and Probability Distributions

In probability, a random variable, or stochastic variable, is a variable whose possible values are the outcomes of a random phenomenon, that is, a phenomenon that manifests itself in a random, unpredictable way [5]. An example of a random phenomenon is tossing a fair die: the outcome, that is, the face that will be shown, is unpredictable,
but we can intuitively state that, since the die has six faces, the probability of each face showing up is 1 over 6, that is, 1/6 ≈ 0.167. In a stricter mathematical form, a random variable X, such that X : Ω → E, is a measurable function from a set of possible outcomes Ω to a measurable space E. In the previous, very simple example, space Ω is defined by the six faces of the die, the measurable space E is the space of real numbers, and the probability measure assigns the same value (1/6) to each outcome. More generally, the probability that X takes a value in a subset S of E, with S ⊆ E, is given by

Pr(X ∈ S) = P({ω ∈ Ω | X(ω) ∈ S})    (5.3)
where P is called probability measure and represents the distribution of variable X. The above example considers a finite number of possible outcomes, so that X is a discrete variable and its distribution P is given by a so-called probability mass function [5]. More in general, the space of possible outcomes is uncountably infinite. In this case, X becomes a continuous variable. In the most common case, which will be considered here, of an absolutely continuous variable, its probability distribution is described by a probability density function p(x). In this case, the probability that X takes a value in the interval a ≤ X ≤ b is given by

Pr(a ≤ X ≤ b) = ∫_a^b p(x) dx    (5.4)
There are several possible probability density functions that describe the probability distribution under different assumptions. Two functions are most commonly used in probability: the normal (or Gaussian) distribution and the uniform distribution.
• Normal distribution. The normal probability density function is given by the following equation:

N(x) = (1 / √(2πσ²)) · e^(−(x−μ)² / (2σ²))    (5.5)
where μ is the mean value of the distribution, and σ² is a parameter, called variance, that represents how wide the distribution is. Low values of σ² yield a distribution of probable values that remain close to the mean value μ, while larger values of σ² yield a distribution that spreads over a larger interval. The positive square root of σ², that is σ, is called the standard deviation of the distribution. Figure 5.3 shows an example of normal probability distribution functions plotted for different values of σ. The normal probability density function represents the behavior of a purely random phenomenon and plays a central role in probability and metrology, as will be shown later.
Fig. 5.3 Example of normal probability distribution functions, plotted for μ = 0 and three different values of σ (σ = 0.1, 0.2, 0.5)
• Uniform distribution. The uniform probability density function is defined as

p(x) = 1/(b − a)  for a ≤ x ≤ b
p(x) = 0          for x < a and x > b    (5.6)

Figure 5.4 shows an example of uniform probability distribution function, plotted in the case of a = −1 and b = 1. The uniform probability density function is employed when no specific information is available on the way X distributes over a finite interval. In probability, it represents the case of total ignorance. A short numerical sketch of both density functions is given below.
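The following Python sketch, written with the same parameter choices as Figs. 5.3 and 5.4, evaluates the two density functions (5.5) and (5.6) and numerically checks that the area under each curve is 1, anticipating the normalization property discussed in the next subsection.

import math

def normal_pdf(x, mu=0.0, sigma=0.1):
    # Eq. (5.5)
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / math.sqrt(2 * math.pi * sigma ** 2)

def uniform_pdf(x, a=-1.0, b=1.0):
    # Eq. (5.6)
    return 1.0 / (b - a) if a <= x <= b else 0.0

# crude trapezoidal integration on a grid from -5 to +5 with step 0.001
xs = [i / 1000.0 for i in range(-5000, 5001)]

def area(pdf):
    return sum((pdf(x1) + pdf(x2)) / 2 * 0.001 for x1, x2 in zip(xs[:-1], xs[1:]))

print(round(area(normal_pdf), 3))   # close to 1.0
print(round(area(uniform_pdf), 3))  # close to 1.0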
5.5.1.1 Properties of the Probability Distributions
Normalization Property
The first important property of probability distribution functions reflects the fact that the probability of all possible outcomes is 1: even if we do not know which outcome will be generated, we are sure that at least one outcome will be generated. From a strict mathematical point of view, this means that all possible probability density functions p(x) must satisfy the following equation:

∫_{−∞}^{+∞} p(x) dx = 1    (5.7)
Fig. 5.4 Example of uniform probability distribution function, plotted for a = −1 and b = 1
This integral also has a geometrical interpretation: it provides the area between p(x) and the horizontal axis, and the normalization condition forces this area to be always equal to 1. This explains why, in Fig. 5.3, the different probability density functions show lower peak values as their variances increase: since the area (total probability) has to remain equal to 1, the peak value has to decrease as the width increases. For the same reason, the amplitude of the uniform distribution in (5.6) is equal to 1/(b − a): since the area must be 1, and the shape of the uniform distribution is a rectangle with base equal to b − a, its height cannot be other than 1/(b − a).

Mean Value and Variance
We have already introduced the mean value μ and the variance σ² of a random variable X as the parameters of the normal probability density function (5.5). These two concepts can be extended to all probability density functions as follows. The mean value of a continuous random variable X, described by a continuous probability density function p(x), is given by

μ = ∫_{−∞}^{+∞} x · p(x) dx    (5.8)
In probability, this value is also called the mathematical expectation, or expectation, E(X) of X, and it represents a sort of location parameter for the distribution, showing the value on which probability focuses. This value also represents the maximum likelihood value for X [5]. If the random variable is discrete and is represented by K values with probability mass function p_i, i = 1, ..., K, its mean value and expectation are given by
μ = E(X) = Σ_{i=1}^{K} x_i · p_i    (5.9)
It is worth noting that a discrete probability distribution can be generated by taking K outcomes of a continuous distribution. In this case, the obtained mean value μ̄ is called the sample mean and can be considered an estimate of the expectation of the continuous distribution that would be obtained when K → ∞. Of course, the greater K, the better the estimate. The variance of a random variable X, described by a continuous probability density function p(x), is defined as the expectation of the squared deviation from the mean of X:

Var(X) = σ² = E[(X − μ)²]    (5.10)
By expanding (5.10), we obtain

σ² = ∫_{−∞}^{+∞} x² · p(x) dx − μ²    (5.11)
It is immediate to check that, if μ = 0, that is, if the distribution has zero mean value, (5.11) becomes

σ² = ∫_{−∞}^{+∞} x² · p(x) dx    (5.12)
In probability, the variance represents a second important parameter of a random variable. According to (5.10), we can assign it the meaning of a measure of the dispersion of the possible outcomes of X about its mean value. According to (5.12), the variance of a random variable represented by a uniform distribution with zero mean value over an interval [−a; a] is given by σ_u² = a²/3. If the random variable is discrete and is represented by K values with probability mass function p_i, i = 1, ..., K, its variance is given by

σ² = Σ_{i=1}^{K} x_i² · p_i − μ²    (5.13)
Similar to the case of a continuous variable, if μ = 0, (5.13) becomes

σ² = Σ_{i=1}^{K} x_i² · p_i    (5.14)
It is worth noting that, similar to the mean value, if the discrete probability distribution is generated by taking K outcomes of a continuous distribution, the obtained variance value s² is called the sample variance and can be considered an estimate of the
variance σ² of the continuous distribution that would be obtained when K → ∞. Of course, the greater K, the better the estimate.

Covariance
Up to now, we have considered only a single random variable X. However, there are several situations that require more than one random variable to be fully described. In such a case, the way such variables combine is strongly influenced by the possible mutual dependence of the outcomes of one variable on those of another one. In probability, this dependence is described by a quantity called covariance. Let us consider two random variables X and Y, with mean values μ_X and μ_Y, respectively. The covariance between these two variables is defined as

Cov(X, Y) = σ(X, Y) = E[(X − μ_X) · (Y − μ_Y)]    (5.15)
If the standard deviations σ(X) and σ(Y) of X and Y, respectively, are considered, covariance (5.15) satisfies the following properties:

σ(X, Y) = σ(Y, X)    (5.16)

and

−σ(X) · σ(Y) ≤ σ(X, Y) ≤ σ(X) · σ(Y)    (5.17)
Without entering into the mathematical details, which might become quite complex to follow, let us only discuss the significance of the limit cases of (5.17). When σ(X, Y) = σ(X) · σ(Y), the two random variables are directly correlated. This means that if variable X takes values greater than its mean value, variable Y will surely take values greater than its mean value, and vice versa. Conversely, when σ(X, Y) = −σ(X) · σ(Y), the two random variables are inversely correlated. This means that if variable X takes values greater than its mean value, variable Y will surely take values lower than its mean value, and vice versa. When σ(X, Y) = 0 there is no correlation between X and Y, and this means that the two variables are not influencing each other. Since the possible values for σ(X, Y) are limited by (5.17), it is useful to normalize the covariance to the product of the standard deviations. Therefore, the correlation coefficient is defined as

r(X, Y) = σ(X, Y) / (σ(X) · σ(Y))    (5.18)
Due to (5.16) and (5.17), it follows that r(X, Y) = r(Y, X) and −1 ≤ r(X, Y) ≤ 1. Figure 5.5 shows four possible cases of correlation between random variables, from no correlation to full direct correlation. The plots show the values taken by the two variables, and it can be readily checked that, when there is no correlation, they scatter, in the X,Y-plane, around the mean value of the distribution (0 in this case), while, as the correlation coefficient increases toward 1, they tend to thicken about the 45° line in the X,Y-plane.
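The quantities defined in (5.15)–(5.18) are easily computed on sampled data. The following sketch uses two synthetic, partially correlated data sets (the numbers are hypothetical and serve only to illustrate the meaning of the correlation coefficient).

import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, 1000)
y = 0.7 * x + rng.normal(0.0, 0.7, 1000)   # y is partially correlated with x

cov_xy = np.cov(x, y)[0, 1]      # sample covariance, cf. Eq. (5.15)
r_xy = np.corrcoef(x, y)[0, 1]   # correlation coefficient, cf. Eq. (5.18)

print(f"covariance  = {cov_xy:.3f}")
print(f"correlation = {r_xy:.3f}")   # always between -1 and +1, here around 0.7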
Fig. 5.5 Example of different degrees of correlation between two random variables (r = 0, r = 0.45, r = 0.71, and r = 1)
Coverage Intervals
Having defined a random variable X and the associated probability density function p(x), it is also possible to define a coverage interval as an interval [a, b], with a ≤ x ≤ b, on x. The coverage probability, that is, the probability that an outcome of X falls inside interval [a, b], is given by (5.4). From a geometrical point of view, this coverage probability is given by the area between p(x) and the horizontal axis, for a ≤ x ≤ b. Figure 5.6 shows a coverage probability drawn from −σ to +σ under a normal probability density function with zero mean value. The coverage probability, corresponding to the gray area in Fig. 5.6, is obtained by applying (5.4) to this probability density function and returns

Pr(−σ ≤ X ≤ +σ) = (1 / √(2πσ²)) ∫_{−σ}^{+σ} e^(−x² / (2σ²)) dx = 0.683    (5.19)

This is a well-known general result in probability [5] and shows that a coverage interval of width ±σ about the mean value has a coverage probability of 0.683 when the probability density function is normal.
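For a normal density, the coverage probability of an interval of half-width kσ about the mean can be written in terms of the error function as erf(k/√2), a standard identity equivalent to evaluating (5.19). The short sketch below uses it to obtain the coverage probabilities of the most common intervals.

import math

def normal_coverage(k):
    # coverage probability of [-k*sigma, +k*sigma] for a zero-mean normal density
    return math.erf(k / math.sqrt(2.0))

for k in (1, 2, 3):
    print(f"+/- {k} sigma -> coverage probability {normal_coverage(k):.4f}")
# prints 0.6827, 0.9545 and 0.9973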
Fig. 5.6 Example of coverage interval [μ − σ, μ + σ] for a random variable with normal probability density function
5.5.2 Standard Uncertainty

All previous considerations can be summarized into the following points:
1. The true value of the measurand is unknown and unknowable.
2. Repeated measurements of the same measurand provide different values.
3. If all systematic effects have been corrected for, the different values returned by repeated measurements are due to random effects only.
4. Under these assumptions, each measured value can be mathematically modeled as a single realization of a random variable.
5. According to this model, the measurement result can be mathematically represented by a random variable and the distribution of all measured values can be represented by a probability density function.
6. The standard deviation of the probability distribution provides a quantitative estimate of the dispersion of the measured values about their mean value.
According to these points, the standard deviation of the probability density function appears to be the most immediate choice for a quantitative measure of the uncertainty associated with the values that can be attributed to the measurand. Consequently, the GUM [2], in its art. 2.3.1, defines the standard uncertainty as uncertainty of the result of a measurement expressed as a standard deviation.
The suggested notation for the standard uncertainty of a measurement result x is u(x). If the probability density function of the distribution of measured values is also known, a coverage probability (or level of confidence) can be assigned to interval [x̄ − u(x), x̄ + u(x)] built about the distribution mean value x̄, as shown in
Sect. 5.5.1.1. Therefore, a coverage interval can be obtained, starting from the standard uncertainty, as recommended by the GUM [2] in its art. 0.4. Of course, the coverage probability is strictly dependent on the considered probability density function. As shown again in Sect. 5.5.1.1, if the probability density function is normal, then the coverage probability is 0.683 or 68.3%.
5.5.3 Expanded Uncertainty and Coverage Factor

The standard uncertainty meets the requirement of GUM art. 0.4 [2] only partially, since it allows one to build only a single coverage interval, which does not generally encompass a large fraction of the distribution of values that could reasonably be attributed to the quantity subject to measurement (if the probability density function is normal, it encompasses only 68.3% of these values). However, considering (5.4), (5.19), and Fig. 5.6, if a larger interval about x̄ than interval [x̄ − u(x), x̄ + u(x)] is considered, a larger fraction of the distribution of values that could reasonably be attributed to the quantity subject to measurement is expected to fall inside this interval. Such an interval is called, by the GUM, expanded uncertainty and is defined, in GUM art. 2.3.5 [2], as quantity defining an interval about the result of a measurement that may be expected to encompass a large fraction of the distribution of values that could reasonably be attributed to the measurand.
The suggested notation for the expanded uncertainty of a measurement result x is U(x). Interval [x̄ − U(x), x̄ + U(x)] about x̄ is still a coverage interval, since a probability density function is always associated with the measurement result, and the coverage probability can be evaluated by applying (5.4) with a = x̄ − U(x) and b = x̄ + U(x). The GUM states, in Note 1 to art. 2.3.5, that the fraction of the distribution of values considered in the definition of expanded uncertainty may be viewed as the coverage probability or level of confidence of the interval [2]. It is extremely important to note that, given the expanded uncertainty, the coverage probability of the resulting coverage interval [x̄ − U(x), x̄ + U(x)] about x̄ depends on the probability density function p(x) associated with the measurement result [3, 6]. Indeed, since the coverage probability is given by (5.4), the critical role played by p(x) is evident. The GUM, in Note 2 to art. 2.3.5, states [2] To associate a specific level of confidence with the interval defined by the expanded uncertainty requires explicit or implicit assumptions regarding the probability distribution characterized by the measurement result and its combined standard uncertainty. The level of confidence that may be attributed to this interval can be known only to the extent to which such assumptions may be justified.
Table 5.1 Value of the coverage factor K_p that produces an interval having level of confidence p when assuming a normal distribution

Level of confidence p [%]   Coverage factor K_p
68.27                       1
90                          1.645
95                          1.960
95.45                       2
99                          2.576
99.73                       3
This point is rather obvious, from a strict mathematical point of view, and comes directly from having represented a measurement result with a random variable: a random variable is defined in terms of its probability density function, and therefore no probability value can be given without knowing this function. Unfortunately, in practice, this point is often neglected, so it is important to stress it. Since interval [x̄ − U(x), x̄ + U(x)] is generally obtained by suitably expanding interval [x̄ − u(x), x̄ + u(x)], it is often convenient to derive the expanded uncertainty directly from the standard uncertainty, by multiplying it by a suitable coverage factor K, so that

U(x) = K · u(x)    (5.20)

The GUM, in its art. 2.3.6, defines the coverage factor as [2] numerical factor used as a multiplier of the combined standard uncertainty in order to obtain an expanded uncertainty.
Knowing the probability density function associated with the measurement result, it is possible to associate a level of confidence to K . For instance, Table 5.1 shows the values of the levels of confidence that can be associated to some common values of the coverage factor K , when a normal probability distribution is assumed. Table 5.1 considers the same values as those considered by the GUM in its Table G.1 [2].
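The correspondence in Table 5.1 can be verified with a few lines of code: under the normal-distribution assumption, the level of confidence corresponding to a coverage factor K_p is erf(K_p/√2), an equivalent way of evaluating (5.4) for a normal density.

import math

# level of confidence p corresponding to each coverage factor Kp of Table 5.1,
# assuming a normal probability distribution: p = erf(Kp / sqrt(2))
for kp in (1, 1.645, 1.960, 2, 2.576, 3):
    p = 100 * math.erf(kp / math.sqrt(2.0))
    print(f"Kp = {kp:<5} -> level of confidence p = {p:.2f} %")
# approximately 68.27, 90.00, 95.00, 95.45, 99.00 and 99.73 %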
5.5.4 Methods for Evaluating Standard Uncertainty

The GUM recommends two methods for evaluating standard uncertainty, defined as follows:
• Type A evaluation (of uncertainty)—GUM art. 2.3.2 [2]: method of evaluation of uncertainty by the statistical analysis of series of observations.
• Type B evaluation (of uncertainty)—GUM art. 2.3.3 [2]: method of evaluation of uncertainty by means other than the statistical analysis of series of observations.
5.5.4.1 Type A Evaluation of Uncertainty
As already stated, if we assume that the result of a measurement has been corrected for all recognized significant systematic effects and that every effort has been made to identify such effects (GUM art. 3.2.4 [2]), each measured value can be considered as a realization of the random variable representing the measurement result. Under such assumptions, the distribution of values that characterizes the random variable can be evaluated through the statistical analysis of repeated observations [5]. As discussed in Sect. 5.5.1.1, the best available estimate of the expectation E(X), that is, the expected value, of a random variable X is given, when n independent observations x_k obtained under the same measurement conditions are available, by the arithmetic or sample mean of such observations:

x̄ = (1/n) Σ_{k=1}^{n} x_k    (5.21)
Therefore, the arithmetic mean x̄ returned by (5.21), being the best estimate of the expected value of the measurement result, is taken as its numerical value. However, we know that the measurement result is represented by the whole set of values and that, if we want to represent it in a more synthetic way, we must characterize the variability of the individual observations, that is, their dispersion, through a suitable parameter. Section 5.5.1.1 proved that this parameter is the standard deviation σ (the positive square root of the variance σ²) of the probability density function associated with variable X. According to statistics [5], a good estimate of σ² is provided by the experimental or sample variance of the n independent observations, given by

s²(x_k) = (1/(n − 1)) Σ_{k=1}^{n} (x_k − x̄)²    (5.22)
While this is representative of the dispersion of the measured values, none of these values was taken as the measurement result, since the sample mean x̄ was taken as the numerical value of the measurement result. This mean value was obtained starting from n independent observations of X. A different set of n observations could have been taken and would have provided a slightly different mean value. If x̄ is taken as the numerical value of the measurement result, we are interested in evaluating the dispersion of the possible values that this mean value may take. Statistics helps us again, since it can be proved [5] that the best estimate of the variance of the possible mean values of different sets of observations is given by

s²(x̄) = s²(x_k) / n    (5.23)
Table 5.2 Measured voltage values

Observation   Measured value [V]
1             7.499
2             7.501
3             7.497
4             7.506
5             7.503
6             7.502
7             7.497
Therefore, the standard uncertainty associated with the measured value x̄ is given, when a Type A evaluation method of uncertainty is used, by the standard deviation obtained from (5.23) as

u(x̄) = √(s²(x̄))    (5.24)

The major drawback of this Type A evaluation method of standard uncertainty is the need for independent repeated measurements, whose number shall be large enough to be statistically significant, that is, to ensure that x̄ is a good estimate of the expectation E(X) of the random variable X representing the measurement result. This condition cannot always be satisfied, mainly because of time constraints or a poor repeatability of the measurement result. Moreover, the repeated observations are usually obtained in a relatively short period of time. Therefore, they are affected only by the short-term variations of the influence quantities, and the contributions to uncertainty due to variations of the influence quantities that manifest themselves over a longer period of time (such as temperature, for instance) are not generally taken into account by this method, unless the observation period is significantly extended.6

Numerical Example
To fully understand the above theoretical derivations, let us evaluate the standard uncertainty of a digital voltmeter with a Type A evaluation method. Let us suppose that we measure a dc voltage V by setting the voltmeter range to 10 V and that we take n = 7 independent measurement values, as shown in Table 5.2. The arithmetic mean of the values in Table 5.2 is V̄ = 7.501 V and is taken as the numerical value of the measurement result. The standard deviation of the observations is σ_k = 3.2 mV and, therefore, the standard deviation of the mean is σ = σ_k/√7 = 1.2 mV. This means that the standard uncertainty of the measurement result evaluated with a Type A method is given by u_A(V̄) = 1.2 mV.
6 While this is possible, in principle, it may require an observation period that is not compatible with the economic or time constraints. Therefore, a different method should be used if the variations of influence quantities that do not generally manifest themselves in the short term have to be taken into account too.
To verify that this is a good estimate of the dispersion of the possible mean values obtained from different sets of n = 7 independent observations, 10 different sets of measurements of the same voltage have been considered and the sample standard deviation σ̄ of these mean values has been computed. The result was σ̄ = 1.1 mV, thus experimentally confirming that (5.23) provides a good estimate of the variance of the mean.
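The Type A evaluation of this example can be reproduced with a short script operating on the readings of Table 5.2. Because of rounding, the printed figures may differ slightly from the values quoted in the text.

from math import sqrt
from statistics import mean, stdev

readings = [7.499, 7.501, 7.497, 7.506, 7.503, 7.502, 7.497]   # volts, Table 5.2

v_mean = mean(readings)            # sample mean, Eq. (5.21)
s_obs = stdev(readings)            # sample standard deviation, from Eq. (5.22)
u_A = s_obs / sqrt(len(readings))  # standard deviation of the mean, Eqs. (5.23)-(5.24)

print(f"mean value      = {v_mean:.3f} V")        # 7.501 V
print(f"std of readings = {s_obs * 1000:.1f} mV")
print(f"u_A(V)          = {u_A * 1000:.1f} mV")   # about 1.2 mV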
5.5.4.2 Type B Evaluation of Uncertainty
As discussed in Sect. 5.5.4.1, a Type A evaluation method of uncertainty is not always viable, since it is not always possible to collect a sufficient number of independent observations of measurand X, or it is not possible to collect them over a sufficiently long period of time to allow all possible effects to manifest themselves with variations of the measured values. Nevertheless, standard uncertainty has to be evaluated under these conditions too. This can be done, according to the GUM [2], by scientific judgment based on all of the available information on the possible variability of X. Again according to the GUM, the pool of information may include [2] the following:
– previous measurement data;
– experience with or general knowledge of the behavior and properties of relevant materials and instruments;
– manufacturer's specifications;
– data provided in calibration and other certificates;
– uncertainties assigned to reference data taken from handbooks.
As stated in Sect. 5.5.2, standard uncertainty provides a good estimate of the distribution of the values that can be attributed to the measurand. However, this estimate is not complete and does not carry information about the probability that the measurand value lies inside a given interval, unless assumptions are made about the probability density function associated with this distribution. Therefore, the scientific judgment based on the available pool of information must also identify the most suitable probability density function that models the way the possible values that can be attributed to the measurand distribute. If such an assumption can be made, then the expanded uncertainty can be obtained from the standard uncertainty, and the coverage probability associated with the coverage interval obtained starting from the expanded uncertainty can be evaluated as well. Sometimes, the pool of available information yields a coverage interval directly, as in the next numerical example. In such a case, the assumption on the probability density function is still needed to associate a coverage factor with the given interval and retrieve the standard uncertainty value. In any case, the correctness of the obtained uncertainty (standard and expanded) values is strongly dependent on the reliability of the available information and the way it is used. Nonetheless, the GUM states that [2]
The proper use of the pool of available information for a Type B evaluation of standard uncertainty calls for insight based on experience and general knowledge, and is a skill that can be learned with practice. It should be recognized that a Type B evaluation of standard uncertainty can be as reliable as a Type A evaluation, especially in a measurement situation where a Type A evaluation is based on a comparatively small number of statistically independent observations.
It is worth noting the weight and importance assigned, by this statement, to the experience and qualification of the operator who performs the measurement and evaluates the related uncertainty. In forensic applications, experience and qualification must be clearly assessed and should also include the capability of clearly explaining the main points on which the Type B evaluation is based and the main implications of the evaluated uncertainty to the trier of fact, who is generally a layman in metrology.

Numerical Example
Let us consider the same kind of measurement as the one in the numerical example in Sect. 5.5.4.1: the aim is still that of measuring a dc voltage V by means of a digital multimeter on a measuring range of 10 V. Differently from the example in Sect. 5.5.4.1, let us assume that only one measurement can be performed, and let us suppose that the measured value is the same as the third value shown in Table 5.2: V_m = 7.497 V. Under these conditions, standard uncertainty cannot be evaluated by the statistical analysis of a series of data, as in Sect. 5.5.4.1, and it must be evaluated by referring to the available information. Let us suppose that, in this case, the only available information is provided by the manufacturer's accuracy specifications as an interval of possible values, about the measured value, whose half-amplitude a is given by 0.025% of the reading + 2 digits.7 Therefore, according to the considered reading V_m and the fact that one digit, in the selected range, corresponds to 1 mV, it is

a = 7.497 · 2.5 · 10⁻⁴ V + 2 · 10⁻³ V = 3.9 mV    (5.25)
In general, this value is obtained by means of a deep analysis of the manufacturing process and calibration of a statistically significant number of, or even all, produced instruments. Therefore, interval V_m ± a can be assumed to encompass all values that can reasonably be attributed to the measurand. Since no other information is available on how these possible values distribute over interval V_m ± a, a uniform probability density function can be associated to this interval, since such a function describes, in probability, the lack of specific information. According to (5.11), the variance of a random variable V described by a uniform probability distribution over interval V_m ± a is given by σ² = a²/3.
7 It is worth noting that this is a very common way of specifying the accuracy of a measuring instrument. This expression quantifies the effects, on the instrument reading, of variations in the influence quantities (such as ambient temperature and instrument aging) with respect to calibration conditions. The coefficients in this expression are generally obtained by means of a statistical analysis of the whole lot of manufactured instruments and depend on both the instrument sensitivity to the influence quantities and the magnitude of their admissible variations [3].
Therefore, the Type B evaluation method of standard uncertainty yields, in this case and given (5.25):

u(V_m) = a/√3 = 2.2 mV    (5.26)
5.5.4.3 Combining Standard Uncertainty Values Obtained by Means of Type A and Type B Evaluation Methods
As stated in Sects. 5.5.4.1 and 5.5.4.2, the two evaluation methods—Type A and Type B—suggested by the GUM [2] to evaluate standard uncertainty generally take into account different contributions to uncertainty: the Type A method is likely to stress the effects due to short-term variations of the influence quantities, while the Type B method is more likely to stress the sensitivity to variations of the influence quantities in the whole range of their admissible variations. For instance, in the dc voltage measurement example considered in Sects. 5.5.4.1 and 5.5.4.2, the Type B evaluation method, being based, in that case, on the manufacturer's specifications, takes into account the effects of the different values that the influence quantities may take, at measurement time, with respect to the values taken at calibration time. The influence quantities—temperature, for instance—usually show very little variation when measurements are repeated over a short time interval and are not detected by a Type A evaluation method. On the other hand, the values returned by repeated measurements in a Type A evaluation method are supposed to be affected by the stochastic, short-term variations of the influence quantities—such as noise superimposed on the input voltage, due to electromagnetic interference—at measurement time and in the specific measurement conditions. Therefore, a better estimate of measurement uncertainty can be attained by combining the standard uncertainty values provided by both Type A and Type B methods, if available. Still considering the dc voltage measurement example, let us call u_A(V) the standard uncertainty value provided by the Type A method, and u_B(V) the standard uncertainty value provided by the Type B method. Since the effects taken into account by the two evaluation methods are due to different physical phenomena, they can be assumed to be statistically independent, and hence not correlated. Under these assumptions, the two standard uncertainty values combine quadratically, as

u(V) = √(u_A²(V) + u_B²(V))    (5.27)

In the considered example, u_A(V) = 1.2 mV was evaluated by the Type A method, and u_B(V) = 2.2 mV was evaluated by the Type B method, and their combination, according to (5.27), yields u(V) = 2.5 mV.
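The Type B evaluation and the combination of the two contributions can be reproduced as follows; the numbers are those of the examples above, and the variable names are ours.

from math import sqrt

V_m = 7.497      # single reading [V]
digit = 1e-3     # one digit on the 10 V range [V]

a = 0.025 / 100 * V_m + 2 * digit   # half-width of the specification interval, Eq. (5.25)
u_B = a / sqrt(3)                   # uniform distribution assumed, Eq. (5.26)
u_A = 1.2e-3                        # Type A value from the previous example [V]
u = sqrt(u_A ** 2 + u_B ** 2)       # combination of uncorrelated contributions, Eq. (5.27)

print(f"a    = {a * 1000:.1f} mV")   # about 3.9 mV
print(f"u_B  = {u_B * 1000:.1f} mV") # about 2.2 mV
print(f"u(V) = {u * 1000:.1f} mV")   # about 2.5 mV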
5.5.5 Combined Standard Uncertainty

Until now, we have considered the simplest case, in which the required information could be obtained by measuring a single measurand: in the considered examples, the diameter of a rolling bearing, the length of a steel rod, a dc voltage. However, in most situations, a measurand cannot be measured directly, but only through the measurement of other quantities. An immediate example is speed: it is well known that the speed of a body—for instance a vehicle—can be measured by measuring the distance traveled and the time taken to travel that distance. The speed is then obtained as the ratio between these two measured quantities. According to all considerations reported in the previous sections, the measured quantities—distance and time in this example—are affected by measurement uncertainty and, therefore, their measurement must also include the evaluation of a standard uncertainty value. However, we are interested in knowing the uncertainty value associated with the measured speed, not only the values associated with distance and time. The GUM requirements [2] recalled in Sect. 5.4 state that the quantity used to express uncertainty must be internally consistent and transferable. This means that, having estimated the standard uncertainty values associated with distance and time, we must be capable of suitably combining them to evaluate the standard uncertainty value associated with speed. The obtained result is called combined standard uncertainty and can be derived, as suggested by the GUM [2], by exploiting the mathematical properties of random variables, their expectations and their variances and covariances. This requires some mathematical derivations that are reported here for the sake of completeness. Non-technical readers who are not familiar with the employed notation can skip the mathematical derivation and focus on the subsequent practical implications, reported in Sect. 5.5.5.3.
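Before entering the formal derivation, the idea can be grasped with a minimal Monte Carlo sketch, that is, a numerical propagation of distributions in the spirit of Supplement 1 to the GUM, applied to the speed example with purely hypothetical values of distance, time, and their standard uncertainties. It is an illustration only, not the analytical method derived below.

import numpy as np

rng = np.random.default_rng(42)
N = 100_000

# assumed measured values and standard uncertainties (hypothetical numbers)
d = rng.normal(100.0, 0.5, N)    # distance: 100 m with u(d) = 0.5 m
t = rng.normal(4.00, 0.02, N)    # time: 4.00 s with u(t) = 0.02 s

v = d / t                        # propagate the distributions through the model v = d/t

print(f"v    = {v.mean():.2f} m/s")        # about 25.0 m/s
print(f"u(v) = {v.std(ddof=1):.3f} m/s")   # numerical estimate of the combined standard uncertainty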
5.5.5.1 Mathematical Derivation
Let us suppose that a measurand Y cannot be measured directly, and is determined from N other quantities X_1, X_2, ..., X_N, through a functional relationship f:

Y = f(X_1, X_2, \ldots, X_N)    (5.28)
Generally speaking, quantities X_1, X_2, ..., X_N are themselves measurands and may depend on other quantities. In the following, for the sake of clarity and without loss of generality, no further dependence on other quantities will be considered.8
8 Should any of the X_1, X_2, ..., X_N quantities depend on other quantities, the method described in this section is iteratively applied.
An estimate y of measurand Y is obtained by applying (5.28) to the estimates x_1, x_2, ..., x_N of the input quantities X_1, X_2, ..., X_N. Therefore, the measured value y is given by

y = f(x_1, x_2, \ldots, x_N)    (5.29)
Having supposed that the input estimates x_1, x_2, ..., x_N in (5.29) are measurement results, they have an associated standard uncertainty value u(x_i), i = 1, ..., N. Intuitively, these standard uncertainty values contribute to the value of the standard uncertainty u(y) associated with the final measurement result y.
To derive u(y) in a strict mathematical way, let us suppose that function f(·) in (5.29) is fairly linear about the measured value y, at least for small deviations of each input quantity X_i about its estimate x_i.9 Under this assumption, it is possible to represent function f(·) with the first-order term of a Taylor series expansion [2, 3] about the expectations E(X_i) = μ_{x_i} of the input quantities. The deviation about the expectation E(Y) = μ_y is given by

y - \mu_y = \sum_{i=1}^{N} \frac{\partial f}{\partial x_i} \cdot \left( x_i - \mu_{x_i} \right)    (5.30)

where μ_y = f(μ_{x_1}, μ_{x_2}, ..., μ_{x_N}). It is worth noting that the Taylor series expansion would require the partial derivatives in (5.30) to be evaluated at the expected values of the input quantities, which are generally unknown. In practice, they are evaluated at the estimates x_1, x_2, ..., x_N, since it is assumed that these are reasonably close to the expected values when measurement uncertainty is not a large fraction of the measured values. The evaluation of (5.30) also requires that at least one partial derivative takes non-zero values. Should all derivatives be zero, or should function f(·) be strongly non-linear, higher order terms must be considered in the Taylor series expansion of f(·).10 Equation (5.30) can be squared, thus yielding

\left( y - \mu_y \right)^2 = \left[ \sum_{i=1}^{N} \frac{\partial f}{\partial x_i} \cdot \left( x_i - \mu_{x_i} \right) \right]^2    (5.31)

9 This assumption is generally plausible, since the considered variations of X_i about x_i are due to the possible dispersion of the measured values caused by uncertainty. Since this dispersion is generally small with respect to the considered value, almost all functions can be linearly approximated for small variations about the considered value.
10 Luckily, this is not the case in the vast majority of practical cases and, therefore, we consider here only the cases for which (5.30) holds.
By expanding the square of the polynomial in (5.31), we get

\left( y - \mu_y \right)^2 = \sum_{i=1}^{N} \left( \frac{\partial f}{\partial x_i} \right)^2 \left( x_i - \mu_{x_i} \right)^2 + 2 \sum_{i=1}^{N-1} \sum_{j=i+1}^{N} \frac{\partial f}{\partial x_i} \cdot \frac{\partial f}{\partial x_j} \cdot \left( x_i - \mu_{x_i} \right) \cdot \left( x_j - \mu_{x_j} \right)    (5.32)

By taking into account that
• E[(y − μ_y)²] = σ_y² is the variance of y;
• E[(x_i − μ_{x_i})²] = σ_{x_i}² is the variance of x_i;
• E[(x_i − μ_{x_i}) · (x_j − μ_{x_j})] = σ_{i,j} = σ_{j,i} is the covariance of x_i and x_j,
equation (5.32) can be rewritten in terms of the above expectations, also taking into account that the standard deviations are, actually, standard uncertainties, as

u_c^2(y) = \sum_{i=1}^{N} \left( \frac{\partial f}{\partial x_i} \right)^2 u^2(x_i) + 2 \sum_{i=1}^{N-1} \sum_{j=i+1}^{N} \frac{\partial f}{\partial x_i} \cdot \frac{\partial f}{\partial x_j} \cdot u(x_i, x_j)    (5.33)

where u(x_i, x_j) = u(x_j, x_i) is the estimated covariance associated with x_i and x_j.
5.5.5.2 The Law of Propagation of Uncertainty
By considering the correlation coefficient (5.18), (5.33) can be rewritten as

u_c^2(y) = \sum_{i=1}^{N} \left( \frac{\partial f}{\partial x_i} \right)^2 u^2(x_i) + 2 \sum_{i=1}^{N-1} \sum_{j=i+1}^{N} \frac{\partial f}{\partial x_i} \cdot \frac{\partial f}{\partial x_j} \cdot u(x_i) \cdot u(x_j) \cdot r(x_i, x_j)    (5.34)

Equation (5.34) is known as the law of propagation of uncertainty, as defined by the GUM [2], and provides the combined standard uncertainty u_c(y) associated with the measurement result y as a function of the standard uncertainties u(x_i) associated with the input quantities—that is, the actually measured quantities—x_i, their possible correlation, expressed by the correlation coefficient r, and the partial derivatives of function f(·) with respect to each input quantity.
It is worth noting that the partial derivatives in (5.34) quantify the sensitivity of y to variations in the input quantities x_i and show how tolerant a measurement process is to errors in measuring one of the input quantities: the higher the derivative, the lower the uncertainty associated with that quantity has to be, in order not to have a large dispersion of the measured y values.
It is also worth noting that correlation may increase or decrease the uncertainty on y according to the sign of the correlation coefficient, and also according to the sign of
the partial derivatives. This means that the same amount of correlation may increase uncertainty in one measurement procedure and decrease it in another procedure, according to the employed measurement function f (·).
In the case of independent estimates x_i and x_j, it is r(x_i, x_j) = 0 for all input quantities, and (5.34) simplifies into

u_c^2(y) = \sum_{i=1}^{N} \left( \frac{\partial f}{\partial x_i} \right)^2 u^2(x_i)    (5.35)
which represents the law of propagation of uncertainty under the particular condition of no correlation between the input quantities.
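To make the use of (5.34) and (5.35) concrete, the following sketch implements the law of propagation of uncertainty numerically; the function name, the use of Python/NumPy, and the finite-difference step for the partial derivatives are illustrative assumptions, not anything prescribed by the GUM.

```python
import numpy as np

def propagate_uncertainty(f, x, u, r=None, h=1e-6):
    """Combined standard uncertainty u_c(y) for y = f(x), following (5.34).

    f : function of a 1-D array of input estimates
    x : input estimates x_i
    u : standard uncertainties u(x_i)
    r : correlation matrix r(x_i, x_j); identity (no correlation) if omitted
    h : relative step used for the numerical partial derivatives
    """
    x, u = np.asarray(x, float), np.asarray(u, float)
    n = x.size
    r = np.eye(n) if r is None else np.asarray(r, float)

    # Numerical partial derivatives df/dx_i evaluated at the estimates x_i
    c = np.empty(n)
    for i in range(n):
        dx = np.zeros(n)
        dx[i] = h * max(abs(x[i]), 1.0)
        c[i] = (f(x + dx) - f(x - dx)) / (2 * dx[i])

    # Quadratic form sum_i sum_j c_i c_j u_i u_j r_ij, which equals (5.34)
    # since r_ii = 1 and r is symmetric
    variance = (c * u) @ r @ (c * u)
    return float(np.sqrt(variance))
```

When no correlation matrix is passed, the function reduces to the uncorrelated case (5.35).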
5.5.5.3 Implications
According to the mathematical derivation reported in Sect. 5.5.5.1, the law of propagation of uncertainty (5.34), as well as its particular case (5.35), provides the combined standard uncertainty value u_c(y) to be associated with a measurement result y. While this derivation is absolutely rigorous, under the assumptions made in Sect. 5.5.5.1, and provides the correct value of u_c(y), it must be underlined that it cannot provide any information about the distribution of the values that can be attributed to Y, but only an estimate u_c(y) of its standard deviation.
The important consequence of this consideration is that no information is available to use the combined standard uncertainty, or any of its multiples, to define an interval, about the measured value y, with a given coverage probability. To do so, the probability density function associated with Y must be known. In principle, it could be obtained by combining the probability density functions associated with X_1, X_2, ..., X_N according to (5.28). Unfortunately, this can be obtained in closed form11 only for a few classes of functions f(·) and requires, in general, cumbersome computations.
To overcome this problem, the GUM [2] suggests referring to an important theorem in probability: the Central Limit Theorem (CLT). This theorem states [5] that, if function f(·) in (5.28) is linear, none of the considered variances σ²_{X_i} dominates over the other ones,12 and N → ∞, then the probability density function associated with Y is normal, regardless of the shapes of the individual probability density functions associated with the X_i.
11 In mathematics, a result can be obtained in closed form if formulas are available that relate the result to the input variables. If this is not the case, only numerical approximations are generally available, as will be shown in Sect. 5.5.6.
12 From the practical metrological point of view, this means that all standard uncertainty values u(x_i) associated with the measured values x_i have the same order of magnitude.
Of course, N can never take an infinite value in practical situations. When N is finite, the probability density function associated with Y is not normal, but it tends to be normal, and the larger N is, the better it approximates a normal distribution (5.5).
All the above considerations have a strong implication on the possibility of evaluating an expanded uncertainty with a known or given coverage probability, starting from the combined standard uncertainty provided by the law of propagation of uncertainty (5.34) (or (5.35) in the particular case of uncorrelated input quantities). This is possible only if the following conditions are met [3]:
1. Function f(·) in (5.28) is linear.
2. The partial derivatives of f(·) with respect to all input quantities exist, and at least one of them is not zero.
3. None of the standard uncertainties u(x_i) associated with the estimates x_i of the input variables dominates the others.
4. The total number N of considered input variables is high enough to satisfy the assumptions of the Central Limit Theorem (in theory, it should be N → ∞, though, in practice, N ≥ 5 is generally high enough), so that the probability density function associated with Y can be assumed to be normal.
If the above conditions are not met, the combined standard uncertainty u_c(y) provided by (5.34) or (5.35) is still correct, but cannot be employed to define any coverage interval with known coverage probability.
Numerical Example
Let us consider a measurement result obtained as the product of two input variables. This is the typical case of the important power and energy measurements13 that are usually obtained as the product of two other quantities: voltage and current in electrical systems, torque and rotating speed on a mechanical shaft, and so on. In this example, a dc electric power measurement is considered. It is well known that, if the voltage V across a network section and the current I flowing through that section are measured, the electric power across the section is given by P = V · I, which, in this example, plays the role of (5.28).
Let us suppose that V_m = 7.497 V and I_m = 55.042 mA are the measured values for V and I, respectively. Let us also suppose that the standard uncertainty associated with V_m and I_m has been evaluated with a Type B evaluation method, as shown in Sect. 5.5.4.2, and that the following values have been obtained: u(V_m) = 2.2 mV and u(I_m) = 31 μA. The measured power value is hence P_m = V_m · I_m = 0.4126 W. In order to evaluate the combined standard uncertainty associated with this value, we need to apply (5.34) and, therefore, to evaluate the partial derivatives and the correlation coefficient.
13 It is an important measurement because the measured values are usually employed as the basis for quantifying economic transactions and, consequently, they are often the object of litigation.
It can be readily checked that
• ∂P/∂V, evaluated at (V_m, I_m), equals I_m = 55.042 mA;
• ∂P/∂I, evaluated at (V_m, I_m), equals V_m = 7.497 V.
If we now suppose that voltage and current are measured by two different instruments that have been independently calibrated, we can assume that the two measured values V_m and I_m are totally uncorrelated, so that r(V, I) = 0 and (5.35) can be used to evaluate the combined standard uncertainty instead of (5.34). Taking into account the above values, u_c(P) = 0.26 mW is obtained.
Instruments are also available with separate voltage and current channels, so that both V_m and I_m can be obtained from the same instrument. In this case, the two measured values cannot be assumed to be totally uncorrelated. On the other hand, they cannot be supposed to be totally correlated either, even if the instrument is the same, because the input circuits of the voltage and current channels are different and, hence, uncorrelated. A fair estimate of the correlation coefficient, in this case, would be r(V, I) = 0.8. In such a case, the combined standard uncertainty returned by (5.34) is u_c(P) = 0.34 mW.
It is worth noting that, in this example, the function that relates the output quantity (P) to the input quantities (V and I) is strongly non-linear, and the output quantity depends on only two other quantities. Therefore, two important assumptions of the Central Limit Theorem are not satisfied, and nothing can be inferred about the probability density function associated with the measured value P_m. The obtained combined standard uncertainty value is absolutely correct but, for this reason, it is useless in defining a coverage interval, about the measured value, with an assigned coverage probability.
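The numbers above can be reproduced with the propagation sketch given after (5.35); the snippet below is only an illustration of that calculation, and the variable names are arbitrary.

```python
# dc power example: P = V * I, with the values used in the text
power = lambda x: x[0] * x[1]          # x = [V, I]
x = [7.497, 55.042e-3]                 # measured values, in V and A
u = [2.2e-3, 31e-6]                    # standard uncertainties, in V and A

# Uncorrelated case, r(V, I) = 0: expected u_c(P) of about 0.26 mW
print(propagate_uncertainty(power, x, u))

# Correlated case, r(V, I) = 0.8: expected u_c(P) of about 0.34 mW
r = [[1.0, 0.8], [0.8, 1.0]]
print(propagate_uncertainty(power, x, u, r))
```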
5.5.6 Monte Carlo Method

Situations such as the one considered in the numerical example of Sect. 5.5.5.3 can be met quite often in practice, and the law of propagation of uncertainty (5.34) suggested by the GUM [2] does not provide the whole information that a standard uncertainty value is supposed to carry. Indeed, according to the considerations reported in Sect. 5.5.5.3, the following critical situations may occur:
– The estimate of the combined standard uncertainty may be incorrect if function f(·) in (5.28) is strongly non-linear, since the employed Taylor series approximation (5.30) does not consider the higher order terms.
– Unrealistic coverage intervals are obtained if the probability density function associated with the output quantity departs appreciably from a Gaussian distribution.
To overcome this problem, Supplement 1 to the GUM was issued in 2008 [7], suggesting to propagate distributions using Monte Carlo methods. The scope of this document is, more generally, to provide a general numerical approach, consistent
with the broad principles of the GUM, for carrying out the calculations required as part of an evaluation of measurement uncertainty. Supplement 1 also states that the approach applies to arbitrary models having a single output quantity where the input quantities are characterized by any specified probability density function [7].
Basically, the suggested method is a numerical method and works according to the following steps:
1. M samples are drawn from the probability density functions associated with each input quantity X_i in (5.28). If some correlation exists between two input quantities X_i and X_j, this correlation must be taken into account in drawing samples from the probability density functions associated with X_i and X_j.
2. M samples y_r, r = 1, ..., M, of the probability density function associated with the output quantity Y (the desired measurement result) are obtained by combining the samples drawn in the previous step, according to function f(·) in (5.28).
3. The estimate of the output quantity (the numerical value of the desired measurement result) is obtained as the mean value of the y_r samples as [7]

\bar{y} = \frac{1}{M} \sum_{r=1}^{M} y_r    (5.36)

4. The estimate of the standard uncertainty associated with \bar{y} is given by the positive square root of the sample variance of the y_r samples as [7]

u(\bar{y}) = \sqrt{ \frac{1}{M-1} \sum_{r=1}^{M} \left( y_r - \bar{y} \right)^2 }    (5.37)

5. The estimate of any coverage interval at coverage probability p is then possible by applying any of the numerical methods14 available in the literature [8], including the one suggested by the GUM Supplement 1 [7].
The most critical part in applying the Monte Carlo method is the proper choice of the number of samples M. In general, the larger M is, the higher is the accuracy with which the probability density function associated with the output quantity is approximated. However, there is no way to find, in a strict mathematical way, a minimum value of M above which the desired accuracy is obtained. The GUM Supplement 1 [7] suggests a way to relate M to the coverage probability p for the desired coverage interval:

M \geq \frac{1}{1-p} \cdot 10^4    (5.38)
14 We do not enter into the algorithmic details, because they are quite long and of little interest to the general audience. It is worth noting that the most used computational tools and software do implement such methods.
This means that, if a 95% coverage interval (p = 0.95) is desired, (5.38) suggests drawing at least M = 200,000 samples. While this is generally a reliable indication, the GUM Supplement 1 [7] also suggests an adaptive approach, consisting in implementing the Monte Carlo method with increasing values of M, until the desired results have stabilized in a statistical sense.
Numerical Example
The same numerical example of electric power measurement as the one in Sect. 5.5.5.3 is considered here, with the same numerical values. A uniform probability density function is associated with the input quantities V and I, in agreement with the assumptions made in evaluating the standard uncertainty values associated with the measured values V_m and I_m with a Type B evaluation method (see Sect. 5.5.4.2). Two different instruments are supposed to be employed for the voltage and current measurements, so that the measured values of the input quantities can be assumed to be totally uncorrelated.
The Monte Carlo method is applied, following the steps reported above, and drawing M = 200,000 uncorrelated samples from the uniform probability distributions associated with V and I. M samples of P are consequently obtained as P_r = V_r · I_r, r = 1, ..., M. From these M samples, the measured value for P is obtained from (5.36) as P_m = P̄ = 0.4126 W, and the estimated combined standard uncertainty is obtained from (5.37) as u_c(P) = 0.26 mW. These are the same values as those obtained, under the same measurement conditions, in the numerical example in Sect. 5.5.5.3, thus confirming the correctness of the combined standard uncertainty values provided by the law of propagation of uncertainty suggested by the GUM [2].
A histogram of relative frequencies can be obtained from the P_r samples of P that approximates the shape of the probability density function associated with P. This histogram is shown in Fig. 5.7. It can be readily checked that the shape of the histogram is quite far from that of a normal probability distribution. The dash-dotted line in Fig. 5.7 shows the probability density function that best interpolates the histogram. The dashed line in the same figure shows a normal distribution centered on the same mean value P̄ (shown by the vertical dotted line in Fig. 5.7) and with standard deviation equal to u_c(P). The difference between the two plots is quite evident and gives clear evidence of the error that would have been made in associating a normal probability density function with the measurement result.
The error becomes even more evident if the coverage intervals are considered. The half-amplitude of the coverage interval with 95% coverage probability provided by the Monte Carlo method is U95 = 0.48 mW, which is narrower than the value—0.52 mW—obtained by multiplying u_c(P) by a coverage factor K = 2, as suggested to obtain a coverage interval with 95% coverage probability when the probability density function is normal. Similarly, the half-amplitude of the coverage interval with 68.3% coverage probability provided by the Monte Carlo method is U68 = 0.28 mW, larger than the
combined standard uncertainty value u_c(P) = 0.26 mW provided by the law of propagation of uncertainty.
Fig. 5.7 Histogram of the relative frequencies of the samples of the probability density function associated to P. Vertical dotted line: sample mean value. Dash-dotted line: probability density function interpolating the histogram. Dashed line: normal probability distribution with same mean and variance as P
This confirms, once again, that the combined standard uncertainty provided by (5.34) can be used to estimate coverage intervals if and only if the probability density function associated with the output quantity of (5.28) is known. Unless the assumptions of the Central Limit Theorem hold, the Monte Carlo method suggested by the GUM Supplement 1 [7] must be applied to estimate such probability density function.
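A Monte Carlo sketch of this numerical example is shown below. It assumes, as in the text, uniform distributions for V and I, with half-widths derived from the stated standard uncertainties (a = u·√3 for a uniform distribution); the NumPy-based implementation and the percentile-based coverage intervals are illustrative choices, not the specific adaptive algorithm of the GUM Supplement 1.

```python
import numpy as np

rng = np.random.default_rng(seed=1)  # fixed seed, for reproducibility only
M = 200_000                          # number of trials, from (5.38) with p = 0.95

# Uniform distributions centred on the measured values; half-width a = u * sqrt(3)
Vm, u_V = 7.497, 2.2e-3
Im, u_I = 55.042e-3, 31e-6
V = rng.uniform(Vm - u_V * np.sqrt(3), Vm + u_V * np.sqrt(3), M)
I = rng.uniform(Im - u_I * np.sqrt(3), Im + u_I * np.sqrt(3), M)

P = V * I                  # propagate the samples through the model (5.28)
P_mean = P.mean()          # estimate of the measurand, as in (5.36)
u_P = P.std(ddof=1)        # standard uncertainty, as in (5.37)

# Probabilistically symmetric coverage intervals from the sample percentiles
lo95, hi95 = np.percentile(P, [2.5, 97.5])
lo68, hi68 = np.percentile(P, [15.85, 84.15])

print(f"P = {P_mean:.4f} W, u_c(P) = {u_P * 1e3:.2f} mW")
print(f"U95 = {(hi95 - lo95) / 2 * 1e3:.2f} mW, U68 = {(hi68 - lo68) / 2 * 1e3:.2f} mW")
```

Running this sketch should return values close to those quoted in the text: a mean of about 0.4126 W, a standard uncertainty of about 0.26 mW, and coverage half-amplitudes of about 0.48 mW (95%) and 0.28 mW (68.3%).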
5.6 Final Remarks

The uncertainty concept has been well established in scientific metrology since the mid-80s of the twentieth century, and has increasingly become part of industrial practice since the GUM [2] was published in 1995.
Uncertainty in measurement represents one of the three pillars of metrology [6], together with calibration and metrological traceability, which will be discussed in Chaps. 6 and 7, respectively.
The concepts and the related terminology presented in this chapter have been endorsed by the BIPM in its official documents: they are part of the VIM [1], including its version (the VIML) that also covers the typical terms of legal metrology. Reference to these documents is made by all technical standards, including the important standards related to quality assessment and control of the ISO 9000 series, whenever they refer to measurement procedures and measurement standards. The VIM [1] and the GUM [2] have been incorporated into official standard documents by several National Standards Organizations, which bind all subjects involved in measurement activities to refer to the concepts and terms included and defined in those documents. Referring to different terms might only generate confusion and should be avoided.
A clear example of what should never be done is given by a recent decree issued by the Italian Ministry for Industrial Development (MiSE) on the periodic verification of legal metrology instruments [9]. This decree requires, in the technical annexes related to some instruments, the maximum uncertainty of the instrument used to perform the periodic verification to be lower than a given value. The term maximum uncertainty is defined neither by the VIM [1] nor by the GUM [2] and, therefore, it is not clear what it means. Moreover, the same decree requires the laboratories that perform the periodic verifications to be accredited according to ISO/IEC Std. 17025 [10]. This means that, if the laboratories are compliant with ISO/IEC Std. 17025, they must refer to standard or expanded uncertainty and are, therefore, not compliant with decree 93/2017 [9]. On the contrary, if they somehow evaluate a maximum uncertainty, they are compliant with the decree, but not with ISO/IEC Std. 17025, at the risk of losing their accreditation. The potential legal consequences might be quite dramatic for the labs.
This example shows the importance of a uniform language, adopted by all parties involved in the generation and use of measurement results, in order to correctly interpret the limited amount of information a measurement result provides.
References

1. BIPM JCGM 200:2012: International vocabulary of metrology – Basic and general concepts and associated terms (VIM), 3rd edn. (2012). http://www.bipm.org/utils/common/documents/jcgm/JCGM_200_2012.pdf
2. BIPM JCGM 100:2008: Evaluation of measurement data – Guide to the expression of uncertainty in measurement (GUM), 1st edn. (2008). http://www.bipm.org/utils/common/documents/jcgm/JCGM_100_2008_E.pdf
3. Ferrero, A., Petri, D.: Measurement models and uncertainty. In: Ferrero, A., Petri, D., Carbone, P., Catelani, M. (eds.) Modern Measurements: Fundamentals and Applications, 1st edn., p. 400. Wiley-IEEE Press (2015)
4. President's Council of Advisors on Science and Technology (PCAST): Report to the President – Forensic Science in Criminal Courts: Ensuring Scientific Validity of Feature-Comparison Methods (2016). https://tinyurl.com/abv5mufh
5. Blitzstein, J.K., Hwang, J.: Introduction to Probability. Chapman and Hall - CRC Press, Boca Raton (2014)
6. Ferrero, A.: The pillars of metrology. IEEE Instrum. Meas. Mag. 18, 7–11 (2015)
7. BIPM JCGM 101:2008: Evaluation of measurement data – Supplement 1 to the "Guide to the expression of uncertainty in measurement" – Propagation of distributions using a Monte Carlo method, 1st edn. (2008). http://www.bipm.org/utils/common/documents/jcgm/JCGM_101_2008_E.pdf
8. Robert, C.P., Casella, G.: Monte Carlo Statistical Methods. Springer, New York (1999)
9. Ministero dello Sviluppo Economico: Regolamento recante la disciplina attuativa della normativa sui controlli degli strumenti di misura in servizio e sulla vigilanza sugli strumenti di misura conformi alla normativa nazionale ed europea. Decreto 21 aprile 2017, n. 93 (2017) (in Italian)
10. EN ISO/IEC 17025: General Requirements for the Competence of Testing and Calibration Laboratories (2017)
Chapter 6
Calibration
6.1 An Important Problem in Metrology

Chapter 5 has widely discussed how measurement uncertainty can be defined and evaluated, starting from the available information. It has shown that, if properly evaluated, the expanded uncertainty can be used to define a coverage interval about the measured value, in which the unknown true value of the measurand is supposed to lie with a given coverage probability [1, 2].
In principle, if the employed measurement method is reproducible, which means that it shows the property discussed in Sect. 5.3.4, and if two independent operators measure the same quantity X in different places and times, their estimates x1m and x2m, together with the corresponding evaluated expanded uncertainty values U(x1m) and U(x2m), should provide compatible1 intervals. However, a situation such as the one shown in Fig. 6.1, where the two measurement results are undoubtedly different, cannot be excluded. Indeed, the two measurement results shown in Fig. 6.1 have different measured values, different values for the expanded uncertainty, and the coverage intervals that have consequently been defined about the measured values do not overlap.
This situation may happen, in practice, for instance when different measurement results for the same quantity are obtained by the manufacturer of a part and by the customer who has purchased that part and who, based on his or her own measurement result, thinks that it does not comply with the specifications. The problem, in this case, is how to assess which measurement result is the correct one, provided that they are not both incorrect.
Unless one of the operators recognizes a macroscopic error, which is very unlikely to occur, this situation might lead to a litigation, with the associated costs. It is therefore important to find an impartial technical referee that can solve the problem by showing which
1 Later in this Chapter, it will be shown that two measurement results are compatible if the coverage intervals provided for the two results with the same coverage probability are at least partially overlapping.
measurement result, if any, is correct. These technical referees are the standards [3], and the procedure that ensures that the measurement result is correct is called calibration [4].
Fig. 6.1 Example of two independent measurement results x1m and x2m of the same quantity X that are evidently different
6.2 Comparison of Measurement Results

Before discussing calibration, it is necessary to understand how two measurement results can be compared, to assess whether they can be assumed to refer to the same quantity or not. To this aim, let us consider and discuss the four different cases shown in Fig. 6.2.
• Let us consider situation (a) in Fig. 6.2 first: the two measured values x1m and x2m and the associated expanded uncertainties U(x1m) and U(x2m) show the same values. By assuming that the probability density function associated with the measurement results is also the same, the two coverage intervals in Fig. 6.2a show the same coverage probability and are fully overlapping. Therefore, it can be concluded that the two measurement results are equal in a strict mathematical sense. It can be readily checked that this situation is very unlikely to occur, since, as proved in the previous Chaps. 4 and 5, it is very unlikely that two measurement procedures return exactly the same value. Therefore, this situation is not generally considered in metrology, and is regarded as a very particular case of the following situation.
• Figure 6.2b considers different numerical values for the measured values x1m and x2m and the associated expanded uncertainty values U(x1m) and U(x2m). However, the two coverage intervals built about x1m and x2m with the same coverage probability are partially overlapping. According to the very meaning of measurement uncertainty, as defined by the GUM [2] and recalled in the previous Chap. 5, this means that some values that can reasonably be attributed to X1 can also be attributed to X2. Therefore, the two measurement results do not provide clear enough information to state that X1 and X2 are different, and we must therefore conclude that they refer to the same quantity. On the other hand, situation (b) is different from situation (a), and therefore we cannot state that the two measurement results are equal. In metrology, when two measurement results provide partially overlapping coverage intervals, they are said
to be compatible, thus recognizing that there is no full certainty that they refer to the same quantity, but no full certainty either that they refer to different quantities. It can be readily understood that the only way to resolve this ambiguity is to reduce uncertainty.
• Figure 6.2c considers a particular case of situation (b), in which the two coverage intervals associated with x1m and x2m overlap only in one point: the upper edge of the interval associated with x1m and the lower edge of the interval associated with x2m. This means that there is at least one value that can reasonably be attributed, within the given coverage probability, to both X1 and X2, and therefore the two measurement results are still compatible, since it is impossible to state, with full certainty, that the two measurement results refer to different measurands. This is an improbable situation, but still a possible one. In such cases, a revision of the way uncertainty has been computed is advisable and, if possible, an attempt should be made to reduce uncertainty on at least one of the two measurement procedures, so that either situation (b) or (d) is found.
• Figure 6.2d shows two non-overlapping intervals. This means that none of the values that can reasonably be attributed to X1 can also be attributed to X2, and vice versa, within the given coverage probability. Therefore, we have a clear indication that the two measurement results refer to different measurands.
Fig. 6.2 Different situations for two independent measurement results x1m and x2m: a: the results are strictly equal; b and c: the two results are compatible and can be assigned to the same measurand; d: the two results are not compatible and cannot be assigned to the same measurand
The above considerations lead to the conclusion that two measurement results are compatible if the two coverage intervals provided by the expanded uncertainty overlap by at least one point. This condition can be mathematically defined if we consider the difference d = |x1 − x2| between the two measured values x1 and x2. From a strict mathematical point of view, the two measurement results are different if d > 0. However, from a metrological point of view, we can state that d > 0 only if d ≥ U(d), that is, if d is greater than its expanded uncertainty U(d). Indeed, if d
is not greater than U(d), d = 0 is one of the values that can reasonably be attributed to d and, therefore, it is impossible to state that it is surely d > 0.
Let us then evaluate U(d), assuming that u(x1) is the standard uncertainty associated with x1 and u(x2) is the standard uncertainty associated with x2. The combined standard uncertainty u_c(d) associated with d can be evaluated by applying (5.34). Taking into account that ∂d/∂x1 = 1 and ∂d/∂x2 = −1, (5.34) yields:

u_c(d) = \sqrt{ u^2(x_1) + u^2(x_2) - 2 \cdot u(x_1) \cdot u(x_2) \cdot r(x_1, x_2) }    (6.1)

where r(x1, x2) is the correlation coefficient between x1 and x2. Assuming that the probability density function associated with d is known, it is possible to evaluate the expanded uncertainty U(d) corresponding to a given coverage probability by applying a suitable coverage factor K to u_c(d), as shown in Sect. 5.5.3: U(d) = K · u_c(d).
Therefore, according to the above considerations, we can state that two measurement results x1 and x2 are compatible, and hence refer to the same measurand, if

|x_1 - x_2| \leq K \cdot \sqrt{ u^2(x_1) + u^2(x_2) - 2 \cdot u(x_1) \cdot u(x_2) \cdot r(x_1, x_2) }    (6.2)

while the same measurement results are different, and hence refer to two different measurands, if

|x_1 - x_2| > K \cdot \sqrt{ u^2(x_1) + u^2(x_2) - 2 \cdot u(x_1) \cdot u(x_2) \cdot r(x_1, x_2) }    (6.3)
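As an illustration of (6.2) and (6.3), the short function below checks the compatibility of two measurement results; the function name, the default coverage factor K = 2, and the Python implementation are illustrative assumptions, and the numbers in the usage example are purely hypothetical.

```python
import math

def compatible(x1, u1, x2, u2, r=0.0, k=2.0):
    """True if the two results are compatible according to (6.2),
    False if they are different according to (6.3)."""
    u_d = math.sqrt(u1**2 + u2**2 - 2 * u1 * u2 * r)  # combined u_c(d), (6.1)
    return abs(x1 - x2) <= k * u_d                    # expanded U(d) = K * u_c(d)

# Hypothetical example: two length measurements, in millimetres
print(compatible(10.12, 0.03, 10.18, 0.04))  # True: intervals overlap
print(compatible(10.12, 0.03, 10.45, 0.04))  # False: the results differ
```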
6.3 Calibration: The Definition

Having defined the necessary background, it is now possible to define the calibration concept. As always, let us start with the definition given by the VIM [1] in its article 2.39:
operation that, under specified conditions, in a first step, establishes a relation between the quantity values with measurement uncertainties provided by measurement standards and corresponding indications with associated measurement uncertainties and, in a second step, uses this information to establish a relation for obtaining a measurement result from an indication
To fully understand the meaning of this definition, let us suppose we measure a generic quantity x with an instrument that we wish to calibrate. According to the above definition, the first step requires establishing a relation between the quantity values with measurement uncertainties provided by measurement standards and corresponding indications with associated measurement uncertainties.
Fig. 6.3 Value of standard xs , with its uncertainty, and value—xm —measured by the instrument under calibration, with its uncertainty
In practice, this requires the availability of a standard xs for the considered quantity x, and the measurement of xs with the instrument we wish to calibrate. Standard xs, being a standard, is known together with its standard uncertainty u(xs). On the other hand, when xs is measured by the instrument to be calibrated, the instrument will return an indication (the measured value) xm, for which, according to the methods seen in Chap. 5, a standard uncertainty u(xm) can be evaluated. The resulting situation is, in general, the one shown in Fig. 6.3.
Figure 6.3 represents, graphically, the relation required by the first step of calibration. From the operative point of view, this relation can be obtained by comparing the standard value with the value measured by the instrument when the standard is presented to it as the quantity to be measured.
Let us now see how the second step can be performed, that is, how a relation can be established for obtaining a measurement result from an indication. According to Fig. 6.3, it can be readily checked that the obtained indication xm differs from the standard value xs by the following value:

Δ = xm − xs    (6.4)
The value Δ provided by (6.4) allows us to quantify how wrong the instrument under calibration is, since it provides the difference between the indication of the instrument and the expected value for the measurement result, given by the standard value. Hence, we can start building the relation needed to obtain a measurement result from the indication by correcting this indication as follows:

xr = xm − Δ    (6.5)
where xr represents the numerical value of the desired measurement result.
To complete the transition from an indication to a measurement result, it is now necessary to evaluate the standard uncertainty associated with the numerical value provided by (6.5). To do this, let us consider that this corrected value is affected by two contributions to uncertainty. A first contribution originates inside the instrument and is quantified by u(xm), while a second contribution is given by the uncertainty associated with the standard, u(xs). This last contribution reflects the lack of complete knowledge about the value of the standard and, therefore, the lack of complete knowledge about how good the correction applied by (6.5) is. It is therefore
necessary to combine these two contributions to uncertainty to evaluate the standard uncertainty associated with xr. Since the two contributions are independent, we get:

u(x_r) = \sqrt{ u^2(x_m) + u^2(x_s) }    (6.6)

Equations (6.5) and (6.6), together, represent the outcome of the second step of calibration and are the result of the calibration operation. Equation (6.6) also shows that, unless u(xs) is negligible with respect to u(xm), calibration contributes to measurement uncertainty, since the uncertainty of the employed standard must be considered too.
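A numerical sketch of the two calibration steps described above is given below, with hypothetical numbers (a 10 V standard known with u(xs) = 0.5 mV and an instrument reading 10.003 V with u(xm) = 1.2 mV); only the structure of the calculation follows (6.4)–(6.6).

```python
import math

def calibrate(x_s, u_s, x_m, u_m):
    """First step: compare the indication x_m with the standard x_s, (6.4).
    Second step: correction and combined uncertainty, (6.5) and (6.6)."""
    delta = x_m - x_s                     # deviation of the instrument, (6.4)
    x_r = x_m - delta                     # corrected measurement result, (6.5)
    u_r = math.sqrt(u_m**2 + u_s**2)      # uncertainty of the result, (6.6)
    return delta, x_r, u_r

# Hypothetical calibration point: 10 V standard, instrument reads 10.003 V
delta, x_r, u_r = calibrate(10.000, 0.5e-3, 10.003, 1.2e-3)
print(f"deviation = {delta * 1e3:.1f} mV, corrected value = {x_r:.3f} V, "
      f"u(x_r) = {u_r * 1e3:.2f} mV")
```

In use, the deviation learned at calibration is then subtracted from subsequent indications of the instrument, as in (6.5), with the combined uncertainty of (6.6) associated with each corrected result.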
6.4 Calibration: The Implications

The definition discussed in the previous Sect. 6.3 states that calibration is performed under specified conditions. We have seen, in Sect. 4.3.1 of Chap. 4, that influence quantities may affect the measurement result in a significant way. Therefore, the specified conditions must, first of all, specify the values of the influence quantities at calibration time, as well as the allowed range of variation of these same quantities during calibration.
Since a variation in the values of the influence quantities may significantly affect the measurement result and, consequently, the calibration result, this implies that the calibration result is valid only under the same conditions as the calibration ones. Particular care must then be taken in specifying the calibration conditions, so that they are as close as possible to the actual operating conditions of the instrument. However, this can only be done when the operating conditions are known and do not change, which is unlikely to occur whenever the calibrated instrument is used in the field.
Whenever the operating conditions are different from the calibration conditions, the calibration results can still be used, provided that a model exists that relates the effects of the influence quantities in the calibration conditions to the effects that the same quantities have in the operating conditions. However, as already pointed out in Sect. 4.3.2 in Chap. 4 and Sect. 5.2.1 in Chap. 5, any model provides finite knowledge of the modeled effects and, consequently, contributes to measurement uncertainty. This additional contribution to uncertainty shall, hence, be combined with the other contributions considered in (6.6), and increases calibration uncertainty.
Example
Let us consider, as an example, the calibration of a weighing instrument based on the gravimetric operating principle. When this measurement principle is used, the weight w of a mass m is actually measured, and the measured weight and the mass are linked by w = g · m, where g is the local gravity acceleration.
Calibration is performed by placing a standard mass m_s with known standard uncertainty u(m_s) on the load receptor of the instrument. The weighing instrument reading m_m = w_m/g, together with the estimated uncertainty u(m_m), is then combined with the standard mass value and uncertainty, according to the procedure explained in Sect. 6.3.
It can be readily perceived that, while the standard mass value does not depend on g, the measured value does.2 Calibration is expected to provide a correction also for a possible inaccurate evaluation of the g value considered by the weighing operation. However, this correction remains valid only if the balance is not displaced from the calibration location. It is well known that gravity changes with altitude and also depends on the location. Therefore, if the balance has been calibrated in a location with gravity g1 and is operated in a location with gravity g2, its reading has to be corrected by a factor kg = g1/g2. Since g1 and g2 are known with standard uncertainties u(g1) and u(g2), respectively, these two uncertainty values should be combined to evaluate the combined standard uncertainty u_c(kg) of the correction factor kg, and this additional contribution to uncertainty has to be combined with the calibration uncertainty evaluated at calibration time, thus increasing it.
Variations in the influence quantities are not the only source of difference between calibration and actual operating conditions. Generally, instruments operate over a whole measurement range, thus being able to measure a virtually infinite number of different measurands over that range. In principle, the whole range should be calibrated, but this would require an infinite, or at least a large, number of calibration points. However, this would be quite impractical, time- and money-wise. In general, an instrument is calibrated on a finite and limited number of points, distributed over the whole operating range in such a way that the calibration results provided at these points can be transferred to all other points of the operating range. This is also considered by the VIM [1] that, in its Note 1 to article 2.39, states: "A calibration may be expressed by a statement, calibration function, calibration diagram, calibration curve, or calibration table. In some cases, it may consist of an additive or multiplicative correction of the indication with associated measurement uncertainty".
Let us suppose that a voltmeter, in the 10 V range, is to be calibrated and that only five points (2 V, 4 V, 6 V, 8 V and 10 V) are considered in the calibration procedure. Figure 6.4 shows the deviations, with respect to the considered value, together with the calibration uncertainty. The dashed line in Fig. 6.4 shows the linear regression line that best interpolates the deviations from the reference values of the calibration points. The deviations affecting values different from the calibration ones can be obtained from this graph (calibration diagram, according to the VIM [1]) or directly from the linear regression of the calibration results (calibration function, according to the VIM [1]).
2 Actually, the measured value depends on several other influence quantities, such as, for instance, the air density that may change the air buoyancy effect. However, for the sake of simplicity, we are here considering only the effect of gravity as the simplest example of the influence of calibration conditions on the calibration result.
Fig. 6.4 Example of calibration diagram for a voltmeter in the 10 V range
The critical point, in this case, is the evaluation of the standard uncertainty to be associated with the calibration values obtained from the linear regression or, more generally, from the values obtained from the calibration function, diagram, curve or table. From a strict mathematical point of view, if the function that links the calibration results at the calibration points to the new calibration value at the considered measured value is known,3 the standard uncertainty related to the new calibration value can be obtained by applying the law of propagation of uncertainty (5.34) defined in Chap. 5 to this function. Sometimes, the practice refers to simplified approaches. For instance, if, as in the case of the example shown in Fig. 6.4, the uncertainty associated with the calibration points does not vary in a significant way, a suitable conservative approach might be that of assigning the maximum of the evaluated uncertainty values to the calibration results obtained from the calibration function.
The crucial point is that calibration results are useless, and cannot be employed, if the calibration conditions are not the same as the operating conditions; in that case, calibration results can still be used only if they can be referred to the actual operating conditions.
3 This is, in principle, the case of the linear regression shown in this example.
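The construction of a calibration function by linear regression, as in the voltmeter example of Fig. 6.4, can be sketched as follows; since the exact deviations of Fig. 6.4 are not tabulated in the text, the numerical values below are invented for illustration only, as are the function names.

```python
import numpy as np

# Hypothetical calibration data in the spirit of Fig. 6.4: deviations (in volts)
# measured at the five calibration points of a voltmeter in the 10 V range.
# These numbers are invented for illustration, not read from the figure.
cal_points = np.array([2.0, 4.0, 6.0, 8.0, 10.0])                   # V
deviations = np.array([-0.8e-3, -0.3e-3, 0.1e-3, 0.6e-3, 1.1e-3])   # V

# Calibration function: linear regression of the deviations
# (one of the forms listed in Note 1 to VIM article 2.39)
slope, intercept = np.polyfit(cal_points, deviations, 1)

def correction(reading: float) -> float:
    """Deviation expected at any reading in the range, from the regression line."""
    return slope * reading + intercept

# Corrected result for a reading that is not a calibration point, e.g. 5.3 V
reading = 5.3
print(f"corrected value = {reading - correction(reading):.4f} V")
```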
6.5 Calibration: How Often?

According to the previous Sects. 6.3 and 6.4, a calibration result provides the deviation (6.4) between the value measured by the instrument under calibration and the value of the reference standard, and the standard uncertainty (6.6) that has to be associated with the measurement result returned by the instrument, after the proper correction (6.5) has been applied.
Therefore, calibration allows one to solve the problem reported in Sect. 6.1. In other words, calibration can state whether the unknown true value of the measurand can actually be expected to lie inside the interval, about the measured value, defined by the expanded uncertainty, with the coverage probability assigned to that interval, or not.
If the result is positive, it is also possible to state that all measurement results provided by the instrument under calibration up to calibration time were correct. If, on the contrary, the result is negative, then the results provided by the instrument under calibration have become incorrect starting from an unknown time instant between the previous calibration and the present one. Since this time instant is unknown, and may have occurred at any time after the last calibration, all measurement results obtained after the last calibration should be considered incorrect. This means that, from a strict metrological point of view, any decision taken on the basis of these results should be considered incorrect, with the dramatic consequences that such a statement involves: in industrial applications, all parts manufactured using the out-of-calibration instrument should be recalled and replaced with new ones; in legal metrology applications, all sanctions applied on the basis of such measurement results should be considered invalid; and so on.
The above considerations would suggest performing calibrations as often as possible. However, such a practice would be unrealistic, both because of the high cost and because of the need to remove the instrument from operation every time it is sent to calibration. Moreover, we should also consider that the calibration result represents a picture of the instrument state at calibration time, and at the calibration lab. Strictly speaking, there is no guarantee that, during the displacement from the calibration lab to the site where the instrument is operated, nothing has changed, or that the instrument has not suffered damage that does not show in a simple inspection test. It is then imperative to check, on site, that the instrument performance has not changed. This check is called metrological verification and is part of the more general metrological confirmation process.
Example
To understand what a metrological verification is, let us consider a very simple example. Let us suppose that we perform a chemical analysis using a certified reference material supplied by a certified lab. Let us suppose that the lab has almost run out of this material and that a newly supplied stock has arrived.
Even if the acquired reference material is a certified material, it cannot be excluded that it has been damaged during shipment, and it is the lab's duty to check that its characteristics correspond to the desired ones. This can be done in a very simple way, by repeating the analysis on the same substance, the first time using the old reference material and the second time using the new one. Measurement uncertainty is evaluated for both measurement results, so that they can be compared according to the method shown in Sect. 6.2. If the two measurement results are compatible, because (6.2) is satisfied, the new reference material provides the same results as the old one and the lab can continue to operate. On the contrary, if the two measurement results are not compatible, the lab shall stop operating until a new stock of reference material is received and tested to provide results compatible with those provided previously.
Several other methods can be applied to assess whether a measuring instrument, or a piece of measuring equipment, is approaching the out-of-calibration condition. These methods are strongly related to the specific characteristics of the employed measuring system and measurement method, and cannot be analyzed in detail in this book. The principle is, however, the one expressed by the above example, and it implies metrological skills, based on experience, to understand how the employed measurement method can still be exploited to retrieve useful information about its performance.
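The comparison described in this example can be carried out with the compatibility check sketched after (6.3); the concentrations below are purely hypothetical numbers, used only to show the mechanics of the verification.

```python
# Hypothetical concentrations (mg/L) measured on the same substance with the
# old and the new reference material, with their standard uncertainties
old_result, u_old = 12.40, 0.05
new_result, u_new = 12.46, 0.06

if compatible(old_result, u_old, new_result, u_new):
    print("Results are compatible: the new reference material can be used.")
else:
    print("Results are not compatible: stop and investigate the new material.")
```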
6.6 Calibration Versus Adjustment

Calibration is often confused with the adjustment operation (or adjustment of a measuring system). However, this is totally incorrect, and the two terms shall not be confused. Adjustment is defined by the VIM [1] in its article 3.11:
set of operations carried out on a measuring system so that it provides prescribed indications corresponding to given values of a quantity to be measured.
A particular case of adjustment is the zero adjustment, defined by the VIM [1] in its article 3.12:
adjustment of a measuring system so that it provides a null indication corresponding to a zero value of a quantity to be measured.
This means that an adjustment operation is a physical operation, performed on the measuring system, to adjust its reading to a predefined indication, corresponding to the value of the quantity presented to its input terminals. Therefore, this operation is not a calibration, since it does not follow the two steps that define a calibration, according to the VIM definition [1] given in the previous Sect. 6.3.
On the contrary, calibration is a prerequisite for adjustment, as clearly stated by the VIM in its note 2 to article 3.11 [1]. Indeed, the output value to be adjusted should correspond to one of the calibration points, or should be obtained starting from the calibration diagram, or function, or curve. Since adjustment, which is a physical operation performed on the measuring system, may affect the performance of the system itself, according to note 3 to VIM article 3.11, after an adjustment, the measuring system must usually be recalibrated [1].
6.7 Calibration Versus Verification

Calibration is also often confused with verification, which is a term used in legal metrology. However, the two concepts, although they may involve similar operations, are significantly different and should not be confused. Verification, or verification of a measuring instrument, is defined by the International Vocabulary of Terms in Legal Metrology (VIML), issued by the International Organization of Legal Metrology (OIML) [5], in its article 2.09:
conformity assessment procedure (other than type evaluation) which results in the affixing of a verification mark and/or issuing of a verification certificate.
It is worth recalling that many applications of metrology have a legal aspect, such as when there is a societal need to protect both parties in a commercial exchange of a commodity or a service provided, or when measurements are used to apply a sanction [6]. For this reason, almost all countries worldwide have issued specific laws on metrology to ensure, based on metrological and financial considerations, the fairness of measurement results when they are employed in the aforementioned applications, so that none of the involved parties has to bear unfair costs. This is usually done by setting maximum admissible errors that the instruments should not exceed when operated under specified conditions.
According to the definition given by the VIML [5], the instruments used in legal metrology must be verified periodically, at prescribed intervals of time, to assess whether they exceed the maximum admissible errors. This assessment can be performed in a way similar to the first step of calibration, by measuring a reference quantity under the prescribed operating conditions and checking whether the indication provided by the instrument under verification differs from the reference value by less than the maximum admissible error. In general, this assessment procedure does not require any uncertainty evaluation, provided that the employed references are known with an uncertainty not greater than a specified value. The second step of the calibration procedure is not required, and it is not necessary to analyze the full range of the instrument.
According to these considerations, the great difference between a calibration and a verification is quite evident, as it is evident that the verification procedure, being
nothing other than a conformity assessment, cannot provide any useful information about the capability of a measurement result to quantify the measurand value. Since most litigations in commerce arise from disagreements about the measured quantity of goods, and sanctions are mostly challenged by claiming incorrect measurements, avoiding confusion between verification (generally, though not always, performed) and calibration (very rarely done) is of utmost importance to tackle the case correctly.
References

1. BIPM JCGM 200:2012: International Vocabulary of Metrology – Basic and General Concepts and Associated Terms (VIM), 3rd edn. (2012). http://www.bipm.org/utils/common/documents/jcgm/JCGM_200_2012.pdf
2. BIPM JCGM 100:2008: Evaluation of Measurement Data – Guide to the Expression of Uncertainty in Measurement (GUM), 1st edn. (2008). http://www.bipm.org/utils/common/documents/jcgm/JCGM_100_2008_E.pdf
3. Cabiati, F.: The system of units and the measurement standards. In: Ferrero, A., Petri, D., Carbone, P., Catelani, M. (eds.) Modern Measurements: Fundamentals and Applications, 2nd edn., p. 400. Wiley-IEEE Press (2015)
4. Ferrero, A.: The pillars of metrology. IEEE Instrum. Meas. Mag. 18, 7–11 (2015)
5. International Organization of Legal Metrology: International Vocabulary of Terms in Legal Metrology (VIML). OIML V1 (2013)
6. International Organization of Legal Metrology: Considerations for a Law on Metrology. OIML D1 (2012)
Chapter 7
Traceability
7.1 Where Are the Measurement Standards?

Chapter 5 showed how measurement uncertainty can be evaluated and expressed, and Chap. 6 showed how the evaluated measurement uncertainty can be validated by means of a suitable calibration procedure. According to the definition given by the VIM [1] and discussed in Sect. 6.3, calibration requires the availability of measurement standards, whose values and uncertainties must be compared with the indications and associated measurement uncertainties provided by the measuring equipment under calibration.
Chapter 6 has left a couple of open questions: which standards can be trusted as the correct reference for the quantity being measured, and where can those standards be found? To answer these questions, let us start with the definition of measurement standard given by the VIM [1] in its art. 5.1:
realization of the definition of a given quantity, with stated quantity value and associated measurement uncertainty, used as a reference.
This definition leaves a certain degree of arbitrariness in the realization of the definition of the given quantity, because this depends on the adopted measurement unit, defined by the VIM [1], in its art. 1.9, as
real scalar quantity, defined and adopted by convention, with which any other quantity of the same kind can be compared to express the ratio of the two quantities as a number.
It is evident that a reference, to be globally trusted as a reference, cannot show any arbitrariness and, consequently, the convention with which the measurement units are defined and adopted shall be universally agreed upon.
7.1.1 The SI and Its Standards

Such a consensus was reached on May 20, 1875,1 when the Meter Convention was signed [2]. The Meter Convention is a treaty, signed in 1875 by seventeen states2 and revised in 1921. As of June 2022, the States Parties to the Meter Convention, otherwise called Member States, are 63, and another 37 states are Associate States.3
The scope of the Meter Convention was, as written in the original text, to create and maintain, at their common expense, a scientific and permanent International Bureau of Weights and Measures with its headquarters in Paris. This new structure, the BIPM,4 was charged, always according to the original text, with:
1. all comparisons and verifications of the new prototypes of the metre and the kilogram;
2. the conservation of the international prototypes;
3. the periodic comparisons of national standards with the international prototypes and their official copies, as well as those of the standard thermometers;
4. the comparison of the new prototypes with the fundamental standards of non-metric weights and measures used in different countries and in the sciences;
5. the calibration and comparison of geodetic standards;
6. the comparison of precision standards and scales whose verification may be requested, either by Governments, by learned societies, or even by des artistes et des savants (Note: at the time of the signature of the Convention, "artistes" and "savants" referred, respectively, to craftsmen (artisans) who made precision standards and scales, and to individual scientists).
It is evident that the main goal of the BIPM is that of establishing a universally agreed-upon system of units and related standards to act as a reference for all Member States. At the time the Meter Convention was signed, the system of units was composed of only the units of length (the meter) and mass (the kilogram), and the related standards were the international prototypes of the meter and the kilogram.5 These two prototypes represented, at that time, the initial core of the International System of Units, or SI (Système International d'Unités, in French).
The SI has largely evolved since the first definition sanctioned by the 1st CGPM (Conférence Générale des Poids et Mesures, or General Conference on Weights and Measures) in 1889, and it underwent a major revision in 2018, when the 26th CGPM
May 20 is now celebrated as the World Metrology Day, in memory of this event. Argentina; Austria-Hungary; Belgium; Brazil; Denmark; France; Germany; Italy; Peru; Portugal; Russia; Spain; Sweden and Norway; Switzerland; Turkey; the United States of America; Venezuela. 3 The official, updated list of Member States and Associate States can be found here: https://www. bipm.org/en/about-us/member-states/. Member States are the states that contribute financially to the BIPM dotation, while Associate States do not contribute to the BIPM dotation, but pay a subscription fee to participate in the services provided by the BIPM. 4 BIPM stands for the French Bureau International des Poids et Mesures: since the original treaty was written in French, all names and organizations are in French. 5 As a historical note, the prototype of the kilogram built in 1889 represented the definition of the unit of mass until 2018, when the International System of Units has been thoroughly revised. 2
7.1 Where Are the Measurement Standards?
109
provided new definitions, based on the universal constants of physics, for the seven base units,6 thus removing the only definition—that of the kilogram—still based on a prototype that could not ensure stability in time. The present SI, as sanctioned by the 26th CGPM, is explained in the BIPM SI brochure [3]. It has been clear since the beginning that only two prototypes, one for the meter and one for the kilogram, were not enough to ensure that all measurement results could be traced back to them, thus granting that every measurement result had the same reference. Copies of these prototypes were distributed to the National Metrology Institutes (NMIs) of the Member States and compared, periodically, to the prototypes kept at the BIPM. This problem grew bigger as new base units were added to the SI, and measurement results have become more and more important in ensuring fair transactions in a commerce that has steadily evolved toward a global economy. The dissemination of standards to the NMIs was not enough in this new scenario, where it is crucial that measurement results of the same measurand obtained in different Member States are ensured to be compatible with each other. To deal with this new problem, the Mutual Recognition Arrangement—MRA was signed in 1999 [4] by the Member States. According to this arrangement, every measurement result provided by a measuring equipment calibrated against the national standard is recognized by all other Member States, as well as the Associate States, as if the measuring equipment were calibrated against their own national standards. According to this short survey on the founding treaty of metrology, it is possible to conclude that the measurement standards to be used as references in the calibration process are the practical realizations of the SI units implemented by the NMIs of each Member and Associate State. Although this provides a sound answer to the first question posed at the beginning of this chapter, it opens a practical problem: it is obviously impracticable, time-wise and money-wise, to use the national standard of a given quantity as the reference for calibrating all devices used to measure that quantity.
6 It is worth recalling that the seven base units are the meter, symbol m, unit of length, the second, symbol s, unit of time, the kilogram, symbol kg, unit of mass, the ampere, symbol A, unit of electrical current, the kelvin, symbol K, unit of thermodynamic temperature, the mole, symbol mol, unit of amount of substance, and the candela, symbol cd, unit of luminous intensity. The units for all other quantities can be derived from these seven base units.
7.1.2 The Calibration Hierarchy
National standards are, nowadays, complex and expensive experiments. It is quite obvious that they can be used only to calibrate a few, very accurate instruments, which in turn can be used to calibrate other instruments, and so on, implementing what is called a calibration hierarchy. Such a hierarchy is defined by the VIM [1] in its article 2.40:
sequence of calibrations from a reference to the final measuring system, where the outcome of each calibration depends on the outcome of the previous calibration.
According to the definition of calibration discussed in Sect. 6.3, an important calibration outcome is the evaluation of calibration uncertainty, that is, the uncertainty value that can be assigned to the measurement results provided by the calibrated instrument.7 By considering that such a value is given by (6.6), it can be readily understood that measurement uncertainty necessarily increases along the sequence of calibrations, as clearly stated by Note 1 to VIM art. 2.40. According to the definition given by the VIM, a calibration hierarchy does not necessarily imply that the reference is a national measurement standard. Consequently, the simple implementation of a calibration hierarchy does not mean, by itself, that the measurement result provided by an instrument calibrated by a laboratory placed at the bottom of the hierarchy is traced to a recognized measurement standard. The first step to ensure traceability requires, hence, that a recognized measurement standard sits at the top of the calibration hierarchy. Is this enough?
7 It is worth reminding that calibration uncertainty is valid only under the specified calibration conditions.
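To make the growth of uncertainty along a calibration hierarchy more tangible, the following minimal Python sketch combines, at each level, the standard uncertainty of the reference with the standard uncertainty added by the comparison itself in quadrature, in the spirit of (6.6); the hierarchy depth and all numerical values are invented for illustration only and do not come from any real calibration chain.

```python
import math

def next_level_uncertainty(u_reference, u_comparison):
    """Combine, in quadrature, the standard uncertainty of the reference
    standard with the standard uncertainty added by the comparison itself."""
    return math.sqrt(u_reference**2 + u_comparison**2)

# Purely illustrative standard uncertainties (arbitrary units): the national
# standard on top, then the contribution added at each level of a hypothetical
# three-level calibration hierarchy (primary lab, secondary lab, end user).
u_national_standard = 0.01
u_added_per_level = [0.02, 0.05, 0.10]

u = u_national_standard
for level, u_cmp in enumerate(u_added_per_level, start=1):
    u = next_level_uncertainty(u, u_cmp)
    print(f"level {level}: calibration uncertainty = {u:.3f}")

# The printed values can only grow: uncertainty necessarily increases along
# the sequence of calibrations, as stated by Note 1 to VIM art. 2.40.
```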
7.2 Metrological Traceability
7.2.1 Definition
Metrological traceability is a property of a measurement result defined by the VIM [1], in its art. 2.41, as: property of a measurement result whereby the result can be related to a reference through a documented unbroken chain of calibrations, each contributing to the measurement uncertainty
Note 1 to this article specifies that: “For this definition, a ‘reference’ can be a definition of a measurement unit through its practical realization, or a measurement procedure including the measurement unit for a non-ordinal quantity, or a measurement standard”. Moreover, note 2 specifies that: “metrological traceability requires an established calibration hierarchy”. It is now clear, as already stated, that metrological traceability, that is, relating a measurement result to a reference, requires the presence of a calibration hierarchy presenting, on top of it, a globally accepted measurement standard. According to note 4 to VIM art. 2.41, “for measurements with more than one input quantity in the measurement model, each of the input quantity values should itself be metrologically traceable and the calibration hierarchy involved may form a branched structure or a network. The effort involved in establishing metrological traceability for each
input quantity value should be commensurate with its relative contribution to the measurement result”. This relative contribution can be readily obtained by considering how the different contributions to uncertainty combine, according to the law of propagation of uncertainty (5.34) defined in Sect. 5.5.5.2 in Chap. 5. Input quantities that show a high sensitivity, quantified by the partial derivatives in (5.34), require a lower calibration uncertainty than those that show little sensitivity and should, therefore, be calibrated by a laboratory that, in the calibration hierarchy, is closer to the reference measurement standard. The above definition of metrological traceability, however, implies more requirements than an established calibration hierarchy. The additional keywords related to these additional requirements are documented and unbroken. Unbroken means that every element of the calibration chain shall show the same traceability property as all other elements. In particular, any instrument belonging to the chain of calibration shall be calibrated by a laboratory that belongs to the same calibration hierarchy and is one step closer to the reference measurement standard. Documented requires that every step in the calibration process is adequately documented and evidence is made available that every action performed during calibration is adequate to ensure that the result can actually be related to the reference measurement standard and that the integrity (unbroken) of the chain of calibration has been preserved. It should now be clear that metrological traceability represents one of the three pillars—the other two being uncertainty and calibration—on which metrology is based [5]: if one of these pillars collapses, the whole system collapses. The importance of this third pillar in establishing whether a measurement result has been correctly obtained and the associated measurement uncertainty correctly evaluated on the basis of a proper calibration of the employed measurement equipment is so relevant—especially in the forensic domain—that the validation of the whole system shall be regulated by an officially established accreditation system.
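As an illustration of this point, the short sketch below applies the law of propagation of uncertainty for uncorrelated inputs, u_c² = Σ_i (∂f/∂x_i)² u²(x_i), to a hypothetical two-input measurement model; the model y = x1·x2² and all numerical values are assumptions chosen only to show how the sensitivity coefficients weight the input uncertainties, and (5.34) itself is not reproduced here.

```python
import math

# Hypothetical measurement model y = f(x1, x2) = x1 * x2**2 with uncorrelated
# input quantities; values and uncertainties are invented for illustration.
x1, u_x1 = 2.0, 0.01
x2, u_x2 = 3.0, 0.01

c1 = x2**2               # sensitivity coefficient df/dx1
c2 = 2.0 * x1 * x2       # sensitivity coefficient df/dx2

contrib1 = (c1 * u_x1)**2
contrib2 = (c2 * u_x2)**2
u_c = math.sqrt(contrib1 + contrib2)   # combined standard uncertainty

print(f"combined standard uncertainty: {u_c:.4f}")
print(f"relative weight of x1: {contrib1 / (contrib1 + contrib2):.0%}")
print(f"relative weight of x2: {contrib2 / (contrib1 + contrib2):.0%}")
# The input with the larger sensitivity coefficient (here x2) dominates the
# uncertainty budget, so its traceability deserves a calibration closer to
# the reference measurement standard.
```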
7.2.2 The Accreditation System
The first step toward a recognized accreditation system was set, once again, by the MRA [4]: it stated that the NMIs of the Member and Associate States participating in the MRA were required to satisfy the requirements set by the ISO/IEC Guide 25:1990, General requirements for the competence of calibration and testing laboratories. This guide, since evolved into the ISO/IEC Std. 17025:2017 [6], establishes the requirements that a calibration laboratory shall fulfill to prove its competence and capability in calibrating measuring equipment and ensuring that the calibration result can be related to the national standard. The MRA involves only the NMIs and establishes the requirements that these institutes must fulfill to be mutually recognized. Therefore, it limits the application of metrological traceability only to ensure that each national standard is
metrologically compatible with the corresponding standards of the other Member and Associate States. However, according to the above considerations, NMIs cannot satisfy the calibration needs of all measurement laboratories in their nations. On the other hand, a calibration hierarchy composed of calibration laboratories that fulfill the same requirements as those fulfilled by the NMIs and specified by the ISO/IEC Std. 17025:2017 [6] is, in principle, capable of ensuring traceability to the national measurement standards and satisfying the calibration needs of the nation. The MRA signatory States have consequently established, under the scientific supervision of their NMIs, accreditation bodies aimed at verifying whether a laboratory fulfills, for a specific quantity, the calibration competence required by the ISO/IEC Std. 17025:2017 [6] and, in case it does, qualify it as a 17025-accredited laboratory for that specific quantity, thus certifying that the measurement results provided by that laboratory are metrologically traceable to the national measurement standard. The European Union has enacted a slightly stricter regulation, Regulation 765/2008 [7], which establishes that each EU State shall have a single national accreditation body, recognized within the framework of the European co-operation for Accreditation (EA). The task of these accreditation bodies is always that of verifying a laboratory’s compliance with the ISO/IEC Std. 17025:2017 [6]. Therefore, while, for example, the USA has several accreditation bodies, such as the American Association for Laboratory Accreditation (A2LA), the Perry Johnson Laboratory Accreditation (PJLA), and the International Accreditation Service, Inc. (IAS), the European countries have only one accreditation body each, such as the United Kingdom Accreditation Service (UKAS) in the UK, the Comité Français d’Accréditation (COFRAC) in France, the Deutsche Akkreditierungsstelle GmbH (DAkkS) in Germany, the Ente Italiano di Accreditamento (Accredia) in Italy, and the Entidad Nacional de Acreditación (ENAC) in Spain. It can be concluded that the accredited laboratories ensure the documented unbroken chain of calibrations required to relate a measurement result to the national reference standard and, consequently, only the instruments that are calibrated by such laboratories ensure the metrological traceability of their measurement results. This does not necessarily mean that the measurement results provided by instruments calibrated by non-accredited laboratories are not valid. It means that only the instruments calibrated by accredited laboratories ensure that the measurement results they provide can be validly traced to the national reference standard. It is therefore advisable that all instruments employed in a critical field, such as that of forensic metrology and, more generally, forensic science, are calibrated by an accredited laboratory and used by a qualified operator who knows how to exploit the data reported in the calibration certificate to establish a relation for obtaining a measurement result from an indication, according to the second step of the calibration operation discussed in Chap. 6. This is the only way to ensure the scientific validity of the obtained measurement result and to use it to make proper decisions.
References
1. BIPM JCGM 200:2012: International vocabulary of metrology—basic and general concepts and associated terms (VIM), 3rd edn. (2012). http://www.bipm.org/utils/common/documents/jcgm/JCGM_200_2012.pdf
2. BIPM: The Metre Convention and Annexed Regulations, 1921 revision of the original 1875 version. https://www.bipm.org/utils/common/documents/official/metre-convention.pdf
3. BIPM: The International System of Units (SI), 9th edn. (2019). https://www.bipm.org/utils/common/pdf/si-brochure/SI-Brochure-9-EN.pdf
4. BIPM CIPM: Mutual recognition of national measurement standards and of calibration and measurement certificates issued by national metrology institutes—MRA (1999). https://tinyurl.com/59f3byu5
5. Ferrero, A.: The pillars of metrology. IEEE Instrum. Meas. Mag. 18, 7–11 (2015)
6. EN ISO/IEC 17025: General requirements for the competence of testing and calibration laboratories (2017)
7. Regulation (EC) No 765/2008 of the European Parliament and of the Council setting out the requirements for accreditation and market surveillance relating to the marketing of products and repealing Regulation (EEC) No 339/93 (2008). https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=celex%3A32008R0765
Chapter 8
Uncertainty and Conscious Decisions
8.1 Measurement Results in Decision-Making Processes
The discussion that led, in Chap. 4, to the definition of a measurement model came to the conclusion, in Sect. 4.3, that a measuring activity is not a self-motivating activity, but, on the contrary, it is a goal-oriented activity, where the final goal is that of providing relevant input elements to decision-making processes aimed at identifying the best actions needed to achieve established goals while satisfying given conditions. In the forensic field, decisions generally involve the comparison of a measurement result with a given threshold. It might be the comparison of an element’s concentration in air, water, or blood with a permissible limit value, the comparison of a dimension or any other physical property with a specification value, or the comparison of a feature with a reference one, just to mention the most common ones. It is a common belief that such a comparison is a trivial operation that is not supposed to raise any significant doubt. Indeed, it is generally believed that the only needed operation is that of comparing two numbers: the measured value and the threshold value, as shown in Fig. 8.1. According to Fig. 8.1a, the measured value xm of quantity x is surely greater than the threshold value t, while according to Fig. 8.1b, the measured value xm is surely lower than the threshold value t. Who can have the least doubt about these two statements? Who can ever doubt that a decision made according to the result of these comparisons can be wrong? Indeed, such a comparison is fatally flawed, since it has been performed without considering the basic fundamental concept in metrology, explained in Chap. 4: the result of a measurement, according to the VIM [1] definition recalled in Sect. 4.2.1.5, is a set of quantity values. Therefore, a single measured value cannot be considered as a measurement result, that is, it does not provide enough information about the measurand to ensure that a decision, taken on the basis of this single piece of information, is correct. Hence, it can be readily understood that the comparison of a measurement result with a given threshold can never be such a trivial operation as the comparison between
Fig. 8.1 Example of comparison between a measured value xm of quantity x and a threshold value t. a The measured value appears to be certainly greater than the threshold. b The measured value appears to be certainly lower than the threshold
two numerical values. The fact that the measurement result is not a single quantity value, but a set of such values, makes this operation much more complicated, since it is now necessary to compare a single value—the threshold—with a set of values— the measurement result. Since, as seen in Chap. 5, this set of values can be identified and represented in terms of measurement uncertainty, uncertainty must be taken into account in the comparison operation.
8.2 Decisions in the Presence of Uncertainty
Section 5.4 of Chap. 5 has widely discussed and explained how the GUM [2] defines the uncertainty concept and its properties. In particular, the GUM requirement to provide an interval about the measurement result that may be expected to encompass a large fraction of the distribution of values that could reasonably be attributed to the quantity subject to measurement was presented, and the subsequent GUM clarification that the ideal method for evaluating and expressing uncertainty in measurement should be capable of readily providing such an interval, in particular, one with a coverage probability or level of confidence that corresponds in a realistic way with that required was also widely discussed, as the basis of the GUM [2] probabilistic approach. It has been proven, in Sect. 5.5.3, that the expanded uncertainty is defined in such a way that it can readily provide such an interval and that, if the probability distribution characterized by the measurement result is known or can be reliably assumed, the coverage probability associated with the obtained interval is also known. Therefore, if the measurement result is expressed in terms of the interval [xm − U; xm + U]
Fig. 8.2 Example of comparison between a measured value xm of quantity x and a threshold value t, considering uncertainty. a The measured value appears to be certainly greater than the threshold. b The measured value appears to be certainly lower than the threshold. c The measured value cannot be considered certainly above the threshold. d The measured value cannot be considered certainly below the threshold
provided by the expanded uncertainty U about the measured value xm, instead of the single measured value xm, the very simple situation depicted in Fig. 8.1 turns into the more complex one depicted in Fig. 8.2. It can be immediately recognized that the situations shown in Fig. 8.2a and b do not pose any particular problem. The whole interval of values that “may be expected to encompass a large fraction of the distribution of values that could reasonably be attributed to the quantity subject to measurement” is fully above (Fig. 8.2a) or fully below (Fig. 8.2b) threshold t. Since the probability that the measurand value falls outside the interval [xm − U; xm + U] is 1 − p, where p is the coverage probability assigned to the interval [xm − U; xm + U], if the associated probability distribution is considered symmetrical with respect to xm,1 the probability that some of the values that could be attributed to the measurand fall below (or above, in the case of Fig. 8.2b) the threshold is (1 − p)/2. When, as is customary in measurement practice, U is evaluated in such a way that p = 95%, it can be readily checked that the probability that the measurand value falls below (above) the threshold is 2.5% if and only if one of the edges of the interval [xm − U; xm + U] touches the threshold, and quickly decreases as the whole
interval moves away from the threshold. Therefore, it is possible to state that, when the situations shown in Fig. 8.2a and b occur, the measurement result is above or below the threshold, respectively, beyond any reasonable doubt. On the other hand, in the situations shown in Fig. 8.2c and d, the threshold value lies inside the interval [xm − U; xm + U]. Therefore, values below (case (c)) or above (case (d)) the threshold can still be attributed to the measurand, even if the measured value xm falls above (case (c)) or below (case (d)) the threshold. Intuitively, the probability that the measurand value belongs to these values increases as xm approaches t. In the cases shown in Figs. 8.2c and d, therefore, there is no certainty, on the basis of the measurement result, that the measurand, that is, the quantity subject to measurement, is above or below the threshold, respectively. Consequently, making a decision beyond any reasonable doubt is not possible, unless the doubt is quantified.
1 This is the situation in the vast majority of the practical cases and can be considered a good approximation in all cases.
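The following sketch illustrates this statement numerically, under the assumption of a normal distribution and a coverage factor k = 1.96 (p of about 95%); the measured value and standard uncertainty are arbitrary illustrative numbers, not taken from any example in the book.

```python
import math

def normal_cdf(z):
    """Cumulative distribution function of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

k = 1.96                 # coverage factor giving p of about 95% (normal case)
x_m, u = 1.00, 0.05      # illustrative measured value and standard uncertainty
U = k * u                # expanded uncertainty

# Thresholds placed at the lower edge of [x_m - U, x_m + U] and further away.
for t in (x_m - U, x_m - 1.5 * U, x_m - 2.0 * U):
    p_below = normal_cdf((t - x_m) / u)
    print(f"threshold {t:.3f}: Pr(x <= t) = {p_below:.3%}")

# About 2.5% when the edge of the interval touches the threshold, and quickly
# smaller as the whole interval moves away from it.
```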
8.3 The Probability of Wrong Decision
It is quite evident that, in situations such as the ones shown in Figs. 8.2c and d, where values that can be attributed to the measurand fall both above and below the threshold, any decision based on such measurement results shows an implicit probability (or risk) of being a wrong decision. The interesting point is that a correct evaluation of measurement uncertainty is capable of quantifying this probability and, hence, the doubt that the decision is wrong. However, the simple knowledge that the threshold falls inside the interval [xm − U; xm + U] provided by the expanded uncertainty U about the measured value xm, and of the position of the threshold inside this interval, is not enough to quantify the doubt. To fully understand this point, it is important to recall some basic concepts reported in Chap. 5. Section 5.5.3 showed that it is possible to associate a coverage probability to the interval [xm − U; xm + U] if and only if explicit or implicit assumptions can be made regarding the probability distribution characterized by the measurement result and its combined standard uncertainty. This means that this probability distribution must be known or assumed, by scientific judgment based on all of the available information on the possible variability of the measurand [2]. Once the probability density function p(x) representing the measurement result is known, the probability that the measurand falls below a given threshold t is obtained as [3]
$$\Pr(x \le t) = \int_{-\infty}^{t} p(x)\,dx \qquad (8.1)$$
Equation (8.1) also has a geometrical interpretation: it is the area subtended by the function p(x) from −∞ to t, as shown in Fig. 8.3, where the distribution of the values that can be attributed to measurand x is assumed to be normal. The probability that the measurand value falls below the threshold is given by the gray area.
Fig. 8.3 Example of comparison with a threshold. The measurement result is represented by a normal probability distribution about the measured value xm = 1.56, with an 11% relative standard uncertainty. The threshold value is t = 1.5. The probability that x ≤ t is given by the gray area
The integral in (8.1) is not always easy to evaluate. It is generally simpler to refer to the cumulative probability distribution function of a random variable X, defined as [3]
$$F_X(x) = \int_{-\infty}^{x} p(x)\,dx \qquad (8.2)$$
since the equation of F_X(x) is generally known for the most common probability density functions. By comparing (8.2) with (8.1), it can be readily obtained that
$$\Pr(x \le t) = F_X(t) \qquad (8.3)$$
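The equivalence of (8.1) and (8.3) can be checked numerically with the short sketch below, which uses the same values as the example of Fig. 8.3 (xm = 1.56, 11% relative standard uncertainty, t = 1.5); the midpoint-rule integration and its settings are illustrative choices, not part of the original example.

```python
import math

x_m = 1.56
u = 0.11 * x_m           # standard uncertainty: 11% of the measured value
t = 1.5                  # threshold

def p(x, mu=x_m, sigma=u):
    """Normal probability density function representing the measurement result."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma**2)) / math.sqrt(2 * math.pi * sigma**2)

# (8.1): numerical integration of the pdf from (practically) -infinity to t,
# using a plain midpoint rule over a wide interval.
a, n = x_m - 10 * u, 100_000
h = (t - a) / n
integral = sum(p(a + (i + 0.5) * h) for i in range(n)) * h

# (8.3): the probability is the value of the cumulative distribution function at t.
F_t = 0.5 * (1.0 + math.erf((t - x_m) / (math.sqrt(2.0) * u)))

print(f"Pr(x <= t) by integrating (8.1): {integral:.4f}")
print(f"Pr(x <= t) from the CDF (8.3):   {F_t:.4f}")
```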
Figure 8.4 shows the cumulative probability distribution function for the same example as the one shown in Fig. 8.3. In this case, the desired probability is given by the value taken by the function at the threshold value. The probability Pr (x > t) that the measurand value is greater than the threshold is readily obtained as Pr (x > t) = 1 − FX (t). It is also worth noting that, according to the concepts explained in Chap. 5 and the fundamental principles of probability [3], the obtained probability value is strongly dependent on the probability distribution considered to represent the measurement result. Therefore, the assumptions made to identify this probability distribution are of critical importance in the correct representation of the measurement result and the
Fig. 8.4 Example of comparison with a threshold using the cumulative probability distribution function. The same measurement result and threshold as those in Fig. 8.3 are considered. The probability that x ≤ t is the function value in t
correct evaluation of the risk of wrong decision. The need for a scientific judgment based on all of the available information on the possible variability of the measurand, and hence for experienced operators, is reaffirmed.
Numerical example
Let us assume that a measurand x must be compared with a threshold t. For the sake of simplicity, let us neglect the measurement unit and only assume that x and t are expressed in the same unit. Let us also assume that a measured value xm = 1.56 is obtained by the employed measurement system, and that the standard uncertainty is evaluated as u(x) = 0.11 · xm, that is, the standard uncertainty is assumed to be 11% of the measured value. The distribution of the values that can be attributed to the measurand is supposed to be normal, thus expressed by the same equation as (5.5)
$$p(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}} \qquad (8.4)$$
where μ = xm and σ = u(x). Let us now assume that a decision must be taken on whether the measured value is above the threshold when t = 1.5. This is the situation shown in Fig. 8.3. Obviously, if the sole numerical values are taken into account, the decision would undoubtedly be that the measured value is above the threshold. However, the gray area in Fig. 8.3 shows that there is a non-zero probability that some of the values that can be attributed to the measurand are not greater than t. To evaluate this probability,
let us refer to Fig. 8.4: the desired probability is the value assumed by the cumulative probability distribution function at x = t (dashed line in Fig. 8.4). The cumulative probability distribution function associated with (8.4) is given by [3]
$$F_X(x) = \frac{1}{2}\left[1 + \operatorname{erf}\!\left(\frac{x-\mu}{\sqrt{2}\cdot\sigma}\right)\right] \qquad (8.5)$$
where
$$\operatorname{erf}(z) = \frac{2}{\sqrt{\pi}} \sum_{n=0}^{\infty} \frac{(-1)^n \cdot z^{2n+1}}{n!\cdot(2n+1)} \qquad (8.6)$$
From a strictly theoretical point of view, (8.5) cannot be computed exactly, because of the infinite summation in (8.6). However, (8.6) is generally rapidly converging, so that approximating it with a few terms2 provides a very good approximation of F_X(x). If the given numerical values are used in (8.6) and (8.5), the probability that measurand x falls below threshold t turns out to be 36%, showing that there is a quite reasonable doubt that, in the considered example, the measurand does not exceed the threshold, even if the measured value xm does.
To show also the effect of different probability distributions, the same example as the one above is considered again, this time assuming that the distribution of the values that can be assigned to the measurand is uniform, over approximately the same support as the one considered above. Therefore, the possible measurand values are now supposed to be uniformly distributed over the interval [xm − 3u(x); xm + 3u(x)], so that, in this case, the probability distribution is given by
$$p(x) = \begin{cases} \dfrac{1}{6u(x)} & \text{for } x_m - 3u(x) \le x \le x_m + 3u(x) \\ 0 & \text{for } x < x_m - 3u(x) \text{ and } x > x_m + 3u(x) \end{cases} \qquad (8.7)$$
The obtained probability distribution is shown in Fig. 8.5, where the probability that the measurand falls below threshold t = 1.5 is again represented by the gray area. In this simple case, the gray area can be evaluated by means of a simple proportion. However, to be consistent with the general formulation, let us evaluate the cumulative probability distribution function also in this case. It is readily obtained from (8.2) as
$$F_X(x) = \begin{cases} \dfrac{1}{6u(x)}\cdot x - \dfrac{x_m - 3u(x)}{6u(x)} & \text{for } x_m - 3u(x) \le x \le x_m + 3u(x) \\ 0 & \text{for } x < x_m - 3u(x) \\ 1 & \text{for } x > x_m + 3u(x) \end{cases} \qquad (8.8)$$
and is shown in Fig. 8.6.
2 In general, limiting the summation to 5–10 terms provides very good approximations.
Fig. 8.5 Example of comparison with a threshold. The measurement result is represented by a uniform probability distribution about the measured value xm = 1.56. The threshold value is t = 1.5. The probability that x ≤ t is given by the gray area
Fig. 8.6 Example of comparison with a threshold using the cumulative probability distribution function. The same measurement result and threshold as those in Fig. 8.5 are considered. The probability that x ≤ t is the function value in t
If the given numerical values are used in (8.8), the probability that measurand x falls below threshold t now turns out to be 44%, showing that there is, once again, a quite reasonable doubt that, in the considered example, the measurand does not exceed the threshold, even if the measured value xm does, and that this doubt depends on the considered probability distribution in a quite significant way.
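The two probabilities of this numerical example can be reproduced with the following minimal sketch, which uses the closed-form normal cumulative distribution function instead of the truncated series (8.6); apart from that substitution, all values are those given above.

```python
import math

x_m = 1.56
u = 0.11 * x_m        # standard uncertainty
t = 1.5               # threshold

# Normal assumption, using (8.5) with the exact error function.
p_normal = 0.5 * (1.0 + math.erf((t - x_m) / (math.sqrt(2.0) * u)))

# Uniform assumption over [x_m - 3u, x_m + 3u], i.e. the CDF (8.8) evaluated at t.
low, width = x_m - 3 * u, 6 * u
p_uniform = min(max((t - low) / width, 0.0), 1.0)

print(f"Pr(x <= t), normal distribution:  {p_normal:.0%}")   # about 36%
print(f"Pr(x <= t), uniform distribution: {p_uniform:.0%}")  # about 44%
```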
8.4 Concluding Remarks
The considerations reported in this chapter can be considered as conclusive evidence of the importance of a correct metrological approach to any decision-making process in which the decision is based on a measurement result. It was shown that, in the absence of uncertainty evaluation and specification, decisions appear to be always taken with full certainty, as shown by the two cases of Fig. 8.1. On the contrary, when uncertainty is correctly considered, decisions apparently not involving any doubt can turn into much more troublesome decisions, as in cases (c) and (d) of Fig. 8.2. If the measurement results in these two cases are presented to the trier of fact without considering uncertainty, a significant amount of information on the presence of doubt is kept hidden. If uncertainty is correctly evaluated and the distribution of the values that can be attributed to the measurand is also correctly estimated, it is possible to detect whether a decision based on the obtained measurement result involves a doubt about its correctness, and the doubt itself can be quantified. The reported numerical example, taken from a real case that will be discussed in Chap. 9, shows that the doubt can be quite reasonable and may lead to exoneration or to a less serious verdict (as in the real case), while the same decision, absent uncertainty, appears doubtless and will almost certainly lead to a guilty verdict and, consequently, to a possible miscarriage of justice. Therefore, it can be concluded that metrology must be part of the technical expert’s competence, to provide the trier of fact with the whole available quantitative information, possible doubts included, and the fundamental concepts of metrology should be part of the background of the trier of fact, so that he or she can correctly interpret the pieces of information coming from the technical expert. Furthermore, since the concepts of metrology are universal and uncertainty evaluation is based on strict, standardized, and accredited procedures, as shown in the previous chapters, the quantitative evaluation of the doubt of a wrong decision discussed in this chapter does not leave room for subjective evaluations and interpretations, as recommended by the PCAST report [4].
References
1. BIPM JCGM 200:2012: International vocabulary of metrology—basic and general concepts and associated terms (VIM), 3rd edn. (2012). http://www.bipm.org/utils/common/documents/jcgm/JCGM_200_2012.pdf
2. BIPM JCGM 100:2008: Evaluation of measurement data—guide to the expression of uncertainty in measurement (GUM), 1st edn. (2008). http://www.bipm.org/utils/common/documents/jcgm/JCGM_100_2008_E.pdf
3. Blitzstein, J.K., Hwang, J.: Introduction to Probability. Chapman and Hall—CRC Press, Boca Raton, FL (2014)
4. President’s Council of Advisors on Science and Technology (PCAST): Report to the president—forensic science in criminal courts: ensuring scientific validity of feature-comparison methods (2016). https://tinyurl.com/abv5mufh
Part III
Practical Cases
Chapter 9
Breath and Blood Alcohol Concentration Measurement in DUI Cases
9.1 DUI Cases
Accidents and injuries caused by drivers who were driving under the influence of alcohol or other drugs have represented a problem since the very beginning of the automobile era, about 150 years ago [1]. Therefore, assessing whether an individual was intoxicated and whether the level of intoxication was compatible with driving a vehicle in a safe way very soon became a forensic problem [1]. While it was soon clear that the behavior of an individual, after having consumed alcohol or drugs, was determined by his or her metabolism, and that the quantity of the consumed substance could be measured by its concentration in blood, it took several decades to identify a reliable method to measure it, at the beginning limited only to blood alcohol concentration (BAC) determination. Moreover, the direct method of BAC determination requires taking blood samples, and hence qualified medical personnel and lab tests that do not generally provide immediate results. Therefore, such a method was not suitable for quick, roadside tests run by road or highway patrols charged with monitoring traffic and identifying and sanctioning misbehaving drivers. For this reason, methods involving the analysis of other bodily fluids, such as urine, sweat, and breath, that could be tested without the need for qualified medical personnel have been deeply investigated since the beginning [1]. Breath alcohol concentration (BrAC) determination has appeared to be the most reliable and most rapid method to indirectly measure BAC since the 1940s, when portable instruments capable of detecting BrAC started to be produced. However, the reliability of BrAC methods to measure BAC has been challenged since the beginning, and the discussion on whether BrAC measurement may provide a good estimation of BAC is still ongoing [2]. Indeed, BrAC measurement represents a perfect case study for forensic metrology methods, since the desired final result—the BAC measurement—is significantly affected by all contributions to uncertainty, from definitional uncertainty to instrumental uncertainty. It represents also a perfect case study to show how definitional uncertainty can provide a good quantitative
estimation of the foundational validity defined by the PCAST report [3], and instrumental uncertainty can provide a good quantitative estimation of the validity as applied, also defined by the PCAST report [3].
9.2 The Measurement Principle
The method for measuring BAC through a BrAC measurement is based on a well-known physical law, Henry’s law, which dates back to 1803 [4]. This law is very simple in its formulation and states that the amount of a gas dissolved in a liquid is proportional to its partial pressure1 above the liquid. The proportionality factor is called Henry’s law constant. According to this law, if a gas and a liquid are in equilibrium in a closed recipient, it is possible to know the concentration of a substance in the liquid by measuring its concentration in the gas above the liquid. If the concentration in the gas is C_g and the concentration in the liquid is C_l, the two quantities are related by the following relationship:
$$C_l = k_H \cdot C_g \qquad (9.1)$$
where k_H is Henry’s law constant for that specific liquid and that specific gas. Henry’s law can be effectively used to model how oxygen (O2) and carbon dioxide (CO2) are exchanged between breathed air and blood in our lungs. Carbon dioxide leaves the blood, since its partial pressure in blood is higher than that in the alveolar air. On the other hand, inhaled alveolar air is rich in oxygen, which is released to the blood, since the oxygen partial pressure in the alveolar air is higher than that in blood. The same principle can be applied to any other substance dissolved in blood, such as, when alcohol intoxication is considered, ethanol (CH3CH2OH), or ethyl alcohol, which is the main intoxicating element of alcoholic beverages. If the substance is present in the blood, but not in the inhaled air, it can be found in the exhaled air in a quantity that is proportional to the quantity in the blood, according to Henry’s law constant. This gas exchange is schematically depicted in Fig. 9.1. When a mixture of human blood and ethanol is considered, Henry’s law constant was proven to be 2100. This means that, if the mixture of blood and ethanol is left in equilibrium with the air above it, for every 2100 molecules of ethanol in the mixture, 1 molecule will migrate into the air. Therefore, if a concentration of 1 mg/l of ethanol is measured in the air, the concentration in blood would be 2.1 g/l, that is, 2100 times greater than that in the air.
1 It is worth remembering that, in a mixture of gases, the partial pressure of each constituent gas is the pressure of that constituent gas if it alone occupied the same volume as the whole mixture at the same temperature.
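A minimal sketch of the conversion implied by (9.1) and the 2100:1 blood/breath ratio is given below; the function name and the unit handling (mg/l for breath, g/l for blood) are illustrative choices mirroring the example just given.

```python
K_H = 2100   # theoretical blood/breath partition ratio for ethanol

def bac_from_brac(brac_mg_per_l, k=K_H):
    """Convert a breath alcohol concentration (mg/l) into a blood alcohol
    concentration (g/l) by direct application of Henry's law (9.1)."""
    return brac_mg_per_l * k / 1000.0   # mg/l of breath -> g/l of blood

print(bac_from_brac(1.0))   # 2.1 g/l, as in the example above
```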
Fig. 9.1 Gas exchange inside a pulmonary alveolus
Assuming that the same physical phenomenon occurs inside the pulmonary alveoli, it is possible to state that BAC can be obtained from the measured BrAC by simply multiplying it by 2100. Indeed, this ratio is used in North America. While the measurement principle, being based on a well-known physical law, is scientifically sound, the direct application of the 2100:1 ratio gives rise to reasonable doubts, because the environment in which the gas exchange occurs—the pulmonary alveoli—is significantly different from that of a lab experiment, in dimensions and in the different influence quantities that may affect the exchange, such as the presence of the alveolus permeable wall with different thickness and permeability, different pressure and temperature conditions, unknown turbulence in both blood and breath flow, etc. Taking into account the effect of all influence quantities in a reliable model is almost impossible, since they depend on the anatomy of the different individuals, their age, and their health status. Consequently, as seen in Sect. 5.2.1 of Chap. 5, the imperfect knowledge of the measurement model and of the influence quantities and their interactions with the measurand gives rise to the definitional uncertainty contribution, which must be evaluated to ensure the validity of the measured values.
9.3 Definitional Uncertainty Evaluation
To assess the foundational validity [3] of this method and quantify it in terms of definitional uncertainty, two main questions should be answered:
1. Does the BAC/BrAC ratio remain the same in a given individual, throughout the different (absorption, distribution, and elimination) stages of the alcohol metabolic process?
2. Does the BAC/BrAC ratio remain constant from individual to individual, or, if it does not, can this variability be quantified and characterized by a probability distribution?
Different controlled drinking experiments have been conducted on volunteers to find an answer to the above questions. The best known experiment was performed by Jones and Andersson [5] on 18 volunteers (9 men and 9 women), who were asked to drink the same amount of ethanol per kg of body weight. Venous blood samples were taken from an indwelling catheter at given times after the start of drinking. At the same time, the subjects provided breath samples into the employed breath analyzers, so that the measured BAC and BrAC could be compared and their ratio evaluated. The results [5], also confirmed by other experiments [6], show that the BAC/BrAC ratio has a similar pattern in all subjects, though with different values. The ratio remains fairly constant in the elimination stage, while it shows significant differences in the earlier stages of absorption and distribution, thus giving evidence of the problems that the presence of mouth alcohol might create, if not properly dealt with, as discussed later. The result of this study proved that, especially in the elimination stage, BAC can be suitably obtained by means of a BrAC measurement by simply multiplying it by a constant. However, it also gave clear evidence that this constant is not the same for all subjects, thus confirming that the second question needs a reliable answer to provide a correct metrological interpretation of the BrAC measurements. Several experiments [7–9] have been conducted to find this answer. The best known was conducted, once again, by Jones and Andersson [7] on a sample of 793 drivers whose BrAC and BAC were simultaneously measured. This experiment was conducted over a 3-year period, from 1992 to 1994. The result of this study showed that the mean value of the BAC/BrAC ratios measured on the whole investigated population is 2300, higher than the theoretical 2100. It also showed, as expected, a significant variability among individuals. This variability can be quantified by the standard deviation of the measured ratios, equal to 250. The considered sample size is high enough, as well as the number of factors that contribute to the dispersion of values, to assume that the ratio values are distributed according to a normal probability distribution, centered on 2300, with a standard deviation equal to 250. This probability distribution is plotted in Fig. 9.2.
Fig. 9.2 Probability distribution of the BAC/BrAC ratio among the different individuals
Therefore, when BrAC is measured to obtain BAC, this last value can still be obtained by multiplying the BrAC value by a constant that, after Jones and Andersson’s experiment [7], is not the theoretical 2100 Henry’s law constant, but the experimentally obtained 2300. However, this value is not actually a constant, since it may differ from individual to individual, according to the probability distribution shown in Fig. 9.2. This means that, when the measured BrAC value is multiplied by 2300, it is indeed multiplied by a probability distribution, centered on 2300, so that a single value is not obtained, but a set of values that can be assigned to the desired BAC value. Under the PCAST report [3] perspective, the obtained distribution represents the foundational validity of the method, since the BAC value obtained from the measured BrAC value is valid within the limits of this probability distribution: a single value is not obtained, but a set of values within which the desired BAC value lies with a given probability. From a strict metrological perspective, this represents the definitional uncertainty contribution of the method. Since standard uncertainty is defined as the standard deviation of a probability distribution, it can be readily obtained, in this case, from the standard deviation of the probability distribution shown in Fig. 9.2. If K_H = 2300 is the scale factor to convert the measured BrAC value into the desired BAC value, the definitional uncertainty contribution to standard uncertainty is given, in relative terms, by
$$u_{dr}(K_H) = 250/2300 = 0.11 \qquad (9.2)$$
Fig. 9.3 Probability distribution of the BAC values obtained from a measured BrAC value of 0.37 mg/l
Therefore, every BAC value C_B obtained from a measured BrAC value C_Br as C_B = K_H · C_Br will be affected by a definitional uncertainty contribution u(C_B) = C_B · u_dr(K_H).
Numerical example
Let us assume that a BrAC value C_Br = 0.37 mg/l is measured. The corresponding BAC value is obtained as C_B = K_H · C_Br = 0.851 g/l. The associated contribution to standard uncertainty due to definitional uncertainty is therefore given, according to (9.2), by u(C_B) = C_B · u_dr(K_H) = 0.851 · 0.11 = 0.094 g/l.2 Therefore, the actual BAC is represented by a normal probability distribution centered on 0.851 g/l, with a standard deviation of 0.094 g/l, as shown in Fig. 9.3.
2 It is worth reminding that the GUM [10] recommends expressing uncertainty with only two significant digits.
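The numbers of this example can be reproduced with the following short sketch; the 0.11 relative uncertainty is the rounded value of (9.2), as used in the text, and the variable names are illustrative.

```python
K_H = 2300            # empirical mean BAC/BrAC ratio (Jones and Andersson [7])
u_dr_KH = 0.11        # relative definitional uncertainty, 250/2300 rounded as in (9.2)

C_Br = 0.37                       # measured BrAC, mg/l
C_B = K_H * C_Br / 1000.0         # corresponding BAC, g/l
u_CB = C_B * u_dr_KH              # definitional uncertainty contribution, g/l

print(f"BAC = {C_B:.3f} g/l, definitional standard uncertainty = {u_CB:.3f} g/l")
# Prints about 0.851 g/l and 0.094 g/l, matching the example above.
```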
9.3.1 An Important Influence Quantity: The Presence of Mouth Alcohol
When BrAC is measured, the measurement method assumes that all ethanol molecules detected in the exhaled air come from the gas exchange that takes place in the pulmonary alveoli. However, in the first stages after alcohol ingestion, several molecules of ethanol, which is a highly volatile substance, are present in the oral cavity
and in the upper respiratory tract without being present in blood. These molecules are measured by the instrument used to measure BrAC, thus corrupting the BAC evaluation, which appears to be significantly higher than it really is. To avoid this gross measurement error, the breathalyzers, which are the instruments used to measure BrAC, implement a specific measurement procedure called the slope detector. This procedure analyzes the exhaled breath during the whole exhalation. If the ethanol molecules come from the lungs, BrAC is supposed to increase during exhalation, since pulmonary air is exhaled after the air in the upper respiratory tract, with nil or very low ethanol presence, has been exhaled. BrAC is hence supposed to reach a plateau and remain constant during the last part of exhalation. On the other hand, if mouth alcohol is present, it is supposed to be detected by the breathalyzer during the initial stage of exhalation, so that no increment, or only a very little increment in BrAC, is detected during exhalation. The slope detector analyzes the variation in BrAC all along the exhalation and, if no significant increment is detected, it assumes that alcohol is present in the oral cavity or in the upper respiratory tract and flags the measurement result as invalid. It was proved by Hlastala et al. [11] that this method works only in the early stage—15 to 30 min—after alcohol ingestion, and is likely to fail in detecting the presence of alcohol in the oral cavity or the upper respiratory tract after this time. The reason is that alcohol is highly soluble in water, and the whole respiratory tract is quite humid, with a level of moisture that increases along the trachea. Therefore, at each inspiration act, molecules of alcohol are inhaled from the oral cavity down into the trachea, where they dissolve in the water moisture [12]. The deeper they go, at each inspiration, the later they come back to the breathalyzer, mixing with the alcohol molecules coming from the pulmonary air, so that the slope detector is likely to fail in detecting the presence of mouth alcohol [11]. If the slope detector fails in flagging the measurement result as invalid, the measured BrAC may provide a higher BAC than the actual one. The error is difficult to quantify, because of the complexity of the mathematical model of soluble gas exchange in the airways and alveoli [12], and because it depends on several factors, including the actual BAC, the amount of mouth alcohol, the time elapsed after ingestion, and the presence of gastroesophageal reflux [13], which might prolong the presence of mouth alcohol. Experimental tests show that the error might be as high as 20%. This additional uncertainty contribution should also be considered, unless the presence of mouth alcohol can be reasonably excluded or several breath samples are taken, over a period of not less than 15 minutes, to be reasonably sure that the measured alcohol comes from the pulmonary air. Such a long observation interval is also beneficial to establish in which stage (absorption, distribution, or elimination) of the metabolic process the samples have been taken, by observing their trend.
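To fix ideas, the following toy sketch mimics the slope-detector logic described above on a sampled exhalation profile; it is not the algorithm implemented in any actual breathalyzer, and the sampling, the decision threshold, and the example profiles are invented for illustration only.

```python
def slope_detector(brac_samples, min_relative_rise=0.05):
    """Toy version of the slope-detector idea: accept the reading only if the
    BrAC profile rises appreciably between the first and the second half of
    the exhalation (alveolar air arriving last), otherwise flag it as invalid.

    brac_samples: BrAC readings (mg/l) taken at regular intervals during one
    exhalation. The 5% decision threshold is invented for illustration only.
    """
    half = len(brac_samples) // 2
    rise = max(brac_samples[half:]) - max(brac_samples[:half])
    if rise < min_relative_rise * max(brac_samples):
        return "invalid: possible mouth alcohol (no significant rise)"
    return "valid: profile compatible with alveolar air"

print(slope_detector([0.05, 0.15, 0.25, 0.32, 0.36, 0.37, 0.37]))  # rises, then plateaus
print(slope_detector([0.38, 0.37, 0.37, 0.36, 0.36, 0.36, 0.36]))  # flat from the start
```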
9.4 Instrumental Uncertainty Evaluation
Up to here, only the definitional contribution to uncertainty has been considered, as it is related to the limits of validity of the proposed method. While this is the first contribution to consider, to assess whether the method can be applied in principle [3], instrumental uncertainty shall be considered too, to have a complete quantitative evaluation of measurement uncertainty in the specific case under discussion. As stated in Chap. 6, instrumental uncertainty can be evaluated only through a proper calibration procedure. However, breathalyzers are legal metrology instruments and their features and metrological characteristics are ruled by the recommendations issued by the OIML—the International Organization of Legal Metrology—which impose the Maximum Permissible Errors (MPEs) that the instruments shall not exceed during normal operations. In the case of breathalyzers, the pertinent recommendation is recommendation R126 [14]. According to this recommendation, the MPE for type approval, initial verification, and verification after repair is ±0.020 mg/l, or 5% of the reference value of mass concentration,3 whichever is greater, for a BrAC measurement range up to 2.00 mg/l, which, for a ratio constant K_H = 2300, corresponds to a BAC measurement range of 4.6 g/l. For a measurement range greater than 2.00 mg/l, the MPE shall be MPE = 0.5 · C_ref − 0.9 mg/l, where C_ref is the reference value of mass concentration. The MPE for breathalyzers in service, that is, during periodic verifications after the initial one, shall be, always according to R126 [14], ±0.030 mg/l, or 7.5% of the reference value of mass concentration, whichever is greater, for a BrAC measurement range up to 2.00 mg/l. For a measurement range greater than 2.00 mg/l, the MPE shall be MPE = 0.75 · C_ref − 1.35 mg/l, where C_ref is the reference value of mass concentration. Another important parameter considered by the OIML recommendation R126 [14] is the drift. It is recognized that the values measured by breathalyzers may drift over time, that is, may vary in time even if the mass concentration of the reference gas does not vary. Therefore, recommendation R126 imposes a limit on the permissible drift. Three different situations are considered.
• Zero drift—When the reference value of mass concentration is 0.00 mg/l (that is, no ethanol is added), the measured drift shall be less than 0.010 mg/l in 4 h.
• Short-term drift—The drift measured for a reference value of mass concentration of 0.40 mg/l, which, for a ratio constant K_H = 2300, corresponds to a BAC of 0.92 g/l, shall be less than 0.010 mg/l in 4 h.
• Long-term drift—The drift measured for the same reference value of mass concentration of 0.40 mg/l shall be less than 0.020 mg/l in 2 months. This is probably the most important of the three considered limits, because it imposes the maximum permissible drift over time, to ensure that the errors do not increase beyond a given limit in between two successive periodic verifications.
3 This value corresponds to the concentration of ethanol in the reference gas sent to the breathalyzer during the tests, this concentration being expressed in mg of ethanol per liter of gas.
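The MPE rules quoted above from OIML R126 can be summarized in a small function, sketched below for illustration; the normative text of the recommendation [14] remains the reference, and the function only encodes the limits as stated in this section.

```python
def breathalyzer_mpe(c_ref_mg_per_l, in_service=False):
    """Maximum Permissible Error (mg/l) for a breathalyzer, encoding the OIML
    R126 limits as summarized in this section, for a reference mass
    concentration expressed in mg/l."""
    if c_ref_mg_per_l <= 2.00:
        if in_service:
            return max(0.030, 0.075 * c_ref_mg_per_l)
        return max(0.020, 0.05 * c_ref_mg_per_l)
    if in_service:
        return 0.75 * c_ref_mg_per_l - 1.35
    return 0.5 * c_ref_mg_per_l - 0.9

for c_ref in (0.25, 0.40, 2.50):
    print(f"C_ref = {c_ref:.2f} mg/l: MPE = {breathalyzer_mpe(c_ref):.3f} mg/l (initial), "
          f"{breathalyzer_mpe(c_ref, in_service=True):.3f} mg/l (in service)")
```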
The OIML recommendation R126 [14] does not impose a specific time interval for periodic verifications, leaving it to the National Authorities. However, it imposes that this interval shall not exceed the durability4 specified by the manufacturer. Most National Authorities impose that periodic verifications shall be performed every 12 months. The OIML recommendation R126 [14] is the basis for the laws issued in the countries that employ breathalyzers to sanction DUI. Commercially available breathalyzers do comply with these MPEs and show similar behavior, as shown by intercomparison studies [15]. By comparing the contributions to instrumental uncertainty given by the MPEs imposed by the R126 recommendation [14] and the national laws with the definitional uncertainty contribution discussed in Sect. 9.3, it can be readily perceived that the instrumental uncertainty contribution is quite negligible if compared with the definitional uncertainty contribution. Unfortunately, as we will see in the next section, when the accuracy of BAC measurements is challenged in DUI trials, the discussion focuses on the instruments and not on the validity limits of the method.
4 Durability is defined as the period of time over which the breathalyzer shall maintain the stability of its metrological characteristics.
9.5 Practical Cases
DUI cases have probably represented the most popular application field of forensic metrology [16, 17], because of the many metrological implications that have been highlighted in the previous sections of this chapter and the impressive number of cases treated all over the world. It is almost impossible to cover all cases and, therefore, we limit the discussion to only three Italian cases that appear to be emblematic of the different, often partial way metrology-related problems are considered in courts. More generally, the results of BrAC tests are challenged under the following two different perspectives:
• The metrological logbook of the employed breathalyzer is scrutinized to check whether all periodic verifications have been performed in due time, according to the time interval imposed by the National Authorities. If the prescribed time interval was exceeded, the capability of the instrument to provide valid results is disputed on a legal, rather than scientific, basis, since not all operations required by the National Authorities to grant validity of the measured values have been duly accomplished. While this motivation has little to do, technically speaking, with the true capability of the instrument to measure the BrAC values correctly, it is often favorably received by the judges because, being more directly related to legal terms, it is probably easier to understand and justify in their verdicts.
• All contributions to measurement uncertainty are evaluated and associated with the measured BrAC value, thus changing a simple measured value into a measurement
result. The obtained measurement result is then converted into a BAC value, together with its associated uncertainty value, and compared with the limits set by the National Authorities. The probability of not having exceeded such limits, even when the measured value exceeds the limit, is also computed, as shown in Chap. 8. The obtained probability—that is, the probability of a wrong decision—is then presented to the trier of fact, who can consider it in his or her verdict. The second approach appears to be better scientifically grounded and more in favor of a fair application of the law, since it results in exoneration only when there is a reasonable doubt, given by the implicit limits of the employed method and instrument, that the measured BAC value does not exceed the law limit. In this respect, this second approach cannot be considered as a legal quibble to grant exoneration to the defendant, since measurement uncertainty favors neither party.
9.5.1 Verdict 1759/2016 of Genova Criminal Court This DUI case treated in front of Genova5 Criminal Court is related to a motorbike driver who was pulled over by the police and tested with a breathalyzer, in 2013. According to the Italian law,6 two tests were performed, 5 minutes apart from each other. Once reported to BAC values, the two tests showed a BAC concentration of 1.56 g/l and 1.51 g/l, respectively. According to the law, the lowest value of 1.51 g/l was retained. The Italian traffic law sets strict limits for DUI crimes. In particular, its article 186 states that DUI offense is committed if BAC exceeds 0.5 g/l. A BAC value greater than 0.5 g/l, but lower or equal than 0.8 g/l is punished with a fine from e 543 up to e 2170, and the driving license is suspended for 6 to 12 months (art. 186, paragraph a). A BAC value greater than 0.8 g/l, but lower or equal to 1.5 g/l is punished with a fine from e 800 up to e 3200, imprisonment up to 6 months, and the driving license is suspended for 6 to 12 months (art. 186, paragraph b). A BAC value greater than 1.5 g/l is punished with a fine from e 1500 up to e 6000, imprisonment from 6 to 12 months, and the driving license is suspended for 1 to 2 years (art. 186, paragraph c). In this last case, the vehicle is also confiscated, unless it does not belong to the prosecuted driver. For the sake of completeness, aggravating circumstances are also considered (such as causing an accident or driving at night) which increase sanctions including driving license withdrawal. In this case, the retained value of 1.51 g/l is greater than the upper limit considered in paragraph c of art. 186 of the traffic law, hence the driver was most likely going to be sanctioned with the strictest sanction. However, if measurement uncertainty is 5
5 Genova is an important town in the northwest of Italy, on the Mediterranean Sea.
6 The Italian law that rules BrAC tests and sets the requirements for breathalyzers is DM 196 of May 22, 1990.
However, if measurement uncertainty is considered, the probability that the actual BAC value is lower than the 1.5 g/l limit is surely non-negligible. Two major issues were raised by the technical expert7:
1. Missing uncertainty evaluation. According to all considerations in the previous sections of this chapter and those presented in Chap. 8, it is rather intuitive that measurement uncertainty cannot be disregarded when assessing whether the measured value of the BAC exceeded the law limit or not.
2. Instrument's drift. As seen in Sect. 9.4, a long-term drift up to 0.020 mg/l in 2 months is considered admissible by the OIML recommendation R126 [14] and the Italian national law on breathalyzers.8 When converted into BAC values, this admissible drift corresponds to 0.046 g/l in 2 months. The analysis of the metrological logbook9 of the employed breathalyzer showed that the last verification performed on the instrument occurred 6 months before the defendant was tested; therefore, the instrument's drift should have been considered in assessing the defendant's BAC value.
The OIML recommendation, as well as the manufacturer's specifications, requires that the drift does not exceed 0.020 mg/l (in BrAC), or 0.046 g/l (in BAC), in 2 months. Nothing is specified over longer periods of time, but it is expected that, if a drift occurs, it does not stop after 2 months. It is plausible to assume that it continues at the same rate, so that, 6 months after the last verification, the drift can likely be 3 times the allowed one. Since nothing is said about the direction of the drift, it can go either way, and hence increase or decrease the measured values. Since the option most favorable to the defendant must be adopted in criminal proceedings,10 a positive drift shall be assumed, so that the value provided by the instrument may have been increased by up to 3 times 0.046 g/l, that is, 0.138 g/l in terms of BAC.
The drift effect can be considered (and was considered by the technical expert) in two ways. The first one assumes the most favorable situation for the defendant: a positive 0.138 g/l drift occurred, so that the breathalyzer reading must be reduced by this amount. Hence, the corrected measured value is $C_c = 1.51 - 0.138 = 1.372$ g/l.
Having corrected the measured value for the systematic drift effect, it is possible to consider all other contributions to uncertainty. The contribution $u_d(C)$ given by definitional uncertainty can be evaluated, by applying (9.2), as

$u_d(C) = 1.51 \cdot 0.11 = 0.17 \; \mathrm{g/l}$    (9.3)
7 It is worth noting that, since the case was discussed in front of an Italian court, under a civil law system, technical experts are not considered technical witnesses.
8 The already mentioned DM 196/1990.
9 The Italian law on legal metrology imposes that a logbook be associated with every legal metrology instrument and that the result of every operation performed on the instrument (periodic verifications as well as repairs) be reported in the logbook. The logbook has to be made available, upon request, to any party involved in a proceeding originated by values measured by that specific instrument.
10 This refers to the already mentioned and well-known "in dubio pro reo" principle of jurisprudence.
Fig. 9.4 Probability distribution of the possible defendant's BAC values (probability density versus BAC values [g/l]) obtained after correction for the possible drift. The dashed line shows the corrected measured value of 1.372 g/l and the vertical solid bar shows the 1.5 g/l law limit
The instrumental uncertainty contribution $u_i(C)$ can be obtained from the manufacturer's specifications.11 For the measurement range up to 2.0 g/l of BAC, the uncertainty value specified by the manufacturer is 1.5% of the reading. Therefore, the instrumental uncertainty contribution can be evaluated as

$u_i(C) = 1.51 \cdot 0.015 = 0.023 \; \mathrm{g/l}$    (9.4)
These two contributions to uncertainty can finally be combined to get the measurement uncertainty associated with the measured BAC value:

$u(C) = \sqrt{u_d(C)^2 + u_i(C)^2} = 0.17 \; \mathrm{g/l}$    (9.5)

The resulting probability distribution of the possible values of the defendant's BAC is shown in Fig. 9.4. It is then possible to evaluate the probability that the actual defendant's BAC value is below the 1.5 g/l law limit by applying (8.5) seen in Sect. 8.3 of Chap. 8: an 80.2% probability is obtained, which quantifies the doubt that the 1.5 g/l BAC limit considered by paragraph c of art. 186 of the Italian traffic law was actually exceeded by the defendant.
11 The breathalyzer used in this case was a Dräger Alcotest 7110 Standard IR + EC.
As stated above, a second approach to considering the instrument's drift is possible; it is less favorable to the defendant, though it exploits the available information in a more metrologically sound way. While the maximum drift in 6 months can be as large as 0.138 g/l, as computed above, the actual drift might take any value inside the interval from −0.138 g/l to +0.138 g/l. Therefore, the possible drift can be considered as an additional contribution $u_{\mathrm{drift}}(C)$ to uncertainty. Assuming that the possible drift is uniformly distributed inside this interval, according to (5.11), it is

$u_{\mathrm{drift}}(C) = \dfrac{0.138}{\sqrt{3}} = 0.080 \; \mathrm{g/l}$    (9.6)
The definitional and instrumental uncertainty contributions are obtained from (9.3) and (9.4), respectively, having considered a measured value of 1.51 g/l of BAC. The following values are obtained: $u_d(C) = 0.17$ g/l and $u_i(C) = 0.023$ g/l. The three contributions to uncertainty can finally be combined to get the measurement uncertainty associated with the measured BAC value:

$u(C) = \sqrt{u_d(C)^2 + u_i(C)^2 + u_{\mathrm{drift}}(C)^2} = 0.19 \; \mathrm{g/l}$    (9.7)

The probability distribution of the possible values of the defendant's BAC is obtained also in this case and is shown in Fig. 9.5. The probability that the actual defendant's BAC value is below the 1.5 g/l law limit can again be evaluated by applying (8.5) seen in Sect. 8.3 of Chap. 8.
Fig. 9.5 Probability distribution of the possible defendant's BAC values (probability density versus BAC values [g/l]) obtained without applying any correction for the possible drift, but considering the possible drift as an additional contribution to uncertainty. The dashed line shows the measured value of 1.51 g/l and the vertical solid bar shows the 1.5 g/l law limit
A 47.8% probability is obtained, much lower than the one obtained by considering the maximum possible drift in a deterministic way, but still high enough to make quite reasonable the doubt that the 1.5 g/l BAC limit set by paragraph c of art. 186 of the Italian traffic law was not actually exceeded by the defendant.
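The two treatments of the drift discussed above can be reproduced with the short sketch below (our reconstruction, not the authors' code). A normal distribution is assumed when computing the probabilities: with this assumption the second approach closely matches the 47.8% figure, while the first approach yields about 77% instead of the reported 80.2%, the difference presumably coming from the distribution underlying Eq. (8.5) of Chap. 8, which is not reproduced here.

# Sketch of the two drift treatments (our reconstruction, normal model assumed).
from math import erf, sqrt

def normal_cdf(x: float, mu: float, sigma: float) -> float:
    return 0.5 * (1.0 + erf((x - mu) / (sigma * sqrt(2.0))))

measured, limit = 1.51, 1.50           # g/l
max_drift = 3 * 0.046                  # 0.138 g/l over 6 months
u_d, u_i = 0.17, 0.023                 # definitional and instrumental contributions

# Approach 1: drift treated as a systematic effect and corrected in full.
corrected = measured - max_drift                  # 1.372 g/l
u1 = sqrt(u_d**2 + u_i**2)                        # about 0.17 g/l
p1 = normal_cdf(limit, corrected, u1)

# Approach 2: drift treated as a uniformly distributed uncertainty contribution.
u_drift = max_drift / sqrt(3)                     # about 0.080 g/l, Eq. (9.6)
u2 = sqrt(u_d**2 + u_i**2 + u_drift**2)           # about 0.19 g/l, Eq. (9.7)
p2 = normal_cdf(limit, measured, u2)

print(f"P(BAC < limit), approach 1: {p1:.1%}")
print(f"P(BAC < limit), approach 2: {p2:.1%}")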
9.5.1.1 The Verdict
The above considerations were presented to the judge, and the role of the different contributions to uncertainty was explained at length during a long hearing. The judge understood these concepts and reported them fairly well in the motivation of his verdict. However, he refused to consider uncertainty with the following motivation12:
The scientific method considered by the law to measure BAC through BrAC measurements is based on the application of scientific knowledge in the form of technological instrumentation that can suitably provide interpretation means of objective and positively grounded data related to the individual, apt to make the result, based on an explicative reasoning, coherent and enforced by laws substantiated with juridical qualification. This implies the groundlessness of the main thesis of the defense. Indeed, since it assumes total uncertainty about the correctness of any measured value provided by the breathalyzer, it challenges the framework on which the ascertainment of this considered crime is based and, consequently, the principle defined by the legislator to this aim.
In other words, since the law does not explicitly consider that a measurement result is generally affected by uncertainty, the judge is bound to adhere to the law and cannot consider the doubt that, in this particular case where the measured BAC value was very close to the law limit, the limit itself was not actually exceeded.
On the other hand, the judge seems to assign greater relevance to the instrument's handbook. He wrote, in the motivation of the verdict:
To fully frame the case in point, the indications related to the breathalyzer's drift reported in the handbook of the employed instrument cannot be disregarded. According to these specifications, the instrument is subject, in the six months elapsed since the last calibration, to a downgrade of its performance that, when a measurement is performed, can result in a deviation of 1 mg/l with respect to the true value of the measurand. This considered, it cannot be excluded, based on the available pieces of information, that the result provided by the employed instrument had been affected by a drift in the range given by the manufacturer, so that the drift shall be considered, in favor rei, as actually occurred in the terms provided by the manufacturer with respect to the measured value.
It seems that the judge felt more confident trusting the data reported in the instrument's handbook (which has to be officially annexed to the instrument itself) rather than a scientific evaluation of measurement uncertainty, performed according to the best practice in metrology.
12 The official text of the verdict was translated into English by the authors.
In any case, since the possible drift might have reduced the actual BAC value below the 1.5 g/l law limit, the judge reduced the charges from those considered in paragraph c of art. 186 of the Italian traffic law to those considered in paragraph b of the same article. Considering that the evaluated uncertainty values showed that the possible BAC values were well within the range considered by paragraph b, the verdict appears to be fair to the defendant, though probably not based on strict metrological considerations.
9.5.2 Verdict 1574/2018 of Brescia Criminal Court

This DUI case, treated in front of the Brescia13 Criminal Court, is related to a DUI offense detected in February 2015 and differs from the previous one for two main reasons:
1. The measured BAC values were much higher than the highest limit, since the two BrAC tests performed on the defendant returned BAC values of 1.74 g/l and 1.70 g/l, and the latter was retained.
2. The defendant caused a road accident that, luckily, did not cause any casualty, but only damage to the involved cars.
It is worth noting that the accident did not have more dramatic consequences, as it could have had on the highway where it occurred, because the defendant was prompt enough to warn the incoming cars about the obstacles on the lane. Since such prompt and responsible behavior is generally not compatible with such a high level of alcohol intoxication, it suggested investigating not only measurement uncertainty, but also the operation of the employed breathalyzer.14
According to the above considerations, the technical expert appointed by the defense analyzed the metrological logbook of the employed breathalyzer before considering measurement uncertainty. In particular, the work of the technical expert focused on the results of the periodic verifications the instrument underwent in the previous years. According to the Italian law ruling breathalyzers,15 the periodic verification is aimed at checking that the errors16 at four different reference values of gas concentration (simulating BrAC values) of 0.200, 0.350, 0.700, and 1.500 mg/l do not exceed the MPEs (maximum permissible errors) imposed by the law.
The errors reported in the metrological logbook are shown in Fig. 9.6. The error analysis highlights the following points:
• The errors show an increasing trend from 2004 to 2008. This is compatible with the instrument's drift over time and shows that the instrument was never adjusted.
13 Brescia is an industrial town in Italy, sitting in the middle of the Po plain, in the northern part of the country.
14 Also in this case, a Dräger Alcotest 7110 Standard IR + EC was employed.
15 The already mentioned DM 196/1990.
16 During the tests, the deviation between the value returned by the instrument and the reference value is taken as the measurement error.
Fig. 9.6 Measurement errors [mg/l] reported in the metrological logbook of the employed breathalyzer for the tested reference values (0.200, 0.350, 0.700, and 1.500 mg/l), plotted against the verification year, from 2004 to 2015
Moreover, in 2006 and 2008, the error exceeded the MPEs, but the instrument was left in service without annotating that it was operating outside the error limits.
• In 2009, the instrument was not verified, because it arrived at the verification lab significantly damaged and was sent out for repair. This occurrence was duly annotated in the logbook.
• Periodic verifications were resumed in 2010, after the instrument was repaired, and zero errors for all tested values were annotated in the logbook from 2010 on.
The technical expert noted that an electronic instrument showing zero errors on all scales is a very unlikely event, not to say practically impossible. There is therefore a reasonable doubt that all verifications performed from 2010 on were incorrect. The technical expert considered the errors measured from 2004 to 2008 as representative of the instrument's behavior and assumed that the same behavior could also be representative of the 2010–2014 period, the last verification before the defendant was tested having occurred in June 2014.
The measured defendant's BAC value (1.70 g/l) corresponds, once divided by the $K_H = 2300$ scale factor, to a BrAC value of 0.739 mg/l. Therefore, the same error as the one obtained with a reference concentration value of 0.700 mg/l can be attributed to this measured BrAC value. The last valid verification, in 2008, provided a relative error of +22% on this value. Assuming the same behavior of the employed breathalyzer, the corrected measured BrAC value should therefore be 0.606 mg/l, which corresponds to a BAC value of 1.39 g/l.
Fig. 9.7 Probability distribution of the possible defendant's BAC values (probability density versus BAC values [g/l]) obtained after correction for the instrument error. The dashed line shows the corrected measured value of 1.39 g/l and the vertical solid bar shows the 1.5 g/l law limit
It is then possible to evaluate the measurement uncertainty affecting this BAC value. Since the correction for the potential drift is included in the considered error, the only remaining uncertainty contributions are
• the definitional uncertainty contribution $u_d(C) = 0.15$ g/l,
• the instrumental uncertainty contribution $u_i(C) = 0.021$ g/l,
which are quadratically combined to provide the measurement uncertainty $u(C) = 0.15$ g/l. The probability distribution of the possible values of the defendant's BAC is then obtained as shown in Fig. 9.7. The probability that the defendant's BAC is below the law limit of 1.5 g/l was evaluated, by applying (8.5) seen in Sect. 8.3 of Chap. 8, to be 76.4%. Of course, the probability of being above the law limit of 1.5 g/l is only 23.6%, even though the measured value (1.70 g/l) is much higher than the law limit itself.
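The conversion and correction steps described above can be summarized as follows (our reconstruction; the variable names and the normal model used for the final probability are illustrative assumptions). Up to rounding and the distribution model, it reproduces the corrected value of about 1.39 g/l and a probability of roughly 76% of being below the 1.5 g/l limit.

# Sketch of the correction applied in this case (our reconstruction).
from math import erf, sqrt

K_H = 2300.0                      # blood/breath conversion factor (BAC = K_H * BrAC)
measured_bac = 1.70               # g/l
rel_error = 0.22                  # +22% relative error at the 0.700 mg/l reference (2008)

brac = measured_bac * 1000 / K_H              # 0.739 mg/l
brac_corr = brac / (1 + rel_error)            # about 0.606 mg/l
bac_corr = brac_corr * K_H / 1000             # about 1.39 g/l

# Remaining uncertainty contributions (drift already embodied in the correction).
u_d = 0.11 * bac_corr                         # definitional, about 0.15 g/l
u_i = 0.015 * bac_corr                        # instrumental, about 0.021 g/l
u = sqrt(u_d**2 + u_i**2)                     # about 0.15 g/l

z = (1.5 - bac_corr) / u                      # assumed normal model
p_below = 0.5 * (1 + erf(z / sqrt(2)))
print(f"Corrected BAC: {bac_corr:.2f} g/l, u = {u:.2f} g/l, P(BAC < 1.5) = {p_below:.1%}")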
9.5.2.1 The Verdict
All the above considerations were submitted to the judge by the technical expert appointed by the defense during a dedicated hearing, and they were well received. Indeed, the judge recognized, based on the circumstances of the accident and the defendant's admission of having drunk a couple of beers before driving his car, the existence of the DUI offense.
However, she also recognized that the measured BAC value does not represent the actual BAC value in the absence of an uncertainty statement, and that the employed breathalyzer was not fully compliant with the requirements set by the law. She wrote, in the motivation of the verdict17:
The technical expert appointed by the defense ... clarified ... that the BAC value of 1.70 g/l provided, in the case in point, by the breathalyzer did not represent the real measured value, since a numerical value should have been assigned to uncertainty that is always associated with a measurement result.
She also noted:
Moreover, the driving behavior of the defendant is a piece of evidence eligible to prove, by itself, the existence of a state of alcohol intoxication when the fact occurred, considering the inadequate driving, the loss of control of the vehicle, the kind of the consequent accident and the gravity of the impact that involved other vehicles. However, it shall be noted that the breathalyzer with which the BrAC tests have been conducted was not compliant with the requirements set by the laws in force. Evidence was found, by close scrutiny of the metrological logbook of the breathalyzer ... that the periodic verifications ... exceeded the temporal limits required by the technical specifications to ensure the correct operations of the instrument.
The judge also took into account the missing verification of 2009, when the breathalyzer was damaged, and remarked that the instrument was not verified after it was repaired, although this is clearly prescribed by the law. She also mentioned the difference in behavior up to 2008 and after the instrument was repaired, lending full credibility to the considerations of the defense. The judge concluded:
Therefore, the incorrect execution of the periodic yearly verifications according to the prescribed time interval and conditions, and the anomalous operations of the instrument employed, in this specific case, by the police, have given evidence that the BAC value of 1.70 g/l returned by the alcohol test was affected by uncertainty that leads to re-evaluate the BAC value to 1.39 g/l, as clarified by the defense technical expert.
Consequently, the judge reduced the charges from those considered in paragraph c of art. 186 of the Italian traffic law to those considered in paragraph b of the same article. Although, also in this case, the close relationship between measurement uncertainty and the risk of a wrong decision was not yet perceived, the core of the metrological considerations raised by the defense expert was correctly exploited in the motivations adopted to reduce the charges. Taking into account that the measured BAC value was significantly higher than the law limit, this verdict appears to be a noteworthy step forward in referring to metrological principles to render fair verdicts.
17 The official text of the verdict was translated into English by the authors.
9.5.3 Verdict 1143/2019 of Vicenza Criminal Court

This last DUI case was treated in front of the Vicenza18 Criminal Court and is related to a DUI offense detected in October 2016. It poses problems similar to those of the two previous cases and can be considered an emblematic one. In this specific case, the offense was committed by a truck driver while driving the sole tractor unit back to the garage. The two BrAC tests he underwent returned BAC values of 1.69 g/l and 1.62 g/l, and the latter value was retained, according to the law. The employed breathalyzer was, in this case, a Lion Intoxilyzer 8000.
Also in this case, the metrological logbook of the breathalyzer was carefully scrutinized by the defense and, similar to the case reported in Sect. 9.5.2, it turned out that the 12-month term between the periodic verifications was always exceeded in the 2008–2017 period, except in one case, by up to 3 months. Moreover, also in this case, zero errors for all tested values were annotated in the logbook starting from 2010. The technical expert appointed by the defense noted, once again, that an electronic instrument showing zero errors on all scales is a practically impossible event, thus raising the doubt that all verifications performed from 2010 on had been incorrectly executed. Moreover, the technical expert remarked that the lab that executed all periodic verifications was not accredited according to the ISO 17025 Standard, and therefore traceability to the national measurement standards was not ensured, thus invalidating the verification results and, consequently, preventing validity from being granted to the measured values.
In any case, measurement uncertainty was evaluated starting from the available data. The handbook of the employed breathalyzer was not available and the manufacturer refused to provide it, stating that the instrument was compliant with the specifications of the Italian law, that is, the already mentioned DM 196/1990. The accuracy data imposed by this law, which encompass the accuracy data of OIML recommendation R126 [14], were considered in evaluating uncertainty.
The first critical point to quantify, as in the other cases, is the effect of the instrument's drift. In this case, unlike the previous ones, the instrument's handbook was not available. The R126 recommendation [14], as well as the Italian DM 196/1990, requires that the drift in BrAC measurements does not exceed 0.02 mg/l in 2 months when BrAC values of 0.4 mg/l are measured. The last verification before BrAC testing the defendant occurred 3 months before the test. Considering a linear drift over time, a maximum drift of 0.03 mg/l can be estimated. However, the technical expert appointed by the defendant noted that this drift refers to a BrAC value of 0.4 mg/l, while the BrAC value corresponding to the measured BAC value of 1.62 g/l is 0.704 mg/l, which is almost twice the value for which the maximum drift of 0.02 mg/l is imposed.
18 Vicenza is a town in the northeastern part of Italy, not far from Venice, in an industrial and touristic area of the country.
It is then plausible to assume, as done by the defense expert, that the drift may vary linearly with the measured concentration value. This yields an estimated drift d:

$d = 0.03 \cdot \dfrac{0.704}{0.4} = 0.0528 \; \mathrm{mg/l}$    (9.8)
which leads to a drift in the measured BAC value of 0.122 g/l. The measured BAC value should hence be reduced by this drift value, providing a corrected BAC value of 1.498 g/l.
It is then possible to evaluate the measurement uncertainty affecting this BAC value. Since the correction for the potential drift has been applied, the uncertainty contributions are
• the definitional uncertainty contribution $u_d(C) = 0.16$ g/l,
• the instrumental uncertainty contribution $u_i(C) = 0.023$ g/l,
which are quadratically combined to provide the measurement uncertainty $u(C) = 0.17$ g/l. The technical expert appointed by the defendant also provided the expanded uncertainty $U(C) = 2 \cdot u(C) = 0.34$ g/l, stating that the actual BAC value could be found, with a 95% probability, in the interval 1.15–1.83 g/l, according to the definition of expanded uncertainty given in Sect. 5.5.3 of Chap. 5. The probability distribution of the possible values of the defendant's BAC was also evaluated, as shown in Fig. 9.8.
Fig. 9.8 Probability distribution of the possible defendant's BAC values (probability density versus BAC values [g/l]) obtained after correction for the instrument drift. The dashed line shows the corrected measured value of 1.498 g/l and the vertical solid bar shows the 1.5 g/l law limit
The probability that the defendant's BAC is below the law limit of 1.5 g/l was evaluated, by applying (8.5) seen in Sect. 8.3 of Chap. 8, to be 50.5%. It is worth noting that the defendant's lawyer pointed out that these conclusions were drawn under the assumption that the employed breathalyzer was working within the specifications set by the law, an assumption that could not be taken for granted, since the lab that performed the periodic verifications could not ensure metrological traceability, not being an accredited laboratory.
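The figures of this case can be reproduced with the sketch below (our reconstruction). The linear scaling of the drift with the measured concentration follows the defense expert's assumption described above, while the normal model used for the coverage interval and for the probability of being below the limit is our illustrative choice.

# Sketch of the Vicenza-case evaluation (our reconstruction, normal model assumed).
from math import erf, sqrt

K_H = 2300.0
measured_bac = 1.62                               # g/l
brac = measured_bac * 1000 / K_H                  # about 0.704 mg/l

# Maximum drift: 0.02 mg/l per 2 months at 0.4 mg/l; 3 months elapsed,
# scaled linearly with the measured concentration (defense expert's assumption).
drift_brac = 0.02 * (3 / 2) * (brac / 0.4)        # about 0.0528 mg/l, Eq. (9.8)
bac_corr = (brac - drift_brac) * K_H / 1000       # about 1.498 g/l

u_d = 0.11 * bac_corr                             # definitional, about 0.16 g/l
u_i = 0.015 * bac_corr                            # instrumental, about 0.023 g/l
u = sqrt(u_d**2 + u_i**2)                         # about 0.17 g/l
U = 2 * u                                         # expanded uncertainty, about 0.33 g/l

low, high = bac_corr - U, bac_corr + U            # roughly 1.16 to 1.83 g/l
z = (1.5 - bac_corr) / u
p_below = 0.5 * (1 + erf(z / sqrt(2)))            # about 50%
print(f"BAC = {bac_corr:.3f} g/l, 95% interval [{low:.2f}, {high:.2f}] g/l, "
      f"P(BAC < 1.5) = {p_below:.1%}")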
9.5.3.1 The Verdict
All the above considerations were submitted to the judge by the technical expert appointed by the defense during a dedicated hearing and were carefully considered by the judge. She rejected the defense lawyer's claim to invalidate the measured BAC value because of the lack of laboratory accreditation. In this respect, the judge wrote, in the motivations of the verdict19:
As long as the defense objection about the lack of accreditation of Rome CSRPAD20 by Accredia21 is considered, it appears merely specious, considering that DM 196/1990, at art. 3, states "Breathalyzers shall undergo type approval that is granted by the Ministry of Transportation—General Direction of DMV and transportation under concession, upon request of the manufacturer or his delegate and upon positive outcome of verifications and tests performed by the Advanced Center for Research and Tests on Motor Vehicles (CSRPAD) in Rome".
In other words, the judge stated that, the verification lab having been designated by the Ministry, metrological traceability is somehow granted by the law, without any additional need to prove it by giving evidence that its own measuring equipment is constantly monitored to ensure traceability, as required of any other calibration lab—for instance, of the labs that calibrate speed traps.
On the other hand, the judge wrote22:
The metrological logbook ... proves that, when the employed breathalyzer was used to test the defendant, it had undergone the yearly verification three months earlier, ... hence it was surely working. No certain and incontrovertible evidence, however, could be found about the absence of errors in the test performed on the defendant's breath with the breathalyzer employed by the police ... as stated by the defense technical expert.
19 The official text of the verdict was translated into English by the authors.
20 This is the lab designated by the Italian Ministry of Transportation to execute periodic verifications on the breathalyzers.
21 Accredia is the Italian notified body in charge of the accreditation of the calibration labs.
22 The parts that might disclose the defendant's identity and are not essential to understand the judge's reasoning have been omitted.
This is quite an important point, because it is stated, probably for the first time in an Italian courtroom, that, while the result of a verification can be used to assess that an instrument is working, it cannot be used to assess that the measured value is not affected by errors. This point is better highlighted in the following paragraphs of the verdict:
The analysis of the metrological logbook shows that the yearly verifications were performed (with the exception of those performed in 2011 and 2016) after the required 12-month period, while, as long as the errors detected by the authorized labs during the verifications are considered, it shall be noted that non-zero errors were detected only the first year, while they were always zero at the subsequent verifications, as noted by the defense technical expert.
Therefore, in the case in point, the fact that the yearly verifications were not correctly executed according to the prescribed intervals and conditions, as well as the detected anomalous operations of the instrument employed by the police,23 have pointed out that the measured BAC value of 1.62 g/l returned by the BrAC test has been affected by measurement uncertainty, so that the BAC value should be moved into a range in between 1.15 and 1.83 g/l, as clarified by the defense technical expert. This evidential framework hence leads, according to the favor rei principle, to reduce the event to a lighter offense, as for the "second tier" of criminal liability, hence less than 1.50 g/l, with clear consequences in terms of penalty and driving license.
This is another important point in this verdict, because it states, probably again for the first time in an Italian courtroom, that measurement uncertainty shall be considered when basing a verdict on a measurement result. Once again, the risk of a wrong decision, which can be computed starting from the distribution of values that can be attributed to the measurand, was not explicitly considered by the judge. However, since the considered interval of values, being obtained from the expanded uncertainty, represents a coverage interval with a 95% coverage probability, the judge has probably implicitly considered that the probability that the actual BAC was below the law limit was high enough to represent a reasonable doubt about the seriousness of the committed DUI offense. Consequently, the judge reduced the charges from those considered in paragraph c of art. 186 of the Italian traffic law to those considered in paragraph b of the same article.
23 For the sake of clarity, it is worth noting that the judge, here, is referring to the anomalous behavior (zero errors) reported in the logbook records of the periodic verification results, and not to a specific anomalous operation when the defendant was tested.
9.6 Conclusions

BrAC tests used in DUI cases are probably the most challenged tests in criminal law, in both common law and civil law systems. In general, formal aspects, such as missing type approval of the breathalyzer, incomplete notification of their rights to the defendants, or irregular periodic verifications, are challenged by the defense, while the more substantial metrological issues are neglected.
This chapter showed that these issues are quite relevant and may give rise to reasonable doubts that the actual BAC value is lower than the measured one and below the law limits. Therefore, the proposed metrological considerations can be used both by the defense, in highlighting whether a reasonable doubt exists, and by the prosecutor, in putting in place due diligence to ensure that the measured BAC values are presented together with a properly evaluated measurement uncertainty, so that uncertainty is not unduly exploited with the sole aim of exonerating a culprit.
The considered practical cases show that metrology is slowly being taken into account in DUI cases, and that judges' awareness of metrological issues is positively evolving, as the motivations of the reported verdicts (from 2016 to 2019) show.
References
1. Jones, A.W.: Measuring alcohol in blood and breath for forensic purposes—a historical review. Forensic Sci. Rev. 8, 13–44 (1996)
2. Jones, A.W.: Medicolegal alcohol determinations—blood- or breath-alcohol concentration? Forensic Sci. Rev. 12, 23–47 (2000)
3. President's Council of Advisors on Science and Technology (PCAST): Report to the President—Forensic Science in Criminal Courts: Ensuring Scientific Validity of Feature-Comparison Methods (2016). https://tinyurl.com/abv5mufh
4. Henry, W.: Experiments on the quantity of gases absorbed by water, at different temperatures, and under different pressures. Phil. Trans. R. Soc. Lond. 93, 29–42 (1803). https://royalsocietypublishing.org/doi/10.1098/rstl.1803.0004
5. Jones, A.W., Andersson, L.: Comparison of ethanol concentrations in venous blood and end-expired breath during a controlled drinking study. Forensic Sci. Int. 132, 18–25 (2003)
6. Kriikku, P., Wilhelm, L., Jenckel, S., Rintatalo, J., Hurme, J., Kramer, J., Jones, A.W., Ojanperä, I.: Comparison of breath-alcohol screening test results with venous blood alcohol concentration in suspected drunken drivers. Forensic Sci. Int. 239, 57–61 (2014)
7. Jones, A.W., Andersson, L.: Variability of the blood/breath alcohol ratio in drinking drivers. J. Forensic Sci. 41, 916–921 (1996)
8. Gainsford, A.R., Dinusha, M.F., Rodney, A.L., Stowell, A.R.: A large-scale study of the relationship between blood and breath alcohol concentrations in New Zealand drinking drivers. J. Forensic Sci. 51, 173–178 (2006)
9. Stowell, A.R., Gainsford, A.R., Gullberg, R.G.: New Zealand's breath and blood alcohol testing programs: further data analysis and forensic implications. Forensic Sci. Int. 178, 83–92 (2008)
10. BIPM JCGM 100:2008: Evaluation of measurement data—Guide to the expression of uncertainty in measurement (GUM), 1st edn. (2008). http://www.bipm.org/utils/common/documents/jcgm/JCGM_100_2008_E.pdf
11. Hlastala, M.P., Wayne, J.E.L., Nesci, J.: The slope detector does not always detect the presence of mouth alcohol. The Champion 3, 57–61 (2006)
12. Anderson, J., Babb, A., Hlastala, M.P.: Modeling soluble gas exchange in the airways and alveoli. Ann. Biomed. Eng. 31, 1402–1422 (2003)
13. Kechagias, S., Jonsson, K., Franzen, T., Andersson, L., Jones, A.W.: Reliability of breath-alcohol analysis in individuals with gastroesophageal reflux. J. Forensic Sci. 44, 814–818 (1999)
14. OIML R126: Evidential breath analyzers. Edition 2012. https://tinyurl.com/ymztwys3
15. Gullberg, R.G.: Estimating the measurement uncertainty in forensic blood alcohol analysis. J. Anal. Toxicol. 36, 153–161 (2012)
16. Vosk, T., Emery, A.F.: Forensic Metrology—Scientific Measurement and Inference for Lawyers, Judges, and Criminalists. CRC Press, Boca Raton, FL (2015)
17. Vosk, T.: DWI. The Champion 48–56 (2010)
Chapter 10
Forensic DNA Profiling
10.1 Genetics and DNA Profiling

The origin of DNA fingerprinting or, more correctly, DNA typing or DNA profiling, is generally attributed [1] to the work of Alec Jeffreys, an English geneticist who, in 1985, found and isolated regions of DNA containing sequences that repeat themselves many times throughout the DNA (and are therefore relatively easy to isolate) and that are characterized by high polymorphism, that is, they appeared to differ from individual to individual.
While Jeffreys' work opened the way to modern DNA profiling and its use in forensic science, genetics originated over 150 years ago thanks to the intuition of Gregor Mendel [2] and received its first experimental validation when Karl Landsteiner discovered the human ABO blood variants [3]. It is interesting to note that Landsteiner was the first to use the term polymorphism to refer to some biological feature that types different individuals.
Mendel's intuition came from the observation that characteristics of the studied population (such as color or form of traits) showing discontinuous variations group individuals into discrete classes, called phenotypes, and are associated with a pair of genetic information units, called alleles. According to Mendel's theory, each individual (genotype) in the class can transmit only one allele to each offspring at a time, and each allele has an equal probability (50%) of being transmitted. Moreover, it is assumed that, for each of the observable characteristics, called Mendelian characteristics, each pair of alleles can be located in a definite part of the DNA, called locus,1 and that the transmission of the alleles located in different loci occurs in an independent way.2
1 The plural of locus is loci.
2 Although it was later understood that the information is not independently transmitted for all loci, this assumption does not invalidate the derivations made in this chapter and is therefore retained.
Table 10.1 Probability of allele transmission. The probability of each possible combination of alleles is given in parentheses, given the 0.5 probability of transmission of each parental allele

                               Paternal alleles
                               A (0.5)        a (0.5)
Maternal alleles   A (0.5)     AA (0.25)      Aa (0.25)
                   a (0.5)     aA (0.25)      aa (0.25)
Table 10.2 Probability of allele transmission in a population. The probability of each possible combination of alleles is given in parentheses, given the p and q probabilities of transmission of each parental allele

                               Paternal alleles
                               A (p)          a (q)
Maternal alleles   A (p)       AA (p²)        Aa (p·q)
                   a (q)       aA (q·p)       aa (q²)
As a consequence of Mendel's assumptions, each individual in the population inherits one allele paternally and the other one maternally. If the inherited alleles are equal, the individual is a homozygote; if the inherited alleles are different, the individual is a heterozygote.
Since the transmission of one allele or the other cannot be predicted in a deterministic way, it is only possible to predict the probability with which alleles can be inherited. Table 10.1 shows how two generic alleles A and a of a given locus can be inherited from the parents. It can be immediately recognized that each homozygous combination has only a 0.25 (or 25%) probability of being inherited, while the heterozygous combination has a 0.5 (or 50%) probability of being inherited.3
This result can be generalized to an entire population, under the assumptions of infinite population size, random mating, and absence of mutation, selection, and migration. Assuming that allele A has a probability p of being present in the population, and allele a has a probability q, the probability with which they can be inherited (Hardy–Weinberg principle) is shown in Table 10.2 [2].
This principle can be applied to the alleles of each locus in the DNA profile of each single individual and shows that genetics can only provide an estimate of the probability that a given allele is present in an individual.
3 This is because the combinations Aa and aA are the same.
This means that it is not possible to affirm or exclude the presence of the same allele in another individual; it is only possible to estimate the probability that another individual in a given population shows the same allele in the same locus. When more loci are considered, the probability that more individuals in the same population share the same set of alleles in the same loci decreases4 significantly, though it does not reduce to zero.
The low probability that two individuals share the same DNA profile is the background on which forensic DNA profiling is based, although confusing this low probability with certainty is also the source of its incorrect and non-scientific use in courtrooms and the origin of many miscarriages of justice.
The first problem in using forensic DNA profiling is that genetics is based on probability built on empirical observations. Any conclusion that a given feature is typical of a single member of an observational class and is not shared among the other members of the same class can be interpreted only as related to that class, and a non-zero probability that other individuals, not belonging to that class, share the same feature shall be taken into account. This clashes with the theory of discernible uniqueness [4, 5], which assumed uniqueness of the examined patterns in traditional forensic sciences: it was believed, for instance, that every finger shows a unique set of friction ridges, every gun barrel leaves a unique pattern of marks on the bullets it fires, and bitemarks originated by a given set of teeth show a unique pattern. The same assumption was originally considered for DNA profiles, thus believing that the DNA mark left by an individual is unique and, therefore, capable of identifying the individual who left it with full certainty.
The theory of discernible uniqueness has evolved [5] and it has been more and more accepted that claiming uniqueness of patterns is not based on any valid scientific assumption, neither theoretical nor empirical, as clearly stated by the 2009 report of the United States National Academy of Sciences (NAS report) [6]:
Much forensic evidence—including, for example, bitemarks and firearm and tool mark identifications—is introduced in criminal trials without any meaningful scientific validation, determination of error rates, or reliability testing to explain the limits of the discipline.
It is therefore utterly important to fully understand the probabilistic implications of genetics in assessing the foundational validity of forensic DNA profiling, according to the recommendation of the PCAST report [7]. However, it is equally important to consider that a DNA profile is obtained at the end of a complex measurement process, whose uncertainty, if not properly evaluated and considered, is responsible, together with the improper use of the theory of discernible uniqueness, for the error rates mentioned by the NAS report [6].
4 This can be mathematically proved by the fact that the probability that an individual has a given set of alleles is obtained by multiplying the probabilities, given in Table 10.2, of having each single allele. Therefore, the more alleles are considered, the lower the resulting probability and, consequently, the lower the probability that more individuals have the same set of alleles.
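As an illustration of the multiplication across loci mentioned in the footnote above, the following minimal sketch combines single-locus Hardy–Weinberg genotype probabilities (Table 10.2, with Aa and aA counted together) into an estimated random match probability. The allele frequencies, the number of loci, and the function genotype_probability are made-up values chosen only for the example; they are not real population data.

# Illustrative sketch with hypothetical allele frequencies (not real data):
# Hardy-Weinberg genotype probability at each locus, multiplied across
# independently transmitted loci to estimate a random match probability.
def genotype_probability(p: float, q: float, homozygous: bool) -> float:
    """P(genotype) at one locus: p^2 if homozygous, 2*p*q if heterozygous."""
    return p * p if homozygous else 2 * p * q

# One (p, q, homozygous?) triple per locus; all values are made up.
loci = [(0.12, 0.08, False), (0.20, 0.20, True), (0.05, 0.30, False), (0.15, 0.15, True)]

match_probability = 1.0
for p, q, homo in loci:
    match_probability *= genotype_probability(p, q, homo)

print(f"Estimated random match probability over {len(loci)} loci: {match_probability:.2e}")
# Each additional locus multiplies in another factor smaller than 1,
# so the probability shrinks rapidly, yet it never reaches zero.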
This last point, which implies the assessment of the validity as applied, according to the PCAST report [7], has received less attention in the literature and, consequently, in courtrooms, though it has been proved to be the responsible for miscarriages of justice, one of which will be discussed in the next Sect. 10.6. To justify the little attention paid to measurement uncertainty in DNA profile determination it is often claimed that its determination does not imply the determination of a quantity value, since the DNA profile can be considered as a nominal property: an allele is either present or absent in a profile. Hence, measurement uncertainty determination is not required. Indeed, the VIM defines, at art. 1.30, nominal properties as [8]: property of a phenomenon, body, or substance, where the property has no magnitude
It gives also some examples of nominal properties, such as the color of a paint sample or the sequence of amino acids in a polypeptide. In note 1 of art. 2.1, where measurement is defined, the VIM5 states [8]: Measurement does not apply to nominal properties.
This is justified by the fact that phenomena, bodies or substances can only be classified according to their nominal properties: for instance, colored objects can be divided into classes (red, yellow, green, blue, …) according to their colors. The only error that can be made is a classification error, that is an object can be assigned to a different class than the one it belongs to, and the correctness of classification can be expressed in terms of error rate, that is the ratio between the objects assigned to a wrong class and the total number of considered objects. However, there are cases where classification may require the determination of a quantity value through a measurement procedure. Let us suppose that an object has to be classified according to a specific shade of red. It is well known that a color manifests itself as a specific electromagnetic radiation and, therefore, the frequency of that radiation has to be measured to assess whether it corresponds to that of the required shade. The measured frequency value, being a quantity value, is affected by measurement uncertainty, which originates a doubt about the correctness of the decision on whether the object shall be assigned to a specific class, as shown in Chap. 8. Something similar occurs when assessing whether a specific allele is present in a DNA profile. This can be done only at the end of a complex measurement process [9] involving quantitative measurements, as it will be seen in next Sect. 10.5. Moreover, when an obtained DNA profile is used in an identification process, it requires the comparison of the obtained profile with the reference one, usually the crime scene sample. Similarly to the color assignment to a specific class seen above, measurement uncertainty plays a key role in the comparison process, as proved in Chap. 8, and shall be properly evaluated and considered. 5
5 For the sake of completeness, it is worth noting that a 4th edition of the VIM is under preparation and that it will also consider the measurement of nominal properties.
10.2 The Scientific Basis of DNA Profiling
The following sections of this chapter cover all above mentioned points, aiming at showing the importance of considering all of them in the identification process, since disregarding any of them might result in a dramatically wrong decision. In particular, Sect. 10.5.3 discusses the possible changes in the present scenario introduced by the incoming 4th edition of the VIM.
10.2 The Scientific Basis of DNA Profiling DNA stands for deoxyribonucleic acid and is a molecule composed of two polynucleotide chains, or strands, arranged around each other in the well-known double helix form. This molecule carries genetic instructions for the growth and reproduction of all known organisms. Each polynucleotide is composed of monomeric units—the nucleotides—and each of them is composed of three parts [1]: a nucleobase, a sugar, and a phosphate. There are only four different nucleobases, each of them identified by the initial letter of its name: • • • •
• Cytosine, C
• Guanine, G
• Adenine, A
• Thymine, T
These nucleobases are always bonded together as base pairs: adenine and thymine are always bonded as a pair, and cytosine and guanine are also always bonded as another pair. The sequence of these four nucleobases along the two DNA strands encodes genetic information. Human beings have approximately three billion nucleotide positions in their DNA [1]. By considering that each position has four different possible nucleobases (A, C, G, or T), it can be easily understood that the number of possible combinations with which the nucleotides are chained along the DNA strands is so high that it can be considered virtually infinite. The turning point, as long as forensic DNA profiling is considered, came in 1985 when Alec Jeffreys identified regions of DNA that contained sequences of nucleotides that repeated over and over next to each other [1]. Those regions are named Variable Number of Tandem Repeats, or VNTR, and show a high degree of polymorphism, that is the number of repeated sections in a DNA sample could differ from individual to individual. Jeffreys discovered also a way, called restriction fragment length polymorphism (RFLP), to isolate and examine the repeated sections. The major drawback of this technique is that it requires a relatively high quantity of DNA to provide reliable results, and such a high quantity is not always available, especially when the biological traces found on the crime scene are quite small. A few years later, better attention started to be paid to shorter repetitive sequences, called microsatellites or, more frequently, Short Tandem Repeats, or STR, whose length ranges from 2 to 10 base pairs and repeat from 2 to 50 times, typically [10].
156
10 Forensic DNA Profiling
The main reasons for considering STRs were the higher mutation rate with respect to other DNA areas and the possibility to amplify them by means of a Polymerase Chain Reaction, or PCR amplification, thus allowing analysis of much smaller (by several orders of magnitude) biological samples than those allowed by the initial RFLP technique. Both features led to consider STR analysis the perfect tool in forensic typing, since the high mutation rate shown by STRs from generation to generation is in favor of a high degree of polymorphism, and PCR amplification makes it possible to analyze even a few picograms of genetic material. Moreover, also the way to isolate and examine the repeated sections have evolved, and it is now possible to identify the presence of alleles in the different considered loci by means of capillary or gel electrophoresis [1]. To move from forensic typing to forensic identification, however, probability shall be considered once again. As shown in previous Sect. 10.1, it is only possible to estimate the probability with which a specific allele is present in a given locus in an observed population. Therefore, reliable identification is possible only if a sufficient number of loci is considered so that the probability of finding the same pattern in another individual of the same population is low enough to yield a reliable identification. The problem to solve, hence, is to establish a minimum number of loci on which identification can be reliably based. Historically, the number of loci to consider was conditioned by the availability of analysis kits capable of examine multiple STR loci, as well as by the availability of DNA databanks, that is official repositories of DNA profiles of individuals of an observed population, so that a statistical analysis could be performed and the random match probability6 could be estimated. The first national DNA database was made available by the United Kingdom’s Forensic Science Service in 1995 and used six STR loci. Several identification errors gave evidence that six loci were insufficient to ensure reliable identification. Probably, the clearest example of identification error is the one that led to identify a person living in Liverpool, UK, as the perpetrator of the murder of a young lady in Italy, in 2003. The murderer was injured and left abundant blood traces, from which his DNA could be analyzed. The obtained profile matched with that of a bartender of a village north of Liverpool, who was prosecuted for murder. Luckily for him, several reliable witnesses testified that he never left the village at murder time, so he was exonerated. It was clear that the match between the perpetrator’s DNA profile and that of the suspect was either a random match, or, more likely, a measurement error [11]. In 1997, the FBI laboratory selected 13 loci to be included in the Combined DNA Index System, or CODIS, the US DNA databank for forensic purpose [12]. The 13 selected loci were: • CSF1PO • D3S1358 6
6 The random match probability is the probability that an individual randomly picked in an observed population shares the DNA profile obtained from the available evidence.
• D5S818
• D7S820
• D8S1179
• D13S317
• D16S539
• D18S51
• D21S11
• FGA
• TH01
• TPOX
• vWA
In 2017, given the availability of reliable kits to isolate more than 13 loci, 7 additional loci have been added to the CODIS, that now considers 20 loci [13]: • • • • • • •
• D1S1656
• D2S441
• D2S1338
• D10S1248
• D12S391
• D19S433
• D22S1045
Nowadays, also thanks to an improved electrophoretic resolution offered by modern capillary-based instruments [14], it is possible to analyze more than 20 loci, so that the estimated random match probability is as low as 10−9 or even 10−10 ,7 excluding homozygote twins where the probability of sharing the same DNA profile increases dramatically.
10.3 Some Useful Concepts in Probability The previous section has shown that DNA profiling, together with a reliable DNA databank, can be used in forensic identification. Its validity is generally ensured by a low random match probability. The question is whether the random match probability is a reliable tool to assess the foundational validity of DNA profiling, or it has to be “weighed” by other probabilistic tools, such as conditional probability, odds, and maximum likelihood.
7 This means that there is one chance of randomly picking the same DNA profile among 1 billion, or even 10 billion, individuals.
10.3.1 Conditional Probability and Bayes’ Theorem Chapter 5 introduced the basic concepts of probability that are considered in metrology. The most basic concept is that of a random variable (X ), used to describe a purely stochastic effect and mathematically represented by a probability density function ( p (x)). Such a mathematical model implies that the considered stochastic effect is not affected by any other effect. While this is a very useful assumption to derive several interesting properties and theorems in probability, it does not always represent real stochastic effects, that are often influenced by other effects. To understand this point, let us consider a basic problem in metrology. Let us suppose that quantity x has to be measured and the employed instrument returns value y. If measurement uncertainty is correctly evaluated according to the guidelines given in Chap. 5, the instrument has been calibrated, and the calibration results have been correctly employed to correct the instrument’s reading, according to the guidelines given in Chap. 6, it is possible to evaluate the probability distribution p (y) of the instrument’s readings that can reasonably be attributed to measurand x. It can be intuitively understood that this distribution does quite likely change if the measurand value changes. Therefore, the obtained probability distribution is affected by the value taken by x, and it is mathematically represented by the socalled conditional probability, that is the probability of event y when event x occurs, or the probability of y given x. The mathematical notation used to express such a conditional probability is: p (y|x). This opens an important epistemological problem: are we interested in knowing how the employed instrument responds to the measurand x, or are we interested in the knowing measurand x or, more correctly, the distribution of values that can reasonably be attributed to x when that instrument is employed to measure it under specific measurement conditions? Of course, unless we are the instrument’s manufacturer, we are not interested in the instrument’s response, but in knowing the values that can reasonably be attributed to the measurand. In mathematical terms, this means that we are interested in p (x|y), that is the probability distribution assigned to x, given the measured value y. Of course, in general, it is p (x|y) = p (y|x) Luckily, it is possible to refer to an important theorem in probability, Bayes’ theorem, that states p (x|y) · p (y) = p (y|x) · p (x)
(10.1)
where p(x) and p(y) are called the marginal probabilities of x and y, respectively. It can be proved that

$p(y) = \int_X p(y|x) \, p(x) \, \mathrm{d}x$    (10.2)
so that (10.1) and (10.2) yield

$p(x|y) = \dfrac{p(y|x) \, p(x)}{\int_X p(y|x) \, p(x) \, \mathrm{d}x}$    (10.3)
Despite the apparent complexity of (10.3), it can be readily perceived that the desired conditional probability p(x|y) can be evaluated if the distribution p(y|x) of values returned by the employed instrument is known and if p(x) is also known. Moreover, p(y), that is, the denominator of (10.3), represents the probability that the employed instrument returns a measured value: if the instrument is working, it is, of course, p(y) = 1, so that (10.3) is further simplified.
The key factor in evaluating the desired p(x|y) is, hence, p(x), which represents the a priori knowledge of measurand x. While it might seem odd to have some a priori knowledge of the measurand, given that the measurement process is supposed to provide information on it, it can be readily perceived that the cases in which no information at all is available on the measurand are almost non-existent. In general, at least the possible variation range of the measurand is known, so that a uniform probability distribution over that range can be assumed as p(x). If the measurand is the output of a production process, information is generally available on the dispersion of the produced parts, and it can be usefully exploited to derive p(x). Of course, if no a priori knowledge is available, p(x) = 1 shall be assumed,8 and p(x|y) = p(y|x): in other words, the only available knowledge on the measurand is the one coming from the measurement process.
Bayes' theorem is, therefore, extremely important because it proves that any available a priori knowledge of the measurand can be usefully exploited to refine the measurement result, since it provides the distribution of values p(x|y) that can actually be assigned to the measurand, and not simply the distribution of measured values that the instrument is supposed to provide when it measures that specific measurand. Obviously, the validity of the obtained p(x|y) distribution is directly dependent on the validity of the a priori knowledge p(x). In most applications, p(x) is derived from previous observations and can be considered at least as reliable as the measured values. Sometimes it comes from the experience of the operator and, in these cases, particular attention shall be paid to avoid incorrect evaluations due to possible cognitive bias [15].
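A minimal numerical illustration of (10.3) is sketched below (ours, not from the book): the posterior p(x|y) is obtained by weighting the instrument model p(y|x) with the prior p(x) and normalizing over a grid of candidate measurand values. The Gaussian instrument model, the uniform prior over a known range, and all numerical values are illustrative assumptions.

# Minimal numerical sketch of Bayes' theorem (10.3) on a grid of measurand values.
# The Gaussian instrument model and the uniform prior are illustrative assumptions.
import math

def likelihood(y: float, x: float, sigma: float = 0.1) -> float:
    """p(y|x): instrument model, here a Gaussian response around the measurand."""
    return math.exp(-0.5 * ((y - x) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

# Grid of candidate measurand values and a uniform prior p(x) over a known range.
xs = [i / 1000 for i in range(800, 1201)]              # 0.800 ... 1.200
prior = [1.0 if 0.9 <= x <= 1.1 else 0.0 for x in xs]  # a priori: x lies in [0.9, 1.1]

y_measured = 1.12                                      # instrument reading
unnorm = [likelihood(y_measured, x) * p for x, p in zip(xs, prior)]
norm = sum(unnorm) * 0.001                             # grid step = 0.001 (the dx in (10.3))
posterior = [v / norm for v in unnorm]

# Posterior mean of the measurand, given the reading and the prior.
mean_x = sum(x * p for x, p in zip(xs, posterior)) * 0.001
print(f"reading = {y_measured}, posterior mean of x = {mean_x:.3f}")

Because the prior excludes values above 1.1, the posterior mean is pulled below the raw reading, which is exactly the refinement of the measurement result that Bayes' theorem allows.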
10.3.2 Probability in Forensic Testimony

There is little doubt that one of the milestones on how forensic testimony should enter courtrooms is the famous Daubert case [16] and, in particular, Judge Blackmun's statement: "it would be unreasonable to conclude that the subject of scientific testimony must be 'known' to a certainty; arguably, there are no certainties in science". As seen in Chap. 3, these words allowed probability into the courtrooms and allowed it to slowly replace full certainty in experts' reports.

However, probability is a rather complex theory and, even if most people can understand the difference between an 80% and a 20% probability, the way these values are computed starting from the available evidence is often obscure to non-experts—and sometimes to experts too!—so that it might lead to wrong conclusions, as clearly shown in [17].

The random match probability, if considered alone, is a clear example of incorrect use of probability in assessing whether an identification obtained through a feature-comparison method is correct, regardless of the considered feature (fingerprints, DNA, bitemarks, marks on a bullet, …). Indeed, the random match probability PRM gives the probability that two individuals (or items) in the same population share the same profile. Therefore, a low value for PRM gives high confidence that the individual (or item) with the same profile as the one obtained from the available evidence is the one that originated the evidence, so that the random match probability represents a good candidate to quantify the foundational validity of the employed method. On the other hand, a probability PFP always exists that an error occurred in the profile extraction process, so that it returns a false positive, that is, an individual (or item) is incorrectly identified as the source of the available evidence. This probability, being related to the application stage of the identification method, can be exploited to quantify the validity as applied of the method.

The critical point is how to consider both PRM and PFP. Shall they be combined? Shall more confidence be given to one of them, so that, for instance, the higher or the lower one is selected to assign confidence to the employed method? No matter how they are processed, it is quite likely that, in doing so, some a priori knowledge is considered. Therefore, the concepts explained in the previous Sect. 10.3.1 can be quite useful in making the whole process transparent and scientifically sound. From a mathematical perspective, the identification problem can be formulated as follows.

• An item E of the available evidence, be it a biological sample, a fingerprint, a pattern of marks, …, is compared with the profile of a suspect S (an individual, the barrel of a gun, …).
• The comparison, if correctly performed, shall provide a probability that the evidence was originated by the suspect. This probability is mathematically represented by the conditional probability of E, given origin S: P(E|S). However, since a false positive is possible, the probability that the evidence was not originated by the suspect, but by somebody else S̄, shall also be given: P(E|S̄). Since these are two different conditional probabilities, it is, in general, P(E|S) ≠ 1 − P(E|S̄), so the common mistake of deriving P(E|S̄) directly from P(E|S) shall be avoided.
• Indeed, the trier of fact is not interested in the probability that the evidence was generated by the suspect, or somebody else, but, instead, in the probability that the suspect (or somebody else) generated the available evidence. In mathematical terms, these are again conditional probabilities, but, in this case, they are expressed
as P(S|E) or P(S̄|E). This is a problem similar to the one seen in the previous Sect. 10.3.1: the focus is not on the value measured by the instrument—in this case, the obtained match—but, instead, on the measurand—in this case, who has originated the available evidence.
• Bayes' theorem can be applied here too, provided that some a priori knowledge P(S), as well as P(S̄) = 1 − P(S), is available on whether the suspect (or somebody else) originated the evidence. This a priori knowledge should come from any available information other than the match result and, to be reliable, shall be unbiased.
• The application of Bayes' theorem yields

P(S|E) = P(S) · P(E|S) / P(E)    (10.4)

and

P(S̄|E) = P(S̄) · P(E|S̄) / P(E)    (10.5)

where P(E) is the marginal probability of E: since evidence E exists, it can be assumed P(E) = 1.⁹ An interesting outcome of (10.4) and (10.5) is obtained by dividing the two equations:

P(S|E) / P(S̄|E) = [P(S) / P(S̄)] · [P(E|S) / P(E|S̄)]    (10.6)
The three ratios in (10.6) have a well-defined meaning. In particular:

• Term P(E|S)/P(E|S̄) is a likelihood ratio. It quantifies the belief of the operator who performed the comparison of the evidence sample E with the mark left by the suspect, expressed as the relative probability that the evidence was originated by the suspect and not by somebody else.
• Term P(S)/P(S̄) represents the prior odds. Odds, in probability, represent the ratio between the probability of an event (p) and the probability of the negation of the event (1 − p). In this case, it is a prior odds because it is based on the a priori knowledge about the source of the available evidence.
• Term P(S|E)/P(S̄|E) represents the posterior odds, after the prior odds has been refined by the result of the comparison, according to Bayes' theorem. In other words, it is the refined relative probability that the evidence was originated by the suspect, instead of somebody else.

⁹ The value assigned to P(E) does not influence the following derivations and, in this respect, can be arbitrarily assumed.
To fully understand the weight that each term in (10.6) may have on the final probability that the suspect has originated the evidence, let us consider some numerical examples.

Let us suppose that, before receiving the result of the comparison, P(S) = 20%, that is, the probability that the suspect is the source of the available evidence is only 20%. The prior odds is, therefore, only 0.25, so it is clearly in favor of exonerating the suspect. Let us now suppose that the result of the comparison is provided with a likelihood ratio of 1000, thus assuming that the result of the comparison is 1000 times more in favor of the suspect being the source of the evidence rather than somebody else. Equation (10.6) returns, for the posterior odds, a value of 250. The probability of the suspect being the source of the evidence is hence 250/(1 + 250) = 0.996, that is, the prior 20% probability has become a 99.6% probability. Now the odds are fully against the suspect.

If a likelihood ratio of 100 is considered, the posterior odds becomes 25 and the posterior probability of the suspect being the source of the evidence becomes 96.2%, still strongly against the suspect. If a likelihood ratio of 10 is considered, the posterior odds becomes 2.5 and the posterior probability of the suspect being the source of the evidence becomes 71.4%, still significantly less favorable to the suspect than the original 20% probability.

The above examples clearly show the high impact of the likelihood ratio of the comparison result on the posterior odds. On the one hand, this seems obvious, since the comparison is expected to provide a significant piece of evidence either in favor of or against the suspect. On the other hand, such a high impact requires an extremely careful evaluation of the reliability of the comparison result. In order to ensure that Bayes' theorem provides correct and reliable results, the input data must be evaluated taking into account all quantities and effects that may affect the values assigned to them.

As mentioned at the beginning of this section, assessing the validity of a feature-comparison method by means of the sole random match probability might yield incorrect results, since it neglects the probability of a false positive, which may always occur. Equation (10.6) may lead to correct results if the probability of a false positive is considered in evaluating the likelihood ratio.

An elegant way to consider the probability of a false positive in the evaluation of the likelihood ratio is given in [18], in the simplified case in which there is a unique source of evidence.¹⁰ Without entering the mathematical details of this derivation, which can be found in [18], assuming that PRM(E) is the random match probability of the employed comparison method and PFP(E) is the probability of receiving a false positive result, the likelihood ratio can be rewritten as

P(E|S) / P(E|S̄) = 1 / { PRM(E) + [PFP(E) · (1 − PRM(E))] }    (10.7)

¹⁰ This is the case, for instance, of a DNA sample originated by a single individual or fingerprints that do not show a superposition of friction ridges from different fingers.
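The behavior of (10.6) and (10.7) is easy to reproduce numerically. The Python sketch below recomputes the posterior probabilities of the examples discussed above and the trend shown in Figs. 10.1–10.3; the prior probabilities and error rates are the illustrative values used in the text, not data from any real case.

# Sketch of (10.6) and (10.7): posterior odds from the prior odds and a
# likelihood ratio that accounts for the false positive probability PFP(E).

def likelihood_ratio(p_rm, p_fp):
    # Equation (10.7): LR = 1 / (PRM + PFP * (1 - PRM))
    return 1.0 / (p_rm + p_fp * (1.0 - p_rm))

def posterior_probability(prior_prob, lr):
    # Equation (10.6): posterior odds = prior odds * LR, then back to a probability
    prior_odds = prior_prob / (1.0 - prior_prob)
    post_odds = prior_odds * lr
    return post_odds / (1.0 + post_odds)

# Examples from the text: prior P(S) = 20%, likelihood ratios 1000, 100 and 10
for lr in (1000, 100, 10):
    print(lr, round(posterior_probability(0.20, lr), 3))   # 0.996, 0.962, 0.714

# Effect of the false positive probability for PRM(E) = 1e-9 and a prior odds
# of 0.01 (P(S) about 0.99 %), as in Figs. 10.1-10.3
for p_fp in (1e-9, 1e-6, 1e-3, 1e-2):
    lr = likelihood_ratio(1e-9, p_fp)
    print(p_fp, round(posterior_probability(0.01 / 1.01, lr), 4))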
Fig. 10.1 Likelihood ratio values obtained under different values for the random match probability and the probability of a false positive
Figure 10.1 shows the likelihood ratio values provided by (10.7) for different PRM(E) and PFP(E) values in the range 10⁻⁹–10⁻². It can be readily perceived that the likelihood ratio starts decreasing when the probability of a false positive is higher than the random match probability and tends asymptotically to the inverse of PFP(E) when PFP(E) ≫ PRM(E).

Figures 10.2 and 10.3 show the posterior odds provided by (10.6) and the related probability P(S|E), respectively, when the likelihood ratio is provided by (10.7), for the same values of PRM(E) and PFP(E) as those considered in Fig. 10.1 and under the assumption that the prior odds is P(S)/P(S̄) = 0.01, which is quite favorable to the suspect. It can be readily checked that the posterior odds follow the same trend as the likelihood ratio: they keep the same value as the one obtained when PFP(E) is negligible with respect to PRM(E) and start decreasing when PFP(E) becomes higher than PRM(E). Moreover, the posterior probability P(S|E), shown in Fig. 10.3, that the suspect is the source of the available evidence starts decreasing from values higher than 90% only when PFP(E) becomes significantly higher than PRM(E), but does not drop below 50%. Taking into account that, before considering the comparison result, the assumed probability that the suspect was the source of the available evidence was only P(S) = 0.99%, the influence of the comparison result in increasing this probability is definitely high. It is worth noting that the considered values for the random match probability and the probability of a false positive are quite low, in particular
Fig. 10.2 Posterior odds values obtained under different values for the random match probability and the probability of a false positive, and under the assumption that P(S)/P(S̄) = 0.01
Fig. 10.3 Posterior probability values that the suspect is the source of the evidence, obtained under different values for the random match probability and the probability of a false positive, and under the assumption that P(S)/P(S̄) = 0.01
those related to the probability of a false positive, which mainly depends on the measurement uncertainty of the comparison process. Assessing those values is then of utmost importance in assessing the validity of DNA profiling.
10.4 Foundational Validity of DNA Profiling

The foundational validity of DNA profiling is extensively debated in the PCAST report [7]. As already discussed in Sect. 5.2 in Chap. 5, the foundational validity of a measurement method is quantified, in metrology, by the definitional uncertainty contribution. According to the measurement model presented in Chap. 4, definitional uncertainty is related to measurand identification and modeling, and is not affected by the method implemented in practice to obtain a quantitative evaluation of the measurand. The way the implemented measurement method affects the validity of the obtained measurement result is quantified, in metrology, by the instrumental uncertainty contribution, which, according to the discussion in Sect. 5.2 in Chap. 5, is the metrological counterpart of the validity as applied defined by the PCAST report [7].

Therefore, according to the definition of foundational validity given by the PCAST report [7] and that of definitional uncertainty given by the VIM [8], this section covers the part related to the property of DNA profiling of being reliable in principle and appropriate to the intended scope [7]. The role played by the procedure employed to extract the DNA from the sample evidence, amplify it (if needed) and type the obtained STR loci will be considered in the next Sect. 10.5, while discussing the validity as applied.

When considering the foundational validity of DNA profiling methods, the PCAST report [7] refers mainly to the random match probability as a reliable way to assess it, and makes an important and critical distinction based on the nature of the sample evidence:

• DNA samples composed of a single source,
• DNA samples composed of simple mixtures,
• DNA samples composed of complex mixtures.

This distinction is considered here too, because it may have a critical impact on the evaluation of the random match probability.
10.4.1 Single-Source DNA Samples

A DNA sample is a single-source sample when there is a high degree of certainty, based on the available evidence, that it has been left by a single individual. This is, for instance, the case when the victim of a crime fought against the assailant and injured him or her, so that several blood stains that do not belong to the victim could be found on the crime scene. In this case, if one or more suspects are identified, the DNA profile found on the crime scene can be compared with the samples taken from the suspects.

In such a case, probability P(E|S̄) that the DNA sample E was left by somebody else than the considered suspect S is given by the random match probability, by the very definition of random match probability. On the other hand, probability P(E|S) that the DNA sample was left by the considered suspect S can be evaluated only after having evaluated and considered also the measurement uncertainty affecting the determination of the two profiles. Having evaluated both probabilities, all considerations reported in Sect. 10.3.2 apply and the posterior odds can be evaluated according to (10.6).

Hence, the evaluation of the random match probability is a critical step. According to the PCAST report [7], the frequencies of the individual alleles were obtained by the FBI based on DNA profiles from approximately 200 unrelated individuals from each of the six population groups. The frequency of an overall pattern of alleles—that is, the random match probability—can then be obtained by applying the Hardy-Weinberg principle recalled in Sect. 10.1 and synthetically illustrated in Table 10.2. The resulting probability, when the 13 loci originally considered by the CODIS are taken into account in the computation, is typically between 10⁻¹⁰ and 10⁻⁹, depending on the assumptions made about the population substructure. Considering that more than 13 loci are now analyzed, and hence the random match probability is surely lower, and considering the total number of individuals in the actual population, it could be concluded that the probability that two individuals share the same DNA profile is so low that this occurrence is impossible.

However, another important point should be considered: the computed random match probability does not consider close relatives. It is known that monozygotic (identical) twins share the same DNA profile and that, for first-degree relatives, the random match probability may be on the order of 10⁻⁵—that is, at least four orders of magnitude higher than the probability computed without considering close relatives—when examining the 13 CODIS loci. This is usually considered a minor issue, since the existence of identical twins or close relatives of the suspect is generally known and, consequently, it is possible to investigate whether they might have been present on the crime scene. However, special cases may exist involving suspects coming from broken homes, in which the existence of twins or close relatives might be unknown. In such cases, the highest random match probability should be considered (1 if the existence of an identical twin cannot be excluded, and 10⁻⁵ if the existence of close relatives cannot be excluded) and weighed by the probability that a twin or a close relative exists, even though his or her existence is unknown. Recent studies, made possible by the analysis of the data collected by the growing number of companies that offer genetic tests and family tree searches, show that this probability is not negligible [19] and is estimated at about one chance in a few hundred of having an unknown relative, even in families with an uneventful history. To our knowledge, this fact has not yet been considered in DNA forensics, probably because the available data are not yet scientifically validated, but it should be carefully considered to avoid over-estimating the foundational validity of DNA profiling methods.
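As a purely illustrative sketch of how a multi-locus random match probability is assembled via the Hardy-Weinberg principle recalled above, the following Python fragment multiplies per-locus genotype frequencies; the allele labels and frequencies are invented for the example and are not the FBI population data mentioned in the text.

# Illustrative random match probability under the Hardy-Weinberg principle:
# p^2 for a homozygous genotype, 2pq for a heterozygous genotype, multiplied
# across loci under an assumption of independence. All values are invented.

def genotype_frequency(a1, a2, freqs):
    # freqs maps an allele label to its (assumed) population frequency
    if a1 == a2:
        return freqs[a1] ** 2             # homozygote: p^2
    return 2.0 * freqs[a1] * freqs[a2]    # heterozygote: 2pq

# One entry per locus: (observed genotype, assumed allele frequencies)
loci = [
    (("14", "16"), {"14": 0.10, "16": 0.25}),
    (("9", "9"),   {"9": 0.12}),
    (("7", "11"),  {"7": 0.08, "11": 0.30}),
    (("12", "13"), {"12": 0.22, "13": 0.15}),
]

p_rm = 1.0
for (a1, a2), freqs in loci:
    p_rm *= genotype_frequency(a1, a2, freqs)

print(f"illustrative random match probability over 4 loci = {p_rm:.2e}")

With only four loci the result is on the order of 10⁻⁶; extending the product to 13 or more loci is what drives the overall value down to the 10⁻¹⁰–10⁻⁹ range mentioned above.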
10.4.1.1 Cold Hits
All the above considerations have been derived under the assumption that the DNA profiles of both the sample evidence and the suspect are typed with no errors, and that there is at least one known suspect. The first assumption refers to a very ideal case, and the consequences of removing it will be discussed in the next Sect. 10.5. The second assumption is likely to hold in many cases, but there are also several cases where there is no known suspect: this means that a DNA profile can be extracted from the sample evidence collected on the crime scene, but there is no known DNA profile with which it can be compared.

Under such situations, the DNA profile is often compared with the profiles that most countries have started to collect in databanks (such as the CODIS in the USA). There are also cases, such as the Italian Yara Gambirasio case [20], for which the databank search does not provide any match and the investigators launch a massive screening program involving several thousands of people living in the area where the crime was perpetrated, in the hope of finding a match.

This kind of investigation, often referred to as the cold hit technique, shows, from the statistical point of view, a significant difference with respect to the case of the DNA of a known suspect compared with the one extracted from the sample evidence. In this latter case, the random match probability represents the probability that the DNA profile of a known individual matches another profile chosen at random in the population. In other words, it represents the probability that somebody else than the suspect is the origin of the DNA profile found on the crime scene.

When a search is run in a databank, the situation is quite different, as clearly explained in [21]: in this case, the probability of finding a matching pair in the population shall be considered, instead of the probability of finding a match between any particular pair. This problem is similar to the well-known birthday problem in statistics [17]: while there is 1 chance in 365 that any particular pair of individuals share the same birthday, the probability of finding a pair of individuals in a given sample population who share the same birthday is higher (and increases as the sample size increases), because each individual in the population has a chance to match every other person in the same group.

It can be proven [22, 23] that, if the random match probability PRM(E) represents the probability with which a particular profile occurs in a population of N unrelated individuals, the probability of finding at least one additional individual in the same population who shares that profile is given by

PRM,N(E) = 1 − (1 − PRM(E))^N    (10.8)

Figure 10.4 gives an idea of how much the match probability increases as the population size increases.
Fig. 10.4 Probability that some pair will match, despite the low values of the random match probability, obtained under different values for the random match probability and the size of the considered sample population
It can be readily checked that, for a random match probability PRM(E) = 10⁻⁹, the probability of finding at least two individuals who have the same profile climbs from 10⁻⁶ for a population of 1000 individuals to 10⁻³ for a population of 1 million individuals. Therefore, when cold hits are used to identify the potential origin of the sample evidence, the simple random match probability cannot be considered in evaluating the foundational validity of the DNA profiling methods. According to the size of the databank used for the search, the probability provided by (10.8) should be used instead in the evaluation of the likelihood ratio and the posterior odds (10.6). Failing to correctly consider the match probability may cause wrong decisions, as highlighted in [21].
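Equation (10.8) can be checked with a few lines of code; the sketch below reproduces the orders of magnitude quoted above for a random match probability of 10⁻⁹ and increasing population sizes.

# Sketch of (10.8): probability that at least one other individual in a
# population of N unrelated individuals shares a profile whose random match
# probability is p_rm.

def database_match_probability(p_rm, n):
    return 1.0 - (1.0 - p_rm) ** n

p_rm = 1e-9
for n in (1_000, 10_000, 100_000, 1_000_000):
    print(f"N = {n:>9,d}: probability of at least one match = {database_match_probability(p_rm, n):.1e}")
# N = 1000 gives about 1e-6 and N = 1,000,000 about 1e-3, as discussed above.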
10.4.2 Simple-Mixture DNA Samples

According to the PCAST report [7], a simple-mixture DNA sample is originated by the DNA profiles of two contributors, one of which is known with certainty. This is generally the case of sexual assaults, where the known contributor is the victim and the unknown contributor is the perpetrator. Another typical case is that of a fight between the victim and his or her aggressor, when both are injured and bleeding, and their blood mixes, thus originating a simple-mixture DNA sample.
In the presence of simple-mixture DNA samples, having the DNA profile of one contributor readily available, it is possible to isolate it. When the unknown DNA profile can be well differentiated from the known one, as in the case of sexual assaults, where the known profile comes from vaginal epithelial cells and the unknown one from sperm cells, the foundational validity of the method can be assessed as in the previous case of single-source DNA samples. In particular, the same distinction between the DNA profile of a known suspect and a cold hit shall be made.

When the type of the two cells is the same, as in the case of blood stains with mixed blood, the PCAST report [7] states that "one DNA source may be dominant, resulting in a distinct contrast in peak heights between the two contributors; in these cases, the alleles from both the major contributor (corresponding to the larger allelic peaks) and the minor contributor can usually be reliably interpreted, provided the proportion of the minor contributor is not too low [24]". However, this statement appears to be supported only by experts' interpretation of practical cases, rather than by any theoretical statistical analysis of this kind of DNA profiles allowing one to state that different contributors generate different allelic peaks that can therefore be distinguished, at least in principle. On the contrary, strictly controlled experiments were conducted, showing that this statement is unsupported by the experimental evidence [25].

Therefore, it can be concluded that the analysis of simple-mixture DNA samples may lead, in principle, to a reliable identification of the source of the unknown DNA profile only if that profile can be well differentiated from the known one. In this case, the probability that another individual has the same profile as the alleged suspect can be evaluated, starting from the random match probability, as shown in the previous Sect. 10.4.1. In all other cases, there is no universally agreed-upon model on which the foundational validity of methods for isolating the different contributions to a mixture of DNA profiles from the sample evidence can be estimated. In the case of simple mixtures, this may still be possible in some particular cases [24, 26], but the validity of the obtained result must be discussed, case by case, in terms of validity as applied, as will be shown in the next Sect. 10.5.
10.4.3 Complex-Mixture DNA Samples

Complex-mixture DNA samples are defined as mixtures with more than two contributors [7]. They can originate from multiple unknown individuals in unknown proportions and can be found, for instance, in blood stains left by multiple injured individuals, or in traces left on objects that can be used by multiple individuals, such as door handles, steering wheels, lift buttons, …

As discussed in the previous Sect. 10.4.2, there is no model proving that different profiles can be successfully isolated from a mixture. Therefore, there is no available scientific evidence on which the foundational validity of DNA profiling methods can be grounded, as also recognized by the PCAST report [7]. This appears to deny the validity of identifications based on cold hits, since no methods have been proven to validly isolate profiles to be subsequently searched in the available databanks.

If the DNA profile of a suspect is available, attempts can be made to compare it with the available, complex-mixture sample and check whether it matches part of that profile. However, such a procedure may easily become subjective [7], mostly based on the interpretation given by the operator who performs the comparison [21, 24, 26]. Assessing how reliable such a procedure is involves the evaluation of the validity as applied rather than the foundational validity, as well as a careful evaluation of all uncertainty contributions, including those originated by the operator, and will therefore be discussed in the next section.

The above short discussion is not intended to deny the possibility of exploiting also complex-mixture DNA samples in feature-comparison methods, but only to focus the attention on the lack of a theoretical model that, unlike the case of single-source and simple-mixture samples, supports the validity in principle—that is, independently of the method employed in practice to isolate the DNA profile—of using such complex mixtures to isolate and identify the DNA profiles of each single contributor. It is also intended to draw the attention of the reader to the importance of a correct metrological approach to DNA sample exploitation, including the evaluation, case by case, of all contributions to uncertainty, on which the evaluation of the final match probability between the profile obtained from the sample evidence and that obtained from the suspect can be based.
10.5 Validity as Applied of DNA Profiling

Up to this point, the foundational validity of DNA profiling has been discussed, and the probability PFP(E) of receiving a false positive result about an item E of available evidence was considered only from a theoretical point of view in (10.7), to show how non-negligible values of PFP(E) may significantly decrease the likelihood ratio about the evidence being originated by the suspect rather than somebody else.

The correct estimation of the probability of errors in extracting a DNA profile and attributing it to an individual is therefore of critical importance, since it may deeply affect the validity of the whole profiling. Since these errors may occur throughout the whole evaluation procedure, their evaluation is the first step in assessing the validity as applied of DNA profiling, as requested by the PCAST report [7]. While several attempts have been made to estimate the validity as applied, none of them followed a strict metrological approach to evaluate and combine the different uncertainty contributions affecting the considered case, thus leaving more than reasonable doubts about the correctness of the obtained results.

In the following part of this section, the different steps of the profiling procedure will be briefly considered, highlighting the possible sources of uncertainty. The available error rate evaluation exercises will then be discussed, as well as the fulfillment—or lack of fulfillment—of the requirements set by the present standards for the competence of testing laboratories, including the forensic laboratories.
10.5.1 STR Typing

As already stated in Sect. 10.2, nowadays DNA profiling is largely based on the analysis of the short tandem repeats (STR). The procedure for obtaining a DNA profile consists of the following steps [1, 9, 10, 23]:

• DNA extraction,
• DNA quantification,
• DNA amplification via PCR (Polymerase Chain Reaction),
• separation of alleles according to sequence length via electrophoresis,
• detection via fluorescent dyes.
10.5.1.1 DNA Extraction
DNA extraction is the process employed to isolate purified DNA from forensic samples, both those collected on the crime scene and the reference ones taken from potential suspects or from individuals who are not suspected but were on the crime scene. This is one of the most critical steps of the whole typing process, because it has to extract as much DNA as possible, without degrading and contaminating it, while at the same time removing all elements that may inhibit the subsequent amplification process via PCR [9].

There are presently three groups of validated methods for DNA extraction, which differ in the purification achieved and in the possibility of being automated.

• Organic (phenol-chloroform) extraction. After having broken down the cell membrane, the obtained cell lysate is purified by mixing it with a phenol-chloroform solution. The DNA remains in the supernatant aqueous solution and is further purified either by precipitation with ethanol or by filtration. This method is very efficient in extracting purified DNA, free of PCR inhibitors, but it is time consuming and difficult to automate [9].
• Solid-phase extraction. This method exploits the DNA ability to bind with silica in the presence of chaotropic salts. Therefore, after breaking down the cell membrane, the obtained lysate is added to a binding buffer of chaotropic salt so that it can be adsorbed by silica. If silica-coated paramagnetic beads are used, the subsequent washing step and DNA elution are facilitated by the application of a magnetic field, without any need for a centrifugation step. This method is faster than the organic one, is very efficient in removing PCR inhibitors, and can be automated [9].
• Chelating resins (Chelex). Cell membranes can be broken down by adding a chelating resin to the forensic sample and boiling the solution. DNA is protected by the resin, remains in the supernatant solution, and can be separated by centrifugation. The main drawback of this method lies in the fact that DNA is denatured into single-stranded DNA that can be analyzed only by PCR methods [9]. While this method is quite fast, the obtained DNA purity is lower than that provided by the other two methods.

Although slight variations of the above three methods are also employed, they represent the most widely used methods for DNA extraction. Purity and absence of PCR inhibitors are the metrics used to assess the quality of DNA extraction. However, the risk of contamination is also present in this stage and should be quantified as well. Even a minimal contamination can represent a serious threat, especially in the presence of low-quality samples, because it can be amplified by the subsequent PCR amplification step and be responsible for false positive errors.
10.5.1.2 DNA Quantification
The quantity of purified DNA obtained from the extraction process is seldom enough to guarantee correct DNA typing. Therefore, as will be shown in the next Sect. 10.5.1.3, amplification is almost always necessary. The correct choice of the amplification factor—or, using a term taken from signal processing, the amplification gain—is of critical importance to avoid the generation of STR artifacts that might lead to an incorrect interpretation of the obtained DNA profile. It is therefore important to quantify the amount of DNA in order to select the optimal amplification gain. Of course, this cannot be done by direct weighing, and indirect procedures are used that also yield some additional useful information, such as the quantification of PCR inhibitors and, in the case of DNA mixtures, a quantitative estimation of the proportion of male and female components [9].

No matter how it is achieved, DNA quantification is the result of a measurement procedure and, as such, measurement uncertainty must be evaluated and expressed together with the measured value. Since the measured DNA quantity is used to select the amplification factor, due to the associated measurement uncertainty, different amplification factors may be selected, similarly to the scale factor used to convert a measured BrAC value into a BAC value, as seen in Chap. 9. Therefore, a doubt arises, quantified by the uncertainty associated with the amplification factor, as to whether the optimal amplification is chosen, and its impact on the quality of the final amplified DNA should be evaluated.
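A minimal sketch may help illustrate this point. Assuming, purely for illustration, an ideal doubling of the DNA amount at every PCR cycle, the uncertainty on the quantified amount translates into different candidate numbers of cycles, that is, into different amplification gains; all numerical values below are invented.

import math

# Illustrative only (not a laboratory procedure): propagation of the
# quantification uncertainty to the choice of the amplification gain,
# assuming an ideal doubling of the DNA amount at every PCR cycle.
measured_ng = 0.05      # quantified DNA amount, in nanograms (assumed)
u_rel       = 0.30      # relative standard uncertainty of the quantification (assumed)
target_ng   = 50.0      # amount considered sufficient for typing (assumed)

def cycles_needed(start_ng, target_ng):
    # ideal PCR: the amount doubles at every cycle, i.e. a 2^n gain
    return math.ceil(math.log2(target_ng / start_ng))

low  = measured_ng * (1.0 - u_rel)    # lower edge of a +/- u interval
high = measured_ng * (1.0 + u_rel)    # upper edge of a +/- u interval

print("cycles if the true amount is at the lower edge :", cycles_needed(low, target_ng))
print("cycles for the measured amount                 :", cycles_needed(measured_ng, target_ng))
print("cycles if the true amount is at the upper edge :", cycles_needed(high, target_ng))

With these invented numbers the candidate number of cycles ranges from 10 to 11, which is exactly the kind of doubt about the optimal gain described above.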
Fig. 10.5 Thermal cycling pattern for PCR. Typical timing is one minute each at 94, 60 and 72 °C, and 1–2 min setting time between the three temperature values
10.5.1.3 DNA Amplification
As already stated, forensic samples collected on the crime scene contain a very small quantity of DNA material—sometimes on the order of less than 100 pg¹¹—which is useless for generating a valid DNA profile. In general, the DNA quantity required to generate a valid DNA profile should be higher than some hundreds of nanograms, and preferably on the order of a few micrograms.¹²

The technique that makes DNA profiling possible when the available DNA material is less than some hundreds of nanograms is the polymerase chain reaction, or PCR, described in 1985 by Kary Mullis [27]. This technique is an enzymatic process in which a specific region of DNA—in forensic DNA analysis the STR loci of interest—is replicated a number of times given by the number of PCR cycles performed, so that copies of that region are obtained. The region of interest is isolated by binding proper oligonucleotide primers (which, as will be shown in the next Sect. 10.5.1.4, play a significant role in the allele detection process), and the whole PCR process requires strict thermal cycling patterns to be followed, as shown in Fig. 10.5 [1]. Typically, these thermal cycles are repeated about 30 times to obtain the desired quantity of DNA material.

PCR amplification, however, needs to be strictly controlled to avoid undesired results in the obtained amplified DNA [1, 28], also called amplicons. Temperature represents an influence quantity, and uncertainty in its measurement by the control system represents a source of uncertainty. Moreover, the PCR process may be performed on solutions ranging from 5 µl to 100 µl, and evaporation must be minimized to avoid degradation in the obtained amplicons [1].

¹¹ A picogram, symbol pg, corresponds to 10⁻¹² g, that is, one millionth of a millionth of a gram.
¹² A nanogram, symbol ng, corresponds to 10⁻⁹ g, that is, one billionth of a gram. A microgram, symbol µg, corresponds to 10⁻⁶ g, that is, one millionth of a gram.
Contamination in the initial DNA material can also be a problem when several PCR cycles are performed to obtain usable material from low DNA quantities. Under such conditions, small amounts of contaminating DNA, so small as to remain undetected during the extraction process, may be amplified by the PCR process and show up as an unknown DNA profile. Such a phenomenon is a well-known problem in signal processing [29], where high-gain amplification of noisy signals is generally avoided to prevent spurious components in the amplified signal.

The possibility that undetected contamination might cause false positive errors is attested by the results of a strictly controlled experiment performed by Cale and colleagues [25]. Actually, that experiment was aimed at proving that DNA can be transferred, so that if individual A comes in close contact with individual B and individual B is present on the crime scene, the DNA of individual A can be left on the crime scene together with that of individual B, thus placing individual A on the crime scene. The experiment run by Cale and colleagues proved that DNA transfer is possible, and sometimes the transferred DNA appears to be preponderant over that of the individual who was actually present on the crime scene. An undesired outcome of such an experiment was the presence of an unknown human DNA profile that did not belong to any of the experimenters, nor to any of the operators who performed the DNA typing. Since the experiment was performed in a strictly controlled environment, the most probable reason for the presence of unknown human DNA is contamination by an undetected DNA fragment amplified by the PCR process [29].

Therefore, both the evaluation of uncertainty in the measurement of all quantities that may influence the amplification process, and a rigorous estimation of the probability that the selected amplification factor—or the number of PCR cycles—may cause amplification of undesired contaminants, appear to be of critical importance in evaluating, case by case, the probability of false positive errors, such as those that occurred in [25].
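The signal-processing analogy can be made concrete with an idealized computation that neglects the amplification plateau and any efficiency difference between fragments: the same 2ⁿ gain selected for a low-template sample is applied to any undetected contaminant. The quantities below are invented for the example.

# Idealized illustration: a contaminant too small to be detected at the
# extraction stage is amplified by the same 2^n gain chosen for a
# low-template sample (plateau and efficiency effects are ignored).
cycles = 30
gain = 2 ** cycles                   # ideal amplification factor, about 1.1e9

template_pg    = 5.0                 # target DNA collected on the scene (assumed)
contaminant_pg = 0.05                # undetected contaminating DNA (assumed)

print(f"gain after {cycles} cycles  : {gain:.2e}")
print(f"amplified template DNA   : {template_pg * gain:.2e} pg")
print(f"amplified contaminant DNA: {contaminant_pg * gain:.2e} pg")
# The contaminant ends up orders of magnitude above the pre-amplification
# template amount, so it can show up as an apparently genuine, unknown profile.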
10.5.1.4 Separation and Detection of Alleles
Once the DNA material has been sufficiently amplified, the PCR products, that is, the alleles in the considered loci, must be suitably separated in order to identify them. Separation can be achieved by exploiting the natural negative electric charge of the DNA fragments [1]. Indeed, it is well known that, if a dc electric field is applied to an electric charge, the charge travels along the direction of the electric field, with negative charges traveling toward the positive electrode and positive charges traveling in the opposite direction, toward the negative electrode. Moreover, if the traveling path is somehow impeded, smaller DNA fragments will move faster than larger ones, so that they can be suitably separated by observing them through an observation window close to the end of the traveling path [1, 9, 23]. Such a method is called electrophoresis, and the traveling path can be realized by either a slab gel—thus implementing a gel electrophoresis—or a capillary tube—thus implementing a capillary electrophoresis.
Gel electrophoresis requires a careful preparation of the gel, and a careful loading of the DNA samples into each well, to prevent contamination. It has the advantage that different DNA samples can be loaded onto the same slab, so that an immediate comparison can be made, as in the example shown in Fig. 10.6. On the other hand, an electropherogram—that is, the result of electrophoresis—such as that in Fig. 10.6 may pose interpretation issues, because the marks left on the electropherogram can be blurred and not clearly visible, thus originating doubts about the actual presence of the corresponding allele in the analyzed DNA sample.

More recently, capillary electrophoresis has replaced gel electrophoresis in forensic labs. The main advantages are the possibility of automating the whole process, the lower quantity of DNA material required, and the possibility of a better control of the process parameters [1, 14]. Moreover, the PCR primers can incorporate multiple fluorescent dyes that fluoresce at different wavelengths, so that they emit visible light, at different colors, when the detection window on the capillary is illuminated by an argon-ion laser. The emitted light can then be detected with an optical detector or a camera [9, 23]. An example of the resulting electropherogram is shown in Fig. 10.7, where three colors (blue, green, and black) are used to identify the alleles belonging to different loci.

The amplitude of each allelic peak can be measured by measuring the fluorescence of each dye, and the employed measurement unit is the relative fluorescent unit, or RFU, thus yielding a quantitative evaluation of the obtained profile, instead of the qualitative one provided by gel electrophoresis. From a strict metrological perspective, this is probably the main advantage of capillary electrophoresis, since it is possible to evaluate the uncertainty contributions in the fluorescence measurement and, in principle, combine these contributions with those originated during the previous steps. It is then possible to evaluate the probability that the detected allele peaks do actually come from the analyzed DNA sample, as shown in Chap. 8.
Fig. 10.6 Example of the result of a gel electrophoresis, with the comparison of 5 different profiles. A ladder is added on the top and bottom of the slab, as a reference
Fig. 10.7 Example of the result of a capillary electrophoresis, with three different colors used for different kinds of loci
This advantage counterbalances the disadvantage of capillary electrophoresis of being intrinsically slow, since DNA samples can be analyzed one at a time, and not simultaneously as with gel electrophoresis, unless more expensive equipment, such as capillary array systems, is employed to process multiple samples simultaneously.
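As a sketch of what such a combination could look like, the fragment below combines a few independent standard uncertainty contributions on a peak height in quadrature, in the GUM style recalled in Chap. 5; the contribution names and values are invented and do not come from any validated uncertainty budget.

import math

# Illustrative GUM-style combination (root sum of squares of independent
# contributions) for the height of an allelic peak expressed in RFU.
contributions_rfu = {
    "optical detector noise": 6.0,
    "dye calibration":        4.0,
    "injection / separation": 5.0,
    "baseline estimation":    3.0,
}

u_combined = math.sqrt(sum(u ** 2 for u in contributions_rfu.values()))
print(f"combined standard uncertainty on the peak height: {u_combined:.1f} RFU")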
10.5.2 Possible Profile Irregularities

All steps in the STR typing process briefly recalled in Sect. 10.5.1 can be affected, as pointed out in the previous points, by undesired effects that might show up as irregularities in the obtained electropherogram, make its interpretation quite difficult and, in the end, cause identification errors.

The most evident effect, which may originate in every step of the STR typing process, is the presence of noise in the electropherogram. Noise is a well-known phenomenon in measurement and manifests itself as random variations of the analyzed signal that have a sort of blurring effect on the obtained values: in the case of electropherograms, the most critical effect is the presence, on the zero level, of oscillations that can be confused with allele peaks generated by a degraded DNA sample. To avoid confusing noise-generated peaks with actual alleles, a threshold is set, and only the peaks in the obtained electropherogram that are above the threshold are recognized as real alleles. However, this poses two critical metrological problems:

1. the threshold value shall be set only after a careful evaluation and composition of the noise contributions generated by each step in the STR typing procedure;
Fig. 10.8 Example of electropherogram obtained from a degraded DNA sample. The alleles pointed by the arrows show a much lower amplitude than the corresponding peaks in Fig. 10.7
2. when comparing the electropherogram peaks with the threshold value, the uncertainty on these peaks must be correctly evaluated, since it deeply affects the probability of the decision on whether the peak is actually above the threshold or not, as shown in Chap. 8, and, hence, the probability that the identified allele does actually belong to the DNA under test (a numerical sketch of this decision is given at the end of this introduction).

While noise is an always-present irregularity, other typical irregularities may be present in the obtained electropherogram, as shown in the following points. To show their effects as clearly as possible, the electropherogram of the example shown in Fig. 10.7 was modified to simulate each effect. Therefore, the figures shown in the following points have to be compared with Fig. 10.7 to fully perceive the considered irregularity.
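For point 2 above, a minimal sketch of the decision probability is the following, assuming, for illustration only, a Gaussian distribution for the peak height centered on the measured value; the threshold, peak height and uncertainty values are invented.

import math

# Illustrative decision sketch: probability that the true peak height exceeds
# the noise threshold, given the measured height and its combined standard
# uncertainty, under an assumed Gaussian distribution.
threshold_rfu = 50.0     # noise threshold (assumed)
peak_rfu      = 62.0     # measured peak height (assumed)
u_rfu         = 9.0      # combined standard uncertainty on the peak (assumed)

def prob_above(measured, u, threshold):
    z = (measured - threshold) / u
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

p = prob_above(peak_rfu, u_rfu, threshold_rfu)
print(f"probability that the peak is truly above the threshold: {p:.1%}")
# With these numbers the probability is about 91%: a non-negligible doubt
# remains on whether the peak represents a real allele.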
10.5.2.1 Degraded DNA
Exposure of a DNA sample to environmental factors, such as, for example, high temperature or UV rays, for a long time may break down its molecules, yielding degradation of the DNA. The likely effect is the one shown in Fig. 10.8, where some allele peaks are much lower than the corresponding ones in Fig. 10.7. The risk is that the obtained peaks are close to the noise threshold, which increases the chance of either identifying a noise component as an allele or considering an existing allele as noise and not identifying it.
Fig. 10.9 Example of electropherogram showing severe peak imbalance. The alleles pointed by the arrows should have the same amplitude as shown in Fig. 10.7
10.5.2.2 Peak Imbalance
PCR amplification of low DNA amounts may cause a number of stochastic effects, modifying the peak amplitude and location in the obtained electropherogram. One of these effects is shown in Fig. 10.9, where the two alleles pointed by the arrows should have the same amplitude, as shown in Fig. 10.7, being heterozygote alleles of the same individual. Since the alleles of an individual who is heterozygous in a given locus show approximately the same amplitude in the electropherogram, a severe peak imbalance, such as the one shown in Fig. 10.9, may prevent these alleles from being recognized as heterozygote alleles of the same individual. While this might not be a problem in the case of a single-source DNA sample, it might lead to an incorrect attribution of the alleles when analyzing DNA mixtures.
10.5.2.3 Stutters
A stutter, or stutter peak, is a non-allelic peak—that is, a peak showing the presence of an allele that does not belong to the analyzed DNA—that appears on an electropherogram close to an allelic peak—that is, a peak showing the presence of an allele that does belong to the analyzed DNA—as shown in Fig. 10.10. The presence of stutter peaks becomes critical when their amplitudes become comparable with those of the allelic peaks.
Fig. 10.10 Example of electropherogram showing the presence of a stutter peak, pointed by the arrow. This peak is not present in the electropherogram in Fig. 10.7
If a stutter appears close to a homozygote allele, it may induce the operator to recognize it as a heterozygote allele affected by imbalance, thus causing a possible identification error. If it appears close to heterozygote alleles, as is the case in the example of Fig. 10.10, it may be recognized as an additional homozygote allele, thus again causing a possible identification error, especially when DNA mixtures are analyzed.
10.5.2.4 Allele Drop-In
Allele drop-in is similar to a stutter peak and is generally caused by a high signal gain that overamplifies noise components. The effect, also in this case, is a non-allelic peak that appears on the electropherogram close to an allelic peak, as shown in Fig. 10.11. When an allele drop-in has an amplitude comparable with that of the allelic peaks, the risk is that of interpreting a homozygote allele as a heterozygote one, as in Fig. 10.11, or as an additional allele. In both cases, an identification error is possible.
10.5.2.5 Allele Drop-Out
Allele drop-out is the opposite phenomenon to allele drop-in and is caused by a loss of signal, either in the electrophoresis or in the PCR amplification stage.
Fig. 10.11 Example of electropherogram showing the presence of an allele drop-in, pointed by the arrow. This peak is not present in the electropherogram in Fig. 10.7
The effect is a significant reduction in the amplitude of one or more allelic peaks, which causes them to disappear or be attenuated below the noise threshold, as shown in Fig. 10.12. Allele drop-out may cause several interpretation problems: a heterozygote allele can be shown as a homozygote one, an existing allelic peak can disappear from the electropherogram, or a pair of heterozygote alleles may be seen as two different alleles. In all cases, an identification error is a possible consequence.
10.5.3 Probability of Identification Errors

All points mentioned in the previous sections show that identification errors are possible, and that a probability always exists that some of the identified alleles do not belong to the analyzed DNA profile, but are artifacts, or, similarly, that an existing allele is actually missing from the obtained profile. It would hence be advisable that a probability be associated with every identified allele, thus quantifying the reliability—or credibility—of the actual presence of such an allele in the analyzed profile.

While the possibility of making errors in DNA analysis, both in forensic and biomedical applications, is widely recognized, a proper metrological characterization of the obtained results, associating an uncertainty value with the detected elements in the analyzed profile, is still largely missing. Since most laboratories operate under a strict quality assurance policy, attempts were made to estimate the error rate in forensic [30] and biomedical [31] labs by analyzing the reported quality issue notifications.
Fig. 10.12 Example of electropherogram showing the presence of an allele drop-out, pointed by the arrow. A peak is present, in that position, in the electropherogram in Fig. 10.7 and has disappeared here
The error rate estimates were in good agreement (both studies showed an error rate of about 0.5%) and the highlighted error rate appears to be acceptable. However, it suffers from a basic misunderstanding: all considered errors were detected by the lab personnel, sometimes before the wrong result was delivered, sometimes only after the result was challenged because of inconsistency with other available data or results, and it does not consider errors that might have remained undetected, which are the most critical ones. Moreover, this result can be associated with the lab, providing at most a generic contribution to uncertainty originated by the lab itself, but it cannot represent the uncertainty value associated with a single DNA profiling result.

Results are also available from collaborative exercises performed by laboratories located in different regions [32, 33], in a very similar way to interlaboratory comparisons or proficiency tests. In particular, in the exercise reported in [32], stains were prepared on washed cotton using liquid blood samples donated by staff members of the lab that organized and supervised the exercise. Mixed stains were prepared from a 50/50 mixture of two donor blood samples of the same ABO blood group and sent to the 16 participant laboratories. The participants were then asked to identify the DNA profiles, and the main aim of the exercise was to compare the results obtained from the VWA and THO1 loci.

The allelic designations returned by the 16 laboratories for each sample at the VWA and THO1 loci matched those obtained by the originating laboratories, and no major problems were reported by the participating labs. However, the results obtained at other loci were not all directly compatible. This might indicate that special care was adopted in the allelic designation of the loci of interest, thus obtaining the best results, while the other results, obtained under more routine operations, were not as accurate.

The exercise reported in [33] did not involve DNA typing, but the interpretation of two genetic mixture profiles: the 25 participant laboratories were asked to write a complete report, based on the profiles received in PDF format, in the form their institution usually issues. Several differences were found in the submitted reports, and the most critical issue was that, with rare exceptions, the reports did not follow the ISO/IEC 17025:2017 [34] requirements, despite the ILAC recommendation [35] that forensic labs follow this Standard.

It is worth noting that none of the mentioned exercises considered measurement uncertainty as a key point in evaluating the performance of the laboratories and comparing the obtained results, despite the fact that most laboratories considered by the studies [30, 31] or participating in the interlaboratory comparisons [32, 33] were accredited as test labs according to the ISO/IEC 17025:2017 Standard [34], and despite the fact that the need for accounting for uncertainty in DNA sequencing data has been highlighted, at least when medical treatment is guided by genomic profiling [36].

Although both the studies on error rate estimation and the collaborative exercises achieved interesting results, they appear to be aimed more at the assessment of the foundational validity of the DNA profiling methods than at the assessment of the validity as applied. Indeed, these studies give evidence of the overall experimental limitations in the application of the DNA profiling methods, in a similar way as the genetic studies and the probabilistic analysis presented in Sect. 10.3 give evidence of the theoretical ones.

Validity as applied, in forensic applications [7], should provide a clear statement about the validity of the obtained DNA profile in the specific case submitted to the trier of fact, not only a general statement about the validity of the method, especially considering all possible error causes highlighted in Sect. 10.5.1. Such a statement should be expressed in terms of a probability value, associated with each and every identified allele, evaluating the probability that the allele itself does belong to the DNA profile contained in the analyzed forensic sample.

A significant step forward in this direction was taken by the EU, with the Council Framework Decision 2009/905/JHA on the accreditation of forensic service providers carrying out laboratory activities [37], whose objective is clearly stated in its art. 1.1:

The purpose of this Framework Decision is to ensure that the results of laboratory activities carried out by accredited forensic service providers in one Member State are recognized by the authorities responsible for the prevention, detection and investigation of criminal offences as being equally reliable as the results of laboratory activities carried out by forensic service providers accredited to EN ISO/IEC 17025 within any other Member State.
Art. 1.2 clarifies how this purpose shall be achieved: This purpose is achieved by ensuring that forensic service providers carrying out laboratory activities are accredited by a national accreditation body as complying with EN ISO/IEC 17025.
Finally, art. 2 sets the scope, stating that the decision shall apply to laboratory activities resulting in DNA profiles and dactyloscopic data, and art. 7 states that Member States shall take the necessary steps to comply with the provisions of the Framework Decision in relation to DNA profiles by November 30, 2013.

In the United States, accreditation of forensic labs has been mostly voluntary, although it has become mandatory in some states. According to the Bureau of Justice Statistics, about 90% of the publicly funded forensic crime labs in the United States are accredited. To guide laboratories throughout the accreditation process and in the application of Std. ISO/IEC 17025 [34], ILAC issued a Guide on Modules in a Forensic Science Process [35], in which great attention is paid to method validation, and measurement uncertainty is considered as an issue to be determined to validate the method.

Since Std. ISO/IEC 17025 requires measurement uncertainty to be stated in the test report, not finding it in the vast majority—if not all—of DNA test reports might appear quite surprising. However, as already mentioned in Sect. 10.1, DNA profiles are considered as nominal properties and, according to the VIM [8], measurement does not apply to nominal properties. Consequently, any obtained DNA profile is not considered a measurement result, and uncertainty is neither evaluated nor reported.

This is not the place to discuss the correctness of such a statement, but, for the sake of completeness, it is worth reporting that the position of the scientific community has changed since the 3rd edition of the VIM was first published in 2008, so that a 4th edition is presently at an advanced revision stage and will soon be published with major changes related to nominal properties. As briefly recalled in Sect. 10.1, nominal properties may imply measurement activities in order to be identified, depending on how deeply they need to be investigated. This concept has been thoroughly discussed in [38], where the need for associating uncertainty also with the evaluation of nominal properties is clearly stated:

... measurement uncertainty has the crucial role of establishing the quality of the information obtained by measurement and therefore of providing information on the degree of public trust that can be attributed to measurement results: the common structure of the evaluations of quantities and nominal properties highlighted so far would then be seriously flawed without some uncertainty treatment for nominal property evaluations, that in turn would convey information on the quality of such evaluations.
Such a statement is fully compliant with the uncertainty concept considered by the GUM [39] and explained in Chap. 5, and generalized by Supplement 1 to the GUM [40] under the assumption of encoding the information on a quantity, conveyed by the measurement result, by a probability distribution. A more recent paper [41] reconsiders the term examinand, as the nominal property intended to be examined, in agreement with the terminology adopted by clinical laboratories [42], and the term examination, as the process of experimentally obtaining one or more nominal property values that can reasonably be attributed to a nominal property. This paper also proposes to consider an examination uncertainty that, similarly to measurement uncertainty, quantifies the dispersion of values that can be attributed to the nominal property.
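To give a more concrete flavor of what encoding measurement information by a probability distribution means in practice, the following minimal Python sketch propagates two assumed input distributions through a simple, entirely hypothetical measurement model, in the spirit of the Monte Carlo method of Supplement 1 to the GUM. The model, the distributions and all numerical values are illustrative assumptions introduced here for explanation only; they are not taken from the GUM, from the VIM or from any forensic procedure.

import numpy as np

rng = np.random.default_rng(seed=1)
N = 200_000  # number of Monte Carlo trials

# Hypothetical measurement model Y = X1 + X2 (illustrative assumption only):
# X1 is an indication affected by Gaussian noise, X2 a correction known only within limits.
x1 = rng.normal(loc=10.0, scale=0.2, size=N)
x2 = rng.uniform(low=-0.1, high=0.1, size=N)

y = x1 + x2  # propagate the two distributions through the model

# The knowledge about the measurand is now carried by the whole distribution of y,
# from which an estimate, a standard uncertainty and a coverage interval are derived.
estimate = y.mean()
standard_uncertainty = y.std(ddof=1)
low, high = np.percentile(y, [2.5, 97.5])  # 95% coverage interval

print(f"estimate = {estimate:.3f}")
print(f"standard uncertainty = {standard_uncertainty:.3f}")
print(f"95% coverage interval = [{low:.3f}, {high:.3f}]")

The numerical output is irrelevant here; what matters is the structure of the reasoning, namely that the result is a distribution of values, not a single number, and that any statement derived from it (an interval, a probability) makes the residual doubt explicit.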
It is then likely that the definition of examination will be modified, in the 4th edition of the VIM, into the process of experimentally obtaining one or more values of nominal properties that can reasonably be attributed to a nominal property together with any other available relevant information, where the relevant information may concern the reliability of the values obtained by the examination, such that some may be more representative of the examinand than others. It is also likely that the new VIM edition will consider that, in some cases, an examination is performed through intermediate stages which are measurements, and whose results are used to obtain the examination result: under such conditions, it would be natural to consider the related measurement uncertainties as elements of the relevant information to be associated with the examination result. It is also likely that the new VIM edition will consider examination uncertainty, maybe calling it examination reliability in agreement with the term adopted in [42], and defining it as the probability that an examined value is the same as a reference value of a nominal property. This will probably also require definitions of examination standards, examination calibration and examination traceability. This is not the place to anticipate these definitions, which are still under discussion and might undergo changes. However, it is quite likely that the present practice of reporting DNA profiles without any information about the probability that an examined value is the same as a reference value will no longer be supported by the restrictive interpretation of the VIM statement measurement does not apply to nominal properties, because the new VIM edition will consider declaring the examination reliability when reporting an examination result. It is then advisable that forensic labs start considering this point and that all people involved in forensic activities start demanding that such important relevant information be declared. In our opinion, this is the only scientific way to quantify the validity as applied of DNA profiling methods.
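As a purely numerical illustration of what such a declaration could look like, the short Python sketch below attaches a probability to each reported allele call and combines these probabilities into a profile-level figure. The per-allele values, the independence assumption and the function name are hypothetical choices made for explanation only; they do not reproduce any procedure prescribed by the VIM, by the GUM or by forensic laboratories.

import math

# Hypothetical per-allele reliabilities: the probability that each called allele
# really belongs to the DNA profile contained in the analyzed forensic sample.
# The values are invented (they could, for instance, be derived from peak quality).
allele_reliability = {
    ("D8S1179", "13"): 0.999,
    ("D8S1179", "15"): 0.997,
    ("D21S11", "29"): 0.95,    # a lower-amplitude peak gets a lower value
    ("D21S11", "31.2"): 0.60,  # a peak close to the analytical threshold
}

def profile_reliability(per_allele):
    # Probability that the whole reported profile is correct, under the
    # simplifying assumption that the individual allele calls are independent.
    return math.prod(per_allele.values())

overall = profile_reliability(allele_reliability)
print(f"examination reliability of the reported profile: {overall:.3f}")

Even such a crude model conveys the essential message: the reliability of a reported profile is dominated by its weakest allele calls, and this is exactly the kind of quantitative information a trier of fact would need in order to weigh the evidence.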
10.6 An Emblematic Case: The Perugia Crime
There is a rich literature showing how errors in DNA profile identification and interpretation have led to several miscarriages of justice [18, 21, 22], often supported by a wrong interpretation of statistics and probability [17]. Here we decided to refer to only one case, which had some global relevance, because it is the only case, to the authors' knowledge, in which metrological issues played a significant role, with wrong as well as correct interpretations, in the different stages and courts of the trials. This case is generally referred to as the Perugia case, or Perugia crime, since it refers to a murder committed in the Italian town of Perugia in 2007.
10.6.1 The Crime
Perugia is a small town (166,000 inhabitants) located in the middle of the Italian peninsula, about midway between Florence and Rome. Founded by the Etruscans, it flourished during the Middle Ages and is the seat of one of the most ancient universities in Italy, established in 1308. It is also the seat of a University for Foreigners, established in 1925 and aimed at offering foreign students advanced studies of Italian language and culture. The presence of this university has given Perugia the typical character of a university town, with a significant number of foreign students, typically between 1000 and 2000, that puts some pressure on the accommodation system. It is customary, among students, to rent an apartment and share the cost of the rent. This was the case of a 22-year-old British student, Meredith Kercher, who shared an apartment with a 20-year-old US student, Amanda Knox, and two Italian ladies who, a bit older than Meredith and Amanda, already had a job. Since these two ladies were not involved in the crime, to protect their privacy we call them Ms. A and Ms. B. The apartment was located on the first floor of a two-story cottage, had four bedrooms, two bathrooms (one used by Meredith and Amanda and one used by the two Italian ladies), and a living room with a kitchenette used by all occupants of the apartment. The ground floor of the cottage was rented by four Italian male students.13 Both Meredith Kercher and Amanda Knox arrived in Perugia in September 2007. They lived the "normal" life of non-resident students, seeing other students in their free time, being involved in romantic relationships, making occasional use of alcohol, hashish or marijuana. They both mixed with the students who lived in the apartment on the ground floor, and Amanda had started a romantic relationship with a 23-year-old Italian student, Raffaele Sollecito, who was studying computer science at the State University of Perugia and whom she had met at a classical music concert she attended with Meredith in October. A fourth character in this story is Rudi Guede, a 21-year-old Ivorian man, who had lived in Italy since he was 5 years old and had a criminal record for minor burglary crimes. He used to spend time with the students who lived in the ground-floor apartment, often playing basketball in a nearby field or partying in their apartment. On several such occasions he met both Meredith and Amanda. The crime was committed during the night between the 1st and the 2nd of November 2007, respectively a Thursday and a Friday. The 1st of November is a national holiday in Italy, and the 2nd is All Souls' Day and, when close to a weekend, as was the case in 2007, it is traditionally a day off for most activities, including university classes. Ms. B took advantage of the four days off to visit her family, who lived outside Perugia, and left the apartment on October 31st in the afternoon, at the end of her working day. Similarly, Ms. A planned to spend those days with her boyfriend and celebrate the birthday of the boyfriend of one of her friends, which 13
These data, as well as the following short report on the crime, are taken from the verdict issued by the Criminal Court of Perugia in 2009 [43].
was November 1st. Therefore, both Ms. A and Ms. B left the apartment on October 31st and did not plan to come back before the following Sunday. On the morning of November 2nd an event occurred, apparently unrelated to the crime, that led to its discovery. Two cell phones were found in the yard of a cottage situated not far from the apartment where Meredith and Amanda lived. The cell phones appeared to have been thrown into the yard from the outside road. The person who found them brought them to the postal police office, a branch of the state police that, in Italy, has jurisdiction over everything related to communications. After a quick investigation, the postal police found that one phone had a SIM card in the name of Ms. A. No data could be found about the second phone, which, for this reason, was assumed to operate on a SIM card of a foreign service provider. Two officers of the postal police decided to go to the apartment where Ms. A lived (the same one where Meredith and Amanda lived) to give the cell phone back to her and further investigate the reasons for its accidental discovery in the nearby yard. When they arrived at the apartment, they found Amanda and Raffaele, who were sitting in the outside yard, waiting for the police they had called because Amanda, coming back home after the night spent at Raffaele's apartment, had found the door unlocked and a broken window in Ms. A's room. As Ms. A declared later, Amanda had called her in the morning telling her that she had come back home to take a shower and had found the apartment's door unlocked, the window in Ms. A's room broken and this same room ransacked. She also declared that she asked Amanda where Meredith was, and she answered that she didn't know. After receiving this call, Ms. A came back home with her boyfriend, her friend, and her friend's boyfriend. The four of them arrived a few minutes after the postal police officers. In the meantime, Amanda had gone back to Raffaele's apartment and they both came back. Ms. A recognized the cell phones. The one with an Italian number was the phone she had lent Meredith to call Italian phones. The other cell phone was Meredith's English phone, which she used to call home. The discovery of these two phones and the lack of news from Meredith gave rise to serious concerns, and they entered the apartment to check Meredith's room. The room's door was locked, but this was normal, as Meredith always locked her door when she was sleeping. However, she didn't wake up when Ms. A knocked at the door. At this point, the police officers decided to break down the door, and they found Meredith's corpse lying on the floor, covered by a quilt.
10.6.2 The Investigation
The investigation was oriented, from the very beginning, toward finding a plausible answer to the following questions:
• what killed the victim;
• the time of death;
• who was in the house and, in particular, in the victim's room, during the time interval within which the victim's death could be placed;
• whether or not the broken window of Ms. A's room and the related ransack could be connected to the murder or were simply staged to put the investigators off the scent;
• what was the murder weapon.
To provide an answer to these questions, several medical doctors, geneticists and forensic labs were involved, and the main results of the investigation are briefly summarized in the following.
10.6.2.1 Cause and Time of Death
Both the examination of Meredith's corpse on the crime scene and the subsequent autopsy led to a clear identification of the cause of death: suffocation as a result of a knife wound in her throat. Meredith's body was almost totally naked under the quilt, her bra cut with a knife and ripped off. Minor ecchymoses found on her face, upper thighs, and vulvar area suggested that she was the victim of a sexual assault, rather than the victim of a burglar she had discovered in the apartment. Her hands did not show wounds or major bruises, thus confirming the assumption that she did not fight her attacker, either because she knew him or her, or because she was pinned and could not react. At first, the murderer tried to strangle her and then hit her with a knife, cutting the hyoid bone and thus causing death by suffocation. The analysis of the wound led the coroner to conclude that the employed knife was sharply pointed, with a single, smooth blade. The determination of the time of death was a bit more complex and inaccurate, mainly because the examination of the corpse was delayed by 11 h after the crime was discovered, in order to preserve the crime scene as long as possible and collect as much evidence as possible. Based on the body temperature and rigor mortis, the time of death could be placed between 8 pm of November 1st and 4 am of November 2nd, with a middle value falling at 11 pm of November 1st. Based on the analysis of the stomach's content, death could be placed no later than 2 or 3 h after the last ingestion of food. Taking into account the testimonies given by two friends of Meredith who spent part of November 1st afternoon and evening with her, they had dinner together at about 8 pm, and she went back home, alone, at about 9 pm. Therefore, the most plausible time window for Meredith's death could be placed between 10 pm and midnight of November 1st, since this time window is in agreement with the obtained scientific evidence and the witnesses' declarations.
10.6.2.2 People in Meredith's Apartment
This part of the investigation was both the easiest and the most complex to perform. Several footprints left by shoes whose soles were soaked in Meredith’s blood led
the investigators to Rudi Guede. His presence on the crime scene was confirmed by DNA analysis of physiological material found in one of the apartment's bathrooms and on the victim's body, which gave clear and undeniable evidence of his presence on the crime scene. His DNA was also found in the vaginal swab taken from Meredith during the first stage of the investigation, although no sperm was found. This also proved that Meredith underwent a sexual assault, although it did not involve rape. Due to the large amount of biological material, the results of the DNA analysis were considered reliable and, together with the presence of Guede's footprints, led to the conclusion that Guede was present on the crime scene when Meredith was killed and was quite likely involved in the crime. The presence of other people and their possible involvement in the crime was much more difficult to prove. The presence of Ms. A and Ms. B could be easily excluded, thanks to the numerous testimonies confirming that they were away from Perugia the night the crime was committed. The presence of Amanda Knox and Raffaele Sollecito could be neither confirmed nor excluded. Amanda lived in the same apartment and, therefore, her DNA could be found there, though it was not found on Meredith's corpse. Raffaele did not live in the apartment but, being Amanda's boyfriend, was often present there and his DNA could easily be found. On the other hand, Amanda declared to the police officers who questioned her that she had spent the night from November 1st to November 2nd, when the crime was committed, at Raffaele's apartment and had come back to her apartment only in the morning of November 2nd to take a shower. She noticed that the door of Meredith's room was closed and that there were a few blood stains in the bathroom she shared with Meredith, but she assumed it was menstrual blood, so she had no suspicion that something strange could have occurred, at least until she noticed the broken window in Ms. A's room. These declarations sounded somewhat confused and contradictory to the investigators, although Amanda tried to explain her confusion with her use of marijuana in the evening of November 1st, and they led to an additional search of Raffaele's apartment, as reported in the following Sect. 10.6.2.4.
10.6.2.3 The Broken Window: A Staged Ransack?
According to the testimonies of Meredith's friends and those of Ms. A and Ms. B, Meredith paid particular attention to locking the door of her apartment, especially when she knew that she was alone in the apartment, as on the night of the crime. She would also never have let Rudi Guede, or any other person with whom she was not really familiar, enter the apartment when she was alone. This led the investigators to assume that Rudi was not let into the apartment by Meredith and that either he forced the door or a window, or somebody else, who had the apartment's keys, let him in. The broken window in Ms. A's room, the related ransack and the fact that Rudi had a criminal record for similar burglary crimes could have led the investigators to assume that he broke the window, entered the apartment through it
and killed Meredith when she caught him upon coming back home in the evening of November 1st. A stone found in Ms. A's room, seemingly thrown from the outside to break the window, apparently gave credit to this reconstruction of the events. However, this assumption did not stand up to a more careful investigation. To enter through the broken window, the murderer would have had to climb the outside wall, which was not easy to climb. In doing so, he or she would have left marks on the wall, and no marks were found showing that somebody had recently climbed it. Moreover, the relative positions of the window, the shutters, and the stone were somewhat inconsistent with a throw from the outside. All the above considerations led the investigators to conclude that the broken window and the ransack had been staged after the crime, to conceal the fact that other people, who had probably let Rudi into the apartment, were present at the crime scene, and to put the whole responsibility for the crime on Rudi's shoulders.
10.6.2.4 The Murder Weapon
The retrieval of the murder weapon always represents an important step in the investigation process and, in this specific case, was even more important due to the doubts about who was really present on the crime scene. The coroner's report had established that death was caused by suffocation as a result of a knife wound in the victim's throat and that the employed knife was sharply pointed, with a single, smooth blade. Rudi Guede had a pocket knife that did not correspond to that description, although it might have been used to cut Meredith's bra and cause some of the minor injuries found on her corpse. The apartment was also searched, but no knives were found corresponding to the description given by the coroner. Since the investigators suspected that Amanda and Raffaele could also have been on the crime scene, mainly due to the lack of a solid alibi and the contradictions in Amanda's examination, they decided to search Raffaele's apartment too. The search and the way it was conducted are thoroughly reported in the verdict issued by the Criminal Court of Perugia in 2009 [43]. During the search, a kitchen knife was found whose dimensions and shape could be related to those indicated by the coroner as the most likely dimensions of the knife that killed Meredith. This knife attracted the attention of the investigators because of some scratches on the blade, not visible in normal lighting conditions and barely visible under a strong light and only at specific angles. They thought that some biological material could have remained trapped in these scratches, even after the knife had been cleaned, had that knife been used to commit the murder. Indeed, some biological material was found inside the scratches and on the handle; the DNA was analyzed, and the profile on the blade was attributed to Meredith, while the one on the handle was attributed to Amanda. The knife was then considered the murder weapon by the investigators and, having been found in Raffaele's house, it also placed Amanda and Raffaele on the crime scene.
10.6.3 The Trials
10.6.3.1 The First Trial and the Verdict
According to the investigation results briefly reported in the previous Sect. 10.6.2, Rudi Guede, Amanda Knox and Raffaele Sollecito were charged with Meredith's murder, prosecuted, and tried before the court of Perugia. Rudi Guede's defense moved for a summary judgment, which, in the Italian judicial procedure, grants a reduction in sentence in the case of guilt. Therefore, Rudi Guede's trial followed a different path, in front of a different court, than that of Knox and Sollecito. The evidence collected during the investigation was considered overwhelming and Guede was sentenced to 30 years imprisonment. The defense appealed and the sentence was reduced to 16 years imprisonment by the appeal court. Amanda's and Raffaele's trial followed a quite different path, not only because of the different procedures. The collected evidence was challenged by the defense, in particular that related to the murder weapon. The expert witnesses who were called to testify by the prosecutor and the judge in charge of the preliminary investigations14 confirmed that the knife could be considered the murder weapon, although with different degrees of certitude: some stated that it was compatible with the wound, others stated that it could not be stated with certainty that it was incompatible. The expert witnesses called to testify by the defense challenged these statements, also on the basis of the blood stains found on the bed sheets, where the knife had probably been laid down temporarily, which were declared not compatible with the dimensions of that knife. The second investigation result challenged by the defense was the DNA analysis of the biological material found on the knife's blade. In particular, the defense highlighted the small quantity of this material, which required high PCR amplification, resulted in low-amplitude peaks in the electropherogram, and prevented repetition of the test. Doubts were also expressed about the possible degradation of the biological sample and the possible contamination with other samples during the tests. The geneticist who performed all the DNA analyses ordered by the prosecutor's office was examined, as an expert witness, during several hearings. She declared that the small quantity of biological matter did not provide peaks higher than 50 RFUs for all alleles in the 16 analyzed loci. However, despite these problems, she declared that she "obtained genetic profiles …not so high in their peaks, but complete in almost all their parts"15 [43]. It is worth noting that measurement uncertainty was never mentioned by this expert witness, nor by the attorney and the defense lawyers who examined her. The metrological concepts of sensitivity and resolution, which play an important role 14
In the Italian judicial system the investigation results that lead to request prosecution of the defendants are subject to scrutiny by an independent judge that has to validate them and allow prosecution and the consequent trial. 15 This statement is included in the verdict. The official text, in Italian, was translated into English by the authors.
in the presence of low-amplitude peaks, were never considered, not even by the expert witnesses called to testify by the defense. The forensic lab of the geneticist who performed the DNA analysis was not an accredited laboratory according to Std. ISO/IEC 17025 [34], although it was in the early stages of the accreditation process and was implementing a Quality Assurance system certified according to Std. ISO 9001 [44]. This DNA analysis, having attributed to the victim the DNA profile sampled on the blade of the knife found in Raffaele's apartment, was considered conclusive evidence in placing both Raffaele and Amanda on the crime scene. This evidence reinforced other pieces of evidence analyzed by the court, and in particular the staged ransack of Ms. A's room, which led to the conclusion that Amanda had to be in the apartment to let Rudi in, the partial cleaning of blood stains outside Meredith's room, Meredith's cell phones stolen and thrown in a dark and unattended place, the contradictions in Amanda's alibi, and also the fact that she tried to accuse another person, who was found totally unrelated to the facts, which cost Amanda a sentence for slander. This led the court to conclude: "The set of the considered elements, individually evaluated, represents a complete and coherent picture, without missing elements and inconsistency, and leads, as a necessary and strictly consequential conclusion, to lay the blame for the crime on both defendants, who must, therefore, be held criminally liable for them"16 [43]. Consequently, on December 5, 2009, over two years after Meredith Kercher was killed, Amanda Knox was sentenced to 26 years imprisonment and Raffaele Sollecito was sentenced to 25 years imprisonment.
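To give a feeling for why peak amplitude matters metrologically, the following Python sketch applies a simple threshold check to a set of hypothetical electropherogram peak heights. The 50 RFU analytical threshold echoes the figure mentioned by the expert witness, while the 150 RFU stochastic threshold and all peak data are illustrative assumptions based on typical values discussed in the literature, not the settings or data of the actual case.

# Hypothetical peak heights (in RFU) read from an electropherogram, per locus.
peaks = {
    "D3S1358": [("15", 620), ("17", 580)],
    "vWA":     [("16", 140), ("19", 130)],   # above the analytical, below the stochastic threshold
    "FGA":     [("22", 48)],                 # below the analytical threshold
}

ANALYTICAL_THRESHOLD = 50    # RFU: below this, a peak cannot be distinguished from noise
STOCHASTIC_THRESHOLD = 150   # RFU: below this, allele drop-out cannot be excluded

for locus, alleles in peaks.items():
    for allele, height in alleles:
        if height < ANALYTICAL_THRESHOLD:
            status = "reject: below the analytical threshold"
        elif height < STOCHASTIC_THRESHOLD:
            status = "interpret with caution: drop-out possible"
        else:
            status = "acceptable"
        print(f"{locus} allele {allele}: {height} RFU -> {status}")

Such a check does not replace a proper evaluation of measurement uncertainty, but it makes explicit the kind of doubt that low-amplitude peaks introduce and that was never quantified in this first trial.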
10.6.3.2 The Appeal Trial and the Verdict
Both the prosecutor and the defendants appealed against the verdict. The prosecutor appealed against the exclusion of the aggravating factors and the granting of mitigating circumstances and asked for a reformulation of the sentence, while the defendants challenged the court's interpretation of the scientific evidence as well as the other circumstantial evidence, which led to "an incorrect line of reasoning that resulted in a guilt statement based on a subjective belief, supported by probabilistic data, at best, rather than on objective and significant pieces of evidence, so that any reasonable doubt about the defendants' guilt was excluded"17 [45]. Moreover, the defendants submitted a request to repeat the DNA analysis on the knife that was considered, in the proceedings at first instance, the main piece of evidence supporting all other pieces of circumstantial evidence. The appeal court accepted the request and appointed a team of experts, chaired by two professors of forensic medicine of Rome "La Sapienza" university. The court motivated this decision stating that, differently from the court of the proceedings at first instance, 16
The official Italian text was translated into English by the authors.
17 The official Italian text was translated into English by the authors here and wherever it is cited in the following parts of this section.
they “did not believe that the personal (scientific) competence of the judges were good enough to settle a scientific dispute on the basis of a scientific judgment without the assistance of trustworthy experts, appointed by the court to settle the dispute within an adversarial procedure involving the experts appointed by the parties” [45]. Consequently, the court asked the appointed experts to repeat the DNA analysis and report on the reliability of the identification provided by the biological samples collected from the knife. In the case the analysis could not be repeated, the experts were asked to evaluate the reliability of said identification, based on the relevant documentation available in the files of the proceedings. The team of experts tried, at the presence of the experts appointed by the prosecutor’s office and the defendants, to extract the DNA from the knife’s blade, and in particular from the scratches identified by the expert who ran the DNA analysis during the investigations. Unfortunately, the real-time PCR analysis showed that the DNA concentration could not be determined. Moreover, a small amount of the swabbed material underwent a cytological analysis that did not show the presence of human cells. Therefore, it was concluded that the DNA analysis could not be repeated and the experts proceeded to consider the relevant documentation to answer the question asked by the court. The experts reviewed all steps followed by the expert geneticist that analyzed the knife, starting from the way the samples were collected to the way they were analyzed. The analysis of the electrophoretic plots showed that the collected samples did not contain enough DNA to provide a reliable result: had the DNA quantification be obtained employing a real-time PCR technique, instead of a fluorometric quantification, the sample could have been classified as a Low Copy Number (LCN) sample and processed according to the indication of the state of the art [46].18 However, due to the incorrect classification, the collected sample was not processed with the attention required in processing LCN samples, and almost all issues highlighted in Sect. 10.5 affected the obtained results. This led the experts appointed by the court to disagree with the conclusion on the certain attribution of the genetic profile obtained from the sample collected on the knife’s blade to the victim, because the obtained genetic profile appeared to be unreliable, since it was not based on scientifically validated analytical procedures [45]. Moreover, the experts stated that they could not exclude that the result obtained on that sample could be affected by contamination that occurred during one or more of the collection, handling, and analysis steps [45]. Despite the experts did not perform a measurement uncertainty analysis to quantify the doubt that the obtained DNA profile could not be attributed to the victim, the court considered quite reasonable such a doubt and concluded that the result obtained by the police forensic lab, upon appointment by the prosecutor, “cannot be considered
18
It is worth mentioning that this paper advises against using LCN samples in forensic DNA profiling, since the obtained results are considered unreliable. It is suggested that LCN samples be used only in the identification of missing people (including the victims of mass disasters) or for research purposes.
reliable because provided by a procedure that did not employ the due diligence suggested by the international scientific community" [45]. Having considered this piece of evidence unreliable caused the logic behind the 2009 verdict to fail, in the opinion of the appeal court. Since the juridical analysis of the appeal court's verdict is outside the scope of this book, let us only note, synthetically, that all other pieces of circumstantial evidence, from the staged ransack of Ms. A's room to the testimonies of the witnesses, were analyzed by the court and considered, absent the scientific proof that could place the defendants on the crime scene beyond any reasonable doubt, unreliable. It is worth noting that the appeal court also expressed a strong doubt that the knife found in Raffaele's apartment could be the murder weapon, so that it also considered unjustified the minor charge, against Amanda, of carrying a concealed weapon. Consequently, on October 3, 2011, the defendants were exonerated from all charges and set free. The only confirmed charge was that of slander for Amanda Knox, though the aggravating factors were not confirmed.
10.6.3.3 The Appeal to the Supreme Court and the Verdict
The prosecutor and Meredith’s family members, who were the civil party in the previous proceedings, appealed against the appeal verdict to the Supreme Court. The Italian Supreme Court, called Corte di Cassazione or, more briefly, Cassazione, is at the top of the ordinary jurisdiction. The most important function conferred to this court is to ensure “the exact observance and uniform interpretation of the law, the unity of the national objective law, compliance with the limits of the various jurisdictions”.19 One of the key features of its mission is aimed at ensuring certainty in the interpretation of the law taking into account that the current rules do not allow the court to further investigate the facts of a case and bind it to consider only the documents already considered in the proceedings in front of the lower courts, and only to the extent that they are necessary to make and motivate a decision. The appeal to this court against the verdicts rendered by the lower courts can be submitted only in the case of violation of substantive law or procedures, defective statements (lack of motivations, insufficient or contradictory reasoning) in the verdict, or incorrect jurisdiction assignment. Whenever the court recognizes the existence of one of the above violations, it has the power and duty to invalidate the verdict of the lower courts, and also the power to state specific interpretation of the law principles relevant to the considered case and that may become precedents, though not strictly binding, for similar cases. According to these rules, the motivations submitted by the appellants could not refer to the merit of the verdict, but rather to the procedures followed to render the verdict. Without entering into juridical details on the Italian procedures, which are far beyond the scope of this book, the main motivations submitted to the Supreme Court can be shortly summarized in two main points [47]. 19
From Italian law n. 12 of January 30, 1941, art. 65.
• The appeal court had no right to order a new technical appraisal on the DNA found on the knife’s blade, since all tests performed during investigation did not violate the procedures in force nor the right of the parties to be present at all technical activities. Moreover, the court should not have left the decision on whether repeating the tests or not to the appointed experts, since the court has the responsibility for deciding whether a test shall be performed or not and cannot delegate it to the experts. • The verdict did not consider the different available pieces of evidence as a whole body of evidence, into which each element could reinforce the other ones, but rather individually, thus missing the complete framework that, in the opinion of the appellants, should have led to a verdict of guilt, as in the first proceedings. The Supreme Court recognized the right, of the appeal court, to order new analysis, since the court has the right to decide whether new investigations are needed or not. On the other hand, the Supreme Court censored the way the appeal court conducted the new genetic tests. Indeed, the experts appointed by the court detected a third trace of biological material on the knife’s blade, but decided not to extract and type the DNA because, according to their scientific judgment, the sample was again an LCN sample and, therefore, it would have returned an unreliable result. The Supreme Court stated that the decision of whether performing a DNA analysis or not was in charge of the court, not on the technical experts, “after having weighed the different opinions, with equal scientific dignity”20 [47]. Commenting and discussing this point from a juridical perspective is beyond the scope of this book. Different opinions can be expressed on it, as it is also clear from the motivation of a subsequent, final verdict of the Supreme Court, though of a different Chamber, as briefly reported in the next Sect. 10.6.3.5. Here it is worth noting that the decision of the experts appointed by the appeal court is fully justified by the state of the art in metrology: whenever the measurand value is below the resolution of the instrument employed to measure it, the result cannot be considered valid. The opinion of the experts appointed by the prosecutor and the civil party that the state of the art had evolved since the original analysis performed by the geneticist appointed by the prosecutor, so that reliable results were possible also in the presence of LCN samples, may be valid in principle, but might be incorrect in practice, because the validity of the obtained results depends on the features of the employed instrumentation. Once again, it is worth noting that the correct evaluation of measurement uncertainty or examination reliability, according to the terminology that the 4th edition of the VIM is likely to adopt (see Sect. 10.5.3), is the only strict and scientific way to assess whether a measurement result can be considered suitable for the intended use. The Supreme Court also recognized the validity of the second main complaint of the appellants about the way the appeal court weighed the other pieces of evidence. In particular, the Supreme Court stated that “the missing organic analysis prevented to fill the gaps that each single piece of circumstantial evidence shows, by overcoming the limitation in the capability of proving the existence of an unknown fact, also 20
The official Italian text was translated into English by the authors here and wherever it is cited in the following parts of this section.
considering that 'the overall framework can assume a well-defined and unambiguous meaning of proof, through which it is possible to define the logical proof of the fact …that does not represent a less qualified tool, with respect to a direct or historical piece of evidence, when it is obtained with methodological strictness that justifies and strengthens the concept of free conviction of the judge' (Verdict 4.2.1992 n. 6682 of the Joint Chambers of Cassazione)" [47]. Consequently, on March 25, 2013, the Supreme Court set aside the judgment of the appeal court and ordered new appeal proceedings in front of the appeal court of Florence.
10.6.3.4 The Second Appeal Trial and the Verdict
This second appeal trial took place in front of the appeal court of Florence, as ordered by the Supreme Court. This appeal court reconsidered all the points covered in the earlier proceedings and motivated its decision in a 337-page verdict [48]. The court recognized that the available evidence was mostly circumstantial and, in the preliminary remarks, quoted the Supreme Court's verdict that requested "a global and organic analysis of the circumstantial evidence" aimed at "not only proving the presence of the two defendants on the crime scene, but also outlining their subjective position as Guede's accomplices, inside the framework of possible situations ranging from the original agreement on committing a murder, through a change in a plan initially aimed at involving the victim in an undesired sexual game, to forcing the victim in a group hot sexual game that got out of control".21 [48] Differently from the earlier proceedings, the appeal court could now also consider the final verdict of Guede's trial, not yet available when the earlier proceedings took place, which found him guilty of Meredith's murder in complicity with other, unidentified persons. Starting from the lack of any clear sign showing that the victim fought against her aggressor, the court assumed that a single individual could not have committed the murder alone, even if no DNA other than that belonging to Rudi Guede could be found on the victim's corpse. Not having found any evidence that could relate anybody other than Guede and the defendants to the apartment inhabited by the victim, the court interpreted all the available pieces of circumstantial evidence, including all testimonies, in a similar way as the court of the proceedings at first instance, rejecting the interpretations given by the court of the first appeal trial. Once again, the analysis of the DNA traces found on the knife identified as the murder weapon represented the central piece of evidence around which all other pieces of circumstantial evidence could be interpreted against the defendants. The court rejected the conclusion of the experts appointed by the first appeal court about a possible contamination of the exhibits, and in particular of a small piece of the victim's bra with a clip on which the investigators found the presence of 21
The official Italian text was translated into English by the authors here and wherever it is cited in the following parts of this section.
Raffaele Sollecito’s DNA. The court stated that “talking about exhibits contamination in general and abstract possibilist terms, as done by the defense lawyers and the experts appointed by the court, …is meaningless in a criminal trial, and is objectively misleading” [48]. On the contrary, a possible contamination should have been proved on the basis of evident failures in the way the exhibits had been preserved and handled during the tests. It is worth noting that this position shows apparent disregard of the most recent scientific concerns about the possibility of undetected exhibits contamination, despite the high care in handling them, that could show up due to the high PCR amplification required by LCN samples, as the ones considered by the experts in the first appeal trial [25, 29]. Indeed, one of the most enlightening papers [25] on this topic was not yet published at the time this trial was celebrated, although it had been surely originated by concerns, in the scientific community, about undesired and undetected contamination of the analyzed samples, concerns that were not ignored by the experts appointed by the first appeal court. Probably, as it often occurs, the court of this second appeal trial did not fully perceive that the validity as applied of a forensic method can be quite different from its foundational validity. The court focused then its attention on the samples taken from the knife found in Raffaele’s apartment. Here too the fundamental metrological concept that any measurement result below the resolution of the employed method and instrument cannot be considered a reliable result was apparently not understood and the court appears to share the position of the expert witness called by the prosecutor in the first appeal trial that a statistical evaluation was done “aimed at estimating the probability that the obtained profile could belong to a different individual than the victim. The determination of the Random Match Probability resulted in a probability of 1 over 300 millions of billions. This value allows one to attribute the analyzed sample to a single contributor that the expert witness assumes to be the victim” [48]. Without commenting the rather unlikely correctness of the computed probability (1 in 3 · 1017 !), it is worth recalling that the random match probability is not the correct way to assess whether an obtained profile belongs to an individual or not, as widely discussed in Sects. 10.3.2 and 10.4. Moreover, considering that the Supreme Court censored the decision of the first appeal court of accepting the experts’ decision of not analyzing the additional trace of genetic material found on the blade, this court appointed two experts belonging to the forensic scientific department of the Italian Carabinieri police22 , and ordered them to analyze that additional trace. The experts reported that also that sample was an LCN sample, but, differently from the experts appointed by the first appeal court, they analyzed it anyway and provided the analysis results in terms of likelihood ratio, although they did not disclose whether the amplitude of the obtained alleles was higher than the measurement 22
The official Italian name of this department is Reparto Investigativo Speciale (RIS). The Carabinieri are a military force, under the authority of the Ministry of Defense, and are the national gendarmerie of Italy who primarily carries out domestic policing duties, together with the Polizia di Stato who acts under the authority of the Ministry of the Interior.
resolution or not. The results showed that the obtained profile could not be attributed to the victim, nor to Rudi Guede or Raffaele Sollecito, while it could be attributed with high likelihood to Amanda Knox. This last result was not at all surprising, since Amanda had often been in Raffaele’s apartment, and it is not unreasonable to assume that she used the knife while helping Raffaele cooking their meal. On the other hand, the absence of the victim’s DNA should have confirmed the conclusions of the experts appointed in the first appeal trial that the result of the analysis performed on the other samples found on the blade was unreliable. The court, however, interpreted this result as a proof that the experts appointed by the first appeal court could and should have analyzed again the samples found on the blade and that their conclusion that the results obtained by the police expert at the investigation stage were not reliable was not acceptable, since the additional, LCN sample could be analyzed and provided, according to the RIS experts, reliable results. This led the court to conclude that the whole framework of circumstantial evidence, including the DNA analysis results challenged by the first appeal court, was proving beyond any reasonable doubt that the defendants participated, with Rudi Guede, in Meredith’s murder. Consequently, on January 30, 2014, the court confirmed the sentence rendered by the court at first instance and increased Amanda’s sentence to 28 years imprisonment, since the request of the prosecutor to consider the aggravating factors in the slander was accepted [48].
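To see in numbers why a random match probability alone does not answer the identification question, consider the following worked example, which uses invented allele frequencies and an invented false-positive probability; it illustrates the product rule and the effect of error probabilities discussed in [18], and is not a recalculation of the figures presented at trial. For three heterozygous loci with allele frequencies $p_i = 0.1$ and $q_i = 0.2$ at each locus,
\[
  \mathrm{RMP} \;=\; \prod_{i=1}^{3} 2\,p_i\,q_i \;=\; (2 \times 0.1 \times 0.2)^3 \;=\; 6.4 \times 10^{-5} .
\]
If, however, the probability of a false positive due to contamination or sample mix-up is assumed to be, say, $P_{\mathrm{FP}} = 10^{-3}$, the probability of a reported match when the suspect is not the source becomes
\[
  P(\text{reported match} \mid \text{not source}) \;=\; \mathrm{RMP} + (1-\mathrm{RMP})\,P_{\mathrm{FP}} \;\approx\; 10^{-3},
\]
so the weight of the evidence, however it is expressed, is governed by the error probability rather than by the impressively small random match probability. This is why a proper evaluation of the doubt affecting the whole measurement process, and not only a population-genetics figure, is needed to assess validity as applied.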
10.6.3.5 The Second Appeal to the Supreme Court and the Final Verdict
The defendants appealed again, against this second appeal verdict, to the Supreme Court. As already explained in Sect. 10.6.3.3, the Italian Supreme Court (Corte di Cassazione) is not allowed to further investigate the facts but has the right and the duty to analyze the motivations of the appellants and assess whether they can be accepted or shall be rejected; if they are accepted, it has the right and the duty to analyze the motivation of the appealed verdict and the logical reasoning that led to such a verdict. Should flaws be found in the logical reasoning, or should the motivation not be coherent with that reasoning, the Supreme Court has the right to set aside the appealed sentence, with or without ordering a repetition of the appeal proceedings. Therefore, the verdict of the Supreme Court is mainly based on purely juridical points [49], whose discussion is beyond the scope of this book. The following analysis, hence, covers only the points that are more directly related to the scientific evidence. The Supreme Court noted, from the very beginning of its analysis, that the previous proceedings were characterized by a wavy path, "whose oscillations are, however, also the result of blatant investigative blunders or 'oversights' and culpable lack of investigations that, if performed, would have quite likely led to outline a framework, if not of certainty, at least of reassuring reliability in assessing either guilt or innocence of the appellants. Such a scenario, intrinsically contradictory, represents by
itself a first, telltale indication of an evidential framework that was anything but characterized by evidence beyond any reasonable doubt”23 [49]. Later on, the court stated that “when the central part of technical and scientific investigation activity is represented by specific genetic investigation, whose contribution has become more and more relevant, the reliable parameter of correctness shall be nothing else than the respect of the standard procedures established by international protocols that encompass the fundamental rules set by the scientific community, based on statistical and epistemological observations” [49]. The analysis of the available pieces of evidence led the Supreme Court to state that only two points were fully certain: Amanda slandering a person who was proved to be totally unrelated to the facts and Rudi Guede found guilty, together with other unknown accomplices, of Meredith’s murder and sentenced to 16-years imprisonment. However, this was considered not enough to conclude that the unknown accomplices were Amanda and Raffaele for two main reasons: the total lack of their traces on the victim’s body and in the room where the murder was committed, while, on the contrary, numerous traces left by Guede could be found, and the lack of a plausible motive. The only pieces of evidence that could relate the defendants to the crime scene and prove that they were directly involved in the crime were the DNA profiles found on the blade of the knife considered as the alleged murder weapon and the small part and clip of Meredith’s bra, torn apart from the bra and found on the floor many days after the crime was committed. However, also because of “a deplorable carelessness in the initial investigations” [49], the samples could have been contaminated, and, in any case, the amount of DNA extracted from those samples was not enough to provide a reliable profile, according to the analytical protocols recommended by the scientific community. The Supreme Court censored the choice, made by the police forensic department “of preferring the analysis of the genetic profile of the samples found on the knife, rather than assessing their biological nature, given that the low quantity of the biological samples did not allow both tests: the qualitative analysis would have ’consumed’ the sample and made it not exploitable for further tests. This choice is quite controversial, since detecting blood traces, that could have been attributed to Meredith Kercher, would have given the proceedings a terrific piece of evidence, proving indubitably that the weapon had been used to commit the murder. Having found it in Raffaele Sollecito’s apartment, where Amanda Knox spent part of her time, would have then made possible every other deduction upon its merit. On the contrary, the attribution of the detected traces to Amanda Knox’s genetic profile provided an ambiguous result, or rather useless, given that the young American lady had a relationship with Raffaele Sollecito, spending part of her time in his apartment and part in her apartment” [49]. The Supreme Court concluded that “the intrinsic inconsistency of the body of evidence, resulting from the text of the appealed verdict, invalidates in nuce24 the 23
The official Italian text was translated into English by the authors here and wherever it is cited in the following parts of this section. 24 The court used this Latin locution that means from its very origin.
connective logic of this verdict, thus implying that it must be set aside" [49]. The Supreme Court also stated that no reasons existed to order a repetition of the appeal proceedings, since it was not possible to perform new analyses on the genetic material, no material having been left by the previous tests. Consequently, the Supreme Court, on March 27, 2015, almost eight years after the crime was committed, exonerated Amanda and Raffaele from the charge of having killed Meredith. This sentence was final.
10.6.4 Conclusive Remarks
This case shows, emblematically, the consequences of disregarding the fundamental concepts of science and, in particular, of metrology. It is so emblematic that it has become a case study also outside Italy [50]. It is apparent, from the analyzed documents, that the different conclusions about guilt or innocence were based on the result of the genetic tests and on the attribution to the victim of the DNA profile found on the blade of the knife allegedly considered the murder weapon. Had measurement uncertainty, or examination reliability as the 4th edition of the VIM will quite likely call it, been evaluated, the doubt about whether the obtained profile could or could not be attributed to the victim would have been quantified, thus giving the judge a scientific element on which to assess whether a guilty verdict could be rendered beyond any reasonable doubt. Its evaluation would also have reinforced the opinion submitted by the experts appointed by the court of the first appeal proceedings. Moreover, the awareness of the high measurement uncertainty, or low examination reliability, of the DNA profile extracted from an LCN sample would have turned the initial investigation toward a different path, in search of other, more reliable pieces of evidence or even of other suspects [50]. This is not the only case that would have greatly benefited from a stricter metrological approach to the collection and formation of the scientific evidence. Not adopting such an approach may result, as in this case, in assigning high reliability to a piece of evidence, thus considering it, erroneously, as conclusive evidence. The consequence is that, when the same piece of evidence undergoes stricter scrutiny, its reliability may turn out to be much lower than expected, thus preventing a verdict from being rendered beyond any reasonable doubt. From this point of view, justice is done, because no risk was taken of sentencing innocent individuals. However, the criminals who killed a young, innocent victim have not been identified and punished, probably because of a deplorable lack of metrological competences (to paraphrase the words of the Supreme Court mentioned in Sect. 10.6.3.5), and, from this point of view, justice is not done. One more reason to conclude that forensic metrology, if correctly applied, can help justice.
References 1. Butler, J.M.: Fundamentals of Forensic DNA Typing. Elsevier, Burlington, MA, USA (2009) 2. Amorim, A.: Basic principles. In: Siegel, J.A., Saukko, P.J., Houck, M.M. (eds.) Encyclopedia of Forensic Sciences, 2nd edn, pp. 211–213. Elsevier, Burlington, MA, USA (2003) 3. Carracedo, A.: Forensic genetics: history. In: Siegel, J. A., Saukko, P.J., Houck, M.M.: Encyclopedia of Forensic Sciences, 2nd edn, pp. 206–210. Elsevier, Burlington, MA, USA (2003) 4. Saks, M.J., Koehler, J.: The coming paradigm shift in forensic identification science. Science 309(5736), 892–895 (2005) 5. Biedermann, A., Thompson, W.C., Vuille, J., Taroni, F.: After uniqueness: the evolution of forensic science opinion. Judicature (102), 18–27 (2018) 6. National Academy of Science, National Research Council, Committee on Identifying the Needs of the Forensic Sciences Communit: Strengthening Forensic Science in the United States: A Path Forward, Doc. n. 228091 (2009). https://tinyurl.com/zsrzutar 7. President’s Council of Advisors on Science and Technology (PCAST): Report to the President—Forensic Science in Criminal Courts: Ensuring Scientific Validity of Feature— Comparison Methods (2016). https://tinyurl.com/abv5mufh 8. BIPM JCGM 200:2012: International vocabulary of metrology—basic and general concepts and associated terms (VIM), 3rd edn (2012). http://www.bipm.org/utils/common/documents/ jcgm/JCGM_200_2012.pdf 9. Alonso, A.: DNA extraction and quantification. In: Siegel, J.A., Saukko, P.J., Houck, M.M. (eds.) Encyclopedia of Forensic Sciences, 2nd edn, pp. 214–218. Elsevier, Burlington, MA, USA (2003) 10. Lareu, M.: Short tandem repeats. In: Siegel, J.A., Saukko, P.J., Houck, M.M. (eds.) Encyclopedia of Forensic Sciences, 2nd edn, pp. 219–226. Elsevier, Burlington, MA, USA (2003) 11. Ferrero, A., Scotti, V.: The story of the right measurement that caused injustice and the wrong measurement that did justice; how to explain the importance of metrology to lawyers and judges [Legal Metrology]. IEEE Instrum. Meas. Mag. 18(6), 18–19 (2015) 12. Budowle, B., Moretti, T.M., Niezgoda, S.J., Brown, B.L.: CODIS and PCR-based short tandem repeat loci: law enforcement tools. In: Proceedings of the Second European Symposium on Human Identification, pp. 73–88. Promega Corporation, Madison (1998). https://tinyurl.com/ uczzfhmt 13. Hares, D.M.: Expanding the CODIS core loci in the United States. FSI Genet. 6(1), 52–54 (2012) 14. Western, A.A., Nagel, J.H.A., Benschop, C.C.G., Weiler, N.E.C., de Jong, B.J., Sijen, T.: Higher capillary electrophoresis injection as an efficient approach to increase the sensitivity of STR typing. J. Forensic Sci. 54, 591–598 (2009) 15. Dror, I.E.: Human expert performance in forensic decision making: seven different sources of bias. Aust. J. Forensic Sci. 49, 541–547 (2017) 16. Daubert v. Merrell Dow Pharmaceuticals, 509 U.S. 579 (1993) 17. Aitken, C.G., Taroni, F., Bozza, S.: Statistics and the Evaluation of Evidence for Forensic Scientists, 3rd edn. Wiley, Hoboken, NJ, USA (2021) 18. Thompson, W.C., Taroni, F., Aitken, C.G.G.: How the probability of a false positive affects the value of DNA evidence. J. Forensic Sci. 48(1), 47–54 (2003) 19. Pappas, S.: Genetic testing and family secrets, Monitor on Psychology 49(6) (2018). http:// www.apa.org/monitor/2018/06/cover-genetic-testing 20. Vaira, M., Garofano, L.: The yara gambirasio case. In: Proceedings of the 69th Annual Scientific Meeting of the American Academy of Forensic Sciences, pp. 816–817. New Orleans, LA, USA (2017) 21. 
Thompson, W.C.: The potential for error in forensic DNA testing (and how that complicates the use of DNA databases for criminal identification). Gene Watch 21(3–4), 1–49 (2008) 22. National Research Council: The Evaluation of Forensic DNA Evidence. National Academy Press, Washington, D.C., USA (1996). https://tinyurl.com/3ftrrw25
23. Balding, D.J., Steele, C.D.: Weight-of-Evidence for Forensic DNA Profiles. Wiley, Chichester, U. K., Second Edition (2015) 24. Clayton, T.M., Whitaker, J.P., Sparkes, R., Gill, P.: Analysis and interpretation of mixed forensic stains using DNA STR profiling. Forensic Sci. Int. 91, 55–70 (1998) 25. Cale, C.M., Earll, M.E., Latham, K.E., Bush, G.L.: Could secondary DNA transfer falsely place someone at the scene of a crime? J. Forensic Sci. 61, 196–203 (2016) 26. Heidebrecht, B.J.: Mixture interpretation (Interpretation of Mixed DNA Profiles with STRs Only). In: Siegel, J.A., Saukko, P.J., Houck, M.M. (eds.) Encyclopedia of Forensic Sciences, 2nd edn, pp. 243–251. Elsevier, Burlington, MA, USA (2003) 27. Mullis, K., Faloona, F., Scharf, S., Saiki, R., Horn, G., Erlich, H.: Specific enzymatic amplification of DNA in vitro: the polymerase chain reaction. Cold Spring Harb. Symp. Quant. Biol. 51, 263–273 (1986) 28. Gill, P., Whitaker, J., Flaxman, C., Brown, N., Buckleton, J.: An investigation of the rigor of interpretation rules for STRs derived from less than 100 pg of DNA. Forensic Sci. Int. 112, 17–40 (2000) 29. Ferrero, A.: DNA profiling: a metrological and signal processing perspective. IEEE Instrum. Meas. Mag. 20(1), 4–7 (2017) 30. Kloosterman, A., Sjerps, M., Quak, A.: Error rates in forensic DNA analysis: definition, numbers, impact and communication. Forensic Sci. Int. Genet. 12, 77–85 (2014) 31. Plebani, M., Carraro, P.: Mistakes in a stat laboratory: types and frequency. Clin. Chem. 43(8), 1348–1351 (1997) 32. Anderson, J.F., Martin, P., Carracedo, A., Dobosz, M., Eriksen, B., Johnson, V., Kimpton, C., Kloosterman, A., Konialis, C., Kratzer, A., Phillips, P., Mevåg, B., Pfitzinger, H., Rand, S., Rosén, B., Schmitter, H., Schneider, P., Vide, M.: Report on the third EDNAP collaborative STR exercise. Forensic Sci. Int. 78, 83–93 (1996) 33. Barrio, P.A., Crespillo, M., Luque, J.A., Aler, M., Baeza-Richer, C., Baldassarri, L., Carnevali, E., Coufalova, P., Flores, I., García, O., García, M.A., González, R., Hernández, A., Inglés, V., Luque, G.M., Mosquera-Miguel, A., Pedrosa, S., Pontes, M.L., Porto, M.J., Posada, Y., Ramella, M.I., Ribeiro, T., Riego, E., Sala, A., Saragoni, V.G., Serrano, A., Vannelli S.: GHEP-ISFG collaborative exercise on mixture profiles (GHEP-MIX06). Reporting conclusions: results and evaluation. Forensic Sci. Int.: Genet. 35, 156–163 (2018) 34. EN ISO/IEC 17025: general requirements for the competence of testing and calibration laboratories (2017) 35. ILAC, Modules in a forensic science process, ILAC-G19:08/2014 (2014) 36. O’Rawe, J.A., Ferson, S., Lyon, G.J.: Accounting for uncertainty in DNA sequencing data. Trends Genet. 31(2), 61–66 (2015) 37. EU Council Framework Decision: accreditation of forensic service providers carrying out laboratory activities, Council Framework Decision 2009/905/JHA of 30 Nov 2009 (2009) 38. Mari, L.: Toward a harmonized treatment of nominal properties in metrology. Metrologia 54(5), 784–795 (2017) 39. BIPM JCGM 100:2008: Evaluation of measurement data—guide to the expression of uncertainty in measurement (GUM) 1st edn (2008). http://www.bipm.org/utils/common/documents/ jcgm/JCGM_100_2008_E.pdf 40. BIPM JCGM 101:2008: Evaluation of measurement data—supplement 1 to the “Guide to the expression of uncertainty in measurement”—propagation of distributions using a Monte Carlo method, 1st edn (2008). http://www.bipm.org/utils/common/documents/jcgm/JCGM_ 101_2008_E.pdf 41. 
Mari, L., Narduzzi, C., Nordin, G., Trapmann, S.: Foundations of uncertainty in evaluation of nominal properties. Measurement 152, 1–7 (2020) 42. Nordin, G., Dybkaer, R., Forsum, U., Fuentes-Arderiu, X., Pontet, F.: Vocabulary on nominal property, examination, and related concepts for clinical laboratory sciences (IFCC-IUPAC Recommendations 2017). Pure Appl. Chem. 90(5), 1–23 (2018) 43. Corte di Assise di Perugia: Sentenza del 4–5 dicembre 2009 7, 1–427 (2009) 44. EN ISO 9001: Quality management systems—requirements (2015)
202
10 Forensic DNA Profiling
45. Corte di Assise di Appello di Perugia: Sentenza del 3 ottobre 2011 n. 4/2011 C.A.A., pp. 1–143 (2011) 46. Budowle, B., Eisenberg, A.J., van Daal, A.: Validity of low copy number typing and applications to forensic science. Croat. Med. J. 50, 207–217 (2009) 47. Corte di Cassazione, Prima sezione penale: Sentenza del 25 marzo 2013 422/2013, 1–74 (2013) 48. Corte di Assise di Appello di Firenze: Sentenza del 30 gennaio 2014 11/13, 1–337 (2014) 49. Corte di Cassazione, Quinta sezione penale: Sentenza del 27 marzo 2015 36080/2015, 1–52 (2015) 50. Gill, P.: Analysis and implications of the miscarriages of justice of Amanda Knox and Raffaele Sollecito. Forensic Sci. Int. Genet. 23, 9–18 (2016)
Index
A
Accreditation body, 112
Accreditation system, 111
Accredited laboratory, 112, 147, 182, 191
Adjustment, 104, 105
Adjustment of a measuring system, 104
Allele, 151–154, 156, 171, 174–182
Amplicon, 173
A priori knowledge, 159–161
B
Bayes’ theorem, 158
Big Data, 18
BIPM, 108, 109
Blood Alcohol Concentration (BAC), 51, 53, 54, 127–149, 172
Body of precedents, 7
Breath Alcohol Concentration (BrAC), 51, 127–137, 140–142, 144, 145, 148, 149, 172
Breathalyzer, 133–138, 140–142, 144, 145, 147, 149
C
Calibration, 51, 61, 82, 93, 96, 98–100, 104, 105, 107, 110, 111, 134
Calibration hierarchy, 109–111
Capillary electrophoresis, 174–176
Central limit theorem, 87–89, 92
Civil law, 5, 8–12, 14, 27, 30–32, 58, 137, 149
Cognitive bias, 49, 159
Cold hit, 167–170
Combined DNA Index System (CODIS), 156, 166, 167
Combined standard uncertainty, 84, 86–89, 91, 92, 98, 101, 118
Common law, 5–12, 14, 27, 31–33, 58, 149
Conditional probability, 157, 158, 160
Conférence Générale des Poids et Mesures (CGPM), 108
Correlation, 86, 87, 90
Correlation coefficient, 74, 86, 88, 89, 98
Covariance, 74, 84, 86
Coverage factor, 77, 78, 81, 91, 98
Coverage interval, 54, 75, 77, 81, 88–92, 95–97, 148
Coverage probability, 54, 69, 75–77, 81, 87–91, 95–98, 116, 118, 148
Cumulative probability distribution function, 119–121
D
Daubert, 29, 33, 159
Definitional uncertainty, 59–61, 127, 129–132, 135, 137, 139, 143, 146, 165
Descriptive processes, 57
Direct correlation, 74
Discernible uniqueness, 153
Distinguishing, 7
DNA profile, 152–154, 156, 157, 166–171, 173, 174, 180–184, 191, 199
DNA profiling, 35, 54, 67, 151, 153, 155, 157, 165, 167–171, 173, 181, 182, 184
DNA sample, 155, 168–170
DNA typing, 151, 172, 174, 182
Doubt, 3, 11, 25, 39, 55, 58, 59, 61, 62, 67, 68, 115, 118, 121, 123, 129, 136, 138, 140, 142, 148, 149, 154, 170, 172, 175, 189–193, 197–199
Driving Under Intoxication (DUI), 51, 127, 135, 136, 141, 143, 145, 148, 149
E
Electropherogram, 175–181, 190
Electrophoresis, 156, 174, 179
Environment, 51, 52, 60
Environmental properties, 51, 59
Environment identification, 51
Examinand, 183
Examination, 183, 184
Examination reliability, 184, 194, 199
Examination uncertainty, 183, 184
Exchange principle, 29
Expanded uncertainty, 77, 78, 81, 88, 93, 95–98, 103, 116, 118, 146, 148
Expectation, 72, 79, 80, 84–86
Expected value, 79, 85
Experimental processes, 53, 57
Experiments, 17, 20, 21
Expert witness, 31, 32
F
Fair trial, 9, 11, 14, 27
Forensic metrology, 25, 44, 53, 55, 112, 127
Foundational validity, 35–38, 60, 128, 130, 131, 153, 157, 160, 165, 167–170, 182
Frye, 32
G
Gain, 172
Gaussian distribution, 70
Gel electrophoresis, 175, 176
GUM, 29, 38, 58, 68, 76–78, 81, 83, 84, 91–93, 96, 116, 132, 183
GUM supplement 1, 89–92
H
Henry’s law, 128
Henry’s law constant, 128, 131
Heterozygote, 152, 178–180
Homozygote, 152, 178–180
I
Identification, 50, 154
Influence quantity, 51, 52, 63, 64, 80, 82, 83, 100
Instrumental uncertainty, 61, 62, 127, 134, 135, 138, 139, 143, 146, 165
International Organization of Legal Metrology (OIML), 134, 137, 145
International System of Units (SI), 108, 109
Inverse correlation, 74
L
Law of propagation of uncertainty, 86–89, 91, 92, 102, 111
Legal certainty, 9
Legal metrology, 93, 134, 137
Level of confidence, 69, 76, 77
Likelihood ratio, 161–163, 168, 170
Loci, 151, 153, 156, 157, 165, 166, 173–176, 181, 182, 190
Locus, 151–153, 156, 178
Low Copy Number (LCN), 192, 194, 196, 197, 199
M
Mathematical expectation, 72
Maximum likelihood, 157
Maximum Permissible Error (MPE), 134, 135, 141, 142
Mean value, 72
Measurand, 24, 43, 46, 47, 49–53, 57, 59, 60, 64, 65, 67, 76, 81, 84, 109, 118, 140, 148, 158, 159, 161, 165
Measurand identification, 50
Measurement, 44, 45, 154
Measurement error, 58
Measurement process, 159
Measurement result, 46, 54, 57, 67, 68, 115, 136, 140, 144, 148, 165, 183
Measurement system identification, 51
Measurement systems, 51, 52, 59, 60
Measurement uncertainty, 53, 58, 68, 69, 107, 116, 137–139, 143–146, 148, 149, 154, 172, 182–184, 192, 194, 199
Measurement unit, 107
Measurend, 158
Measuring system, 61
Meter Convention, 108
Metrological confirmation, 103
Metrological traceability, 93, 110–112, 145, 147
Metrological verification, 103
Metrology, 24, 25, 38, 39, 43, 44, 49, 57, 115, 158, 199
Model, 18–25, 27, 33, 43, 45–49, 52, 53, 55, 59, 64, 100, 115, 165
Monte Carlo method, 89–92
Mutual Recognition Arrangement (MRA), 109, 111, 112
N
NAS report, 34, 35, 153
National Metrology Institute (NMI), 109, 111
Nominal property, 154
Normal distribution, 70, 87, 91
O
Odds, 157, 161–163, 166, 168
Ordeals, 6
Overruling, 7
P
PCAST report, 35–38, 60–62, 65, 66, 123, 128, 131, 153, 154, 165, 166, 168–170
Polymerase Chain Reaction (PCR), 156, 171–174, 178, 179, 190, 192, 196
Polymorphism, 151
Precision, 65, 66
Prediction, 21
Presumption of innocence, 9–11
Probability, 39, 69, 158
Probability density function, 70, 76, 78, 87–92, 96, 98, 118, 158
Probability distribution, 39, 69, 158
Probability mass function, 70
Procedural law, 12
Q
Quantity, 44–46
Quantity value, 44–46, 54, 115, 116, 154
R
Random effects, 63, 76
Random errors, 63–65
Random match, 156
Random match probability, 156, 157, 160, 162, 163, 165–169, 196
Random variable, 69, 75, 76, 84, 158
Reference, 45, 46
Reference system, 61
Relative Fluorescent Unit (RFU), 175, 190
Repeatability, 18, 37, 38, 65, 66, 80
Repeatability conditions, 65
Reproducibility, 18, 37, 38, 65, 66, 95
Reproducibility conditions, 66
Resolution, 47, 190
Risk of wrong decision, 54, 55, 118, 144, 148
S
Sample mean, 73, 79
Sample variance, 73, 79, 90
Science, 17, 18
Sensitivity, 47
Short Tandem Repeats (STR), 155, 156, 165, 171, 176
Slope detector, 133
Standard, 61, 96, 99, 107–109, 111, 145
Standard deviation, 70, 76, 79, 80, 86, 87, 131, 132
Standard uncertainty, 76–78, 80, 81, 83–86, 88–91, 93, 98–103, 120, 131, 132
Stare decisis, 6, 7
Statistics, 69
Stochastic variable, 69
Substantive law, 12
Symbol, 47
Systematic effect, 63, 64, 76
Systematic error, 63, 64, 66, 67
T
Technical expert, 30, 31
Terminology, 44
Total ignorance, 71
Type A, 78–81, 83
Type B, 78, 81, 83, 88, 91
U
Uncertainty, 19, 25, 38, 39, 43, 46, 48, 58–60, 68, 69, 81, 82, 92, 93, 100, 110, 111, 116, 123, 132, 134, 137–141, 143–145, 149, 153, 166, 170, 180, 181, 183, 190
Uniform distribution, 70, 71, 122
V
Validity as applied, 35, 36, 38, 61, 62, 128, 154, 160, 165, 169, 170, 182, 184
Variable Number of Tandem Repeats, 155
Variance, 70, 73, 74, 79, 84, 86
Verification, 105
Vocabulary of Metrology (VIM), 44–46, 48, 58, 93, 98, 101, 104, 107, 110, 115, 154, 155, 165, 183, 184
Z
Zero adjustment, 104