Mark K. Hinders

Intelligent Feature Selection for Machine Learning Using the Dynamic Wavelet Fingerprint
Mark K. Hinders
Department of Applied Science
William & Mary
Williamsburg, VA, USA
ISBN 978-3-030-49394-3    ISBN 978-3-030-49395-0 (eBook)
https://doi.org/10.1007/978-3-030-49395-0
© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2020

This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, express or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Switzerland AG. The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
Preface

Squirrels are the bane of human existence.¹ Everyone accepts this. As with deer, many more squirrels exist today than when Columbus landed in the New World in 1492. Misguided, evil squirrel lovers provide these demons with everything they need to survive and thrive—food (bird feeders), water (birdbaths), and shelter (attics).

¹ Wisdom from "7 Things You Must Know About Stupid Squirrels" by Steve Bender, https://www.notestream.com/streams/56e0ec0c6e55e/. Also: "Tapped: A Treasonous Musical Comedy," a live taping at Theater Wit in Chicago, IL, from Saturday, July 25, 2016. See https://www.youtube.com/watch?v=JNBd2csSkqg starting at 1:59.
Wavelet Fingerprints

Machine learning is the modern terminology for what we've always been trying to do, namely, make sense of very complex signals recorded by our instrumentation. Nondestructive Evaluation (NDE) is an interdisciplinary field of study concerned with the development of analysis techniques and measurement technologies for the quantitative characterization of materials, tissues, and structures by non-invasive means. Ultrasonic, radiographic, thermographic, electromagnetic, and optical methods are employed to probe interior microstructure and characterize subsurface features. Applications are in non-invasive medical diagnosis, intelligent robotics, security screening, and online manufacturing process control, as well as the traditional NDE areas of flaw detection, structural health monitoring, and materials characterization. The focus of our work is to implement new and better measurements with both novel instrumentation and machine learning that automates the interpretation of the various (and multiple) imaging data streams.

Twenty years ago, we were facing applications where we needed to automatically interpret very complicated ultrasound waveforms in near real time. We were also facing applications where we wanted to incorporate far more information than could be presented to a human expert in the form of images. Time–frequency representations, such as the spectrogram, looked quite promising, but there's no reason to expect that a boxcar FFT would be optimal. We had been exploring non-Fourier methods for representing signals and imagery, e.g., Gaussian-weighted Hermite polynomials, and became interested in wavelets. The basic idea is that we start with a time-domain signal and then perform a time-scale or time–frequency transformation on it to get a two-dimensional representation, i.e., an image. Most researchers then immediately extracted a parameter from that image to collapse things back down to a 1D data stream. In tissue characterization with diagnostic ultrasound, one approach was to take a boxcar FFT of the A-scan line, which returned something about the spectrum as a function of anatomical depth. The slope, mid-point value, and/or intercept gave parameters that
seemed to be useful to differentiate healthy from malignant tissues. B-scans are made up of a fan of A-scan lines, so one could use those spectral parameters to make parameter images. When Jidong Hou tried that approach using wavelets instead of windowed FFTs, he rendered them as black and white contour plots instead of false-color images. Since they often looked like fingerprints, we named them wavelet fingerprints.
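To make the fingerprinting step concrete, here is a minimal sketch of the idea, assuming the open-source PyWavelets package; the wavelet, the scale range, and the slicing scheme are illustrative choices rather than the particular settings used in the chapters that follow.

```python
# A minimal sketch of the wavelet fingerprint idea: transform a 1D signal
# into the time-scale plane, then slice the normalized coefficient magnitudes
# into alternating bands so the result renders as black-and-white,
# fingerprint-like ridges. All parameter choices here are illustrative.
import numpy as np
import pywt

def wavelet_fingerprint(signal, scales=np.arange(1, 64), wavelet="mexh",
                        n_slices=10):
    coefs, _ = pywt.cwt(signal, scales, wavelet)  # 2D time-scale image
    normed = np.abs(coefs) / np.abs(coefs).max()  # magnitudes into [0, 1]
    bands = np.floor(normed * n_slices).astype(int)
    return (bands % 2).astype(np.uint8)           # keep alternating "ridges"

# Example: a chirped echo in noise, a bit like an ultrasonic A-scan.
t = np.linspace(0, 1, 1024)
echo = np.sin(2 * np.pi * (50 + 100 * t) * t) * np.exp(-((t - 0.5) ** 2) / 0.01)
fingerprint = wavelet_fingerprint(echo + 0.05 * np.random.randn(t.size))
print(fingerprint.shape)  # (number of scales, number of samples)
```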
Since 2002, we have applied this wavelet fingerprint approach to a wide variety of different applications and have found that it's naturally suited to machine learning. There are many different wavelet basis functions that can be used. There are adjustable parameters that allow for differing amounts of de-noising and richness of the resulting fingerprints. There are all manner of features that can be identified in the fingerprints, and innumerable ways that identified features can be quantified. In this book, we describe several applications of this approach. We also discuss stupid squirrels and flying saucers and tweetstorms. You may find some investment advice.

Williamsburg, Virginia
April 2020

Mark K. Hinders
Acknowledgements
MKH would especially like to thank his research mentors, the late Profs. Asim Yildiz and Guido Sandri, as well as their research mentors, Julian Schwinger and J. Robert Oppenheimer. Asim Yildiz (DEng, Yale) was already a Professor of Engineering at the University of New Hampshire when Prof. Schwinger at Harvard told him that he was "doing good physics" already, so he should "get a union card." Schwinger meant that Yildiz should get a Ph.D. in theoretical physics with him at Harvard, which Yildiz did while still keeping his faculty position at UNH, mentoring his own engineering graduate students, running his own research program, etc. He also taught Schwinger to play tennis, having been a member of the Turkish national team as well as all the best tennis clubs in the Boston area. When Prof. Yildiz died at age 58, his genial and irrepressibly jolly BU colleague took on his orphaned doctoral students, including yours truly, even though the students' research areas were all quite distant from his own. Prof. Sandri had done postdoctoral research with Oppenheimer at the Institute for Advanced Study in Princeton and then was a senior scientist at Aeronautical Research Associates of Princeton for many years before spending a year in Milan at the Istituto di Matematica del Politecnico and then joining BU. In retirement, he helped found Wavelet Technologies, Inc. to exploit mathematical applications of novel wavelets in digital signal processing, image processing, data compression, and problems in convolution algebra.

Pretty much all of the work described in this book benefited from the talented efforts of Jonathan Stevens, who can build anything a graduate student or a startup might ever need, whether that's a mobile robot, an industrial scanner, an investigational device for medical ultrasound, or a system to stabilize and digitize decaying magnetic tape recordings.

The technical work described in this volume is drawn from the thesis and dissertation research of my students in the Applied Science Department at The College of William and Mary in Virginia, which was chartered in 1693. I joined the faculty of the College during its Tercentenary and co-founded the new Department of Applied Science at America's oldest university. Our campus abuts restored eighteenth-century Colonial Williamsburg, where W&M alum Thomas Jefferson and his compatriots founded a new nation. It's a beautiful setting to be pushing forward the boundaries of machine learning in order to solve problems in the real world.
Contents
1 Background and History
   1.1 Has Science Reached the End of Its Tether?
   1.2 Did the Computers Get Invited to the Company Picnic?
       1.2.1 But Who Invented Machine Learning, Anyway?
   1.3 You Call that Non-invasive?
   1.4 Why Is that Stupid Squirrel Named Steve?
   1.5 That's Promising, but What Else Could We Do?
       1.5.1 Short-Time Fourier Transform
       1.5.2 Other Methods of Time–Frequency Analysis
       1.5.3 Wavelets
   1.6 The Dynamic Wavelet Fingerprint
       1.6.1 Feature Creation
       1.6.2 Feature Extraction
       1.6.3 Edge Computing
   1.7 Will the Real Will West Please Step Forward?
       1.7.1 Where Are We Headed?
   References

2 Intelligent Structural Health Monitoring with Ultrasonic Lamb Waves
   2.1 Introduction to Lamb Waves
   2.2 Background
   2.3 Simulation Methods for SHM
   2.4 Signal Processing for Lamb Wave SHM
   2.5 Wavelet Transforms
       2.5.1 Wavelet Fingerprinting
   2.6 Machine Learning with Wavelet Fingerprints
   2.7 Applications in Structural Health Monitoring
       2.7.1 Dent and Surface Crack Detection in Aircraft Skins
       2.7.2 Corrosion Detection in Marine Structures
       2.7.3 Aluminum Sensitization in Marine Plate-Like Structures
       2.7.4 Guided Wave Results—Crack
   2.8 Summary
   References

3 Automatic Detection of Flaws in Recorded Music
   3.1 Introduction
   3.2 Digital Music Editing Tools
   3.3 Method and Analysis of a Non-localized Extra-Musical Event (Coughing)
   3.4 Errors Associated with Digital Audio Processing
   3.5 Automatic Detection of Flaws in Cylinder Recordings Using Wavelet Fingerprints
   3.6 Automatic Detection of Flaws in Digital Recordings
   3.7 Discussion
   References

4 Pocket Depth Determination with an Ultrasonographic Periodontal Probe
   4.1 Introduction
   4.2 Related Work
   4.3 Data Collection
   4.4 Feature Extraction
   4.5 Feature Selection
   4.6 Classification: A Binary Classification Algorithm
       4.6.1 Binary Classification Algorithm Examples
       4.6.2 Dimensionality Reduction
       4.6.3 Classifier Combination
   4.7 Results and Discussion
   4.8 Bland–Altman Statistical Analysis
   4.9 Conclusion
   Appendix
   References

5 Spectral Intermezzo: Spirit Security Systems
   References

6 Classification of Lamb Wave Tomographic Rays in Pipes to Distinguish Through Holes from Gouges
   6.1 Introduction
   6.2 Theory
   6.3 Method
       6.3.1 Apparatus
       6.3.2 Ray Path Selection
   6.4 Classification
       6.4.1 Feature Extraction
       6.4.2 DWFP
       6.4.3 Feature Selection
       6.4.4 Summary of Classification Variables
       6.4.5 Sampling
   6.5 Decision
   6.6 Results and Discussion
       6.6.1 Accuracy
       6.6.2 Flaw Detection Algorithm
   6.7 Conclusion
   References

7 Classification of RFID Tags with Wavelet Fingerprinting
   7.1 Introduction
   7.2 Classification Overview
   7.3 Materials and Methods
   7.4 EPC Extraction
   7.5 Feature Generation
       7.5.1 Dynamic Wavelet Fingerprint
       7.5.2 Wavelet Packet Decomposition
       7.5.3 Statistical Features
       7.5.4 Mellin Features
   7.6 Classifier Design
   7.7 Classifier Evaluation
   7.8 Results
       7.8.1 Frequency Comparison
       7.8.2 Orientation Comparison
       7.8.3 Different Day Comparison
       7.8.4 Damage Comparison
   7.9 Discussion
   7.10 Conclusion
   References

8 Pattern Classification for Interpreting Sensor Data from a Walking-Speed Robot
   8.1 Overview
   8.2 rMary
       8.2.1 Sensor Modalities for Mobile Robots
   8.3 Investigation of Sensor Modalities Using rMary
       8.3.1 Thermal Infrared (IR)
       8.3.2 Kinect
       8.3.3 Audio
       8.3.4 Radar
   8.4 Pattern Classification
       8.4.1 Compiling Data
       8.4.2 Aligning Reflected Signals
       8.4.3 Feature Creation with DWFP
       8.4.4 Intelligent Feature Selection
       8.4.5 Statistical Pattern Classification
   8.5 Results
       8.5.1 Proof-of-Concept: Acoustic Classification of Stationary Vehicles
       8.5.2 Acoustic Classification of Oncoming Vehicles
   8.6 Conclusions
   References

9 Cranks and Charlatans and Deepfakes
   9.1 Digital Cranks and Charlatans
   9.2 Once You Eliminate the Possible, Whatever Remains, No Matter How Probable, Is Fake News
   9.3 Foo Fighters Was Founded by Nirvana Drummer Dave Grohl After the Death of Grunge
   9.4 Digital Imaging Is Why Our Money Gets Redesigned so Often
   9.5 Social Media Had Sped up the Flow of Information
   9.6 Discovering Latent Topics in a Corpus of Tweets
       9.6.1 Document Embedding
       9.6.2 Topic Models
       9.6.3 Uncovering Topics in Tweets
       9.6.4 Analyzing a Tweetstorm
   9.7 DWFP for Account Analysis
   9.8 In-Game Sports Betting
   9.9 Virtual Financial Advisor Is Now Doable
   References
Chapter 1
Background and History

Mark K. Hinders
Abstract Machine learning is the modern lingo for what we've been trying to do for decades, namely, to make sense of the complex signals in radar and sonar and lidar and ultrasound and so forth. Deep learning is fashionable right now, and those sorts of black-box approaches are effective if there is a sufficient volume and quality of training data. However, when we have appropriate physical and mathematical models of the underlying interaction of the radar, sonar, lidar, ultrasound, etc. with the materials, tissues, and/or structures of interest, it seems odd not to harness that hard-won knowledge. We explain the key issue of feature vector selection in terms of autonomously distinguishing rats from squirrels. Time–frequency analysis is introduced as a way to identify dynamic features of varmint behavior, and the dynamic wavelet fingerprint is explained as a tool to identify features from signals that may be useful for machine learning.

Keywords Machine learning · Pattern classification · Wavelet · Fingerprint · Spectrogram
1.1 Has Science Reached the End of Its Tether?

Radar blips are reflections of radio waves. Lidar does the same thing with lasers. Sonar uses pings of sound to locate targets. Autonomous vehicles use all of these, plus cameras, in order to identify driving lanes, obstructions, other vehicles, bicycles and pedestrians, deer and dachshunds, etc. Whales and dolphins echolocate with sonar. Bats echolocate with chirps of sound that are too high in frequency for us to hear, which is called ultrasound. Both medical imaging and industrial nondestructive testing have used kHz and MHz frequency ultrasound for decades. Infrasound is sound at frequencies too low for us to hear; it hasn't yet found a practical application. It all started with an unthinkable tragedy.
Not long after the unsinkable Titanic struck an iceberg and sank on its maiden voyage in 1912 [1], Sir Hiram Maxim self-published a short book and submitted a letter to Scientific American [2] entitled "A New System for Preventing Collisions at Sea," in which he said:

The wreck of the Titanic was a severe and painful shock to us all; many of us lost friends and acquaintances by this dreadful catastrophe. I asked myself: "Has Science reached the end of its tether? Is there no possible means of avoiding such a deplorable loss of life and property? Thousands of ships have been lost by running ashore in a fog, hundreds by collisions with other ships or with icebergs, nearly all resulting in great loss of life and property." At the end of four hours it occurred to me that ships could be provided with what might be appropriately called a sixth sense, that would detect large objects in their immediate vicinity without the aid of a searchlight. Much has been said, first and last, by the unscientific of the advantages of a searchlight. Collisions as a rule take place in a fog, and a searchlight is worse than useless even in a light haze, because it illuminates the haze, making all objects beyond absolutely invisible. Some have even suggested that a steam whistle or siren might be employed that would periodically give off an extremely powerful sound, that is, a veritable blast, an ear-piercing shriek, and then one is supposed to listen for an echo, it being assumed that if any object should be near, a small portion of the sound would be reflected back to the ship, but this plan when tried proved an absolute failure. The very powerful blast given off by the instrument is extremely painful to the ears and renders them incapable of hearing the very feeble echo which is supposed to occur only a few seconds later. Moreover, sirens or steam whistles of great power are extremely objectionable on board passenger ships; they annoy the passengers and render sleep impossible. It is, therefore, only too evident that nothing in the way of a light or noise producing apparatus could be of any use whatsoever.
Maxim knew that bats used some form of inaudible sound—outside the range of human hearing—to echolocate and feed, but he thought it was infrasound rather than ultrasound. He then proceeded to describe an extremely low-frequency directional steam whistle or siren that could be used to (echo)locate icebergs during foggy nights when collisions were most likely to occur. Whether his patented apparatus would have been effective at preventing collisions at sea is a question that's a little like whether Da Vinci's contraptions would have flown. He got the general idea right, and can be credited with stimulating the imaginations of those who subsequently worked out all the engineering details. His figure reproduced below is quite remarkable. The key idea is that the time delay of the echoes determines distance, because the speed of sound is known; more importantly, the shape of the echoes gives information about the object which is returning them. Analysis of those echo waveforms can, in principle, tell the difference between a ship and an iceberg, and also differentiate large and small icebergs. He even illustrates how clutter affects the echoes differently from backscattering targets. Science has not reached the end of its tether, even after a century of further development. This is exactly how radar and sonar and ultrasound work (Fig. 1.1).
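(A worked example, using standard textbook sound speeds rather than figures from Maxim's pamphlet: the pulse covers the range twice, so the distance is d = vt/2. In seawater, where sound travels at roughly 1,500 m/s, an echo arriving 2 s after the ping places the reflector about 1,500 m away; in air, at roughly 343 m/s, the same delay corresponds to about 343 m.)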
Fig. 1.1 The infrasonic echo waves would be recorded by a stretched membrane that the infrasound waves would vibrate, and those membrane vibrations could jingle attached bells or wiggle pens tracing lines on paper, as Maxim illustrated [3]. Maxim's concept was discussed in Nature, which remains a leading scientific journal today [4]
Maxim's suggested apparatus embodies a modified form of "siren," through which high-pressure steam can be made to flow in order to produce sound waves having about 14 to 15 vibrations per second, and consequently not coming within the frequency range of human hearing. These waves, it is correctly asserted, would be capable of traveling great distances, and if they struck against a body ahead of the ship they would be reflected back toward their source, "echo waves" being formed [3]. This self-published pamphlet was discussed in [4].

Well more than a century before Maxim, Lazzaro Spallanzani performed extensive experiments on bats and concluded that bats possess some sort of sixth sense, in that they use their ears for detecting objects, if not actually seeing them [5]. His blinded bats got around just fine in either light or dark, while bats he deafened flew badly and hurtled themselves into objects in both dark and light. The problem was that no technology existed until the early twentieth century that could measure the ultrasonic screeches bats were using to echolocate. Prof. G. W. Pierce in the Physics Department at Harvard happened to build such a device, and in 1938 an undergraduate zoology student, who was banding bats to study their migration patterns, asked if they could use that (ultra)sonic detector apparatus to listen to his bats. Young Donald Griffin brought a cage full of bats to Prof. Pierce's acoustics laboratory, wherein humans first heard their down-converted "melody of raucous noises from the loudspeaker" of the device [6, 7]. Donald Griffin and a fellow student found that bats could use reflected sounds to detect objects (Fig. 1.2).

What Griffin discovered about bats just before WWII, other scientists discovered about dolphins and related mammals just after WWII. Curiously, unequivocal demonstration of echolocation in dolphins wasn't accomplished until 1960. Nevertheless, the first underwater sonar apparatus was constructed in order to detect submarines in 1915, using the concepts sketched out by Maxim. By WWII sonar was an essential part of antisubmarine warfare, spurred on by Nazi U-boats sinking merchant ships transporting men and materiel to European allies [10].
Fig. 1.2 Donald R. Griffin: 1915–2003 [8, 9]. Dr. Griffin coined the term echolocation to describe the phenomenon he continued to study for 65 years
Practical radar technology was ready just in time to turn the tide during the Battle of Britain, thanks to the cavity magnetron [11] and amateur scientist and Wall Street tycoon Alfred Lee Loomis, who personally funded significant scientific research at his private estate before leading radar research efforts during World War II [12, 13]. The atomic bomb may have ended the war, but radar and sonar won it.¹ During the entirety of the Cold War, uncounted billions have been spent continuing to refine radar and sonar technology, countermeasures, counter-countermeasures, etc., with much of that mathematically esoteric scientific work quite highly classified. The result is virtually undetectable submarines that patrol the world's oceans standing ready to assure mutual destruction as a deterrent to sneak attack. More visibly, but highly stealthy, radar-evading fighters and bombers carrying satellite-guided precision weapons can destroy any fixed target anywhere on the planet with impunity while minimizing collateral damage. It's both comforting and horrifying at the same time.

¹ Time magazine was all set to do a cover story about the importance of radar technology in the impending Allied victory, but the story got bumped off the August 20, 1945 cover by the A-bombs dropped on Japan and the end of WWII: "The U.S. has spent half again as much (nearly $3 billion) on radar as on atomic bombs. As a military threat, either in combination with atomic explosives or as a countermeasure, radar is probably as important as atomic power itself. And while the peacetime potentialities of atomic power are still only a hope, radar already is a vast going concern—a $2 billion-a-year industry, six times as big as the whole prewar radio business."

What humans have been trying to figure out since Maxim's 1912 pamphlet is how best to interpret the radar blips and sonar pings and lidar reflections and ultrasound images [14]. The fundamental issue is that the shape, size, orientation, and composition of the object determine the character of the scattered signal, so an enormous amount of mental effort has gone into mathematical modeling and computer simulation to try to understand enough about that exceedingly complex physics in order to detect navigation hazards, enemy aircraft and submarines, tumors, structural flaws, etc. Much of that work has been in what mathematical physics calls forward scattering, where

I know what I transmit, and I know the size & shape & materials & location & orientation of the scattering object. I then want to predict the scattered field so I'll know what to look for in my data.
The real problem is mathematically much more difficult, called inverse scattering, wherein I know what I transmit, and I measure some of the scattered field. I then want to estimate the size & shape & materials & location & orientation of the scatterer.
Many scientific generations have been spent trying to solve inverse scattering problems in radar and sonar and ultrasound. In some special cases or with perhaps a few too many simplifying assumptions, there has been a fair amount of success [15].
1.2 Did the Computers Get Invited to the Company Picnic?

Once upon a time "computer" was a job description [16, 17]. They were excellent jobs for mathematically talented women at a time when it was common for young women to be prevented from taking math classes in school. My mother describes being jealous of her high school friend who was doing math homework at lunch, because she wasn't allowed to take math. As near as I can tell it wasn't her widower father who said this; it was instead her shop-keeper/school-teacher aunt who was raising her. This was the 1950s, when gender roles had regressed to traditional forms after the war. Recently, a mathematically talented undergraduate research student in my lab was amused that her MATLAB code instantly reproduced plots from a 1960 paper [18] that had required "the use of two computers for two months," until I suggested that she check the acknowledgements, where the author thanked those two computers by name. That undergraduate went on to earn a degree with honors—double majoring in physics and anthropology, with a minor in math and research in medical ultrasound—and then a PhD in medical imaging.
Fig. 1.3 Computers hard at work in the NACA Dryden High Speed Flight Station “Computer Room” in 1949 [19]. Seen here, left side, front to back, Mary (Tut) Hedgepeth, John Mayer, and Emily Stephens. Right side, front to back, Lilly Ann Bajus, Roxanah Yancey, Gertrude (Trudy) Valentine (behind Roxanah), and Ilene Alexander
My mother became a teacher because her freshman advisor scoffed when she said she wanted to be a freelance writer and a church organist: you're a woman, you'll be either a teacher or a nurse. She started teaching full time with an associate degree and earned BA, MA, and EdD degrees while working full time and raising children. I remember her struggles with inferential statistics and punch cards for analyzing her dissertation data on an electronic computer (Fig. 1.3).

Hedy Lamarr was the face of Snow White and Catwoman. In prewar Vienna, she was the bored arm-candy wife of an armaments manufacturer. In Hollywood, she worked six days a week under contract to MGM but kept equipment in her trailer for inventing between takes. German U-boats were devastating Atlantic shipping, and Hedy thought a radio-controlled torpedo might help tip the balance. She came up with a brilliant, profoundly original, jamming-proof way to guide a torpedo to the target: frequency hopping. The Philco Magic Box remote control [20] probably inspired the whole thing. The American film composer George Antheil said, "All she wants to do is stay home and invent things." He knew how to synchronize player pianos and thought that pairs of player-piano rolls could synchronize the communications of a ship and its torpedo, on 88 different frequencies of course. They donated their patented invention to the Navy, who said, "What do you want to do, put a player piano in a torpedo? Get out of here." The patent was labeled top secret and the idea hidden away until after the war. The Navy also seized her patent because she was an alien. It just wanted Hedy Lamarr to entertain the troops and sell war bonds. She sold $25 million worth of them. That could have paid for quite a few frequency-hopping torpedoes (Fig. 1.4).
Fig. 1.4 Hedy Lamarr, bombshell and inventor [21]. You may be surprised to know that the most beautiful woman in the world made your smartphone possible
Computers have been getting smaller and more portable since their very beginning. First, they were well-paid humans in big rooms; then they were expensive machines in big rooms; then they got small enough and cheap enough to sit on our desktops; then they sat on our laps and we got used to toting them pretty much everywhere; and then they were in our pockets, and a teenager's semi-disposable supercomputer comes free with an expensive data plan. Now high-end computer processing is everywhere. We might not even realize we're wearing a highly capable computer until it prompts us to get up and take some more steps. We talk to computers sitting on an end table and they understand, just like in Star Trek. In real life, the artificially intelligent machines learn about us and make helpful suggestions on what to watch or buy next and where to turn off just up ahead.

With the advent of machine learning we can take a new approach to making sense of radar and sonar and ultrasound. We simply have to agree to stipulate that the physics of that scattering is too complex to formally solve the sorts of inverse problems we are faced with, so we accept the more modest challenge of using forward scattering models to help define features to use in pattern classification. Intelligent feature selection is the current incarnation of the challenge that was laid down by Maxim. In this paradigm, we use advanced signal processing methods to proliferate candidate "fingerprints" of the scattering object in the time traces we record. Machine learning then formally sorts out which combinations of those signal features are most useful for object classification.
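The sketch below is a cartoon of that paradigm: it proliferates a handful of generic candidate features from each recorded time trace and then ranks them by how informative they are about the class labels. The particular features and the mutual-information ranking are stand-ins chosen for illustration, not the dynamic wavelet fingerprint machinery developed in this book.

```python
# Proliferate candidate features from each time trace, then let the machine
# learning rank which ones carry class information. Signals and labels here
# are random placeholders; real data would come from the instrumentation.
import numpy as np
from sklearn.feature_selection import mutual_info_classif

def candidate_features(trace):
    """A grab bag of simple candidate features from one time-domain signal."""
    spectrum = np.abs(np.fft.rfft(trace))
    return np.array([
        trace.max() - trace.min(),        # peak-to-peak amplitude
        np.argmax(np.abs(trace)),         # arrival time of the strongest echo
        (trace ** 2).sum(),               # total energy
        spectrum.argmax(),                # dominant frequency bin
        spectrum.mean(),                  # coarse spectral level
        spectrum.std(),                   # coarse spectral spread
    ])

rng = np.random.default_rng(0)
traces = rng.normal(size=(100, 512))      # placeholder recorded echoes
labels = rng.integers(0, 2, size=100)     # placeholder object classes
X = np.vstack([candidate_features(t) for t in traces])

# Rank the candidates; the top-scoring subset becomes the feature vector.
scores = mutual_info_classif(X, labels, random_state=0)
print(np.argsort(scores)[::-1])           # feature indices, best first
```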
1.2.1 But Who Invented Machine Learning, Anyway?

Google [22, 23] tells me that machine learning is a subset of artificial intelligence where computers autonomously learn from data: they don't have to be programmed by humans but can change and improve their algorithms by themselves. Today, machine learning algorithms enable computers to mine unstructured text, autonomously drive cars, monitor customer service calls, and find me that thing I didn't even know I needed to buy or watch. The enormous amount of data now available is the fuel these algorithms need, surpassing the limitations of calculation that existed prior to ubiquitous computing (Fig. 1.5).

Fig. 1.5 A mechanical man has real intelligence if he can fool a redditor into believing he is also human [24]

Machine learning probably has its roots at the dawn of electronic computation, when humans were encoding their own algorithmic processes into machines. Alan Turing [25] described the "Turing Test" in 1950 to determine whether a computer has real intelligence by fooling a human into believing it is also human. Also in the 1950s, Arthur Samuel [26] wrote the first machine learning program to play the game of checkers; as it played more, it noted which moves made up winning strategies and incorporated those moves into its program. Frank Rosenblatt [27] designed the first artificial neural network, the perceptron, which he intended to simulate the thought processes of the human brain. Marvin Minsky [28] convinced humans to start using the term Artificial Intelligence. In the 1980s, Gerald Dejong [29] introduced explanation-based learning, where a computer analyzes training data and creates a general rule it can follow by discarding unimportant data, and Terry Sejnowski [30] invented NetTalk, which learns to pronounce words the same way a baby does. 1980s-style expert systems were based on rules; these were rapidly adopted by the corporate sector, generating new interest in machine learning. Work on machine learning then shifted from a knowledge-driven approach to a data-driven approach in the 1990s. Scientists began creating programs for computers to analyze large amounts of data and learn from the results. Computers became faster at processing data, with computational speeds increasing 1,000-fold over a decade. Neural networks began to be fast enough to take advantage of their ability to continue to improve as more training data is added. In recent years, computers have beaten learned humans at chess (1997) and Jeopardy (2011) and Go (2016). Many businesses are moving toward incorporating machine learning into their processes, products, and services in order to gain an edge over their competition. In 2015, the non-profit organization OpenAI was launched with a billion dollars and the objective of ensuring that artificial intelligence has a positive impact on humanity.

Deep learning [31] is a branch of machine learning that employs algorithms to process data and imitate the thinking process of carbon-based life forms. It uses layers of algorithms to process data, understand speech, visually recognize objects, etc. Information is passed through each layer, with the output of the previous layer providing input for the next. Feature extraction is a key aspect of deep learning, which uses an algorithm to automatically construct meaningful "features" of the data for purposes of training, learning, and understanding. The data scientist had heretofore been solely responsible for feature extraction, especially in cases where an understanding of the underlying physical process aids in identifying relevant features. In applications like radar and sonar and lidar and diagnostic ultrasonography, the hard-won insight of humans and their equation-based models is now working in partnership with machine learning algorithms. It's not just the understanding of the physics the humans bring to the party, it's an understanding of the context.
1.3 You Call that Non-invasive?

It almost goes without saying that continued exponential growth in the price performance of information technology has enabled the gathering of data on an unprecedented scale. This is especially true in the healthcare industry.
More than ever before, data concerning people's lifestyles, data on medical care, diseases, and treatments, and data about the health systems themselves are available. However, there is concern that this data is not being used as effectively as possible. Machine learning has the potential to transform healthcare by deriving new and important insights from the vast amount of data generated during the delivery of care [32]. There is a growing awareness that only a fraction of the enormous amount of data available to make healthcare decisions is being used effectively to improve the quality and efficiency of care and to help people take control of their own health [33]. Deep learning offers considerable promise for medical diagnostics [34] and digital pathology [35–45]. The emergence of convolutional neural networks in computer vision produced a shift from hand-designed feature extractors to automatically generated feature extractors trained with backpropagation [46]. Typically, substantial expertise is needed to implement machine learning, but automated deep learning software is becoming available for use by naive users [47], sometimes with coin-flip results. It's not necessarily that there's anything wrong with deep learning approaches; it's typically that there isn't anywhere near enough training data. If your black-box training dataset is insufficient, your classifier isn't learning, it is merely memorizing your training data. Naive users will often underestimate the amount of training data that's going to be needed by a few (or several) orders of magnitude and then not test their classifier(s) on sufficiently different datasets. We are now just beginning to see AI systems that outperform radiologists on clinically relevant tasks such as breast cancer identification in mammograms [48], but there is a lot of angst [49, 50] (Fig. 1.6).

Many of the things that can now be done with machine learning have been talked about for 20 years or more. An example is a patented system for diagnosis and treatment of prostate cancer [52] that we worked on in the late 1990s. A talented NASA gadgeteer and his business partner conceived of a better way to do ultrasound-guided biopsy of the prostate (US Patent 6,824,516 B2) and then arranged for a few million dollars of Congressional earmarks to make it a reality. In modern language, it was all about machine learning, but we didn't know to call it that back then. Of course, training data has always been needed to do machine learning; in this case, it was to be acquired in partnership with Walter Reed Army Medical Center in Washington, DC.

The early stages of a prostate tumor often go undetected. Generally, the impetus for a visit to the urologist is when the tumor has grown large enough to mimic some of the symptoms of benign prostatic hyperplasia (BPH), which is a bit like being constipated, but for #1 not #2. Two commonly used methods for detecting prostate cancer have been available to clinicians. Digital rectal examination (DRE) has been used for years as a screening test, but its ability to detect prostate cancer is limited. Small tumors often form in portions of the prostate that cannot be reached by a DRE. By the time the tumor is large enough to be felt by the doctor through the rectal wall, it is typically a half inch or larger in diameter. Considering that the entire prostate is normally on the order of one and a half inches in diameter, this cannot be considered early detection.
Clinicians may also have difficulty distinguishing between benign abnormalities and prostate cancer, and the interpretation and results of the examination may vary with the experience of the examiner.

Fig. 1.6 A server room in a modern hospital [51]

The prostate-specific antigen (PSA) is an enzyme measured in the blood that may rise naturally as men age. It also rises in the presence of prostate abnormalities. However, the PSA test cannot distinguish prostate cancer from benign growth of the prostate and other conditions of the prostate, such as prostatitis. If the patient has a blood test that shows an elevated PSA level, then that can be an indicator, but the relationship between PSA level and tumor size is not definitive, nor does it give any indication of tumor location. PSA testing also fails to detect some prostate cancers—about 20% of patients with biopsy-proven prostate cancer have PSA levels within the normal range. Transrectal ultrasound (TRUS) is sometimes used as a corroborating technique; however, the images produced can be ambiguous. In the 1990s, there were transrectal ultrasound scanners in use which attempted to image the prostate through the rectal wall. These systems did not produce very good results [53]. Both PSA and TRUS enhance detection when added to DRE screening, but they are known to have relatively high false positive rates, and they may identify a greater number of medically insignificant tumors. Thus, PSA screening might lead to treatment of unproven benefit, which then could result in impotence and incontinence. It's part of what we currently call the overdiagnosis crisis.
Fig. 1.7 Prostate cancer detection and mapping system that “provides the accuracy, reliability, and precision so additional testing is not necessary. No other ultrasound system can provide the necessary precision.” The controlling computer displays the data in the form of computer images (presented as level of suspicion (LOS) mapping) to the operating physician as well as archiving all records. Follow-on visits to the urologist for periodic screenings will use this data, together with new examination data, to determine changes in the patient’s prostate condition. This is accomplished by a computer software routine. In this way, the physician has at his fingertips all of the information to make an accurate diagnosis and to choose the optimum protocol for treatment
When cancer is suspected because of an elevated PSA level or an anomalous digital rectal exam, a fan pattern of biopsies is frequently taken in an attempt to locate a tumor and determine if a cancerous condition exists. Because of the crudeness of these techniques, the doctor has only limited control over the placement of the biopsy needle, particularly in the accuracy of the depth of penetration. Unless the tumor is quite large, the chance of hitting it with the biopsy needle is not good. The number of false negatives is extremely high.

In the late 1990s, it was estimated that 80% of all men would be affected by prostate problems with advancing age [54, 55]. Prostate cancer was then one of the top three cancer killers of men, and diagnosis only occurred in the later stages, when successful treatment probabilities are significantly reduced and negative side effects from treatment are high. The need was critical for accurate and consistently reliable early detection and analysis of prostate cancer.
Fig. 1.8 The ultrasound scan of the prostate from within the urethra is done in lockstep with the complementary, overlapping ultrasound scan from the TRSB probe which has been placed by the doctor within the rectum of the patient. The two ultrasound scanning systems face each other and scan the same volume of prostate tissue from both sides. This arrangement offers a number of distinct advantages over current systems
Ultrasound has the advantage that it can be used to screen for a host of problems before sending patients to more expensive and invasive tests and procedures that ultimately provide the necessary diagnostic precision. The new system (Fig. 1.7) was designed around three complementary subsystems: (1) a transurethral scanning stem (TUScan), (2) transrectal scanning, and (3) a slaved biopsy system, referred to together as TRSB. One of the limitations of diagnostic ultrasound is that while higher frequencies give better resolution, they also have less depth of penetration into body tissue. By placing one scanning system within the area of interest and adding a second system at the back of the area of interest (Fig. 1.8), the necessary depth of penetration is halved for each system. This permits the effective use of higher frequencies for better resolution. The design goal was to develop an "expert" system that uses custom software to integrate and analyze data from several advanced sensor technologies, something that is beyond the capacity of human performance. In so doing, such a system could enable detection, analysis, mapping, and confirmation of tumors as small as one-fourth the size of those detected by traditional methods.
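A rough back-of-the-envelope calculation shows why halving the path length matters; the 0.5 dB/cm/MHz figure below is the generic soft-tissue rule of thumb, not a specification of this system. One-way attenuation grows approximately as 0.5 dB/cm/MHz × f × d, so a 7.5 MHz beam penetrating 4 cm of tissue loses about 15 dB, while a 15 MHz beam penetrating only 2 cm suffers the same 15 dB loss while offering roughly twice the resolution.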
Fig. 1.9 The advanced prostate imaging and mapping system serves as an integrated platform for diagnostics and treatments of the prostate that integrates a number of ultrasound technologies and techniques with “expert system” software to provide interpretations, probability assessment, and level of suspicion mapping for guidance to the urologist
That’s all very nice, of course, but the funding stream ended before clinical data was acquired to develop the software expert system. Machine learning isn’t magic; training data is always needed. Note in Fig. 1.9 that in the 1990s we envisioned incorporating medical history into the software expert system. That became possible after the 2009 Stimulus Package included $40 billion in funding for healthcare IT [56] to “make sure that every doctor’s office and hospital in this country is using cutting-edge technology and electronic medical records so that we can cut red tape, prevent medical mistakes, and help save billions of dollars each year.” Real-time analysis, via machine learning, of conversations between humans is now a routine part of customer service (Fig. 1.10).
1.4 Why Is that Stupid Squirrel Named Steve?

Context matters. Machine learning is useful precisely because it helps us to draw upon a wealth of data to make some determination. Often that determination is easy for humans to make, but we want to off-load the task onto an electronic computer. Here's a simple example: squirrel or rat? Easy, but now try to describe what it is about squirrels and rats that allows you to tell one from the other. The context is that rats are nasty and rate a call to the exterminator, while squirrels are harmless and even a little cute, unless they're having squirrel babies in my attic, in which case they rate a call to a roofer to install aluminum fascias. The standard machine learning approach to distinguishing rats from squirrels is to train the system with a huge library of images that have been previously identified as rat or squirrel. But what if you wanted to articulate specific qualities that distinguish rats from squirrels? We're going to need to select some features.
Fig. 1.10 The heart of the system is software artificial intelligence that develops a level of suspicion (LOS) map for cancer throughout the prostate volume. Rather than depending on a human to search for suspect features in the images, the multiple ultrasonic, 4D Doppler, and elastography datasets are processed in the computer to make that preliminary determination. A false color LOS map is then presented to the physician for selection of biopsy at those areas, which are likely cancer. The use of these two complementary ultrasound scanners within the prostate and the adjacent rectum provides all of the necessary information to the system computer to enable controllable, precise placement of the integral slaved biopsy needle at any selected point within the volume of the prostate that the doctor wishes. Current techniques have poor control over the depth of penetration, and only a limited control over the angle of entry of the biopsy needle. With current biopsy techniques, multiple fans of 6 or more biopsies are typically taken and false negatives are common
Fig. 1.11 It’s easy to tell the difference between a rat and a squirrel
Fig. 1.12 Only squirrels climb trees. I know this because I googled “rats in trees” and got no hits
Fig. 1.13 Squirrels always have bushy tails. Yes I’m sure those are squirrels
Fig. 1.14 Rats eat whatever they find in the street. Yes, that’s the famous pizza rat [57]
Here are some preliminary assessments based on the images in Figs. 1.11, 1.12, 1.13, 1.14, and 1.15:

• If squirrels happen to have tails, then bushiness might work pretty well, because rat tails are never bushy.
• Ears, eyes, and heads look pretty similar, at least to my eye.
• Body fur may or may not be very helpful.
• Feet and whiskers look pretty similar.
• Squirrels are more likely to be up in trees, while rats are more likely to be down in a gutter.
• Both eat junk food, but squirrels especially like nuts.
• Rats are nasty and they scurry, whereas squirrels are cute and they scamper.

We can begin to quantify these qualities by drawing some feature spaces. For example, any given rat or squirrel can be characterized by whether it's up in a tree, down in the gutter, or somewhere in between, which gives an altitude number. The bushiness of the tail can be quantified as well (Fig. 1.16). Except for unfortunate tailless squirrels, most squirrels will score quite highly in bushiness, while rats will have a very low bushiness score unless they just happen to be standing in front of something that looks like a bushy tail.
Fig. 1.15 Rats are mean ole fatties. Squirrels are happy and healthy

Fig. 1.16 Feature space showing the distribution of rats and squirrels quantified by their altitude and the bushiness of their tails
Since we've agreed that both rats and squirrels eat pizza but squirrels have a strong preference for nuts, maybe plotting nuttiness versus bushiness (Fig. 1.17) would work. But recall that squirrels scamper and rats scurry. If we could quantify those, my guess is that would give well-separated classes, like Fig. 1.18. The next step in pattern classification is to draw a decision boundary that divides the phase space into the regions with the squirrels and the rats, respectively. Sometimes this is easy (Fig. 1.19), but it will never be perfect. Once this step is done, any new image to be classified gets a (scamper, scurry) score which defines a point in the phase space relative to the decision boundary.
Fig. 1.17 Feature space showing the distribution of rats and squirrels quantified instead by their nuttiness and bushiness of their tails
Fig. 1.18 An ideal feature space gives tight, well-separated clusters, but note that there always seems to be a squirrel or two mixed in with the rats and vice versa
Fig. 1.19 Easy decision boundary
decision boundary is tricky (Fig. 1.20) and so the idea is to draw it as best you can to separate the classes. It doesn’t have to be a straight line, of course, but it’s important not to overfit things to the data that you happen to be using to train your classifier (Fig. 1.21).
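To make this concrete, here is a minimal MATLAB sketch of the classification step; the (scamper, scurry) scores are invented Gaussian clusters, and fitcdiscr (from the Statistics and Machine Learning Toolbox) fits a linear discriminant, which is one simple way to let the computer draw that straight-line decision boundary.

% Toy (scamper, scurry) feature space with two synthetic Gaussian clusters.
rng(1);                                     % reproducible example
squirrels = [7 2] + 0.8*randn(50, 2);       % scamper high, scurry low
rats      = [2 7] + 0.8*randn(50, 2);       % scamper low, scurry high
X = [squirrels; rats];
y = [repmat({'squirrel'}, 50, 1); repmat({'rat'}, 50, 1)];

% A linear discriminant draws the straight-line decision boundary for us.
mdl = fitcdiscr(X, y);

% Any new critter gets a (scamper, scurry) score and lands on one side.
label = predict(mdl, [6.5 2.5])             % should come back as 'squirrel'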
Fig. 1.20 Trickier decision boundary
Fig. 1.21 Complicated decision boundary
These phase spaces are all 2D, but there's no reason to limit things to two dimensions; three or more work just as well. In 3D, the decision boundary line becomes a plane or some more complicated surface. If we consider more features then the decision boundary will be a hypersurface that we can't draw, but that we can define mathematically. Maybe then we define a feature vector consisting of (altitude, bushiness, nuttiness, scamper, scurry). We could even go back and consider a bunch of other characteristics of rats and squirrels and add them. Don't go there. Too many features will make things worse. Depending upon the number of classes you want to differentiate, there will be an optimal feature vector size, and if you add more features the classifier performance will degrade. This is another place where human insight comes into play (Fig. 1.22). If instead of trying to define more and more features we think a bit more deeply about the features that look promising, we might be able to do more with less. Scamper and scurry are inherently dynamic quantities, so presumably we'll need multiple image frames rather than snapshots, which shouldn't be a problem. A game I play when out and about on campus, or in a city, is to classify the people and animals around me according to my own personal (scamper, scurry) metrics. Try it sometime. Scamper implies a lightness of being. Scurry is a bit more emotionally dark and urgent. Scamper almost rhymes with jumper. Scurry actually rhymes with
Fig. 1.22 The curse of dimensionality is surprising and disappointing to everybody who does pattern classification [58]
hurry. Let’s agree to stipulate that rats don’t scamper.2 In a sequence of images, we could simply track the center of gravity of the varmint over time, which will easily allow us to differentiate scampering from scurrying. A scampering squirrel will have a center of gravity that goes up and down a lot over time, whereas the rat will have a center of gravity that drops a bit at the start of the scurry but then remains stable (Fig. 1.23). If we also calculate a tail bushiness metric for each frame of the video clip we could form an average bushiness in order to avoid outliers due to some extraneous bushiness from a shrubbery showing up occasionally in an image. We then plot these as (CG Stability, Average Bushiness) as sketched in Fig. 1.24 in case you haven’t been following along with my phase space cartoons decorated with rat-and-squirrel clipart.
2 I’ve
watched the videos with the title “Rats Scamper Outside Notre Dame Cathedral as Flooding Pushes Rodents Onto Paris Streets” (January 24, 2018) but those rats are clearly scurrying. Something must have gotten lost in translation. I wonder if rats are somehow to blame for the Notre Dame fire? Surely it wasn’t squirrels nesting up in the attic!
Fig. 1.23 Motion of the center of gravity is the key quantity. The blue dashed line shows the center of gravity while sitting. The green line shows the center of gravity lowered during a scurry. The red line shows the up and down of the center of gravity while scampering
Fig. 1.24 Use multiple frames from video to form new, more sophisticated feature vectors while avoiding the curse of dimensionality
1.5 That’s Promising, but What Else Could We Do? We might want to use the acoustic part of the video, since it’s going to be available anyway. We know that • • • • •
Rats squeak but squirrels squawk. Rats giggle at ultrasound frequencies if tickled. Squirrel vocalizations are kuks, quaas, and moans. If frequencies are different enough, then an FFT should provide feature(s). Time-scale representations will allow us to visualize time-varying frequency content of complex sounds.
Many measurements acquired from physical systems are one-dimensional time-domain signals, commonly representing amplitude as a function of time. In many cases, useful information can be extracted from the signal directly. Using the waveform of an audio recording as an example, the total volume of the recording at any point in time is simply the amplitude of the signal at that time point. More in-depth analysis of the signal could show that regular, sharp, high-amplitude peaks are drum hits, while broader peaks are sustained organ notes. Amplitude, peak sharpness, and
peak spacing are all examples of features that can be used to identify particular events occurring in the larger signal. As signals become more complicated, such as an audio recording featuring an entire orchestra as compared to a single instrument, or added noise, it becomes more difficult to identify particular features in the waveform and correlate them to physical events. Features that were previously used to differentiate signals then no longer do so reliably.
One of the most useful, and most common, transformations we can make on a time-domain signal is the conversion to a frequency-domain spectrum. For a real signal f(t), this is accomplished with the Fourier transform

$$F(\omega) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} f(t)\, e^{-i\omega t}\, dt. \qquad (1.1)$$
The resultant signal F(ω) is in the frequency domain, with angular frequency ω related to the natural frequency ξ (with units of cycles per second) by ω = 2πξ. An inverse Fourier transform will transform this signal back to the time domain. Since this is the symmetric formulation of the transform, the inverse transform can be written as

$$f(t) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} F(\omega)\, e^{i\omega t}\, d\omega. \qquad (1.2)$$

Since the Fourier transform is just an extension of the Fourier series, looking at this series is the best way to understand what actually happens in the Fourier transform. The Fourier series, discovered in 1807, decomposes any periodic signal into a sum of sines and cosines. This series can be expressed as the infinite sum

$$f(t) = \frac{a_0}{2} + \sum_{n=1}^{\infty} \left[ a_n \cos(nt) + b_n \sin(nt) \right], \qquad (1.3)$$
where the a_n and b_n are the Fourier coefficients. By finding the values of these coefficients that best describe the original signal, we are describing the signal in terms of some new basis functions: sines and cosines. The relation to the complex exponential given in the Fourier transform comes from Euler's formula, e^{2πiθ} = cos 2πθ + i sin 2πθ. In general, any continuous signal can be represented by a linear combination of orthonormal basis functions (specifically, the basis functions must define a Hilbert space). Sines and cosines fulfill this requirement and, because of their direct relevance to describing wave propagation, provide a physically relatable explanation for what exactly the decomposition does—it describes the frequency content of a signal. In practice, since real-world signals are sampled from a continuous measurement, calculation of the Fourier transform is accomplished using a discrete Fourier transform. A number of stable, fast algorithms exist and are staples of any numerical signal processing analysis software. As long as the Nyquist–Shannon sampling theorem is
respected (the sampling rate f_s must be at least twice the maximum frequency content present in the signal), no information about the original signal is lost.
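As a quick illustration, here is a minimal MATLAB sketch that samples an invented two-tone signal and estimates its one-sided magnitude spectrum with the built-in fft.

fs = 1000;                                  % sampling rate, Hz
t  = 0:1/fs:1-1/fs;                         % one second of samples
x  = sin(2*pi*50*t) + 0.5*sin(2*pi*120*t);  % 50 Hz and 120 Hz tones

N = numel(x);
X = fft(x);                                 % discrete Fourier transform
f = (0:N-1)*(fs/N);                         % frequency axis, Hz

plot(f(1:N/2), abs(X(1:N/2))/N);            % one-sided magnitude spectrum
xlabel('Frequency (Hz)'); ylabel('|X(f)|');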
1.5.1 Short-Time Fourier Transform
While the Fourier transform allows us to determine the frequency content of a signal, all time-domain information is lost in the transformation. The spectrum of the audio recording tells us which frequencies are present but not when those notes were being played. The simple solution to this problem is to look at the Fourier transform over a series of short windows along the length of the signal. This is called the short-time Fourier transform (STFT), and is implemented as

$$\mathrm{STFT}\{f(t)\}(\tau,\omega) \equiv F(\tau,\omega) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} f(t)\, \bar{w}(t-\tau)\, e^{-i\omega t}\, dt, \qquad (1.4)$$
where w̄(t − τ) is a windowing function that is nonzero for only a short time, typically a Hann window, described in the discrete domain by w̄(n) = sin²(πn/(N − 1)). Since this is an invertible process it is possible to recreate the original signal using an inverse transform, but windowing of the signal makes inversion more difficult. Taking the squared magnitude of the STFT (|F(τ, ω)|²) and displaying the result as a color-mapped image with frequency on the vertical axis and time on the horizontal axis shows the evolution of the frequency spectrum as a function of time. These plots are often referred to as spectrograms, an example of which is shown in Fig. 1.25. It is important to note that this transformation from the one-dimensional time domain to a joint time–frequency domain creates a two-dimensional representation of the signal. Adding a dimension to the problem gives us more information about our signal at the expense of more difficult analysis.
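A minimal MATLAB sketch of Eq. (1.4) in practice, using spectrogram and hann from the Signal Processing Toolbox on a synthetic chirp; the window length, overlap, and FFT size are illustrative choices, not recommendations.

fs = 8000;
t  = 0:1/fs:2;
x  = chirp(t, 100, 2, 2000);                % 100 Hz sweeping to 2 kHz in 2 s

% 256-sample Hann window, 50% overlap, 512-point FFT per window position.
spectrogram(x, hann(256), 128, 512, fs, 'yaxis');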
Fig. 1.25 The spectrogram (bottom) of the William and Mary Alma Mater, performed by the William and Mary Chorus, provides information about the frequency content of the signal not present in the time-domain waveform (top)
The more serious limitation of the STFT comes from the uncertainty principle known as the Gabor limit,

$$\Delta t\, \Delta\omega \ge \frac{1}{2}, \qquad (1.5)$$

which says that a function cannot be both time and band limited. It is impossible to simultaneously localize a function in both the time domain and the frequency domain, which leads to resolution issues for the STFT. A short window will provide precise temporal resolution and poor frequency resolution, while a wide window has the exact opposite effect.
1.5.2 Other Methods of Time–Frequency Analysis
The development of quantum mechanics in the twentieth century ushered in a number of alternative time–frequency representations because the mathematics are similar in the position–momentum and time–frequency domains. One of these is the Wigner–Ville distribution, introduced in 1932, which maps the quantum mechanical wave function to a probability distribution in phase space. In 1948, Ville wrote a time–frequency formulation,

$$W(\tau,\omega) = \int_{-\infty}^{\infty} f\!\left(\tau + \frac{t}{2}\right) f^*\!\left(\tau - \frac{t}{2}\right) e^{-i\omega t}\, dt, \qquad (1.6)$$
where f*(t) is the complex conjugate of f(t). This can be thought of as the Fourier transform of the autocorrelation of the original signal f(t), but because it is not a linear transform, cross-terms occur when the input signal is not monochromatic. Gabor also tried to improve the resolution issues with the STFT by introducing the transform

$$G(\tau,\omega) = \int_{-\infty}^{\infty} e^{-\pi(t-\tau)^2}\, e^{-i\omega t}\, f(t)\, dt, \qquad (1.7)$$
which is basically the STFT with a Gaussian window function. Like the STFT, this is a linear transformation and there is no problem with cross-terms. By combining the Wigner–Ville and Gabor transforms, we can mitigate the effects of the cross-terms and improve the resolution of the time–frequency representation. One possible representation of the Gabor–Wigner transform is

$$D(\tau,\omega) = G(\tau,\omega) \times W(\tau,\omega). \qquad (1.8)$$
The spectrogram (STFT) is rarely the optimal time–frequency representation, but there are others such as the Wigner (Fig. 1.26) and positive transforms (Fig. 1.27). We can also use wavelet transforms to form analogous time-scale representations. There are many mother wavelets to choose from.
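As a preview of those time-scale representations, here is a small sketch using the legacy Wavelet Toolbox call cwt(x, scales, wavelet); newer MATLAB releases replaced this signature, so treat it as an illustration of the idea rather than current API, and the test signal is invented.

fs = 1000;
t  = 0:1/fs:1-1/fs;
x  = sin(2*pi*40*t) + sin(2*pi*150*t).*(t > 0.5);  % 150 Hz appears mid-signal

scales = 1:64;
coefs  = cwt(x, scales, 'morl');            % rows = scales, columns = time

imagesc(t, scales, abs(coefs));             % low scale ~ high frequency
axis xy; xlabel('Time (s)'); ylabel('Scale');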
Fig. 1.26 Time-domain waveform is shown (bottom) and its power spectrum (rotated, left) along with the Wigner transform as a false color time–frequency image
1.5.3 Wavelets
The overarching issue with any of the time–frequency methods is that the basis of the Fourier transform is chosen with the assumption that the signals to be analyzed are periodic or infinite in time. Most real-world signals are not periodic but change character over time. This problem becomes even clearer when looking at finite signals with sharp discontinuities. Approximating such signals as a linear combination of sinusoids creates overshoot at the discontinuities. The well-known Gibbs phenomenon is illustrated in Fig. 1.28. Instead we can use a basis of finite signals, called wavelets [59], to better approximate real-world signals. The wavelet transform is written as
Fig. 1.27 Time-domain waveform is shown (bottom) and its power spectrum (rotated, left) along with the positive transform as a false color time–frequency image
Fig. 1.28 Attempting to approximate a square wave using Fourier components (sines and cosines) creates large oscillations near the discontinuities. Known as the Gibbs phenomenon, this overshoot increases with frequency (as more terms are added to the Fourier series) but eventually approaches a finite limit
Fig. 1.29 A signal s(t) is decomposed into approximations (A) and details (D), corresponding to low- and high-pass filters, respectively. By continually decomposing the approximation coefficients in this manner and removing the first several levels of details, we have effectively applied a low-pass filter to the signal
$$W(\tau,s) = \frac{1}{\sqrt{s}} \int_{-\infty}^{\infty} f(t)\, \psi\!\left(\frac{t-\tau}{s}\right) dt. \qquad (1.9)$$
A comparison to the STFT (1.4) shows that this transform decomposes the signal not into linear combinations of sines and cosines, but into linear combinations of wavelet functions ψ(τ, s). We can relate this to the Fourier decomposition (1.3) by defining the wavelet coefficients

$$c_{jk} = W\!\left(k 2^{-j},\, 2^{-j}\right). \qquad (1.10)$$
Here, τ = k2^{−j} is referred to as the dyadic position and s = 2^{−j} is called the dyadic dilation. We are decomposing our signal in terms of a wavelet that can move (position τ) and deform by stretching or shrinking (scale s). This transforms our original signal into a joint time-scale domain, rather than a frequency domain (Fourier transform) or joint time–frequency domain (STFT). Although the wavelet transform doesn't provide any direct frequency information, scale is related to the inverse of frequency, with low-scale decompositions relating to high frequency and vice versa. This relationship is often exploited to de-noise signals by removing information at particular scales (Fig. 1.29). In addition to representing near-discontinuous signals better than the STFT, the dyadic (factor-of-two) decomposition of the wavelet transform allows an improvement in time resolution at high frequencies (Fig. 1.30).
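A sketch of the de-noising idea behind Fig. 1.29 using the stationary wavelet transform (swt/iswt, Wavelet Toolbox); the wavelet, the number of levels, and which details to drop are all illustrative choices, and the signal length must be divisible by 2^n.

n = 4;                                      % decomposition levels
t = (1:1024)/1024;
x = sin(2*pi*8*t) + 0.5*randn(1, 1024);     % slow oscillation plus noise

swc = swt(x, n, 'db4');     % rows 1..n hold details D1..Dn, row n+1 holds An
swc(1:2, :) = 0;            % remove the two finest detail levels (D1, D2)
xden = iswt(swc, 'db4');    % recompose: effectively a low-pass filtered x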
Fig. 1.30 The STFT has similar time resolution at all frequencies, while the dyadic nature of the wavelet transform affords better time resolution at high frequencies (low-scale values)
In the time domain, wavelets are completely described by the wavelet function (mother wavelet ψ(t)) and a scaling function (father wavelet φ(t)). The scaling function is necessary because stretching the wavelet in the time domain reduces the bandwidth, requiring an infinite number of wavelets to accurately capture the entire spectrum. This is similar to Zeno's paradox, in which trying to get from point A to point B by crossing half the remaining distance each step is logically fruitless. The scaling function is an engineering solution to this problem, allowing us to get close enough for all practical purposes by covering the rest of the spectrum. In order to completely represent a continuous signal, we must make sure that our wavelets form an orthonormal basis. Since as part of the decomposition we are allowed to scale and shift our original wavelet, we only need to ensure that the mother wavelet is continuously differentiable and compactly supported. For our analysis, we typically use the wavelet definitions and transform algorithms included in MATLAB.3
The Haar wavelet is the simplest example of a wavelet—a discontinuous step function with uniform scaling function. The Haar wavelet is also the first (db1) of the Daubechies family of wavelets abbreviated dbN, with order N the number of vanishing moments. Historically, these were the first compactly supported orthonormal set of wavelets and were soon followed by Daubechies' slightly modified and least asymmetric Symlet family. The Coiflet family, also exhibiting vanishing moments, was also created by Daubechies at the request of other researchers. The Meyer wavelet has both its scaling and wavelet functions defined in the frequency domain, but is not technically a wavelet because its wavelet function is not compactly supported. However, ψ → 0 as x → ∞ fast enough that the pseudowavelet is infinitely differentiable. This allows the existence of good approximations for use in discrete wavelet transforms, and we often consider the Meyer and related Discrete Meyer functions as wavelets for our analysis.
Both the Mexican hat and Morlet wavelets are explicitly defined and have no scaling function. The Mexican hat wavelet is proportional to the second derivative of the Gaussian probability density function, while the Morlet wavelet is defined as ψ(x) = Ce^{−x²} cos(5x), with scaling constant C.
3 See, for example, https://www.mathworks.com/products/wavelet.html.
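Since those last two wavelets are given by explicit formulas, they can be evaluated directly; in this small sketch the scaling constant C is simply set to 1 and the Mexican hat's normalization constant is dropped, since only the shapes matter here.

x = linspace(-5, 5, 1000);

C = 1;                                      % scaling constant
morlet  = C * exp(-x.^2) .* cos(5*x);       % Morlet, as defined above
mexihat = (1 - x.^2) .* exp(-x.^2/2);       % ~ 2nd derivative of a Gaussian

plot(x, morlet, x, mexihat);
legend('Morlet', 'Mexican hat');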
1.6 The Dynamic Wavelet Fingerprint
While alternative time–frequency transformations can improve the resolution limits of the STFT, they often create their own problems, such as the cross-terms in the Wigner–Ville transform. Combinations of transforms can reduce these effects while still offering increased resolution, but this then comes at the cost of computational complexity. Wavelets offer an alternative basis for decomposition that is more suited to finite real-world signals, but without the direct relationship to frequency.
One of the issues with time–frequency representations of signals is the added complexity of the resultant time–frequency images. Just as displaying a one-dimensional signal requires a two-dimensional image, viewing a two-dimensional signal requires a three-dimensional visualization method. Common techniques include three-dimensional surface plots that can be rotated on a computer screen or color-mapped two-dimensional images where the value at each point is mapped to a color. While these visualizations work well for human interpretation of the images, computers have a difficult time distinguishing between those parts of the image we care about and those that are just background clutter. This difficulty with image segmentation is especially true for noisy signals. The human visual system is evolutionarily adapted to be quite good at this4 but computers lack such an advantage. Automated image segmentation methods work well for scenes where a single object is moving in a predictable path across a mostly stationary background.
We have developed an alternative time–frequency representation called the dynamic wavelet fingerprint (DWFP) that we have found useful to reveal subtle features in noisy signals. This technique takes a one-dimensional time-domain waveform and converts it to a two-dimensional time-scale image [60], generating a pre-segmented binary image that can be analyzed using image processing techniques.
1.6.1 Feature Creation
The DWFP process first filters a one-dimensional signal using a stationary discrete wavelet transform. This decomposes the signal into wavelet components at a set number of levels, removes the chosen details, and then uses the inverse stationary wavelet transform to recompose the signal. The number of levels, details to remove, and wavelet used for the transform are all user-specified parameters. A Tukey window can also be applied to the filtered signal at this point to smooth out behavior at the edges. Next, the wavelet coefficients are created using a continuous wavelet transform. The normalized coefficients form a three-dimensional surface, and can be thought of as "peaks" or "valleys" depending on whether the coefficients are positive or negative.
4 Those ancient humans who didn't notice that tiger behind the bush failed to pass their genes on to us.
Slicing this surface (both slice thickness and number of slices are user parameters) and projecting the slices to a plane generates a two-dimensional binary image. The vertical axis of this image is scale (inversely related to frequency), and the horizontal axis remains time, allowing direct comparison to the original one-dimensional time-domain signal. The image often resembles a set of fingerprints (hence the name), but most importantly the image is pre-segmented and can be easily analyzed by standard image processing techniques. Since the slicing process does not distinguish between peaks (positive coefficients) and valleys (negative coefficients), we can instead do the slicing operation in two steps, keeping the peak and valley projections separate. This generates two fingerprint images for each signal—one for peaks and one for valleys—which can be analyzed separately or combined into a (still segmented) ternary image.
A number of additional features can be extracted from this fingerprint image. Some of the features we extract are functions of time, for example, a simple count of the number of ridges at each time point. However, many of the features that we want to extract from the image are tied to a particular individual fingerprint, requiring us to first identify and consecutively label the individual fingerprints. We use a measure of near-connectedness, in which pixels of the same value within a set distance of each other are considered connected, to label each individual fingerprint. This measure works well as long as each individual fingerprint is spatially separated from its neighbors, something that is not necessarily true for the ternary fingerprint images. For those cases, we actually decompose the ternary image into two separate binary images, label each one individually, and then recombine and relabel the two images (Fig. 1.31).
In some cases, the automated labeling will classify objects as a fingerprint even though they may not represent our expectation of a fingerprint. While this won't affect the end results, because such fingerprints won't contain any useful information, it can slow down an already computationally intensive process. To reduce these false fingerprints, an option is added to restrict the allowed solidity range for an object to be classified as an individual fingerprint.
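To fix ideas, here is a much-simplified sketch of the pipeline just described. It is not the production DWFP code: the parameter values are illustrative, peaks and valleys are merged via an absolute value rather than kept separate, and the legacy cwt(x, scales, wavelet) syntax is assumed.

x = detrend(cumsum(randn(1, 1024)));        % stand-in 1D waveform

% 1. Stationary wavelet filter: remove chosen details, recompose, window.
swc = swt(x, 4, 'db4');
swc(1, :) = 0;                              % detail level(s) to remove
xf = iswt(swc, 'db4') .* tukeywin(1024, 0.1)';

% 2. Continuous wavelet transform, normalized so coefficients lie in [-1, 1].
coefs = cwt(xf, 1:50, 'morl');              % legacy Wavelet Toolbox syntax
coefs = coefs / max(abs(coefs(:)));

% 3. Slice the coefficient surface into thick contours and project down.
numslices = 5; slicethickness = 0.06;       % user parameters
fp = false(size(coefs));
for k = 1:numslices
    level = k/(numslices + 1);              % slice heights
    fp = fp | (abs(coefs) >= level & abs(coefs) < level + slicethickness);
end

imagesc(fp); colormap(gray);                % the "fingerprint" image
xlabel('Time index'); ylabel('Scale');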
1.6.2 Feature Extraction
Once the location and extent of each individual fingerprint has been determined, we apply standard image processing libraries included in MATLAB to extract features from the image. The resemblance of our images to fingerprints, for which a large image recognition literature already exists, can be exploited in this process. These parameter waveforms are then linearly interpolated to facilitate a direct comparison to the original time-domain signal. Typically, about 25 one-dimensional parameter waveforms are created for each individual measurement. Some of these features are explained in more detail below.
Fig. 1.31 To consecutively label the individual fingerprints within the fingerprint image, the valleys (top left) and peaks (top right) images are first labeled individually and then combined into an overall labeled image (bottom)
A number of features are extracted from both the raw signal and the wavelet fingerprint image using the MATLAB image processing toolbox regionprops analysis to create an optimized feature vector for each instance.
1. Area: Number of pixels in the region
2. Filled Area: Number of pixels in the region with any holes filled in
3. Extent: Ratio of region pixels to pixels in the bounding box (smallest rectangle that completely encloses the region), calculated as Area / (Area of bounding box)
4. Convex Area: Area of the convex hull (the smallest convex polygon that contains the region)
5. Equivalent Diameter: Diameter of a circle with the same area as the region, calculated as √(4 · Area / π)
6. Solidity: Proportion of the pixels in the convex hull that are also in the region, calculated as Area / (Convex Area)
7. xCentroid: Center of mass of the region along the horizontal axis
8. yCentroid: Center of mass of the region along the vertical axis
9. Major Axis Length: Pixel length of the major axis of the ellipse that has the same normalized second central moments as the region
Table 1.1 List of user parameters in the DWFP creation and feature extraction process

Setting            | Options                            | Description
Wavelet filtering
  filtmethod       | Filt, filtandwindow, window, none  | How to filter data
  wvtpf            | wavelet name                       | Filtering wavelet
  numlvls          | Z+                                 | Number of levels to filter
  swdtoremove      | [Z+]                               | Details to remove
Wavelet transform
  wvt              | wavelet name                       | Transform wavelet
  ns               | Z+                                 | Number of scales for transform
  normconstant     | Z+                                 | Normalization constant
  numslices        | Z+                                 | Number of slices
  slicethickness   | R+                                 | Thickness of each slice
Feature extraction
  saveimages       | Binary switch                      | Save fingerprint images?
  fullorred        | Full, reduced                      | Require certain solidity
  solidity_range   | [R ∈ [0,1], R ∈ [0,1]]             | Allowable solidity range
10. Minor Axis Length: Pixel length of the minor axis of the ellipse that has the same normalized second central moments as the region
11. Eccentricity: Eccentricity of the ellipse that has the same normalized second central moments as the region, computed as the ratio of the distance between the foci of the ellipse and its major axis length
12. Orientation: Angle (in degrees) between the x-axis and the major axis of the ellipse that has the same second moments as the region
13. Euler Number: Number of objects in the region minus the number of holes in those objects, calculated using 8-connectivity
14. Ridge count: Number of ridges in the fingerprint image, calculated by looking at the number of transitions between pixels on and off at each point in time
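A sketch of the extraction step for the region-based features above, assuming bw is a binary fingerprint image like the one built in Sect. 1.6.1; note that bwlabel's ordinary 8-connectivity stands in for the looser near-connectedness measure described earlier.

% bw is assumed to be a logical fingerprint image (e.g., fp from the sketch
% in Sect. 1.6.1); bwlabel and regionprops are Image Processing Toolbox.
L = bwlabel(bw, 8);                         % label the individual prints

props = regionprops(L, 'Area', 'FilledArea', 'Extent', 'ConvexArea', ...
    'EquivDiameter', 'Solidity', 'Centroid', 'MajorAxisLength', ...
    'MinorAxisLength', 'Eccentricity', 'Orientation', 'EulerNumber');

% Ridge count versus time: off-to-on transitions down each pixel column.
ridgeCount = sum(diff([false(1, size(bw, 2)); bw], 1, 1) == 1, 1);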
The user has control of a large number of parameters in the DWFP creation and feature extraction process (Table 1.1), which affect the appearance of the fingerprint images, and thus the extracted features. There is no way to tell a priori which combination of parameters will create the ideal representation for a particular application. Past experience with analysis of DWFP images helps us to avoid an entirely brute force implementation for many applications. However, in some cases, the signals to be analyzed are so noisy that humans are incapable of picking out useful patterns in the fingerprint images. For these applications, we use the formal machine learning language of pattern classification and a computing cluster to run this feature extraction process in parallel for a large number of parameter combinations. We first have to choose a basis function, which could be sinusoidal, wavelet, or whatever. Then we have to identify something(s) in the resulting images that can
be used to make features. The tricky bit is automatically extracting those image features: it’s computationally expensive. Then we have to somehow downselect the best image features from all the ones that are promising. You’ll be unsurprised that the curse of dimensionality comes into play here. All we wanted to do was distinguish rats from squirrels and somehow we’re now doing wavelet transforms to tell the difference between scampering-squawking squirrels and scurrying-squeaking rats. It would seem simpler just to brute force the problem by training a standard machine learning system with a large library of squirrel versus rat images. Yes, that would be simpler conceptually. Go try that if you like. We’ll wait.
1.6.3 Edge Computing
Meanwhile, the world is pushing hard into the Internet of Things (IoT) right now. The sorts of low-power sensor hubs that are in wearables have an amazing amount of capability, both in terms of sensors and local processing of sensor data, which enables what's starting to be called distributed machine learning, edge computing, etc.
For the last 30 years, my students and I have been modeling the scattering of radar, sonar, and ultrasound waves from objects, tissues, materials, and structures. The reflection, transmission, refraction and diffraction of light, and the conduction of heat are also included in this body of work. As computers have become more and more capable, three-dimensional simulations of these interactions have become a key aspect of sorting out very complex behaviors. Typically, our goal has been to solve inverse problems where we know what the excitation source is, and some response of the system is measured. Success is being able to automatically and in (near) real time deduce the state of the object(s), tissue(s), material(s), and/or structure(s). We also want quantitative outputs with a resolution appropriate for that particular case. The mathematical modeling techniques and numerical simulation methods are the same across a wide range of physical situations and fields of application. Why the sky is blue [61–65] and how to make a bomber stealthy [66–69] both utilize Maxwell's equations. Seismic waves and ultrasonic NDE employ identical equations once feature sizes are normalized by wavelength. Sonar of whales and submarines is identical to that of echolocating bats and obstacle-avoiding robots. In general, inverse scattering problems are too hard, but bats do avoid obstacles and find food. Radar detects inbound threats and remotely estimates precipitation. Cars park themselves and (usually) stop before impact even if the human driver is not paying close attention and I'm jaywalking with a cup of WaWa coffee trying not to spill. Medical imaging is now so good that we find ourselves in an overdiagnosis dilemma.
Computed tomography is familiar to most people from X-ray CT scanners used in medicine and baggage screening. In a different configuration, CT is a workhorse method for seismic wave exploration for natural resources. We adapted these methods for ultrasonic guided wave characterization of flaws in large engineered structures like aircraft and pipelines. By extracting features from signals that have gone through the region of interest in many directions, the reconstruction
algorithms output a map of the quantities of interest, e.g., tissue density or pipe wall thickness. The key is understanding which features to extract from the signals for use in tomographic reconstruction, but that understanding comes from an analysis of the signal energy interacting with the tissue, structure, or material variations that matter.
For many years now, we've been developing the underlying physics to enable robots to navigate the world around them. We tend to focus on those sensors and imagers where the physics is interesting, and then develop the machine learning algorithms to allow the robots to autonomously interpret their sensors and imagers. For autonomous vehicles, we now say IoV instead, and there the focus is sensors/interpretation, but also communication among vehicles, which is how an autonomous vehicle sees around corners and such. The gadgets used for computed tomography tend to be large and expensive and often immobile, but their primary constraint is measurement geometries that can be overly restrictive. Higher and higher fidelity reconstructions require more and more ray paths, each of which is a signal from which features must be extracted in (near) real time. Our primary effort over the last two decades has been signal processing methods to extract relevant features from these sorts of signals, and we eventually began calling these sets of algorithms narrow-AI expert systems. They're always carefully tailored and finely tuned to the particular measurement scenario of interest at that time, and they use whatever understanding of the appropriate scattering problem we can bring to bear.
1.7 Will the Real Will West Please Step Forward?
In colonial India, British bureaucrats had a problem. They couldn't tell one native pensioner from another, so when one died another person would show up and continue to collect that pension. Photo ID wasn't a thing yet, and illiterate Indians couldn't sign for their cash. Fingerprints provided the answer, because there is this belief that they are unique to each individual. In Victorian times, prisons switched over from punishment to rehabilitation, but if you're going to give harsher sentences to repeat offenders you have to be able to tell whether someone has been to prison before. Again, without photography and hence no ability to store and transmit images, it would be pretty simple to avoid harsher and harsher sentences by giving a fake name every subsequent time you were arrested. The question then becomes: What features of individuals can be measured to uniquely identify them? Hairstyle or color won't work. Eye color would work, but weight wouldn't. Scars, tattoos, etc. should work if there are any. Alphonse Bertillon [71] had the insight that bones don't change after you've stopped growing, so a list of your particulars including this kind of measurement should allow you to be uniquely identified. His system included 11 measurements:
Fig. 1.32 In 1903, a prisoner named Will West arrived at Leavenworth. William West had been serving a life sentence at Leavenworth since 1901. They had identical Bertillon measurements. Fingerprints were distinct, though. See: [70]
• Height,
• Stretch: Length of body from left shoulder to right middle finger when the arm is raised,
• Bust: Length of torso from head to seat, taken when seated,
• Length of head: Crown to forehead,
• Width of head: Temple to temple,
• Length of right ear,
• Length of left foot,
• Length of left middle finger,
• Length of left cubit: Elbow to tip of middle finger,
• Width of cheeks, and
• Length of left little finger.
This can be pretty expensive to do, and can't distinguish identical twins: Will West and William West were both in Leavenworth and had identical Bertillon measurements, Fig. 1.32. Objections to this system included the cost of the instruments employed and their liability to become out of order; the need for specially instructed measurers of superior education; errors that frequently crept in when carrying out the processes and were all but irremediable; and modesty concerns for women. The last objection was a problem in trying to quantify recidivism among prostitutes. The consensus was that a small number of repeat offenders were responsible for a large proportion of crimes, and that these "female defective delinquents" spread STDs. Bertillon anthropometry required a physical intimacy between the operator and the prisoner's body that was not deemed appropriate for female prisoners. This was a time when doctors did make house calls, but typically wouldn't have their female patients disrobe at all and might not even lay on hands, because such things would be immodest. It's no wonder that a common diagnosis was "women's troubles." Also a problem for Bertillonage: bouffant hairstyles throw off height measures. When people complained to Bertillon, he responded that of course you can't make it work, you're not French. Fingerprints seemed like a better solution all around. They also have the advantage of being cheap and easy, and the Henry fingerprint classification system allowed
fingerprint information to be written down and filed for later use, and also sent by telegraph or letter:
• Basic patterns: arch, loop, whorl, and composite.
• Fingers numbered 1–10 starting at the left pinkie.
• Primary classification is a fraction with odd-numbered fingers over even-numbered fingers.
• Add 1, 2, 4, 8, or 16 when whorls appeared.
• 1/1 is zero whorls, 32/32 is ten whorls.
• Filing cabinets had 32 rows and 32 columns.
Secondary classification indicated arches, tented arches, radial loops, or ulnar loops, with right-hand numerators and left-hand denominators. Further subclassification of whorls was done by ridge tracing, since whorls originate in two deltas. Loops were subclassified by ridge counting, i.e., the number of ridges between delta and core. Hence, 5/17 R/U OO/II 19/8 for the author of a surprisingly interesting history of fingerprinting [72], who has whorls in both thumbs; high-ridge-count radial loops in the right index and middle fingers; low-ridge-count ulnar loops in the left index and middle fingers; a nineteen-ridge-count loop in the right little finger; and an eight-ridge-count loop in the left little finger. This pattern classification system worked pretty well until the database got large enough that searching it became impractical. There was also a terrific battle for control over the official fingerprint repository between New York and the FBI. You can guess who won that. The key message is that the way biometrics have always worked is not by comparing images, but by comparing features extracted from them. In fingerprint analysis that's the way it works to this day, with the language being the number of points that match. People still believe that fingerprints are unique to the individual, and don't realize that extracting features from fingerprints can be quite subjective. I wonder what a jury would say if the suspect's cubit (length from elbow to fingertip) was used as proof of guilt (Fig. 1.33).
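Just for fun, here is one reading of that primary classification as a MATLAB sketch; the finger-pair values and the odd-over-even assignment are assumptions consistent with the 1/1-to-32/32 range quoted above, and the historical scheme had variants, so this is not a forensic reference.

isWhorl = logical([1 0 0 0 0 0 1 0 0 0]);   % hypothetical ten-finger record
values  = [16 16 8 8 4 4 2 2 1 1];          % assumed pair values, fingers 1-10
odds    = mod(1:10, 2) == 1;

numerator   = 1 + sum(values(isWhorl & odds));    % whorls on odd fingers
denominator = 1 + sum(values(isWhorl & ~odds));   % whorls on even fingers
fprintf('Primary classification: %d/%d\n', numerator, denominator);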
1.7.1 Where Are We Headed?
So you see, machine learning is the modern lingo for what we've always been trying to do. Sometimes we've called it pattern classification or expert systems, but the key issue is determining what's what from measured data streams. Machine learning is usually divided into supervised and unsupervised learning. Supervised learning requires training data with known class labels, and unless one has a sufficient amount of relevant training data these methods will return erroneous classifications. Unsupervised learning can reveal structures and inter-relations in data, with no class labels required for the data. It can be used as a precursor for supervised learning, but can also uncover the hidden thematic structure in collections of documents, phone conversations, emails, chats, photos, videos, etc. This is important because more than 90% of all data in the digital universe is unstructured. Most people have an idea that every customer service phone call is now monitored, but that doesn't mean that a supervisor is listening in; it means
Fig. 1.33 From top left to bottom right: loop, double loop, central pocket loop, plain whorl, plain arch, and tented arch [72]
that computers are hoovering up everything and analyzing the conversations. Latent topics are the unknown unknowns in unstructured data, and contain the most challenging insights for humans to uncover. Topic modeling can be used as a part of the human–machine teaming capability that leverages both the machine’s strengths to reveal structures and inter-relationships, and the human’s strengths to identify patterns and critique solutions using prior experiences. Listening to calls and/or reading documents will always be the limiting factors if we depend on humans with human attention spans, but topic modeling allows the machines to plow through seemingly impossible amounts of data to uncover the unknown unknowns that could lead to actionable insights. Of course, we don’t want to do this in a truly unsupervised fashion, because we’ve been trying to mathematically model and numerically simulate complicated systems (and their responses) for decades. Our current work is designed to exploit those decades of work to minimize the burden of getting a sufficiently large and representative set of training data with assigned class labels. We still have both supervised and unsupervised machine learning approaches working in tandem, but now we invoke insight provided by mathematical models and numerical simulations to add what we sometimes call model-assisted learning. One area that many researchers are beginning to actively explore is health care, with the goal of replacing the human scribes with machine learning systems that transcribe conversations between and among patients, nurses, doctors, etc. in real time while also drawing upon the patients’ digital medical records, scans, and current bodily function monitors. Recall Fig. 1.9 where we imagined doing this 20 years ago.
Most people have heard about Moore's Law even if they don't fully appreciate that they're walking around with a semi-disposable supercomputer in their pocket and perhaps even on their wrist. What's truly new and amazing, though, is the smarts contained in the low-power sensor hubs that every tablet, smartphone, smartwatch, etc. are built around. Even low-end fitness trackers do so much more than the old-fashioned pedometers handed out in gym class. On-board intelligence defeats attempts to "finish your PE homework" on the school-bus ride home with the old shakey-shakey, even if some of the hype around tracking sleep quality and counting calories burned is a bit overblown. These modern MEMS sensor hubs have local processors to interpret the sensor data streams in an efficient enough manner that the battery draw is reduced by 90% compared to utilizing the main (general purpose) CPU. Of course, processing sensor data streams to extract features of interest is at the heart of machine learning. Now, however, sensor hubs and processing power are so small and cheap that we can deploy lots and lots of these things—connected to each other wirelessly and/or via the Internet—and do distributed machine learning on the Internet of Things (IoT). There's been a lot of hype about the IoT because even non-technical business-channel talking head experts understand that something new is starting to happen and there's going to be a great deal of money to be made (and lost). The hardware to acquire, digest, and share sensor data just plummeted from $10^4 to $10^1 and that trend line seems to be extending. Music and video streaming on-demand are the norm, even in stadiums where people have paid real money to experience live entertainment. Battery fires are what make the news, but my new smoke detector lasts a decade without needing new batteries when I change the clocks twice a year, which happens automatically these days anyway. Even radar has gotten small enough to
Fig. 1.34 Kids today may only ever get to drive Cozy Coupes, since by the time they grow up autonomous vehicles should be commonplace. I’ll expect all autonomous vehicles to brake sharply if I step out into the street without looking
put on a COTS drone, and not just a little coffee can radar but a real phased array radar using meta-materials to sweep the beam in two directions, with an antenna-lens combination about the size and weight of an iPad and at a cost already an order of magnitude less than before. For this drone radar to be able to sense-and-avoid properly, though, the algorithms are going to need to improve, because small drones operate in a cluttered environment. Their output looks a lot like B-mode ultrasound, and while it's true that power lines and barbed-wire fences backscatter radar similarly, the fence will always have bushes, trees, cows, etc. that confound the signal. This clutter problem is a key issue for autonomous ground robots, because there's no clear distinction between targets and clutter, and some (me) would say pedestrians are neither (Fig. 1.34).
References 1. 1,340 Perish as Titanic sinks, only 886, mostly women and children, rescued. New York Tribune, New York. Page 1, Image 1, col. 1. Accessed 16 Apr 1912 2. Maxim SH (1912) Preventing collisions at sea, a mechanical application of the bat’s sixth sense. Sci Am 80–82. Accessed 27 July 1912 3. Maxim SH (1912) A new system of preventing collisions at sea. Cassel and Co., London, 147 p 4. A new system of preventing collisions at sea. Nature 89(2230):542–543 5. Dijkgraaf S (1960) Spallanzani’s unpublished experiments on the sensory basis of object perception in bats. Isis 51(1):9–20. JSTOR. www.jstor.org/stable/227600 6. Griffin DR (1958) Listening in the dark: the acoustic orientation of bats and men. Yale University Press, New Haven. Paperback – Accessed 1 Apr 1986. ISBN-13: 978-0801493676 7. Grinnell AD (2018) Early milestones in the understanding of echolocation in bats. J Comp Physiol A 204:519. https://doi.org/10.1007/s00359-018-1263-3 8. Donald R. Griffin obituary. http://www.nytimes.com/2003/11/14/nyregion/donald-r-griffin88-dies-argued-animals-can-think.html 9. Donald R. Griffin: 1915–2003. Photograph by Greg Auger. Bat Research News 45(1) (Spring 2004). http://www.batresearchnews.org/Miller/Griffin.html 10. Au WWL (1993) The sonar of dolphins. Springer, New York 11. Brittain JE (1985) The magnetron and the beginnings of the microwave age. Physics Today 38:7, 60. https://doi.org/10.1063/1.880982 12. Buderi R (1998) The invention that changed the world: how a small group of radar pioneers won the second world war and launched a technical revolution. Touchstone, Reprint edition. ISBN-13: 978-0684835297 13. Conant J (2002) Tuxedo park: a wall street tycoon and the secret palace of science that changed the course of world war II. Simon and Schuster, New York 14. Denny M (2007) Blip, ping, and buzz: making sense of radar and sonar. Johns Hopkins University Press, Baltimore. ISBN-13: 978-0801886652 15. Bowman JJ, Thomas BA, Senior, Uslenghi PLE, Asvestas JS (1970) Electromagnetic and acoustic scattering by simple shapes. North-Holland Pub. Co., Amsterdam. Paperback edition: CRC Press, Boca Raton. Accessed 1 Sept 1988. ISBN-13: 978-0891168850 16. Grier DA (2005) When computers were human. Princeton University Press, Princeton 17. The human computer project needs help finding all of the women who worked as computers or mathematicians at the NACA or NASA. https://www.thehumancomputerproject.com/ 18. Anderson VC (1950) Sound scattering from a fluid sphere. J Acoust Soc Am 22:426. https:// doi.org/10.1121/1.1906621
19. NASA Dryden Flight Research Center Photo Collection (1949) NASA Photo: E49-54. https:// www.nasa.gov/centers/dryden/multimedia/imagegallery/Places/E49-54.html 20. Covert A (2011) Philco mystery control: the world’s first wireless remote. Gizmodo. Accessed 11 Aug 2011. https://gizmodo.com/5857711/philco-mystery-control-the-worldsfirst-wireless-remote 21. “Bombshell: the Hedy Lamarr story” Director: Alexandra Dean opened in theaters on November 24, 2017. http://www.pbs.org/wnet/americanmasters/bombshell-hedy-lamarr-story-fullfilm/10248/, https://zeitgeistfilms.com/film/bombshellthehedylamarrstory. Photo credit to https://twitter.com/Intel - Accessed 11 Mar 2016 22. Marr B (2016) A short history of machine learning – every manager should read. Forbes. Accessed 19 Feb 2016. https://www.forbes.com/sites/bernardmarr/2016/02/19/ashort-history-of-machine-learning-every-manager-should-read/65578fb215e7 23. Gonzalez V (2018) A brief history of machine learning. Synergic Partners. Accessed Jun 2018. http://www.synergicpartners.com/en/espanol-una-breve-historia-del-machine-learning 24. Johnson D (2017) Find out if a robot will take your job. Time. Accessed 19 Apr 2017. http:// time.com/4742543/robots-jobs-machines-work/ 25. Alan Turing: the enigma. https://www.turing.org.uk/ 26. Professor Arthur Samuel. https://cs.stanford.edu/memoriam/professor-arthur-samuel 27. “Professor’s perceptron paved the way for AI – 60 years too soon”. https://news.cornell.edu/ stories/2019/09/professors-perceptron-paved-way-ai-60-years-too-soon 28. Minsky M, Professor of media arts and sciences. https://web.media.mit.edu/~minsky/ 29. DeJong G (a.k.a. Mr. EBL). http://mrebl.web.engr.illinois.edu/ 30. Sejnowski T, Professor and computational neurobiology laboratory head. https://www.salk. edu/scientist/terrence-sejnowski/ 31. Foote KD (2017) A brief history of deep learning. Dataversity. Accessed 7 Feb 2017. http:// www.dataversity.net/brief-history-deep-learning/ 32. US Food and Drug Administration (2019) Proposed regulatory framework for modifications to artificial intelligence/machine learning (AI/ML)-based software as a medical device (SaMD) - discussion paper and request for feedback. www.fda.gov 33. Philips (2019) Adaptive intelligence. The case for focusing AI in healthcare on people, not technology. https://www.usa.philips.com/healthcare/resources/landing/adaptive-intelligencein-healthcare 34. Liu X, Faes L, Kale AU, Wagner SK, Fu DJ, Bruynseels A, Mahendiran T, Moraes G, Shamdas M, Kern C, Ledsam JR, Schmid MK, Balaskas K, Topol EJ, Bachmann LM, Keane PA, Denniston AK (2019) A comparison of deep learning performance against health-care professionals in detecting diseases from medical imaging: a systematic review and meta-analysis. Lancet Digit Health 1(6):e271–e297. https://doi.org/10.1016/S2589-7500(19)30123-2 35. Krupinski EA, Graham AR, Weinstein RS (2013) Characterizing the development of visual search expertise in pathology residents viewing whole slide images. Hum Pathol 44(3):357– 364. https://doi.org/10.1016/j.humpath.2012.05.024 36. Janowczyk A, Madabhushi A (2016) Deep learning for digital pathology image analysis: a comprehensive tutorial with selected use cases. J Pathol Inform 7:29 37. Roy S, Kumar Jain A, Lal S, Kini J (2018) A study about color normalization methods for histopathology images. Micron 114:42–61. https://doi.org/10.1016/j.micron.2018.07.005 38. Komura D, Ishikawa S (2018) Machine learning methods for histopathological image analysis. Comput Struct Biotechnol J 16:34–42. 
https://doi.org/10.1016/j.csbj.2018.01.001 39. Landau MS, Pantanowitz L (2019) Artificial intelligence in cytopathology: a review of the literature and overview of commercial landscape. J Am Soc Cytopathol 8(4):230–241. https:// doi.org/10.1016/j.jasc.2019.03.003 40. Kannan S, Morgan LA, Liang B, Cheung MKG, Lin CQ, Mun D, Nader RG, Belghasem ME, Henderson JM, Francis JM, Chitalia VC, Kolachalama VB (2019) Segmentation of glomeruli within trichrome images using deep learning. Kidney Int Rep 4(7):955–962. https://doi.org/ 10.1016/j.ekir.2019.04.008
41. Niazi MKK, Parwani AV, Gurcan MN (2019) Digital pathology and artificial intelligence. Lancet Oncol 20(5):e253–e261. https://doi.org/10.1016/S1470-2045(19)30154-8 42. Wang S, Yang DM, Rong R, Zhan X, Xiao G (2019) Pathology image analysis using segmentation deep learning algorithms. Am J Pathol 189(9):1686–1698. https://doi.org/10.1016/ j.ajpath.2019.05.007 43. Wang X et al (2019) Weakly supervised deep learning for whole slide lung cancer image analysis. IEEE Trans Cybern. https://doi.org/10.1109/TCYB.2019.2935141 44. Abels E, Pantanowitz L, Aeffner F, Zarella MD, van der Laak J, Bui MM, Vemuri VN, Parwani AV, Gibbs J, Agosto-Arroyo E, Beck AH, Kozlowski C (2019) Computational pathology definitions, best practices, and recommendations for regulatory guidance: a white paper from the Digital Pathology Association. J Pathol 249:286–294. https://doi.org/10.1002/path.5331 45. Tajbakhsh N, Jeyaseelan L, Li Q, Chiang J, Wu Z, Ding X (2019) Embracing imperfect datasets: a review of deep learning solutions for medical image segmentation. https://arxiv.org/abs/1908. 10454 46. Janke J, Castelli M, Popovic A (2019) Analysis of the proficiency of fully connected neural networks in the process of classifying digital images. Benchmark of different classification algorithms on high-level image features from convolutional layers. Expert Syst Appl 135:12– 38. https://doi.org/10.1016/j.eswa.2019.05.058 47. Faes L, Wagner SK, Fu DJ, Liu X, Korot E, Ledsam JR, Back T, Chopra R, Pontikos N, Kern C, Moraes G, Schmid MK, Sim D, Balaskas K, Bachmann LM, Denniston AK, Keane PA (2019) Automated deep learning design for medical image classification by health-care professionals with no coding experience: a feasibility study. Lancet Digit Health 1(5):e232–e242. https:// doi.org/10.1016/S2589-7500(19)30108-6 48. McKinney SM, Sieniek M, Godbole V et al (2020) International evaluation of an AI system for breast cancer screening. Nature 577:89–94. https://doi.org/10.1038/s41586-019-1799-6 49. Marks M (2019) The right question to ask about Google’s project nightingale. Slate. Accessed 20 Nov 2019. https://slate.com/technology/2019/11/google-ascension-project-nightingaleemergent-medical-data.html 50. Copeland R, Mattioli D, Evans M (2020) Paging Dr. Google: how the tech giant is laying claim to health data. Wall Str J. Accessed 11 Jan 2020. https://www.wsj.com/articles/pagingdr-google-how-the-tech-giant-is-laying-claim-to-health-data-11578719700 51. Photo from https://www.reddit.com/r/cablegore/ but it gets reposted quite a lot 52. Rous SN (2002) The prostate book, sound advice on symptoms and treatment. W. W. Norton & Company, Inc., New York. ISBN 978-0-393-32271-2 [53] 53. Imani F et al (2015) Computer-aided prostate cancer detection using ultrasound RF time series. In vivo feasibility study. IEEE Trans Med Imaging 34(11):2248–2257. https://doi.org/10.1109/ TMI.2015.2427739 54. Welch HG, Schwartz L, Woloshin S (2012) Overdiagnosed: making people sick in the pursuit of health, 1st edn. Beacon Press, Boston. ISBN-13: 978-0807021996 55. Holtzmann Kevles B (1998) Naked To the bone: medical imaging in the twentieth century, Reprint edn. Basic Books, New York. ISBN -13: 978-0201328332 56. Agarwal S, Milch B, Van Kuiken S (2009) The US stimulus program: taking medical records online. McKinsey Q. https://www.mckinsey.com/industries/healthcare-systems-and-services/ our-insights/the-us-stimulus-program-taking-medical-records-online 57. 
Pizza Rat is the nickname given to a rodent that became an overnight Internet sensation after it was spotted carrying down a slice of pizza down the stairs of a New York City subway platform in September 2015. https://knowyourmeme.com/memes/pizza-rat 58. Surprised Squirrel Selfie image at https://i.imgur.com/Tl1ieNZ.jpg. https://www.reddit.com/ r/aww/comments/4vw1hk/surprised_squirrel_selfie/. This was a PsBattle: a squirrel surprised by a selfie three years ago 59. Daubechies I (1992) Ten lectures on wavelets. Society for Industrial and Applied Mathematics. https://epubs.siam.org/doi/abs/10.1137/1.9781611970104 60. Hou J (2004) Ultrasonic signal detection and characterization using dynamic wavelet fingerprints. Doctoral dissertation, William and Mary, Department of Applied Science
61. Howard JN (1964) The Rayleigh notebooks. Appl Opt 3:1129–1133 62. Strutt JW (1871) On the light from the sky, its polarization and colour. Philos Mag XLL:107– 120, 274–279 63. van de Hulst HC (1981) Light scattering by small particles. Dover books on physics. Corrected edition. Accessed 1 Dec 1981. ISBN-13: 978-0486642284 64. Kerker M (1969) The scattering of light and other electromagnetic radiation. Academic, New York 65. Bohren C, Huffman D (2007) Absorption and scattering of light by small particles. Wiley, New York. ISBN: 9780471293408 66. Knott EF, Tuley MT, Shaeffer JF (2004) Radar cross section. Scitech radar and defense, 2nd edn. SciTech Publishing, Raleigh 67. Richardson D (1989) Stealth: deception, evasion, and concealment in the air. Orion Books, London. ISBN-13: 978-0517573433 68. Sweetman B (1986) Stealth aircraft: secrets of future airpower. Motorbooks Intl, London. ISBN-13: 978-0879382087 69. Kenton Z (2016) Stealth aircraft technology. CreateSpace Independent Publishing Platform, Scotts Valley. ISBN-13: 978-1523749263 70. Mistaken identity. Futility Closet. Accessed 29 Apr 2011. http://www.futilitycloset.com/2011/ 04/29/mistaken-identity-2/ 71. Cellania M (2014) Alphonse Bertillon and the identity of criminals. Ment Floss. Accessed 21 Oct 2014. https://www.mentalfloss.com/article/59629/alphonse-bertillon-and-identitycriminals 72. Cole SA (2002) Suspect identities a history of fingerprinting and criminal identification. Harvard University Press, Cambridge. ISBN 9780674010024
Chapter 2
Intelligent Structural Health Monitoring with Ultrasonic Lamb Waves
Mark K. Hinders and Corey A. Miller
Abstract Structural health monitoring is a branch of machine learning where we automatically interpret the output of in situ sensors to assess the structural integrity and remaining useful lifetime of engineered systems. Sensors can often be permanently placed in locations that are inaccessible or dangerous, and thus not appropriate for traditional nondestructive evaluation techniques where a technician both performs the inspection and interprets the output of the measurement. Ultrasonic Lamb waves are attractive because they can interrogate large areas of structures with a relatively small number of sensors, but the resulting waveforms are challenging to interpret even though these guided waves have the property that their propagation velocity depends on remaining wall thickness. Wavelet fingerprints provide a method to interpret these complex, multi-mode signals and track changes in arrival time that correspond to thickness loss due to inevitable corrosion, erosion, etc. Guided waves follow any curvature of plates and shells, and will interact with defects and structural features on both surfaces. We show results on samples from aircraft and naval structures.
Keywords Lamb wave · Wavelet fingerprint · Machine learning · Structural health monitoring
2.1 Introduction to Lamb Waves

Ultrasonic guided waves are naturally suited to structural health monitoring of aerospace and naval structures, since large areas can be inspected with a relatively small number of transducers operating in various pitch-catch and/or pulse-echo scenarios. They are confined to the plate-/shell-/pipe-like structure itself and so follow its shape and curvature, with sensitivity to material discontinuities at either surface or in the interior. In structures, these guided waves are typically referred to as Lamb waves, a terminology we will tend to use here. Lamb waves are multi-modal and dispersive.
Fig. 2.1 Dispersion curves for an aluminum plate. Solutions to the Rayleigh–Lamb wave equations are plotted here for both the symmetric (solid lines) and antisymmetric (dashed lines) modes, for both phase and group velocities
When plotted versus a combined frequency-thickness parameter, as in Fig. 2.1, the phase and group velocities of the symmetric and antisymmetric families of modes are as shown for aluminum plates; other structural materials have similar behavior. With the exception of the zeroth-order modes, all Lamb wave modes have a cutoff frequency-thickness value where their phase and group velocities tend to infinity and zero, respectively, and below which those modes do not propagate.
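Curves like those in Fig. 2.1 come from numerically root-finding the Rayleigh–Lamb characteristic equations. The MATLAB sketch below is a hedged illustration of one way to bracket the symmetric and antisymmetric phase-velocity roots at a single frequency-thickness value; it is not the code behind Fig. 2.1, the aluminum bulk wave speeds cL and cT are assumed values, and sign changes caused by the poles of tan() still need to be rejected before refining the roots.

cL = 6.35; cT = 3.13;              % assumed bulk wave speeds for aluminum, mm/us
fd = 2.0;                          % frequency-thickness product, MHz*mm
w  = 2*pi*fd;                      % take d = 1 mm, so f = fd and half-thickness h = 0.5
h  = 0.5;
cp = linspace(1.0, 12, 4000);      % trial phase velocities, mm/us
k  = w ./ cp;                      % wavenumber for each trial velocity
p  = sqrt(w^2/cL^2 - k.^2 + 0i);   % complex sqrt keeps evanescent cases valid
q  = sqrt(w^2/cT^2 - k.^2 + 0i);
Fs = real( tan(q*h)./q + 4*k.^2.*p.*tan(p*h)./(q.^2 - k.^2).^2 );   % symmetric modes
Fa = real( q.*tan(q*h) + (q.^2 - k.^2).^2.*tan(p*h)./(4*k.^2.*p) ); % antisymmetric modes
is = find(Fs(1:end-1).*Fs(2:end) < 0);   % sign changes bracket candidate roots
ia = find(Fa(1:end-1).*Fa(2:end) < 0);
roots_sym = cp(is); roots_anti = cp(ia); % refine by bisection; discard tan() pole crossings

Sweeping fd and repeating the root search traces out the full family of curves; group velocities then follow from the slope of omega versus k along each branch.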
Fig. 2.2 Lamb wave tomography allows the reconstruction of thickness maps in false-color images because slowness is related to plate thickness via the dispersion curves. Each reconstruction requires many criss-cross Lamb wave measurements with arrival times of the modes of interest automatically extracted from the recorded waveforms
Fig. 2.3 A typical Lamb waveform with modes of interest indicated
The characteristic change of group velocity with thickness is what makes Lamb waves so useful for detecting flaws, such as corrosion and disbonds, that represent an effective change in thickness. We exploited these properties years ago in making Lamb wave tomographic reconstructions, shown as false-color thickness maps in Fig. 2.2 for a partially delaminated doubler (left), a dished-out circular thinning (middle), and a circular flat-bottom hole (right). The key technical challenge to using Lamb waves effectively turns out to be automatically identifying which modes are which in very complex waveform signals. A typical signal for an aluminum plate is shown in Fig. 2.3, with modes marked according to the group velocities from the dispersion curve. In sharp contradistinction to traditional bulk-wave ultrasonics, it's pretty clear that peak-detection sorts of approaches will fail rather badly. Although the dispersion curves tell us that the S2 mode is fastest, for this particular experimental case that mode is not generated with much energy and happens to have an amplitude more or less in the noise. The A0, S0, and A1 modes have higher amplitude, but those three modes all have about the same group velocity, so they arrive jumbled together in time.
Fig. 2.4 Lamb wave scattering from flaws is inherently a three-dimensional process, as can be seen from the screenshots of simulations of Lamb waves interacting with an attached thickness (left) and a rectangular thinning (right)
Angle blocks, comb transducers, etc. can be employed to select purer modes, or perhaps a wiser choice of frequency-thickness product can give easier-to-interpret signals. For example, fd = 4 MHz-mm could give a single fast S1 mode with all other modes much slower. Of course, most researchers simply choose a value below fd = 2 MHz-mm, where all but the two fundamental modes are cut off. Some even go much lower in frequency, where the A0 mode is nearly cut off and the S0 mode is not very dispersive, although that tends to sacrifice the most useful aspect of Lamb waves for flaw detection, which is that the different modes each have different through-thickness displacement profiles. Optimal detection of a given flaw type depends on choosing modes with displacement profiles that will interact strongly with the flaw, i.e., scatter from it. Moreover, this scattering interaction will cause mode mixing to occur, which can be exploited to better identify, locate, and size flaws. Lamb wave scattering from flaws is inherently a three-dimensional process, as can be seen from the screenshots of simulations of Lamb waves interacting with an attached thickness (left) and a rectangular thinning (right) in Fig. 2.4.
2.2 Background

There is a large literature on the use of Lamb waves for nondestructive evaluation and structural health monitoring. Mathematically, they were first described by Lamb in 1917 [1], with experimental confirmation at ultrasonic frequencies published by Worlton in 1961 [2]. Viktorov's classic book [3] still provides one of the best practical descriptions of the use of Lamb waves, although the Dover edition of Graff's book [4] is more widely available and gives an excellent discussion of the mathematical necessities. Rose's more recent text [5] is also quite popular with Lamb wave researchers. Overviews of this research area can be found in several recent review articles [6–11] and books such as [12, 13]. There are two distinct guided wave propagation approaches available for structural health monitoring. The first is characterized by trying to simplify the signal interpretation by selectively generating a single pure guided wave mode, either by using more complex transducers, e.g., angle blocks or interdigitated comb transducers, or by selecting a low-enough frequency-thickness that only the first two guided wave modes propagate. The second approach aims to minimize the size and complexity of the transducer itself but to then choose a particular range of frequency-thickness values where the multi-mode guided wave signals are still manageable. The first approach also typically uses a tone-burst excitation in order to narrow the frequency content of the signals, whereas the second approach allows either a spike excitation or a walking tone-burst scheme with a broader frequency content. Which is the appropriate choice depends on the specific SHM application, although it's important to point out that rapid advances in signal processing hardware and algorithms mean that the second approach is the clear winner over the long term. The "pure mode" approach is inherently limited by trying to peak detect as is done in traditional ultrasound. Also, for scattering flaws such as cracks it's critical both to have some bandwidth to the signals, so that the frequency dependence of scattering can be exploited to size flaws, and to be able to interpret multi-mode signals, because guided wave mode conversion is the most promising approach to identify flaws without the need to resort to baseline subtraction. Many structural health monitoring approaches still use baseline subtraction [14–21], which requires one or more reference signals recorded from the structure in an unflawed state. Pure modes [22–32] and/or dispersion and temperature correction [33–39] are also often employed to simplify the signal interpretation, although mode conversion during scattering at a crack often enables baseline-free approaches [40–55], as does the time-reversal technique [56–64].
2.3 Simulation Methods for SHM

In optimizing guided wave SHM approaches and defining signal processing strategies, it's important to be able to simulate the propagation of guided waves in structures and to be able to visualize how they interact with flaws. Most researchers tend to use one of the several commercially available finite element method (FEM) packages for this, or the semi-analytic finite element (SAFE) technique [65–80]. Not very many researchers are doing full 3D simulations, since this typically requires computer clusters rather than desktop workstations in order to grid the full volume of the simulation space at high enough spatial resolution to accurately capture multi-mode wave behavior as well as the flaw geometry. This is an issue because guided wave interaction with flaws is inherently a 3D process, and two-dimensional analyses can give quite misleading information. The primary lure of FEM packages is their ability to model complex structures, although this is less of an advantage when simulating guided wave propagation.
A more attractive approach turns out to be variations of the finite-difference time-domain (FDTD) method [81, 82], where the elastodynamic field equations and boundary conditions are discretized directly and the simulation is stepped forward in time, recording the stress components and/or displacements across the 3D Cartesian grid at each time step. In 1995, Fellinger et al. developed the basic equations of the elastodynamic finite integration technique (EFIT), along with a unique way to discretize the material parameters to ensure continuity of stress and displacement across the staggered grid [83]. Schubert et al. demonstrated the flexibility of EFIT with discretizations in Cartesian, cylindrical, and spherical coordinates for a wide range of modeling applications [84–86]. Although commercial codes aren't mature, these approaches are relatively straightforward to implement on either multi-core workstations or large computer clusters in order to have sufficient processing power and memory to perform high-resolution 3D simulations of realistic structures. In our experience [87–92], the finite integration technique tends to be quite a bit faster than the finite element method because the computationally intensive meshing step is eliminated by using a simple uniform Cartesian grid. For pipe inspection, a different, but only slightly more complex, cylindrical discretization of the field equations and boundary conditions is used, which optimizes cylindrical EFIT for simulating guided wave propagation in pipe-like structures. In addition to simple plate-like or pipe-like structures, EFIT can also be used to simulate wave propagation in complex built-up structures. Various material combinations, interface types, etc. can be simulated directly, with cracks and delaminations introduced by merely adjusting the boundary conditions at the appropriate grid points. Simulation methods are also critical to optimizing guided wave structural health monitoring because the traditional approaches for modeling scattering from canonical flaws [93–95] fail for guided waves or are limited to artificial 2D situations. For low-enough frequency-thickness, the Mindlin plate theory [96] allows for analytical approaches to Lamb wave scattering from simple through-thickness holes and such, and can even account for some amount of mode conversion, but at the cost of assuming a simplistic through-thickness displacement profile. Most researchers have typically used boundary element method (BEM) and related integral equation approaches to simulate 2D scattering, but these are in some sense inherently low-frequency methods. The sorts of high-frequency approaches that served the radar community so well until 3D computer simulations became viable aren't appropriate for guided wave SHM, although there was a significant effort to begin to derive a library of diffraction coefficients three decades ago. With currently available computers, almost all researchers are now using FEM or EFIT to isolate the details of guided wave interaction with flaws. There are also a variety of experimental studies reported recently [40–110] examining Lamb wave scattering from flaws, including Lamb wave tomography [104–113], which we worked on quite a bit a number of years ago [114–125].
2.4 Signal Processing for Lamb Wave SHM

Advanced signal processing methods are necessary for guided wave SHM both because the signals are quite complex and because identifying and quantifying small flaws while covering large structures with a minimum number of sensors means propagation distances are going to be large and the "fingerprint" of the flaw scattering will usually be quite subtle. There is also the confounding issue that environmental changes, especially temperature, will affect the guided wave signals. This is a particular problem for baseline subtraction methods, which assume that one or more baseline signals have been recorded with the structure in an unflawed state so that some sort of signal difference metric can be employed to indicate the presence of damage. With baseline subtraction approaches, there is also the issue of natural fluctuations (noise) in the Lamb waveforms themselves, which usually means that some sort of simplified representation of the signal, such as the envelope, is rendered before subtracting off the baseline. The danger of this is that the subtle flaw fingerprints may be further suppressed by simplifying the signals. Other ways to compare signals in the time domain are cross-correlation sorts of approaches, with or without stretching or multiple baselines to account for temperature variations, etc. Because scattering from flaws is frequency dependent, and because different Lamb wave modes have differing through-thickness displacement profiles and frequency-dispersion properties, the most promising signal processing approaches for Lamb wave SHM include joint time–frequency and time–scale methods. Echolocation [126] is actually quite similar to structural health monitoring with guided waves in that the time delay is used to locate flaws, while the character of the scattered echoes is what allows us to identify and quantify the flaws. A 2D color image time–frequency representation (TFR) typically has time delay on the horizontal axis and frequency on the vertical axis. The simplest way to form a spectrogram is via a boxcar FFT, where an FFT is performed inside of a sliding window to give the spectrum at a sequence of time delays. The boxcar FFT is almost never the optimal TFR, however, since it suffers rather badly from an uncertainty effect: making the time window shorter to better localize the frequency content in time means that there often aren't enough sample points to accurately form the FFT, while lengthening the window to get a more accurate spectrum makes the time localization imprecise. Alternative TFRs have been developed to overcome many of the deficiencies of the traditional spectrogram [127]. However, since guided wave SHM signals are typically composed of one or more relatively short wave pulses, albeit often overlapping, it is natural to explore TFRs that use basis functions with compact support. Wavelets [128] are very useful for analyzing time-series data because the wavelet transform allows us to keep track of both time and frequency, or scale, features. Whereas Fourier transforms break down a signal into a series of sines and cosines in order to identify the frequency content of the entire signal, wavelet transforms keep track of local frequency features in the time domain.
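To make the boxcar FFT concrete, here is a minimal MATLAB sketch of a spectrogram for a waveform x sampled at rate fs; the signal, sampling rate, window length, and hop size are all assumed placeholders, and a practical implementation would normally also apply a window taper.

nw  = 128;                        % samples per boxcar window (illustrative)
hop = 16;                         % window step in samples
nT  = floor((length(x) - nw)/hop) + 1;
S   = zeros(nw/2 + 1, nT);        % one-sided spectra versus time delay
for m = 1:nT
    seg = x((m-1)*hop + (1:nw));  % slide the boxcar along the waveform
    X   = fft(seg(:));
    S(:,m) = abs(X(1:nw/2 + 1));  % keep the one-sided magnitude spectrum
end
t = ((0:nT-1)*hop + nw/2)/fs;     % window-center time delays
f = (0:nw/2)*fs/nw;               % frequency axis
imagesc(t, f, 20*log10(S + eps)); axis xy;   % render the TFR image in dB

Shrinking nw sharpens the time localization but coarsens the frequency bins, which is exactly the uncertainty trade-off described above.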
Ultrasonic signal analysis with wavelet transforms was first studied by Abbate in 1994, who found that if the mother wavelet was well defined there was good peak detection even with large amounts of added white noise [129]. Massicotte, Goyette, and Bose then found that even noisy EMAT sensor signals were resolvable using the multi-scale method of the wavelet transform [130]. One of the strengths compared to the fast Fourier transform was that, since the extraction algorithm did not need to include the inverse transform, the arrival time could be taken directly from the time–frequency domain of the wavelet transform. In 2002, Perov et al. considered the basic principles of the formulation of the wavelet transform for the purpose of an ultrasonic flaw detector and concluded that any of the known systems of orthogonal wavelets are suitable for this purpose as long as the number of levels does not drop below 4–5 [131]. In 2003, Lou and Hu found that the wavelet transform was useful in suppressing non-stationary wideband noise from speech [132]. In a comparison study between the Wigner–Ville distribution and the wavelet transform performed by Zou and Chen, the wavelet transform outperformed the Wigner–Ville in terms of sensitivity to the change in stiffness of a cracked rotor [133]. In 2002, Hou and Hinders developed a multi-mode arrival time extraction tool that rendered the time-series data in 2D time-scale binary images [134]. Since then this technique has been applied to multi-mode extraction of Lamb wave signals for tomographic reconstruction [121, 123], time-domain reflectometry signals for wiring flaw detection [135], acoustic microscopy [136], and a periodontal probing device [137]. Wavelets remain under active study worldwide for the analysis of a wide variety of SHM signals [138–153], as do various other time–frequency representations [154–159]. The Hilbert–Huang transform (HHT) [160–164] along with chirplets, Golay codes, fuzzy complex numbers, and related approaches are also gaining popularity [165, 166]. Our preferred method, which we call dynamic wavelet fingerprints, is discussed in some detail below. Wavelets are often ideally suited to analyzing non-stationary signals, especially since there are a wide variety of mother wavelets that can be evaluated to find those that most parsimoniously represent a given class of signals. The wavelet transform coefficients can be rendered in an image similar to the spectrogram, except that the vertical axis will now be "wavelet scale" instead of frequency. The horizontal axis will still be time delay because the "wavelet shift" corresponds to that directly. Nevertheless, these somewhat abstract time-scale images can be quite helpful for identifying subtle signal features that may not be resolvable via other TFR methods.
2.5 Wavelet Transforms

Wavelets are ideally suited for analyzing non-stationary signals. They were originally developed to introduce a local formulation of time–frequency analysis techniques. The continuous wavelet transform (CWT) of a square-integrable, continuous function s(t) can be written as

C(a, b) = ∫_{−∞}^{+∞} ψ*_{a,b}(t) s(t) dt,   (2.1)
where ψ(t) is the mother wavelet, * denotes the complex conjugate, and ψ_{a,b}(t) is given by

ψ_{a,b}(t) = |a|^{−p} ψ((t − b)/a).   (2.2)

Here, the constants a, b ∈ R, where a is a scaling parameter defined by p ≥ 0, and b is a translation parameter related to the time localization of ψ. The choice of p depends only upon which source in the literature is being referred to, much like the different conventions for the Fourier transform, so we choose to implement the most common value of p = 1/2. The mother wavelet can be any square-integrable function of finite energy and is often chosen based on its similarity to the inherent structure of the signal being analyzed. The scale parameter a can be considered to relate to different frequency components of the signal. For example, small values of a result in a compressed mother wavelet, which will then highlight many of the high-detail characteristics of the signal related to the signal's high-frequency components. Similarly, large values of a result in stretched mother wavelets, returning larger approximations of the signal related to the underlying low-frequency components. To better understand the behavior of the CWT, it can be rewritten as an inverse Fourier transform,

C(a, b) = (1/2π) ∫_{−∞}^{+∞} ŝ(ω) √a ψ̂*(aω) e^{jωb} dω,   (2.3)
where ŝ(ω) and ψ̂(ω) are the Fourier transforms of the signal and wavelet, respectively. From Eq. (2.3), it follows that stretching a wavelet in time causes its support in the frequency domain to shrink as well as shift its center frequency toward a lower frequency. This concept is illustrated in Fig. 2.5. Applying the CWT with only a single mother wavelet can therefore be thought of as applying a bandpass filter, while a series of mother wavelets via changes in scale can be thought of as a bandpass filter bank. An infinite number of wavelets are therefore needed for the CWT to fully represent the frequency spectrum of a signal s(t), since every time the value of the scaling parameter a is doubled, the bandwidth coverage is reduced by a factor of 2. An efficient and accurate discretization of this involves selecting dyadic scales and positions based on powers of two, resulting in the discrete wavelet transform (DWT). In practice, the DWT requires an additional scaling function to act as a low-pass filter to allow for frequency spectrum coverage from ω = 0 up to the bandpass filter range of the chosen wavelet scale. Together, scaling functions and wavelet functions provide full-spectrum coverage for a signal. For each scaled version of the mother wavelet ψ(t), a corresponding scaling function φ(t) exists. Just as Fourier analysis can be thought of as the decomposition of a signal into various sine and cosine components, wavelet analysis can be thought of as a decomposition into approximations and details. These are generated through an implementation of the wavelet and scaling function filter banks. Approximations are the high-scale
Fig. 2.5 Frequency-domain representation of a hypothetical wavelet at scale parameter values of a = 1, 2, 4. It can be seen that increasing the value of a leads to both a reduced frequency support and a shift in the center frequency component of the wavelet toward lower frequencies. In this sense, the CWT acts as a shifting bandpass filter of the input signal
(low-frequency) components of the signal revealed by the low-pass scaling function filters, while details are the low-scale (high-frequency) components revealed by the high-pass wavelet function filter. This decomposition process is iterative, with the output approximations for each level used as the input signal for the following level, illustrated in Fig. 2.6. In general, most of the information in a time-domain signal is contained in the approximations of the first few levels of the wavelet transform. The details of these low levels often have mostly high-frequency noise information. If we remove the details of these first few levels and then reconstruct the signal with the inverse wavelet transform, we will have effectively de-noised the signal, keeping only the information of interest.
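The de-noising just described takes only a few lines with the MATLAB Wavelet Toolbox; a minimal sketch, assuming a raw signal x and illustrative choices of mother wavelet, decomposition depth, and levels to discard:

[c, l] = wavedec(x, 5, 'db3');    % 5-level discrete wavelet decomposition of x
c = wthcoef('d', c, l, [1 2]);    % zero out the level-1 and level-2 detail coefficients
xden = waverec(c, l, 'db3');      % inverse transform returns the de-noised signal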
2.5.1 Wavelet Fingerprinting

Once a raw signal has been filtered, we then pass it through the DWFP algorithm. Originally developed by Hou [134], the DWFP applies a wavelet transform on the original time-domain data, resulting in an image containing "loop" features that resemble fingerprints. The wavelet transform coefficients can be rendered in an image similar to a spectrogram, except that the vertical axis will be scale instead of frequency. These time-scale image representations can be quite helpful for identifying subtle signal features that may not be resolvable via other time–frequency methods. Combining Eqs. 2.1 and 2.2, the CWT of a continuous square-integrable function s(t) can be written as

C(a, b) = (1/√a) ∫_{−∞}^{+∞} s(t) ψ*((t − b)/a) dt.   (2.4)
Fig. 2.6 The signal is decomposed into approximations (A1) and details (D1) at the first level. The next iteration then decomposes the first-level approximation coefficients into second-level approximations and details, and this process is repeated for the desired number of levels. For wavelet filtering, the first few levels of details can be removed, effectively applying a low-pass filter to the signal
Unlike the DWT, where scale and translation parameters are chosen according to the dyadic scale (a = 2^m, b = n2^m; n, m ∈ Z²), the MATLAB implementation of the CWT used here utilizes a range of real numbers for these coefficients. A normal range of scales includes a = 1, …, 50 and b = 1, …, N for a signal of length N. This results in a two-dimensional array of coefficients, C(a, b), which are normalized to the range [−1, 1]. These coefficients are then sliced in a "thick" contour manner, where the number of slices and the thickness of each slice are defined by the user. To increase efficiency, the peaks (C(a, b) ≥ 0) and valleys (C(a, b) < 0) are considered separately. Each slice is then projected onto the time-scale plane. The resulting slice projections are labeled in an alternating, binary manner, resulting in a binary "fingerprint" image, I(a, b):

s(t) −−DWFP(ψ_{a,b})−→ I(a, b).   (2.5)
The values of slice thickness and number of slices can be varied to alter the appearance of the wavelet coefficients, as can changing which mother wavelet is used. The process of selecting mother wavelets for consideration is application-specific, since certain choices of ψ(t) will be more sensitive to certain types of signal features. In practice, the mother wavelets used are often chosen based on preliminary analysis results as well as experience. In general, most of the information in a signal is contained in the approximations of the first few levels of the wavelet transform. The details of these low levels often contain mostly high-frequency noise information. If we set the details of these first few levels to zero, when we reconstruct the signal with the inverse wavelet transform we have effectively de-noised our signal to keep only information of the Lamb wave modes of interest. In our work, we start with the filtered ultrasonic signal and take a continuous wavelet transform (CWT). The CWT gives a surface of wavelet coefficients, and this surface is then normalized to [0, 1]. Then, we perform a thick contour slice operation where the user defines the number of slices to use: the more slices, the thinner each contour slice. The contour slices are given the value of 0 or 1 in alternating fashion. They are then projected down to a 2D image where the result often looks remarkably like the ridges of a human fingerprint, hence the name "wavelet fingerprints." Note that we perform a wavelet transform as usual, but then instead of rendering a color image we form a particular type of binary contour plot. This is illustrated in Fig. 2.7. The wavelet fingerprint has time on the horizontal axis and wavelet scales on the vertical axis. We've deliberately shown this at a "resolution" where the pixelated nature of the wavelet fingerprint is obvious. This is important because each of the pixels is either black or white: it is a binary image. The problem has thus been transformed from a one-dimensional signal identification problem to a 2D image recognition scenario. The power of the dynamic wavelet fingerprint (DWFP) technique is that it converts the time-series data into a binary matrix that is easily stored and transferred, and is amenable to edge computing implementations. There is also robustness to the simple algorithm (Fig. 2.8), since different mother wavelets emphasize different features in the signals. The last piece of the DWFP technique is recognition of the binary image features that correspond to the waveform features of interest. We have found that different modes are represented by unique features in our applications. We've also found that using a simple ridge-counting algorithm on the 2D images is often a helpful way to identify some of the features of interest. In Fig. 2.9, we show a small portion of a fingerprint image, blown up so the ridge counting at each time sample can be seen. Figure 2.10 shows longer fingerprints for two waveforms with and without a flaw in the location indicated by the dashed rectangle. In this particular case, the flaw is identified by thresholding the ridge-count metric, as indicated by the bottom panel. Once such a feature has been identified in the time-scale space, we know its arrival in the time domain as well, and we can then draw conclusions about its location based on our knowledge of that guided wave mode velocity.
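A minimal sketch of the ridge-counting idea, assuming a binary fingerprint image I with scales down the rows and time across the columns; the threshold value is an illustrative placeholder, not the one used for Fig. 2.10.

d = diff(double(I), 1, 1);        % scan down each column of the binary image
ridgecount = sum(d == 1, 1);      % 0-to-1 transitions = ridges crossed per time sample
flagged = ridgecount > 4;         % threshold the ridge-count metric to flag features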
Fig. 2.7 A visual summary of the DWFP algorithm [134]. A time-domain signal for which a set of wavelet coefficients is generated via the continuous wavelet transform. The coefficients are then “thickly” sliced and projected onto the time-scale plane, resulting in two-dimensional binary images, shown here with white peaks and gray valleys for distinction
The inherent advantage of traditional TFRs is thus to transform a one-dimensional time-trace signal into a two-dimensional image, which then allows powerful image processing methods to be brought to bear. The false-color images also happen to be visually appealing, but this turns out to be somewhat of a disadvantage when the goal is to automatically identify the features in the image that carry the information about the flaw(s). High-resolution color imagery is computationally expensive to both store and process, and segmentation is always problematic. This latter issue is particularly difficult for SHM because it's going to be something about the shape of the flaw signal(s) in the TFR image that we're searching for automatically via image processing algorithms. A binary image requires much less computer storage than does a greyscale or color image, and segmentation isn't an issue because it's a trivial matter to decide between black and white. These sorts of fingerprint images can be formed from any TFR, of course, although wavelets seem to work quite well for guided wave ultrasonics and the variety of applications that we have investigated. Figure 2.10 shows two cases with and without flaws.
Dynamic Wavelet Fingerprint Algorithm
Nondestructive Evaluation Laboratory – William & Mary Applied Science Department
The code at the bottom is a MATLAB algorithm to create a 2D fingerprint image from a 1D signal. This is the same algorithm used in the Wavelet Fingerprint Tool Box. This algorithm can easily be implemented in any programming language that can perform a continuous wavelet transform. The following table describes the variables passed to the function.

datain       The raw 1D signal from which the wavelet fingerprint is created.
wvt          The name of the mother wavelet. For example: 'coif2', 'sym2', 'mexh', 'db10'.
ns           The number of scales to use in the continuous wavelet transform (start with 50).
numridges    The number of ridges used in the wavelet fingerprint (start with 5).
rthickness   The thickness of the ridges, normalized to 1 (start with 0.12).

The output variable fingerprint contains the wavelet fingerprint image. It is an array (ns by length(rawdata)) of 1's and 0's where the 1's represent the ridgelines. The following is a sample call of this function.

>> fingerprint = getfingerprint( rawdata, 'coif3', 50, 5, 0.12 );
MATLAB Wavelet Fingerprint Algorithm

function [ fingerprint ] = getfingerprint( datain, wvt, ns, numridges, rthickness )
cfX = cwt(datain, 1:ns, wvt);             % get continuous wavelet transform coefficients
cfX = cfX ./ max(max(abs(cfX)));          % normalize coefficients to [-1, 1]
fingerprint(1:ns, 1:length(datain)) = 0;  % set image size and all values to zero
% rlocations holds the center of each slice (ridge)
rlocations = [-1:(1/numridges):-(1/numridges) (1/numridges):(1/numridges):1];
for sl = 1:length(rlocations)             % loop through each slice
    for y = 1:ns                          % loop through the scales of cfX
        for x = 1:length(datain)          % loop through the time samples
            % The test below was truncated in the source; the closing condition is
            % reconstructed so a pixel is marked when the coefficient falls within
            % rthickness/2 of the slice center.
            if (cfX(y,x) >= (rlocations(sl) - (rthickness/2))) && ...
               (cfX(y,x) <= (rlocations(sl) + (rthickness/2)))
                fingerprint(y,x) = 1;     % 1's represent the ridgelines
            end
        end
    end
end

The sensed object with feature vector x is assigned to class ωj when

p(ωj|x) > p(ωk|x),   k ≠ j.   (6.1)
By using Bayes' theorem, Eq. (6.1) can be rewritten as

p(x|ωj) p(ωj) > p(x|ωk) p(ωk),   k ≠ j.   (6.2)
In this way, the sensed object associated with the feature vector x is assigned to the class ω j with the highest likelihood. In practice, there are several feature vectors xi , each with an associated class label wi taking on the value of one of the ω j . Classification generally involves calculating those posterior probabilities p(ωi |x) using some mapping. If N represents the number of objects to be classified and M is the number of features in the feature vector x, then pattern classification will be performed on the features xi that have an associated array of class labels wi that take on values ω1 , ω2 , ω3 for i = 1, . . . , N . The most useful classifier for this dataset was support vector machines (SVM) [27].
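The decision rule of Eq. (6.2) is a one-line computation once likelihoods and priors are in hand; a toy MATLAB sketch with made-up numbers:

prior = [0.6 0.3 0.1];            % p(w1), p(w2), p(w3): assumed class priors
lik   = [0.2 0.5 0.9];            % p(x|wk) for one sensed feature vector x
[~, j] = max(lik .* prior);       % assign x to the class maximizing p(x|wk)p(wk)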
6.3 Method

6.3.1 Apparatus

The experimental apparatus is shown in Fig. 6.3a. In operating the scanner, the transmitting transducer remains fixed while the receiving transducer steps by motor along the circumference of the pipe until it returns to the starting position. The transmitting transducer then indexes by one unit step, and the process repeats. This gives a dataset with helical criss-cross propagation paths, which allows for mode-slowness reconstruction via helical ultrasound tomography. In this study, tomographic scans of a single aluminum pipe were used (Fig. 6.1). The pipe was 4 mm thick with a circumference of 19 inches, and the transducers were placed 12.25 in apart. Tapered delay lines were used between the transducer face and the pipe surface. Two different kinds of flaws were introduced into the pipe: a shallow, interior-surface gouge flaw approximately 3.5 cm in diameter, and a through-hole 3/8 in in diameter. The positions of the flaws introduced to the pipe can be seen in Fig. 6.1, where the geometry of the figure corresponds to the unrolled pipe. Each of the transducers stepped through 180 positions. Multiple scans of the pipe were performed while the frequency of the transducer ranged from 0.8 to 0.89 MHz in steps of 0.01 MHz. For classification, only three of these frequencies were selected: 0.8 MHz, 0.84 MHz, and 0.89 MHz. The pipe was scanned under these conditions, then the hole was enlarged to 1/2 in diameter, and the process was repeated.

Fig. 6.3 a The experimental apparatus, in which motors drive a transmitting and receiving transducer around the perimeter of the pipe, and b an illustration of the ray path wrapping of the "unrolled" pipe. In (b), the shortest ray path for the same transducer positions is actually the one that wraps across the 2D boundary
6.3.2 Ray Path Selection

In order to perform classification on the ray paths involved, we first calculated which transducer and receiver index positions, called i1 and i2, respectively, correspond to a ray path intersecting one of the flaws. To this end, the physical specifications of the pipe are mapped to those of a plate, with the positions of the transmitting and receiving transducers on the left- and right-hand sides. Figure 6.3b demonstrates the unwrapping of the pipe into 2D geometry and shows that some transmitting and receiving transducer positions result in a ray path that wraps around the boundary of the 2D space. The equation for the distance the Lamb wave travels on the surface of the pipe can be derived from the geometry shown in Fig. 6.4 and is given by [16]

s = √(L² + a²) = √(L² + γ²r²),

where L is the axial distance between the transducers, a is the arc length subtended by the axial and actual distance between the transducers, and r is the radius of the pipe. The variable γ is the smallest angle between the transducers,

γ = min{(φ1 − φ2 + 2π), |φ1 − φ2|, (φ1 − φ2 − 2π)},

where φ1 and φ2 are the respective angles of the transducers. The transmitting and receiving transducers have indices represented by i1 and i2, respectively, so that i1, i2 = 1, …, 180. Then if L is the distance between the transducers and Y is the circumference of the pipe, both measured in centimeters, the abscissa positions of the transducers are x1 = 0 and x2 = L, and the radius of the pipe is given by r = Y/(2π). The indices can be converted to angles using the fact that there are 180 transducer positions in one full rotation around the pipe. This gives
Fig. 6.4 The derivation of the equation for the distance a ray path travels on the surface of a pipe is shown, where L is the axial distance between the transducers, s is the distance traveled by the ray path, and a is the arc length subtended by the axial and actual distance between the transducers
φ1 = i1 (2π/180),

and similarly for φ2. Substituting these into the expression for γ, the minimum angle is

γ = (2π/180) min{(i1 − i2 + 180), |i1 − i2|, (i1 − i2 − 180)}   (6.3)

and the axial distance between the transducers is already given as L. Substituting Eq. (6.3) into the distance equation then yields the helical ray path distance between the transducers. The positions of the flaws were added to the simulation space, and ray paths were drawn between the transducers using coordinates (x1, y1) and (x2, y2) that depend on i1 and i2, as described above. If a ray path intersected any of the flaws, that ray path was recorded as having a class label corresponding to that type of flaw. The labels included no flaw encountered (ωi = 1), gouge flaw (ωi = 2), and hole flaw (ωi = 3). For the identification of class labels, the flaws were approximated as roughly octagonal in shape, and the ray paths were approximated as lines with a width determined by the smallest pixel size, which is 0.1 · (Y/180) = 0.2681 mm. In reality, Lamb waves have some horizontal spread. These aspects of the ray path simulation may result in some mislabeled ray paths.
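A minimal MATLAB sketch of this geometry, using the pipe dimensions given above and interpreting the min in Eq. (6.3) as the smallest wrapped angular separation; the index values are arbitrary examples.

Y  = 19 * 2.54;                   % pipe circumference, cm
L  = 12.25 * 2.54;                % axial transducer separation, cm
r  = Y / (2*pi);                  % pipe radius, cm
i1 = 45; i2 = 135;                % transducer indices, 1..180 (example values)
d  = abs(i1 - i2) * (2*pi/180);   % angular separation before wrapping
gamma = min(d, 2*pi - d);         % smallest wrap angle, per Eq. (6.3)
s = sqrt(L^2 + (gamma*r)^2);      % helical ray path distance, cm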
6.4 Classification

As already mentioned, classification will be performed on ray paths with a limitation on the distance between the transducers. In preliminary tests, it was noted that when all the distances were considered, many of the ray paths that actually intersected the hole flaw were labeled as intersecting no flaws. The explanation is that Lamb waves tend to scatter from hole flaws [28], which means no features will indicate a reduction in the thickness of the plate, but the distance traveled will be longer. Therefore, limiting the classification to ray paths of a certain distance improves the classification by reducing the influence of scattering effects.
6.4.1 Feature Extraction

Let D represent the distance limit. The next step of classification is to find M-many features for feature vectors xi such that si ≤ D, i = 1, …, N. Feature extraction used dynamic wavelet fingerprinting (DWFP), while feature selection involved either selecting points at the relevant mode arrival times for a tomographic scan using a single transducer frequency, or selecting points at only one or two mode arrival times for three different transducer frequencies at once.
6.4.2 DWFP

The DWFP technique (Fig. 6.5) applies a wavelet transform on the original time-domain waveform, which results in "loop" features that resemble fingerprints. It has previously shown promise for a variety of applications including an ultrasonographic periodontal probe [29–32], detection of ultrasonic echoes in thin multilayered structures [33], and structural monitoring with Lamb waves [17, 34–36]. The Lamb wave tomographic waveforms were fingerprinted using the DWFP algorithm without any preprocessing or filtering. Let φi(t) represent a waveform selected from a Lamb wave tomographic scan (i = 1, …, N). The first step of the DWFP (Fig. 6.5a, b) involves applying a wavelet transform to each of the waveforms. The continuous wavelet transform can be written as

C(a, b) = ∫_{−∞}^{+∞} φ(t) ψ_{a,b}(t) dt.   (6.4)
Here, φ(t) represents a square-integrable 1D function, where we are assuming φ(t) = φi(t), and ψ(t) represents the mother wavelet. The mother wavelet is translated in time (t) and scaled in frequency (f) using b, a ∈ R, respectively, where a relates to f and b to t, in order to form the ψ_{a,b}(t) in Eq. (6.4). The wavelet transform on a single waveform (Fig. 6.5a) results in wavelet coefficients (Fig. 6.5b). Then, a slicing algorithm is applied to create an image analogous to the gradient of the wavelet coefficients in the time-scale plane, resulting in a binary image, I(a, b). The mother wavelets selected were those that previously showed promise for similar applications, including Daubechies 3 (db3) and Symlet 5 (sym5). The resulting image I contains fingerprint-like binary contours of the initial waveform φi(t). The next step is to apply image processing routines to collect properties from each fingerprint object in each waveform. First, the binary image I is labeled with its 8-connected objects, allowing each individual fingerprint in I to be recognized as a separate object using the procedure in Haralick and Shapiro [37]. Next, properties
Fig. 6.5 The DWFP technique begins with (a) the ultrasonic signal, where it generates (b) wavelet coefficients indexed by time and scale, where scale is related to frequency. Then the coefficients are sliced and projected onto the time-scale plane in an operation similar to a gradient, resulting in (c) a binary image that is used to select features for the pattern classification algorithm
are measured from each fingerprint. Some of these properties include counting the on- and off-pixels in the region, but many involve finding an ellipse matching the second moments of the fingerprint and measuring properties of that ellipse, such as eccentricity. In addition to the orientation measure provided by the ellipse, another measurement of inclination relative to the horizontal axis was determined by Horn's method for a continuous 2D object [38]. Lastly, further properties were measured by determining the boundary of the fingerprint and fitting 2nd- or 4th-order polynomials. The image processing routines result in fingerprint properties F_{i,ν}[t] relative to the original waveform φi(t), where ν represents an index of the image-processing-extracted fingerprint properties (ν = 1, …, 17). These properties are discrete in time because the values of the properties are matched to the time value of the fingerprint's center of mass. Linear interpolation yields a smoothed array of property values, F_{i,ν}(t).
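Property extraction of this kind maps naturally onto standard MATLAB Image Processing Toolbox calls; a hedged sketch in which the property list is an illustrative subset of the 17 used, and the centroid times are assumed distinct so the interpolation succeeds.

lbl   = bwlabel(I, 8);                        % label 8-connected fingerprint objects
props = regionprops(lbl, 'Area', 'Centroid', 'Eccentricity', 'Orientation');
tcm   = arrayfun(@(p) p.Centroid(1), props);  % time coordinate of each center of mass
ecc   = [props.Eccentricity];                 % one property value per fingerprint
[tcm, ord] = sort(tcm);                       % interp1 needs monotonic sample points
Fi = interp1(tcm, ecc(ord), 1:size(I,2), 'linear');  % smoothed property array F(t)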
6.4.3 Feature Selection

The DWFP algorithm results in a 2D array of fingerprint features for each waveform, while only a 1D array of features can be used for classification. For each waveform φi(t), the feature selection procedure finds M-many features xi,j, j ≤ M, from the 2D array of wavelet fingerprint features F_{i,ν}(t). In this case, the DWFP features selected were wavelet fingerprint features that occurred at the predicted mode arrival times for all fingerprint features under study. At the frequency-thickness product used, there are four modes available: S0, S1, A0, and A1. However, as Fig. 6.6 shows, the S1 arrival time often occurs early in the signal, and a DWFP fingerprint feature may not always be available there.

Fig. 6.6 A sample waveform for a single pair of transducer positions (i1, i2) = (45, 135) is shown here along with the predicted Lamb wave mode arrival times

In addition, because several different transducer frequencies were studied, two different feature selection schemes were used in order to keep the number of features manageable.

1. All modes: All four mode arrival times for all 17 fingerprint properties from both mother wavelets are used, but only one transducer frequency is studied from the range {0.8, 0.84, 0.89} MHz. There are M = 136 features selected.
2. All frequencies: One or two mode arrival times for all 17 fingerprint properties from both mother wavelets are used for all three frequencies at once. The modes used include S0, A0, A1, in which case there are M = 102 features used. There were also combinations of S0 & A0, S0 & A1, and A0 & A1 mode arrival times used for all properties, frequencies, and mother wavelets, in which case there were M = 204 features selected.

The class imbalance problem must be considered for this dataset [39]. The natural class distribution for the full tomographic scan has a great majority of ray paths intersecting no flaws: only 3% of the data intersects the hole flaw, and 10.5% intersects the gouge flaw. These are poor statistics from which to build a classifier. Instead, ray paths were randomly selected from the no-flaw cases to be included for classification so that |ω1|/|ω2| = 2. In the resulting class distribution used for classification, 9% of the ray paths intersect the hole flaw and 30% intersect the gouge flaw, so that the number of ray paths used for classification is reduced from N = 32,400 to N = 11,274 for all ray path distances. One advantage of limiting the ω1 cases is that classification can proceed more rapidly. Randomly selecting the ω1 cases to be used does not adversely affect the results, and later, the ω1 ray paths not chosen for classification will be used to test the pipe flaw detection algorithm.
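A hedged sketch of this rebalancing step, assuming a label vector w (1 = no flaw, 2 = gouge, 3 = hole) and a feature matrix X with one row per ray path; the variable names are placeholders.

rng(0);                                       % make the subsampling reproducible
i1 = find(w == 1); i2 = find(w == 2); i3 = find(w == 3);
keep = [randsample(i1, 2*numel(i2)); i2; i3]; % down-sample only the no-flaw class
X = X(keep, :); w = w(keep);                  % reduced set so |w1|/|w2| = 2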
6.4.4 Summary of Classification Variables

The list below provides a summary of the variables involved in the classification design.

1. The same pipe was scanned twice, once when the hole was 3/8 in diameter, and again when the hole was enlarged to 1/2 in diameter.
2. The pipe had two different flaws, a gouge flaw (ω2) and a hole flaw (ω3). The ray paths that intersected no flaw (ω1) were also noted.
3. A range of transducer frequencies was used, spanning 0.8–0.89 MHz. For classification, only three of these frequencies were selected: 0.8 MHz, 0.84 MHz, and 0.89 MHz.
4. Classification was restricted to ray paths that had a maximum path length D such that s ≤ D. There were 91 different path lengths for all transducer positions. Three different values of D were selected to limit the ray paths selected for classification. These correspond to the 10th, 20th, and 91st path length distances, or D10 = 31.21 cm, D20 = 31.53 cm, and D91 = 39.38 cm. The latter case considers all the ray path distances.
5. For the feature extraction, two different wavelets were used (db3 and sym5), and 17 different wavelet fingerprint properties were extracted.
6. Two different feature selection schemes were attempted, varying either the modes selected or the frequencies used.
7. One classifier was selected here (SVM). Other classifiers (such as quadratic discriminant) had lower accuracy.
8. Classification will be performed on training and testing datasets drawn from each individual tomographic scan using the reduced class distribution. The resulting flaw detection algorithm will be tested with the ω1 ray paths that were excluded from the distribution used for classification.
9. The classification tests on the 1/2 in hole tomographic scan will use the tomographic scan of the 3/8 in hole solely for training the classifier and the 1/2 in hole ray paths solely for testing the classifier. This does not substantially alter the expression of the classification routine in Table 6.1.
6.4.5 Sampling

The SVM classifier was applied to the classifier configurations described by the options listed above in Sect. 6.4.4. However, SVM is a binary classifier, and three classes were considered in this study. Therefore, the one-versus-one approach was used, in which pairs of classes are compared at one time for classification, and the remaining class is ignored [40]. The process is repeated until all permutations of the available classes are considered. In this case, classification compared ω1 versus ω2, ω1 versus ω3, and ω2 versus ω3. For each pair of binary classes, the training and testing sets were split via bagging: roughly twice as many samples were randomly selected from the more highly populated class as from the less populated class, and those sets were then split in half for training and testing the SVM classifier. The process is repeated until each ray path has been selected several times for training. The results are collapsed by majority rule, normalized by the number of samples drawn in the bagging process. Table 6.1 displays pseudocode representing the sampling algorithm that splits the data into training and testing sets and the means by which the majority vote is decided.
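A hedged MATLAB sketch of the scheme in Table 6.1, using fitcsvm for each binary pair. The repetition count is illustrative, and this version collapses votes by a simple majority rather than normalizing by the number of draws as the pseudocode does.

pairs = [1 2; 1 3; 2 3];                      % one-versus-one class pairings
votes = zeros(numel(w), 3);                   % per-ray-path vote tally
for p = 1:size(pairs, 1)
    for rep = 1:25                            % bagging repetitions (illustrative)
        ia = find(w == pairs(p,1)); ib = find(w == pairs(p,2));
        if numel(ia) < numel(ib), [ia, ib] = deal(ib, ia); end
        sel = [randsample(ia, min(numel(ia), 2*numel(ib))); ib];
        sel = sel(randperm(numel(sel)));      % shuffle before the 50/50 split
        ntr = floor(numel(sel)/2);
        mdl  = fitcsvm(X(sel(1:ntr),:), w(sel(1:ntr)));
        yhat = predict(mdl, X(sel(ntr+1:end),:));
        idx  = sel(ntr+1:end);
        for c = unique(yhat)'                 % credit each predicted class
            votes(idx(yhat == c), c) = votes(idx(yhat == c), c) + 1;
        end
    end
end
[~, lambda] = max(votes, [], 2);              % majority rule gives predicted labels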
6.5 Decision

As Table 6.1 shows, a single configuration of the classifier variables C (described in Sect. 6.4.4) takes an array of feature vectors xi, i = 1, …, N and their corresponding class labels wi ∈ {ω1, ω2, ω3} and produces an array of predicted class labels λi after dimensionality reduction. Each index i corresponds to a single ray path between the two transducer indices. The classifier performance can be evaluated by measuring its accuracy, which is defined as

A(ωk) = |(wi = ωk) & (λi = ωk)| / |(wi = ωk)|,   i = 1, …, N;  k = 1, 2, 3.   (6.5)

Table 6.1 The sampling algorithm used to split the data into training and testing sets for SVM classification is described here. It is similar to bagging
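In code, Eq. (6.5) is a per-class tally; a minimal sketch given true labels w and predicted labels lambda from the voting scheme above:

A = zeros(1, 3);
for k = 1:3
    A(k) = sum(w == k & lambda == k) / sum(w == k);   % per-class accuracy A(wk)
end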
However, a further step is required in order to determine whether or not there are in fact any flaws present in the pipe scanned by Lamb wave tomography. The predicted labels of the ray paths are not sufficient to decide whether or not there are flaws. Therefore, we will use the ray path drawing algorithm described in Sect. 6.3.2 to superimpose the ray paths that receive predicted labels λi in each class ω1, ω2, ω3. If several ray paths intersect on a pixel, their values are added, so the higher the value of the pixel, the more ray path intersections occur at that point. This technique averages out the misclassifications that occur in the predicted class labels. The ray paths for the predicted class labels associated with each flaw type are drawn separately, so that the more ray paths that have been predicted to be associated with a particular flaw intersect in the same region, the more likely a flaw exists at that point.

Fig. 6.7 The ray paths that have received a predicted class label of (a) no flaw, (b) gouge flaw, and (c) hole flaw are drawn here. The classifier configured here used all the available ray path distances (D = D91)

Fig. 6.8 The ray paths that have received a predicted class label of (a) no flaw, (b) gouge flaw, and (c) hole flaw are drawn here. Unlike Fig. 6.7, the ray path length limit used was D20

Figures 6.7 and 6.8 show the ray paths drawn for each predicted class label. Both of these examples were configured for the 3/8 in hole and selected the A0 mode from all three transducer frequencies. However, Fig. 6.7 classified all ray paths regardless of length, and Fig. 6.8 restricted the ray path distances to D20. The ray path intersections were originally drawn at 10× resolution but the images were later smoothed by
averaging over adjacent cells to 1× resolution. The larger the pixel value, the more ray paths intersected in that region, and the more likely a flaw exists at that point. Note that in Fig. 6.8b, c, the pixel values are highest at the locations where their respective flaws actually exist. Figure 6.8a also shows many intersections at those locations: because of the geometry of the pipe and the way the transducers scanned, those regions of the pipe genuinely did have more ray path intersections than elsewhere, which is why those regions were selected for introducing flaws into the pipe. Still, the largest pixel value among the ray paths predicted to intersect no flaw is smaller than for the ray paths predicted to intersect either of the other flaws. Also, Fig. 6.8a shows a higher average pixel value, indicating that its intersections are spread throughout the pipe rather than focused on a small region as in Fig. 6.8b, c. However, due to the scattering of the Lamb waves, Fig. 6.7 is not as precise: Fig. 6.7a does show a higher average pixel value, but Fig. 6.7b, c both seem to indicate hole flaws in the middle of the pipe, and neither indicates the correct location of the gouge flaw.

In order to automate the flaw detection process from these ray intersection plots, image recognition routines and thresholds were applied. The process of automatically detecting flaws in the image of ray path intersections (U) includes:

1. Apply a threshold hI to the pixel values of the image. If no pixels satisfy U > hI, then no flaws are detected. Otherwise, define U′ = U(U > hI).
2. Check that the nonzero elements of U′ cover less than half the total area, ∑_{i1} ∑_{i2} U′(i1, i2) < ½|U|.
3. Apply a threshold ha to the area of U′. If ∑_{i1} ∑_{i2} U′(i1, i2) ≤ ha, then no flaws are detected. Otherwise, decide that U′ accurately represents the flaws in the region and return an image of U′.

This algorithm is only intended to be performed on the ray paths that predicted a flaw location. It does not tend to work well on the ray path intersections that predicted no flaw, since those depend on the geometry of the object being scanned. Figure 6.9 shows predicted flaw locations relative to the sample ray path intersection images given in Figs. 6.7 and 6.8. Specifically, Fig. 6.9a, b gives the predicted flaw locations from Fig. 6.7b, c, while Fig. 6.9c, d gives the predicted flaw locations from Fig. 6.8b, c. Note that the classifier designed to accept all ray path distances (Fig. 6.9a, b) shows a flaw location that is more discrete and closer to the size of the actual flaw, but it is not as accurate at predicting whether or not flaws exist as the classifier designed to accept ray paths restricted by path length for classification (Fig. 6.9c, d).
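A hedged sketch of these three steps for one intersection image U; the threshold values are placeholders, since the text tunes hI and ha separately for each D.

hI = 3; ha = 10;                      % placeholder thresholds, tuned per D in the text
Up = U .* (U > hI);                   % step 1: U' keeps only pixels above hI
tooEmpty = ~any(Up(:));               % step 1 reject: nothing exceeds hI
tooBig   = sum(Up(:)) >= numel(Up)/2; % step 2 reject: flagged region covers too much
tooSmall = sum(Up(:)) <= ha;          % step 3 reject: total intersection weight too low
if tooEmpty || tooBig || tooSmall
    disp('no flaws detected');
else
    imagesc(Up); axis image;          % U' taken to represent the flaw locations
end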
Fig. 6.9 The results of the automatic pipe flaw detector routine on the ray path intersection images in Figs. 6.7 and 6.8 that predicted flaws are shown here. The approximate location of the flaw is shown in the image. Here, subplots (a) and (b) show the predicted flaw locations for the gouge and hole flaws, respectively, when all distances were used in classification. Similarly, subplots (c) and (d) show the predicted flaw locations for the gouge and hole flaws when the ray path distance was limited to D20
6.6 Results and Discussion

6.6.1 Accuracy

The classifier variables described in Sect. 6.4.4 were explored using the classification routine sketched out in Table 6.1, with predicted labels collapsed by majority voting. The accuracy of the classifier will be examined before the predictions of the pipe flaw detector are given. Each table of the accuracy results shows the classifier configuration, including the two different tomographic scans of the aluminum pipe with different hole flaw diameters, as well as the frequencies and modes selected for classification. The classifiers were also limited by the maximum distance between the transducers. For ease of comparison, the average accuracy over each class type ωk is computed in the last column.
Table 6.2 shows the accuracy A(ωk) for both tomographic scans of the pipe when only a single transducer frequency was used at a time, and therefore all the available Lamb wave modes were selected. The table is grouped by the different maximum ray path distances used in the classifier. Meanwhile, Tables 6.3, 6.4 and 6.5 show the accuracy for classifiers in which features were selected for only certain modes but for three different values of the transducer frequency at once. These classification results show that no classifier configuration had more than 80% average accuracy per class. Many classifiers scored lower than 50% average accuracy. However, the detection routine does not require high-accuracy classifiers, and as already mentioned, the diffraction of Lamb waves around the hole flaw could certainly explain the lower accuracy of the classifiers for that flaw type. Other patterns that emerge in the data show that smaller values of the maximum path length distance yield higher average accuracy, by as much as 20%. In addition, the feature selection strategy that utilizes only select modes for all three transducer frequencies seems to work slightly better than the feature selection routine that selects all modes for only one transducer frequency. Lastly, the tomographic scan of the aluminum pipe with the 3/8 in hole seems to have higher accuracy than the scan of the pipe with the 1/2 in hole. These results make sense given that the 3/8 in hole classifier was trained and tested on unique subsets of the ray paths drawn from the 3/8 in hole tomographic scan, which yields somewhat optimistic results. Meanwhile, the results for the 1/2 in hole scan were tested with ray paths drawn from the 1/2 in hole scan but trained with ray paths selected from the 3/8 in hole scan.
6.6.2 Flaw Detection Algorithm

As mentioned in Sect. 6.5, a more promising way of determining whether or not flaws exist in the pipe is to use the ray path simulation routine described in Sect. 6.3.2. Because the result of the pipe flaw detection routine is an image (Fig. 6.9), and an exhaustive depiction of the resulting flaw prediction images would take up an unnecessary amount of space, the results are presented qualitatively. The pipe flaw detection routine was tested on all of the classifier configurations, and a judgment was made about which flaw the routine was predicting. Figure 6.10 shows some of the different ways that the flaw detector routine can present results. Figure 6.10a shows a correctly identified flaw location for the gouge flaw, despite the fact that the flaw detector claims the flaws drawn at the bottom of the plot to be separate. That's because the pipe in reality wraps the 2D plot into a cylinder, so the regions identified as flaws at the bottom of the plot can in fact be associated with the correct location of the gouge flaw at the top of the plot. However, Fig. 6.10b shows an incorrect assignment of an indentation flaw in the center of the plot, where the hole flaw actually resides. Similarly, Fig. 6.10c shows a correct identification of the hole flaw, despite the fact that the two identified flaws in the middle of the plot do not seem connected; this is an artifact of the drawing algorithm.
Table 6.2 Classifier accuracy when the feature selection used only one transducer frequency and all modes. Results are grouped by the maximum ray path distance; the average accuracy per class is shown in the last column

Hole Dia. [in] | Frequencies used (MHz) | Modes used | D (cm) | A(ω1) (%) | A(ω2) (%) | A(ω3) (%) | Average (%)
3/8 | 0.80 | S0, A0, A1 | D10 | 77.7 | 62.4 | 72.4 | 70.8
3/8 | 0.84 | S0, A0, A1 | D10 | 78.5 | 67.3 | 42.9 | 62.9
3/8 | 0.89 | S0, A0, A1 | D10 | 70.8 | 77.6 | 69.5 | 72.6
1/2 | 0.80 | S0, A0, A1 | D10 | 78.5 | 43.2 | 22.8 | 48.2
1/2 | 0.84 | S0, A0, A1 | D10 | 84.0 | 10.9 | 30.9 | 41.9
1/2 | 0.89 | S0, A0, A1 | D10 | 52.3 | 62.4 | 39.8 | 51.5
3/8 | 0.80 | S0, A0, A1 | D20 | 70.5 | 50.6 | 62.8 | 61.3
3/8 | 0.84 | S0, A0, A1 | D20 | 73.9 | 59.9 | 41.9 | 58.5
3/8 | 0.89 | S0, A0, A1 | D20 | 68.3 | 70.0 | 58.1 | 65.5
1/2 | 0.80 | S0, A0, A1 | D20 | 75.8 | 47.7 | 23.3 | 48.9
1/2 | 0.84 | S0, A0, A1 | D20 | 93.0 | 21.2 | 14.6 | 42.9
1/2 | 0.89 | S0, A0, A1 | D20 | 52.6 | 64.9 | 34.4 | 50.6
3/8 | 0.80 | S0, A0, A1 | D91 | 93.8 | 14.1 | 27.4 | 45.1
3/8 | 0.84 | S0, A0, A1 | D91 | 93.8 | 14.6 | 19.8 | 42.7
3/8 | 0.89 | S0, A0, A1 | D91 | 91.7 | 17.0 | 28.2 | 45.6
1/2 | 0.80 | S0, A0, A1 | D91 | 73.7 | 42.7 | 16.1 | 44.1
1/2 | 0.84 | S0, A0, A1 | D91 | 82.9 | 35.7 | 7.6 | 42.0
1/2 | 0.89 | S0, A0, A1 | D91 | 41.9 | 61.1 | 15.4 | 39.5
But Fig. 6.10d shows a falsely identified region at the top of the plot, predicted to be a hole flaw where the gouge flaw actually exists. This is a failure of specificity rather than sensitivity. Tables 6.6, 6.7 and 6.8 show the performance of the pipe flaw detector routine. All of the different frequencies, modes, and hole diameters are displayed in each table, but Table 6.6 shows the results for a maximum ray path length of D = D10 used in the classification, and likewise Tables 6.7 and 6.8 use maximum ray path distances of D20 and D91, respectively. The latter case considers all possible ray path lengths. The results are grouped in this way because the threshold values h_I and h_a had to be adjusted for each value of D. Obviously, the smaller the value of D, the fewer ray paths are included in the classification and the smaller the intersection between the predicted ray paths. The last two columns show the pipe flaw detector routine applied to ray paths with a predicted class of either gouge flaw (ω2) or hole flaw (ω3), respectively. The type of flaw predicted by the detection routine is displayed under those columns, including the possibility of no flaw displayed (ω1) or both the gouge and hole flaws displayed (ω2, ω3). These judgments are made according to the guidelines for false positives and true positives shown in Fig. 6.10.
Table 6.3 Classifier accuracy when the feature selection used only one or two modes at a time but all three transducer frequencies. The maximum ray path distance used here was D10

Hole Dia. [in] | Frequencies used (MHz) | Modes used | D (cm) | A(ω1) (%) | A(ω2) (%) | A(ω3) (%) | Average (%)
3/8 | 0.80, 0.84, 0.89 | S0 | D10 | 71.5 | 65.0 | 61.9 | 66.1
3/8 | 0.80, 0.84, 0.89 | A0 | D10 | 84.2 | 72.6 | 64.8 | 73.8
3/8 | 0.80, 0.84, 0.89 | A1 | D10 | 75.7 | 68.0 | 52.4 | 65.4
3/8 | 0.80, 0.84, 0.89 | S0, A0 | D10 | 87.5 | 79.2 | 73.3 | 80.0
3/8 | 0.80, 0.84, 0.89 | S0, A1 | D10 | 81.0 | 74.9 | 74.3 | 76.7
3/8 | 0.80, 0.84, 0.89 | A0, A1 | D10 | 85.3 | 80.9 | 72.4 | 79.5
1/2 | 0.80, 0.84, 0.89 | S0 | D10 | 82.8 | 34.0 | 16.3 | 44.4
1/2 | 0.80, 0.84, 0.89 | A0 | D10 | 67.5 | 46.9 | 34.1 | 49.5
1/2 | 0.80, 0.84, 0.89 | A1 | D10 | 67.7 | 34.7 | 40.7 | 47.7
1/2 | 0.80, 0.84, 0.89 | S0, A0 | D10 | 82.3 | 40.6 | 29.3 | 50.7
1/2 | 0.80, 0.84, 0.89 | S0, A1 | D10 | 81.8 | 38.6 | 23.6 | 48.0
1/2 | 0.80, 0.84, 0.89 | A0, A1 | D10 | 74.6 | 49.8 | 39.8 | 54.8
Table 6.4 Classifier accuracy when the feature selection used only one or two modes at a time but all three transducer frequencies. The maximum ray path distance used here was D20

Hole Dia. [in] | Frequencies used (MHz) | Modes used | D (cm) | A(ω1) (%) | A(ω2) (%) | A(ω3) (%) | Average (%)
3/8 | 0.80, 0.84, 0.89 | S0 | D20 | 69.4 | 57.8 | 60.5 | 63.6
3/8 | 0.80, 0.84, 0.89 | A0 | D20 | 76.5 | 64.6 | 54.9 | 70.5
3/8 | 0.80, 0.84, 0.89 | A1 | D20 | 74.2 | 64.1 | 54.9 | 69.2
3/8 | 0.80, 0.84, 0.89 | S0, A0 | D20 | 81.8 | 66.5 | 66.0 | 74.2
3/8 | 0.80, 0.84, 0.89 | S0, A1 | D20 | 79.9 | 70.3 | 64.7 | 75.1
3/8 | 0.80, 0.84, 0.89 | A0, A1 | D20 | 81.1 | 71.7 | 60.9 | 76.4
1/2 | 0.80, 0.84, 0.89 | S0 | D20 | 82.6 | 33.2 | 21.3 | 57.9
1/2 | 0.80, 0.84, 0.89 | A0 | D20 | 65.4 | 51.0 | 31.6 | 58.2
1/2 | 0.80, 0.84, 0.89 | A1 | D20 | 67.8 | 46.0 | 30.4 | 56.9
1/2 | 0.80, 0.84, 0.89 | S0, A0 | D20 | 80.1 | 44.5 | 26.9 | 62.3
1/2 | 0.80, 0.84, 0.89 | S0, A1 | D20 | 79.5 | 43.4 | 25.7 | 61.5
1/2 | 0.80, 0.84, 0.89 | A0, A1 | D20 | 71.0 | 56.2 | 30.0 | 63.6
These results show that Table 6.6, in which D = D10 = 31.21 cm is the maximum permitted path length used for classification, has the most true positives and the fewest false negatives of all the configurations used. Table 6.8 shows the worst discriminative ability, when all possible path lengths (D = D91) were allowed.
Table 6.5 Classifier accuracy when the feature selection used only one or two modes at a time but all three transducer frequencies. All ray paths were used in this configuration (D = D91)

Hole Dia. [in] | Frequencies used (MHz) | Modes used | D (cm) | A(ω1) (%) | A(ω2) (%) | A(ω3) (%) | Average (%)
3/8 | 0.80, 0.84, 0.89 | S0 | D91 | 92.6 | 14.3 | 24.1 | 43.7
3/8 | 0.80, 0.84, 0.89 | A0 | D91 | 93.1 | 14.6 | 25.9 | 44.5
3/8 | 0.80, 0.84, 0.89 | A1 | D91 | 92.6 | 15.2 | 29.7 | 45.8
3/8 | 0.80, 0.84, 0.89 | S0, A0 | D91 | 93.8 | 16.1 | 32.1 | 47.3
3/8 | 0.80, 0.84, 0.89 | S0, A1 | D91 | 93.2 | 16.7 | 31.7 | 47.2
3/8 | 0.80, 0.84, 0.89 | A0, A1 | D91 | 93.5 | 16.9 | 32.7 | 47.7
1/2 | 0.80, 0.84, 0.89 | S0 | D91 | 67.4 | 40.8 | 17.0 | 41.7
1/2 | 0.80, 0.84, 0.89 | A0 | D91 | 59.6 | 44.8 | 20.9 | 41.8
1/2 | 0.80, 0.84, 0.89 | A1 | D91 | 57.8 | 48.8 | 19.9 | 42.2
1/2 | 0.80, 0.84, 0.89 | S0, A0 | D91 | 64.1 | 46.0 | 23.5 | 44.6
1/2 | 0.80, 0.84, 0.89 | S0, A1 | D91 | 64.6 | 48.5 | 19.6 | 44.2
1/2 | 0.80, 0.84, 0.89 | A0, A1 | D91 | 59.4 | 53.0 | 22.5 | 44.9
Table 6.6 The flaw types predicted by the pipe flaw detection routine are shown here. The columns show the predicted ray path classes whose intersections form figures such as those shown in Figs. 6.9 and 6.10; the rows show the different variables used to configure the classifier. All of the classifiers shown here used D = D10, h_I = 1, h_a = 10

Hole Dia. [in] | Frequencies used (MHz) | Modes used | Flaw type tested: ω2 | Flaw type tested: ω3
3/8 | 0.8 | S0, A0, A1 | ω2 | ω3
3/8 | 0.84 | S0, A0, A1 | ω2 | ω3
3/8 | 0.89 | S0, A0, A1 | ω2 | ω3
3/8 | 0.8, 0.84, 0.89 | S0 | ω2 | ω3
3/8 | 0.8, 0.84, 0.89 | A0 | ω2 | ω3
3/8 | 0.8, 0.84, 0.89 | A1 | ω2 | ω3
3/8 | 0.8, 0.84, 0.89 | S0, A0 | ω2 | ω3
3/8 | 0.8, 0.84, 0.89 | S0, A1 | ω2 | ω3
3/8 | 0.8, 0.84, 0.89 | A0, A1 | ω2 | ω3
1/2 | 0.8 | S0, A0, A1 | ω2 | ω1
1/2 | 0.84 | S0, A0, A1 | ω1 | ω3
1/2 | 0.89 | S0, A0, A1 | ω2, ω3 | ω3
1/2 | 0.8, 0.84, 0.89 | S0 | ω2 | ω1
1/2 | 0.8, 0.84, 0.89 | A0 | ω2, ω3 | ω3
1/2 | 0.8, 0.84, 0.89 | A1 | ω2 | ω3
1/2 | 0.8, 0.84, 0.89 | S0, A0 | ω2 | ω1
1/2 | 0.8, 0.84, 0.89 | S0, A1 | ω2 | ω1
1/2 | 0.8, 0.84, 0.89 | A0, A1 | ω2, ω3 | ω3
Table 6.7 Similarly to Table 6.6, the qualitative performance of the pipe flaw detector routine is shown here using D = D20, h_I = 2, h_a = 5

Hole Dia. [in] | Frequencies used (MHz) | Modes used | Flaw type tested: ω2 | Flaw type tested: ω3
3/8 | 0.8 | S0, A0, A1 | ω2 | ω3
3/8 | 0.84 | S0, A0, A1 | ω2 | ω3
3/8 | 0.89 | S0, A0, A1 | ω2 | ω3
3/8 | 0.8, 0.84, 0.89 | S0 | ω2 | ω3
3/8 | 0.8, 0.84, 0.89 | A0 | ω2 | ω3
3/8 | 0.8, 0.84, 0.89 | A1 | ω2 | ω3
3/8 | 0.8, 0.84, 0.89 | S0, A0 | ω2 | ω3
3/8 | 0.8, 0.84, 0.89 | S0, A1 | ω2 | ω3
3/8 | 0.8, 0.84, 0.89 | A0, A1 | ω2 | ω3
1/2 | 0.8 | S0, A0, A1 | ω2 | ω3
1/2 | 0.84 | S0, A0, A1 | ω2 | ω1
1/2 | 0.89 | S0, A0, A1 | ω2, ω3 | ω1
1/2 | 0.8, 0.84, 0.89 | S0 | ω2 | ω1
1/2 | 0.8, 0.84, 0.89 | A0 | ω2, ω3 | ω3
1/2 | 0.8, 0.84, 0.89 | A1 | ω2 | ω1
1/2 | 0.8, 0.84, 0.89 | S0, A0 | ω2 | ω3
1/2 | 0.8, 0.84, 0.89 | S0, A1 | ω2 | ω1
1/2 | 0.8, 0.84, 0.89 | A0, A1 | ω2 | ω1
However, only one of these classifier configurations needs to perform well in order to select a classifier for further applications; it is neither necessary nor, apparently, possible for all combinations of the classifier variables to perform well. Clearly it is possible to find at least one classifier configuration that accurately discriminates between the types of flaws. In addition, the results for the second tomographic scan of the pipe, with the hole diameter increased to 1/2 in, tend to have more false positives and false negatives than the original scan of the pipe with a hole diameter of 3/8 in. As described above, the classification for the 1/2 in hole pipe was performed using the 3/8 in hole pipe as a training set, so the accuracy of the classifier tends to be lower than for the 3/8 in pipe, where mutually exclusive sets of ray paths drawn from the same tomographic scan were used for training and testing. Since the same threshold values h_I, h_a were used for all the classifier variables that shared the same maximum distance D, it might be possible in the future to adjust these threshold values to optimize for a training set based on a wider variety of data and a testing set drawn from a new tomographic scan.
Table 6.8 Similarly to Tables 6.6 and 6.7, the qualitative performance of the pipe flaw detector routine is shown here using D = D91 (so all ray paths were used), h_I = 3, h_a = 20

Hole Dia. [in] | Frequencies used (MHz) | Modes used | Flaw type tested: ω2 | Flaw type tested: ω3
3/8 | 0.8 | S0, A0, A1 | ω2 | ω3
3/8 | 0.84 | S0, A0, A1 | ω2, ω3 | ω3
3/8 | 0.89 | S0, A0, A1 | ω2, ω3 | ω3
3/8 | 0.8, 0.84, 0.89 | S0 | ω2, ω3 | ω3
3/8 | 0.8, 0.84, 0.89 | A0 | ω2, ω3 | ω3
3/8 | 0.8, 0.84, 0.89 | A1 | ω2, ω3 | ω3
3/8 | 0.8, 0.84, 0.89 | S0, A0 | ω2, ω3 | ω3
3/8 | 0.8, 0.84, 0.89 | S0, A1 | ω2, ω3 | ω3
3/8 | 0.8, 0.84, 0.89 | A0, A1 | ω2, ω3 | ω3
1/2 | 0.8 | S0, A0, A1 | ω2 | ω2, ω3
1/2 | 0.84 | S0, A0, A1 | ω2, ω3 | ω1
1/2 | 0.89 | S0, A0, A1 | ω2, ω3 | ω2, ω3
1/2 | 0.8, 0.84, 0.89 | S0 | ω2, ω3 | ω2, ω3
1/2 | 0.8, 0.84, 0.89 | A0 | ω2, ω3 | ω2, ω3
1/2 | 0.8, 0.84, 0.89 | A1 | ω2, ω3 | ω2, ω3
1/2 | 0.8, 0.84, 0.89 | S0, A0 | ω2, ω3 | ω2, ω3
1/2 | 0.8, 0.84, 0.89 | S0, A1 | ω2, ω3 | ω2, ω3
1/2 | 0.8, 0.84, 0.89 | A0, A1 | ω2, ω3 | ω2, ω3
Lastly, recall that the distribution of ray paths drawn from the three classes ω1, ω2, ω3 was adjusted so that a smaller number of ω1 cases were randomly selected to be used in classification. The ω1 ray paths not chosen for classification would later be used to test the pipe flaw detection algorithm. The training set used for classification was the same as before, but it was tested using the ω1 ray paths originally excluded from classification in the results above. As expected, some of these ray paths were falsely identified by the classifier as ω2 or ω3. The pipe flaw detection algorithm was performed on these predicted classes to see if the intersection of these misclassified ray paths would yield a falsely identified flaw. In fact, for all of the classifier variables described in Sect. 6.4.4, the detection routine concluded with an assignment of ω1 (no flaw identified).
Fig. 6.10 The qualitative evaluation of different flaw images produced by the flaw detector routine for the (a)–(b) gouge and (c)–(d) hole flaws. False positives are shown in subplots (b) and (d), while subplots (a) and (c) show true positives
6.7 Conclusion

These results demonstrate classifiers that were able to distinguish between gouge flaws, hole flaws, and no flaw present in an aluminum pipe. The type of image produced by the flaw detection routine is similar to the result of the Lamb wave tomographic scan itself, which is a 2D color plot of the changes in Lamb wave velocity over the surface of the pipe. However, because it utilizes pattern classification and image recognition techniques, the method described here can be more specific, identifying the flaw type, and it may be able to work on smaller flaws. The higher accuracy of the classifier on limited ray path distances demonstrates that pattern classification aids the successful detection of flaws in geometries where full scans cannot be completed due to limited access. The method does require building a training dataset of different types of flaws, and classifying and testing an entire scan can take considerable computation time. It may be best used on isolated locations where a flaw is suspected rather than in an exhaustive search over a large area. In order to be most accurate, the training dataset should include flaws from different
individual tomographic scans, but of a structure similar to the intended application of the intelligent flaw detector. The application of pattern classification techniques to Lamb wave tomographic scans may be able to improve the detection capability of structural health monitoring.

Acknowledgements The authors would like to thank Dr. Jill Bingham for sharing the data used in this paper and Dr. Corey Miller for assistance with the tomographic reconstructions. Johnathan Stevens constructed the pipe scanner apparatus.
Chapter 7
Classification of RFID Tags with Wavelet Fingerprinting

Corey A. Miller and Mark K. Hinders
Abstract Passive radio frequency identification (RFID) tags lack the resources for standard cryptography and are straightforward to clone. Identifying RF signatures that are unique to an emitter's signal is known as physical-layer identification, a technique that allows cloned devices to be distinguished. In this work, we study the effect real-world environmental variations have on the physical-layer fingerprints of passive RFID tags. Signals are collected for a variety of reader frequencies, tag orientations, and ambient conditions, and pattern classification techniques are applied to automatically identify these unique RF signatures. We show that identically programmed RFID tags can be distinguished using features generated from DWFP representations of the raw RF signals.

Keywords RFID tag · Wavelet fingerprint · Pattern classification
7.1 Introduction

Radio frequency identification (RFID) tags are widespread throughout the modern world, commonly used in retail, aviation, health care, and logistics [1]. As the price of RFID technology decreases with advancements in manufacturing techniques [2], new implementations of RFID technology will continue to rise. The embedding of RFID technology into currency, for example, is being developed overseas to potentially cut down on counterfeiting [3]. Naturally, the security of these RF devices has become a primary concern. Using techniques that range in complexity from simple eavesdropping to reverse engineering [4], researchers have shown authentication vulnerabilities in a wide range of current RFID applications for personal identification and security purposes, with successful cloning attacks made on proximity cards [5], credit cards [6], and even electronic passports [7].
Basic passive RFID tags, which lack the resources to perform most forms of cryptographic security measures, are especially susceptible to privacy and authentication attacks because they have no explicit counterfeiting protections built in. These low-cost tags are the type found in most retail applications, where the low price favors the large quantity of tags required. One implementation of passive RFID tags is to act as a replacement for barcodes. Instead of relaying a sequence of numbers identifying only the type of object a barcode is attached to, RFID tags use an Electronic Product Code (EPC) containing not only information about the type of object, but also a unique serial number used to individually distinguish the object. RFID tags also eliminate the need for the line-of-sight scanning that barcodes require, avoiding scanning orientation requirements. In a retail setting, these RFID tags are being explored for point-of-sale terminals capable of scanning all items in a passing shopping cart simultaneously [8]. Without security measures, however, it is straightforward to surreptitiously obtain the memory content of these basic RFID tags and reproduce a cloned signal [9]. An emerging subset of RFID short-range wireless communication technology is near-field communication (NFC), operating within the high-frequency RFID band at 13.56 MHz. Compatible with already existing RFID infrastructures, NFC involves an initiator that generates an RF field and a passive target, although interactions between two powered devices are possible. The smartphone industry is one of the leading areas for NFC research, as many manufacturers have begun putting NFC technology into their products. With applications enabling users to pay for items such as groceries and subway tickets by waving their phone in front of a machine, NFC payment systems are an attractive alternative to the multitude of credit cards available today [10]. Similarly, NFC-equipped mobile phones are being explored for use as boarding passes, where the passenger can swipe their handset like a card, even when its batteries are dead [11]. A variety of approaches exist to solve this problem of RFID signal authentication, in which an RFID system identifies an RFID tag as being legitimate as opposed to a fraudulent copy. One such method involves the introduction of alternate tag-reader protocols, including the installation of a random number generator in the reader and tag [12], a physical proximity measure when scanning multiple tags simultaneously [13], or re-purposing the kill PIN in an RFID tag, which normally authorizes the deactivation of the tag [14]. Rather than changing the current tag-reader protocols, we approach this issue of RFID tag authentication by applying a wavelet-based RF fingerprinting technique, utilizing the physical layer of RF communication. The goal is to identify unique signatures in the RF signal that provide hardware-specific information. First pioneered to identify cellular phones by their transmission characteristics [15], RF fingerprinting has recently been explored for wireless networking devices [16], wired Ethernet cards [17], universal software radio peripherals (USRP) [18], and RFID devices [19–22]. Our work builds on that of Bertoncini et al. [20], in which a classification routine was developed using a novel wavelet-based feature set to identify 150 RFID tags collected with fixed tag orientation and distance relative to the reader with RF shielding.
That dataset, however, was collected in an artificially protected environment and did not include physical proximity variations relative to the reader, one
of the most commonly exploited benefits of RFID technology over existing barcodes. The resulting classifier performance therefore can't be expected to translate to real-world situations. Our goal is to collect signals from a set of 40 RFID tags with identical Electronic Product Codes (EPC) at a variety of orientations and RFID reader frequencies, as well as over several days, to test the robustness of the classifier. The effects of tag damage in the form of water submersion and physical crumpling are also briefly explored. Unlike Bertoncini et al., we use a low-cost USRP to record the RF signals in an unshielded RF environment, resulting in more realistic conditions and SNR values than previously examined.
7.2 Classification Overview

The application of pattern classification for individual RFID tag identification begins with data collection, where each individual RFID tag is read and the EPC regions are windowed and extracted from all of the tag-reader events. A feature space is formed by collecting a variety of measurements from each of the EPC regions. Feature selection then reduces this feature space to a more optimal subset, removing irrelevant features. Once a dataset has been finalized, it is split into training and testing sets via a resampling algorithm, and the classifier is trained on the training set and tested on the testing set. The classifier output is used to predict a finalized class label for the testing set, and the classifier's performance can be evaluated. Each tag is given an individual class label; however, we are only interested in whether or not a new signal corresponds to an EPC from the specific tag of interest. The goal for this application is to identify false, cloned signals trying to emulate the original tag. We therefore implement a binary one-against-one classification routine, where we consider one individual tag at a time (the classifier tag), and all other tags (the testing tags) are tested against it one at a time. This assigns one of two labels to a testing signal: either ω = 1, declaring that the signal corresponds to an EPC from the classifier tag, or ω = −1, indicating that the signal does not (a minimal labeling sketch follows).
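As a minimal MATLAB sketch of this one-against-one labeling, assume X holds one feature vector per row and tagID gives each row's tag index; the variable names are ours, for illustration only:

    c = 1;  t = 2;                        % hypothetical tag indices
    keep  = (tagID == c) | (tagID == t);  % EPCs from these two tags only
    Xpair = X(keep, :);
    omega = ones(nnz(keep), 1);           % omega = +1: EPC from the classifier tag
    omega(tagID(keep) == t) = -1;         % omega = -1: EPC from the testing tag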
7.3 Materials and Methods

Avery-Dennison AD-612 RFID tags were used in this study, which follow the EPCglobal UHF Class 1 Generation 2 (EPCGen2) standards [23]. There were 40 individual RFID tags available, labeled AD01, AD02, . . . , AD40. The experimental procedure involves writing the same EPC code onto each tag with a Thing Magic Mercury 5e RFID Reader (Cambridge, MA; http://www.thingmagic.com) paired with an omni-directional antenna (Laird Technologies, St. Louis, MO; http://www.lairdtech.com). Stand-alone RFID readers sold today perform all of the signal amplification, modulation/demodulation, mixing, etc. in special-purpose hardware. While this is
Fig. 7.1 Experimental setup for RFID data collection is shown, with the RFID reader, tag, antenna, connection to the VSA, and USRP2 software-defined radio
beneficial for standard RFID use where only the demodulated EPC is of interest, it is inadequate for our research because we seek to extract the raw EPC RF signal. Preliminary work [20] collected raw RF signals through a vector signal analyzer recording 327.50 ms of data at a 3.2 MHz sampling frequency, a laboratory-grade instrument often used in the design and testing of electronic devices. While the vector signal analyzer proved useful for data collection in the preliminary work, it is not a practical tool that could be implemented in real-world applications. We thus explore the use of an alternate RF signal recording device, a software-defined radio (SDR) system. Software-defined radios are beneficial over standard RFID units as they contain their own A/D converters and the majority of their signal processing is software controlled, allowing them to transmit and receive a wide variety of radio protocols based solely on the software used. The SDR system used here is from the Universal Software Radio Peripheral (USRP) family of products developed by Ettus Research LLC,3 specifically the USRP2, paired with a GnuRadio [24] interface. With board schemes and open-source drivers widely available, the flexibility of the USRP system provides a simple and effective solution for our RF interface (Fig. 7.1). Data was collected in two separate sessions: the first taking place in an environment that was electromagnetically shielded over the span of a week and the second without any shielding taking place at William and Mary (W&M) over the span of 2 weeks. The first session included 25 individual AD-612 RFID tags labeled AD01 − AD25. The same EPC code was first written onto each tag with the Thing Magic Mercury 5e RFID Reader, and no further modifications were performed to the tags. Data was collected by placing one tag at a time in a fixed position near the antenna. Tag 3 Mountain
Fig. 7.2 Tag orientations used for data collection. Parallel (PL), oblique (OB), and upside-down (UD) can be seen, named for the tag position relative to the antenna. Real-world degradation was also applied to the tags in the form of water submersion, as well as both light and heavy physical deformations
transmission events were recorded for 3 seconds for each tag using the USRP2, saving all data in MATLAB format. Each tag was recorded at three different RFID reader operating frequencies (902, 915, and 928 MHz), with three tag orientations relative to the antenna being used at each frequency: parallel (PL), upside-down (UD), and a 45° oblique angle (OB). The second session of data collection at W&M included a second, independent set of 15 AD-612 RFID tags labeled AD26−AD40. As before, the same EPC code was first written onto each tag with the Thing Magic Mercury 5e RFID Reader. The second session differed from the first in that the tags were no longer in a fixed position relative to the antenna, but rather simply held by hand near the antenna. This introduces additional variability into the individual tag-reader events throughout each signal recording. Tag transmission events were again recorded for 3 seconds for each tag. Data was collected at a single operating frequency (902 MHz) with a constant orientation relative to the antenna (parallel (PL)); however, data was collected on four separate days, allowing for environmental variation (temperature, humidity, etc.). These tags were then split into two subsets, one of which was used for a water damage study while the other was used for a physical damage study. For the water damage, tags AD26−AD32 were submerged in water for 3 hours, at which point they were patted dry and used to record data (labeled as Wet). They were then allowed to dry overnight and again used to record data (Wet-to-Dry). For the physical damage, tags AD33−AD40 were first lightly crumpled by hand (light damage) and subsequently heavily crumpled (heavy damage). Pictures of the tag orientation variations as well as the tag damage can be seen in Fig. 7.2. From these datasets, four separate studies were performed. First, a frequency comparison was run in which the three operating frequencies were used as training
Table 7.1 There were 40 individual Avery-Dennison AD-612 RFID tags used for this study, split into subsets Dz for the various comparisons. Tag numbers τi are given for each comparison

Comparison type | Dz | Tags used (τi)
Frequency variations | 902, 915, 928 MHz | i = 1, . . . , 25
Orientation variations | PL, UD, OB | i = 1, . . . , 25
Different day recordings | Day 1, 2, 3, 4 | i = 26, . . . , 40
Water damage | Wet, Wet-to-dry | i = 26, . . . , 32
Physical damage | Light, Heavy damage | i = 33, . . . , 40
and testing datasets for the classifiers, collected while maintaining a constant PL orientation. Second, an orientation comparison was performed in which the three tag orientations were used as training and testing datasets, collected at a constant 902 MHz operating frequency. Third, the 4 days' worth of constant PL and 902 MHz handheld recordings were used as training and testing datasets. Finally, the classifiers were trained on the 4 days' worth of recordings, and the additional damage datasets were used as testing sets. The specific tags used for each comparison are summarized in Table 7.1.
7.4 EPC Extraction

In most RFID applications, the RFID reader only has a few seconds to identify a specific tag. For example, consumers would not want a car's keyless entry system that required the user to stand next to the car for half a minute while it interrogated the tag; rather, the user expects access to their car within a second or two of being within the signal's range. These short transmission times result in only a handful of individual EPCs being transmitted, making it important that each one is extracted efficiently and accurately. In our dataset, each tag's raw recording is a roughly 3 second tag-to-reader communication. During this time there is continuous communication between the antenna and any RFID tags within range. This continuous communication is composed of repeated individual tag-reader (T⇔R) events. The structure and duration of each T⇔R event is pre-defined by the specific protocols used. The AD-612 RFID tags are built to use the EPCGen2 protocols [23], so we can use the inherent structure within these protocols to automatically extract the EPCs within each signal. Previous attempts at identifying the individual EPC codes within the raw signals involved a fixed-window cross-correlation approach, where a manually extracted EPC region was required for comparison [20]. With smart window sizing, this approach can identify the majority of EPC regions within a signal. As the communication period is shortened and the number of EPCs contained in each recording decreases, however, this technique becomes insufficient. We have developed an alternative technique that automatically identifies components of the EPCGen2 communication protocols. The new extraction algorithm is
Table 7.2 The EPC extraction routine used to find the EPC regions of interest for analysis

For each tag AD01–AD40:
  Raw recorded signal is sent to getEPC.m
    ◦ Find and window "downtime" regions between tag/reader communication periods
    ◦ Envelope windowed sections, identify individual T⇔R events
    For each T⇔R event:
      • Envelope the signal, locate the flat [EPC+] region
      • Set start/finish bounds on [EPC+]
      • Return extracted [EPC+] regions
  Each extracted [EPC+] is sent to windowEPC.m
    ◦ Generate artificial Miller (M=4) modulated preamble
    ◦ Locate preamble in recorded signal via cross-correlation
    ◦ Identify all subsequent Miller (M=4) basis functions via cross-correlation
    ◦ Extract corresponding bit values
    ◦ Verify extracted bit sequence matches known EPC bit sequence
    ◦ Return start/end locations of EPC region
  Save EPC regions
outlined in Table 7.2, with a detailed explanation to follow. It should be noted that the region identified as [EPC+] is a region of the signal composed of a preamble which initiates the transmission, a protocol-control element, the EPC itself, and a final 16-bit cyclic-redundancy check. The first step in the EPC extraction routine is to window each raw signal by locating the portions that occur between reader transmission repetitions. These periods of no transmission are referred to here as "downtime" regions: the portions of the signal during which the RFID reader is not communicating with the tag at all. An amplitude threshold is sufficient to locate the downtime regions, which divide the raw signal into separate sections, each of which contains several individual T⇔R events. There is another short "dead" zone between each individual T⇔R event where the RFID reader stops transmitting briefly. Because of this, the upper envelope of the active communication region is taken and another amplitude threshold is applied to identify these dead zones, further windowing the signal into its individual T⇔R events (a sketch of this windowing step follows). Each individual T⇔R event is then processed to extract the individual [EPC+] region within. First, the envelope of the T⇔R event is taken, which highlights the back-and-forth communication between the tag and the RFID reader. The [EPC+] region, being the longest in time duration of all the communication, is relatively consistent in amplitude compared to the up-and-down structure of the rest of the signal. Therefore, a region is located that meets flatness as well as time-duration requirements corresponding to this [EPC+]. Once this [EPC+] region is found, an error check is applied that envelopes the region and checks this envelope for outliers that would indicate an incorrectly chosen area.
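A minimal MATLAB sketch of the amplitude-threshold windowing, assuming s is the recorded signal and thr an empirically chosen threshold (hilbert requires the Signal Processing Toolbox); this illustrates the idea only, not the getEPC.m implementation itself:

    env    = abs(hilbert(s(:)));          % upper envelope of the recording
    active = env > thr;                   % true during tag/reader traffic
    edges  = diff([0; active; 0]);        % +1 marks a rise, -1 a fall
    starts = find(edges ==  1);           % first sample of each active region
    stops  = find(edges == -1) - 1;       % last sample of each active region
    events = arrayfun(@(a,b) s(a:b), starts, stops, 'UniformOutput', false);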
Fig. 7.3 Excerpt from the EPC Class 1 Gen 2 Protocols showing the Miller basis functions and a generator state diagram [23]
The next step in this process is to extract the actual EPC from the larger [EPC+] region. For all Class 1 Gen 2 EPCs, the tags encode the backscattered data using either FM0 baseband or Miller modulation of a subcarrier, the encoding choice being made by the reader. The Thing Magic Mercury 5e RFID Reader uses Miller (M=4) encoding, the basis functions of which can be seen in Fig. 7.3. The Miller (M=4) preamble is then simulated and cross-correlated with the [EPC+] region to determine its location within. From the end of the preamble, the signal is broken up into individual bits, and cross-correlation is used to determine which bits are present for the remainder of the signal (positive or negative, 0 or 1). Upon completion, the bit sequence is compared to a second known bit sequence generated from the output of the RFID reader's serial log for verification, shown in Table 7.3. The bounds of this verified bit sequence are then used to window the [EPC+] region down to the EPC only. A single T⇔R event as well as a close-up of a [EPC+] region can be seen in Fig. 7.4.

The goal of the classifier is to identify individual RFID tags despite the fact that all the tags are of the same type, from the same manufacturer, and written with the same EPC. The raw RFID signal s(t) is complex valued, so an amplitude representation α(t) is used for the raw signal [25]. An "optimal" version of our signal was also reverse engineered using the known Miller (M=4) encoding methods, labeled $s_0(t)$. We then subtract the raw signal from the optimal representation, producing an EPC error signal as well, labeled $e_{EPC}(t)$. These are summarized by

$$s(t) = r(t) + i\,c(t), \qquad \alpha(t) = \sqrt{r^2(t) + c^2(t)}, \qquad e_{EPC}(t) = s_0(t) - s(t). \qquad (7.1)$$
This signal processing step of reducing the complex-valued s(t) to either α(t) or $e_{EPC}(t)$ will be referred to as EPC compression; a minimal sketch follows. A signal that has been compressed using either one of these methods will be denoted $\hat{s}(t)$ for generality. Figure 7.5 compares the different EPC compression results on a typical complex RFID signal.
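In MATLAB, Eq. (7.1) amounts to a few lines; here s is the complex recorded [EPC+] signal and s0 the reverse-engineered "optimal" version, both assumed given:

    r     = real(s);
    c     = imag(s);
    alpha = sqrt(r.^2 + c.^2);            % amplitude representation alpha(t)
    eEPC  = s0 - s;                       % EPC error signal e_EPC(t)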
Table 7.3 Thing Magic Mercury 5e RFID Reader Serial Log
(17:05:31.625 - TX(63)): 00 29 CRC:1D26 (17:05:31.671 - RX(63)): 04 29 00 00 00 00 00 01 CRC:9756 (17:05:31.671 - TX(64)): 03 29 00 07 00 CRC:F322 (17:05:31.718 - RX(64)): 19 29 00 00 00 07 00 01 07 72 22 00 80 30 00 30 08 33 B2 DD D9 01 40 35 05 00 00 42 E7 CRC:4F31 (17:05:31.734 - TX(65)): 00 2A CRC:1D25 (17:05:31.765 - RX(65)): 00 2A 00 00 CRC:01E8 (17:05:31.765 - TX(66)): 00 2A CRC:1D25 (17:05:31.796 - RX(66)): 00 2A 00 00 CRC:01E8 (17:05:31.796 - TX(67)): 05 22 00 00 00 00 FA CRC:0845 (17:05:32.093 - RX(67)): 04 22 00 00 00 00 00 01 CRC:7BA9 (17:05:32.093 - TX(68)): 00 29 CRC:1D26 (17:05:32.140 - RX(68)): 04 29 00 00 00 00 00 01 CRC:9756 (17:05:32.140 - TX(69)): 03 29 00 07 00 CRC:F322 (17:05:32.187 - RX(69)): 19 29 00 00 00 07 00 01 07 72 22 00 80 30 00 30 08 33 B2 DD D9 01 40 35 05 00 00 42 E7 CRC:4F31 (17:05:32.203 - TX(70)): 00 2A CRC:1D25 (17:05:32.234 - RX(70)): 00 2A 00 00 CRC:01E8 (17:05:32.234 - TX(71)): 00 2A CRC:1D25 (17:05:32.265 - RX(71)): 00 2A 00 00 CRC:01E8 EPC code (in hex) is in bold.
7.5 Feature Generation

For each RFID tag τi, where i is indexed according to the range given in Table 7.1, the EPC extraction routine produces N-many different EPCs $\hat{s}_{i,j}(t)$, j = 1, . . . , N. Four different methods are then used to extract features from these signals: dynamic wavelet fingerprinting (DWFP), wavelet packet decomposition (WPD), higher order statistics, and Mellin transform statistics. Using these methods, M feature values are extracted which make up the feature vector $X = x_{i,j,k}$, k = 1, . . . , M. It should be noted here that, due to the computation time required to perform this analysis, the computer algorithms were adapted to run on William and Mary's Scientific Computer Cluster (http://www.compsci.wm.edu/SciClone/).
Fig. 7.4 A single T⇔R event with the automatically determined [EPC+] region highlighted in gray. Close-up view of a single [EPC+] region with the EPC itself highlighted in gray
7.5.1 Dynamic Wavelet Fingerprint

The DWFP technique is used to generate a subset of the features used for classification. Wavelet-based measurements provide the ability to decompose noisy and complex information and patterns into elementary components. To summarize this process, the DWFP technique first applies a continuous wavelet transform to each original time-domain signal $\hat{s}_{i,j}(t)$ [26]. The resulting coefficients are then used to generate "fingerprint"-type images $I_{i,j}(a, b)$ that are coincident in time with the raw signal (a simplified sketch follows). Mother wavelets used in this study include the Daubechies-3 (db3), Symlet-5 (sym5), and Meyer (meyr) wavelets, chosen based on preliminary results. Since pattern classification uses one-dimensional feature vectors to develop decision boundaries for each group of observations, the dimension of the binary fingerprint images $I_{i,j}(a, b)$ generated for each EPC signal needs to be reduced. A subset of ν individual values that best represent the signals for classification will be selected. The number ν (ν < M) of DWFP features to select is arbitrary, and can be adjusted based on memory requirements and computation time constraints. For this RFID application, we consider all cases of ν ∈ {1, 5, 10, 15, 20, 50, 75, 100}.
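As an illustrative sketch (not the book's exact implementation), a DWFP-style binary image can be formed by slicing normalized CWT coefficients into thin ridge/valley bands; this uses the legacy Wavelet Toolbox cwt signature, and the scales and slice count are arbitrary choices:

    scales = 1:50;
    C  = cwt(sig, scales, 'db3');         % continuous wavelet transform
    Cn = C / max(abs(C(:)));              % normalize coefficients to [-1, 1]
    I  = false(size(Cn));
    nSlices = 5;                          % number of contour slices
    for k = 1:nSlices                     % mark thin bands of |Cn| as ridges
        lo = (k - 1) / nSlices;
        I  = I | (abs(Cn) >= lo & abs(Cn) < lo + 0.5/nSlices);
    end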
Fig. 7.5 The different EPC compression techniques are shown here, displaying the real r(t) and imaginary c(t) components of a raw [EPC+] region (black and gray, respectively, top), the amplitude α(t) (middle), and the EPC error $e_{EPC}(t)$ (bottom). The EPC portion of the [EPC+] signal is bounded by vertical red dotted lines
Using standard MATLAB routines (Image Processing Toolbox, The MathWorks, Natick, MA), the feature extraction process consists of several steps (a sketch of steps 1–4 follows the list):

1. Label each binary image with individual values for all sets of connected pixels.
2. Relabel concentric objects centered around a common area (useful for the ring-like features found in the fingerprints).
3. Apply thresholds to remove any insignificant objects in the images.
4. Extract features from each labeled object.
5. Linearly interpolate in time between individual fingerprint locations to generate a smoothed array of feature values.
6. Identify points in time where the feature values are consistent among individual RFID tags yet separable between different tags.
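A minimal sketch of steps 1–4 for one binary fingerprint image I, using Image Processing Toolbox calls; the concentric relabeling of step 2 is omitted for brevity, and areaMin is an assumed threshold:

    L     = bwlabel(I, 8);                % step 1: label 8-connected objects
    props = regionprops(L, 'Area', 'Centroid', 'EquivDiameter', ...
                        'Eccentricity', 'Orientation');
    props = props([props.Area] >= areaMin);   % step 3: drop small objects
    feats = [[props.Area]' [props.EquivDiameter]' [props.Eccentricity]'];  % step 4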
Fig. 7.6 An example of 8-connectivity and its application on a binary image
The binary nature of the images allows us to consider each pixel of the image as having a value of either 1 or 0. The pixels with a value of 0 can be thought of as the background, while pixels with a value of 1 can be thought of as the significant pixels. The first step in feature extraction is to assign individual labels to each set of 8-connected components in the image [27], demonstrated in Fig. 7.6. Since the fingerprints are often concentric shapes, different concentric "rings" are often not connected to each other but still are components of the same fingerprint object. Therefore, the second step in the process is to relabel groups of concentric objects using their center of mass, which is the average time coordinate of each pixel, demonstrated in Fig. 7.7. The third step in the feature extraction process is to remove any fingerprint objects from the image whose area (sum of the pixels) is below a particular threshold. Objects that are too small for the computations in later steps are removed; however, this threshold is subjective and depends on the mother wavelet used. At this point in the processing, the image is ready for features to be generated. Twenty-two measurements are made on each remaining fingerprint object, including the area, centroid, diameter of a circle with the same area, Euler number, convex image, solidity, coefficients of second and fourth degree polynomials fit to the fingerprint boundary, as well as major/minor axis length, eccentricity, and orientation of an ellipse that has the same normalized second central moment as the fingerprint. The property measurements result in a sparse property array $P_{i,j,n}[t]$, where n represents the property index, n = 1, . . . , 22, since each extracted value is matched to the time value of the corresponding fingerprint's center of mass.
Fig. 7.7 An example of the fingerprint labeling process. The components of the binary image and the resulting 8-connected components, where each label index corresponds to a different index on the “hot” colormap in this image. Concentric objects are then relabeled, resulting in unique labels for each individual fingerprint object, shown here as orange and white fingerprint objects for clarity
Therefore, these sparse property vectors are linearly interpolated to produce a smoothed vector of property values, $P_{i,j,n}(t)$. This process is shown for a typical time-domain EPC signal in Fig. 7.8. Once an array of fingerprint features for each EPC has been generated, it still needs to be reduced into a single vector of ν-many values to be used for classification. Without this reduction, not only is the feature set too large to process even on a computing cluster, but most of the information contained within it is redundant. Since we are implementing a one-against-one classification scheme, where one testing tag (τt) will be compared against features designed to identify one classifier tag (τc), we are looking for feature values that are consistent among each individual RFID tag, yet separable between different tags. First, the dimensionality of the property array is reduced by calculating the inter-tag mean property value for each tag τi,

$$\mu_{i,n}(t) = \frac{1}{|j|} \sum_j P_{i,j,n}(t). \qquad (7.2)$$

Each inter-tag mean vector is then normalized to the range [0, 1]. Next, the difference in inter-tag mean vectors for property n is considered for all binary combinations of tags $\tau_{i_1}$, $\tau_{i_2}$,

$$d_n(t) = \left| \mu_{i_1,n}(t) - \mu_{i_2,n}(t) \right| \quad \text{for } i_1, i_2 \in i \qquad (7.3)$$

for values of i shown in Table 7.1. We are left with a single vector representing the average intra-class difference in property n values as a function of time. Similarly, we compute the standard deviation within each class,

$$\sigma_{i,n}(t) = \sqrt{ \frac{1}{|j|} \sum_j \big( P_{i,j,n}(t) - \mu_{i,n}(t) \big)^2 }. \qquad (7.4)$$

We next identify the maximum value of standard deviation among all tags τi at each point in time t, essentially taking the upper envelope of all values of $\sigma_{i,n}(t)$,

$$\sigma_n(t) = \max_i \big[ \sigma_{i,n}(t) \big]. \qquad (7.5)$$

Times $t_m$, where m = 1, . . . , ν, are then identified for each property n at which the average intra-class difference $d_n(t_m)$ is high while the inter-class standard deviation $\sigma_n(t_m)$ remains low. The resulting DWFP feature vector for EPC signal $\hat{s}_{i,j}(t)$ is

$$x_{i,j,k_m} = P_{i,j,n_m}(t_m). \qquad (7.6)$$
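A MATLAB sketch of Eqs. (7.2)–(7.6) for a single property, assuming P is an nTags-by-nEPCs-by-nT array of interpolated property values; the simple score d − σ used to rank candidate times is our illustrative choice, since the text does not prescribe one:

    mu  = squeeze(mean(P, 2));                            % Eq. (7.2)
    mu  = (mu - min(mu(:))) / (max(mu(:)) - min(mu(:)));  % normalize to [0,1]
    nTags = size(mu, 1);
    d   = zeros(1, size(mu, 2));
    for i1 = 1:nTags                                      % Eq. (7.3)
        for i2 = i1+1:nTags
            d = d + abs(mu(i1,:) - mu(i2,:));
        end
    end
    d   = d / nchoosek(nTags, 2);             % average intra-class difference
    sig = max(squeeze(std(P, 0, 2)), [], 1);  % Eqs. (7.4)-(7.5): worst spread
    [~, order] = sort(d - sig, 'descend');    % high difference, low spread
    tm  = order(1:nu);                        % the nu selected times, Eq. (7.6)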
Fig. 7.8 The DWFP is applied to an EPC signal $\hat{s}_{i,j}(t)$, shown in gray (a). A close-up of the signal is shown for clarity (b), from which the fingerprint image (c) is generated, shown here with white peaks and gray valleys for distinction. Each fingerprint object is individually labeled and localized in both time and scale (d). A variety of measures are extracted from each fingerprint and interpolated in time, including the area of the on-pixels for each object (e), as well as the coefficients of a fourth-order polynomial ($p_1x^4 + p_2x^3 + p_3x^2 + p_4x + p_5$) fit to the boundary of each fingerprint object, with coefficient $p_3$ shown here (f)
7.5.2 Wavelet Packet Decomposition

Another wavelet-based feature used in classification is generated by wavelet packet decomposition [28]. First, each EPC signal is filtered using a stationary wavelet transform, removing the first three levels of detail as well as the highest approximation level. A wavelet packet transform (WPT) is applied to the filtered waveform with a specified mother wavelet and number of decomposition levels, generating a tree of coefficients similar in nature to the continuous wavelet transform. From the WPT tree, a vector containing the percentages of energy corresponding to the T terminal nodes of the tree is computed, known as the wavelet packet energy. Because the WPT is an orthonormal transform, the entire energy of the signal is preserved in these terminal nodes [29]. The energy matrix $E_i$ for each RFID tag τi can then be represented as

$$E_i = \left[ \mathbf{e}_{1,i}, \mathbf{e}_{2,i}, \ldots, \mathbf{e}_{N,i} \right], \qquad (7.7)$$

where N is the number of EPCs extracted from tag τi and $\mathbf{e}_{j,i}[b]$ is the energy from bin number b = 1, . . . , T of the energy map for signal j = 1, . . . , N. Singular value decomposition is then applied to each energy matrix $E_i$:

$$E_i = U_i \Sigma_i V_i^{*}, \qquad (7.8)$$

where $U_i$ is composed of T-element left singular column vectors $\mathbf{u}_{b,i}$,

$$U_i = \left[ \mathbf{u}_{1,i}, \mathbf{u}_{2,i}, \ldots, \mathbf{u}_{T,i} \right]. \qquad (7.9)$$
The Σi matrix is a T × N singular value matrix. The row space and nullspace of E i are defined in the N × N matrix Vi∗ , and are not used in the analysis of the energy maps. For the energy matrices E i , we found that there was a dominant singular value relative to the second highest singular value, implying that there was a dominant representative energy vector corresponding to the first singular vector u1,i . From the set of all singular vectors ub,i , the significant bins that have energies above a given threshold are identified. The threshold is lowered until all the vectors return a common significant bin. Finally, the WPT elements corresponding to the extracted bin are used as features. In the case of multiple bins being selected, all corresponding WPT elements are included in the feature set. Wavelet packet decomposition uses redundant basis functions and can therefore provide arbitrary time–frequency resolution details, improving upon the wavelet transform when analyzing signals containing close, high-frequency components.
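A sketch of the wavelet-packet energy features with Wavelet Toolbox calls; the decomposition level, mother wavelet, and threshold binThr are illustrative assumptions, and Ei is assumed to stack the terminal-node energy vectors of all N EPCs of tag i column-wise:

    T = wpdec(x, 5, 'db3');               % 5-level wavelet packet tree
    e = wenergy(T);                       % percent energy per terminal node
    % ... build Ei = [e1' e2' ... eN'] from all EPCs of the tag, then:
    [Ui, Si, Vi] = svd(Ei, 'econ');       % Eq. (7.8)
    u1   = Ui(:, 1);                      % dominant energy vector, Eq. (7.9)
    bins = find(abs(u1) > binThr);        % significant bins above a threshold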
7.5.3 Statistical Features

Several statistical features were generated from the raw EPC signals $\hat{s}_{i,j}(t)$:

1. The mean of the raw signal, $\mu_{i,j} = \frac{1}{|\hat{s}|} \sum_t \hat{s}_{i,j}(t)$, where $|\hat{s}|$ is the length of $\hat{s}_{i,j}(t)$.
2. The maximum cross-correlation of $\hat{s}_{i,j}(t)$ with another EPC from the same tag, $\hat{s}_{i,k}(t)$ with $j \neq k$: $\max_\tau \left| \sum_t \hat{s}^{*}_{i,j}(t)\, \hat{s}_{i,k}(t+\tau) \right|$.
3. The Shannon entropy, $\sum_t \hat{s}^2_{i,j}(t) \ln\big(\hat{s}^2_{i,j}(t)\big)$.
4. The unbiased sample variance, $\frac{1}{|\hat{s}|-1} \sum_t \big(\hat{s}_{i,j}(t) - \mu_{i,j}\big)^2$.
5. The skewness (third central moment), $\frac{1}{\sigma^3_{i,j} |\hat{s}|} \sum_t \big(\hat{s}_{i,j}(t) - \mu_{i,j}\big)^3$.
6. The kurtosis (fourth central moment), $\kappa_{i,j} = \frac{1}{\sigma^4_{i,j} |\hat{s}|} \sum_t \big(\hat{s}_{i,j}(t) - \mu_{i,j}\big)^4$.

Statistical moments provide insight by highlighting outliers due to any specific flaw-type signatures found in the data.
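In MATLAB, these statistics are essentially one-liners; s is one compressed EPC signal and sOther a second EPC from the same tag (skewness and kurtosis require the Statistics Toolbox), with eps added to keep the logarithm finite:

    mu = mean(s);                         % item 1: mean
    xc = max(abs(xcorr(s, sOther)));      % item 2: peak cross-correlation
    H  = sum(s.^2 .* log(s.^2 + eps));    % item 3: Shannon entropy
    v  = var(s);                          % item 4: unbiased sample variance
    sk = skewness(s);                     % item 5: third standardized moment
    ku = kurtosis(s);                     % item 6: fourth standardized moment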
7.5.4 Mellin Features

The Mellin transform is an integral transform, closely related to the Fourier transform and the Laplace transform, that can represent a signal in terms of a physical attribute similar to frequency known as scale. The β-Mellin transform is defined as [30]

$$M_f(p) = \int_0^\infty \hat{s}(t)\, t^{\,p-1}\, dt, \qquad (7.10)$$
for the complex variable $p = -jc + \beta$, with fixed parameter $\beta \in \mathbb{R}$ and independent variable $c \in \mathbb{R}$. This variation of the Mellin transform is used because the β parameter allows for the selection of a variety of more specific transforms. In the case of β = 1/2, this becomes a scale-invariant transform, meaning invariant to compression or expansion of the time axis while preserving signal energy, defined on the vertical line $p = -jc + 1/2$. This scale transform is defined as

$$D_f(c) = \frac{1}{\sqrt{2\pi}} \int_0^\infty \hat{s}(t)\, e^{(-jc - 1/2)\ln t}\, dt. \qquad (7.11)$$
This transform has the key property of scale invariance: if $\hat{s}'$ is a scaled version of a function $\hat{s}$, the two will have the same transform magnitude. Variations in each RFID tag's local oscillator can lead to slight but measurable differences in the frequency of the returned RF signal, effectively scaling the signal. Zanetti et al. call this the time interval error (TIE), and extract the TIE directly to use as a feature for individual tag classification [21]. We observed this slight scaling effect in our data and therefore explore the use of a scale-invariant feature extraction technique. The Mellin transform's relationship with the Fourier transform can be highlighted by setting β = 0, which results in a logarithmic-time Fourier transform:
$$M_f(c) = \int_{-\infty}^{\infty} \hat{s}(t)\, e^{-jc(\ln t)}\, d(\ln t). \qquad (7.12)$$

Similarly, the scale transform of a function $\hat{s}(t)$ can be defined using the Fourier transform of $g(t) = \hat{s}(e^t)$:

$$M_f(c) = \int_{-\infty}^{\infty} g(t)\, e^{-jct}\, dt = \mathcal{F}\big(g(t)\big). \qquad (7.13)$$
References [30, 33] discuss the complexities associated with discretizing the fast Mellin transform (FMT) algorithm, and provide a MATLAB-based implementation (http://profs.sci.univr.it/~desena/FMT). The first step in implementing this is to define an exponential sampling step along with the number of samples needed for a given signal in order to exponentially resample it, an example of which can be seen in Fig. 7.9. Once the exponential axis has been defined, an exponential point-by-point multiplication with the original signal is performed. A fast Fourier transform (FFT) is then computed, followed by an energy normalization step. This process is summarized in Fig. 7.10 and sketched below.
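A sketch of these FMT steps for a uniformly sampled signal s with time axis t (t > 0); the sqrt(t) weighting realizes the β = 1/2 scale transform, and [30, 33] should be consulted for the careful discretization:

    N    = numel(s);
    texp = t(1) * (t(end)/t(1)).^((0:N-1)/(N-1));  % exponential time axis
    sexp = interp1(t, s, texp, 'linear');          % exponential resampling
    sexp = sexp .* sqrt(texp);                     % point-by-point weighting
    D    = fft(sexp);                              % FFT along log-time
    D    = D / norm(D);                            % energy normalization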
6 http://profs.sci.univr.it/~desena/FMT.
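A minimal MATLAB sketch of these four steps is given below, assuming a uniformly sampled signal s at rate fs; the exponential-axis sizing and the β = 1/2 weighting follow the structure of [30], but the constants here are illustrative rather than the reference FMT implementation.

```matlab
N    = numel(s);
t    = (1:N) / fs;                        % original uniform time axis
M    = 2 * N;                             % assumed number of exponential samples
texp = t(end) * exp(((1:M) - M) / M * log(N));  % exponentially spaced axis
sexp = interp1(t, s, texp, 'linear', 0);  % resample onto the exponential axis
sexp = sexp .* sqrt(texp);                % point-by-point weighting (beta = 1/2)
D    = fft(sexp);                         % FFT yields the scale (Mellin) domain
D    = D / norm(D);                       % energy normalization
```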
Fig. 7.9 An example of uniform sampling (in blue, top) and exponential sampling (in red, bottom)
Once the Mellin transform is computed, features are extracted from the resulting Mellin domain including the mean of the Mellin transform, as well as the standard deviation, the variance, the second central moment, the Shannon entropy, the kurtosis, and the skewness of the mean-removed Mellin transform [31, 32].
7.6 Classifier Design

Since the goal of our classification routine is to distinguish individual RFID tags from nominally identical copies, each individual RFID tag is assigned a unique class label. This results in a multiclass problem, with the number of classes equal to the number of tags being compared. There are two main methods which can be used to address multiclass problems. The first uses classifiers that have multi-dimensional discriminant functions, which often output classification probabilities for each test object that then need to be reduced to a final classification decision. The second method uses a binary comparison between all possible pairs of classes utilizing a two-class discriminant function, with a voting procedure used to determine the final classification. We have discussed in Sect. 7.2 our choice of the binary classification approach, allowing us to include intrinsically two-class discriminants in our analysis. Therefore, only two tags will be considered against each other at a time, a classifier tag τ_c ∈ D_R and a testing tag τ_t ∈ D_T, where D_R represents the training dataset used and D_T the testing dataset, outlined in Table 7.1. For each binary combination of tags (τ_c, τ_t), a training set (R) is generated, composed of feature vectors from k EPCs associated with tags τ_c and τ_t from
Fig. 7.10 A raw EPC signal s_{i,j}(t) (top left), the exponentially resampled axis (top right), the signal resampled according to the exponential axis (middle left), this same signal after point-by-point multiplication of the exponential axis (middle right), and the resulting Mellin domain representation (bottom)
dataset D_R. Corresponding known labels (ω_k) are ω_k = 1 when k ∈ c, and ω_k = −1 when k ∈ t. The testing set (T) is composed of feature vectors of tag τ_t only, from dataset D_T, where predicted labels are denoted y_k = ±1. In other words, the classifier is trained on data from both tags in the training dataset and tested on the testing tag only from the testing dataset. When D_R = D_T, which means the classifier is trained and tested on the same dataset, a holdout algorithm is used to split the data into R and T. The problem of class imbalance, where the class labels ω_k are unequally distributed (i.e., |ω_k = 1| ≫ |ω_k = −1|, or vice versa), can affect classifier performance
and has been a topic of further study by several researchers [34–36]. While the number of EPCs extracted from each tag here does not present a significant natural imbalance, as all recordings are approximately the same length in time, it is not necessarily true that the natural distribution between classes, or even a perfect 50:50 distribution, is ideal. To explore the effect of class imbalance on classifier performance, a variable ρ is introduced here, defined as

$$\rho = \frac{|\omega_k = -1|}{|\omega_k = 1|}, \quad k \in R. \qquad (7.14)$$
This variable defines the ratio of negative to positive EPC labels in R, with ρ ∈ ℤ+. When ρ = 1, the training set R contains an equal number of EPCs from tag τ_c as from τ_t, where under-sampling is used as necessary for equality. As ρ increases, additional EPCs are included at random from tags τ_m, m ≠ c, t, with ω_m = −1, until ρ is satisfied. When all of the tags in D_R are included in the training set, ρ is denoted as "all." The process of selecting which classifiers to use is a difficult problem. The No Free Lunch Theorem states that there is no inherently best classifier for a particular application, and in practice several classifiers are often compared and contrasted. There exists a hierarchy of possible choices that are application dependent. We have previously determined that supervised, statistical pattern classification techniques using both parametric and nonparametric probability-based classifiers are appropriate for consideration. For parametric classifiers, we include a linear classifier using normal densities (LDC) and a quadratic classifier using normal densities (QDC). For nonparametric classifiers, we include a k-nearest-neighbor classifier (KNNC) for k = 1, 2, 3, and a linear support vector machine (SVM) classifier. The mathematical explanations for these classifiers can be found in [37–59]. For implementation of these classifier functions, we use routines from the MATLAB toolbox PRTools [58]. For the classifiers that output densities, a function is applied that converts the output to a normalized confidence value, such that the sum of the outcomes is one for every test object. This allows for comparison between classifier outputs. Since each EPC's feature vector is assigned a confidence value for each class, the final label is decided by the highest confidence over all the classes.
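For orientation, a minimal sketch of one binary (τ_c, τ_t) comparison is shown below, using MATLAB's built-in classifiers in place of the PRTools routines named above; Xc and Xt are assumed k × ν feature matrices for the two tags, and Xtest holds the testing tag's feature vectors. The names are illustrative, not the original code.

```matlab
X = [Xc; Xt];
w = [ones(size(Xc,1),1); -ones(size(Xt,1),1)];      % known labels omega_k = +/-1
ldc = fitcdiscr(X, w, 'DiscrimType', 'linear');     % LDC analogue
qdc = fitcdiscr(X, w, 'DiscrimType', 'quadratic');  % QDC analogue
knn = fitcknn(X, w, 'NumNeighbors', 3);             % 3NN
svm = fitcsvm(X, w, 'KernelFunction', 'linear');    % linear SVM
[yhat, post] = predict(qdc, Xtest);                 % posteriors sum to one per object
```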
7.7 Classifier Evaluation

Since we have implemented a binary classification algorithm, a confusion matrix L(c, t), where τ_c is the classifier tag and τ_t is the testing tag, can be used to view the results of a given classifier. Each entry in the confusion matrix represents the fraction of EPCs from the testing tag that are labeled as the classifier tag, denoted by a label of y_t = 1, and is given by
$$L(c, t) = \frac{|y_t = 1|}{|y_t|} \quad \text{when } \tau_c \in R,\ \tau_t \in T. \qquad (7.15)$$
A perfect classifier would therefore have values of L = 1 whenever τ_c = τ_t (on the diagonal) and values of L = 0 whenever τ_c ≠ τ_t (off-diagonal). Given the number of classifier configuration parameters used in this study, it does not make sense to compare individual confusion matrices to each other to determine classifier performance. Each entry of the confusion matrix is a measure of the fraction of EPCs from each testing set determined to belong to each possible training class. We can therefore apply a threshold h to each confusion matrix, where the value of h lies within the range [0, 1]. All confusion matrix entries above this threshold are positive matches for class membership, and all entries below the threshold are identified as negative matches for class membership. It follows that we can determine the number of false positives (f+), false negatives (f−), true positives (t+), and true negatives (t−) for each confusion matrix, given by

$$\begin{aligned} f_+(h) &= |L(c,t) > h|, \quad c \neq t\\ t_+(h) &= |L(c,t) > h|, \quad c = t\\ f_-(h) &= |L(c,t) \le h|, \quad c = t\\ t_-(h) &= |L(c,t) \le h|, \quad c \neq t. \end{aligned} \qquad (7.16)$$
From these values, we can calculate the sensitivity (χ) and specificity (ψ),

$$\chi(h) = \frac{t_+(h)}{t_+(h) + f_-(h)}, \qquad \psi(h) = \frac{t_-(h)}{t_-(h) + f_+(h)}. \qquad (7.17)$$
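A compact way to read Eqs. (7.16) and (7.17) is as a few masked counts over the confusion matrix. The helper below is a minimal, hypothetical sketch (the function name is ours, not the authors'):

```matlab
function [chi, psi] = sens_spec(L, h)
    offd = ~eye(size(L));            % mask for c ~= t (off-diagonal entries)
    fp = nnz(L >  h &  offd);        % false positives, Eq. (7.16)
    tp = nnz(L >  h & ~offd);        % true positives
    fn = nnz(L <= h & ~offd);        % false negatives
    tn = nnz(L <= h &  offd);        % true negatives
    chi = tp / (tp + fn);            % sensitivity, Eq. (7.17)
    psi = tn / (tn + fp);            % specificity
end
```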
The concept of sensitivity and specificity values is inherent in binary classification, where testing data is identified as either a positive or negative match for each possible class. High values of sensitivity indicate that the classifier successfully classified most of the testing tags whenever the testing tag and classifier tag were the same, while high values of specificity indicate that the classifier successfully identified most of the testing tags as different whenever the testing tag and the classifier tag were not the same. Since sensitivity and specificity are functions of the threshold h, they can be plotted with sensitivity (χ(h)) on the y-axis and 1 − specificity (1 − ψ(h)) on the x-axis for 0 < h ≤ 1 in what is known as a receiver operating characteristic (ROC) [60]. The resulting curve in the ROC plane is essentially a summary of the sensitivity and specificity of a binary classifier as the discrimination threshold changes. Points on the diagonal line y = x represent a result as good as random guessing; classifiers performing better than chance have curves above the diagonal, toward the upper left-hand corner of the plane. The point (0, 1), corresponding to χ = 1 and ψ = 1, represents perfect classification. The area under each classifier's ROC curve (|AUC|) is a common measure of a classifier's performance, and is calculated in practice using simple trapezoidal integration. Higher |AUC| values generally correspond to classifiers with better performance [61]. This is not a strict rule, however, as a classifier with a higher |AUC|
Fig. 7.11 A comparison of confusion matrices L for classifiers of varying performance, with 0 → black and 1 → white. A perfect confusion matrix has values of L = 1 whenever τ_c = τ_t, seen as a white diagonal here, and values of L = 0 whenever τ_c ≠ τ_t, seen as a black off-diagonal here. In general, |AUC| = 1 corresponds to a perfect classifier, while |AUC| = 0.5 performs as well as random guessing. This trend can be seen in the matrices
may perform worse in specific areas of the ROC plane than another classifier with a lower |AUC| [62]. Several examples of confusion matrices can be seen in Fig. 7.11, where each corresponding |AUC| value is provided to highlight its relationship to performance. It can be seen that the confusion matrix with the highest |AUC| has a clear, distinct diagonal of positive classifications, while the one with the lowest |AUC| has positive classifications scattered throughout the matrix. The use of |AUC| values for directly comparing classifier performance has recently been questioned [63, 64], with the information loss associated with summarizing the full ROC distribution identified as a main concern. We therefore do not use |AUC| as a final classifier ranking measure. Rather, |AUC| values are only used here to narrow the results down from all possible classifier configurations to a smaller subset of the "best" ones. The values of χ(h) and ψ(h) are still useful measures of the remaining top classifiers. At this point, however, they have been calculated for a range of threshold values extending over 0 < h ≤ 1. A variety of methods can be used to
Fig. 7.12 ROC curves for the classifiers corresponding to the confusion matrices in Fig. 7.11, with |AUC| = 0.9871 (dotted line), |AUC| = 0.8271 (dash-dot line), |AUC| = 0.6253 (dashed line), and |AUC| = 0.4571 (solid line). The "perfect classifier" result at (0, 1) in this ROC space is represented by the black star (∗), and each curve's closest point to this optimal result, at threshold ĥ, is indicated by a circle (◦)
determine a final decision threshold h for a given classifier configuration, the choice of which depends heavily on the classifier's final application. A popular approach involves sorting by the minimum number of misclassifications, min(f+ + f−); however, this does not account for differences in severity between the different types of misclassifications [65]. Instead, the overall classifier results were sorted here using their position in the ROC space, with the Euclidean distance from the point (0, 1) as the metric. Formally, this is

$$d_{ROC}(h) = \sqrt{(\chi - 1)^2 + (1 - \psi)^2}. \qquad (7.18)$$
For each classifier configuration, the threshold value ĥ corresponding to the minimum distance was determined,

$$\hat{h} = \arg\min_h d_{ROC}(h) = \{h \mid \forall h' : d_{ROC}(h') \ge d_{ROC}(h)\}. \qquad (7.19)$$
In other words, ĥ is the threshold value corresponding to the point in the ROC space that is closest to the (0, 1) "perfect classifier" result. The classifier configurations are then ranked by the lowest distance d_ROC(ĥ). Figure 7.12 shows an example of the ROC curves for the classifiers that are generated from the confusion matrices found in Fig. 7.11. In it, the point corresponding to ĥ is indicated by a circle, with the (0, 1) point indicated by a star.
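Putting the pieces together, a minimal sketch of the ranking step reads as follows; it sweeps the threshold, integrates |AUC| by the trapezoidal rule, and applies Eqs. (7.18)–(7.19), reusing the hypothetical sens_spec helper sketched after Eq. (7.17).

```matlab
hs  = linspace(0.005, 1, 200);               % threshold sweep over 0 < h <= 1
chi = zeros(size(hs));
psi = zeros(size(hs));
for k = 1:numel(hs)
    [chi(k), psi(k)] = sens_spec(L, hs(k));
end
AUC  = abs(trapz(1 - psi, chi));             % trapezoidal area under the ROC curve
dROC = sqrt((chi - 1).^2 + (1 - psi).^2);    % distance to (0, 1), Eq. (7.18)
[~, kbest] = min(dROC);                      % Eq. (7.19)
hhat = hs(kbest);                            % final decision threshold h-hat
```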
7.8 Results

7.8.1 Frequency Comparison

The ultra-high-frequency (UHF) range of RFID frequencies spans 868–928 MHz; however, in North America UHF can be used unlicensed from 902–928 MHz (±13 MHz from a 915 MHz center frequency). We test the potential for pattern classification routines to uniquely identify RFID tags at several operating frequencies within this range. Data collected at three frequencies (902, 915, and 928 MHz), with the tag held at a single orientation (PL), were used as training and testing sets for the classifier. Only amplitude (α(t)) signal compression was used in this frequency comparison. Table 7.4 shows the top individual classifier configuration for the RFID reader operating frequency comparison. Results are presented as sensitivity and specificity values for the threshold value ĥ that corresponds to the minimum distance d_ROC in the ROC space. Similarly, confusion matrices are presented in Fig. 7.13 for each classifier configuration listed in Table 7.4.
Table 7.4 The classifier configurations ranked by d_ROC(ĥ) over all values of the classifier configuration variables when trained and tested on the frequency parameters 902, 915, and 928 MHz. D_R and D_T correspond to the training and testing datasets, respectively. ρ represents the ratio of negative versus positive EPCs in the training set. The threshold ĥ corresponding to the minimum distance d_ROC is presented, along with the values of χ(ĥ) and ψ(ĥ)

D_R   D_T   #DWFP (ν)   Classifier       ρ     |AUC|    χ(ĥ)    ψ(ĥ)    ĥ (%)   Accuracy (%)
902   902   100         QDC (MATLAB)     3     0.9983   1.000   0.997   96.6    99.7
902   915   1           LDC (PRTools)    5     0.9898   1.000   0.943   8.5     94.6
902   928   50          3NN              1     0.9334   0.960   0.950   10.9    95.0
915   902   1           LDC (PRTools)    12    0.4571   0.640   0.543   2.3     54.7
915   915   1           QDC (MATLAB)     9     0.9977   1.000   0.995   82.2    99.5
915   928   10          LDC (MATLAB)     all   0.5195   0.720   0.538   1.9     54.6
928   902   10          1NN              3     0.4737   0.520   0.757   9.1     74.7
928   915   1           LDC (MATLAB)     7     0.6587   0.880   0.498   2.0     51.4
928   928   75          QDC (MATLAB)     2     1.0000   1.000   1.000   86.9    100.0
Fig. 7.13 Confusion matrices (L) for the classifier configurations corresponding to the minimum distance d_ROC(ĥ) over all combinations of (D_R, D_T), where D_R, D_T ∈ {902, 915, 928} MHz. Values of L range over [0, 1], with 0 → black and 1 → white here
The classifier performed well when trained on the dataset collected at 902 MHz, regardless of the frequency at which the testing data was collected. Accuracies were above 94.6% for all three testing frequencies, and sensitivity (χ(ĥ)) and specificity (ψ(ĥ)) values were all above 0.943, very close to the ideal value of 1.000. The confusion matrices shown in Fig. 7.13 all display the distinct diagonal line that indicates accurate classification. When the classifier was trained on either the 915 MHz or 928 MHz dataset, however, classification accuracy was low. Neither case identified tags read at other frequencies very well, even though both did well classifying tags read at their own frequency. When D_R = 915 MHz and D_T = 928 MHz, for example, the |AUC| value was only 0.5195, not much higher than the 0.5000 associated with random guessing. The corresponding confusion matrix
shows no diagonal but instead vertical lines at several predicted tag labels, indicating that the classifier simply labeled all of the tags as one of these values.
7.8.2 Orientation Comparison

A second variable inherent in real-world RFID applications is the orientation, relative to the antenna, at which the RFID tags are read. This flexibility is one of the main reasons why RFID technology is considered advantageous compared to traditional barcodes; however, antenna design and transmission power variability result in changes in the size and shape of the transmission field produced by the antenna [66]. It follows that changes in the tag orientation relative to this field will result in changes in the pre-demodulated RF signals. To test how the pattern classification routines behave with a changing variable like orientation, data was collected at three different orientations (PL, OB, and UD) while the tag was held at a common operating frequency (902 MHz). This data was used as training and testing sets for the classifiers. Again, only amplitude (α(t)) signal compression was used. Table 7.5 shows the top individual classifier configuration for the RFID tag orientation comparison. Results are presented as sensitivity and specificity values for a threshold value ĥ that corresponds to the minimum distance d_ROC in the ROC space. Similarly, confusion matrices are presented in Fig. 7.14 for each classifier configuration listed in Table 7.5. Similar to the frequency results, the classification results again show a single orientation that performs well as a training set regardless of the subsequent tag orientation of the testing set. When trained on data collected at the parallel (PL) orientation, the classification accuracies range from 94.9 to 99.7% across the three testing tag orientations. Values of χ(ĥ) range from 0.880 to 1.000, meaning that over 88% of the true positives are correctly identified, and ψ(ĥ) values range from 0.952 to 0.997, indicating that over 95% of the true negatives are accurately identified as well. These accuracies are verified in the confusion matrix representations found in Fig. 7.14. When the classifiers are trained on either the oblique (OB) or upside-down (UD) orientations, we again see that the classifiers struggle to identify testing data from alternate tag orientations. The best performing of these results is for D_R = OB and D_T = PL, where χ(ĥ) = 0.920 and ψ(ĥ) = 0.770, suggesting accurate true positive classification with somewhat more false positives as well, resulting in an overall accuracy of 77.6%. When D_R = UD, the testing results are again only slightly better than random guessing, with |AUC| values of 0.5398 for D_T = PL and 0.5652 for D_T = OB.
Table 7.5 The classifier configurations ranked by d_ROC(ĥ) over all values of the classifier configuration variables when trained and tested on the orientation parameters PL, UD, and OB. D_R and D_T correspond to the training and testing datasets, respectively. ρ represents the ratio of negative versus positive EPCs in the training set. The threshold ĥ corresponding to the minimum distance d_ROC is presented, along with the values of χ(ĥ) and ψ(ĥ)

D_R   D_T   #DWFP (ν)   Classifier       ρ     |AUC|    χ(ĥ)    ψ(ĥ)    ĥ (%)   Accuracy (%)
PL    PL    1           QDC (MATLAB)     13    0.9979   1.000   0.997   88.9    99.7
PL    UD    75          3NN              1     0.9489   0.960   0.953   12.1    95.4
PL    OB    20          1NN              1     0.8627   0.880   0.952   2.8     94.9
UD    PL    10          LDC (MATLAB)     19    0.5398   0.680   0.658   2.9     65.9
UD    UD    1           QDC (MATLAB)     5     0.9994   1.000   0.995   73.6    99.5
UD    OB    5           LDC (MATLAB)     15    0.5652   0.680   0.622   1.9     62.4
OB    PL    10          LDC (MATLAB)     13    0.8250   0.920   0.770   5.8     77.6
OB    UD    5           1NN              4     0.6042   0.760   0.622   1.9     62.7
OB    OB    75          QDC (MATLAB)     2     1.0000   1.000   1.000   47.7    100.0
7.8.3 Different Day Comparison

We next present classification results when data recorded on multiple days were used as training and testing datasets. The following analysis provides a better understanding of how signals taken from the same tag, at the same frequency and orientation, but in subsequent recordings on multiple days, compare to each other. It is important to note that the data used here was collected with the RFID tag being held by hand above the antenna. While it was held as consistently as possible, it was not fixed in position. Additionally, each subsequent recording was done when environmental conditions were intentionally different from the previous recordings (humidity, temperature, etc.). Data was collected on four different days (Day 1, 2, 3, and 4). This data was used as training and testing sets for the classifiers. Amplitude (α(t)) and EPC error (e_EPC(t)) were both used as signal compression methods. Table 7.6 shows the top individual classifier configurations for the different-day tag recording comparison. Results are presented as sensitivity and specificity values for a threshold value ĥ that corresponds to the minimum distance d_ROC in the ROC space. Similarly, confusion matrices are presented in Fig. 7.15 for each classifier configuration listed in Table 7.6.
Fig. 7.14 Confusion matrices (L) for the classifier configurations corresponding to the minimum distance d_ROC(ĥ) over all combinations of (D_R, D_T), where D_R, D_T ∈ {PL, UD, OB}. Values of L range over [0, 1], with 0 → black and 1 → white here
The first thing to note in these results is the prevalence of the EPC error (e_EPC(t)) signal compression compared to the amplitude (α(t)) signal compression. This suggests that e_EPC(t) is better able to support correct classification of the RFID tags than the raw signal amplitude. Unlike the two previous sets of results, where one frequency and one orientation classified well compared to the others, there is no dominant subset here. All the different days classified similarly when tested against each other. This is expected, since there is no reason data trained on a specific day should perform better than any other. |AUC| values were mainly above 0.6700 yet below 0.7500, with accuracies ranging from 63.6 to 80.9% when D_R ≠ D_T. The confusion matrix representations of these classification results (Fig. 7.15) again indicate there is no single dominant training subset.
Table 7.6 The classifier configurations ranked by d_ROC(ĥ) over all values of the classifier configuration variables when trained and tested on the different day parameters Day 1, 2, 3, and 4. D_R and D_T correspond to the training and testing datasets, respectively. ρ represents the ratio of negative versus positive EPCs in the training set. The threshold ĥ corresponding to the minimum distance d_ROC is presented, along with the values of χ(ĥ) and ψ(ĥ)

D_R     D_T     EPC Comp.   #DWFP (ν)   Classifier       ρ     |AUC|    χ(ĥ)    ψ(ĥ)    ĥ (%)   Accuracy (%)
Day 1   Day 1   α, e_EPC    15          QDC (MATLAB)     2     0.9949   1.000   0.986   23.9    98.7
Day 1   Day 2   e_EPC       20          LDC (MATLAB)     4     0.7432   0.867   0.657   39.5    67.1
Day 1   Day 3   e_EPC       10          1NN              12    0.6735   0.800   0.662   9.1     67.1
Day 1   Day 4   e_EPC       10          QDC (MATLAB)     5     0.7287   0.667   0.724   37.6    72.0
Day 2   Day 1   e_EPC       20          LDC (MATLAB)     10    0.7443   0.800   0.748   42.4    75.1
Day 2   Day 2   α, e_EPC    20          QDC (MATLAB)     2     0.9990   1.000   0.986   42.4    98.7
Day 2   Day 3   e_EPC       20          3NN              1     0.7990   0.800   0.790   52.6    79.1
Day 2   Day 4   e_EPC       1           SVM              1     0.7083   0.800   0.733   20.1    73.8
Day 3   Day 1   e_EPC       1           SVM              1     0.7014   0.867   0.619   21.9    63.6
Day 3   Day 2   e_EPC       15          3NN              8     0.6919   0.800   0.719   4.6     72.4
Day 3   Day 3   α, e_EPC    50          QDC (MATLAB)     5     1.0000   1.000   1.000   72.8    100.0
Day 3   Day 4   e_EPC       50          3NN              7     0.6390   0.800   0.648   4.6     65.8
Day 4   Day 1   α, e_EPC    1           3NN              3     0.7705   0.800   0.710   17.9    71.6
Day 4   Day 2   e_EPC       5           1NN              3     0.7395   0.733   0.719   29.8    72.0
Day 4   Day 3   α           1           3NN              1     0.7422   0.667   0.819   57.6    80.9
Day 4   Day 4   α, e_EPC    50          LDC (PRTools)    1     1.0000   1.000   1.000   95.7    100.0
We see that the D_R = D_T results all show distinct diagonal lines, even with D_R, D_T = Day 4, where there are additional high off-diagonal entries in the matrix. This is indicated in Table 7.6 by the relatively high threshold value (ĥ) of 95.7. When D_R ≠ D_T, there are still faint diagonal lines present in some of the confusion matrices. For example, when D_R = Day 2 and D_T = Day 3, diagonal entries coming out of the lower left-hand corner are somewhat higher in accuracy (closer to white in the confusion matrix) than their surrounding off-diagonal entries. We see in Table 7.6 that this classifier has an |AUC| equal to 0.7990 and a 79.1% overall accuracy.
Fig. 7.15 Confusion matrices (L) for the classifier configurations corresponding to the minimum distance d_ROC(ĥ) over all combinations of (D_R, D_T), where D_R, D_T ∈ {Day 1, 2, 3, 4}. Values of L range over [0, 1], with 0 → black and 1 → white here
7.8.4 Damage Comparison

We next present the results of the RFID tag damage analysis, to explore how physical degradation affects the RFID signals and the resulting classification accuracy. The datasets from Days 1, 2, 3, and 4 are combined and used here as a single training set. The tags which make up this dataset, AD26–AD40, are split into two subsets: tags AD26–AD32 were subjected to a water damage study, while tags AD33–AD40 were subjected to a physical damage study. The AD-612 tags are neither waterproof nor embedded in a rigid shell of any kind, although many RFID tags exist that are sealed against the elements and/or encased in a shell for protection. For the water damage study, each tag was submerged in water for 3 hours, at which point they were patted dry
Table 7.7 The classifier configurations ranked by d_ROC(ĥ) over all values of the classifier configuration variables when trained and tested on the tag damage comparisons for both water and physical damage. D_R and D_T correspond to the training and testing datasets, respectively. ρ represents the ratio of negative versus positive EPCs in the training set. The threshold ĥ corresponding to the minimum distance d_ROC is presented, along with the values of χ(ĥ) and ψ(ĥ)

D_R           D_T            EPC Comp.   #DWFP (ν)   Classifier       ρ     |AUC|    χ(ĥ)    ψ(ĥ)    ĥ (%)   Accuracy (%)
Day 1,2,3,4   Wet            α           1           SVM              1     0.6361   0.714   0.786   80.1    73.8
Day 1,2,3,4   Wet-to-dry     α           1           3NN              17    0.7789   0.857   0.738   4.8     75.5
Day 1,2,3,4   Light damage   α           5           1NN              16    0.7589   0.750   0.839   17.9    82.8
Day 1,2,3,4   Heavy damage   α           20          LDC (PRTools)    7     0.7980   1.000   0.589   44.5    64.1
to remove any excess water, and data was collected (labeled as Wet). They were then allowed to air-dry overnight, and data was again collected (Wet-to-dry). For the physical damage study, each tag was first gently crumpled by hand (light damage) and subsequently balled up and then somewhat flattened (heavy damage), with data collected after each stage. Table 7.7 shows the top individual classifier configurations for the two RFID tag damage comparisons. Results are presented as sensitivity and specificity values for a threshold value ĥ that corresponds to the minimum distance d_ROC in the ROC space. Similarly, confusion matrices are presented in Fig. 7.16 for each classifier configuration listed in Table 7.7. The RFID tag damage classification results are similar to those of the preceding different-day comparison. The water damage did not seem to have a severe effect on the classification accuracy, while the more severe physical damage showed lower classifier accuracy. However, rather than relatively equal χ(ĥ) and ψ(ĥ) values, the heavy damage resulted in χ(ĥ) = 1.000 and ψ(ĥ) = 0.589, which means that the classifier was optimistically biased and over-classified positive matches. This lower accuracy was not unexpected, as deformation of the tag's antenna should distort the RF signal and therefore degrade the classifier's ability to identify a positive match for the tag.
7.9 Discussion

The results presented above suggest that a dominant reader frequency, 902 MHz in this case, may exist at which data can be initially collected for classifier training and then used to correctly identify tags read at alternate frequencies. In our analysis, we explored reader frequencies that span the North American UHF range, yet these were only part of the full 865–928 MHz UHF range for which the AD-612
Fig. 7.16 Confusion matrices (L) for the classifier configurations corresponding to the minimum distance d_ROC(ĥ) over all combinations of (D_R, D_T), where D_R = Day 1, 2, 3, and 4, and D_T ∈ {wet, wet-to-dry, light damage, heavy damage}. Values of L range over [0, 1], with 0 → black and 1 → white here
tags used here were optimized. Therefore, the dominant 902 MHz read frequency we observed lies near the center of the tags' actual operating frequency range. It is of no surprise that the tags perform best near the center of their optimized frequency range rather than at its upper limit. Similarly, a classifier can be trained on a tag orientation (relative to the reader antenna) that may result in accurate classification of RFID tags regardless of their subsequent orientation to the reader antenna. Antenna design for both the readers and the tags is an active field of research [2], and it is expected that the RF field will be non-uniform around the antennas. This could explain why only one of the experimental orientations used here performs better than the others. Regardless of the field strength, however, the unique variations in the RF signature of an RFID tag should still be present. It is promising that the classifier still had an accuracy of over 60% with these variations, and up to 94.9% accuracy when trained on the parallel (PL) orientation.
Changes in environmental conditions, like ambient temperature and relative humidity, were also allowed in the different-day study, where the RFID tags were suspended by hand near the antenna (in generally the same spot) for data collection on successive afternoons. It is important to note that the tags were not fixtured for this study, and that slight variations in both distance to the reader and orientation were inherent due to the human element. Even so, the classifier was generally able to correctly identify the majority of the RFID tags as being either a correct match or a correct mismatch when presented with a dataset it had never seen before, with accuracies ranging from 63.6 to 80.9%. This study represents a typical real-world application of RFID tags, due to these environmental and human variations. As previously mentioned, the comparison of EPC compression methods tended to favor the EPC error signal e_EPC(t), although there was not a large difference in classifier performance between the different-day comparison, which used both α(t) and e_EPC(t) compression, and the frequency/orientation comparisons, which used only α(t) compression. The parameter ρ had a large spread of values across the classifiers, indicating that the classification results may not be very sensitive to class imbalance within the training set. The number of DWFP features also shows no consistent trend in our results, other than often being larger than 1, indicating that there may be room for feature reduction. With any application of pattern classification, a reduction in the feature space through feature selection can lead to improved classification results [44]. Individual feature ranking is one method that can be used to identify features on a one-by-one basis; however, it can overlook the usefulness of combining feature variables. In combination with a nested selection method like sequential backward floating search (SBFS), the relative usefulness of the DWFP features, as well as the remaining features, can be evaluated [51]. The results in Tables 7.4–7.6 where D_R = D_T are comparable to those observed by Bertoncini et al. [20], with some classifiers having 100% accuracy and the rest near 99%. In these instances, the classifier was trained on subsets of the testing data, so it is expected that the classifier performs better in these cases. It is also important to note that the final decision threshold ĥ used can still vary greatly depending on the classifier's application. Adjusting the classifier's final threshold value does not alter the underlying classification outputs: the |AUC| takes into account all possible threshold values, and is therefore fixed for each classifier configuration. The threshold value only determines the distribution of error types, χ vs. ψ, within the results. Aside from the minimum d_ROC metric, weights can be applied to determine an alternate threshold if the application calls for a trade-off between false negative and false positive results. For example, if a user is willing to allow up to five false positives before allowing a false negative, a minimizing function can be used to identify this weighted optimal threshold. A comparison of different methods of determining a final threshold can be seen in Table 7.8, where the classifier configuration trained on Day 1 data and tested on Day 4 data from Table 7.6 is presented for several alternate threshold values.
First, the threshold is shown for the minimum distance d_ROC(ĥ), as was previously presented in Table 7.6. The threshold is then shown for the minimum number of total misclassifications (f+ + f−), followed by the minimum number of false positives (f+),
Fig. 7.17 A plot of sensitivity χ(h) (solid line) and specificity ψ(h) (dashed line) versus threshold for the classifier trained on Day 1 and tested on Day 4 from Table 7.6. Threshold values are shown corresponding to min(f+ + f−) (h = 63.4, dash-dot line) as well as ĥ (h = 37.6, dotted line). The threshold value determines the values of χ and ψ, and is chosen based on the classifier's final application
and then by the lowest number of false negatives (f−). Several weighting ratios are then shown, in which the cost of returning a false negative (f−) is increased relative to that of returning a false positive (f+). For example, a weighting ratio of 1:5 [f− : f+] means that 5 f+ cost as much as a single f−, putting more emphasis on reducing the number of f− present. It can be seen that the values of χ(h) and ψ(h) change as the threshold h changes. An overly optimistic classifier is the result of threshold values that are too low, when all classifications are identified as positive matches (χ ≈ 1 and ψ ≈ 0). Alternatively, an overly pessimistic classifier is the result of threshold values that are too high, resulting in all classifications identified as negative matches (χ ≈ 0 and ψ ≈ 1). The weighting ratio 1:10 [f− : f+] returns the most even values of χ and ψ, which matches the ĥ threshold. Figure 7.17 shows an example of the trade-off between the values of χ(h) and ψ(h) as the threshold h is increased, where two examples from Table 7.8 are highlighted. It is useful to discuss a few examples here to better understand the different threshold results. In a security application, for example, where RFID badges are used to control entry into a secure area, it is most important to minimize the number of f+ results, because allowing a cloned RFID badge access to a secure area could be devastating. In this situation, we would want the value of ψ to be as close to 1.0 as possible. In Table 7.8, this result corresponds to a threshold value of h = 65.3. Unfortunately, the value of χ at this threshold is 0.067, which means that almost all of the true ID badges would also be identified as negative matches. Therefore, that specific classifier is not appropriate for a security application. An alternate example is the use of RFID-embedded credit cards in retail point of sale. To a store, keeping the business of a repeat customer may be much more valuable than losing some merchandise to a cloned RFID credit card. In this sense, it is useful to determine an appropriate weight of [f− : f+] that balances the gains and losses of both cases. If it were determined that a repeat customer would bring in 20 times as much revenue as it would cost to refund a fraudulent charge due to a cloned RFID account, then a weight of 1:20 [f− : f+] could be used to determine the optimal
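As a sketch of how such a weighted threshold could be computed, assume fp(h) and fn(h) have been counted over the threshold sweep hs (as in the earlier hypothetical helper); a cost ratio of 1:w [f− : f+] then means w false positives cost as much as one false negative. Names here are illustrative.

```matlab
w = 20;                          % the 1:20 retail example discussed in the text
cost = fn + fp / w;              % weighted misclassification cost per threshold
[~, kbest] = min(cost);          % minimize the weighted cost
hweighted = hs(kbest);           % application-specific decision threshold
```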
Table 7.8 The classifier configuration trained on Day 1 and tested on Day 4 data from Table 7.6, using several different metrics to determine the final threshold h. Metrics include min(f+ + f−), min(f+), min(f−), and weights of 1:5, 1:10, 1:15, and 1:20 for [f+ : f−], meaning that up to 20 f+ are allowed for each f−. The choice of metric used to determine the final threshold value depends on the classifier's final application. The classifier configuration (D_R = Day 1, D_T = Day 4, e_EPC compression, 10 DWFP features (ν), QDC (MATLAB), ρ = 5, |AUC| = 0.7287) is the same for every row

Sorted by            χ       ψ       h (%)   Accuracy (%)
min d_ROC(ĥ)         0.667   0.724   37.6    72.0
min(f+ + f−)         0.133   0.995   63.4    93.8
min(f+)              0.067   1.000   65.3    93.8
min(f−)              1.000   0.000   0.1     6.7
1:5  [f+ : f−]       0.200   0.981   56.7    92.9
1:10 [f+ : f−]       0.667   0.724   37.6    72.0
1:15 [f+ : f−]       0.867   0.529   30.8    55.1
1:20 [f+ : f−]       0.933   0.443   28.2    47.6
classifier threshold. From Table 7.8, it can be seen that a corresponding threshold is h = 28.2, resulting in values of χ = 0.933 and ψ = 0.443. This classifier would incorrectly identify 7% of the repeat customers as being fraudulent while correctly identifying 44% of the cloned signals as being fraudulent. This specific classifier could be useful in this retail example.
7.10 Conclusion

The USRP software-defined radio system has been shown to capture signals at a usable level of detail for RFID tag classification applications. Since the signal manipulations are performed in software, this allows us not only to extract the raw RF signal, but also to generate our own ideal signal to compare against. A new signal representation, the difference between the recorded and ideal signals, e_EPC(t), has been created this way and has proven very useful in the classification routines. The binary classification routine has been explored on more real-world grounds, including exposure to a variety of environmental conditions without the use of RF shielding to boost the SNR. To explore classifier robustness without fixed proximity and orientation relative to the RFID reader, several validation classifications were performed, including an RFID reader frequency comparison, a tag orientation comparison, a multi-day data collection comparison, and physical damage and water exposure comparisons. The frequency comparison was performed to determine the effect that the variability of potential RFID readers' inspection frequencies would have on the classification routines. The results were promising, although not perfect, and suggest that while it is best to train a classifier on all possible scenarios, a main frequency (i.e., a center frequency) could potentially be used for a master classifier training set. A similar orientation comparison was done, altering the RFID tag's orientation relative to the antenna. Again, the results showed it was best to train the classifiers on the complete set of data; however, there was again promise for a potential single main orientation that could be used to train a classifier. In the multi-day collection comparison, data was collected by hand in an identical fashion but on separate days. The results showed that the inconsistency associated with holding an RFID tag near the antenna caused the classifiers to have trouble correctly identifying EPCs as coming from their correct tag. Two further comparisons were performed to assess the effect that physical degradation had on the RFID tags. When subjected to water, promising classifier configurations were found that were on the same level of accuracy as results seen for undamaged tags, suggesting that water may not have a significant effect on the RFID classification routines. A separate subset of the RFID tags was subjected to a similar degradation analysis, this time with physical bending as a result of being crumpled by hand. The results show that, as expected, bending of the RFID tag's antenna caused degradation in the raw signal that caused the classifier to misclassify many tags.
Applications of RFID technology that implement a fixed tag position are a potential market for the classification routine we present. One example of this is with ePassports, which are embedded with an RFID chip containing digitally signed biometric information [7]. These passports are placed into a reader that controls the position and distance of the RFID chip relative to the antennas. Additionally, passports are generally protected from the elements and can be replaced if they undergo physical wear and tear. We have demonstrated a specific emitter identification technique that performs well given these restrictions. Acknowledgements This work was performed using computational facilities at the College of William and Mary which were provided with the assistance of the National Science Foundation, the Virginia Port Authority, Sun Microsystems, and Virginia’s Commonwealth Technology Research Fund. Partial support for the project is provided by the Naval Research Laboratory and the Virginia Space Grant Consortium. The authors would like to thank Drs. Kevin Rudd and Crystal Bertoncini for their many helpful discussions.
References 1. Ngai EWT, Moon KKL, Riggins FJ, Yi CY (2008) RFID research: an academic literature review (1995–2005) and future research directions. Int J Prod Econ 112(2):510–520 2. Abdulhadi AE, Abhari R (2012) Design and experimental evaluation of miniaturized monopole UHF RFID tag antennas. IEEE Antennas Wirel Propag Lett 11:248–251 3. Khan MA, Bhansali US, Alshareef HN (2012) High-performance non-volatile organic ferroelectric memory on banknotes. Adv Mat 24(16):2165–2170 4. Juels A (2006) RFID security and privacy: a research survey. IEEE J Sel Areas Commun 24(2):381–394 5. Halamka J, Juels A, Stubblefield A, Westhues J (2006) The security implications of verichip cloning. J Am Med Inform Assoc 13(6):601–607 6. Heydt-Benjamin T, Bailey D, Fu K, Juels A, O’Hare T (2007) Vulnerabilities in first-generation RFID-enabled credit cards. In: Dietrich S, Dhamija R (eds) Financial Cryptography and Data Security, vol 4886. Lecture Notes in Computer Science. Springer, Berlin/Heidelberg, pp 2–14 7. Richter H, Mostowski W, Poll E (2008) Fingerprinting passports. In: NLUUG Spring conference on security, pp 21–30 8. White D (2005) NCR: RFID in retail. In: RFID: applications, security, and privacy, pp 381–395 9. Westhues J (2005) Hacking the prox card. In: RFID: applications, security, and privacy, pp 291–300 10. Smart Card Alliance (2007) Proximity mobile payments: leveraging NFC and the contactless financial payments infrastructure. Whitepaper 11. Léopold E (2009) The future of mobile check-in. J Airpt Manag 3(3):215–222 12. Wang MH, Liu JF, Shen J, Tang YZ, Zhou N (2012) Security issues of RFID technology in supply chain management. Adv Mater Res 490:2470–2474 13. Juels A (2004) Yoking-proofs for RFID tags. In: Proceedings of the second annual IEEE pervasive computing and communication workshops (PERCOMW 2004), PERCOMW ’04, pp 138–143 14. Juels A (2005) Strengthening epc tags against cloning. In: Proceedings of the 4th ACM workshop on wireless security, WiSe ’05, pp 67–76. ACM, New York 15. Riezenman MJ (2000) Cellular security: better, but foes still lurk. IEEE Spectr 37(6):39–42 16. Suski WC, Temple MA, Mendenhall MJ, Mills RF (2008) Using spectral fingerprints to improve wireless network security. In: Global telecommunications conference, 2008. IEEE GLOBECOM 2008. IEEE. IEEE, New Orleans, pp 1–5
17. Gerdes RM, Mina M, Russell SF, Daniels TE (2012) Physical-layer identification of wired ethernet devices. IEEE Trans Inf Forensics Secur 7(4):1339–1353
18. Kennedy IO, Scanlon P, Mullany FJ, Buddhikot MM, Nolan KE, Rondeau TW (2008) Radio transmitter fingerprinting: a steady state frequency domain approach. In: Proceedings of the IEEE 68th vehicular technology conference (VTC 2008), pp 1–5
19. Danev B, Heydt-Benjamin TS, Čapkun S (2009) Physical-layer identification of RFID devices. In: Proceedings of the 18th conference on USENIX security symposium, SSYM'09. USENIX Association, Berkeley, CA, USA, pp 199–214
20. Bertoncini C, Rudd K, Nousain B, Hinders M (2012) Wavelet fingerprinting of radio-frequency identification (RFID) tags. IEEE Trans Ind Electron 59(12):4843–4850
21. Zanetti D, Danev B, Čapkun S (2010) Physical-layer identification of UHF RFID tags. In: Proceedings of the sixteenth annual international conference on mobile computing and networking, MobiCom '10. ACM, New York, NY, USA, pp 353–364
22. Romero HP, Remley KA, Williams DF, Wang C-M (2009) Electromagnetic measurements for counterfeit detection of radio frequency identification cards. IEEE Trans Microw Theory Tech 57(5):1383–1387
23. EPCglobal Inc. (2008) EPC radio-frequency identity protocols: class-1 generation-2 UHF RFID protocol for communications at 860 MHz–960 MHz, version 1.2.0
24. GNU Radio Website (2011) Software. http://www.gnuradio.org
25. Ellis KJ, Serinken N (2001) Characteristics of radio transmitter fingerprints. Radio Sci 36(4):585–597
26. Hou JD, Hinders MK (2002) Dynamic wavelet fingerprint identification of ultrasound signals. Mater Eval 60(9):1089–1093
27. Haralick RM, Shapiro LG (1992) Computer and robot vision, vol 1. Addison-Wesley, Boston, MA
28. Learned RE, Willsky AS (1995) A wavelet packet approach to transient signal classification. Appl Comput Harmon Anal 2(3):265–278
29. Feng Y, Schlindwein FS (2009) Normalized wavelet packets quantifiers for condition monitoring. Mech Syst Signal Process 23(3):712–723
30. de Sena A, Rocchesso D (2007) A fast Mellin and scale transform. EURASIP J Adv Signal Process 2007(1):089170
31. Harley JB, Moura JMF (2011) Guided wave temperature compensation with the scale-invariant correlation coefficient. In: 2011 IEEE international ultrasonics symposium (IUS), Orlando, FL, pp 1068–1071
32. Harley JB, Ying Y, Moura JMF, Oppenheim IJ, Sobelman L (2012) Application of Mellin transform features for robust ultrasonic guided wave structural health monitoring. AIP Conf Proc 1430:1551–1558
33. Sundaram H, Joshi SD, Bhatt RKP (1997) Scale periodicity and its sampling theorem. IEEE Trans Signal Proc 45(7):1862–1865
34. Japkowicz N, Stephen S (2002) The class imbalance problem: a systematic study. Intell Data Anal 6(5):429–449
35. Weiss GM, Provost F (2001) The effect of class distribution on classifier learning: Technical Report ML-TR-44. Department of Computer Science, Rutgers University
36. Kubat M, Matwin S (1997) Addressing the curse of imbalanced training sets: one-sided selection. In: Proceedings of the 14th international conference on machine learning. Morgan Kaufmann Publishers, Inc, Burlington, pp 179–186
37. Duda RO, Hart PE, Stork DG (2001) Pattern classification, 2nd edn. Wiley, New York
38. Fukunaga K (1990) Introduction to statistical pattern recognition. Computer science and scientific computing, 2nd edn. Academic Press, Boston
39. Kuncheva LI (2004) Combining pattern classifiers. Wiley, New York
40. Bishop CM (2006) Pattern recognition and machine learning. Springer, Berlin
41. Webb AR (2012) Statistical pattern recognition. Wiley, Hoboken
42. Ripley BD (1996) Pattern recognition and neural networks. Cambridge University Press, Cambridge
43. Kanal L (1974) Patterns in pattern recognition: 1968–1974. IEEE Trans Inf Theory 20(6):697– 722 44. Jain AK, Duin RPW, Mao J (2000) Statistical pattern recognition: a review. IEEE Trans Patt Anal Mach Intell 22(1):4–37 45. Watanabe S (1985) Pattern recognition: human and mechanical. Wiley-Interscience publication, Hoboken 46. Blum AL, Langley P (1997) Selection of relevant features and examples in machine learning. Artif Intell 97(1):245–271 47. Jain AK, Chandrasekaran B (1982) Dimensionality and sample size considerations in pattern recognition practice. In: Krishnaiah PR, Kanal LN (eds) Classification pattern recognition and reduction of dimensionality. Handbook of Statistics, vol 2. Elsevier, Amsterdam, pp 835–855 48. Saeys Y, Inza I, Larrañaga P (2007) A review of feature selection techniques in bioinformatics. Bioinformatics 23(19):2507–2517 49. Dash M, Liu H (1997) Feature selection for classification. Intellect Data Anal 1(1–4):131–156 50. Raymer ML, Punch WF, Goodman ED, Kuhn LA, Jain AK (2000) Dimensionality reduction using genetic algorithms. IEEE Trans Evol Comput 4(2):164–171 51. Guyon I, Elisseeff A (2003) An introduction to variable and feature selection. J Mach Learn Res 3:1157–1182 52. Fan J, Lv J (2010) A selective overview of variable selection in high dimensional feature space. Stat Sin 20(1):101–148 53. Romero E, Sopena JM, Navarrete G, Alquézar R (2003) Feature selection forcing overtraining may help to improve performance. In: Proceedings of the international joint conference on neural networks, 2003, vol 3. IEEE, Portland, OR, pp 2181–2186 54. Lambrou T, Kudumakis P, Speller R, Sandler M, Linney A (1998) Classification of audio signals using statistical features on time and wavelet transform domains. In: Proceedings of the 1998 IEEE international conference on acoustics, speech and signal processing. vol 6. IEEE, Washington, pp 3621–3624 55. Smith SW (2003) Digital signal processing: a practical guide for engineers and scientists. Newnes, Oxford 56. Devroye L, Györfi L, Lugosi G (1996) A probabilistic theory of pattern recognition. Applications of mathematics. Springer, Berlin 57. Theodoridis S, Koutroumbas K (1999) Pattern recognition. Academic Press, Cambridge 58. Duin RPW, Juszczak P, Paclik P, Pekalska E, de Ridder D, Tax DMJ, Verzakov S (2007) PRTools4.1, A Matlab toolbox for pattern recognition 59. Cristianini N, Shawe-Taylor J (2000) An introduction to support vector machines and other kernel-based learning methods. Cambridge University Press, Cambridge 60. Fawcett T (2006) An introduction to ROC analysis. Pattern Recognit Lett 27(8):861–874 61. Hanley JA, McNeil BJ (1983) A method of comparing the areas under receiver operating characteristic curves derived from the same cases. Radiology 148(3):839–843 62. Bradley AP (1997) The use of the area under the ROC curve in the evaluation of machine learning algorithms. Pattern Recognit 30(7):1145–1159 63. Lobo JM, Jiménez-Valverde A, Real R (2007) Auc: a misleading measure of the performance of predictive distribution models. Glob Ecol Biogeogr 17(2):145–151 64. Hanczar B, Hua J, Sima C, Weinstein J, Bittner M, Dougherty ER (2010) Small-sample precision of ROC-related estimates. Bioinformatics 26(6):822–830 65. Hand DJ (2009) Measuring classifier performance: a coherent alternative to the area under the ROC curve. Mach Learn 77(1):103–123 66. Rao KVS, Nikitin PV, Lam SF (2005) Antenna design for UHF RFID tags: a review and a practical application. IEEE Trans Antennas Propag 53(12):3870–3876
Chapter 8
Pattern Classification for Interpreting Sensor Data from a Walking-Speed Robot

Eric A. Dieckman and Mark K. Hinders
Abstract In order to perform useful tasks for us, robots must have the ability to notice, recognize, and respond to objects and events in their environment. This requires the acquisition and synthesis of information from a variety of sensors. Here we investigate the performance of a number of sensor modalities in an unstructured outdoor environment including the Microsoft Kinect, thermal infrared camera, and coffee can radar. Special attention is given to acoustic echolocation measurements of approaching vehicles, where an acoustic parametric array propagates an audible signal to the oncoming target and the Kinect microphone array records the reflected backscattered signal. Although useful information about the target is hidden inside the noisy time-domain measurements, the dynamic wavelet fingerprint (DWFP) is used to create a time–frequency representation of the data. A small-dimensional feature vector is created for each measurement using an intelligent feature selection process for use in statistical pattern classification routines. Using experimentally measured data from real vehicles at 50 m, this process is able to correctly classify vehicles into one of five classes with 94% accuracy. Keywords Acoustic parametric array · Mobile robotics · Wavelet fingerprint · Vehicle classification
8.1 Overview

Useful robots are able to notice, recognize, and respond to objects and events and then make decisions based on this information in real time. I find myself strongly disapproving of drivers checking their smartphones while I'm trying to jaywalk with a cup of coffee from WaWa. I think they should watch out for me, because I'm trying not to spill. Since I'm too lazy to go down to the crosswalk and wait for the light, I usually check for oncoming traffic and note both what type of vehicles are coming my way and estimate my chances of being able to cross safely without spilling on
myself. While this may be a simple task for a human, an autonomous robotic assistant must exhibit many human behaviors to successfully complete it. Now consider having a robotic assistant fetch me a cup of coffee from across the street [1], which I would find useful. In particular, we will consider the sensing aspects of creating a useful autonomous robot. A robot must be able to sense and react to objects and events occurring over a range of distances. For our case of a mobile walking-speed robot, this includes long-range sensors that can detect dangers such as oncoming motor vehicles in time to evade, as well as close-range sensors that provide more information about stationary objects in the environment. In addition, sensors must be able to provide useful information in a variety of environmental conditions. While an RGB camera may provide detailed information in a well-lit environment, it is less useful on a foggy night. The key to creating a useful autonomous robot is to equip the robot with a number of complementary sensors so that it can learn about its environment and make decisions. In particular, we are interested in the use of acoustic echolocation as a long-range sensor modality for mobile robotics. While sonar has long been used as a sensor in underwater environments, the short propagation of ultrasonic waves in air has restricted its use elsewhere. Lower frequency acoustic signals in the audible range are able to propagate long distances in air, but traditional methods of creating highly directional audible acoustic signals require very large speaker arrays not feasible for a mobile robot. In addition, the complex interactions of these signals with objects in the environment and ubiquitous environmental noise make the reflected signals very difficult to analyze. We use an acoustic parametric array to generate our acoustic echolocation signal. This is a physically small speaker that uses nonlinear acoustics to create a tight beam of low-frequency sound that can propagate long distances [34–38]. Such a highly directional signal provides good spatial resolution that allows a distinction between the target and environmental clutter. Systematic experimental investigations and simulations allow us to study the propagation of these nonlinear sound beams and their interaction with scatterers [39, 40]. These sensor signals are very noisy, making it difficult for the robot to extract useful information. One common technique that can provide additional insight is to transform the problem into an alternate domain. For the simple case of a one-dimensional time-domain signal, this most commonly takes the form of the Fourier transform. While this converts the signal to the frequency domain and can reveal previously hidden information, all time-domain information is lost in the transformation. A better solution for time-domain data is to transform the original signal into a joint time–frequency domain. This can be accomplished by a number of methods, but there is no one best time–frequency representation: uncertainty limits restrict simultaneous time and frequency resolution, some methods are very complex and hard to implement, and the resulting two-dimensional images can be even more difficult to analyze than the original signal.
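As a simple point of reference, the short-time ("boxcar") FFT below turns a noisy 1D test signal into the kind of time–frequency image discussed here; the DWFP developed in earlier chapters is one alternative to it. The signal parameters are arbitrary test values, and both functions require the Signal Processing Toolbox.

```matlab
fs = 44100;                                    % assumed sampling rate
t  = 0:1/fs:0.5;
x  = chirp(t, 500, 0.5, 4000) + 0.2*randn(size(t));    % noisy swept test echo
spectrogram(x, rectwin(512), 448, 1024, fs, 'yaxis');  % boxcar-FFT image
```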
8.2 rMary

Creating an autonomous robot able to act independently of human control has long been an area of active research in robotics. New low-cost sensors and recent advances in the signal processing necessary to analyze large amounts of streaming data have only increased the number of researchers focusing on autonomous robotics, buoyed by a greater public awareness of the field. In particular, DARPA-funded competitions have enabled focused efforts to create autonomous vehicles and humanoid robots. The 2004 DARPA Grand Challenge and follow-on 2007 DARPA Urban Challenge [15] focused on creating autonomous vehicles that could safely operate over complex courses in rural and urban environments. These competitions rapidly expanded the boundaries of the field, leading to recent near-commercial possibilities such as Google's self-driving cars [16, 17]. Similarly, the recent and rapid rise of unmanned aerial vehicles (UAVs) has led to a large amount of research in designing truly autonomous drone aircraft.

Designing autonomous vehicles, whether surface or aerial, comes with its own difficulties, namely, collecting and interpreting data at a fast enough rate to make decisions. This often requires expensive sensors that only large research programs can afford. Commercialization of such technologies will require lower cost alternative sensor modalities.

A more tractable problem is the design of walking-speed robotic platforms. Compared to road or air vehicles, the lower speed of such platforms allows the use of low-cost sensor modalities that take longer to acquire and analyze data, since more time is available to make a decision. Using commercially available wheeled platforms (such as all-terrain vehicles) shifts the focus from the engineering problems of creating a humanoid robot to the types of sensors used and how their data can be combined. For these reasons, we will focus on the analysis of different sensor modalities for a walking-speed robot. The goal in autonomous robotics is to create a robot with the ability to perform tasks normally accomplished by a human. An added bonus is the ability to do tasks that are too dangerous for humans, such as entering hazardous environments in disaster situations.
8.2.1 Sensor Modalities for Mobile Robots

We can classify sensor modalities as "active" or "passive" depending on whether they transmit a signal (e.g., radar) or use information already present in the environment (e.g., an RGB image), respectively. The use of passive sensors is often preferred, both to reduce the possibility of detection in covert operations and to reduce annoyance. Another important consideration is the range at which a given sensor performs. For example, imaging systems can provide detailed information about objects near the sensor but may not detect fast-moving hazards (such as an oncoming vehicle) at a great enough distance to allow a robot time to evade. Long-range sensors such as
radar or LIDAR are able to detect objects at a greater distance, giving the robot more time to maneuver out of the way. This long-range detection often requires expensive sensors which don't provide detailed information about the target. A combination of near- and long-range sensors will give a robot the most information about its environment.

Once the sensor has measured some information about its environment, the robot needs to know how to interact. A real-world example of this difficulty comes from agriculture, where smart machines have the ability to replace human workers in repetitive tasks. One agricultural application in particular is the thinning of lettuce, where human laborers paid by the acre thin healthy plants unnecessarily. A robotic "Lettuce Bot" is towed behind a tractor, imaging individual lettuce plants as it passes and using computer vision algorithms to compare these images to a database of over a million images to decide which plants to remove by dousing them with a concentrated dose of fertilizer [18]. Though this machine claims 98% accuracy while driving at 2 kph and may be cost-competitive with manual labor, it also highlights issues with image-based analysis on mobile robots. Creating a large enough database for different types of lettuce is a monumental task, given the different colors, shapes, soil types, and other variables. Even the sun creates problems, causing shadows that are difficult for the computer vision software to correctly match. Shielding the sensor and restricting the image database to a particular geographical region (thereby reducing the number of lettuce variants and soil types) allow these techniques to work for this particular application, but the approach is not scalable to more unstructured environments. While the human brain has evolved to process images quickly and easily, automated interpretation of images is a difficult problem. Using non-imaging sensors can ease the signal processing requirements, but requires sophisticated machine learning techniques to deal with large amounts of abstract data.

In addition to the range limitations of different sensors and the difficulty in analyzing the resulting data, individual sensor modalities tend to work better in particular environmental conditions. For example, a webcam can create detailed images in a well-lit environment but fails to provide much useful information on a dark night, while passive infrared imagers can detect subtle changes in emissivity from surfaces in a variety of weather conditions. Because of the limitations of individual sensors, intelligent combinations of complementary sensors must be used to create the most robust awareness of an unstructured environment. The exact manner in which these modalities are combined is referred to as data fusion [19].

Our focus is on the performance of different sensor modalities in real-world, unstructured environments under a variety of environmental conditions. Our robotic sensor platform, rMary, contains a number of both passive and active sensors. Passive vision-based sensors include a standard RGB webcam and infrared sensors operating in both the near-infrared and long-wave regions of the infrared spectrum. Active sensors include a three-dimensional depth mapping system that uses infrared projection, a simple radar system, and a speaker/microphone combination to perform acoustic echolocation in the audible range. We will apply machine learning algorithms to these data to automatically detect and classify oncoming vehicles at long distances.
8.3 Investigation of Sensor Modalities Using rMary

To collect data in unstructured outdoor environments, we have created a mobile sensor platform with multiple sensors (Fig. 8.1). This platform, named rMary, was first placed in service in 2003 to collect infrared data from passive objects [20]. The robot is remotely operated using a modified RC aircraft controller and is steered using four independent drive motors synced to allow agile skid-steering. Power is supplied to these motors from a small battery bank built into the base of the platform, where the control electronics are also located. The low center of gravity, inflatable rubber tires, and a custom-built suspension system allow off-road transit to acquire measurements in otherwise inaccessible locations. The current sensors on rMary include the following:

• Raytheon ControlIR 2000B long-wave infrared camera
• Microsoft Kinect (2010)
  – Active IR projector
  – IR and RGB sensors
  – 4-channel microphone array
• Sennheiser Audiobeam acoustic parametric array
• Coffee can FMCW ISM-band radar

A parabolic dish microphone can also be attached, but the Kinect microphone array provides superior audio recordings. Sensor control and data acquisition are accomplished using a low-powered Asus EeePC 1000h Linux netbook. This underpowered laptop was deliberately used to show that data can be easily acquired with commodity hardware. The computer's single internal USB hub does restrict the number of simultaneous data streams, which only became an issue when trying to collect video data from multiple hardware devices using non-optimized drivers. Each of the sensors, whose locations are shown in Fig. 8.1, will be discussed in the sections that follow.
8.3.1 Thermal Infrared (IR)

A Raytheon ControlIR 2000B infrared camera couples a long-wave focal plane array microbolometer detector to a 50 mm lens to provide 320 × 240 resolution at 30 Hz over an 18° × 13.5° field of view. Although thermal imaging cameras are now low cost and portable enough to be used by home inspectors for energy audits, this was one of the few uncooled, portable infrared imaging systems available when first installed in 2006. These first experiments showed that the sensor was able to characterize passive (non-heat-generating) objects through small changes in their thermal signatures [21, 22]. The sensor measures radiance in the long-wave region (8–15 μm) of the infrared spectrum, where radiation from passive objects is at its maximum.
Fig. 8.1 The rMary sensor platform contains a forward-looking long-wave infrared camera, mounted upright in an enclosure for stability and weather protection, an acoustic parametric array, the Microsoft Kinect sensor bar, and a coffee can radar. All sensors are powered by the on-board battery bank and controlled with a netbook computer running Linux
For stability and protection from the elements, the camera is mounted vertically in an enclosed locker. A polished aluminum plate with a low emissivity value makes a good reflector of thermal radiation and allows the camera to image objects in front of rMary. Figure 8.2 shows several examples of images of passive objects acquired with the thermal infrared camera, both indoors and outside.
8.3.2 Kinect

Automated interpretation of images from the thermal infrared camera requires segmentation of the images to distinguish areas of interest, which can be a difficult image processing task. In addition, the small field of view and low resolution of the infrared camera used here led us to investigate possible alternatives. While there are still relatively few long-wave thermal sensors with enough sensitivity to measure the small differences in emissivity between passive objects, other electronics now contain infrared sensors. One of the most exciting alternatives was the Microsoft Kinect, released in November 2010 as an accessory to the Xbox 360 gaming console. The Kinect was immensely popular, selling millions of units in the first several months, and integrates active infrared illumination, an IR sensor, and an RGB camera to output 640 × 480 RGBD (RGB + depth) video at 30 Hz. It also contains a tilt motor, accelerometer, and 4-channel microphone array, all at a total cost of less than USD $150.
Fig. 8.2 Examples of passive objects imaged with the thermal IR camera include (clockwise from top left) a car in front of a brick wall, a tree trunk with foliage, a table and chairs in front of a bookcase, and a window in a brick wall
Access to this low-cost suite of sensors is provided by two different open source driver libraries: libfreenect [23], with a focus on audio support and motor control, and OpenNI [24], with a greater focus on skeletal tracking and object segmentation. Other specialized libraries such as nestk [25] used these drivers to provide high-level functions and ease of use. In June 2011, Microsoft released its own SDK that provides access to the raw sensor streams and high-level functions, but these libraries only worked in Windows 7 and are closed-source with restrictive licenses [26]. In addition, Microsoft changed the license agreement in March 2012 to require use of the "Kinect for Windows" sensor instead of the identical but cheaper Kinect sensor for Xbox.

We investigate the usefulness and limitations of the Kinect sensor for robotics, particularly the raw images recorded from the infrared sensor and the depth-mapped RGB-D images. Since our application is more focused on acquiring raw data for later processing than on utilizing the high-level skeletal tracking algorithms, we use the libfreenect libraries to synchronously capture RGB-D video and multi-channel audio streams from the Kinect.

The Kinect uses a structured light approach similar in principle to [27] to create a depth mapping. An infrared projector emits a known pattern of dots, allowing the calculation of depth based on triangulation of the specific angle between the emitter
and receiver, an infrared sensor with 1280 × 1024 resolution. The projected pattern is visible in some situations in the raw image from the infrared sensor, to which the open-source drivers allow access. To reduce clutter in the depth mapping, the infrared sensor also has a band-stop filter at the projector's output wavelength of 830 nm. The Kinect is able to create the resulting 640 × 480 resolution, 11-bit depth-mapped images at video frame rates (30 Hz). The stated range of the depth sensing is 1.2–3.5 m, but in the right environments it can extend to almost 6 m. An example of this image for an indoor environment is shown in Fig. 8.3, along with the raw image from the IR sensor and a separate photograph for comparison. In addition to this colormapped depth image, the depth information can also be overlaid on the RGB image acquired from a 1280 × 1024 RGB sensor to create a three-dimensional point-cloud representation (Fig. 8.4).

Since the Kinect was designed to work as an accessory to a gaming system, it works well in indoor environments, and others have evaluated its applications to indoor robotics, object segmentation and tracking, and three-dimensional scanning. Figure 8.5 shows a sampling of the raw IR and depth-mapped images for several outdoor objects. The most visible feature when compared to images acquired in indoor environments is that the raw infrared images are very well illuminated, or even over-exposed. Because of this, the projector pattern is difficult to detect in the infrared image, and the resulting depth-mapped images don't tend to have much structure. There is more likely to be useful depth information if the object being imaged is not in direct sunlight and/or is located very close to the sensor.

Unlike the thermal IR camera, which operates in the long-wave region of the IR regime, the Kinect's infrared sensor operates in the near-infrared. This is necessary so that a distance can be calculated from the projected image, but the proximity of the near-infrared to the visible spectrum allows the sensor to become saturated. Figure 8.6 shows a series of images of the same scene as the sun emerges from behind a cloud. As there is more sunlight, the infrared sensor becomes saturated and no depth mapping can be constructed. Although the Kinect's depth mapping is of little use in outdoor environments during the day, it may still be useful outside at night. However, the point-cloud library representation will not work at night because it requires a well-illuminated webcam image on which to overlay the depth information. An example of the usefulness of the Kinect depth mapping at night is shown in Fig. 8.7, where the depth mapping highlights obstacles not visible in normal webcam images.

In summary, the Kinect's depth sensor will work outside under certain circumstances. Unfortunately, the Kinect's infrared sensor will not replace the more expensive thermal imaging camera to detect small signals from passive objects, since it operates in the near-infrared regime instead of the long-wave regime that is more sensitive to such signals. However, very small and inexpensive LWIR cameras are now available as smartphone attachments.
Fig. 8.3 In an indoor environment, the Kinect is able to take a raw infrared image (top) and convert it to a corresponding depth-mapped image (middle), which overlays depth information on an RGB image. The speckle pattern barely visible on the raw infrared image is the projected infrared pattern that allows the Kinect to create this depth mapping. As shown in this mapping, darker colored surfaces, such as the desk chair on the left of the image, are closer to the sensor while lighter colors are farther away. Unexpected infrared reflectors can confuse this mapping and produce erroneous results such as the light fixture in the center of the image. The bottom image is a photograph of the same scene (with the furniture slightly rearranged) for comparison
Fig. 8.4 Instead of the two-dimensional colormapped images, the Kinect depth-mapping can be overlaid on the RGB image and exported to a point-cloud format. These point-cloud library (PCL) images contain real-world distances and allow for three-dimensional visualization on a computer screen. Examples are shown for an indoor scene (top) and an outdoor scene acquired in low-light conditions (bottom), viewed at an oblique angle to highlight the 3-D representation
8.3.3 Audio

Our main interest in updating rMary is to see how acoustic sensors could be integrated into mobile robotics. Past work with rMary's sibling rWilliam (Fig. 8.8) investigated the use of air-coupled ultrasound in mobile robotics [28, 29], as have others [30–32]. The high attenuation of ultrasound in air limits the use of ultrasound scanning for mobile robots. Instead, we wish to study the use of low-frequency acoustic echolocation for mobile robots. This is similar in principle to how bats navigate, though at much lower frequencies and with much lower amplitude signals. A similar use of this technology is found in the Sonar Ruler app for the iPhone, which attempts to measure distances using the speaker and microphone, with mixed results [33]. Using signals in the audible range reduces the attenuation, allowing for propagation over useful distances. However, there is more background noise in the audible frequency range,
Fig. 8.5 Images acquired outdoors using the Kinect IR sensor (left) and the corresponding depth mapped images (right) for a window (top) and tree (bottom) show the difficulties sunlight creates for the infrared sensor
requiring the use of coded excitation signals and sophisticated signal processing techniques to find the reflected signal in inherently noisy data. One way to ensure that the reflected signal is primarily backscattering from the target rather than clutter (unwanted reflections from the environment) is to create a tightly spatially controlled beam of low-frequency sound using an acoustic parametric array. We can also use insights gleaned from simulations to improve the analysis methods. Dieckman [2] discusses in detail a method of simulating the propagation of the nonlinear acoustic beam produced by the acoustic parametric array and its scattering from targets.

The properties of the acoustic parametric array have been studied in depth [34–38], and the array has been used for area denial, concealed weapons detection, and nondestructive evaluation [39–41]. In brief, the parametric array works by generating ultrasonic signals at frequencies f1 and f2, whose difference is in the audible range. As these signals propagate, the nonlinearity of air causes self-demodulation of the signal, creating signals at the sum (f1 + f2) and difference (f2 − f1) frequencies. Since absorption is proportional to the square of frequency, only the difference frequency remains as the signal propagates away from the array (Fig. 8.9).

The acoustic parametric array allows for tighter spatial control of the low-frequency sound beam than a standard loudspeaker of the same size. Directivity of
Fig. 8.6 As the sun emerges from behind a cloud and sunlight increases (top to bottom), the Kinect’s infrared sensor (left) becomes saturated and the Kinect is unable to construct a corresponding depth mapped image (right)
a speaker depends on the ratio of the size of the speaker to the wavelength of sound produced, with larger speakers able to create more directive low-frequency sound. Line arrays (speakers arranged in a row) are the traditional way to create directional low-frequency sound, but can take up a great deal of space [42]. Using nonlinear acoustics, the acoustic parametric array is able to create directional low-frequency sound in a normal-sized speaker, as shown in Fig. 8.10.
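As a back-of-envelope illustration of why only the difference frequency survives, the snippet below evaluates the frequency-squared absorption scaling for an assumed pair of primaries; the specific primary frequencies are illustrative stand-ins, not the Audiobeam's published values.

```matlab
% Sketch: relative absorption of the parametric-array frequency components,
% using assumed primaries f1 and f2 (illustrative values only).
f1 = 40e3;  f2 = 41e3;               % assumed ultrasonic primaries, Hz
f_sum  = f1 + f2;                    % 81 kHz sum frequency
f_diff = f2 - f1;                    % 1 kHz audible difference frequency

% Absorption in air scales roughly as frequency squared, so relative to
% the difference frequency the other components are absorbed far faster:
rel = ([f1 f2 f_sum] / f_diff).^2   % = [1600 1681 6561]
% After a few meters only the 1 kHz difference frequency remains.
```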
Fig. 8.7 The Kinect depth mapping (A) works well in nighttime outdoor environments, detecting a light pole not visible in the illuminated RGB image (B). The image from the thermal camera (C) also shows the tree and buildings in the background, but has a smaller field of view and lower resolution than the raw image from the Kinect IR sensor (D) (images resized from original resolutions)
Fig. 8.8 A 50 kHz ultrasound scanner mounted on rWilliam is able to detect objects at close range
260
E. A. Dieckman and M. K. Hinders
Fig. 8.9 The acoustic parametric array creates signals at two frequencies f1 and f2 in the ultrasonic range (pink shaded region). As the signals propagate away from the parametric array, the nonlinearity of air allows the signals to self-demodulate, creating signals at the sum and difference frequencies. Because attenuation is proportional to the square of frequency, the higher frequency signals attenuate more quickly, and after several meters only the audible difference frequency remains
For our tests, we have mounted the Sennheiser Audiobeam parametric array on rMary, with power supplied directly from rMary's battery. This commercially available parametric array uses a 40 kHz carrier signal to produce audible sound pressure levels of 75 ± 5 dB at a distance of 4 m from the face of the transducer. The echolocation signals we use are audible in front of the transducer at distances exceeding 50 m in a quiet environment, but would not necessarily be obtrusive to pedestrians passing through the area and are easily masked by low levels of external noise.

To record the backscattered echolocation signal, as well as ambient noise from our environment, we use the 4-channel microphone array included in the Kinect. This array comprises four spatially separated high-quality capsule microphones with a sampling rate of 16 kHz. The array separation is not large enough to allow implementation of beamforming methods at the distances of interest here. The low sampling rate means that acoustic signals are limited to a maximum frequency content of 8 kHz. Audio data recorded by the Kinect microphone array was compared to data recorded using a parabolic dish microphone (Dan Gibson EPM, 48 kHz sampling rate), whose reflector dish directs sound onto the microphone. Figure 8.11 shows that
Fig. 8.10 The acoustic beam created by the acoustic parametric array has a consistently tighter beam pattern than the physically much larger line array at low frequencies, and fewer sidelobes at higher frequencies
the microphones used in the Kinect actually perform better than the parabolic dish microphone [43, 44]. All data used in our subsequent analysis is recorded with the Kinect array.
8.3.4 Radar

The final sensor on rMary is a coffee can radar. A collaboration between MIT and Lincoln Labs in 2011 produced a design for a low-cost radar system that uses two metal coffee cans as antennas [45]. Simple amplifier circuits built on a breadboard power low-cost modular microwave (RF) components to send and acquire signals
Fig. 8.11 Even though the 4-channel Kinect microphone array has tiny capsule microphones that only sample at 16 kHz, they provide a cleaner signal than the parabolic dish microphone with a 48 kHz sampling rate
Fig. 8.12 A low-cost coffee can radar was built and attached to rMary to test the capabilities of radar sensors on mobile robots
through the transmit (Tx) and receive (Rx) antennas. The entire system is powered by 8 AA batteries, which allows easy portability, and the total cost of components is less than USD $350. Our constructed coffee can radar is shown in Fig. 8.12.

The signal processing requirements of the coffee can radar system are reduced by using a frequency-modulated continuous-wave (FMCW) design. In this setup, the radar transmits an 80 MHz chirped waveform centered at 2.4 GHz (in the ISM band). The same waveform is then used to downconvert, or "de-chirp," the signal so that the
residual bandwidth containing all the information is small enough to digitize with a sound card. This information is saved in .wav files and analyzed in MATLAB. The system as originally designed has 10 mW Tx power with an approximate maximum range of 1 km and can operate in one of three modes: Doppler, range, or Synthetic Aperture Radar (SAR).

In Doppler mode the radar emits a continuous-wave signal at a given frequency. By measuring any frequency shifts in this signal, moving objects are differentiated from stationary ones. Images from this mode show an object's speed as a function of time. In ranging mode, the radar signal is frequency modulated, with the magnitude of this modulation specifying the transmit bandwidth. This allows the imaging of stationary or slowly moving objects, and the resulting images show distance from the radar (range) as a function of time. SAR imaging is basically a set of ranging measurements acquired over a wide area to create a three-dimensional representation of the radar scattering from a target [46–48]. While SAR imaging has the greatest potential application in mobile robotics, since the robotic platform is already moving over time, here we look at ranging measurements in our feasibility tests of the radar. Figure 8.13 shows a ranging measurement of three vehicles approaching rMary, with initial detection of the vehicles at a distance of 70 m. Since the ranging image is a color-mapped plot of time versus range, the speed of approaching vehicles can also be calculated directly from the image data.
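In a de-chirped FMCW system, each target range maps to a constant beat frequency fb, with R = c·fb·Tc/(2B) for sweep bandwidth B and chirp duration Tc. The sketch below turns one digitized sweep into a range profile; the chirp duration, sound-card rate, and the synthetic beat signal are all assumptions for illustration, with only the 80 MHz bandwidth taken from the design described above.

```matlab
% Sketch: recovering range from one de-chirped FMCW sweep. The coffee can
% radar digitizes the beat signal with a sound card, as described above.
c  = 3e8;            % speed of light, m/s
B  = 80e6;           % transmit bandwidth, Hz (from the MIT design)
Tc = 20e-3;          % assumed up-chirp duration, s

fs   = 44.1e3;                        % assumed sound-card sampling rate
beat = randn(1, round(fs*Tc));        % stand-in for one de-chirped sweep
N    = 4096;
S    = abs(fft(beat, N));             % beat-frequency spectrum
fb   = (0:N-1) * fs / N;              % frequency axis, Hz

% Each beat frequency maps to a range: R = c * fb * Tc / (2 * B)
R = c .* fb .* Tc ./ (2 * B);
plot(R(1:N/2), S(1:N/2)); xlabel('Range (m)'); ylabel('Magnitude');
```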
Fig. 8.13 The ranging image acquired using a coffee can radar shows three vehicles approaching rMary. The vehicles’ speed can be calculated from the slope of the line
These measurements demonstrate the feasibility of this low-cost radar as a long-range sensor for mobile robotics. Since the radar signal is de-chirped to facilitate processing with a computer sound card, these measurements may not contain information about scattering from the object, unlike the acoustic echolocation signal. However, radar ranging measurements could provide an early detection system for a mobile robot, detecting objects at long range before other sensors are used to classify the object. This detection distance is dependent upon a number of parameters, the most important of which is the availability of line-of-sight to the target.

Over the course of several months, we collected data on and near the William & Mary campus as vehicles approached rMary as shown in Fig. 8.14, at ranges of up to 50 m. The parametric array projected a narrow-beam, linear chirp signal down the road, which backscattered from oncoming vehicles. We noted in each case whether the oncoming vehicle was a car, SUV, van, truck, bus, motorcycle, or other.

The rMary platform allows us to investigate the capabilities and limitations of a number of low-cost sensors in unstructured outdoor environments. A combination of short- and long-range sensors provides a mobile robot with the most useful information about its environment. Previous work focused on passive thermal infrared and air-coupled ultrasound as possible short-range sensor modalities. Our work looked at the suitability of the Microsoft Kinect as a short-range active infrared depth sensor, as well as the performance of a coffee can radar and acoustic echolocation via an
Fig. 8.14 We collected data with rMary on the sidewalk, but with traffic approaching head on
acoustic parametric array as long-range sensors for mobile robotics. While the low-cost depth sensor on the Microsoft Kinect is of limited use in outdoor environments, the coffee can radar has the potential to provide low-cost long-range detection capability. In addition, the Kinect microphone array can be paired with an acoustic parametric array to provide high-quality acoustic echolocation measurements.
8.4 Pattern Classification

So far we know that our transmitted signal is present in the backscattered reflection from a target at distances exceeding 50 m. The hope is that this reflected signal contains useful information that will allow us to determine the type (class) of vehicle. Since we are using a coded signal, we also expect that a time–frequency representation of the data will prove useful in this classification process. The next step is to use statistical pattern classification techniques to find the useful information in these signals that differentiates between vehicles of different classes. These analyses are parallelized to run in MATLAB on a computing cluster to reduce computation time.
8.4.1 Compiling Data

To more easily compare the large number of measurements from different classes, we organize the measured data into structures. The more than 4000 individual measurements we have collected are spread over 5926 files including audio, radar, and image data organized in timestamped directories. Separate plaintext files associate each timestamp with its corresponding vehicle class. If we are to run our analysis routines on computing clusters, providing access to this more than 3.6 GB of original data becomes problematic. Instead we create smaller data structures containing only the information we require. These datasets range in size from 11–135 MB for 108–750 measurements and can easily be uploaded to parallel computing resources. Much of the reduction in size is due to the fact that we only require access to the audio data for these tests and can eliminate the large image files. One additional reduction is accomplished by choosing a single channel of the 4-channel Kinect audio data to include in the structure. The array has a small enough spacing that all useful information is present in every channel, as seen in Fig. 8.15. Resampling all data to the minimum rate allowed by the Nyquist–Shannon sampling theorem further reduces the size of the data structure.

Our goal is to differentiate between vehicle classes, so it is natural to create data structures divided by class. Since it doesn't make sense to compare data from different incident signals, we create these structures for a number of data groups. We have also allowed the option to create combination classes; for example, vans, trucks, and buses are combined into the "vtb" class. This allows vehicles with similar frontal
Fig. 8.15 Due to the close spacing of the microphone array on the Kinect, all four channels contain the same information. The parabolic microphone is mounted aft of the Kinect array, causing the slight delay visible here, and is much more sensitive to noise
profiles to be grouped together to create a larger dataset to train our classifier. When creating these structures, data is pulled at random from the entire set of possibilities. The data structure can contain either all possible data or equal amounts of data from each class (or combined class), which can help reduce classification errors due to unequal data distribution; a minimal sketch of this assembly step appears below. It is also important to note that, due to the difficulty of detecting individual reflections inside a signal, not every measurement inside the data structure is guaranteed to be usable. Tables 8.1 and 8.2, which list the amount of data in each group, therefore provide only an upper limit on the number of usable measurements.
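The sketch below illustrates one way to assemble such a class-balanced, size-reduced structure; the variable names (waveforms, labels) and the resampling target are hypothetical, chosen here to satisfy Nyquist for a 900 Hz maximum chirp frequency.

```matlab
% Sketch of assembling a class-balanced dataset from the raw recordings,
% assuming a cell array of waveforms and a matching label cell array
% (both names are hypothetical stand-ins for the stored data).
fs_orig = 16e3;  fs_new = 2e3;       % resample to just above Nyquist for a
                                     % 900 Hz maximum signal frequency
classes = {'c', 's', 'vtb'};
nPer = min(histcounts(categorical(labels, classes)));  % equal class sizes

data = struct('x', {}, 'class', {});
for k = 1:numel(classes)
    idx = find(strcmp(labels, classes{k}));
    idx = idx(randperm(numel(idx), nPer));   % draw at random from the class
    for i = idx(:)'
        % keep one Kinect channel and downsample to shrink the structure
        x = resample(waveforms{i}(:,1), fs_new, fs_orig);
        data(end+1) = struct('x', x, 'class', classes{k}); %#ok<SAGROW>
    end
end
```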
8.4.2 Aligning Reflected Signals

The first step in our pattern classification process is to align the signals in time. This is crucial to ensure that we are comparing the signals reflected from vehicles to each other, rather than comparing a vehicle reflection to a background measurement that contains no useful information. Our control of the transmitted signal gives us several advantages that we can exploit in this analysis. First, since the frequency content of the transmitted signal is known, we can apply a bandpass filter to the reflected signal to reduce noise at other frequencies. In some cases, this will highlight reflections that were previously hidden in the noise floor, allowing for automated peak detection.
Table 8.1 Maximum amount of data in each classification group (overlaps possible between groups)

Group          c     s     v    t   b   m  o
5-750 HO       520   501   52   32  15  5  1
10-750 HO      191   190   22   18  7   3  0
100-900 HO     94    88    13   7   16  1  0
250-500 HO     61    74    16   3   3   0  3
250-500 NHO    41    21    1    3   1   0  0
250-500 all    102   95    17   6   4   0  3
250-1000 HO    552   515   55   42  27  3  3
250-1000 NHO   589   410   52   28  20  2  6
250-1000 all   1141  925   107  70  47  5  9
250-comb HO    613   589   71   45  30  3  6
250-comb NHO   630   431   53   31  21  2  6
250-comb all   1243  1020  124  76  51  5  12
Table 8.2 Maximum amount of data in each classification group when binned (overlaps possible)

Group          c     s     vtb
5-750 HO       520   501   99
10-750 HO      191   190   47
100-900 HO     94    88    36
250-500 HO     61    74    22
250-500 NHO    41    21    5
250-500 all    102   95    27
250-1000 HO    552   515   124
250-1000 NHO   589   410   100
250-1000 all   1141  925   224
250-comb HO    613   589   146
250-comb NHO   630   431   105
250-comb all   1243  1020  251
More often, however, the backscattered reflection remains hidden among the noise even after a bandpass filter is applied. In this case we obtain better results using peak detection on the envelope signal. To create this signal, we take our original bandpass-filtered signal f(x) and form the corresponding analytic signal, whose magnitude is |f(x) + i f̂(x)|, where f̂(x) is the Hilbert transform of f(x). The analytic signal discards the negative frequency components created by the Fourier transform in exchange for dealing with a complex-valued function. The envelope signal is then constructed by applying a very lowpass filter to the magnitude of the analytic signal. This process is shown in Fig. 8.16.
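A minimal sketch of this envelope construction in MATLAB follows, using the lowpass settings quoted in Fig. 8.16 (5th-order Butterworth, fc = 20 Hz); the bandpass edges, sampling rate, and input variable x are assumed stand-ins for whichever transmitted chirp is being processed.

```matlab
% Sketch: envelope signal for peak detection (x = raw recorded signal).
fs = 16e3;
[bb, ab] = butter(4, [100 900]/(fs/2), 'bandpass'); % assumed passband
xf = filtfilt(bb, ab, x);            % bandpass-filtered signal

xa = abs(hilbert(xf));               % magnitude of the analytic signal

[bl, al] = butter(5, 20/(fs/2), 'low');  % very lowpass: 5th order, 20 Hz
env = filtfilt(bl, al, xa);          % smooth envelope for peak detection
```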
Fig. 8.16 The envelope signal is created by taking the original bandpass-filtered data (top), creating the analytic signal (middle), and applying a very lowpass filter (5th order Butterworth, fc = 20 Hz) (bottom)
In some cases, even peak detection on the envelope signal will not give optimal results. Occasionally, signals will have a non-constant DC offset that complicates the envelope signal. This can often be corrected by detrending (removing the mean) the signal. A more pressing issue is that the envelope signal is not a reliable detection method if reflections aren’t visible in the filtered signal. Even when peaks can be detected in the envelope signal, they tend to be very broad. As a general rule, peak detection is less sensitive to variations in threshold as the peak grows sharper. Adding a step to the peak detection that finds the mean value of connected points above a certain threshold ameliorates this problem, but since the peak widths of the envelope signal are not uniform, finding the same point on each reflected signal becomes an issue. Some of these issues are highlighted in Fig. 8.17. Instead, we can exploit another feature of our transmitted signal—its shape. All of our pulses are linear frequency chirps which have well-defined characteristics and, more importantly, maintain a similar shape even after they reflect from a target.
Fig. 8.17 Even for multiple measurements from the same stationary vehicle, the envelope (red) has difficulty consistently finding peaks unless they are obviously visible in the filtered signal (blue). The shifted cross-correlation (green) doesn’t have this limitation
By taking the cross-correlation of our particular transmitted signal and the reflected signal and accounting for the time shift inherent to the process, a sharp peak is created at the time point where the reflected signal begins, which can be easily found by an automated peak detection algorithm. Peak detection in any form requires setting a threshold at a level which reduces the number of false peaks detected without disqualifying actual peaks. This is a largely trial-and-error process and can easily introduce a human bias into the results. Setting the threshold as a percentage of the signal's maximum value will also improve the performance of the peak detection.
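A hedged sketch of this alignment step is below; the variable names (tx, received), the 70% threshold, and the half-pulse minimum separation are illustrative choices drawn from the surrounding discussion rather than the authors' exact implementation.

```matlab
% Sketch: align a received signal using cross-correlation with the known
% transmitted chirp, then pick echo onsets with an automated peak search.
[xc, lags] = xcorr(received, tx);     % correlate against the known pulse
xc = abs(xc(lags >= 0));              % shift: keep non-negative lags only

thresh = 0.7 * max(xc);               % threshold as a fraction of the max
minSep = round(numel(tx) / 2);        % reject peaks separated by less than
                                      % half a pulse length (see text)
[~, locs] = findpeaks(xc, 'MinPeakHeight', thresh, ...
                          'MinPeakDistance', minSep);

t0 = locs(1);                         % sample where the first echo begins
aligned = received(t0:end);           % time-aligned signal for later steps
```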
Fig. 8.18 Individual reflections aren't visible in the bandpass-filtered signal from an oncoming vehicle at 50 m (top) or in its detrended envelope signal (middle). Cross-correlation of the filtered signal and the 100–900 transmitted pulse (bottom) shows clear peaks at the beginning of each reflected pulse, which can be used in an automated detection algorithm
Several problems with the automated peak detection are clearly visible in Fig. 8.18, where a detection level set to 70% of the maximum value will only detect one of the three separate reflections that a human eye would identify. Although we could adjust the threshold level to be more inclusive, it would also increase the rate of false detection and add more computational load to filter these out. Adjustment of the threshold is also not ideal as it can add a human bias to the procedure. Another issue is due to the shape of the correlated waveform, caused by a vehicle's noise increasing as it nears the microphone. The extra noise in the first part of the signal is above the detection threshold and will lead to false detections. This is an easier problem to solve: our algorithm rejects any peaks that are not separated by a large enough distance. A separation distance of half the length of the cut signals reduces the rate of false detection.

It is also important to note the position of the sensors on the robotic platform at this point. If we were using a regular loudspeaker, the transmitted pulse would be recorded along with the reflected signal and the cross-correlation would detect both
Fig. 8.19 For the simpler case of a stationary van at 25 m, the backscatter reflection is clearly visible once it has been bandpass filtered (blue). Automated peak detection may correctly find the peaks of the envelope signal (red), but is much more sensitive to the threshold level than the shifted cross-correlation signal (green) due to the sharpness of its peaks
signals, complicating the detection process. Mounting the microphone array behind the speaker could help, but care would have to be taken with speaker selection. Since the acoustic parametric array transmits an ultrasonic signal that self-demodulates as it propagates, the audible signal only exists at distances beyond the position of the microphone array.

The three detection methods (filtered signal, envelope signal, and shifted cross-correlation) are summarized in Fig. 8.19, which uses the simplified situation of data from a stationary vehicle at 25 m to illustrate all three methods. For our analysis we will use the shifted cross-correlation to align the signals in time.
8.4.3 Feature Creation with DWFP

We use the dynamic wavelet fingerprint (DWFP) to represent our time-domain waveforms in a time-scale domain. This analysis has proven useful in past work to reveal subtle features in noisy signals [4–9] by transforming a one-dimensional, time-domain waveform into a two-dimensional time-scale image. An example of the DWFP process is shown in Fig. 8.20 for real-world data.
Fig. 8.20 A one-second long acoustic signal reflected from a bus (top) is filtered (middle) and transformed into a time-scale image that resembles a set of individual fingerprints (bottom). This image is a pre-segmented ternary image that can easily be analyzed using existing image processing algorithms
The main advantage of the DWFP process is that the output is a pre-segmented image that can be analyzed using existing image processing techniques. We use these techniques to create a number of one-dimensional parameter waveforms that describe the image and, by extension, our original signal. This analysis yields approximately 25 parameter waveforms.

As an overview, our feature extraction process takes a time-domain signal and applies a bandpass filter. A pre-segmented fingerprint image is created using the DWFP process, from which a number of one-dimensional parameter waveforms are extracted. In effect, our original one-dimensional time-domain signal is now represented by multiple parameter waveforms. Most importantly, the time axis is maintained throughout this process so that features of the parameter waveforms are directly correlated to events in the original time-domain signal. A visual representation of the process is shown in Fig. 8.21.

The user has control of a large number of parameters in the DWFP creation and feature extraction process, which greatly affect the appearance of the fingerprint images, and thus the extracted features. The parameters that most affect the fingerprint image are the wavelets used for pre-filtering and for performing the continuous wavelet transform to create the DWFP image. A list of candidate wavelets is shown in Table 8.3. However, there is no way to tell a priori which combination of parameters will create the ideal representation for a particular application. We use a computing cluster to run this process in parallel for a large number of parameter combinations, combined with past experience with analysis of DWFP images to avoid an entirely brute force implementation.
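A minimal sketch of the fingerprint construction is shown below; the scale range, slice levels, and thresholding logic are our plain-language reading of the description above (15 slices of thickness 0.03), not the authors' exact code, and the older Wavelet Toolbox cwt signature is assumed.

```matlab
% Sketch: build a ternary "fingerprint" image from a filtered signal xf by
% slicing its normalized wavelet-coefficient surface into thin contours.
scales = 1:50;                        % assumed scale range
C = cwt(xf, scales, 'db3');           % older cwt(signal, scales, wname) API
C = C ./ max(abs(C(:)));              % normalize coefficients to [-1, 1]

fp = zeros(size(C), 'int8');          % ternary image: -1, 0, +1
for s = 1:15                          % 15 slices of thickness 0.03
    lvl = s * 0.03;
    fp(abs(C - lvl) < 0.015) =  1;    % ridge slices through the peaks
    fp(abs(C + lvl) < 0.015) = -1;    % ridge slices through the valleys
end

% The pre-segmented image can be handed to standard image analysis, e.g.:
stats = regionprops(fp == 1, 'FilledArea', 'Orientation', 'Solidity');
```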
Fig. 8.21 A one-second long 100–900 backscatter signal is bandpass filtered and converted to a ternary image using the DWFP process. Since the image is pre-segmented it is easy to apply existing image analysis techniques and create approximately 25 one-dimensional parameter waveforms that describe the image. Our original signal is now represented by these parameter waveforms, three examples of which are shown here (ridge count, filled area, and orientation). Since the time axis is maintained throughout the entire process, features in the parameter waveforms are directly correlated to events in the original time-domain signal
8.4.4 Intelligent Feature Selection

Now that each original one-dimensional backscatter signal is represented by a set of continuous one-dimensional parameter waveforms, we need to determine which will best differentiate between different vehicles. The end goal is to create a low-dimensional feature vector for each original backscatter signal which contains the value of a parameter waveform at a particular point in time. By choosing these time
Table 8.3 List of usable wavelets. For those wavelet families with multiple representations (db, sym, and coif), the default value used is shown

Name            Matlab name  Prefiltering  Transform
Haar            haar         X             X
Daubechies      db3          X             X
Symlets         sym5         X             X
Coiflets        coif3        X             X
Meyer           meyr                       X
Discrete meyer  dmey                       X
Mexican hat     mexh                       X
Morlet          morl                       X
points correctly, we create a new representation of the signal that is much more information dense than the original. This feature vector completely describes the original signal and can be used in statistical pattern classification algorithms to classify the data in seconds.

For this analysis we are using a variant of linear discriminant analysis to find the points in time where the parameter waveform has the greatest separation between different classes, but also where signals of the same class have a small variance. For each parameter waveform, all of the signals from a single class are averaged to create a mean and corresponding standard deviation signal. Comparing the mean signals to each other and keeping a running average of the difference allows us to create an overall separation distance signal (δ), while a measure of the variance between signals of the same class comes from the maximum standard deviation of all signals (σ). Instead of using iterative methods to simultaneously maximize δ and minimize σ, we create a ratio signal ρ = δ/σ and find its maxima (Fig. 8.22). We save the time point and value of ρ of the top 5–10 points for each extracted feature. When this process has been completed for all parameter waveforms, the list is sorted by decreasing ρ and reduced to the top 25–50 points, keeping track of both points and feature names. Restricting the process in this manner tends to create a feature vector with components from many of the features, as shown in Fig. 8.23. The number of top points saved for both steps is a user parameter, shown in Table 8.4 and restricted to mitigate the curse of dimensionality.

Feature vectors can then be created for each original backscatter signal by taking the value of the selected features at the given points. This results in a final, dense feature vector representation for each original signal. The entire pattern classification process for data from three classes is summarized in Fig. 8.24.
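The selection criterion can be written down compactly; the sketch below computes δ, σ, and ρ = δ/σ for one parameter waveform, where W (signals × time samples) and labs are hypothetical variable names for the stacked waveforms and their class labels.

```matlab
% Sketch: separation-ratio feature selection for one parameter waveform.
classes = unique(labs);  nC = numel(classes);  nT = size(W, 2);
mu = zeros(nC, nT);  sd = zeros(nC, nT);
for k = 1:nC
    Wk = W(strcmp(labs, classes{k}), :);   % all signals of one class
    mu(k,:) = mean(Wk, 1);                 % class-mean waveform
    sd(k,:) = std(Wk, 0, 1);               % intraclass standard deviation
end

delta = zeros(1, nT);  nPairs = 0;         % running average of pairwise
for i = 1:nC-1                             % distances between class means
    for j = i+1:nC
        delta = delta + abs(mu(i,:) - mu(j,:));
        nPairs = nPairs + 1;
    end
end
delta = delta / nPairs;

sigma = max(sd, [], 1);                    % worst-case intraclass spread
rho   = delta ./ sigma;                    % large where classes separate
[~, order] = sort(rho, 'descend');
bestT = order(1:10);                       % keep the top 5-10 time points
```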
Fig. 8.22 For each of the parameter waveforms (FilledArea shown here), a mean value is created by averaging all the measurements of that class (top). The distance between classes is quantified by the separation distance (middle left) and tempered by the intraclass variance, represented by the maximum standard deviation (middle right). The points that are most likely to separate the classes are shown as the peaks of the joint separation curve (bottom)
8.4.5 Statistical Pattern Classification

The final step in our analysis is to test the ability of the feature vector to differentiate between vehicle classes using various pattern classification algorithms. This is often the most time-consuming step in the process, but since we have used intelligent feature selection to create an optimized and small feature vector, this is the fastest step of the entire process here, and can be completed in seconds on a desktop machine. Of course, we have simply shifted the hard computational work that requires a computing cluster to the feature selection step. That is not to say that there are no advantages to doing the analysis this way—having such small feature vectors allows us to easily test a number of parameters of the classification.

Before we can run pattern classification routines we must separate our data into training and testing (or validation) datasets. By withholding a subset of the data for testing the classifier's performance, we can eliminate any "cheating" that comes from using training data for testing. We also use equal amounts of data from each class for both testing and training to eliminate bias from unequal-sized datasets.
Fig. 8.23 The list of top features selected for 100–900 (left) and 250–500 (right) datasets illustrate how features are chosen from a variety of different parameter waveforms

Table 8.4 List of user parameters in feature selection

Setting       Options          Description
Peakdetect    Joint, separate  Method to choose top points
Viewselected  Binary switch    View selected points
Selectnfeats  Z+               Keep this many top points for each feature
Topnfeats     Z+               Keep this many top points overall
Our classification routines are run in MATLAB, using a number of standard classifiers included in the PRTools toolbox [14]. Because of our small feature vectors and short classification run time, we run the pattern classification many times, randomizing the data used for testing and training on each run. This gives us an average classification performance and allows us to use the standard deviation of correct classifications as a measure of classification repeatability. While this single-valued metric is useful in comparing classifiers, more detailed information about classifier performance comes from the average confusion matrix. For n classes, this is an n × n matrix that plots the estimated class against the known class. The confusion matrix for a perfect classification would resemble the identity matrix, with values of 1 on the diagonal and 0 elsewhere. An example of a confusion matrix is shown in Fig. 8.25.
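A hedged sketch of this repeated train/test loop is below. The function names (gendat, ldc, testc, labeld, confmat) follow PRTools conventions, but the exact constructor and call signatures vary between PRTools versions, so treat this as pseudocode-close MATLAB rather than a drop-in script; feats and labs are hypothetical variables holding the feature vectors and class labels.

```matlab
% Sketch: repeated randomized train/test classification with PRTools.
a = prdataset(feats, labs);        % PRTools dataset (older versions: dataset)
nRuns = 20;  err = zeros(1, nRuns);
for r = 1:nRuns
    [trn, tst] = gendat(a, 0.7);   % random 70/30 train/test split per run
    w = ldc(trn);                  % train a linear discriminant classifier
    err(r) = testc(tst, w);        % test-set error estimate for this run
end
fprintf('correct: %.2f +/- %.2f\n', 1 - mean(err), std(err));

% Confusion matrix: compare the known labels to the estimated labels
confmat(getlabels(tst), labeld(tst*w));
```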
Fig. 8.24 For each class, every individual measurement is filtered and transformed to a fingerprint image, from which a number of parameter waveforms are extracted. For each of these parameter waveforms, an average is created for each class. A comparison of these average waveforms finds the points that are best separated between the classes, and the feature vector is compiled using the values of the parameter waveform at these points. This image diagrams the process for sample data from three classes (blue, green, and red)
Fig. 8.25 This example of a real confusion matrix shows good classification performance, with high values on the diagonal and low values elsewhere. The confusion matrix allows more detailed visualization of the classification performance for specific classes than the single-value metric of overall percent correct. For example, although this classification has a fairly high accuracy of 86% correct, the confusion matrix shows that most of the error comes from misclassifying SUVs into the van/truck/bus class [udc classifier, 20 runs]
8.5 Results

We illustrate the use of the pattern classification analyses on data collected from both stationary and oncoming vehicles. Due to the similar frontal profiles of vans, trucks, and buses, and to mitigate the small number of observations recorded from these vehicles, we will create a combined class of these measurements. The classes for classification purposes are then "car", "SUV", and "van/truck/bus". For this three-class problem, a classification accuracy of greater than 33% means the classifier is performing better than random guessing.
8.5.1 Proof-of-Concept: Acoustic Classification of Stationary Vehicles

We begin our analysis with data collected from stationary vehicles. The first test is a comparison of classification performance when observations come from a single vehicle as compared to multiple vehicles. Multiple observations were made from vehicles in a parking lot at distances between 5 and 20 m. The orientation is approximately head-on (orthogonal) but with slight repositioning after every measurement to construct a more realistic dataset. The classification accuracy shown in Fig. 8.26 validates our expectation that classification performs better when observations are exclusively from a single vehicle rather than from a number of different vehicles.
Fig. 8.26 The first attempt to classify vehicles based on their backscattered acoustic reflection from a 250–1000 transmitted signal shows very good classification accuracy when all observations come from a single vehicle at 20 m (top). When observations come from multiple vehicles at 5 m, the classification accuracy is much lower but still better than random guessing (bottom) [knnc classifier, 10 runs]
We see that reasonable classification accuracies can be achieved even when the observations come from multiple vehicles. Optimizing the transmitted signal and recognizing the importance of proper alignment of the reflected signals help improve classification performance, as seen in Fig. 8.27. Here, data is collected from multiple stationary vehicles at a range of short distances between 10 and 25 m using both a 10–750 and a 250–1000 transmitted signal. Classification performance seems slightly better for the shorter chirp-length signal, but the real improvement comes from ensuring the signals are aligned in time. For this dataset, alignment was ensured by visual inspection of all observations in the 250–1000 dataset. This labor-intensive manual inspection has been replaced by the cross-correlation methods described earlier in the analysis of data from oncoming vehicles.

While these initial tests exhibit poorer classification performance than the data from oncoming vehicles, it is worth noting that these datasets consist of relatively few observations and are intended as a proof-of-concept. These stationary datasets were used to optimize the analysis procedure for the more interesting data from oncoming vehicles. For example, alignment algorithms weren't yet completely developed, and the k-nearest neighbor classifier used to generate the above confusion matrices has proven to have consistently worse performance than the classifiers used for
Fig. 8.27 Observations from multiple stationary vehicles at distances of 10–25 m show approximately equal classification performance for both the 10–750 (left) and 250–1000 (right) transmitted signals. Manual inspection of the observations in the 250–1000 dataset to ensure clear visual alignment leads to markedly improved classification performance (bottom) [knnc, 10 runs]
the results shown from oncoming vehicles. Nevertheless, we see that better-than-random-guessing classification accuracy is possible using only the acoustic echolocation signal.
8.5.2 Acoustic Classification of Oncoming Vehicles

Now that we have seen that it is possible to classify stationary vehicles using only the reflected acoustic echolocation signal, the more interesting problem is trying to classify oncoming vehicles at greater distances.
Since the DWFP feature creation process has a large number of user parameters, the first step was to find which few parameters give us the best classification performance. This reduces the parameter space in our analysis and allows us to focus on more interesting details of the classification, such as the effect of the transmitted signal on the classification accuracy. Through previous work we have seen that the choice of wavelet in both the prefiltering and transform stages of the DWFP process causes the greatest change in the fingerprint appearance, and thus in the features extracted from the fingerprint. Table 8.5 shows the classification accuracy for different prefiltering wavelets, while Table 8.6 shows the classification accuracy for different transform wavelets. In both cases, the dataset being classified is from vehicles at 50 m approaching head-on using the 100–900 transmitted signal. The settings for the other user parameters are: filtering at 5 levels, removing the first 5 details, 15 slices of thickness 0.03, and removing fingerprints that do not have a solidity in the range 0.3–0.6. The mean correct classification rate is created from 20 classification runs for each classifier. These are the parameters for the rest of our analysis unless noted otherwise.

The choice of prefiltering wavelet does not affect the classification accuracy much. The variance measure (given by the standard deviation of repeated classifications) is not shown, but is consistently around 0.07 for all classifiers. With this knowledge, there is no obvious preference of prefiltering wavelet and we chose coif3 for further analysis. The choice of transform wavelet does seem to affect the classification accuracy somewhat more than the choice of the prefiltering wavelet. Still, the choice of classifier is by far the most important factor in classification accuracy. We select db3 as the default transform wavelet, with dmey and sym5 as alternatives.

From this analysis we can also select a few good classifiers for our problem. Pre-selecting classifiers violates the No Free Lunch theorem, which states that we should not prefer one classifier over another, but because the underlying physical situation is similar between all of our datasets we are justified in selecting a small number of well-performing classifiers. We will use the top five classifiers from our initial analysis: nmsc, perlc, ldc, fisherc, and udc. The klldc and pcldc classifiers also performed well, but since they are closely related to ldc, we choose other classifiers for diversity and to provide a good mix of parametric and non-parametric classifiers.

More in-depth analysis of the effect that physical differences have on classification accuracy can now be explored, using coif3 as the prefiltering wavelet and db3 as the transform wavelet, with the nmsc, perlc, ldc, fisherc, and udc classifiers. Table 8.7 shows the datasets constructed for the following analyses. All datasets are constructed from data pulled at random from the overall datasets for that particular signal type and contain an equal number of instances for each class. Most of the datasets consist of three classes: car (c), SUV (s), and a combined van/truck/bus (vtb), though a few datasets with data from all five classes, car (c), SUV (s), van (v), truck (t), and bus (b), were created to attempt this individual classification. Requiring an equal number of instances from each class leads to small datasets, even after creating the combined van/truck/bus class to mitigate this effect. In addition, not all
Table 8.5 A comparison of prefiltering wavelet (PW) choice on classification accuracy. The transform wavelet is db3. Data is from the 100–900 approaching vehicle dataset with a train/test ratio of 0.7 and classification into three classes (c, s, vtb). The differences in performance among the prefiltering wavelets fall within the measured variance for a single classifier (not shown here for reasons of space). Since there seems to be no preferred prefiltering wavelet, future analysis will use coif3

Classifier   haar   db3    sym5   coif3   Avg
qdc          0.40   0.49   0.41   0.55    0.46
udc          0.81   0.78   0.81   0.85    0.81
ldc          0.83   0.80   0.81   0.77    0.80
klldc        0.78   0.76   0.81   0.80    0.79
pcldc        0.79   0.78   0.81   0.79    0.79
nmc          0.66   0.57   0.54   0.60    0.59
nmsc         0.87   0.86   0.85   0.89    0.87
loglc        0.75   0.69   0.73   0.77    0.73
fisherc      0.82   0.79   0.79   0.80    0.80
knnc         0.56   0.59   0.54   0.57    0.56
parzenc      0.63   0.58   0.52   0.59    0.58
parzendc     0.73   0.78   0.72   0.69    0.73
kernelc      0.59   0.57   0.58   0.66    0.60
perlc        0.86   0.80   0.84   0.86    0.84
svc          0.67   0.72   0.73   0.72    0.71
nusvc        0.70   0.68   0.69   0.71    0.69
treec        0.51   0.49   0.54   0.47    0.50
Average      0.70   0.69   0.69   0.71    0.70
Table 8.6 A comparison of transform wavelet (TW) choice on classification accuracy shows very similar classification performance for many wavelet choices. The prefiltering wavelet is coif3. Data is from the 100–900 approaching vehicle dataset with train/test ratio of 0.7 and classification into three classes (c, s, vtb). Due to space constraints, the variance is not shown

Classifier   haar   db3    sym5   coif3  meyr   dmey   mexh   morl   Avg
qdc          0.45   0.55   0.55   0.45   0.43   0.43   0.47   0.47   0.47
udc          0.68   0.85   0.77   0.77   0.65   0.84   0.64   0.68   0.73
ldc          0.73   0.77   0.78   0.68   0.77   0.82   0.76   0.82   0.77
klldc        0.67   0.80   0.79   0.68   0.77   0.80   0.77   0.82   0.76
pcldc        0.68   0.79   0.81   0.67   0.77   0.82   0.73   0.82   0.76
nmc          0.42   0.60   0.53   0.49   0.51   0.50   0.55   0.52   0.51
nmsc         0.81   0.89   0.86   0.76   0.86   0.88   0.78   0.84   0.83
loglc        0.59   0.77   0.72   0.64   0.66   0.81   0.63   0.72   0.69
fisherc      0.66   0.80   0.82   0.64   0.80   0.80   0.76   0.83   0.76
knnc         0.47   0.57   0.53   0.49   0.53   0.47   0.57   0.51   0.52
parzenc      0.47   0.59   0.54   0.51   0.61   0.50   0.51   0.51   0.53
parzendc     0.67   0.69   0.73   0.68   0.68   0.71   0.63   0.76   0.69
kernelc      0.54   0.66   0.61   0.55   0.62   0.54   0.61   0.51   0.58
perlc        0.77   0.86   0.80   0.75   0.77   0.88   0.73   0.85   0.80
svc          0.61   0.72   0.71   0.65   0.69   0.71   0.64   0.70   0.68
nusvc        0.63   0.71   0.66   0.63   0.65   0.62   0.63   0.65   0.65
treec        0.48   0.47   0.49   0.56   0.47   0.52   0.44   0.51   0.49
Average      0.61   0.71   0.69   0.62   0.66   0.69   0.64   0.68   0.70
Table 8.7 A survey of the datasets used in this analysis (in order of appearance) shows the number of classes, total number of instances, and distance at which data was acquired. The small size of the datasets is a direct result of requiring the datasets to have an equal number of instances per class and the relatively few observations from vans, trucks, and buses

Dataset        Classes  Instances  Distance (m)
100–900 a      3        108        50
100–900 b      3        108        50
100–900 c      3        108        50
250-comb a     3        251        25, 30, 50
250-comb b     3        251        25, 30, 50
250-comb c     3        251        25, 30, 50
250–500        3        66         30, 50
250–1000       3        66         < 25, 25, 30, 60
250-comb HO    3        98         50
250-comb NHO   3        98         25, 30
5–750          3        108        50
10–750         3        108        50
100–900        5        35         50
Table 8.8 Increasing the amount of data used for training increases the classification accuracy, but reduces the amount of available data for validation. As too much of the available data is used for training, the classifier becomes overtrained and the variance of the accuracy measurement increases. Data is from the 100–900 approaching vehicle dataset with classification into three classes (c, s, vtb)

Train %  nmsc         perlc        ldc          fisherc      udc          Avg
0.25     0.82 ± 0.06  0.77 ± 0.06  0.52 ± 0.08  0.43 ± 0.09  0.72 ± 0.08  0.65
0.5      0.89 ± 0.05  0.85 ± 0.04  0.69 ± 0.07  0.74 ± 0.08  0.80 ± 0.05  0.79
0.6      0.89 ± 0.03  0.84 ± 0.06  0.76 ± 0.06  0.75 ± 0.06  0.82 ± 0.06  0.81
0.7      0.90 ± 0.06  0.87 ± 0.05  0.81 ± 0.06  0.79 ± 0.06  0.83 ± 0.06  0.84
0.8      0.91 ± 0.06  0.89 ± 0.06  0.79 ± 0.11  0.80 ± 0.10  0.82 ± 0.07  0.84
0.9      0.91 ± 0.08  0.86 ± 0.12  0.82 ± 0.13  0.82 ± 0.12  0.88 ± 0.10  0.86
In addition, not all of the instances are usable, due to the difficulty of detecting and aligning the signals. This is especially true for the 250 ms signals.

We will first look at the influence of the train/test ratio on classification performance. Table 8.8 shows the classification accuracy as a function of train/test ratio for the same 100–900 dataset used in our earlier analysis of wavelets and classifiers in Tables 8.5 and 8.6. In general, the classifiers perform well even when only 25% of the dataset is used for training, with the notable exception of the fisherc classifier. Classification accuracy increases with increasing training ratio, but when too much of the dataset is used for training (90% here), not enough data is available for validation and the variance of the classification accuracy increases.
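The train/test-ratio experiment of Table 8.8 is one more loop around the same machinery. A sketch, reusing the synthetic data and the `mean_accuracy` helper from the earlier example (again an illustration, not the code used in this work):

```python
# Sweep the training fraction: mean accuracy rises with more training data,
# but the shrinking validation set makes the estimate itself noisier.
for frac in [0.25, 0.5, 0.6, 0.7, 0.8, 0.9]:
    mu, sd = mean_accuracy(X, y, GaussianNB(), train_frac=frac)
    print(f"train fraction {frac:.2f}: accuracy {mu:.2f} +/- {sd:.2f}")
```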
Table 8.9 Classification of datasets whose instances are selected at random from the larger dataset containing all possible observations shows repeatable results for both 100–900 and 250-comb data. The overall lower performance of the 250-comb datasets is likely due to the greater variety in the observations present in this dataset

Dataset      nmsc         perlc        ldc          fisherc      udc          Avg
100–900 a    0.81 ± 0.04  0.84 ± 0.05  0.78 ± 0.05  0.78 ± 0.06  0.76 ± 0.07  0.79
100–900 b    0.84 ± 0.06  0.76 ± 0.09  0.69 ± 0.08  0.74 ± 0.08  0.71 ± 0.08  0.75
100–900 c    0.86 ± 0.05  0.84 ± 0.05  0.79 ± 0.07  0.77 ± 0.06  0.72 ± 0.05  0.77
250-comb a   0.60 ± 0.06  0.55 ± 0.06  0.56 ± 0.06  0.56 ± 0.03  0.53 ± 0.04  0.56
250-comb b   0.58 ± 0.05  0.55 ± 0.06  0.56 ± 0.06  0.54 ± 0.03  0.56 ± 0.05  0.56
250-comb c   0.52 ± 0.06  0.45 ± 0.05  0.51 ± 0.06  0.50 ± 0.05  0.50 ± 0.04  0.50
A more in-depth look at this phenomenon comes from the confusion matrices shown in Fig. 8.28. For future analysis we choose a train/test ratio of 0.6 to ensure we have enough data for validation, with the caveat that our classification performance could be a few points higher if we used a higher training ratio.
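The averaged confusion matrices of Fig. 8.28 can be computed the same way; a sketch under the same assumptions (and reusing the imports and data) of the earlier examples:

```python
from sklearn.metrics import confusion_matrix

def mean_confusion(X, y, clf, n_runs=20, train_frac=0.6):
    """Row-normalized confusion matrix averaged over repeated random splits,
    so entry (i, j) is the mean fraction of true class i labeled as class j."""
    n_classes = len(np.unique(y))
    total = np.zeros((n_classes, n_classes))
    for seed in range(n_runs):
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, y, train_size=train_frac, stratify=y, random_state=seed)
        pred = clf.fit(X_tr, y_tr).predict(X_te)
        total += confusion_matrix(y_te, pred, normalize="true")
    return total / n_runs

print(np.round(mean_confusion(X, y, GaussianNB()), 2))
```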
8.5.2.1 Repeatability of Classification Results
Since our code creates a dataset for classification by randomly selecting observations of a given class from among all the total possibilities, we would expect some variability between these separate datasets. Table 8.9 shows the classification results for three datasets each compiled from all available data from the 100–900 and 250-comb overall datasets. Classification performance is consistent among the randomly compiled datasets within both the 100–900 and 250-comb groups. The 250-comb datasets have an overall lower classification performance, likely due to the greater variety of observations present.

The 250-comb data is a combination of the 250–500 and 250–1000 data, created under the assumption that the time between chirps is less important than the length of the chirp. A comparison of the classification performance of all the 250 ms chirps, shown in Table 8.10 and Fig. 8.29, calls this assumption into question. While it is possible that this particular 250-comb dataset, randomly selected from the largest and most diverse overall dataset, simply suffered from bad luck, there is clearly a difference in classification performance between the 250–500/250–1000 datasets and the combined 250-comb dataset. This leads us to believe that the entire transmitted signal, including the space between chirps, is important in defining a signal, rather than just the chirp length.

That said, both the 250–500 and 250–1000 datasets exhibit good classification performance, albeit with large variances. These large variances are caused by the relatively small number of useful observations in each dataset.
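The balanced random draw that produces datasets like a, b, and c in Table 8.9 might look like the following sketch, which reuses the arrays from the earlier examples; the helper name is ours, not from the original code:

```python
def balanced_subset(X, y, n_per_class, seed=0):
    """Draw an equal number of instances per class, without replacement."""
    rng = np.random.default_rng(seed)
    idx = np.concatenate([
        rng.choice(np.flatnonzero(y == c), size=n_per_class, replace=False)
        for c in np.unique(y)])
    return X[idx], y[idx]

# Three independently drawn balanced datasets, as in Table 8.9:
subsets = [balanced_subset(X, y, n_per_class=30, seed=s) for s in range(3)]
```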
Fig. 8.28 This example of the classification of 98 instances pulled from the 100–900 dataset shows how increasing the training ratio first improves classification performance and then increases variance due to overtraining and a lack of data to validate the classifier. The mean confusion matrix is shown for the fisherc classifier at 25% training data (top), 60% train (middle), and 90% train (bottom)
Fig. 8.29 A comparison of the confusion matrices (perlc classifier) of the 250–500 (top), 250–1000 (middle), and 250-comb (bottom) datasets shows that the non-combined datasets have a much higher classification performance than the 250-comb dataset. Both the 250–500 and 250–1000 datasets are small, with 17 and 19 total instances, respectively, compared with the 215 total instances of the 250-comb dataset. The classification performance of the 250–1000 dataset is almost perfect
Table 8.10 A comparison of classification performance between the 250 datasets shows that the spacing between chirps is important in defining our transmitted signal

Dataset      nmsc         perlc        ldc          fisherc      udc          Avg
250–500      0.92 ± 0.09  0.85 ± 0.12  0.72 ± 0.16  0.75 ± 0.14  0.70 ± 0.15  0.79
250–1000     0.99 ± 0.03  0.98 ± 0.05  0.56 ± 0.22  0.68 ± 0.19  0.97 ± 0.07  0.84
250-comb a   0.60 ± 0.06  0.55 ± 0.06  0.56 ± 0.06  0.56 ± 0.03  0.53 ± 0.04  0.56
Table 8.11 A comparison of 250-comb data acquired in a "head-on" orientation and data collected at a slight angle shows that both orientations have a similar classification performance

Orientation  nmsc         perlc        ldc          fisherc      udc          Avg
Head-on      0.64 ± 0.08  0.56 ± 0.11  0.54 ± 0.11  0.52 ± 0.09  0.65 ± 0.12  0.58
Oblique      0.63 ± 0.09  0.56 ± 0.07  0.57 ± 0.10  0.55 ± 0.12  0.58 ± 0.08  0.58
Figure 8.30 shows an example of how these variances play out and why the confusion matrices are so important to understanding the classification results. Here, both the ldc and udc classifiers have an average overall correct rate of around 70%, but the udc classifier has difficulty correctly classifying the van/truck/bus class. This example also illustrates the importance of having large datasets, so that the training and testing sets contain a sufficient number of observations for each class. Both the 250–500 and 250–1000 datasets used here have fewer than 10 instances per class, meaning that even at a 60% training percentage the classification can only be tested on a few instances. We are forced to use these small datasets in this situation because our automated detection routine has a good deal of difficulty locating peaks from the 250 signals: the detection rate for this 250–500 dataset is 26% and the rate for the 250–1000 dataset is 29%, compared with a detection rate of 89% for the 100–900 signal. For this reason, and reasons discussed earlier, the 100–900 signal remains our preferred transmission signal.
8.5.2.2 Head-on Versus Oblique Reflections
Another useful comparison is between data acquired "head-on" and at a slight angle. Results from stationary vehicles confirm that the recorded signal contains reflected pulses, and Table 8.11 shows that both datasets have a similar classification performance. With an average correct classification rate of 58% for both, this 250-comb data isn't an ideal dataset, for reasons discussed above, but it was the only dataset containing observations from both orientations.
Fig. 8.30 A comparison of the confusion matrices from the ldc (top) and udc (bottom) classifiers on data from the 250–500 dataset highlights the importance of using the extra information present in the confusion matrix. Both classifiers have a mean overall correct rate of approximately 70%, but the udc classifier has a much more difficult time classifying vans/trucks/buses into the correct class
8.5.2.3 Comparison of All Input Signals and Classification into Five Classes
Finally, we make an overall comparison between the different incident signals for the three-class problem, and make an attempt at classifying the data from one dataset into the five original, non-grouped classes.
Table 8.12 A comparison of datasets from all of the transmitted signal types shows very good classification performance for all classifiers except ldc and fisherc. Removing these classifiers gives the average in the last column (Avg2). The 100–900 and 250 datasets have been discussed previously in more detail and are included here for completeness. The final row shows the lone five-class classification attempt, with only seven instances per class

Signal       nmsc         perlc        ldc          fisherc      udc          Avg   Avg2
5–750        0.98 ± 0.05  0.99 ± 0.02  0.53 ± 0.19  0.54 ± 0.14  0.96 ± 0.07  0.80  0.98
10–750       0.96 ± 0.06  0.94 ± 0.04  0.61 ± 0.13  0.50 ± 0.17  0.88 ± 0.09  0.78  0.93
100–900      0.89 ± 0.06  0.86 ± 0.05  0.77 ± 0.07  0.80 ± 0.09  0.85 ± 0.06  0.83  0.87
250–500      0.92 ± 0.09  0.85 ± 0.12  0.72 ± 0.16  0.75 ± 0.14  0.70 ± 0.15  0.79  0.82
250–1000     0.99 ± 0.03  0.98 ± 0.05  0.56 ± 0.22  0.68 ± 0.19  0.97 ± 0.07  0.84  0.98
100–900 5C   0.94 ± 0.09  0.92 ± 0.08  0.47 ± 0.14  0.45 ± 0.19  0.82 ± 0.15  0.72  0.89
Table 8.12 shows the results from these comparisons. Even though our reflection detection algorithm has difficulties with both the 5–750 and 10–750 datasets (as well as with the 250 datasets, as discussed earlier) and can only detect the reflection in 25% of the observations, we get good classification performance. The ldc and fisherc classifiers give anomalously low mean overall classification rates with high variance. Removing these classifiers, we can calculate an average performance for the remaining three classifiers, shown in the last column of Table 8.12. With mean overall classification rates ranging from 82 to 98%, we can't say much about one signal being preferred over another, except that our algorithms are able to detect the reflections in the 100–900 signal best. Even with the severely limited data available for classification into five classes (only seven instances per class), we surprisingly find good classification performance, with an average classification rate of 89%. The best (nmsc at 94%) and worst (udc at 82%) classifiers for this data are shown in Fig. 8.31.
8.6 Conclusions

We have shown that oncoming vehicles can be classified with a high degree of accuracy, and at useful distances, using only reflected acoustic pulses. Finding and aligning these reflected signals is a nontrivial but vital step in the process, and one that can be successfully automated, especially if the transmitted signal is optimized for the application.
Fig. 8.31 Classification of 100–900 data into five classes is only possible with a limited dataset of seven instances per class, but manages mean classification performance rates of 82% for the udc classifier (top) and 94% for the nmsc classifier (bottom)
Useful feature vectors that differentiate between vehicles of different classes can be formed by using the Dynamic Wavelet Fingerprint to create alternative time–frequency representations of the reflected signal, and intelligent feature selection algorithms create information-dense representations of our data that allow for very fast and accurate classification. We have found that a 100–900 linear chirp is the best transmitted signal for this particular problem. The signal contains enough energy to propagate long distances, while remaining compact enough in time to allow easy automatic detection. With this signal we can consistently attain correct overall classification rates upwards of 85% at distances of 50 m.

We have investigated a number of sensor modalities that may be appropriate for mobile walking-speed robots operating in unstructured outdoor environments. A combination of short- and long-range sensors is necessary for a robot to capture usable data about its environment. Our prior work had focused on passive thermal infrared and air-coupled ultrasound as possible short-range sensor modalities. This work looked at the suitability of the Microsoft Kinect as a short-range active infrared depth sensor, as well as the performance of a coffee can radar and acoustic echolocation via an acoustic parametric array as long-range sensors for mobile robotics.

In particular, we have demonstrated that the most exciting feature of the Microsoft Kinect, a low-cost depth sensor, is of limited use in outdoor environments. The active illumination source in the near infrared is both limited to a range of several meters and easily saturated by sunlight, so it is mostly useful in nighttime outdoor environments. The infrared sensor is tuned to this near-infrared wavelength and provides little more information than the included RGB webcam. The Kinect's 4-channel microphone array proved to be of high quality. The microphones are not spatially separated enough to allow for implementation of beamforming methods at distances over several meters, and they are limited to a relatively low 16 kHz sampling rate by current software, but the design of the capsule microphones and the built-in noise cancellation algorithms allow for high-quality recording.

Construction of a coffee can radar showed that such devices are feasible for mobile robotics, providing long-range detection capability at low cost and in a physically small package. Since the radar signal is de-chirped to facilitate processing with a computer sound card, these measurements do not contain much useful information about scattering from the target. However, radar ranging measurements could provide an early detection system for a mobile robot, detecting objects at long range before other sensors are used to classify them. Another exciting possible use of the radar sensor is the creation of synthetic aperture radar (SAR) images. This method of producing a three-dimensional representation of the radar scattering from a target is essentially a set of ranging measurements acquired over a wide area. Normally this requires either an array of individual radar sensors or a radar that can be steered by beamforming, but it is a natural fit for mobile robotics since the radar sensor is in motion on a well-defined path.

The main focus of our work has been the use of acoustic echolocation as a long-range sensor for mobile robotics. Using coded signals in the audible range increases the range of the signal while still allowing for detection in noisy environments.
The acoustic parametric array is able to create a tight beam of this low-frequency sound,
directing the majority of the sound energy onto the target. This serves the dual purpose of reducing clutter in the backscattered signal and keeping the noise pollution added to the surrounding environment to a minimum. As a test of this sensor modality, several thousand acoustic echolocation measurements were acquired from approaching vehicles in a variety of environmental conditions. The goal was to classify the vehicle into one of five classes (car, SUV, van, truck, or bus) based on its frontal profile. To test feasibility as a long-range sensor, vehicles were interrogated at distances up to 50 m. Initial analysis of the measured backscattered data showed that useful information about the target is buried deep in noise. Time–frequency representations of the data, in particular those created using the dynamic wavelet fingerprint (DWFP) process, reveal this hidden information. The formal framework of statistical pattern classification allowed us to intelligently create small-dimensional, information-dense feature vectors that best describe the target. This process was able to correctly classify vehicles using only the backscattered acoustic signal with 94% accuracy.
Chapter 9
Cranks and Charlatans and Deepfakes Mark K. Hinders and Spencer L. Kirn
Abstract Media gate-keepers who curate information and decide which things regular people should see and read are largely superfluous in the age of social media. There have always been hoaxsters and pranksters and hucksters, but knowing which online sources and stories to trust and believe is becoming an urgent problem for us regular people. Snake oil salesmen provide insight into what to watch for when skeptically evaluating the claims of machine learning software sales reps. Digital photos are so easy to manipulate that we can't believe what we see in pictures, of course, but now deepfake videos mean we can't tell when video evidence is deliberately misleading. Classic faked UFO photos instruct us to pay attention to the behavior of the photographer, which can now be done automatically by analyzing the tweetstorms surrounding events. Topic modeling gives a way to form time-series plots of tweetstorm subjects that can then be transformed via dynamic wavelet fingerprints to isolate shapes that may be characteristic of organic versus artificial virality.

Keywords Tweetstorm · Digital charlatans · Natural language processing · Wavelet fingerprint
9.1 Digital Cranks and Charlatans

Question: Whose snake oil is more dangerous, the crank's or the charlatan's? First, some historical context. Medicine in the modern sense is a fairly recent phenomenon, with big advances happening when we're at war. During the Civil War, surgeons mostly knew to wash their saws between amputations, but their primary skill was speed because anesthesia (ether) wasn't really used in the field hospitals. Part of the punishment for losing WWI was Germany giving up their aspirin patent. Antibiotics can be thought of as a WWII innovation along with radar and the A-bomb. In our current wars to ensure the flow of oil and staunch the resulting spread of terrorism,
we have been doing an excellent job of saving the lives of those who have been gravely wounded on the battlefield, as evidenced by the now common sight of prosthetic arms and legs in everyday America. Same goes for the dramatic drop in murder rates that big-city mayors all took credit for. Emergency medical technology on the battlefield and in the hood is really quite amazing, but it's very recent.

In the late nineteenth century there really wasn't what we would now call medicine [1–7]. Nevertheless, there was a robust market for patent medicines, i.e., snake oil. Traveling salesmen would hawk them, you could order them from the Sears and Roebuck catalog, and so on. There was no way to know what was in them because the requirement to list ingredients on the label didn't arise until the Pure Food and Drug Act of 1906. The modern requirement to prove both safety and efficacy prior to marketing a drug or treatment was far off into the distant future. You might think, then, that soothing syrup that was so effective at getting a teething baby to sleep was mostly alcohol like some modern cough syrups. Nope. Thanks to recent chemical analyses of samples from the Henry Ford museum, we now know that most of these snake oils were as much as thirty percent by weight opiates. Rub a bit of that on the baby's gums. Have a swig or three yourself. Everybody has a nice long nap.

Some of the hucksters who mixed up their concoctions in kitchens, barns, etc. knew full well that their medicines didn't actually treat or cure any ailments, but the power of suggestion can and does make people feel better if it's wrapped up properly in a convincing gimmick. This turns out to be how ancient Chinese medicine works. Acupuncture makes you feel better because you think it should make you feel better. Billions of people over thousands of years can't have been wrong! Plus, you see the needles go in and you feel a tingling that must be your Qi unblocking. Don't worry, placebos still work even after you've been told that they're placebos. Similarly, tap water in a plastic bottle sells for way more if there's a picture of nature on the label. You know perfectly well there's no actual Poland Spring in Maine and that your square-bottled Fiji is no more watery than regular water, but their back stories make the water taste so much more natural. Plus, everybody remembers that gross kid in school who put his mouth right on the water fountain, ew!

Hucksters who know perfectly well that their snake oil isn't medicine are charlatans. They are selling you pretend medicine (or tap water) strictly to make a buck. The story they tell is what makes their medicine work. Even making it taste kind of bad is important because we all know strong medicine has side effects. Oh, and expensive placebos actually do work better than cheap ones, so charlatans don't have to feel bad about their high profit margins. Some of the hucksters have actually convinced themselves that the concoction they've mixed up has curative powers bordering on the magical, kind of like the second-grade teacher whose effervescent vitamin tablets prevent you from getting sick touching in-flight magazines. These deluded individuals are called cranks.

Charlatans are criminals. Cranks are fools. But whose snake oil is more dangerous? Note again that all medicine has side effects, which you hear listed during commercials for medicines on the evening news. Charlatans don't want to go to prison and they want a reliable base of repeat customers.
Hence, they will typically take great care to not kill people off via side effects.
Fig. 9.1 Charlatans know full well their snake oil works via the placebo effect. Cranks may overlook dangerous side effects because they fervently believe in their concoction’s magical curative powers. School pills give a student knowledge without having to attend lessons [8–10] so that the student’s time can instead be applied to Athletic pursuits. School pills are sugar coated and easily swallowed, unlike most machine learning software
Cranks are much more likely to overlook quite dangerous side effects in their fervent belief in the ubiquitous healing powers of their concoctions. That fervent belief can also make cranks surprisingly effective at convincing others. They are, in fact, true believers. Their snake oil is far more dangerous than the charlatan's, even the charlatan who is a pure psychopath lacking any ability to care whether you live or die (Fig. 9.1).

Raymond Merriman is a financial astrologer and author of several financial market timing books, and is the recipient of the Regulus Award for Enhancing Astrology's Image as a Profession [11]. "Financial markets offer an objective means to test astrological validity. The Moon changes signs every 2–3 days and is valuable for short-term trading. Planetary stations and aspects identify longer term market reversals. Approximately 4–5 times per year, markets will form important highs or lows, which are the most favorable times to buy and sell." If you'd like to pay actual money to attend, his workshop "provides research studies showing the correlation of astrological factors to short-term and longer term financial market timing events in stock markets, precious metals, and Bitcoin." His website does offer a free weekly forecast, so there's that. Guess what he didn't forecast in Q1 2020?

Warren Buffett is neither a crank nor a charlatan. He is the second or third wealthiest, and one of the most influential, business people in America [12], having earned the nickname "Oracle of Omaha" by predicting the future performance of companies and then hand-picking those that he thought were going to do well. Ordinary people who bought shares in his company have done quite well. Nevertheless, Buffett advocates index funds [13] for people who are either not interested in managing their
own money or don’t have the time. He is skeptical that active management can outperform the market in the long run, and has advised both individual and institutional investors to move their money to low-cost index funds that track broad, diversified stock market indices. Buffett said in one of his letters to shareholders that [14] “when trillions of dollars are managed by Wall Streeters charging high fees, it will usually be the managers who reap outsized profits, not the clients.” Buffett won a friendly wager with numerous managers that a simple Vanguard S & P 500 index fund will outperform hedge funds that charge exorbitant fees. Some call that a bet, some call it a controlled prospective experiment. If you’re trying to figure out whether a new miracle cure or magic AI software actually works, you have to figure out some way to test it in order to make sure you aren’t being fooled or even fooling yourself. A casual perusal of software platforms for traders, both amateur day-traders and professional money managers, sets off my bullshit detectors which I have honed over 35 years of learning about and teaching pseudoscience. It seems obvious to me that there are lots of cranks and charlatans in financial services. I suppose that’s always been the case, but now that there’s so much AI and machine learning snake oil in the mix it has become suddenly important to understand their modus operandi. OK, so the stock market isn’t exactly life and death, but sometimes it sure feels like it and a little soothing syrup would often hit the spot right about 4pm Eastern time. How do we go about telling the crank from the charlatan when the snake oil they’re selling is machine learning software? They’re pretty unlikely to list the algorithmic ingredients on their labels, since that might involve open-sourcing their software. Even if you could get access to the source code, good luck decoding that uncommented jumble of function calls, loops and what ifs. The advice of diplomats might be: trust, but verify. Scientists would approach it a bit differently. One of the key foundational tenets of science is that skeptical disbelief is good manners. The burden of proof is always with the one making a claim of some new finding, and the more amazing the claim is the more evidence that the claimant must provide. Politicians (and cranks) always seem to get cross when we don’t immediately believe their outlandish promises, so we can use that sort of emotional response as a red flag. Another key aspect of science is that no one experiment, no matter how elegantly executed, is convincing by itself. Replication, replication, replication. Independent verification by multiple laboratories is how science comes around to new findings. Much of the research literature in psychology is suspect right now because of their socalled replication crisis [15]. Standard procedures in psychology have often involved what’s called p-hacking, which is basically slicing and dicing the results with different sorts of statistical analysis methods until something notable shows up with statistical significance. In predictive financial modeling it’s a trivial matter to back test your algorithms over a range of time slices and market-segment dices until you find a combination where your algorithms work quite well. Charlatans will try to fool you by doing this. Cranks will fool themselves. We can use this as another red flag, and again science has worked out for us a way to proceed. 
Let’s say you wanted to design a controlled test of urine therapy [16–24] which is the crazy idea that drinking your own tinkle has health benefits. I suppose that loners who believe chugging their first water of the morning will make them feel better do
actually feel better, but that's almost certainly just the placebo effect [25]. Hence, any test of urine therapy would have to be placebo controlled, which brings up the obvious question of what one would use for a placebo. Most college students that I've polled over the last several years agree that warm Natty Light tastes just like piss, so that would probably work. So the way a placebo-controlled study goes is, people are randomly assigned to either drink their own urine or chug a beer, but they can't know which group they're in, so we'll have to introduce a switcheroo of some sort. What might work is that everybody pees into a cup and then gives the cup to an assistant, who either hands back that cup of piss or hands over an identical cup full of warm Natty Light according to which group the testee is in. It might even be necessary to give the testee an N95 facemask and/or the tester a blindfold so as to minimize the chances of inadvertently passing some clues as to what the cup contains. Then another research assistant who has no idea whether the testee has been drinking urine or beer will be asked to assess some proposed health benefit, which is the part that makes it double blind. If the person interpreting the results knows who's in which group, that can easily skew their assessment of the results. Come to think of it, the testees should probably get an Altoids before they go get their health assessed.

That scheme is pretty standard, but we've left out a very important thing that needs to get nailed down before we start. Namely, what is urine therapy good for? If we look for ill-defined health benefits we almost certainly will be able to p-hack the results afterwards in order to find some, so we have to be crystal clear up front about what it is that drinking urine might do for us. If there's more than one potential health benefit, we might just have to run the experiment again with another whole set of testees. Don't worry, Natty Light is cheaper than water.

So when somebody has a fancy new bit of Python code for financial modeling or whatever, we're going to insist that they show that it works. The burden of proof lies with them. We are appropriately skeptical of their claims, and the more grandiose their claims are the more skeptical we are. That's just good manners, after all. And since we're afraid of p-hacking, we're going to insist that a specific benefit be described. We might even phrase that as, "What is it good for?" Obviously we're not going to buy software unless it solves a need, and we're not dumb enough to believe that a magic bit of coded algorithm could solve a whole passel of needs, so pick one and we'll test that. We'll also need something to compare against, which I suppose could be random guessing but probably is going to be some market average or what an above-average hedge fund can do or whatever.1 Then we back test, but we're going to have to blind things first. That probably means having assistants select slices of time and dices of market sectors and running back tests without knowing whether they're using the candidate machine learning software or the old stand-by comparison. Other assistants who weren't involved in running the back tests can then assess the results, or if there's a simple metric of success then that can be used to automatically score the new software against the standard. Of course, some of those back tests should run forward a bit into the future so that we can get a true test of whether they work or not.
If the vendor really believes in their software, they might even be willing to agree to stake a bet on predictions a fair chunk into the future. If they're right, you'll buy more software from them. If they're wrong, you get your money back and they have to drink their own urine every morning for a month.

1 Warren Buffett would say compare to some Vanguard low-load index funds.
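For what it's worth, the blinding scheme just described takes only a few lines of scaffolding. The sketch below is hypothetical (the names and structure are ours, not any vendor's tool), but it captures the idea: whoever runs and scores the back tests sees only anonymous labels, and the key mapping labels to candidate or baseline is opened only after the scores are in.

```python
import random

def blinded_backtests(candidate, baseline, data_slices, seed=0):
    """Run candidate vs. baseline on each data slice under anonymous labels.
    Each model is any callable that maps a data slice to a score."""
    rng = random.Random(seed)
    results, key = {}, {}
    for i, data in enumerate(data_slices):
        models = [candidate, baseline]
        rng.shuffle(models)                       # hide which model is which
        for label, model in zip("AB", models):
            results[(i, label)] = model(data)
        key[i] = {label: m.__name__ for label, m in zip("AB", models)}
    return results, key   # score `results` first; unblind `key` afterwards
```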
9.2 Once You Eliminate the Possible, Whatever Remains, No Matter How Probable, Is Fake News [26]

Harry Houdini was a mamma's boy. He loved his mommy. It was awkward for his wife. Then his mommy died, and he wanted nothing more than to talk to her one last time to make sure she was all right and knew that he still loved her. Eternally awkward for his wife. I know what you're thinking, so here's Houdini in his own words, "If there ever was a son who idolized and worshipped his Mother, that son was myself. My Mother meant my life, her happiness was synonymous with my peace of mind." Notice that it's capital-M.

Houdini was perhaps the world's most famous performer. He may have been the most talented magician ever. He was also an amazing physical specimen, and many of his escapes depended on the rare combination of raw talent, physical abilities, and practice, practice, practice. His hotness also helped with at least half of his audience, because of course he had to be half-clothed to prove he wasn't hiding a key or whatnot. That may have been awkward for both Mrs. Houdinis.

Sir Arthur Conan Doyle was Houdini's frenemy. He thought that Houdini wasn't abnormally talented physically, but that he had the paranormal ability to disapparate his physical body from inside the locked chest under water, or whatever, and then re-apparate it back on land, stage, etc. I suppose that could explain all manner of seemingly impossible escapes, but geez. Doyle's mind was responsible for Sherlock Holmes, who said that when you have eliminated the impossible, whatever remains, however improbable, must be the truth. Doyle apparently didn't allow for the possibility of Houdini's rare combination of raw talent, physical abilities, and practice, practice, practice. He did, however, believe in fairies. The cardboard cutout kind. Not in person, of course, but photos. Notice in Fig. 9.2 that the wings of the hovering fairies are captured crisply but the waterfall in the background is blurred. That's not an Instagram filter, because this was a century ago. Otherwise, those little girls had a pretty strong selfie game.

It's always been fun for kids to fool adults. Kate and Margaret Fox were two little girls in upstate New York in 1848 who weren't tired even though it was their bedtime, so they were fooling around. When their mom said to "quiet down up there" they said, of course, that it wasn't them. It was Mr. Slipfoot. A spirit. They asked him questions and he responded in a sort of Morse code by making rapping noises. Had momma Fox been less gullible she might have asked the obvious question about ghosts and spirits, "If they can freely pass through walls and such, how can they rattle a chain or make a rapping noise? Now go to sleep, your father and I worked hard all day and we're tired."
Fig. 9.2 So you see, comrade, the Russians did not invent fake news on social media
They didn’t quiet down, because they had merely been playing all day and weren’t tired at all. When their mom came to tell them one last time to hush they had an apple tied to a string so they could make the rapping noises with her standing right there. Mr. Slipfoot backed up their story. Momma Fox was amazed. She told the neighbors. They were similarly amazed, and maybe even a bit jealous because their own special snowflakes hadn’t been singled out by the Spirits to communicate on behalf of the living. How long ‘till I die, and then what? Everybody ever has had at least some passing curiosity about that. Kate and Margaret had a married sister who saw the real potential. She took them on tour, which they then did for the rest of their lives. Margaret gave it up for a while and admitted it was all trickery, but eventually had to go back to performing tricks to make a living.2 What amazes me is that as little kids they figured out how to make the raps by swishing their fingers and then their toes, and do it loudly enough to make a convincing stage demonstration. Margaret says, “The rappings are simply the result of a perfect control of the muscles of the leg below the knee, which govern the tendons of the foot and allow action of the toe and ankle bones that is not commonly known.” That sounds painful. Margaret continues, “With control of the muscles of 2 Rapping
noises, not those tricks you perv.
the foot, the toes may be brought down to the floor without any movement that is perceptible to the eye. The whole foot, in fact, can be made to give rappings by the use only of the muscles below the knee. This, then is the simple explanation of the whole method of the knock and raps." My knee sometimes makes these noises when I first get out of bed in the morning.

Look up gullible in an old thesaurus and you'll find its antonym is Houdini. You'll also find his picture next to the definition of momma's boy in most good dictionaries from the roaring twenties. To quote one final time from his book [27], "I was willing to believe, even wanted to believe. It was weird to me and with a beating heart I waited, hoping that I might feel once more the presence of my beloved Mother." Also, "I was determined to embrace Spiritualism if there was any evidence strong enough to down the doubts that have crowded my brain for the last thirty years."

As Houdini traveled the world performing, he did two things in each town. First, he gave money to have the graves of local magicians maintained. Second, he sought out the locally most famous Spiritualist medium and asked her if she could please get in contact with his beloved Mother, which they all agreed they could easily do. Including Lady Doyle, Sir Arthur's wife. They all failed. Lady Doyle failed on Mother's birthday. Spiritualist mediums were all doing rather rudimentary magic tricks, which Houdini spotted immediately because it takes one to know one. That pissed him off so much that he wrote the book I've been quoting from.

To give an idea of the kind of things that Spiritualists were getting away with, imagine what you might think if your favorite medium was found to have your family jewels in his hand.3 You would be within your rights to assume that you were being robbed. But no, it was the Spirits who dematerialized your jewels from your safe and rematerialized them into the hands of the medium while the lights were off for the séance. It's proof that the spirits want your medium to have your valuables, and if you call the police that will piss them off and you'll never get the spirits to go find Mother and bring her around to tell you that she's been thinking of you and watching over you and protecting you from evil spirits as any good Mother would do for her beloved son.

Most people remember that Houdini left a secret code with his wife, so that after he died she could ask mediums to contact him and if it was in fact the Great Houdini he could verify his presence with that code. What most people don't know is that Houdini also arranged codes with other close associates who died before he did. No secret codes were ever returned. If anybody could escape from the great beyond it would be the world's (former) greatest escape artist. So much for Spiritualism. But wait, just because Houdini's wife wasn't able to contact her husband after he died, that doesn't mean anything. Like Mother would let him talk to that woman now that she had him all to herself, for eternity.
2 Rapping noises, not those tricks you perv.
3 Diamonds and such, you perv. Geez.
9.3 Foo Fighters Was Founded by Nirvana Drummer Dave Grohl After the Death of Grunge

The term "flying saucer" is a bit of a misnomer. When private pilot Kenneth Arnold kicked off the UFO craze [28] by reporting unidentifiable flying objects in the Pacific Northwest in 1947, he said they skipped like saucers, not that they were shaped like saucers. It's kind of a shame that he didn't call them flapjacks, because there actually was an experimental aircraft back then with that nickname, see Fig. 9.3. Many of the current stealth aircraft are also flying wing designs that look quite a lot like flying saucers from some angles, and some of the newest military drones look exactly like flying pancakes.
Fig. 9.3 The Flying Flapjack was an experimental U.S. Navy fighter aircraft designed by Charles H. Zimmerman for Vought during World War II. This unorthodox design consisted of a flying-saucer-shaped body that served as the lifting surface [29]. The proof-of-concept vehicle was built under a U.S. Navy contract and it made its first flight on November 23, 1942. It has a circular wing 23.3 feet in diameter and a symmetrical NACA airfoil section. A huge 16-foot diameter three-bladed propeller was mounted at the tip of each airfoil, blanketing the entire aircraft in their slipstreams. Power was provided by two 80 HP Continental engines. Although it had unusual flight characteristics and control responses, it could be handled effectively without modern computerized flight control systems. It could almost hover, and it survived several forced landings, including a nose-over, with no serious damage to the aircraft or injury to the pilot. Recently, the Vought Aircraft Heritage Foundation Volunteers donated over 25,000 labor hours to complete a restoration effort, and the aircraft is on long-term loan from the Smithsonian Institution
Fig. 9.4 Most UFO sightings are early aircraft or weather balloons or meteorological phenomena [31]. The planet Venus is commonly reported as a UFO, including by former President Jimmy Carter
So, lots of people started seeing flying saucers and calling up the government to report them. The sightings were collected and collated and evaluated in the small, underfunded Project Bluebook (Fig. 9.4). Then one crashed near Roswell, NM [30], and the wreckage was recovered. Never mind, said the Air Force. "It was just a weather balloon." Actually it was a top-secret Project Mogul balloon, but they couldn't say that in 1947 because it was top secret. If a flying saucer had crashed and been recovered, of course it would have been top secret. Duh.

In 1947 we were this world's only nuclear superpower. Our industrial might and fighting spirit (and radar) had defeated the Axis of evil. Now all we had to worry about was our Soviet frenemies. And all those other Commies. But if we could reverse-engineer an interplanetary space craft we would be unstoppable. Unless the aliens attacked, in which case we had better hurry up and figure out all their advanced technology so we can defend ourselves. Ergo, any potentially usable physical evidence from a UFO would be spirited away to top-secret government labs where "top men" would be put to work in our national interest.

The key thing to know about government secrecy is that it's compartmentalized [32–43]. You don't even know what you don't know because before you can know, somebody who already knows has to decide that you have a need to know what you don't yet know. You will then be read into the program and will know, but you can't ever say what you know to somebody who doesn't have a need to know. There's no way to know when things become knowable, so it's best to make like you don't know.
For example, Vice President Truman didn’t know about the Manhattan Project [44] because the President and the Generals didn’t think he needed to know. Then when Roosevelt died the Generals had to have a slightly awkward conversation about this
Fig. 9.5 Trent photos with indications of how they are typically cropped. The Condon Committee concluded, “The object appears beneath a pair of wires, as is seen in Plates 23 and 24. We may question, therefore, whether it could have been a model suspended from one of the wires. This possibility is strengthened by the observation that the object appears beneath roughly the same point in the two photos, in spite of their having been taken from two positions.” and concluded [46], “These tests do not rule out the possibility that the object was a small model suspended from the nearby wire by an unresolved thread”
top-secret gadget that could end the war and save millions of American lives. It’s not too surprising that there’s no good evidence of UFOs, that I know of.... In 1950 Paul and Evelyn Trent took a couple of pictures of a flying saucer above their farm near McMinnville, OR. Without wreckage, pictures are the best evidence, except that in the 1950s this is how deepfakes were done, so it’s worth critically assessing both the photos themselves and the behavior of those involved. Life magazine [45] published cropped versions of the photos (Fig. 9.5) but misplaced the original negatives. That’s highly suspicious, because if real it would be a huge story. In those days the fakery typically happened when prints were made from the negatives, so careful analysis of the original negatives was key. Another oddity is that because the roll of film in Paul Trent’s camera was not entirely used up, they did not have the film developed immediately. You never really knew if your pictures had come out until you had them developed, of course, so you’d think the Trents would take a bunch of selfies to finish out the roll and run right down to the drug store. The Trents said that the pictures were taken at 7:30 pm, but analysis of the shadows shows pretty clearly that it was instead morning. Also, the other objects in the photos—oil tank, bush, fencepost, garage—allow the 3D geometry to be reconstructed. Clearly it’s a small object hanging from the wires and the photos were taken from two different lateral positions, rather than a large distant object zooming past with two photos snapped from the same position while panning across as the object passed by. Despite being thoroughly debunked, the McMinnville UFO photographs remain perhaps the best publicized in UFO history. They’re most convincing if the overhead wires are cropped out of the pictures. Rex Heflin’s UFO photos are at least as famous as the Trents’ (Fig. 9.6) and again it’s the behavior of the photographer that is most of interest. First of all, he is the
Fig. 9.6 In 1965 an Orange County, CA highway inspector named Rex Heflin [47] snapped three close-up photos of a low-flying UFO through the windows of his truck with the Polaroid camera he carried for work. They clearly show a round, hat-like object with a dark band around its raised superstructure, and in the side mirror you can see the line of telephone poles it was hanging from
only one who saw it, but a large UFO in the distance would have been above one of the busiest freeways in southern California at noon on a weekday. He originally produced three Polaroids, then later came up with a fourth. Rather than providing the originals for analysis, he said the “men from NORAD” took them so all he had was copies. He had snapped the photos from inside the cab of his pickup, so the UFO is framed by that, and the telephone poles are only visible accidentally in the side mirror of one of the photos. Oops. Both the Trent and the Heflin photos have been analyzed by experts on both sides. Ufologists declared them genuine. Others, not so much. Or any. Just, nope. Double nope. And these are the best UFO photos in existence, something both sides agree to stipulate. So there’s that. What we needed was a controlled test of this sort of thing. What you might call a hoax. It turns out that ufologists are fun to pwn. The Trent and Heflin episodes are what I call shallowfakes. Warminster isn’t quite a deepfake, but at least it’s past your navel. In 1970 Flying Saucer Watch published photographs taken in March of that year at Cradle Hill in Warminster, UK showing a UFO at night. This was a place where ufologists would gather together in order to watch the skies and socialize. It was before the Internet. Mixed in with the enthusiasts were a couple of hoaxers, one of whom had a camera and excitedly took a picture or three of a UFO across the way. Figure 9.7 shows one of those photos and also what they actually were looking at. Two more of the Warminster photos are shown in Fig. 9.8. Part of the hoax was to shine the purple spotlight and pretend to take a picture of it and then see if anybody noticed the difference between the UFO in the photo and what they all saw blinking purple on and off on the other hill. The second part of the test was that some of the
Fig. 9.7 A UFO over Cradle Hill in Warminster, UK. The photo on the left was pre-exposed with an indistinct UFO shape. The photo on the right shows what the ufologists were actually seeing. Note that the UFO in the photo, which is indicated by an arrow, isn’t a purple spotlight
Fig. 9.8 Two photos of the scene of the UFO over Cradle Hill in Warminster, UK, except that these pictures were taken on different nights (with different patterns of lights) and from different enough perspectives (streetlight spacings are different) that ufologists investigating the sightings should have noticed
photos were taken on a different night, so the pattern of lights in the pictures was a bit different. Nobody noticed that either. The hoaxers let the experiment run for a couple of years and with some amusement watched a series of pseudo-scholarly analyses pronounce the photos genuine. See for example [48]. At this point you might want to refer back to the Spectral Intermezzo chapter. That was intended to be a bit silly, of course, but now that we’ve talked explicitly about cranks/charlatans and hoaxes/pwns, you should be better able to spot such obvious attempts to apply machine learning lingo to pseudoscientific phenomena. You might also appreciate Tim Minchin’s 9-min beat poem, Storm [49] where he notes that Scooby Doo is like CSI for kids.
9.4 Digital Imaging Is Why Our Money Gets Redesigned so Often A picture is worth a thousand words, so in 2002 the website worth1000.com started having photoshopping contests. Go ahead and pick your favorites from Figs. 9.9, 9.10, and 9.11. I may have done the middle one. We’ve all gotten used to the idea that most images we see in print have been manipulated, but people didn’t really pay attention until Time magazine added a tear to an existing photo of the Gipper [51]. First there were Instagram filters (Fig. 9.12) and now Chinese smartphones come with automatic image manipulation software built in, and it’s considered bad manners to not “fix” a friend’s pic before you upload a selfie. We are now apparently in the age of Instagram Face [52]. “It’s a young face, of course, with poreless skin and plump, high cheekbones. It has catlike eyes and long, cartoonish lashes; it has a small, neat nose and full, lush lips.” So now we’re at this strange moment in technology development when seeing may or may not be believing [54–58]. Figures 9.9 and 9.12 are all in good fun, but you just have to get used to the idea that the news isn’t curated text delivered to your driveway in the morning by the paperboy anymore. The news is images that come to that screen you’re holding all the damn time and scrolling through when you should be watching where you’re going. “Powerful images of current events, of controversies, of abuses have been an important driver of social change and public policy. If the public, if the news consuming, image consuming, picture drenched public loses confidence in the
Fig. 9.9 Three of the five stooges deciding the fate of post-war Europe [50]. Shemp and Curly-Joe are not pictured. Joe Besser isn’t a real stooge, sorry Stinky
Fig. 9.10 UFO over William and Mary took no talent and almost no time. The lighting direction isn’t even consistent between the two component images
ability of photographers to tell the truth in a fundamental way, then the game is up.” [59] Fake news on social media seems to be able to influence elections. Yikes! At some level we’re also going to have to get used to the idea that any given picture might be totally fake. I’m willing to stipulate for the record that there is no convincing photographic evidence of alien visitation. The best pictures from the old days were faked, and ufologists showed themselves incapable of even moderate skepticism during the Warminster pwn.4 It would be trivially simple to fake a UFO selfie these days. But what about video? It’s bad news. Really bad news, for Hermione Granger in particular.
4 I said pwn here, but it was a reasonably well-controlled prospective test of ufologists.
Fig. 9.11 This took some real talent, and time [50]. Even today, I would expect it to win some reddit gold even though I’m not entirely sure what that means
Fig. 9.12 This Guy Can’t Stop Photoshopping Himself Into Kendall Jenner’s Instagram Pics, and it’s hilarious because she has someone to ’Shop her pics before she uploads them [53]
We probably should have seen it coming, though. In 2007 a series of videos hit the interwebs showing UFOs over various cities around the world [60, 61] made by a 35-year-old professional animator who had attended one of the most prestigious art schools in France and brought a decade of experience with computer graphics and commercial animation to the party. It took a total of 17 h to make the infamous Haiti and Dominican Republic videos, working by himself using a MacBook Pro and a suite of commercially available three-dimensional animation programs. The videos are 100% computer-generated, and may have been intended as a viral marketing ploy in which case it worked, since they racked up tens of millions of views on YouTube [62]. “No other footage of a UFO sailing through the sky has been watched by as many people.”
9.5 Social Media Has Sped Up the Flow of Information Deepfake is the muggle name for putting the face of a celebrity onto another body in an existing video. The software is free. It runs on your desktop computer. It’s getting better fast enough that soon analysis of the deepfake video footage won’t be able to tell the difference [63–68]. One strategy is to embed watermarks or some sort of metadata into the video in order to tell if it’s been altered [69]. Some young MBA has probably already declared confidently via a thick slide deck that Blockchain is the answer. YouTube’s strategy [70] is to “identify authoritative news sources, bring those videos to the top of users’ feeds, and support quality journalism with tools and funding that will help news organizations more effectively reach their audiences. The challenge is deciding what constitutes authority when the public seems more divided than ever before on which news sources to trust—or whether to trust the traditional news industry at all.” Recall the lesson from the Trents and Rex Heflin that we should look to the behavior of the photographer to assess the reliability of their photographs. Are they cranks or charlatans? I forgot to mention it before, but the Trents were repeaters who had previously faked UFO photos. Those early attempts didn’t go viral via Life magazine, of course, but in my rulebook any previous faking gets you a Pete Rose style lifetime ban. Any hint of shenanigans gets you banished. We can’t afford to play around anymore, so if you’ve been photoshopping yourself into some random Kardashian’s social media we don’t ever believe your pics even if you’re trying to be serious this time. Content moderation at scale isn’t practical, but Twitter can be analyzed in bulk to see what people are tweeting about, and if you have enough tweets, topic modeling can go much further than keeping track of what’s trending. Tweetstorms are localized in time, so trending isn’t really an appropriate metric. A trend is a first derivative: the rate of change of a topic’s volume. We should at least look at higher derivatives, don’t you think? The second derivative gives the rate of change of the trend itself, i.e., its acceleration. That would be better. How fast something trends and then detrends should be pretty useful. Our current @POTUS has figured out that tweeting something outrageous just often enough distracts the mainstream media quite effectively and allows the public narrative to be steered. The mainstream media even use up screentime on their old-fashioned newscasts to read for oldsters the things @POTUS just tweeted. They also show whatever viral video trended on the moms’ Facebook a couple of days ago and had been seen by all the digital natives a couple of days before that. These patterns can be used to identify deepfakes before they make it to the screens of people who are too used to believing their own eyes. It’s not the videos per se that we analyze, but the pattern of spread of the videos and most importantly the cloud of conversations that are happening in the cloud during that spread. Social media platforms come and go on a relatively short time scale, so the key is to be able to analyze the text contained in whatever platform people are using to communicate and share. We’re talking about Twitter, but the method described below can be adapted to a wide variety of communication modalities, including voice, because speech-to-text already works pretty well and is getting better.
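To make the derivative idea concrete, here is a minimal sketch (with made-up counts; the variable names and data are hypothetical) of how a topic's trending velocity and acceleration might be computed:

```python
import numpy as np

# Hypothetical tweets-per-window counts for one topic
counts = np.array([3, 4, 9, 40, 180, 160, 150, 60, 20, 8], dtype=float)

velocity = np.gradient(counts)        # first derivative: how fast the topic trends
acceleration = np.gradient(velocity)  # second derivative: how abruptly it spikes and detrends
```

A sharp positive acceleration followed quickly by a sharp negative one would suggest a collective reaction to a singular event rather than a slow grassroots build-up.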
Fig. 9.13 A group of squirrels that isn’t a family unit is called a scurry. A family of squirrels is called a dray. Most squirrels aren’t social, though, so they normally don’t gather in a group. Oh look: pizza [71, 72]
Unless you downloaded this chapter first and haven’t read the rest of this book, you probably can guess what I’m about to say: dynamic wavelet fingerprints. Tweetstorms have compact support, by which I mean that they start and stop. Tweetstorms have echoes via retweeting and such. Wavelets are mathematically suited to analyzing time-domain signals that start and stop and echo in very complicated ways. Wavelet fingerprints allow us to identify patterns in precisely this sort of complex time-domain signal. First we have to be able to extract meaning from tweetstorms, but that’s what topic modeling does. Storms of tweets about shared cultural events will be shaped by the natural time scale of the events and whether people are watching it live or on tape delay or are waking up the next morning and just then seeing the results. There should be fingerprints that allow us to identify these natural features. Deepfakery-fueled tweetstorms and echoes should leave fingerprints that allow us to identify un-natural features. It should be possible to flag deepfakes before they make it to the moms’ Facebook feeds and oldsters’ television newscasts (Fig. 9.13).
9.6 Discovering Latent Topics in a Corpus of Tweets In an age where everyone is rushing to broadcast opinions and inanities and commentaries on the Internet, any mildly notable event can sometimes trigger a tweetstorm. We define a tweetstorm as any surge in a specific topic on social media, where tweet is used in a generic sense as a label for any form of unstructured textual data posted to the interwebs. Since tweetstorms generate vast amounts of text data in short amounts of time for text analytics, this gives us the ability to dissect and analyze Internet chatter in a host of different situations. Companies can analyze transcribed phone calls from their call centers to see what customers call about most often, which is exactly what happens when they say “this call may be monitored for quality control purposes.”
poses.” Journalists can analyze tweets from their region to determine which stories are the most interesting to their audience and will likely have the greatest impact on the local community. Political campaigns can analyze social media posts to discern various voters’ opinions on issues and use those insights to refine their messages. Topic modeling is an extraordinarily powerful tool that allows us to process largescale, unstructured text data to uncover latent topics in what would otherwise be obscured from human analysis. According to the Domo Data Never Sleeps study, nearly 475,000 tweets are published every minute [73]. Even with this firehose of data continually streaming in, it is possible to run analysis in real time to examine the popular topics of online conversation at any given instant. Using unsupervised machine learning techniques to leverage large-scale text analytics allows for the discovery of hidden structures, groupings, and themes throughout massive amounts of data. Although humans possess a natural ability to observe patterns, manual analysis of huge datasets is highly impractical. To extract topics from Twitter we have explored a range of topic modeling and document clustering methods including non-negative matrix factorization, doc2vec, and k-means clustering [74].
9.6.1 Document Embedding Document and word embeddings are instrumental in running any kind of natural language modeling algorithm because they produce numerical representations of language, which can be interpreted by a computer. There are two common methods for creating both document and word embeddings: document-term methods and neural network methods. Document-term methods are the simplest and most common ways to create document embeddings. Each of these produces a matrix with dimensions m × n, where m is the total number of documents in the corpus being analyzed and n is the total number of words in the vocabulary. Every entry in the matrix represents how a word in the vocabulary is related to each document. A weight is assigned to each entry, often according to one of three common weighting styles. The first of these is a simple binary weighting where a 1 is assigned to all terms that appear in the corresponding document and a 0 is assigned to all other terms. The frequency of each term appearing in a specific document has no bearing on the weight, so a term that appears once in the text has the same weight as a term that appears 100,000 times. The second method, the term frequency weighting method, accounts for the frequent or infrequent use of a term throughout a text. Term frequency counts the occurrences of each term in a document and uses that count as the assigned weight for the corresponding entry of the document-term matrix. The last weighting style is term frequency-inverse document frequency (TF-IDF). This weighting scheme is defined as

$$\textit{tf-idf}_{t,d} = \textit{tf}_{t,d} \times \log\frac{N+1}{\textit{df}_t}, \qquad (9.1)$$
where $\textit{tf-idf}_{t,d}$ is the weight value for a term t in a document d, $\textit{tf}_{t,d}$ is the number of occurrences of term t in document d, N is the total number of documents in the collection, and $\textit{df}_t$ is the total number of documents in which the term t occurred. The TF-IDF weight for each term can be normalized by dividing each term by the Euclidean norm of all TF-IDF weights for the corresponding document. This normalization ensures that all entries of the document-term matrix are between zero and one. The TF-IDF weighting scheme weights terms that appear many times in a few documents higher than terms that appear many times in many documents, allowing for more descriptive terms to have larger weights. Document-term methods for word and document embeddings are simple and effective, but they can occasionally create issues with the curse of dimensionality [75] since they project language into spaces that may reach thousands of dimensions. Not only do such large embeddings cause classification issues, but they can cause memory issues because the document-term matrix might have tens or hundreds of thousands of entries for even a small corpus and several million for a larger corpus. Since large document-term embeddings can lead to issues downstream with uses of document and word vectors, it is beneficial to have embeddings in lower dimensions. This is why methods like word2vec and doc2vec, developed by Le and Mikolov [76, 77], are so powerful. These methods work by training a fully connected neural network to predict context words based on an input word or document. The weight matrices created in the training stage contain a vector for each term and document in the training set. Each of these vectors is close to semantically similar terms and documents in the embedding space. The greatest benefits of using neural network embedding methods are that we can control the size of the embedding vectors, and that the vectors are not sparse. This allows vectors to be passed to downstream analysis methods without encountering the issues of large dimension size and sparsity that commonly arise in document-term methods. However, they require significantly more data to train a model. While we can make document-term embeddings from only a few documents, it takes thousands or perhaps millions of documents and terms to accurately train these embeddings. Furthermore, the individual dimensions are arbitrary, unlike document-term embeddings where each dimension represents a specific term. Where sufficient training data does exist, doc2vec has proven to be adept at topic modeling [78–81].
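As a minimal sketch of the document-term construction described above, the snippet below builds a TF-IDF matrix with scikit-learn over a made-up mini-corpus; note that scikit-learn's default smoothed IDF differs slightly from Eq. (9.1), and get_feature_names_out assumes a recent scikit-learn release:

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical mini-corpus of tweets
docs = ["goal for mexico against germany germex",
        "germany concedes a goal to mexico germex",
        "messi misses a penalty for argentina argisl"]

# Rows are the m documents, columns the n vocabulary terms; L2 normalization
# keeps every entry of the document-term matrix between zero and one.
vectorizer = TfidfVectorizer(norm="l2")
A = vectorizer.fit_transform(docs)   # sparse m x n TF-IDF document-term matrix
print(A.shape, vectorizer.get_feature_names_out())
```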
9.6.2 Topic Models Clustering is the generic term used in machine learning for grouping objects based on similarities. Soft clustering methods allow objects to belong to multiple different clusters and have weights associated with each cluster, while hard clustering assigns objects to a single group. Matrix decomposition algorithms are popular soft clustering methods because they run quickly and produce interpretable results. Matrix methods for topic modeling are used for document-term embedding methods, but they are useless for more sophisticated embedding methods like doc2vec because
Fig. 9.14 Decomposition of a document-term matrix, A, into a document-topic matrix, W, and a topic-term matrix, H
the dimensions of these embeddings are arbitrary. However, there has been some work on implementing non-negative matrix factorization (NMF) on word co-occurrence matrices to create word and document embeddings like those generated by word2vec and doc2vec [80, 82]. Non-negative matrix factorization (NMF) was first proposed as a topic modeling technique by Xu et al. [83]. NMF is a matrix manipulation method that decomposes an m × n matrix A into two matrices

$$A = WH, \qquad (9.2)$$
where W is an m × k matrix, H is a k × n matrix, and k is a predefined parameter that represents the number of topics. Decomposing a large matrix into two smaller matrices (Fig. 9.14) is useful for analyzing text from a large corpus of documents to find underlying trends and similarities. The matrix A is a document-term matrix, where each row represents one of the m documents in the corpus and each column represents one of the n terms. The decomposition of A is calculated by minimizing the function

$$L = \frac{1}{2}\,\lVert A - WH \rVert, \qquad (9.3)$$
where both W and H are constrained to non-negative entries [83] and L is the objective function to be minimized. The matrix norm can be any p-norm, but is usually either the 1 or 2 norm or a combination of the two. In topic modeling applications, k is the number of topics to be extracted from the corpus of m documents. Thus, the matrix W is referred to as the document-topic matrix. Each entry in the document-topic matrix is the weight of a particular topic in a certain document, where a higher weight indicates the prevalence of that topic throughout the corresponding document. The matrix H is the topic-term matrix. Weights in H represent the relevance of each term to a topic.
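A minimal scikit-learn sketch of this decomposition, reusing the hypothetical A and vectorizer from the earlier TF-IDF snippet:

```python
from sklearn.decomposition import NMF

k = 2   # number of topics; the World Cup analysis below uses k = 17
model = NMF(n_components=k, init="nndsvd", random_state=0)
W = model.fit_transform(A)      # m x k document-topic matrix
H = model.components_           # k x n topic-term matrix

# The top terms of each topic are the largest entries in that row of H.
terms = vectorizer.get_feature_names_out()
for i, row in enumerate(H):
    print(f"Topic {i}:", " ".join(terms[row.argsort()[::-1][:10]]))
```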
A popular extension of NMF is as a dynamic topic model (DTM), which tracks topics through time, updating the weights of terms in topics and adding new topics when necessary [84]. One of the original DTMs was actually derived for latent Dirichlet allocation (LDA) [85]. Several popular DTM extensions of NMF include [86–88]. Principal component analysis (PCA) is a common method for dimension reduction. PCA uses eigenvectors to find the directions of greatest variance in the data and uses those as features in a lower dimensional space. In topic modeling, we can use these eigenvectors as topics inherent in the dataset. PCA operates on the TF-IDF document-term matrix A. We calculate the covariance matrix of A by pre-multiplying it by its transpose

$$C = \frac{1}{n-1} A^{T} A, \qquad (9.4)$$
which gives a symmetric covariance matrix with dimensions of the total vocabulary, with the $\frac{1}{n-1}$ term acting as a normalizer. The covariance matrix shows how often each word co-occurs with every other word in the vocabulary. We can then diagonalize it to find the eigenvectors and eigenvalues

$$C = E D E^{-1}, \qquad (9.5)$$
where the matrix D is a diagonal matrix with eigenvalues as the entries and E is the eigenvector matrix. The eigenvectors have shape 1 × V, where V is the total number of terms in the vocabulary. We select the eigenvectors that correspond to the largest eigenvalues as our principal components and eliminate all others. These eigenvectors are used as the topics and the original document-term matrix can now be recast into a lower dimensional document-topic space. The terms defining a topic are found by selecting the largest entries in each eigenvector. Singular value decomposition (SVD) uses the same process as PCA, but it does not rely on the covariance matrix. In the context of topic modeling SVD is also referred to as latent semantic indexing (LSI) [89], which decomposes matrix A into

$$A = U \Sigma V^{T}, \qquad (9.6)$$
where Σ is the diagonal matrix with singular values along its entries. The matrix U represents the document-topic matrix and the matrix V is the topic-term matrix. The largest values of Σ represent the most common topics occurring in the original corpus. We select the largest values and the corresponding vectors from the topic-term matrix as the topics. Latent Dirichlet allocation (LDA) is a probabilistic model which assumes that documents can be represented as random mixtures over latent topics, where each topic is a distribution of vocabulary terms [90]. LDA is described as a generative process which draws a multinomial topic distribution $\theta_d \sim \mathrm{Dir}(\alpha)$, where α is a corpus-level scaling parameter, for each document, d. Then for each word $w_n \in \{w_1, w_2, \ldots, w_N\}$,
where N is the total number of terms in d, we draw one of k topics, $z_{d,n}$, from $\theta_d$ and choose a word $w_{d,n}$ from $p(w_n \mid z_n, \beta)$. This probability is a multinomial distribution of length V, where V is the number of terms in the total vocabulary. In this algorithm α and β are corpus-level parameters that are fit through training. The parameter α is a scaling parameter and β is a k × V matrix representing the word probabilities in each topic. β can be compared to the topic-term matrix observed in prior methods. The k-means algorithm seeks to partition an n-dimensional dataset into a predetermined number of clusters, where the variance of each cluster is minimized. The data are initially clustered randomly, and the mean of each cluster is calculated. These means, called the centroids and denoted $\mu_j$, are used to calculate the variance in successive iterations of clusters. After the centroids have been calculated, the distance between each element of the dataset and each centroid is computed, and samples are re-assigned to the cluster with the closest centroid. New centroids are then calculated for the updated clusters. This process continues until the variance for each cluster reaches a minimum, and the iterative process has converged. Minimization of the within-cluster sum-of-squares

$$\sum_{i=1}^{n} \min_{\mu_j \in C} \left( \lVert x_i - \mu_j \rVert^2 \right)$$

is the ultimate goal of the k-means algorithm [91]. Here, a set of n datapoints $x_1, x_2, \ldots, x_n$ is divided into k clusters, each with a centroid $\mu_j$. The set of all centroids is denoted C. Unfortunately, clustering becomes difficult since the concept of distance in very high-dimensional spaces is poorly defined. However, k-means clustering often remains usable because it evaluates the variance of a cluster rather than computing the distance between all points in a cluster. It is generally useful to reduce the dimension using SVD or PCA before applying clustering to document-term embeddings, or to train a doc2vec model and cluster the resulting lower dimension embeddings. Evaluation of our topic models requires a way to quantify the coherence of each topic generated. Mimno et al. suggest that there are four varieties of “bad” topics that may arise in topic modeling [92]:
1. Chained: Chained topics occur when two distinct concepts arise in the same topic because they share a common word. For example, in a corpus of documents containing texts about river banks and financial banks, a model might generate a topic containing the terms “river”, “financial”, and “bank”. Although “river bank” and “financial bank” are two very different concepts, they still appear in the same topic due to the shared word “bank”.
2. Intruded: Intruded topics contain sets of distinct topic terms that are not related to other sets within that topic.
3. Random: Random topics are sets of terms that make little sense when strung together.
4. Unbalanced: Unbalanced topics have top terms that are logically connected, but also contain a mix of both general terms and very specific terms within the topic.
We want to be able to locate “bad” topics automatically, as opposed to manually sifting through and assigning scores to all topics. To accomplish this, Mimno et al. propose a coherence score

$$C(t, V^{(t)}) = \sum_{m=2}^{M} \sum_{l=1}^{m-1} \log \frac{D(v_m^{(t)}, v_l^{(t)}) + 1}{D(v_l^{(t)})}, \qquad (9.7)$$

where t is the topic, $V^{(t)} = [v_1^{(t)} \cdots v_M^{(t)}]$ is the list of the top M terms in topic t, $D(v_l^{(t)})$ is the document frequency of term $v_l^{(t)}$, and $D(v_m^{(t)}, v_l^{(t)})$ is the co-document frequency of the two terms $v_m^{(t)}$ and $v_l^{(t)}$ [92]. The document frequency is a count of the number of documents in the corpus in which a term appears, and the co-document frequency is the count of documents in which two different terms appear. A 1 is added in the numerator of the log function to avoid the possibility of taking the log of zero. For highly coherent topics, the value of C will be close to zero while for incoherent topics this value will become increasingly negative. The only way for C to be positive is for the terms $v_m^{(t)}$ to only appear in documents that also contain $v_l^{(t)}$.
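A minimal sketch of Eq. (9.7) in Python follows; the function name and the binary document-term input are assumptions, and it presumes every top term appears in at least one document:

```python
import numpy as np

def coherence(top_term_idx, doc_term_binary):
    """Mimno et al. coherence score, Eq. (9.7).

    top_term_idx:    column indices of the top M terms of one topic
    doc_term_binary: (documents x terms) 0/1 matrix of term occurrence
    """
    score = 0.0
    for m in range(1, len(top_term_idx)):
        for l in range(m):
            co_df = np.sum(doc_term_binary[:, top_term_idx[m]]
                           & doc_term_binary[:, top_term_idx[l]])  # co-document frequency
            df = np.sum(doc_term_binary[:, top_term_idx[l]])       # document frequency
            score += np.log((co_df + 1) / df)
    return score  # near zero for coherent topics, increasingly negative otherwise
```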
9.6.3 Uncovering Topics in Tweets As an example showing the application of topic modeling to real Twitter data, we have scraped a set of tweets from the 2018 World Cup in Russia and extracted the topics of conversation. Soccer is the most popular sport worldwide, so the World Cup naturally creates tweet storms whenever matches occur. Beyond a general interest in the event, national teams and television coverage push conversation on social media by promoting and posting various hashtags during coverage. Two of the common World Cup hashtags were #WorldCup and #Russia2018. Individual games even had their own hashtags such as #poresp for a game between Portugal and Spain or the individual hashtags for the teams #por #esp. Along with the continued swell of general online activity, the rise of 5G wireless technology will facilitate online chatter at stadiums, arenas, and similar venues. KT Corp., a South Korean telecom carrier, was the first to debut 5G technology commercially at the 2018 Winter Olympics. 5G wireless will allow for increased communication at speeds reaching up to 10 gigabits per second, even in areas with congested networks [93]. Furthermore, 5G will allow for better device-to-device communication. For text analysis, this means that more people will have the ability to post thoughts, pictures, videos, etc. during events and might even be able to communicate with event officials through platforms like Twitter to render a more interactive experience at events like the World Cup. This will create a greater influx
of text data which event coordinators can use to analyze their performance and pinpoint areas that need improvement. Twitter can be accessed through the API with a Twitter account, and data can be obtained using the Tweepy module. Registering for API access provides the consumer key, consumer secret, access token, and access token secret, the credentials necessary to retrieve Twitter data. Once access to the Twitter API has been established, there are two ways to gather tweets: one can either stream them in real time, or pull them from the past. To stream tweets in real time we use the tweepy.Stream method. To pull older tweets we use the tweepy.Cursor method, which also provides an option to search for specific terms throughout Twitter using the tweepy.API().search function and defining query terms in the argument using q=‘query words’. Further information about streaming, cursoring, and other related functions can be found in the Tweepy documentation [94]. We pulled tweets from the group stage of the 2018 World Cup by searching for posts containing the hashtag #WorldCup. These tweets were published between June 19 and 25, 2018. Our dataset is composed of about 16,000 tweets—a small number relative to the total volume of Twitter, but a sufficient amount for this proof-of-concept demonstration.
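A minimal sketch of that pull, assuming the Tweepy 3.x interface described above (placeholder credentials; api.search was renamed in later Tweepy releases):

```python
import tweepy

# Placeholder credentials from the Twitter developer portal
auth = tweepy.OAuthHandler("CONSUMER_KEY", "CONSUMER_SECRET")
auth.set_access_token("ACCESS_TOKEN", "ACCESS_TOKEN_SECRET")
api = tweepy.API(auth, wait_on_rate_limit=True)

# Cursor through past tweets matching the query term
tweets = [status.full_text
          for status in tweepy.Cursor(api.search, q="#WorldCup",
                                      tweet_mode="extended").items(16000)]
```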
9.6.4 Analyzing a Tweetstorm We have found that NMF produces better results for our purposes than the other topic modeling methods we have explored, so the remainder of this chapter relies on the output of scikit-learn’s NMF implementation [91]. However, the same analysis could be carried out on any topic model output. Our goal is to obtain the most coherent set of topics possible by varying the number of extracted topics, k. Testing k values ranging from 10 to 25, we find that the most coherent topics are produced when k = 17. The topics are as follows:
• Topic 0: russia2018 worldcup, coherence score: −1.099
• Topic 1: memberships escort discounts google website number news great lasele bitcoin, coherence score: −1.265
• Topic 2: 2018 russia, coherence score: −1.386
• Topic 3: world cup russia win, coherence score: −0.510
• Topic 4: mex (Mexico) ger (Germany) germex mexicovsalemania, coherence score: −1.449
• Topic 5: arg (Argentina) isl (Iceland) argisl 11, coherence score: −0.869
• Topic 6: nigeria croatia cronga nga (Nigeria) cro (Croatia) supereagles win match soarsupereagles 20, coherence score: −0.998
• Topic 7: worldcup2018 worldcuprussia2018 rusya2018 rusia2018, coherence score: −0.992
• Topic 8: mexico germany germex game, coherence score: −1.610
• Topic 9: bra (Brazil) brazil brasui switzerland brasil neymar 11 copa2018 coutinho, coherence score: −1.622
• Topic 10: fifa football, coherence score: −1.386
• Topic 11: vs serbia, coherence score: −1.099
• Topic 12: argentina iceland 11, coherence score: −0.903
• Topic 13: england tunisia eng (England) threelions football tun (Tunisia) engtun tuneng go tonight, coherence score: −1.485
• Topic 14: fraaus france australia 21 match pogba, coherence score: −1.196
• Topic 15: fifaworldcup2018 fifaworldcup soccer russia2018worldcup fifa18, coherence score: −0.439
• Topic 16: messi ronaldo penalty argisl lionel miss, coherence score: −1.009
Overall, these topics are intelligible and there are no obvious chained, intruded, unbalanced, or random topics. Even the least coherent topic, topic 9, still makes good sense, with references to Brazil and Switzerland during their match; Neymar and Coutinho are both star Brazilian players. There are a few topics that overlap, such as topics 4 and 8, which both discuss the Germany–Mexico match, and topics 5, 12, and 16, which all discuss the Argentina–Iceland match. Since these particular matches were highly anticipated, it is not surprising that they prompted enough Twitter activity to produce several topics. Tweets are significantly shorter than other documents we might run topic modeling on, such as news articles, so there are far fewer term co-occurrences. In our dataset, a portion of the tweets only reference a single occurrence in a match. As a result, some of our extracted topics are extremely specific. For example, topic 16 seems to be primarily about Lionel Messi missing a penalty kick, while topic 5 is about the Argentina–Iceland game in general. One distinction between topics 5 and 12 seems to be the simple difference in how the game is referenced—the tweets represented by topic 5 use the hashtag shortenings “arg” and “isl” for Argentina and Iceland, while the tweets generating topic 12 use the full names of the countries. A similar difference is seen in topics 4 and 8 referencing the Germany–Mexico match. Topics 6, 9, 13, and 14 all also reference specific games that occurred during the time of data collection. Beyond individual games we also see topics that reference the hashtags that were used during the event and how these hashtags overlap in the dataset. Specifically, topics 0, 7, and 15 all show different groupings of hashtags. Topic 0 is by far the most popular of these hashtags, with over 9000 tweets referencing it, while topics 7 and 15 each have fewer than 2000 tweets. This makes sense because we searched by #WorldCup, so most tweets should have a reference to it by default.5 These were also the two hashtags that were being promoted by the traditional media companies covering the event. Although topics extracted from such a dataset provide insight, they only tell a fraction of the story. With enough tweets, we can use time-series representations to
5 It should be noted here that by definition all tweets have the hashtag #WorldCup because that was our query term. However, many tweets that push the character limit are truncated through the API. If #WorldCup appears in the truncated part of the tweet, it does not show up in the information we pull from the API.
Fig. 9.15 Time-series representation of the World Cup dataset. Each point in the time series represents the total number of tweets occurring within that given five-minute window. All times are in UTC
gain a deeper understanding of the topic and a way to measure a topic’s relevance by allowing us to identify various trending behaviors. Did a topic build slowly to a peak the way a grassroots movement might? Did it spike sharply, indicating a collective reaction to a singular event? These questions can be answered with time-series representations. To create a time series from topics, we use the topic-document matrix, W, and the publish time of each tweet. In our case the tweets in W are in chronological order. Figure 9.15 shows the time-series representation of the full dataset, f. Each data point in this figure represents an integer number of tweets that were collected during a five-minute window. We first iterate through the rows of W, collecting all the tweets that correspond to one single time window, and count the number of nonzero entries for each topic. Next, W is normalized across topics so that each row represents a distribution across topics. We then convert W to a binary matrix, $W_{bin}$, by setting every entry in W greater than 0.1 to 1, and sum across each column to get a count for each topic

$$G(t, k) = \sum_{i=i_0}^{i_0 + \mathbf{f}(t)} W_{bin}(i, k), \qquad (9.8)$$

where G is a T × k matrix representing the time series for each of the discovered topics, t = 1 ... T, T is the total time of the time series f, and $i_0$ is calculated as

$$i_0 = \sum_{j=0}^{t-1} \mathbf{f}(j). \qquad (9.9)$$
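A minimal sketch of this construction (the function name and the guard against all-zero rows are assumptions):

```python
import numpy as np

def topic_time_series(W, f):
    """Build the T x k matrix G of Eqs. (9.8)-(9.9) from the document-topic
    matrix W (tweets in chronological order) and per-window tweet counts f."""
    W = W / np.maximum(W.sum(axis=1, keepdims=True), 1e-12)  # rows -> topic distributions
    W_bin = (W > 0.1).astype(int)                            # binarize at the 0.1 threshold
    G = np.zeros((len(f), W.shape[1]), dtype=int)
    i0 = 0
    for t, n in enumerate(f):        # i0 accumulates f(j) for j < t, Eq. (9.9)
        G[t] = W_bin[i0:i0 + n].sum(axis=0)                  # Eq. (9.8)
        i0 += n
    return G
```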
The result is a matrix G, where each column gives the time-series representation of a topic. Two examples of topic progression over time can be seen in Fig. 9.16. This figure shows the time-series plots of World Cup topics 4 and 5, which discuss the Germany–Mexico match and the Argentina–Iceland match, respectively. Topic 4 spikes shortly after the match started at 15:00 UTC, with additional peaks at 16:50
Fig. 9.16 Time-series representation for topics 4 and 5. Topic 4 is about the Germany–Mexico match, which occurred on June 17, and topic 5 is about the Argentina–Iceland match, which occurred on June 16
and 15:35. The latter two times correspond to events that occurred during the game: at 15:35 Mexico scored the first goal to take a 1-0 lead, and at 16:50 the game ended, signifying Mexico’s defeat of the reigning World Cup champions. Topic 5 focuses on the Argentina–Iceland game which took place on June 16 at 13:00 UTC. The highest peak occurs at the conclusion of the game, while another peak occurs at the start. This was such a popular game because although Iceland was a major underdog, they were able to play the star-studded Argentinian team to a tie. Two other peaks occur at 13:20 and 13:25, the times when the two teams scored their only goals in the 1-1 tie. Although we are able to identify important characteristics of each topic using this sort of standard time-series representation, we would like to be able to analyze each topic’s behavior at a deeper level. Fourier analysis does not give behavior localized in time and spectrograms are almost never the most efficient representations of localized frequency content. Thus we look to wavelet analysis, which converts a one-dimensional time series into two dimensions with coefficients representing both wavelet scale and time. Wavelet scale is the wavelet analog to frequency, where high wavelet scales represent low-frequency behavior and low wavelet scales represent high-frequency behavior [95]. The dynamic wavelet fingerprint is a technique developed [96] to analyze ultrasonic waveforms, though it has been found to be an effective tool for a wide array of problems [97–105]. It is also effective for characterizing behaviors exhibited by time series extracted from topic modeling millions of tweets [74]. The key to this method, and why it has been so broadly applicable, is that it has the ability to identify behavior that is buried deeply in noise in a way that traditional Fourier analysis is unable to. By casting the time series into higher dimensions and isolating behavior in time we have shown that we can identify common behavior across diverse topics from different tweet storms, even if this behavior was not obvious from looking at their time series. The DWFP casts one-dimensional time-series data into a two-dimensional black and white time-scale image. Every mother wavelet creates a unique fingerprint, which
can highlight different features in the time series [95]. The DWFP is performed on a time series w(t), where t = 1, ..., T, by calculating a continuous wavelet transform

$$C(a, b) = \int_{-\infty}^{+\infty} w(t)\, \psi_{a,b}(t)\, dt, \qquad (9.10)$$
where a and b represent the wavelet scale and time shift of the mother wavelet $\psi_{a,b}$, respectively. The time shift value, b, represents the point in time the wavelet begins and allows wavelet analysis to be localized in both time and frequency similar to a spectrogram [106]. The resulting ns × T coefficient matrix from (9.10), C(a, b), represents a three-dimensional surface with a coefficient for each wavelet $\psi_{a,b}$, where ns is the total number of wavelet scales. This matrix is then normalized between −1 and 1,

$$C_{norm}(a, b) = C(a, b) / \max(|C(a, b)|). \qquad (9.11)$$

With the normalized surface represented by $C_{norm}(a, b)$, a thick contour slice operation is performed to create a binary image I(a, b) such that

$$I(a, b) = \begin{cases} 1, & s - \frac{r_t}{2} \leq C_{norm}(a, b) \leq s + \frac{r_t}{2} \\ 0, & \text{otherwise} \end{cases}, \qquad (9.12)$$
where s represents the center of each of the S total thick contour slices, $s = \pm\frac{1}{S}, \pm\frac{2}{S}, \ldots, \pm\frac{S}{S}$, and $r_t$ represents the total width of each slice. This gives a binary matrix, I(a, b), which we call a fingerprint. Figure 9.17 illustrates the steps performed to create a wavelet fingerprint from the topic 4 time series. Shown at the top is the raw time series for topic 4. Noise is removed from the time series using a low-pass filter, resulting in the middle image of Fig. 9.17. Once the noise is removed from a time series it can be processed by the DWFP algorithm, which generates the fingerprint shown at the bottom of Fig. 9.17. It should be noted that the shading present in Fig. 9.17, as well as in other fingerprints in this chapter, is simply for readability. Generally when analyzing fingerprints just the binary image is used. The alternating shades show different objects in a fingerprint. A single object represents either a peak or a valley in the $C_{norm}$ matrix, where peaks refer to areas where all entries are positive and valleys are areas where all entries are negative. One of the main advantages of using the DWFP for time-series analysis is the ability to create variations in fingerprint representations with different mother wavelets. This is depicted in Fig. 9.18, which shows three different DWFP transformations of the filtered time series in Fig. 9.17. All parameters remain constant—only the mother wavelet is changed. The top image uses the Mexican Hat wavelet, the same as in Fig. 9.17, the middle fingerprint was created using the 4th Daubechies wavelet, and the bottom image was created using the 6th Gaussian wavelet. From these representations we can extract features such as ridge count, filled area, best fit ellipses, etc. Detailed explanations of these features can be found in [107].
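A minimal sketch of Eqs. (9.10)–(9.12) using the PyWavelets library; the function name is an assumption, and the defaults echo the parameters quoted in the figure captions:

```python
import numpy as np
import pywt

def wavelet_fingerprint(w, wavelet="mexh", n_scales=50, n_slices=5, rt=0.12):
    """Binary DWFP image I(a, b) of a 1-D time series w(t)."""
    scales = np.arange(1, n_scales + 1)
    C, _ = pywt.cwt(w, scales, wavelet)          # C(a, b), Eq. (9.10)
    C_norm = C / np.max(np.abs(C))               # Eq. (9.11)
    I = np.zeros_like(C_norm, dtype=np.uint8)
    centers = np.arange(1, n_slices + 1) / n_slices
    for s in np.concatenate((-centers, centers)):   # s = +/- 1/S, ..., +/- S/S
        I |= ((C_norm >= s - rt / 2) & (C_norm <= s + rt / 2)).astype(np.uint8)
    return I                                     # thick contour slices, Eq. (9.12)
```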
Fig. 9.17 Illustration of the process to create the DWFP of topic 4. Beginning with the raw waveform (top) we run a low-pass filter to filter out the noise. The filtered waveform (middle) is then passed through the DWFP process to create the fingerprint (bottom). The resulting fingerprint gives a two-dimensional representation of the wavelet transform. In the above example we used the Mexican Hat wavelet, with $r_t$ = 0.12 and S = 5. The different shades show the different objects within the fingerprint
Each different representation gives a unique description of the behavior of the signal and can be used to identify different characteristics. For example, the middle fingerprint in Fig. 9.18 has more activity in low scale wavelets, while the Mexican Hat DWFP shows more low scale information from the signal and a much wider fingerprint. Individual objects in a fingerprint tell a story about the temporal behavior of a signal over a short time. Each object in a fingerprint describes behavior in the time series in a manner that is not possible through traditional Fourier analysis. In [74] we analyzed seven different tweet storms of various volumes and contexts. Some
Fig. 9.18 Example of the ability to change representations of the same time series by simply changing the mother wavelet. Represented are three DWFP transforms of the filtered time series in Fig. 9.17. The top is the same Mexican Hat representation shown in Fig. 9.17. The middle representation was created using the 4th Daubechies wavelet. The bottom was created using the 6th Gaussian wavelet
were political tweet storms, while some were focused on major sporting events including this World Cup dataset. Through this analysis, we identified 11 distinct objects that described behavior that differentiated different types of topics. We called these objects characteristic storm cells. Figure 9.19 shows the fingerprints from topics 5 (top), 6 (middle), and 9 (bottom). In each of these topics, as well as the fingerprint for topic 4 at the bottom of Fig. 9.17, the far left object resembles storm cell 8 as found in [74]. We found this storm cell was descriptive of topics that build up to a peak and maintain that peak for some time. Each of the topics representing individual games shows this slow build up. Though there are occasionally quick spikes for goals scored, these spikes will be registered
Fig. 9.19 Fingerprints for topics 5 (top), 6 (middle), and 9 (bottom). Each topic represents a different match that occurred during data collection and all three show relatively similar behavior. Different shades in each fingerprint represent different objects. Objects give a description of the temporal behavior of a signal over a short time
in the second object, which describes the behavior during the match. These second objects represent various storm cells, such as storm cell 7, which describes behavior after the initial increase in volume. Many of the match topics appear similar in the time-series representation, so it is no surprise that they look similar in the DWFP representation. However, for some of the other topics, while their time-series representations look quite different, we can show that the behavior is actually quite similar. Figure 9.20 shows two different topics with their time series and corresponding fingerprints. The top two images are for topic 0, which represents tweets that had strong correlation to the hashtags #Russia2018 and #WorldCup, while the bottom two images are for topic 1,
Fig. 9.20 Time series and DWFP representations for topics 0 (top two) and 1 (bottom two). Even though these two topics emit very different time signatures, we can still identify common patterns in their behavior using the DWFP. Specifically, the objects indicated by the red arrows look quite similar to the characteristic storm cell 0 as found in [74]
which represents mostly ads and spam that ran along with the World Cup coverage. The time series for topic 0 mimics the time series for the dataset as a whole, Fig. 9.15, with a rhythmic pattern over the three days of data. This should be expected because most tweets in the dataset referenced #WorldCup. However, topic 1 does not have this same shape. It exhibits two quick spikes around the same times topic 0 has its local maxima, showing the advertisers targeted their content for when traffic was at its heaviest, but there is not the same rhythmic pattern over the three days as shown by topic 0. While these time series look quite different in the time domain, they share some key characteristics in the DWFP domain, most notably an object sitting between the two local maxima, marked by the red arrow in both fingerprints, that resembles storm cell 0 in [74]. This shows the underlying behavior of the advertisers, who tailor their ads to World Cup fans; they try to run their ads when more users are online to maximize exposure while minimizing the overall costs of running the ads. As teams and networks continue to push conversation and coverage to social media over time, the amount of data on Twitter will continue to grow. For instance, during the 2018 Olympics in South Korea, NBC began streaming live events online and providing real-time coverage via Twitter [108], and several NFL games and Wimbledon matches were also aired on Twitter in 2016 [109]. The application of topic modeling and DWFP analysis to social media could be far more powerful than what was illustrated above. Disinformation campaigns have been launched in at least 28 different countries [110] using networks of automated accounts to target vulnerable audiences with their fake news stories disguised as traditional news [111–114]. Content moderation at scale is an intractable problem in a free society, so analyzing characteristics of individual stories posted to social media is not the answer. Instead we need to look for the underlying behavior of the actors operating these campaigns. We can use the DWFP to uncover this behavior by identifying key temporal signals buried deeply in the noise of social media. Networks of automated accounts targeting specific audiences will necessarily emit different time signatures than would a network of human users discussing some topic, as illustrated by ads timed to coincide with high World Cup tweet volume. This can be exploited by identifying the fingerprint pattern that best identifies this time signature. Topics giving this time signature can then be flagged so that preventative action can be taken to slow the dissemination of that disinformation. Paired with spatial methods such as retweet cascades [115] and network analysis [116, 117], this can differentiate bot-driven topics and disinformation campaigns from all other normal topics on social media. Part of the inherent advantage of the DWFP representation is that it can highlight signal features of interest that are buried deeply in noise.
9.7 DWFP for Account Analysis Much of the spread of inflammatory content and disinformation on social media is pushed by bots, many of whom operate as part of disinformation campaigns [118–120]. Thus it is advantageous to be able to identify if an individual account is a bot. Bot accounts are adept at disguising themselves as humans, so they can be exceedingly difficult to detect [121]. Many bot accounts on social media platforms are harmless and upfront about their nature. Examples of these include news aggregators and customer service bots. Of greater concern to the public are accounts that do not identify themselves as bots, but instead pretend to be real human accounts in order to fool other users. These types of bots have been identified as a large component in spreading misinformation on social media [122, 123]. They were instrumental in the coordinated effort by the Russian operated Internet Research Agency to sow discord into the United States’ voter base in order to attempt to influence the 2016 US Presidential Election [124–126]. Rising concerns about malicious bot accounts pushing misinformation and misconstruing public sentiment have led to the need to develop robust algorithms that can identify bots operating in near real time [127]. We can again use Tweepy to extract metadata from individual Twitter users [94]. Metadata related to an account include total number of followers/following, total number of tweets, creation date, bio, etc. Metadata is commonly used to detect bots [128–131]. One of the most popular currently available bot detection algorithms, Botometer—originally BotOrNot—uses over 1,000 features extracted from an account’s metadata and a random forest classifier to return the likelihood an account is a bot [132, 133]. However, training a bot detection system purely on metadata will not create an algorithm that is robust through time. Bot creators are constantly evolving their methods to subvert detection [134]. Thus, any bot detection method needs to be able to look beyond surface level metadata and analyze behavioral characteristics of both bots and their networks, as these will be more difficult to manipulate [135, 136]. We use the wavelet fingerprint to analyze the temporal posting behavior of an account to identify inorganic behaviors. Recently we [74] illustrated the basic method analyzing a corpus of tweets curated from 7 distinct tweet storms that occurred between 2015 and 2018 that we deemed to have cultural significance. The Twitter API was used to stream tweets as events unfolded for the Brett Kavanaugh confirmation hearings, Michael Cohen congressional testimony, President Trump’s summit with North Korea’s Kim Jong Un, the release of the Mueller Report on Russian meddling in the 2016 presidential election, and the 2018 World Cup. Open datasets were found for the 2018 Winter Olympics, 2017 Unite the Right rally in Charlottesville VA, and the 2015 riots in Baltimore over the killing of Freddie Gray. In all we collected and analyzed 28,285,124 tweets. The key to identifying twitterbots is sophisticated analysis of the timing of tweets. Pre-programmed posts will not have the organic signatures characteristic of humans posting thoughts and re-tweeting mots. The wavelet fingerprint method can be used to analyze tweetstorms and automatically classify different types of tweetstorms. We first use topic modeling to identify the things that are being tweeted about, and
then form time series of the topics of interest. We then perform our particular type of wavelet transform on the time series to return the black-and-white time-scale images. We then extract features from these wavelet fingerprints to use for machine learning in order to classify categories of tweetstorms. Using the time series of each topic we run wavelet fingerprint analyses to get a two-dimensional, time-scale, binary image. Gaussian mixture model (GMM) clustering is used to identify individual objects, or storm cells, that are characteristic of specific local behaviors commonly occurring in topics. The wavelet fingerprint transformation is volume agnostic, meaning we can compare tweet storms of different intensities. We find that we can identify behavior, localized in time, that is characteristic of how different topics propagate through Twitter. Organic tweetstorms are driven by human emotions, e.g., outrage or team spirit, and many people are often simultaneously expressing those same, or perhaps the contra-lateral, emotions via tweets. It’s the characteristic fingerprints of those collective human emotions that bots can’t accurately mimic, because we’re pulling out the topics from the specific groups of words being used. We have found, for example, that fans tweeting about World Cup matches are distinguishable from fans tweeting about Olympic events, presumably because football fans care much more deeply and may be tweeting from a pub. With enough tweets, we can use time-series representations to gain a deeper understanding of the topic and a way to measure a topic’s cultural relevance by allowing us to identify various trending behaviors. Did a topic build slowly to a peak the way a grassroots movement might? Did it spike sharply, indicating a collective reaction to a singular event? These questions can be answered with time-series representations. Although we are often able to identify important characteristics of each topic using this sort of standard time-series representation, we need to be able to analyze each topic’s behavior at a deeper level. Fourier analysis does not give behavior localized in time and spectrograms are almost never the most efficient representations of localized frequency content. Thus we look to wavelet analysis, which converts a one-dimensional time series into two dimensions with coefficients representing both wavelet scale and time. Wavelet scale is the wavelet analog to frequency, where high wavelet scales represent low-frequency behavior and low wavelet scales represent high-frequency behavior. Each mother wavelet creates a unique fingerprint, which can highlight different features in the time series. One of the main advantages of using the wavelet fingerprint for time-series analysis is the ability to create variations in fingerprint representations with different mother wavelets. As an illustration of how we can use the DWFP to identify inorganic behavior emanating from a single account, we analyze the account trvestuff (@trvestuff). At 7:45 am Eastern time on 22 Jan 20, one of our curated sock puppets (@IBNowitall), with 8 known human followers, tweeted the following: “Al Gore tried to warn us all about adverse climate change. He had a powerpoint and everything. Now even South Park admits that manbearpig is real. It’s time to get ready for commerce in the Arctic. I’m super serial.” It was retweeted almost immediately by a presumed bot programmed to watch for tweets about climate change.
As an illustration of how we can use the DWFP to identify inorganic behavior emanating from a single account, we analyze the account trvestuff (@trvestuff). At 7:45 am Eastern time on 22 Jan 20, one of our curated sock puppets (@IBNowitall), with 8 known human followers, tweeted the following: "Al Gore tried to warn us all about adverse climate change. He had a powerpoint and everything. Now even South Park admits that manbearpig is real. It's time to get ready for commerce in the Arctic. I'm super serial." It was retweeted almost immediately by a presumed bot programmed to watch for tweets about climate change. The bot's profile pic is clearly a teenager purporting to be in Tacoma, WA, where it would have been rather early.
Fig. 9.21 Time-series analysis of the posting behavior of the account trvestuff (@trvestuff). The top plot shows the time series of the raw number of posts every 15 min for two weeks. The middle plot shows the wavelet fingerprint of this time series. The wavelet fingerprint was created using the Gauss 2 wavelet, 50 wavelet scales, 5 slices, and a ridge thickness of 0.12. The bottom plot shows the ridge count for the wavelet fingerprint. There is a clearly unnatural repetition to this signal
First-period bell there is 7:35 am, and this was a school day. Although this twitter account is only a few months old, that profile pic has been on the interwebs for more than 5 years, mostly in non-English-speaking countries. We ran our topic modeling procedure on that bot's corpus of tweets and formed a time signature of retweets characteristic of bots. Figure 9.21 shows the actual tweeting behavior of trvestuff plotted in time. The top plot in this figure shows the time series of all tweets by trvestuff over a two-week period from 5 Mar 20 until 19 Mar 20. Each point in the time series shows how many tweets the account posted in a 15 min interval. There is clearly a repetitive behavior exhibited by trvestuff: 12 h of tweeting 4–6 tweets every hour, then 12 h off. This is confirmed by looking at the wavelet fingerprint shown in the middle image of Fig. 9.21. Compare this to Fig. 9.22, which shows the same information for a prolific human tweeter (@realDonaldTrump) and looks much less regular, with a few spikes of heavy tweeting activity. Looking at that fingerprint we cannot see any repetitive behavior, outside of maybe a slight diurnal pattern, which would be natural for any human tweeter. Features are extracted from the fingerprints which describe the tweeting behavior of an account in different ways.
Fig. 9.22 Time-series analysis of the posting behavior of the account Donald Trump (@realDonaldTrump). The top plot shows the time series of the raw number of posts every 15 min for two weeks. The middle plot shows the wavelet fingerprint of this time series. The wavelet fingerprint was created using the Gauss 2 wavelet, 50 wavelet scales, 5 slices, and a ridge thickness of 0.12. The bottom plot shows the ridge count for the wavelet fingerprint
Fig. 9.23 Autocorrelation coefficients for each of the ridge counts shown in Figs. 9.21 (top) and 9.22 (bottom). Each data point shows the correlation coefficient of the ridge count with itself offset by that many data points
For example, at the bottom of both Figs. 9.21 and 9.22 is the ridge count for the corresponding wavelet fingerprint. The ridge count tabulates the number of times each column of pixels in the fingerprint switches from 0 to 1 or 1 to 0. We can use the autocorrelation of the ridge counts to unearth any repetitive behaviors. Figure 9.23 shows the normalized autocorrelation: each plot shows how similar the vector is to itself shifted by some number of time steps, with a maximum of 1 if they are exactly the same. Again we see further evidence of the artificial nature of trvestuff's tweeting behavior, while the President's shows no real repetition. There is a small spike when the vector is offset by about three days, which comes from the ridge count spikes on March 9th and March 12th aligning; however, the prominence of that spike is nowhere near the prominence of the spikes exhibited by trvestuff.
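Continuing from the hypothetical fp fingerprint produced by the earlier sketch, the ridge count and its normalized autocorrelation might be computed as follows; at 15-minute bins, the 12 h on/off schedule seen for @trvestuff would show up as a strong peak near lag 48.

```python
import numpy as np

def ridge_count(fp):
    """0<->1 transitions in each pixel column of the binary fingerprint."""
    return np.count_nonzero(np.diff(fp.astype(int), axis=0), axis=0)

def normalized_autocorrelation(x):
    """Correlation coefficient of x with itself at each nonnegative lag."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    acf = np.correlate(x, x, mode="full")[x.size - 1:]
    return acf / max(acf[0], 1e-12)        # lag 0 normalizes to exactly 1

rc = ridge_count(fp)                        # fp from the earlier sketch
acf = normalized_autocorrelation(rc)
lag = int(np.argmax(acf[1:])) + 1           # strongest nonzero-lag repeat
print(f"strongest repeat at lag {lag} bins ({lag / 4:.1f} h), "
      f"coefficient {acf[lag]:.2f}")
```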
The DWFP was probably unnecessary to see that trvestuff is a bot posing as a human. Sophisticated bots operating on Twitter will not be this obvious, of course, but the goal is to detect bots automatically. The DWFP has shown itself to be adept at identifying behavior that is buried deeply in noise, so it can pick out the inorganic signals buried in the noise of a Twitter bot's posting behavior. This is the capability necessary to uncover sophisticated bots spreading spam and disinformation through Twitter. But what else might this be useful for?

9.8 In-Game Sports Betting

Some people think Ultimate Frisbee is a sport. Some people think the earth is shaped like a Frisbee. Some people think soccer is a sport. Some people think the earth is shaped like a football. I think American football is a sport. I think basketball is a sport. I think baseball and golf are excellent games for napping. I think Frisbee is a sport for dogs. I think muggle quidditch is embarrassing. I have no particular opinion on soccer, except that I don't think kids should get a trophy if their feet never touched the ball. But now that sports betting has been legalized nationally (what were the odds of that?) we may have to expand our definition of sports to be pretty much any contest that you can bet on [137]. Americans currently wager a total of $150 billion each year through illegal sports gambling, although it's not clear how much of that is March Madness office pools where most people make picks by their teams' pretty colors or offensive mascots. Some people are worried that sports betting will create ethical issues for the NCAA. I'm worried that academically troubled student-athletes are encouraged to sign up for so-called paper classes, which are essentially no-show independent studies involving a single paper that allows functionally illiterate football players to prop up their GPAs to satisfy the NCAA's eligibility requirements.6
6 Here's an actual UNC Tarheel paper that was written for an actual intro class, in which the student-athlete finished with an actual A-: On the evening of December Rosa Parks decided that she was going to sit in the white people section on the bus in Montgomery, Alabama. During this time blacks had to give up there seats to whites when more whites got on the bus. Rosa Parks refused to give up her seat. Her and the bus driver began to talk and the conversation went like this. "Let me have those front seats" said the driver. She didn't get up and told the driver that she was tired of giving her seat to white people. "I'm going to have you arrested," said the driver. "You may do that," Rosa Parks responded. Two white policemen came in and Rosa Parks asked them "why do you all push us around?" The police officer replied and said "I don't know, but the law is the law and you're under arrest."
As many others have pointed out, people who think that most big-time college athletes are at school first and foremost to be educated are fooling themselves. They're there to work and earn money and prestige for the school. That money all stays inside the athletic departments. The English department faculty don't even get free tee shirts or tickets to the games, because that might create a conflict of interest if someday a semi-literate star player needed an A- to keep his eligibility. Profs do get to share in the prestige, though. There might be a small faculty discount at the university bookstore, which mostly sells tee shirts.
Tweetstorms about the Olympics give characteristically different fingerprints than the World Cup. Fans are somewhat less emotionally engaged for the former, and somewhat more likely to be drinking during the latter. Winter Olympics hooliganism simply isn't a thing, unless Canada loses at curling. Both the Olympics and the World Cup take place over about a fortnight, and whoever paid for the broadcast rights tries hard to generate buildup. Hence, related tweetstorms can be considered to have compact support in time. Cultural events also often happen over about this sort of time scale, because people collectively are fickle and the public is thought to have a short attention span. There's also always some new disaster happening or looming to grab the public's attention and perhaps generate advertising dollars. We can analyze tweetstorms about global pandemics in a manner similar to how we did the Olympics and the World Cup. The minutiae of the wavelet fingerprints are different in ways that should allow us to begin to automatically characterize tweetstorm events as they build and develop so that proper action can be taken. Of particular interest is distinguishing tweetstorms that blow up organically from those that are inflated artificially. Fake news is now widely recognized as a problem in society, but some politicians have gotten into the habit of accusing anything they disagree with of being fake news. Analyzing the tweetstorms that surround an event should allow us to determine its inherent truthiness. We expect this to be of critical importance during the next few election cycles, now that deepfake videos can be produced without the hardware, software, and talent that used to be confined to Hollywood special effects studios.
These techniques can be used to analyze in real time the minutiae of in-game sports betting in order to adjust odds on the fly faster than human bookies could ever do. For a slow-moving sport like major league baseball, fans inside the stadium will be able to hold a beer in one hand and their stadium-WiFi-connected smartphone in the other. Teams will want to provide excellent WiFi and convenient apps that let fans bet on every little thing that happens during the game. They can encourage fans to come to the game by exploiting the demise of net neutrality to give a moment's advantage to fans inside the stadium as compared to those streaming the game at home. The in-stadium betting can be easily monetized by taking a small cut off the top for those using the team's app, but the real issue with in-game betting is adjusting the odds to make sure they are always in Panem's favor.
We expect a rich storm of information to develop around any live sporting event as people draw on multiple sources of information, guidance, speculation, etc. in order to improve their odds. As of 2020, that seems likely to be primarily Twitter, although we use the terms "tweets" and "tweetstorms" in the general sense rather than referring to a particular social media platform, because platforms come and go all the time. Speech-to-text technology is robust enough that radio broadcasts can be entrained into topic modeling systems, and databases such as in-game, in-stadium beer sales can be updated and accessed in real time. Meteorological information and the like can also be incorporated via a comprehensive machine learning approach to fine-tune the odds. Gotta go, the game's about to start.
9.9 Virtual Financial Advisor Is Now Doable

Meanwhile, Warren Buffett and Bill Gates are persuading fellow billionaires to each commit to giving away half of their wealth to good causes before they die. If none of the billionaires listed at givingpledge.org think you're a good cause to give their money to, you might be interested in some fancier investment strategies than diversified, low-load index funds. Econometrics is the field of study where mathematical modeling and statistics are used to figure out why the market(s) just did that thing that nobody except Warren Buffett predicted. Business analytics and machine learning are all the rage right now, and applying artificial intelligence to financial modeling and prediction seems like a no-lose business proposition. As I may have mentioned, there seem to be lots of cranks and charlatans in this business right now.
If you page back through this book you should see that the dynamic wavelet fingerprint technique works quite well to identify subtle patterns in time-series data. It would be a trivial matter to form wavelet fingerprints from econometric datasets and then to look for patterns that you think might predict the future, as sketched below. The human eye is very good at seeing patterns in these kinds of fingerprint images, but it will see patterns even if they aren't there, so you have to be very careful about fooling yourself. The good news is you can place bets on yourself and see if your predictions return profit or loss. You could also write a simple wavelet fingerprint app for day traders and see if that hits the jackpot. If you make billions of dollars doing that sort of thing, be sure to sign on to the giving pledge. Generously supporting The Alma Mater of a Nation would be a worthy cause, and we'll be happy to name a building after you and yours.
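As a sketch of that idea, reusing the hypothetical dwfp and ridge_count helpers from the Sect. 9.7 examples, one might fingerprint daily log returns and extract simple candidate features. The random-walk prices here are stand-ins for real market data, and nothing below should be mistaken for a validated trading signal.

```python
import numpy as np
import pandas as pd

# Synthetic random-walk closing prices; in practice, export these from your
# trading platform or a market-data API.
rng = np.random.default_rng(0)
close = pd.Series(100 * np.exp(np.cumsum(rng.normal(0, 0.01, 500))))

returns = np.log(close).diff().dropna().to_numpy()  # daily log returns
fp = dwfp(returns)                                  # helper from the Sect. 9.7 sketch

# Candidate features for a (carefully cross-validated!) predictive model.
features = np.vstack([
    ridge_count(fp),    # 0<->1 transitions per pixel column, as before
    fp.mean(axis=0),    # fraction of inked pixels at each time step
])
```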
Machine learning has rather a lot of potential for doing good in financial services, which is where we close, to end on a hopeful note as opposed to deepfakes and twitter bots and such. Structured data is typically comprised of well-defined data types whose pattern makes it easily searchable, while unstructured data (audio, video, social media postings, and the like) is not as easily searchable and requires more preprocessing. It's now possible to make use of both structured and unstructured data types in a unified machine learning paradigm. Key to the applications discussed in this book is extracting meaning from time-series data
using the standard statistical methods as well as time-series and time-frequency/time-scale transformations, which convert time-series signals to higher dimensional representations where image processing and shape identification methods can be brought to bear. The standard tools available via day-trading platforms such as E*Trade, TD Ameritrade, etc., and even the more sophisticated tools on Bloomberg terminals, can be leveraged to develop advanced signal processing methods for machine learning for developing a virtual financial advisor (VFA).
A VFA could, for example, give advice on purchasing a car that would best fit a person's particular financial situation, accounting for marital status, dependents, professional status, savings, total debt, credit score, etc., as well as previous cars purchased. Rather than a simple credit score decision tree, the goal is a more holistic assessment of the person's situation and financial trajectory, and of how notional car-buying decisions would affect that trajectory both over the short term and much longer. Natural language processing allows the VFA to simply engage the person in a conversation, without the pressure of a salesman trying to close a deal and without having to type a bunch of information into a stupid webform. Of course, major financial decisions like buying a car should never be made without an assessment of the state of the larger economy and economic outlook. The VFA algorithms should allow for incorporation of micro- and macro-economic data including realtime assessments of markets, trending topics on world financial markets discussion sites (verbal and text-based), and historical data/analysis with an eye toward precursor signals of significant financial events.7 Topic modeling and pattern classification via dynamic wavelet fingerprints allow a machine learning system to incorporate and make sense of patterns in all of these diverse and disparate information sources in order to give neutral but tailored guidance to a potential car buyer. The VFA would need to be able to
• Leverage data of people with similar attributes, e.g., education, professional status, life events, financial situations, and previous car ownership. The data types and formats available, especially the time resolution and length of archived data history, will be critical in this algorithm development.
• Understand semantics in conversations via an up-to-date library of car names, nicknames, and related jargon; freely available sources from Wikipedia to specialized subreddits can be scraped and analyzed with topic-modeling methods.
• Present alternatives as suggestions, but not rank ordered in any way, because most people view their cars as an extension of their own self-image, and part of the "personality" of the VFA is that it doesn't presume to know which car makes you cool or whatever human emotions happen to be driving your car-buying decisions. The recommendations would simply be couched in terms of how various options might affect your financial situation now and in the future.
Of course the VFA would need to be integrated as a module in an intelligent personal assistant technology with an interface to interact using (Warren Buffett's) voice,
7 As I'm typing this, oil is at $20 per barrel and some analysts are predicting single digits. Who could have predicted that?
which means that it could be used to keep track of television shows such as Motorweek and several of the numerous car-repair and rework shows airing currently, automotive podcasts, etc. Similarly, it could monitor financial news television shows, podcasts, etc. with a particular focus on identifying and isolating discussions about the state of the automotive industry. A side benefit of the ability to monitor and classify automotive industry and financial audio streams may be to "bookmark" segments that would be of interest to you and to suggest that you might want to listen to or watch current news about the cars you've expressed interest in buying. The TextRank algorithm for automatic keyword extraction and summarization, using Levenshtein distance as the relation between text units, can be leveraged to tl;dr them, as sketched below.
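A minimal sketch of that summarizer, with a hand-rolled Levenshtein distance and networkx's PageRank standing in for a production implementation:

```python
import networkx as nx

def levenshtein(a, b):
    """Classic edit distance between two strings, by dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,            # deletion
                            curr[-1] + 1,           # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]

def textrank_summary(sentences, n=2):
    """Rank sentences by PageRank over a similarity graph whose edge weights
    are inverse normalized Levenshtein distances."""
    g = nx.Graph()
    for i, si in enumerate(sentences):
        for j in range(i + 1, len(sentences)):
            d = levenshtein(si, sentences[j]) / max(len(si), len(sentences[j]))
            g.add_edge(i, j, weight=1.0 - d)  # similar sentences couple strongly
    scores = nx.pagerank(g, weight="weight")
    top = sorted(scores, key=scores.get, reverse=True)[:n]
    return [sentences[i] for i in sorted(top)]  # restore document order
```

Called on the transcript sentences of a bookmarked segment, textrank_summary(sentences, n=2) returns the two most central sentences in their original order.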
This sort of system could be expanded to give advice for other decisions such as home purchases (e.g., finding a home, mortgages, and insurance) and college savings (e.g., 529 College Savings Plans). The VFA would give advice on how to approach purchasing a home, for example, by helping with mortgages and insurance as well as finding homes that best suit a person's family and financial situation. It would make use of existing home-search tools, like Realtor.com, Zillow.com, etc. Buying a home is a much more significant financial decision with much longer term consequences, so a feature of a VFA personality for this would be entering into a patient, long-term conversation about the range of choices. Also, once the person's dream home appears on the market there is often a sudden urgency to downselecting on all the ancillary financial issues, especially financing options and the like. A virtual financial advisor is ideally suited to all configurations of a hurry-up-and-wait scenario and can patiently and diligently watch both the local housing market and mortgage rates for things to perfectly align. Individuals can be confident that the VFA is driven by what's in their best interests, not just the need to make its sales numbers this quarter, and will present ranges of options with implications for their long-term financial well-being at the center of the calculus. You wouldn't have to worry about whether the VFA might be a crank (fool) or a charlatan (crook), because its sole purpose is to collect information and detect patterns and then give you tailored, unbiased advice.
Similarly, modules to offer advice concerning other long-term financial planning decisions, such as college savings, could be incorporated. Specialized expertise, such as details about how to maximize benefits under the Post-9/11 GI Bill, why it's better to have grandparents open a 529 College Savings Plan, or how the typical discount rate at private colleges affects cost of attendance, could be incorporated into the VFA. There is a wealth of insider information about how college works, what it actually costs, and how US News rankings, Final Four results, post-college placement statistics, etc. affect everything from the acceptance rate to cost of attendance to subsequent earning potential for the child. The VFA can help to demystify college for a young parent who needs to know now how to start planning enough in advance to open the right doors 10 or 15 years hence, without resorting to bribing a coach or something. We think wavelet fingerprints can help.
References

1. Stromberg J (2013) What's in century-old 'Snake Oil' medicines? Mercury and lead. Smithsonian.com. https://www.smithsonianmag.com/science-nature/whats-in-centuryold-snake-oil-medicines-mercury-and-lead-16743639/. Accessed 8 April 2013 2. Silberman S (2012) Are warnings about the side effects of drugs making us sick? PLOS (The Public Library of Science). http://blogs.plos.org/neurotribes/2012/07/16/are-warningsabout-the-side-effects-of-drugs-making-us-sick/. Accessed 16 July 2012 3. Preston E (2014) Say no to nocebo: how doctors can keep patients' minds from making them sicker. Discover Magazine. http://blogs.discovermagazine.com/inkfish/2014/07/09/say-noto-nocebo-how-doctors-can-keep-patients-minds-from-making-them-sicker/. Accessed 9 July 2014 4. Zublin F (2015) Keep your whooping cough to yourself. OZY.com. https://www.ozy.com/immodest-proposal/keep-your-whooping-cough-to-yourself/60310. Accessed 31 May 2015 5. Lewandowsky S, Mann ME, Bauld L, Hastings G, Loftus EF (2013) The subterranean war on science. Association for Psychological Science. https://www.psychologicalscience.org/observer/the-subterranean-war-on-science 6. Mole B (2016) FDA: homeopathic teething gels may have killed 10 babies, sickened 400. Ars Technica. https://arstechnica.com/science/2016/10/fda-homeopathic-teething-gelsmay-have-killed-10-babies-sickened-400/. Accessed 13 Oct 2016 7. Gorski D (2010) The dietary supplement safety act of 2010: a long overdue correction to the DSHEA of 1994? Science-Based Medicine. http://www.sciencebasedmedicine.org/?p=3772. Accessed 8 Feb 2010 8. Summers D (2014) Dr. Oz: world's best snake oil salesman. The Daily Beast. https://www.thedailybeast.com/dr-oz-worlds-best-snake-oil-salesman. Accessed 14 June 2014 9. McCoy T (2014) Half of Dr. Oz's medical advice is baseless or wrong, study says. The Washington Post. http://www.washingtonpost.com/news/morning-mix/wp/2014/12/19/halfof-dr-ozs-medical-advice-is-baseless-or-wrong-study-says/. Accessed 19 Dec 2014 10. Kaplan K (2014) Real-world doctors fact-check Dr. Oz, and the results aren't pretty. LA Times. http://www.latimes.com/science/sciencenow/la-sci-sn-dr-oz-claims-fact-checkbmj-20141219-story.html. Accessed 19 Dec 2014 11. Ray Merriman Workshop at ISAR Conference on "Reimagining the Future". https://isar2020.org/, https://www.mmacycles.com/events/ray-merriman-workshop/. Accessed 9 Sept 2020 12. Kennon J (2019) How Warren Buffett became one of the wealthiest people in America. A chronological history of the Oracle of Omaha: 1930–2019. The Balance. https://www.thebalance.com/warren-buffett-timeline-356439. Accessed 25 June 2019 13. Yochim D, Voigt K (2019) Index funds: how to invest and best funds to choose. NerdWallet. https://www.nerdwallet.com/blog/investing/how-to-invest-in-index-funds/. Accessed 5 Dec 2019 14. Buffett W (2017) 'Oracle of Omaha' criticises Wall Street and praises immigrants. The Guardian. https://www.theguardian.com/business/2017/feb/25/warren-buffettberkshire-hathaway-wall-street-apple-annual-letter. Accessed 25 Feb 2017 15. Yong E (2016) Psychology's replication crisis can't be wished away. The Atlantic. https://www.theatlantic.com/science/archive/2016/03/psychologys-replication-crisis-cant-bewished-away/472272/. Accessed 4 March 2016 16. Reilly J (2014) Chin chin: urine-drinking hindu cult believes a warm cup before sunrise straight from a virgin cow heals cancer - and followers are queuing up to try it. The Daily Mail.
http://www.dailymail.co.uk/news/article-2538520/Urine-drinking-Hindu-cultbelieves-warm-cup-sunrise-straight-virgin-cow-heals-cancer-followers-queuing-try-it. html. Accessed 13 Jan 2014
17. Adams C (2014) Is there a scientifically detectable difference between high-price liquor and regular stuff? The Straight Dope. http://www.straightdope.com/columns/read/3142/is-therea-scientifically-detectable-difference-between-high-price-liquor-and-regular-stuff/. Accessed 3 Jan 2014 18. Adams W (2014) Wine expert reviews cheap beer. Devour.com. http://devour.com/video/wine-expert-reviews-cheap-beer/. Accessed 26 Feb 2014 19. Burke J (2013) Woo-ing wine drinkers. Skepchick. http://skepchick.org/2013/01/guest-postwoo-ing-wine-drinkers/. Accessed 19 Jan 2013 20. The bottled water taste test. BuzzFeedVideo. https://youtu.be/2jIC6MBkjjs. Accessed 30 Nov 2014 21. Peters J (2008) What's the best adult diaper? That depends. Slate Geezers issue. http://www.slate.com/articles/life/geezers/2008/09/whats_the_best_adult_diaper.html. Accessed 10 Sept 2008 22. Barry-Jester AM (2016) What went wrong in Flint. FiveThirtyEight. https://fivethirtyeight.com/features/what-went-wrong-in-flint-water-crisis-michigan/. Accessed 26 Jan 2016 23. Groll E (2015) WHO: to avoid MERS, don't drink camel urine. Foreign Policy. https://foreignpolicy.com/2015/06/08/who-to-avoid-mers-dont-drink-camel-urine/. Accessed 8 June 2015 24. Lee C (2015) Magic placebo more effective than ordinary placebo. Ars Technica. https://arstechnica.com/science/2015/04/a-tale-of-two-placebos/, http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0118440. Accessed 22 April 2015 25. Kubo K, Guillory N (2016) People tried their own urine for the first time and they were disgusted. BuzzFeed. https://www.buzzfeed.com/kylekubo/people-tried-their-own-urine-forthe-first-time-and-they-wer. Accessed 5 Jan 2016 26. This section adapted with permission from: Ignatius B. Nowitall, "Scientific Adulting and BS Detection" W&N Edutainment, 2020 27. Houdini H (1924) A magician among the spirits. Harper, New York 28. Mizokami K (2018) 70 years and counting, the UFO phenomenon is as mysterious as ever. Popular Mechanics. https://www.popularmechanics.com/space/a22025557/world-ufo-day2018/. Accessed 2 July 2018 29. Wright T (2013) Why there will never be another Flying Pancake. The end of Vought V173. Air & Space Magazine. http://www.vought.org/rest/html/rv-1731.html, https://www.airspacemag.com/history-of-flight/restoration-vought-v-173-7990846/ 30. Webster D (2017) In 1947, a high-altitude balloon crash landed in Roswell. The aliens never left. Smithsonian Magazine. https://www.smithsonianmag.com/smithsonian-institution/in1947-high-altitude-balloon-crash-landed-roswell-aliens-never-left-180963917/. Accessed 5 July 2017 31. Project BLUE BOOK - Unidentified Flying Objects. United States Air Force, 1952–1969. https://www.archives.gov/research/military/air-force/ufos.html 32. "Top Secret America" was a project nearly two years in the making that describes the huge national security buildup in the United States after the Sept. 11, 2001, attacks. The project was last updated in September 2010. http://projects.washingtonpost.com/top-secret-america/ 33. Kirk M, Gilmore J, Wiser M, Smith M (2014) United States of Secrets. Frontline. https://www.pbs.org/wgbh/frontline/film/united-states-of-secrets/. Accessed 13 May 2014 34. Lundberg J, Pilkington M, Denning R, Kyprianou K (2013) Mirage Men, a journey into paranoia, disinformation and UFOs. Random Media. World premiere at Sheffield Doc/Fest in June 2013.
https://www.imdb.com/title/tt2254010/ 35. Plackett B (2012) Declassified at last: air force’s supersonic flying saucer schematics. Wired. https://www.wired.com/2012/10/the-airforce/. Accessed 05 Oct 2012 36. Cowing K (2014) CIA admits that it owns all of the flying saucers. NASA Watch. http:// nasawatch.com/archives/2014/12/cia-admits-that.html. Accessed 29 Dec 2014
37. Dickson C (2015) Obama adviser John Podesta's biggest regret: keeping America in dark about UFOs. Yahoo News. https://www.yahoo.com/news/outgoing-obama-adviser-johnpodesta-s-biggest-regret-of-2014--keeping-america-in-the-dark-about-ufos-234149498.html. Accessed 13 Feb 2015 38. Cooper H, Blumenthal R, Kean L (2017) Glowing auras and 'Black Money': the Pentagon's mysterious U.F.O. program. New York Times. https://www.nytimes.com/2017/12/16/us/politics/pentagon-program-ufo-harry-reid.html. Accessed 16 Dec 2017 39. Tchou A (2011) I saw four green objects in a formation, an interactive map of 15 years of UFO sightings. Slate. http://www.slate.com/articles/news_and_politics/maps/2011/01/i_saw_four_green_objects_in_a_formation.html. Accessed 11 Jan 2011 40. "British UFO files reveal shocking sightings and hoaxes from 1950 to present". Huffington Post, May 25, 2011. https://www.huffingtonpost.com/2011/03/03/uk-releases-8500pages-of_n_830880.html 41. Darrach HB Jr, Ginna R (1952) Have we visitors from space? LIFE Magazine. http://www.project1947.com/shg/csi/life52.html. Accessed 7 April 1952 42. Sofge E (2009) The 10 most influential UFO-inspired books, movies and TV shows. Popular Mechanics. https://www.popularmechanics.com/space/g204/4305349/. Accessed 17 March 2009 43. Mikkelson D (2014) Did Boyd Bushman provide evidence of alien contact? Snopes.com. https://www.snopes.com/fact-check/boyd-bushman-aliens/. Accessed 31 Oct 2014 44. When President Roosevelt died, Truman was informed by Secretary of War Henry Stimson of a new and terrible weapon being developed by physicists in New Mexico. https://www.history.com/this-day-in-history/truman-is-briefed-on-manhattan-project 45. "Farmer Trent's Flying Saucer". Life: 40. https://books.google.com/books?id=50oEAAAAMBAJ. Accessed 26 June 1950 46. SCIENTIFIC STUDY OF UNIDENTIFIED FLYING OBJECTS. Conducted by the University of Colorado under contract No. 44620-67-C-0035 with the United States Air Force, Dr. Edward U. Condon, Scientific Director (1968). Electronic edition 1999 by National Capital Area Skeptics (NCAS). http://files.ncas.org/condon/text/case46.htm 47. Armentano D (2009) OC's moment in UFO history. The Orange County Register, October 30, 2009. https://www.ocregister.com/2009/10/30/ocs-moment-in-ufo-history/. See also: https://www.washingtonpost.com/news/morning-mix/wp/2015/01/21/two-decades-ofmysterious-air-force-ufo-files-now-available-online/ 48. Bowen C (1970) "Progress at Cradle Hill" in the March/April edition of Flying Saucer Review. An international journal devoted to the study of Unidentified Flying Objects vol 17, no 2 (1970). http://www.ignaciodarnaude.com/ufologia/FSR%201971%20V%2017%20N%202.pdf 49. Minchin T (2011) Storm the animated movie. https://www.youtube.com/watch?v=HhGuXCuDb1U. Accessed 7 April 2011 50. Worth1000.com is now part of DesignCrowd.com, which has preserved all the amazing Worth1000 content here so you can search the archives to find old favorites and new contest art. https://blog.designcrowd.com/article/898/worth1000-on-designcrowd 51. Tumulty K (2007) How the right went wrong. Time Magazine. (Photograph by David Hume Kennerly. Tear by Tim O'Brien.) http://content.time.com/time/covers/0,16641,20070326,00.html. Accessed 15 March 2007 52. Tolentino J (2019) The age of Instagram face: how social media, FaceTune, and plastic surgery created a single, cyborgian look. The New Yorker. https://www.newyorker.com/culture/decade-in-review/the-age-of-instagram-face. Accessed 12 Dec 2019 53.
This guy can't stop photoshopping himself into Kendall Jenner's Instagram pics. Twisted Sifter. http://twistedsifter.com/2017/01/guy-photoshops-himself-into-kendall-jenners-pics/. Kirby Jenner, fraternal twin of Kendall Jenner, is @KirbyJenner. Accessed 5 Jan 2017 54. Shiffman D (2013) How to tell if a "shark in flooded city streets after a storm" photo is a fake in 5 easy steps. Southern Fried Science. http://www.southernfriedscience.com/how-to-tell-if-ashark-in-flooded-city-streets-after-a-storm-photo-is-a-fake-in-5-easy-steps/. Accessed 23 Jan 2013
55. Geigner T (2013) This week's bad photoshopping lesson comes from Scientology (from the the-thetans-did-it dept). Techdirt. Accessed 16 May 2013 56. Live Science Staff "10 Paranormal Videos Debunked". Live Science. https://www.livescience.com/33237-6-paranormal-video-hoaxes.html. Accessed 27 April 2011 57. Boehler P (2013) Nine worst doctored photos of Chinese officials. South China Morning Post. https://www.scmp.com/news/china-insider/article/1343568/slideshow-doctoredphotos-chinese-officials. Accessed 30 Oct 2013 58. Dickson EJ (2014) Plastic surgeons say Three-Boob Girl is a hoax. The Daily Dot. https://www.dailydot.com/irl/three-boob-internet-hoax/. Accessed 23 Sept 2014 59. Bruce Shapiro: What happens when Photoshop goes too far? PBS NewsHour. https://www.pbs.org/newshour/show/now-see-exhibit-chronicles-manipulated-news-photos. Accessed 26 July 2015 60. Neil Slade Haiti UFO DEBUNKED Slow Motion and Enhanced Stills. Video posted on Aug 10, 2007. https://www.youtube.com/watch?v=rrrx9izp0Lc, https://www.snopes.com/fact-check/ufos-over-haiti/ 61. Sarno D (2007) It came from outer space. Los Angeles Times. http://www.latimes.com/newsletters/topofthetimes/la-et-ufo22aug22-story.html. Accessed 22 Aug 2007 62. Newitz A (2012) Why is this the most popular UFO footage on YouTube? io9, Gizmodo. https://io9.gizmodo.com/5912215/why-is-this-the-most-popular-ufo-footage-on-youtube. Accessed 22 May 2012 63. Weiskott E (2016) Before 'Fake News' came false prophecy. The Atlantic. https://www.theatlantic.com/politics/archive/2016/12/before-fake-news-came-false-prophecy/511700/. Accessed 27 Dec 2016 64. Holiday R (2012) How your fake news gets made (two quick examples). Forbes. https://www.forbes.com/sites/ryanholiday/2012/05/24/how-your-fake-news-gets-madetwo-quick-examples/. Accessed 24 May 2012 65. Achenbach J (2015) Why do many reasonable people doubt science? National Geographic. https://www.nationalgeographic.com/magazine/2015/03/ 66. Bergstrom CT, West J (2017) Calling bullshit: data reasoning in a digital world. INFO 198/BIOL 106B. University of Washington, Autumn Quarter 2017. https://callingbullshit.org/syllabus.html 67. Aschwanden C (2015) Science isn't broken, it's just a hell of a lot harder than we give it credit for. FiveThirtyEight. https://fivethirtyeight.com/features/science-isnt-broken/. Accessed 19 Aug 2015 68. Yglesias M (2018) Mark Zuckerberg has been apologizing for reckless privacy violations since he was a freshman. Vox. https://www.vox.com/2018/4/10/17220290/mark-zuckerbergfacemash-testimony. Accessed 11 April 2018 69. Constine J (2018) Truepic raises $8M to expose deepfakes, verify photos for Reddit. TechCrunch. https://techcrunch.com/2018/06/20/detect-deepfake/. Accessed 20 June 2018 70. Building a better news experience on YouTube, together. YouTube Official Blog. https://youtube.googleblog.com/2018/07/building-better-news-experience-on.html. Accessed 9 July 2018 71. Trulove R (2018) Wildlife writer and conservationist of over a half century. Quora. https://top.quora.com/What-is-a-group-of-squirrels-called. Accessed 25 March 2018 72. Blakeslee S (1997) Kentucky doctors warn against a regional dish: squirrels' brains. New York Times. https://www.nytimes.com/1997/08/29/us/kentucky-doctors-warn-againsta-regional-dish-squirrels-brains.html. Accessed 29 Aug 1997 73. Data Never Sleeps 6.0. https://www.domo.com/learn/data-never-sleeps-6 74. Kirn SL, Hinders MK (2020) Dynamic wavelet fingerprint for differentiation of tweet storm types. Soc Netw Anal Min 10:4.
https://doi.org/10.1007/s13278-019-0617-3 75. Raschka S (2015) Python machine learning. Packt Publishing Ltd., Birmingham 76. Le Q, Mikolov T (2014) Distributed representations of sentences and documents. In: International conference on machine learning, pp 1188–1196
77. Mikolov T et al (2013) Distributed representations of words and phrases and their compositionality. Adv Neural Inf Process Syst 3111–3119 78. Jedrzejowicz J, Zakrzewska M (2017) Word embeddings versus LDA for topic assignment in documents. In: Nguyen N, Papadopoulos G, Jedrzejowicz P, Trawinski B, Vossen G (eds) Computational collective intelligence. ICCCI 2017. Lecture notes in computer science, vol 10449. Springer, Cham 79. Lau JH, Baldwin T (2016) An empirical evaluation of doc2vec with practical insights into document embedding generation. In: Proceedings of the 1st workshop on representation learning for NLP. Berlin, Germany, pp 78–86 80. Levy O, Goldberg Y (2014) Neural word embedding as implicit matrix factorization. Adv Neural Inf Process Syst 2177–2185 81. Niu L, Dai X (2015) Topic2Vec: learning distributed representations of topics. CoRR, arXiv:1506.08422 82. Xun G et al (2017) Collaboratively improving topic discovery and word embeddings by coordinating global and local contexts. KDD 17 Research Paper, pp 535–543 83. Xu W et al (2003) Document clustering based on non-negative matrix factorization. In: Research and development in information retrieval conference proceedings, pp 267–273 84. Jähnichen P et al (2018) Scalable generalized dynamic topic models. arXiv:1803.07868 85. Blei DM, Lafferty JD (2006) Dynamic topic models. In: Proceedings of the 23rd international conference on machine learning, pp 113–129 86. Saha A, Sindhwani V (2012) Learning evolving and emerging topics in social media: a dynamic NMF approach with temporal regularization. In: Proceedings of the fifth ACM international conference on web search and data mining, pp 693–702 87. Greene D, Cross JP (2017) Exploring the political agenda of the European parliament using a dynamic topic modeling approach. Polit Anal 25(1):77–94 88. Chen Y et al (2015) Modeling emerging, evolving, and fading topics using dynamic soft orthogonal NMF with sparse representation. In: 2015 IEEE international conference on data mining, pp 61–70 89. Deerwester S et al (1990) Indexing by latent semantic analysis. J Amer Soc Inf Sci 41(6):391–407 90. Blei DM et al (2003) Latent Dirichlet allocation. J Mach Learn Res 3:993–1022 91. Pedregosa F et al (2011) Scikit-learn: machine learning in Python. J Mach Learn Res 12:2825–2830 92. Mimno D et al (2011) Optimizing semantic coherence in topic models. In: Proceedings of the 2011 conference on empirical methods in natural language processing, pp 262–272 93. Kim S (2018) 5G is making its global debut at Olympics, and it's wicked fast. Bloomberg. https://www.bloomberg.com/news/articles/2018-02-12/5g-is-here-super-speed-makesworldwide-debut-at-winter-olympics 94. Tweepy Documentation. http://docs.tweepy.org/en/v3.5.0/index.html 95. Daubechies I (1992) Ten lectures on wavelets, vol 61. SIAM 96. Hou J, Hinders MK (2002) Dynamic wavelet fingerprint identification of ultrasound signals. Mater Eval 60:1089–1093 97. Bertoncini CA, Hinders MK (2010) Fuzzy classification of roof fall predictors in microseismic monitoring. Measurement 43(10):1690–1701 98. Bertoncini CA, Rudd K, Nousain B, Hinders M (2012) Wavelet fingerprinting of radiofrequency identification (RFID) tags. IEEE Trans Ind Electron 59(12):4843–4850 99. Bingham J, Hinders M (2009) Lamb wave characterization of corrosion-thinning in aircraft stringers: experiment and three-dimensional simulation. J Acoust Soc Amer 126(1):103–113 100. Bingham J, Hinders M, Friedman A (2009) Lamb wave detection of limpet mines on ship hulls.
Ultrasonics 49(8):706–722 101. Hinders M, Bingham J, Rudd K, Jones R, Leonard K (2006) Wavelet thumbprint analysis of time domain reflectometry signals for wiring flaw detection. In: Thompson DO, Chimenti DE (eds) Review of progress in quantitative nondestructive evaluation, vol 25, American Institute of Physics Conference Series, vol 820, pp 641–648
102. Hou J, Leonard KR, Hinders MK (2004) Automatic multi-mode lamb wave arrival time extraction for improved tomographic reconstruction. Inverse Probl 20(6):1873–1888 103. Hou J, Rose ST, Hinders MK (2005) Ultrasonic periodontal probing based on the dynamic wavelet fingerprint. EURASIP J Adv Signal Process 2005:1137–1146 104. Miller CA, Hinders MK (2014) Classification of flaw severity using pattern recognition for guided wave-based structural health monitoring. Ultrasonics 54(1):247–258 105. Skinner E, Kirn S, Hinders M (2019) Development of underwater beacon for arctic through-ice communication via satellite. Cold Reg Sci Technol 160:58–79 106. Cohen L (1995) Time-frequency analysis, vol 778. Prentice Hall, Upper Saddle River 107. Bertoncini CA (2010) Applications of pattern classification to time-domain signals. PhD dissertation, William and Mary, Department of Physics 108. Pivot to Video: Inside NBC's Social Media Strategy for the 2018 Winter Games 109. Dillinger A, Everything you need to know about watching the NFL on Twitter. https://www.dailydot.com/debug/how-to-watch-nfl-games-on-twitter/ 110. Bradshaw S, Howard PN (2018) The global organization of social media disinformation campaigns. J Int Affairs 71:23–35 111. Keller FB et al (2019) Political astroturfing on Twitter: how to coordinate a disinformation campaign. Polit Commun 1–25 112. Pierri F et al (2019) Investigating Italian disinformation spreading on Twitter in the context of 2019 European elections. arXiv:1907.08170 113. Yao Y et al (2017) Automated crowdturfing attacks and defenses in online review systems. In: Proceedings of the 2017 ACM SIGSAC conference on computer and communications security, pp 1143–1158 114. Zannettou S et al (2019) Disinformation warfare: understanding state-sponsored trolls on Twitter and their influence on the web. In: Companion proceedings of the 2019 World Wide Web Conference, pp 218–226 115. Starbird K, Palen L (2012) (How) will the revolution be retweeted?: information diffusion and the 2011 Egyptian uprising. In: Proceedings of the ACM 2012 conference on computer supported cooperative work, pp 7–16 116. Xiong F et al (2012) An information diffusion model based on retweeting mechanism for online social media. Phys Lett A 376(30–31):2103–2108 117. Zannettou S et al (2017) The web centipede: understanding how web communities influence each other through the lens of mainstream and alternative news sources. In: Proceedings of the 2017 internet measurement conference, pp 405–417 118. Shao C et al (2017) The spread of fake news by social bots. arXiv:1707.07592 119. Woolley SC (2016) Automating power: social bot interference in global politics. First Monday 21(4) 120. Woolley SC (2020) The reality game: how the next wave of technology will break the truth. PublicAffairs. ISBN 9781541768246 121. Schneier B (2020) Bots are destroying political discourse as we know it. The Atlantic 122. Ferrara E et al (2016) The rise of social bots. Commun ACM 59(7):96–104 123. Shao C et al (2018) The spread of low-credibility content by social bots. Nat Commun 9(1):4787 124. Mueller RS (2019) Report on the investigation into Russian interference in the 2016 Presidential Election. US Department of Justice, Washington, DC 125. Bessi A, Ferrara E (2016) Social bots distort the 2016 US Presidential Election online discussion. First Monday 21(11) 126. Linvill DL et al (2019) "The Russians are Hacking my Brain!"
Investigating Russia's Internet Research Agency Twitter tactics during the 2016 United States Presidential Campaign. Comput Hum Behav 127. Subrahmanian VS et al (2016) The DARPA Twitter bot challenge. Computer 49(6):38–46 128. Minnich A et al (2017) Botwalk: efficient adaptive exploration of Twitter bot networks. In: Proceedings of the 2017 IEEE/ACM international conference on advances in social networks analysis and mining 2017, pp 467–474
129. Varol O et al (2017) Online human-bot interactions: detection, estimation, and characterization. In: Eleventh international AAAI conference on web and social media 130. Kudugunta S, Ferrara E (2018) Deep neural networks for bot detection. Inf Sci 467:312–322 131. Yang K et al (2019) Scalable and generalizable social bot detection through data selection. arXiv:1911.09179 132. Davis CA et al (2016) BotOrNot: a system to evaluate social bots. In: Proceedings of the 25th international conference companion on the World Wide Web, pp 273–274 133. Yang K et al (2019) Arming the public with artificial intelligence to counter social bots. Hum Behav Emerg Technol 1(1):48–61 134. Cresci S et al (2017) The paradigm-shift of social spambots: evidence, theories and tools for the arms race. In: Proceedings of the 26th international conference on World Wide Web companion, pp 963–972 135. Cresci S et al (2017) Social fingerprinting: detection of spambot groups through DNA-inspired behavioral modeling. IEEE Trans Dependable Secure Comput 15(4):561–576 136. Cresci S et al (2019) On the capability of evolved spambots to evade detection via genetic engineering. Online Soc Netw Media 9:1–16 137. Stradbrooke S (2018) US supreme court rules federal sports betting ban is unconstitutional. CalvinAyre.com. https://calvinayre.com/2018/05/14/business/us-supreme-courtrules-paspa-sports-betting-ban-unconstitutional/. Accessed 14 May 2018